soft and program: Digest for comp.lang.c++@googlegroups.com

comp.lang.c++@googlegroups.com

What is the best encoding (experiences...) for unicode? - 2 Updates
About programming.... - 3 Updates
Why all tutorials/books use non-unicode string? - 6 Updates
thread interruption points - 3 Updates
size of a variable - 5 Updates
mysterious destructors - 1 Update
OT: New, (post 20th century) M$ compiler question. - 1 Update
something happened to malloc? - 1 Update
mysterious destructors - 1 Update
compiler bug operator>> matching? - 2 Updates

What is the best encoding (experiences...) for unicode?

JiiPee <no@notvalid.com>: Feb 22 02:15PM

We already started talking about this, but I will start a new thread as this is
a separate issue.

So what encoding do you guys use? UTF-8 or UTF-16? What do you
recommend, and what are your experiences?
I read on the web that there is an argument over whether UTF-8 or UTF-16 is
better, and both sides had strong arguments. But it seems like people here
prefer UTF-8? And can you please briefly tell me how to practically use
UTF-8? Like how to get its length, find a certain character, how to
store it (well, I guess just a char [] array does the job, or even
std::string).

Does UTF-8 work with all normal string functions like find, replace etc.?
If not, how do you deal with these and what needs to be done so they can
be used? Say I use Russian letters, how do I practically find a certain
letter and use all the SDT functions/classes?

I am quite new to this and am trying to use it in my first real
projects, so I am looking for direction.

People already gave instructions and I read them, but I am just asking
if there is more.

"Öö Tiib" <ootiib@hot.ee>: Feb 22 07:34AM -0800

On Sunday, 22 February 2015 16:15:28 UTC+2, JiiPee wrote:
> UTF-8? Like how to get its length, find a certain character, how to
> store it (well, I guess just a char [] array does the job, or even
> std::string).

In the general case UTF-8 is superior, since it:
1) is compatible with ASCII (ASCII text is a subset of UTF-8)
2) does not have alignment issues (a UTF-16 code unit may need to be at an even address)
3) does not have endianness issues (UTF-16 may be LE or BE)
4) fits into std::string (it is unspecified whether std::wstring holds UTF-16,
UTF-32 or something else entirely)
5) is the encoding of the majority of text content on the internet

UTF-16 may be more convenient on Windows or with Qt. Still, if a significant
part of the input or output is in UTF-8 (I already mentioned the internet), then
I would pick UTF-8 as the internal representation for text in your application.

> Does UTF-8 work with all normal string functions like find, replace etc.

You just have to accept that 'char' is a byte (not a text character) and
'std::string' is a contiguous container of such bytes (it does not guarantee
any specific encoding for the text in it). Once that is accepted,
everything works.
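A small sketch of what "char is a byte" means in practice, assuming the text is already valid UTF-8. Here std::string::length() counts bytes, so counting letters needs a separate loop that skips continuation bytes (bit pattern 10xxxxxx). The helper name count_code_points is made up for illustration; it counts code points, which is not always the same as what a reader perceives as one character.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string: every byte that is NOT a
// continuation byte (bit pattern 10xxxxxx) starts a new code point.
// Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8)   // unsigned, so bytes >= 0x80 stay positive
        if ((byte & 0xC0) != 0x80)
            ++count;
    return count;
}
```

For example, with s holding "Привет" as the bytes "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82", s.length() is 12, count_code_points(s) is 6, and s.find("\xD0\xB2") (the letter "в") returns 6. Plain find is safe here because a complete UTF-8 sequence can never match starting in the middle of another one.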

> If not, how do you deal with these and what needs to be done so they can
> be used? Say I use Russian letters, how do I practically find a certain
> letter and use all the SDT functions/classes?

Not sure what you mean by "SDT". When your program receives text from
somewhere, it may need to be converted to UTF-8, or at least checked
that it *is* UTF-8; when your program outputs text somewhere, it may
need to be converted to whatever the other side expects (plus the
inevitable error handling). C++ itself offers too few and too inconvenient
methods for that, so we typically seek help with converting and checking
from outside the C++ standard.
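As an illustration of the "check that it *is* UTF-8" step, here is a minimal hand-written validator. It is a sketch, not a library API: it rejects bad lead or continuation bytes, truncated sequences, overlong forms, UTF-16 surrogate code points (U+D800..U+DFFF) and values above U+10FFFF, but it performs no conversion. For real input, a library from outside the standard, as suggested above, is usually the better choice.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8: valid lead/continuation bytes,
// no truncated sequences, no overlong forms, no UTF-16 surrogate code
// points (U+D800..U+DFFF) and nothing above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    // Smallest code point that legitimately needs 2, 3 or 4 bytes.
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        if (lead < 0x80) { ++i; continue; }              // ASCII byte
        std::size_t len;
        char32_t cp;
        if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return false;                               // stray continuation or bad lead byte
        if (s.size() - i < len) return false;            // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char next = static_cast<unsigned char>(s[i + k]);
            if ((next & 0xC0) != 0x80) return false;     // expected 10xxxxxx
            cp = (cp << 6) | (next & 0x3F);
        }
        if (cp < min_for_len[len] ||                     // overlong encoding
            cp > 0x10FFFF ||                             // beyond the Unicode range
            (cp >= 0xD800 && cp <= 0xDFFF))              // surrogate code point
            return false;
        i += len;
    }
    return true;
}
```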

> projects, so I am looking for direction.

> People already gave instructions and I read them, but I am just asking
> if there is more.

The other tricky thing you will eventually stumble upon is that people
sometimes expect your program to ignore the case of characters, or to convert
to upper case or lower case (or even to title case), and how such things are
done may depend on local conventions. Again, C++ implementations tend to be
quite unhelpful with that.
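To make the case-conversion problem concrete: a byte-wise, ASCII-only lower-casing function is safe on UTF-8 (every byte of a multi-byte sequence is 0x80 or above, so it is never touched), but it simply leaves Cyrillic and other non-ASCII letters alone. The function below is a hypothetical helper shown only to illustrate that gap; proper case mapping needs Unicode tables and, for some languages, locale rules.

```cpp
#include <string>

// Lower-cases ASCII letters only. Bytes of multi-byte UTF-8 sequences are
// all >= 0x80 (negative if char is signed), so they fail the 'A'..'Z' test
// and are left untouched: the result stays valid UTF-8, but letters such
// as Cyrillic "П" are NOT lower-cased.
std::string ascii_lower(std::string s) {
    for (char& ch : s)
        if (ch >= 'A' && ch <= 'Z')
            ch = static_cast<char>(ch - 'A' + 'a');
    return s;
}
```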

About programming....

"Osmium" <r124c4u102@comcast.net>: Feb 22 06:51AM -0600

"Wouter van Ooijen" wrote:

> If you expect me to follow a link to your wonderful creations I'd first
> like a line or two that explains what it will do for me. (That also gives
> me a hint of how good you are at expressing yourself efficiently.)

Perhaps someone could compute and post the fog index for Ramine's post -
that would provide some insight as to just *how* wonderful he is.

"Öö Tiib" <ootiib@hot.ee>: Feb 22 05:39AM -0800

On Sunday, 22 February 2015 14:51:21 UTC+2, Osmium wrote:
> > me a hint of how good you are at expressing yourself efficiently.)

> Perhaps someone could compute and post the fog index for Ramine's post -
> that would provide some insight as to just *how* wonderful he is.

It is his Turbo Pascal code that he talks about. It is off-topic in
comp.lang.c++. It will never be clear why he posts it here. By posting
his rants he has already ramined a number of Usenet groups (like
comp.programming, comp.programming.threads, alt.comp.lang.borland-delphi,
comp.lang.pascal.misc etc.) into his personal blogs. So he is sort of
wonderful in ruining low traffic Usenet groups.

"Osmium" <r124c4u102@comcast.net>: Feb 22 07:53AM -0600

"Öö Tiib" wrote:

> comp.programming, comp.programming.threads, alt.comp.lang.borland-delphi,
> comp.lang.pascal.misc etc.) into his personal blogs. So he is sort of
> wonderful in ruining low traffic Usenet groups.

I dropped comp.programming from my list of groups several months ago;
basically, it was mostly noise. Looking at the group today, I see
that Ramine is a big poster there. The name meant nothing to me earlier
this morning.

Why all tutorials/books use non-unicode string?

jt@toerring.de (Jens Thoms Toerring): Feb 21 11:26PM

> > on) - but which does the really interesting bits of work that
> > make it something people may be motivated to pay for.

> I believe you are deranged.

Thank you;-) But I don't know how I deserve that distinction.
Would you care to elaborate a bit on where you consider me
to be completely wrong?
Regards, Jens

--
\ Jens Thoms Toerring ___ jt@toerring.de
\__________________________ http://toerring.de

Richard Damon <Richard@Damon-Family.org>: Feb 21 06:32PM -0500

On 2/21/15 10:05 AM, JiiPee wrote:
> (1 byte)? Why use examples which are not used in real world? This I do
> not understand.

> And even top C++ people like Bjorn does that.

There are some subtle issues with using Unicode which may not be
important if you are doing a simple tutorial.

Take, for example, what you mean by the length() function. If you want to
know how much storage a string takes, it works just fine with Unicode. If you
want to know how many characters will be displayed, this is actually
quite tricky in Unicode (even using UTF-32/UCS-4 doesn't save you, as
there are combining code points that let you build glyphs that
don't have an assigned code point of their own).
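Richard's point about length() can be seen with the letter "é", which Unicode can represent either as one precomposed code point (U+00E9) or as "e" followed by a combining acute accent (U+0301). The byte values below are the standard UTF-8 encodings of those code points; the variable names are only for illustration.

```cpp
#include <string>

// The letter "é" spelled two ways; both normally render as one glyph.
// UTF-8 bytes:
const std::string precomposed_utf8 = "\xC3\xA9";   // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const std::string combining_utf8   = "e\xCC\x81";  // U+0065 'e' + U+0301 COMBINING ACUTE ACCENT

// The same text in UTF-32, one char32_t per code point:
const std::u32string precomposed_utf32 = U"\u00E9";
const std::u32string combining_utf32   = U"e\u0301";
```

Here precomposed_utf8.size() is 2 and combining_utf8.size() is 3, while the UTF-32 versions have sizes 1 and 2, so even one 32-bit unit per code point does not give one unit per displayed character. The two spellings also compare unequal with == unless the text is normalized first.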

It actually turns out that much code written for ASCII will just work
for UTF-8 encoded Unicode if you follow a few basic guidelines: you
need to handle the high bit of each byte, which might cause issues with
signed char; you need to manipulate strings at "known" points so you don't
break apart multi-byte sequences; and you can't assume that N bytes are N
characters.
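Two of those guidelines in code: treat bytes as unsigned char before testing their high bits (with a signed char, bytes of 0x80 and above are negative), and only cut a string where a new code point starts. truncate_utf8 and is_continuation are hypothetical helpers sketching these rules; they assume valid UTF-8 input.

```cpp
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (10xxxxxx). The cast matters: if char is
// signed, bytes >= 0x80 are negative and naive comparisons go wrong.
inline bool is_continuation(char c) {
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Returns at most max_bytes bytes of s without splitting a multi-byte
// sequence: if the first byte past the cut is a continuation byte, move
// the cut back until it sits on a lead byte.
std::string truncate_utf8(const std::string& s, std::size_t max_bytes) {
    if (s.size() <= max_bytes) return s;
    std::size_t cut = max_bytes;
    while (cut > 0 && is_continuation(s[cut])) --cut;
    return s.substr(0, cut);
}
```

For the 12-byte UTF-8 string "Привет", truncating to 5 bytes yields "Пр" (4 bytes) rather than a broken half of "и".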

UTF-16 is mostly just used in Microsoft environments, and is actually
mostly a mistake. They adopted it when Unicode thought 16 bits were
going to be "big enough", and when that turned out to be wrong, UTF-16 became
an awkward orphan: it normally takes more space than UTF-8, and you
still need to worry about multi-unit characters (the exceptions are just
much rarer, so you might not catch the problem in testing). If I remember
right, UTF-16 might make sense for some Asian languages,
where most "characters" take 1 unit (2 bytes) in UTF-16 but might
take 3 bytes in UTF-8.
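The size trade-off can be checked with the standard UTF-16 and UTF-8 encodings of two sample code points: U+4E2D, a common CJK ideograph inside the Basic Multilingual Plane, and U+1F600, an emoji outside it that UTF-16 must store as a surrogate pair. The names below are illustrative.

```cpp
#include <string>

// U+4E2D, a CJK ideograph inside the Basic Multilingual Plane:
const std::u16string cjk_utf16 = u"\u4E2D";        // one 16-bit unit (2 bytes)
const std::string    cjk_utf8  = "\xE4\xB8\xAD";   // three bytes

// U+1F600, outside the BMP, so UTF-16 needs a surrogate pair:
const std::u16string emoji_utf16 = u"\U0001F600";        // two units: 0xD83D 0xDE00
const std::string    emoji_utf8  = "\xF0\x9F\x98\x80";   // four bytes

// True if a UTF-16 unit is the first (high) half of a surrogate pair.
inline bool is_high_surrogate(char16_t unit) {
    return unit >= 0xD800 && unit <= 0xDBFF;
}
```

Code that assumes one char16_t per character only breaks on inputs like the second one, which is why such bugs tend to slip through testing.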

Kai Bojens <kb@kbojens.de>: Feb 22 12:42PM +0100

> And difficult to find a guidelines how to do it. So still searching
> (some say use UTF-8 , some UTF-16. but using UTF-8 in a code would make
> life difficult as many functions like length would not work).

A very good starting point:

https://github.com/boostcon/cppnow_presentations_2014/blob/master/files/unicode-cpp.pdf

JiiPee <no@notvalid.com>: Feb 22 12:11PM

On 22/02/2015 11:42, Kai Bojens wrote:
>> life difficult as many functions like length would not work).
> A very good starting point:

> https://github.com/boostcon/cppnow_presentations_2014/blob/master/files/unicode-cpp.pdf

Yes, finally some examples too (like how to add a snowman to a char string).
Many of these sites have theory but no practical examples...
not really a good way to teach things. When I read C++ books, the
examples tell me almost everything even without knowing the theory!!
That's why I often read the examples first and the theory after that,
because then I also understand the theory.
That's what is really needed with these. But let's see if it has enough
examples.... it has some....
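Since the snowman came up: appending a code point to a char string just means writing its UTF-8 bytes. The encoder below is a minimal sketch (not code from the linked presentation), assuming the caller passes a valid Unicode scalar value; U+2603 SNOWMAN comes out as the three bytes E2 98 83.

```cpp
#include <string>

// Appends the UTF-8 encoding of code point `cp` to `out`.
// Assumes `cp` is a valid scalar value (<= U+10FFFF and not a surrogate).
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {                       // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx + three continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```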

JiiPee <no@notvalid.com>: Feb 22 12:13PM

He seems to be from Finland too, like me :)

On 22/02/2015 11:42, Kai Bojens wrote:

"Lőrinczy Zsigmond" <nospam@for.me>: Feb 22 01:52PM +0100

On 2015.02.21. 16:05, JiiPee wrote:
> I am trying to learn how to use unicode string.. it's not so easy really.
> And difficult to find a guidelines how to do it. So still searching
> (some say use UTF-8 , some UTF-16. but using UTF-8 in a code would make
> life difficult as many functions like length would not work).

When using UTF-8, strlen does work: it returns the number of bytes;
mbslen returns the number of characters.
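Note that mbslen is not part of standard C or C++. The standard way to ask the C library for a character count is std::mbstowcs with a null destination, which returns the number of wide characters the conversion would produce under the current LC_CTYPE locale. The sketch below assumes a UTF-8 locale is installed; the locale names it tries are common ones but are not guaranteed to exist on any given system, and both helper names are hypothetical.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <initializer_list>

// Tries a few common names for a UTF-8 locale (none is guaranteed to
// exist). Returns false if none could be set.
bool use_utf8_locale() {
    for (const char* name : {"C.UTF-8", "en_US.UTF-8", "C.utf8"})
        if (std::setlocale(LC_CTYPE, name) != nullptr)
            return true;
    return false;
}

// Number of characters in a multibyte string under the current LC_CTYPE
// locale; returns static_cast<std::size_t>(-1) if the bytes are invalid there.
std::size_t locale_char_count(const char* s) {
    return std::mbstowcs(nullptr, s, 0);
}
```

For the Russian "При" (six bytes in UTF-8), strlen gives 6 while locale_char_count gives 3 once a UTF-8 locale is active.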

thread interruption points

Ian Collins <ian-news@hotmail.com>: Feb 22 05:40PM +1300

Melzzzzz wrote:

> Problem is that memory allocators (especially GC) tend to reserve huge
> amount of RAM, (not to mention forks) therefore overcommit and OOM
> killer....

Not all operating systems are foolhardy enough to allow memory overcommit.

--
Ian Collins

Marcel Mueller <news.5.maazl@spamgourmet.org>: Feb 22 09:59AM +0100

On 21.02.15 22.41, Paavo Helde wrote:
> random fashion. Even if they are not failing, the computer is pretty much
> unusable anyway. Depending on the OS and running programs, a computer
> restart might be the best option to come out of the thrashing mode. I am

You are right. I can confirm this. Win7 discards the disk cache on
suspend to disk, so after resume it behaves much like swapping: about 1 GB
of data is read with heavy disk activity in the first few minutes after
resume, probably in 4k blocks due to page faults. During this time the
system is almost unresponsive and random faults occur from time to time,
e.g. drivers no longer recognize their devices, or program
windows can no longer rearrange their Z order (this can happen to any
window, including simple Explorer windows). Of course, nothing bad
happens as long as you have only a few application windows open at
suspend and the cache is quite small. So the concept is well designed to
survive a feature presentation, no more, no less. (What the hell came
over them when they decided to discard the cache?)

I once ran into a similar problem on a Linux VM server too. I started
one VM too many and memory got very low. It was impossible to get a
shell to suspend one of the VMs in a reasonable amount of time, so I
opted for a hard reset.

> starting to think that turning the pagefile off completely might be the
> best approach.

Unfortunately, you have to be careful here. Depending on the OS, this
might have unwanted side effects. Some OSes refuse to overcommit
memory when there is no swap at all (maybe a reliability mode
intended for cash terminals or something like that). This will
likely cause out-of-memory errors very soon when using ordinary
desktop applications.
Other OSes simply ignore your configuration and create a temporary swap
file on the system volume in this case.

> That's why throw specifications is a deprecated antifeature.

Is it?

> functions with an empty throw clause should not call anything non-
> trivial, if they do there is a large problem between the keyboard and
> chair.

Well, that's the old discussion of whether to have checked exceptions or
not. Unfortunately, when using generic functors or lambdas you have
almost no choice. You cannot reasonably use checked exceptions with
them, as it would require the throws declaration to be a type argument.
(Is this allowed at all?)

> the allocation. Alternatively, if the program itself is the memory hog,
> then it can probably release a lot of it by stack unwinding (in the
> correct stack!), then report a failure.

I think it always depends on the individual case. And the basic question
is simply who pays to cover all these cases. Probably no one.

> Dynamic memory allocation can be handled relatively well in C++.

In the language: yes. In C++ libraries, well, it depends.

> overflow is a different beast altogether, there are no standard
> mechanisms for dealing with that and most program(mer)s just ignore the
> problem and hope they get lucky.

Indeed.

Marcel

Paavo Helde <myfirstname@osa.pri.ee>: Feb 22 04:16AM -0600

Marcel Mueller <news.5.maazl@spamgourmet.org> wrote in

> On 21.02.15 22.41, Paavo Helde wrote:

>> That's why throw specifications is a deprecated antifeature.

> Is it?

Yes. See Annex D (normative) - Compatibility features:

"D.4 Dynamic exception specifications [depr.except.spec]
The use of dynamic-exception-specifications is deprecated."

Instead, one should use the new C++11 'noexcept' specification.

For motivations, see: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3051.html
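On Marcel's question about generic functors: noexcept is not a checked-exception mechanism (if a noexcept function lets an exception escape, std::terminate is called; nothing is verified at compile time), but its operator form lets a template's exception specification depend on its type argument. A minimal sketch; call, Quiet and Loud are illustrative names.

```cpp
#include <utility>

// The outer noexcept(...) is the specifier; the inner noexcept(...) is the
// operator, evaluated at compile time for the concrete functor type F.
template <class F>
void call(F&& f) noexcept(noexcept(std::forward<F>(f)())) {
    std::forward<F>(f)();
}

// Illustrative functors: one promises not to throw, the other may throw.
struct Quiet { void operator()() const noexcept {} };
struct Loud  { void operator()() const { throw 42; } };
```

So noexcept(call(Quiet{})) is true and noexcept(call(Loud{})) is false, without the wrapper naming any exception types.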

Cheers
Paavo

size of a variable

Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 16 11:20PM

On 16/02/2015 22:51, Christopher Pisz wrote:

> Or you're just trolling, since you can't seem to reply in the
> appropriate thread, much less quote someone when making false claims
> about what they said and didn't say.

False claims?

Evidence:

(It was actually uint8_t not uint16_t)

"Calling everything a uint8_t rather than an unsigned long long
accomplishes what in 2015?! "

"Is it because you want to remind yourself that an unsigned long long is
8 bytes? Is it guaranteed to be 8 bytes by the standard anyway?"

Now stfu and gtfo you little liar.

/Flibble

Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 16 07:46PM

On 16/02/2015 19:15, Christopher Pisz wrote:

>> /Flibble

> Can you at least go argue your nonsensical points of view in the
> appropriate thread?

What is nonsense exactly? First you made the mistake of thinking
uint16_t was 16 bytes not 16 bits and then you posted a diatribe
compounding your mistake rather than admitting to it. Am I wrong?

/Flibble
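For the record on the sizes argued about in this thread: exact-width types such as std::uint16_t are optional, but where they exist they have exactly N bits and no padding, while unsigned long long is only required to have at least 64 bits, so 8 bytes is typical rather than guaranteed. The checks below are a sketch that compiles on mainstream platforms where these types exist.

```cpp
#include <climits>
#include <cstdint>
#include <limits>

// Exact-width types have exactly N bits and no padding:
// uint16_t is 16 *bits*, not 16 bytes.
static_assert(std::numeric_limits<std::uint16_t>::digits == 16, "16 value bits");
static_assert(sizeof(std::uint16_t) * CHAR_BIT == 16, "no padding bits");
static_assert(sizeof(std::uint8_t) == 1, "uint8_t implies CHAR_BIT == 8");

// unsigned long long only guarantees AT LEAST 64 bits; 8 bytes is
// typical, not promised by the standard.
static_assert(std::numeric_limits<unsigned long long>::digits >= 64, "at least 64 bits");
```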

Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 16 07:04PM

On 16/02/2015 17:39, Christopher Pisz wrote:
>> typedefs? Progress.

>> /Flibble

> He isn't mixing and matching types here is he?

That's a yes then. Next time think before you post tons of bullshit.

/Flibble

Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 16 10:49PM

On 16/02/2015 22:44, Christopher Pisz wrote:

> I said no such thing.

> Go quote me in the appropriate post and I will try my best to break it
> down to your level of understanding.

Either you have memory problems or you are a bald-faced liar; either way
the evidence is there.

/Flibble

Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 16 05:19PM

On 16/02/2015 16:30, Christopher Pisz wrote:
> instead. Or you can calculate it yourself if you want to use the raw
> array by multiplying the size of the type (uint8_t) by the number of
> elements.

So you've finally accepted that there is nothing wrong with the sized
typedefs? Progress.

/Flibble
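The calculation Christopher describes, in code: for a raw array, sizeof on the array already yields the element size times the element count, and dividing by the element size recovers the count (C++17 also offers std::size in &lt;iterator&gt;). The array buffer and its length 64 are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical raw array of 64 bytes.
std::uint8_t buffer[64];

// sizeof on the array itself gives the total size in bytes
// (element size times element count).
static_assert(sizeof(buffer) == sizeof(std::uint8_t) * 64, "total bytes");

// Dividing by the element size recovers the element count.
constexpr std::size_t buffer_length = sizeof(buffer) / sizeof(buffer[0]);
static_assert(buffer_length == 64, "element count");
```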

mysterious destructors

Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 18 12:55AM

On 18/02/2015 00:50, Christopher Pisz wrote:

> words: "Rick C. Hodgins", "Flibble"
> So, I won't be able to see or respond to any such messages
> -----

Mr Flibble is very cross.

OT: New, (post 20th century) M$ compiler question.

DSF <notavalid@address.here>: Feb 14 01:13AM -0500

On Fri, 13 Feb 2015 18:09:38 -0600, Christopher Pisz
>on my shelf, licensed for my lifetime. They are all out to squeeze more
>money out of Joe User. Businesses already subscribe to MSDN anyway and
>have access to whatever they like.

I *HATE* subscription models. Not only for the reasons you mention,
but also because you have no choice when it comes to upgrading. If
they've removed a feature you use often (but maybe most users don't),
or changed how every operation works because they've discovered a
"better" way - Tough Luck! You can't fall back on your previous
version because it doesn't exist.

>At any rate. If you are looking for a new version of visual studio, I'd
>say now is not the time to buy, as MSVS 2014 is very near to release.

As you noted in your follow-up post, the Community version is free.
Hence my disbelief that it is identical to the Pro version, save for
licensing.

I have already downloaded it, but I am having VirtualBox problems. I
will not install it into my main system. "C:" is not my programming
drive, "E:" is. The last time I installed a version of VS Express, I
set every path I could change to E:\etc... It installed about 20% of
itself on E: and the other 80% in C:\Windows and C:\Program Files. And
after uninstallation, it left so much of itself on C: that I wound up
deleting/reinstalling Windows to get rid of the bloat.

By the way, I guess that's another reason I've stuck with the
Borland compiler for so long. It stores very little in the Windows
directory and, because of that, is fairly portable.

Thanks again,
DSF
"'Later' is the beginning of what's not to be."
D.S. Fiscus

something happened to malloc?

Robert Wessel <robertwessel2@yahoo.com>: Feb 19 11:32AM -0600

On Wed, 18 Feb 2015 09:29:15 +0100, Torsten Mueller

>In my application I use also boost (1.57). And boost has an own cstdlib
>header. Could this be the reason?

>Has anyone had this problem in the last time, malloc is undefined?

Could you be seeing a namespace problem? While cstdlib mostly
includes stdlib.h, it does put most items into the std:: namespace.
Perhaps the headers got tightened up recently to not put those
functions in both std:: and the global namespace?
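Whatever the cause in Torsten's build, the portable spelling after #include &lt;cstdlib&gt; is std::malloc: in C++11 and later that header must declare the names in namespace std and may, but need not, also declare them in the global namespace (&lt;stdlib.h&gt; is the header that guarantees the global names). A small sketch; duplicate is an illustrative name.

```cpp
#include <cstdlib>   // guarantees std::malloc / std::free; ::malloc is optional
#include <cstring>   // std::strlen, std::memcpy

// Copies a C string into memory from std::malloc; the caller must release
// it with std::free. Returns nullptr if the allocation fails.
char* duplicate(const char* src) {
    const std::size_t n = std::strlen(src) + 1;   // include the terminating '\0'
    char* dst = static_cast<char*>(std::malloc(n));
    if (dst != nullptr)
        std::memcpy(dst, src, n);
    return dst;
}
```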

mysterious destructors

ram@zedat.fu-berlin.de (Stefan Ram): Feb 19 04:04AM

> o->print();
> o = std::unique_ptr<c>(new c( 2 )); /* overwrite */
> o->print(); }

Actually, I wanted to observe how C++ interprets an example
someone posted into the C newsgroup recently. Here is my
attempt of a translation into C++:

#include <iostream>
#include <ostream>
#include <memory>

struct c /* this struct is as above (as before) */
{ int v;
c( int const x ): v( x )
{ ::std::cout << "constructor of instance #" << v << ".\n"; }
~c(){ ::std::cout << "destructor of instance #" << v << ".\n"; }
void print(){ ::std::cout << "I am instance #" << v << ".\n"; }};

::std::unique_ptr< c >f( ::std::unique_ptr< c >* p )
{ *p = ::std::make_unique< c >( 2 ); /* <- sequence point! (semicolon) */
return ::std::make_unique< c >( 1 ); }

int main()
{ ::std::unique_ptr< c >o( f( &o )); o->print(); }

Does this program violate any rule of C++?

compiler bug operator>> matching?

"Norman J. Goldstein" <normvcr@telus.net>: Feb 14 09:23AM -0800

On 02/13/2015 09:35 PM, Pavel wrote:
> (in the OP's example specifically, 3 characters). I am curious about the real
> intent, too, though.

> -Pavel

operator>>( istream&,const char* ) succeeds if the extracted characters
exactly match the supplied const char*. Leading white space is ignored.
I find this a convenient way to help parse a file.
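The standard library has no operator&gt;&gt; for const char*, so the operator Norman describes is his own overload. A sketch of the same matching idea follows; to sidestep overload-resolution surprises it uses a small hypothetical wrapper type literal instead of overloading on const char* directly. Like other formatted extractors, it skips leading whitespace (via std::istream::sentry) and sets failbit on a mismatch.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, used in the usage example

// Hypothetical wrapper: text that must appear verbatim in the input.
struct literal {
    const char* text;
};

// Skips leading whitespace like other formatted extractors, then consumes
// the characters of lit.text one by one. On the first mismatch (or end of
// input) failbit is set, so the rest of a >> chain does nothing.
std::istream& operator>>(std::istream& in, literal lit) {
    std::istream::sentry ok(in);   // skips whitespace unless std::noskipws is set
    if (!ok) return in;
    for (const char* p = lit.text; *p != '\0'; ++p) {
        const auto ch = in.get();
        if (ch == std::istream::traits_type::eof() ||
            std::istream::traits_type::to_char_type(ch) != *p) {
            in.setstate(std::ios_base::failbit);
            break;
        }
    }
    return in;
}
```

Usage: with std::istringstream in("  width = 42"), the chain in &gt;&gt; literal{"width"} &gt;&gt; literal{"="} &gt;&gt; w reads 42 into w, while the same chain on "height = 7" leaves the stream failed.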

Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Feb 14 12:35AM -0500

> your example program even to a string literal, which may
> reside in read-only memory)?

> Best regards, Jens

There can be different implementations of operator>>. For example, one might
want to use it to try to extract from the stream and throw away as many
characters as the length of the C string pointed to by the const char* parameter
(in the OP's example specifically, 3 characters). I am curious about the real
intent, too, though.

-Pavel

You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

Sunday, February 22, 2015

Digest for comp.lang.c++@googlegroups.com - 25 updates in 10 topics
