Friday, September 30, 2016

Digest for comp.lang.c++@googlegroups.com - 12 updates in 1 topic

Paavo Helde <myfirstname@osa.pri.ee>: Sep 30 09:02AM +0300

On 30.09.2016 0:48, Richard wrote:
 
> We've already been over this in this thread, you're just repeating
> yourself here without adding any new information.
 
Yes, I'm feeling that way myself too!
 
Cheers
Paavo
Robert Wessel <robertwessel2@yahoo.com>: Sep 30 02:44AM -0500

On Thu, 29 Sep 2016 00:50:59 +0200, "Alf P. Steinbach"
>that it's really about each encoding value, including those that are
>just half of surrogate pairs. That's far fetched. But since standards
>use all kinds of weird meanings of words, it can't be just dismissed.
 
 
Early versions of the Unicode standard actually included "Unicode
character codes have a uniform width of 16 bits." as one of "The Ten
Unicode Design Principles" (quotes from my hardcopy Unicode 2.0
standard - circa 1996). While also defining the surrogate pair
mechanism. The ten principles have changed since then (and the 16-bit
character thing is no longer one of them). So were Unicode characters
16 bits in those days, even in the presence of (possible) surrogate
pairs? And how does that relate to the much later C tightening of the
definition of wchar_t?
"Öö Tiib" <ootiib@hot.ee>: Sep 30 04:08AM -0700

On Wednesday, 28 September 2016 03:42:45 UTC+3, Melzzzzz wrote:
 
> Whenever one needs performance raw pointers have to be used...
> eg implementing tree structure, or linked list for that matter...
> Imagine implementing AVL tree with smart pointers ;)
 
When someone implements an AVL tree for performance reasons,
then he most likely keeps the nodes in some other container
(like 'vector' or 'deque') for performance reasons, and so uses
handles into that container (like
'std::deque<node>::iterator' or 'std::vector<node>::size_type')
instead of 'node*', for performance reasons. ;)
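
For what it's worth, a minimal sketch of that idea (the names and layout
here are mine, not from the post): nodes live in a contiguous
std::vector and children are addressed by indexes into it rather than
by 'node*'.

    #include <cstddef>
    #include <vector>

    // Sketch only: AVL-style nodes stored in a std::vector and linked
    // by indexes into that vector instead of raw node* pointers.
    struct node {
        static constexpr std::size_t nil = static_cast<std::size_t>(-1);
        int key = 0;
        int balance = 0;          // AVL balance factor, -1/0/+1
        std::size_t left = nil;   // index of left child, or nil
        std::size_t right = nil;  // index of right child, or nil
    };

    struct avl_tree {
        std::vector<node> nodes;      // contiguous node pool
        std::size_t root = node::nil;

        // "Allocation" is just an append; the new node's index is its handle.
        std::size_t make_node(int key) {
            nodes.push_back(node{key});
            return nodes.size() - 1;
        }
    };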
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Sep 30 06:24PM +0100

On 27/09/2016 03:59, Stefan Ram wrote:
> and benchmarksgame.alioth.debian.org, and commented: C++
> is popular, but not as popular as C, and it's fast, but not
> as fast as C.
 
Nonsense, C++ is faster than C. Why? Two reasons why:
 
* C is mostly a subset of C++, and the parts that aren't have no bearing
on performance.
* C++ alternatives are often faster than their C equivalents; for
example, std::sort() is faster than qsort() when passed a functor, which
allows the comparisons to be inlined.
 
/Flibble
Tim Rentsch <txr@alumni.caltech.edu>: Sep 30 10:52AM -0700

> Windows, wchar_t encodes individual 16-bit elements of UTF-16
> "surrogate pairs" which cannot be considered as members of any
> character set, IMHO.
 
This question came up in the newsgroup here a couple weeks ago.
Alf was kind enough to post a question on stackoverflow, which
provided some interesting and useful reading. Let me cut to the
chase. I believe the Microsoft implementation (ie, MSVC) is
technically conforming. It boils down to what characters are in
"the execution wide-character set" (this from [lex.ccon p5] in
C++14, specifically N4296, but I think there is similar wording
in C++11 and C++03). So how does the implementation define "the
execution wide-character set"? At least one source of Microsoft
documentation says this (note the part after "except"):
 
"The set of available locale names, languages,
country/region codes, and code pages includes all those
supported by the Windows NLS API except code pages that
require more than two bytes per character [...]"
 
Here is the link for that:
 
https://msdn.microsoft.com/en-us/library/x99tb11d.aspx
 
which is documentation for setlocale(), which seems pretty
definitive.
 
My conclusion is that MSVC is technically conforming, but only
weaselly conforming. They wimped out by artificially defining the
set of characters officially supported to be only those whose
code points fit in two bytes, even though the compiler obviously
knows how to deal with code points that need surrogate pairs.
Technically, they are within their rights. But they deserve to
be taken to task for running roughshod over the spirit of what
the standards (both C and C++) obviously intend.
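
A quick way to see the issue in code (my example, not Tim's): put a code
point outside the BMP into a wide string literal and count the wchar_t
units it occupies.

    #include <cstddef>

    // U+1F600 lies outside the Basic Multilingual Plane.
    constexpr wchar_t smiley[] = L"\U0001F600";

    // wchar_t units, excluding the terminating L'\0':
    constexpr std::size_t units = (sizeof smiley / sizeof smiley[0]) - 1;
    // On MSVC (16-bit wchar_t) units == 2: a surrogate pair.
    // Where wchar_t is 32-bit (e.g. glibc) units == 1.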
legalize+jeeves@mail.xmission.com (Richard): Sep 30 06:00PM

[Please do not mail me a copy of your followup]
 
Tim Rentsch <txr@alumni.caltech.edu> spake the secret code
>Technically, they are within their rights. But they deserve to
>be taken to task for running roughshod over the spirit of what
>the standards (both C and C++) obviously intend.
 
Thanks for that.
 
After reading the latest messages on this thread today, I wondered
what would be the effect of wchar_t being 32-bits on Windows. My
overall conclusion is that it would be worthless.
 
- All existing code using L"" literals or wchar_t would break.
- You can't pass 32-bit wchar_t's to any Win32 APIs; you'd have to
convert to UTF-16 first (a conversion sketched below) and/or use
char16_t everywhere instead of wchar_t.
- The compiler could only support sizeof(wchar_t)==4 as a non-default
compiler option due to the above breakage.
- I can't think of anything useful I would do with 32-bit wchar_t on
Windows. Can anyone provide an example? It has to be other than
internal manipulation of 32-bit Unicode characters which you can
already do with char32_t.
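
Regarding the "convert to UTF-16 first" point above, the per-code-point
step itself is small; a sketch (mine, and it assumes the input is a
valid scalar value):

    #include <cstdint>
    #include <string>

    // Append one Unicode scalar value to a UTF-16 string, emitting a
    // surrogate pair for code points above U+FFFF. Assumes cp is a
    // valid scalar value (not itself a surrogate, <= U+10FFFF).
    void append_utf16(char32_t cp, std::u16string& out) {
        if (cp <= 0xFFFF) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            const std::uint32_t v = static_cast<std::uint32_t>(cp) - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
        }
    }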
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
The Terminals Wiki <http://terminals.classiccmp.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>
Tim Rentsch <txr@alumni.caltech.edu>: Sep 30 11:09AM -0700

> 16 bits in those days, even in the presence of (possible) surrogate
> pairs? And how does that relate to the much later C tightening of the
> definition of wchar_t?
 
Was it that much later, or even later at all? I guess I should also ask
which tightening you are referring to. AFAIK wchar_t emerged in more or
less its current form in Amendment 1 in 1995, ie what is commonly called
C95, and has been more or less unchanged since then. Certainly wchar_t
is described in terms like those of the present day in drafts predating
C99. (In C11 the types char16_t and char32_t were added, but that seems
independent of wchar_t.)
Bo Persson <bop@gmb.dk>: Sep 30 08:23PM +0200

On 2016-09-30 19:52, Tim Rentsch wrote:
> Technically, they are within their rights. But they deserve to
> be taken to task for running roughshod over the spirit of what
> the standards (both C and C++) obviously intend.
 
What?!
 
The documentation states that you can only call the standard library
function setlocale with locales that the C and C++ standards allow.
 
The fact that the operating system happens to support additional locales
is hardly something to complain about.
 
 
Bo Persson
Paavo Helde <myfirstname@osa.pri.ee>: Oct 01 12:07AM +0300

On 30.09.2016 21:00, Richard wrote:
 
> Windows. Can anyone provide an example? It has to be other than
> internal manipulation of 32-bit Unicode characters which you can
> already do with char32_t.
 
The reality is that the definition of wchar_t is platform-dependent, so
it would be unwise to use it for anything intended to be portable. The
only use I can see for wchar_t is in platform-specific code meant
for interfacing with Windows SDK functions. char16_t is a different
type, so one cannot use it for that purpose on Windows (without a lot
of casting, that is).
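
To make the casting point concrete, here is a sketch (mine, not
Paavo's) of handing char16_t data to a wide Win32 call; it assumes
<windows.h> and relies on wchar_t and char16_t both being 16-bit UTF-16
code units on Windows, which the language itself does not guarantee:

    #include <string>
    #include <windows.h>

    void show(const std::u16string& text) {
        // MessageBoxW expects const wchar_t*, so the char16_t buffer
        // has to be reinterpret_cast even though the bits are identical.
        ::MessageBoxW(nullptr,
                      reinterpret_cast<const wchar_t*>(text.c_str()),
                      L"Example", MB_OK);
    }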
 
On Linux the system calls use char and UTF-8, so I have never had a need
for wchar_t there (or anything else than plain char, for that matter).
 
Cheers
Paavo
Paavo Helde <myfirstname@osa.pri.ee>: Oct 01 12:44AM +0300

On 1.10.2016 0:07, Paavo Helde wrote:
> that is).
 
> On Linux the system calls use char and UTF-8, so I have never had a need
> for wchar_t there (or anything else than plain char, for that matter).
 
I noticed that I actually did not answer the question, so in short: if
wchar_t changed to 32-bit in MSVC (with the Windows SDK remaining 16-bit),
then wchar_t would lose the single use case it has had so far. OTOH, it
might then become easier to port other programs using wchar_t to Windows.
legalize+jeeves@mail.xmission.com (Richard): Sep 30 09:57PM

[Please do not mail me a copy of your followup]
 
Paavo Helde <myfirstname@osa.pri.ee> spake the secret code
>wchar_t changed to 32-bit in MSVC (with Windows SDK remaining 16-bit),
>then this wchar_t would lose the single use case it had so far. OTOH, it
>might now become easier to port other programs using wchar_t to Windows.
 
So, a huge negative impact on all existing code in return for the possible
benefit of porting some code.
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
The Terminals Wiki <http://terminals.classiccmp.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>
Robert Wessel <robertwessel2@yahoo.com>: Sep 30 05:32PM -0500

On Fri, 30 Sep 2016 11:09:20 -0700, Tim Rentsch
>is described in terms like those of the present day in drafts predating
>C99. (In C11 the types char16_t and char32_t were added, but that seems
>independent of wchar_t.)
 
 
I don't have a copy of the C95 TC, but:
 
http://www.lysator.liu.se/c/na1.html
 
describes Unicode characters as 16-bit. The surrogate pair mechanism
was actually introduced with Unicode 2.0, and Unicode was purely a
16-bit character set in the two earlier standard versions (and in the
preceding drafts).
 
But I don't know the wording in the TC. MS used a pre-standard
version of wchar_t (which they originally typedef'd to the [16-bit]
Windows type TCHAR), way back in 1993 (and well before that if you
count the pre-release versions of WinNT).
 
So we have:
 
- Circa 1991: Unicode 1.0
- Circa 1992/1993: Unicode 1.01 and 1.1, mainly adding CJK stuff
- Circa 1993: WinNT ships with (16-bit only, what we'd now call UCS-2)
Unicode 1.x support, pre-standard (16-bit) wchar_t
- Circa 1993: UTF-8 proposal made public - I'm not clear on when it
was added as a standard encoding, but it was not in Unicode 1.1
(c. 1993, where an earlier scheme, FSS-UTF, was described), and UTF-8
was definitely in 2.0; there was a minor revision between those two
("1.1.5", c. 1995), which might have added it.
- Circa 1995: the C95 amendment standardizes wchar_t
- Circa 1995: MS Ships Win95 (first "consumer" Win32 platform) with
UCS-2 support
- Circa 1996: Unicode 2.0 adds surrogate pair support, "breaking" the
"16-bit" nature of Unicode.