Paavo Helde <myfirstname@osa.pri.ee>: Sep 30 09:02AM +0300

On 30.09.2016 0:48, Richard wrote:
> We've already been over this in this thread, you're just repeating
> yourself here without adding any new information.

Yes, I'm feeling so myself too!

Cheers
Paavo
Robert Wessel <robertwessel2@yahoo.com>: Sep 30 02:44AM -0500

On Thu, 29 Sep 2016 00:50:59 +0200, "Alf P. Steinbach" wrote:
>that it's really about each encoding value, including those that are
>just half of surrogate pairs. That's far fetched. But since standards
>use all kinds of weird meanings of words, it can't be just dismissed.

Early versions of the Unicode standard actually included "Unicode
character codes have a uniform width of 16 bits." as one of "The Ten
Unicode Design Principles" (quotes from my hardcopy Unicode 2.0
standard, circa 1996), while also defining the surrogate pair
mechanism. The ten principles have changed since then (and the 16-bit
character thing is no longer one of them).

So were Unicode characters 16 bits in those days, even in the presence
of (possible) surrogate pairs? And how does that relate to the much
later C tightening of the definition of wchar_t?
"Öö Tiib" <ootiib@hot.ee>: Sep 30 04:08AM -0700 On Wednesday, 28 September 2016 03:42:45 UTC+3, Melzzzzz wrote: > Whenever one needs performance raw pointers have to be used... > eg implementing tree structure, or linked list for that matter... > Imagine implementing AVL tree with smart pointers ;) When someone implements AVL tree for performance reasons then he most likely keeps the nodes in some other container (like 'vector' or 'deque') for performance reasons and so uses the things for indexing in that container (like 'std::deque<node>::iterator' or 'std::vector<node>::size_type') instead of 'node*' for performance reasons. ;) |
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Sep 30 06:24PM +0100

On 27/09/2016 03:59, Stefan Ram wrote:
> and benchmarksgame.alioth.debian.org, and commented: C++
> is popular, but not as popular as C, and it's fast, but not
> as fast as C.

Nonsense, C++ is faster than C. Why? Two reasons:

* C is mostly a subset of C++, and the parts that aren't have no
  bearing on performance.
* C++ alternatives are often faster than their C equivalents; for
  example, std::sort() is faster than qsort() when passed a functor,
  which allows the comparisons to be inlined.

/Flibble
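[A small sketch of the qsort()/std::sort() contrast mentioned above;
the function names and the trivial int comparison are illustrative.
qsort() calls its comparator through a function pointer, whereas
std::sort() is instantiated for the lambda's type, so the comparison
can be inlined into the generated sort.]

    #include <algorithm>
    #include <cstdlib>
    #include <vector>

    // Comparator for qsort: invoked through a function pointer, which
    // the compiler generally cannot inline inside the library's qsort.
    static int cmp_int(const void* a, const void* b) {
        int x = *static_cast<const int*>(a);
        int y = *static_cast<const int*>(b);
        return (x > y) - (x < y);
    }

    void sort_both(std::vector<int>& c_style, std::vector<int>& cpp_style) {
        // C style: element size and comparator are run-time arguments.
        std::qsort(c_style.data(), c_style.size(), sizeof(int), cmp_int);

        // C++ style: the lambda's type is part of the template
        // instantiation, so the comparison can be inlined.
        std::sort(cpp_style.begin(), cpp_style.end(),
                  [](int a, int b) { return a < b; });
    }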
Tim Rentsch <txr@alumni.caltech.edu>: Sep 30 10:52AM -0700

> Windows, wchar_t encodes individual 16-bit elements of UTF-16
> "surrogate pairs" which cannot be considered as members of any
> character set, IMHO.

This question came up in the newsgroup here a couple weeks ago. Alf
was kind enough to post a question on stackoverflow, which provided
some interesting and useful reading.

Let me cut to the chase. I believe the Microsoft implementation (ie,
MSVC) is technically conforming. It boils down to what characters are
in "the execution wide-character set" (this from [lex.ccon p5] in
C++14, specifically N4296, but I think there is similar wording in
C++11 and C++03).

So how does the implementation define "the execution wide-character
set"? At least one source of Microsoft documentation says this (note
the part after "except"):

  "The set of available locale names, languages, country/region
  codes, and code pages includes all those supported by the Windows
  NLS API except code pages that require more than two bytes per
  character [...]"

Here is the link for that:

  https://msdn.microsoft.com/en-us/library/x99tb11d.aspx

which is documentation for setlocale(), which seems pretty definitive.

My conclusion is that MSVC is technically conforming, but only
weaselly conforming. They wimped out by artificially defining the set
of characters officially supported to be only those whose code points
fit in two bytes, even though the compiler obviously knows how to deal
with code points that need surrogate pairs. Technically, they are
within their rights. But they deserve to be taken to task for running
roughshod over the spirit of what the standards (both C and C++)
obviously intend.
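[To make the surrogate-pair point concrete, an illustrative snippet
(not from the post): with a 16-bit wchar_t, a single code point
outside the BMP in a wide string literal occupies two wchar_t code
units, so a character the compiler clearly handles still cannot be a
single element of the execution wide-character set.]

    #include <cstdio>
    #include <cwchar>

    int main() {
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the Basic
        // Multilingual Plane, so it cannot fit in one 16-bit wchar_t.
        const wchar_t* clef = L"\U0001D11E";

        // With a 16-bit wchar_t (MSVC) this prints length 2: the high
        // surrogate 0xD834 followed by the low surrogate 0xDD1E.
        // With a 32-bit wchar_t (typical Linux) it prints length 1.
        std::printf("length = %zu\n", std::wcslen(clef));
        for (const wchar_t* p = clef; *p; ++p)
            std::printf("  unit: 0x%04X\n", static_cast<unsigned>(*p));
        return 0;
    }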
legalize+jeeves@mail.xmission.com (Richard): Sep 30 06:00PM

[Please do not mail me a copy of your followup]

Tim Rentsch <txr@alumni.caltech.edu> spake the secret code
>Technically, they are within their rights. But they deserve to
>be taken to task for running roughshod over the spirit of what
>the standards (both C and C++) obviously intend.

Thanks for that. After reading the latest messages on this thread
today, I wondered what the effect of wchar_t being 32 bits on Windows
would be. My overall conclusion is that it would be worthless.

- All existing code using L"" literals or wchar_t would break.
- You can't pass 32-bit wchar_t's to any Win32 APIs; you'd have to
  convert to UTF-16 first and/or use char16_t everywhere instead of
  wchar_t.
- The compiler could only support sizeof(wchar_t)==4 as a non-default
  compiler option, due to the above breakage.
- I can't think of anything useful I would do with 32-bit wchar_t on
  Windows. Can anyone provide an example? It has to be other than
  internal manipulation of 32-bit Unicode characters, which you can
  already do with char32_t.
-- 
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
The Terminals Wiki <http://terminals.classiccmp.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>
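[For reference, the "convert to UTF-16 first" step in the second
bullet above is small; this helper is an illustrative sketch (the name
is made up, and validation of invalid code points is omitted). A
char32_t code point maps to one UTF-16 unit if it is in the BMP,
otherwise to a high/low surrogate pair.]

    #include <string>

    // Encode a single Unicode code point as UTF-16 code units.
    // Code points above U+FFFF become a surrogate pair.
    std::u16string to_utf16(char32_t cp) {
        if (cp < 0x10000)
            return std::u16string(1, static_cast<char16_t>(cp));
        cp -= 0x10000;
        char16_t high = static_cast<char16_t>(0xD800 + (cp >> 10));
        char16_t low  = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        return std::u16string{high, low};
    }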
Tim Rentsch <txr@alumni.caltech.edu>: Sep 30 11:09AM -0700

> 16 bits in those days, even in the presence of (possible) surrogate
> pairs? And how does that relate to the much later C tightening of the
> definition of wchar_t?

Was it that much later, or even later at all? I guess I should also
ask which tightening you are referring to. AFAIK wchar_t emerged in
more or less its current form in Amendment 1 in 1995, ie what is
commonly called C95, and has been more or less unchanged since then.
Certainly wchar_t is described in terms like those of the present day
in drafts predating C99. (In C11 the types char16_t and char32_t were
added, but that seems independent of wchar_t.)
Bo Persson <bop@gmb.dk>: Sep 30 08:23PM +0200

On 2016-09-30 19:52, Tim Rentsch wrote:
> Technically, they are within their rights. But they deserve to
> be taken to task for running roughshod over the spirit of what
> the standards (both C and C++) obviously intend.

What?!

The documentation states that you can only call the standard library
function setlocale with locales that the C and C++ standards allow.
The fact that the operating system happens to support additional
locales is hardly something to complain about.

Bo Persson
Paavo Helde <myfirstname@osa.pri.ee>: Oct 01 12:07AM +0300

On 30.09.2016 21:00, Richard wrote:
> Windows. Can anyone provide an example? It has to be other than
> internal manipulation of 32-bit Unicode characters which you can
> already do with char32_t.

The reality is that the definition of wchar_t is platform-dependent,
so it would be unwise to use it for anything intended to be portable.
So the only use I can see for wchar_t is in platform-specific code
parts meant for interfacing with Windows SDK functions. char16_t is a
different type, so one cannot use it for that purpose on Windows
(without a lot of casting, that is).

On Linux the system calls use char and UTF-8, so I have never had a
need for wchar_t there (or anything else than plain char, for that
matter).

Cheers
Paavo
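[A sketch of the casting Paavo alludes to, assuming an MSVC/Windows
build where wchar_t and char16_t share the same 16-bit representation
but remain distinct types; SetConsoleTitleW is just one convenient
W-suffixed API chosen to illustrate the point.]

    // Windows-only sketch; assumes <windows.h> is available.
    #ifdef _WIN32
    #include <windows.h>

    void set_title(const char16_t* title) {
        // SetConsoleTitleW expects const wchar_t*, so a char16_t string
        // cannot be passed to it without a reinterpret_cast.
        ::SetConsoleTitleW(reinterpret_cast<const wchar_t*>(title));
    }
    #endif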
Paavo Helde <myfirstname@osa.pri.ee>: Oct 01 12:44AM +0300

On 1.10.2016 0:07, Paavo Helde wrote:
> that is).
> On Linux the system calls use char and UTF-8, so I have never had a need
> for wchar_t there (or anything else than plain char, for that matter).

I noticed that I actually did not answer the question, so in short: if
wchar_t changed to 32-bit in MSVC (with Windows SDK remaining 16-bit),
then this wchar_t would lose the single use case it had so far. OTOH,
it might now become easier to port other programs using wchar_t to
Windows.
legalize+jeeves@mail.xmission.com (Richard): Sep 30 09:57PM

[Please do not mail me a copy of your followup]

Paavo Helde <myfirstname@osa.pri.ee> spake the secret code
>wchar_t changed to 32-bit in MSVC (with Windows SDK remaining 16-bit),
>then this wchar_t would lose the single use case it had so far. OTOH, it
>might now become easier to port other programs using wchar_t to Windows.

So, a huge negative impact on all existing code in return for the
possible benefit of porting some code.
-- 
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
The Terminals Wiki <http://terminals.classiccmp.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>
Robert Wessel <robertwessel2@yahoo.com>: Sep 30 05:32PM -0500

On Fri, 30 Sep 2016 11:09:20 -0700, Tim Rentsch wrote:
>is described in terms like those of the present day in drafts predating
>C99. (In C11 the types char16_t and char32_t were added, but that seems
>independent of wchar_t.)

I don't have a copy of the C95 amendment, but:

  http://www.lysator.liu.se/c/na1.html

describes Unicode characters as 16 bits. The surrogate pair mechanism
was actually introduced with Unicode 2.0, and Unicode was purely a
16-bit character set in the two earlier standard versions (and in the
preceding drafts). But I don't know the exact wording in the
amendment.

MS used a pre-standard version of wchar_t (which they originally
typedef'd to the [16-bit] Windows type TCHAR) way back in 1993 (and
well before that if you count the pre-release versions of WinNT).

So we have:

- Circa 1991: Unicode 1.0
- Circa 1992/1993: Unicode 1.0.1 and 1.1, mainly adding CJK stuff
- Circa 1993: WinNT ships with (16-bit only, what we'd now call UCS-2)
  Unicode 1.x support and a pre-standard (16-bit) wchar_t
- Circa 1993: UTF-8 proposal made public. I'm not clear on when it was
  added as a standard encoding, but it was not in Unicode 1.1 (c. 1993,
  where an earlier scheme, FSS-UTF, was described), and UTF-8 was
  definitely in 2.0; there was a minor revision between those two
  ("1.1.5", c. 1995) which might have added it.
- Circa 1995: C95 (Amendment 1) standardizes wchar_t
- Circa 1995: MS ships Win95 (the first "consumer" Win32 platform) with
  UCS-2 support
- Circa 1996: Unicode 2.0 adds surrogate pair support, "breaking" the
  16-bit nature of Unicode.