- wstring_convert - 14 Updates
- What's up with "Large Scale C++" - 3 Updates
- [Jesus Loves You] Time is approaching - 5 Updates
- Adding a function to std::exception and company - 1 Update
- RCU... - 1 Update
"Öö Tiib" <ootiib@hot.ee>: Dec 22 05:19AM -0800 On Monday, 21 December 2020 at 21:14:55 UTC+2, Bonita Montero wrote: > What are the means to convert UTF-8-strings to u16string-s > with C++20. wstring_convert is deprecated. There are only platform- or library-specific means. Unicode standard is differently (in strict sense incorrectly in various ways) supported by platforms and libraries. In such world it is impossible to support it "correctly" so the C++ standard does not want to lie that C++ somehow does that. There is unicode.org for information about Unicode. |
Richard Damon <Richard@Damon-Family.org>: Dec 22 09:05AM -0500 On 12/21/20 2:14 PM, Bonita Montero wrote: > What are the means to convert UTF-8-strings to u16string-s > with C++20. wstring_convert is deprecated. Part of the issue is that wstring is not necessarily UTF-16 encoded, so it might not be the right choice. wstring might not even be 16 bits wide. Actually converting UTF-8 into UTF-16 isn't that hard to do, as it is a simple matter to extract the next UCS-4 code-point out of a UTF-8 string (a bit more complicated if you want to do all the suggested error checks for malformed UTF-8, but still not that hard), and converting the UCS-4 code-point into UTF-16 is even simpler (just check if it is BMP or not and write the value(s) out). Note that technically, you may want to use char16_t based string instead of wstring, as technically wstring should be based on char32_t, but for historical reasons it may still be 16 bits on Windows. |
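The second half of the conversion described above (code point to UTF-16) can be sketched as follows. This is a minimal illustration, not code from the thread: the function name `append_utf16` is hypothetical, and rejecting surrogate values is an assumption about how strict the caller wants to be.

```cpp
#include <string>

// Hypothetical helper: append one Unicode code point to a UTF-16 string.
// Returns false and appends nothing if cp is a surrogate or above U+10FFFF.
inline bool append_utf16(std::u16string& out, char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return false;
    }
    if (cp < 0x10000) {
        // BMP code point: one 16-bit unit with the same value.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary code point: split the 20-bit offset into a
        // high surrogate (top 10 bits) and a low surrogate (bottom 10 bits).
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return true;
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00, while U+0041 stays a single unit.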
Bonita Montero <Bonita.Montero@gmail.com>: Dec 22 03:12PM +0100 > platforms and libraries. In such world it is impossible to support it > "correctly" so the C++ standard does not want to lie that C++ somehow > does that. There is unicode.org for information about Unicode. I'm not talking about Unicode but UTF-8. UTF-8 isn't Unicode but just an encoding. |
Bonita Montero <Bonita.Montero@gmail.com>: Dec 22 03:14PM +0100 > Part of the issue is that wstring is not necessarily UTF-16 encoded, so > it might not be the right choice. wstring might not even be 16 bits wide. There's u16string and u32string which have UTF-16 or UTF-32 encoding. And I'm not talking about a charset but an encoding, which is independent of a charset. > for malformed UTF-8, but still not that hard), and converting the UCS-4 > code-point into UTF-16 is even simpler (just check if it is BMP or not > and write the value(s) out). Nevertheless it would be nice to have this in the standard-library. |
Bonita Montero <Bonita.Montero@gmail.com>: Dec 22 03:16PM +0100 On 21.12.2020 at 20:14, Bonita Montero wrote: |
Richard Damon <Richard@Damon-Family.org>: Dec 22 09:28AM -0500 On 12/22/20 9:14 AM, Bonita Montero wrote: >> code-point into UTF-16 is even simpler (just check if it is BMP or not >> and write the value(s) out). > Nevertheless it would be nice to have this in the standard-library. I think part of the issue is that despite the names, u16string is NOT required to be UTF-16 encoded, as it is just basic_string<char16_t>, and that is not required to use UTF-16. (In C there is a define __STDC_UTF_16__ to indicate that it is, which I don't see in my C++ standard, but C++ still uses words like if the native encoding is UTF-16) It looks like codecvt can be used to make the conversion IF the implementation uses UTF-16/UTF-32 for char16_t and char32_t. |
James Kuyper <jameskuyper@alumni.caltech.edu>: Dec 22 02:19PM -0500 On 12/22/20 9:28 AM, Richard Damon wrote: ... > I think part of the issue is that despite the names, u16string is NOT > required to be UTF-16 encoded, as it is just basic_string<char16_t>, and > that is not required to use UTF-16. Citation, please? A search of every occurrence of "UTF-16" in the C++ standard leaves me with the impression that every function in the C++ standard library that has a specialization for char16_t that interprets objects of that type is required to interpret them as parts of a UTF-16 string. What did I miss? > (In C there is a define > __STDC_UTF_16__ to indicate that it is, which I don't see in my C++ > standard, but C++ still uses words like if the native encoding is UTF-16) A search for "native encoding is" doesn't get any hits. The phrase "native encoding" occurs in only three places in the standard: 29.11.7.2.2p1: refers to the native encoding of "ordinary character strings" and "wide character strings", or char and wchar_t respectively. It says nothing about char16_t. D.23p4 talks about u8path(), which converts from UTF-8 encodings to the native encoding for filenames. I'm using n4860.pdf, 2020-03-31 as my reference. |
Richard Damon <Richard@Damon-Family.org>: Dec 22 02:41PM -0500 On 12/22/20 2:19 PM, James Kuyper wrote: > D.23p4 talks about u8path(), which converts from utf8 encodings to the > native encoding for filenames. > I'm using n4860.pdf, 2020-03-31 as my reference. Looking at the change log for C++20, one of the changes is: > guarantee that char16_t and char32_t literals are encoded as UTF-16 and UTF-32 respectively so this is a new requirement of the Standard. |
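The guarantee quoted above can be checked at compile time. The following is a small sketch, assuming a C++17 or later compiler; the variable names are made up for illustration. On an implementation that encodes char16_t and char32_t literals as UTF-16 and UTF-32, a code point outside the BMP must appear as a surrogate pair in the former and as a single unit in the latter.

```cpp
#include <string_view>

// One code point outside the BMP (U+1F600), written as a char16_t literal.
constexpr std::u16string_view grinning16 = u"\U0001F600";

// If the literal is UTF-16, it consists of the surrogate pair D83D DE00.
static_assert(grinning16.size() == 2);
static_assert(grinning16[0] == 0xD83D && grinning16[1] == 0xDE00);

// The same code point as a char32_t literal: a single UTF-32 unit.
constexpr std::u32string_view grinning32 = U"\U0001F600";
static_assert(grinning32.size() == 1 && grinning32[0] == 0x1F600);
```

Note that this only says something about literals. It does not constrain what a program stores in a std::u16string at run time.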
"daniel...@gmail.com" <danielaparker@gmail.com>: Dec 22 01:04PM -0800 On Tuesday, December 22, 2020 at 9:28:27 AM UTC-5, Richard Damon wrote: > It looks like codecvt can be used to make the conversion IF the > implementation uses UTF-16/UTF-32 for char16_t and char32_t. The entire header <codecvt> has been deprecated as of C++17. The std::codecvt template from <locale> hasn't been deprecated, but all the standard conversion facets have been. Daniel |
"Öö Tiib" <ootiib@hot.ee>: Dec 22 01:56PM -0800 On Tuesday, 22 December 2020 at 16:06:08 UTC+2, Richard Damon wrote: > for malformed UTF-8, but still not that hard), and converting the UCS-4 > code-point into UTF-16 is even simpler (just check if it is BMP or not > and write the value(s) out). Converting UTF-8 into UTF-16 is simple only if it is correct (in some manner of "correct") UTF-8. What to do when it is incorrect (in some sense of "incorrect")? Close the application? But it was "only" text, shame on you. |
Bonita Montero <Bonita.Montero@gmail.com>: Dec 22 10:58PM +0100 > Converting UTF-8 into UTF-16 is simple only if it is correct (in some > manner of "correct") UTF-8. What to do when it is incorrect (in some sense > of "incorrect")? Close the application? But it was "only" text, shame on you. What do you mean with "incorrect" ? |
Richard Damon <Richard@Damon-Family.org>: Dec 22 05:30PM -0500 On 12/22/20 4:58 PM, Bonita Montero wrote: >> of "incorrect")? Close the application? But it was "only" text, shame >> on you. > What do you mean with "incorrect" ? A couple of quick things that can make a byte sequence not valid UTF-8:
1) The number of bytes the first byte says the code-point will have doesn't match the number of bytes that it does have.
2) The first byte of the string isn't a valid first byte of a UTF-8 sequence (for example, it has a value between 0x80 and 0xBF, the values for subsequent bytes).
3) The byte sequence is NOT the minimal length for that value. Some variants of UTF-8 allow NUL to be encoded as 0xC0 0x80, but allowing others can allow for some possible exploits, and the standard says they should not be allowed.
4) A UTF-8 sequence that decodes to a value greater than 0x0010FFFF should be marked as invalid (and can't be converted to UTF-16).
There is a code point U+FFFD (Replacement Character) reserved for this sort of error. |
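The four checks listed above, plus substitution of U+FFFD, can be sketched as one conversion function. This is an illustration under stated assumptions, not a reference implementation: the name `utf8_to_utf16` is hypothetical, encoded surrogates are also rejected (a check the list does not mention), and the error policy of emitting one U+FFFD per offending byte is a simplification that other decoders may not share.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: convert UTF-8 to UTF-16, writing U+FFFD for every
// byte that cannot start a valid sequence.
inline std::u16string utf8_to_utf16(std::string_view in) {
    constexpr char16_t replacement = 0xFFFD;
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;  // bytes announced by the lead byte, 0 = invalid lead
        char32_t min = 0;     // smallest value that needs `len` bytes
        if (b0 < 0x80)      { cp = b0;        len = 1; min = 0; }
        else if (b0 < 0xC0) { len = 0; }  // check 2: continuation byte used as lead
        else if (b0 < 0xE0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
        else if (b0 < 0xF0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
        else if (b0 < 0xF8) { cp = b0 & 0x07; len = 4; min = 0x10000; }
        // 0xF8..0xFF never start a sequence, so len stays 0.

        // Check 1: the announced number of continuation bytes must be present.
        bool ok = len != 0 && i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (b & 0x3F);
        }
        // Check 3 (overlong form), check 4 (above U+10FFFF), and surrogates.
        if (ok && (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))) {
            ok = false;
        }
        if (!ok) {
            out.push_back(replacement);
            ++i;  // resynchronise one byte further on
            continue;
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

With this policy the overlong pair 0xC0 0x80 yields two replacement characters, because the second byte is then seen as a stray continuation byte. A stricter or more lenient policy is a design choice for the caller.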
"Öö Tiib" <ootiib@hot.ee>: Dec 22 02:33PM -0800 On Tuesday, 22 December 2020 at 23:58:30 UTC+2, Bonita Montero wrote: > > manner of "correct") UTF-8. What to do when it is incorrect (in some sense > > of "incorrect")? Close the application? But it was "only" text, shame on you. > What do you mean with "incorrect" ? Only subset of sequences of bytes is valid UTF-8 or valid UTF-16. Rest are invalid. With "incorrect" I meant invalid Unicode that is treated as valid by one or other library or platform. The details are not hard to find even Wikipedia mentions couple of such cases. |
"Öö Tiib" <ootiib@hot.ee>: Dec 22 02:43PM -0800 On Wednesday, 23 December 2020 at 00:30:50 UTC+2, Richard Damon wrote: > There is a code point U+FFFD (Replacement Character) reserved for this > sort of error. That character is used to make "incorrect unicode" produced in your product to look ugly in competitor's product that technically validates it correctly. |
"Christian Hanné" <the.hanne@gmail.com>: Dec 22 04:45PM +0100 You can get Volume 1 here: https://easyupload.io/s8tvow The archive-password is "fuck u". |
Brian Wood <woodbrian77@gmail.com>: Dec 22 11:30AM -0800 On Tuesday, December 22, 2020 at 9:46:16 AM UTC-6, Christian Hanné wrote: The author gave a talk about the book that is a legal and ethical way to get more info: https://duckduckgo.com/?q=cppcon+2020+lakos&page=1&adx=artexpa&sexp=%7B%22cdrexp%22%3A%22b%22%2C%22artexp%22%3A%22a%22%2C%22prodexp%22%3A%22b%22%2C%22prdsdexp%22%3A%22c%22%2C%22biaexp%22%3A%22b%22%2C%22msvrtexp%22%3A%22b%22%2C%22bltexp%22%3A%22b%22%7D&iax=videos&ia=videos&iai=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dd3zMfMC8l5U |
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Dec 22 12:22PM -0800 > on writing very large C++ programs would be gladly accepted. > If the project I am trying to sell goes ahead, it could easily > become very large. On amazon.com, I see: Large-Scale C++ Volume I: Process and Architecture Dec 17, 2019 Large-Scale C++ Volume II: Design and Implementation Mar 14, 2021 -- Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com Working, but not speaking, for Philips Healthcare void Void(void) { Void(); } /* The recursive call of the void */ |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 21 08:57PM -0800 On 12/21/2020 9:04 AM, Jorgen Grahn wrote: >> [Jesus Loves You] ? > He did. But Rick doesn't decide policy here. When you're posting in > those threads, you're just as rude and disruptive as he is. ;^( |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 21 08:58PM -0800 On 12/19/2020 3:37 PM, Mr Flibble wrote: >> consider what real loss is: > [snip - tl;dr] > And Satan invented fossils, yes? certain fossils, lol. Finding fossils under the once topical like climate in the poles, as once was? |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 21 10:34PM -0800 On 12/18/2020 4:43 PM, Rick C. Hodgin wrote: > is running out for all of us -- saved or otherwise. > Peace, my friend. Tell your family and friends about Jesus. Spread the > word. Let His call be heard from your mouth unto others. https://youtu.be/SIxi2uqFVmc |
seeplus <boardmounrt@gmail.com>: Dec 22 03:20AM -0800 On Saturday, December 19, 2020 at 11:43:25 AM UTC+11, Rick C. Hodgin wrote: > My friends, the time of the rapture is upon us. This week, next week, > this month, next month, this year, next year ... it's here. Damn. This looks like you will be gone by March, maybe sooner. Could you please leave me your stuff ASAP! If you really do not think that this is going to happen, then just don't bother handing it over. Will take it that you are not a true believer. Just quote some nonsense bible message to give you an out. |
Mr Flibble <flibble@i42.REMOVETHISBIT.co.uk>: Dec 22 07:54PM On 22/12/2020 11:20, seeplus wrote: > If you really do not think that this is going to happen, then just don't bother handing it over. > Will take it that you are not a true believer. > Just quote some nonsense bible message to give you an out. Yes, Hodgin, donate all your money to charity now as you won't be needing it because the rapture is definitely going to happen! A good charity to donate all your money to would be your local natural history museum: I hear they have a good collection of fossils that definitely aren't an invention of Satan! Dweeb. /Flibble -- 😎 |
"Öö Tiib" <ootiib@hot.ee>: Dec 22 05:06AM -0800 > OK, but I don't think I claimed some big benefit. I'm not > enthralled with 2020 C++. I mention this in the hopes > it could be included in the next standard. My point is that there are no reasons to. The cases where likes of string_view are more efficient than char* are there but this is not one. Same about cases where it is safer. The char* version can't be removed even if string_view is better. So we would have two virtual functions side-by-side that do same thing. > > not allow compilers to remove families of functions from virtual > > tables as rest of the calls remain virtual in meaningful program. > Are you saying LTO is not relevant to meaningful programs? No. Your trying to mix static exceptions and LTO into discussion is just red herring. LTO and static exceptions can find local optimisation opportunity here or there. In what percentage of meaningful programs an LTO can prove that what() of *whole* std::exception family is *never* called virtually? About half of code of every meaningful program is error handling so it is close to zero. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 21 10:08PM -0800 For no kernel guys: https://youtu.be/XrW5yerbAog |