- dynamic_cast - 2 Updates
- How to write wide char string literals? - 18 Updates
| Vir Campestris <vir.campestris@invalid.invalid>: Jul 01 09:14PM +0100 On 28/06/2021 07:44, Bonita Montero wrote: >> demon. > dynamic_cast<> is slow and mostly the things you do with > it could be done faster through a virtual function call. Mostly. I just did a line count on our codebase; we have about 1 dynamic cast for every 10k lines. Compare with static_cast - 1 in 300. (This is over several million lines of cpp files.) I don't know what all the calls are, but the pattern I am most familiar with is: This pointer from my database points to an interface. It might be a type 1 object, in which case do ONE(). It might be a type 2, in which case do TWO(). Andy |
| "Öö Tiib" <ootiib@hot.ee>: Jul 01 02:16PM -0700 On Thursday, 1 July 2021 at 23:14:24 UTC+3, Vir Campestris wrote: > with is: This pointer from my database points to an interface. It might > be a type 1 object, in which case do ONE(). It might be a type 2, in > which case do TWO(). In some circumstances one can use typeid check and static_cast or just dynamic_cast alone but making it with something else will result with more code and likely be less efficient too. |
| Juha Nieminen <nospam@thanks.invalid>: Jul 01 04:42AM > \x works in wide string literal too, and puts in a character with that > value. The difference is that if the wide string type isn't unicode > encoded then it might get the wrong character in the string. The problem is that "\xC2\xA9" in UTF-8 is not the same thing as "\xC2\xA9" in UTF-16 or UTF-32 (whichever wchar_t happens to be). "\uXXXX", however, ought to work regardless because it specifies the actual unicode codepoint you want, rather than its encoding. |
| Christian Gollwitzer <auriocus@gmx.de>: Jul 01 07:19AM +0200 Am 30.06.21 um 09:57 schrieb Juha Nieminen: > const wchar_t* str = L"???"; > In the *source code* that string literal may be eg. UTF-8 encoded. However, > the compiler needs to convert it to wide chars. I think it is best to avoid wide strings. Now that doesn't help you if you need them to call native Windows functions which insist on wchar_t. I'm still wondering why you need to put a Unicode string in the source code at all. Could you use an i18n feature of Windows to look up the real string? I'm not an expert on i18n on Windows, but using GNU gettext, you would write some ASCII equivalent thing in the code and then have an auxiliary translation file with a well defined encoding. At runtime the ASCII string is merely a key into the table. Plus the added bonus that you can support multiple languages. Christian |
| Ralf Goertz <me@myprovider.invalid>: Jul 01 09:19AM +0200 Am Wed, 30 Jun 2021 12:37:55 +0200 > > are assumed to be utf16be. > UTF-16-files have a byte-header which helps the compiler to > distinguish ASCII-files and UTF-16-files. I know that. It's called a byte order mark. And gcc ignores it. |
| David Brown <david.brown@hesbynett.no>: Jul 01 10:29AM +0200 On 01/07/2021 07:19, Christian Gollwitzer wrote: > ASCII string is merely a key into the table. Plus the added bonus that > you can support multiple languages. > Christian Code can require non-ASCII characters without needing internationalisation. gettext and the like are certainly useful, but they are very heavy tools compared to a fixed string or small table of strings in the code. If you are writing a program for use in a single company in Germany (since you have a German email address), with all the texts in German, would you want to use internationalisation frameworks just to make "groß" turn out right? The OP could also be working on embedded systems or some other code for which having a single self-contained executable is important. |
| Juha Nieminen <nospam@thanks.invalid>: Jul 01 08:44AM > (since you have a German email address), with all the texts in German, > would you want to use internationalisation frameworks just to make > "groß" turn out right? There are also many situations where using non-ascii characters in string literals may not be related to language and internationalization. After all, Unicode contains loads of characters that are not related to spoken languages, such as math symbols, and lots of other types of symbols which are universal and don't require any sort of internationalization. Sometimes these symbols may be used all on their own, sometimes as part of text (eg. in labels and titles). Also, unit tests for code supporting Unicode may benefit from being able to use string literals with non-ascii characters. (Of course, as noted in other posts in this thread, there is a working solution to get around this, and it's the use of the \u escape character.) |
| Kli-Kla-Klawitter <kliklaklawitter69@gmail.com>: Jul 01 11:10AM +0200 Am 01.07.2021 um 09:19 schrieb Ralf Goertz: >> UTF-16-files have a byte-header which helps the compiler to >> distinguish ASCII-files and UTF-16-files. > I know that. It's called a byte order mark. And gcc ignores it. No, wrong - gcc honors it since version 1.01. |
| Ralf Goertz <me@myprovider.invalid>: Jul 01 11:36AM +0200 Am Thu, 1 Jul 2021 11:10:55 +0200 > >> distinguish ASCII-files and UTF-16-files. > > I know that. It's called a byte order mark. And gcc ignores it. > No, wrong - gcc honors it since vesion 1.01. I created this file b.cc: int main() { return 0; } using vi with :set fileencoding=utf16 :set bomb Then ~/c> file b.cc b.cc: C source, Unicode text, UTF-16, big-endian text or ~/c> od -h b.cc 0000000 fffe 6900 6e00 7400 2000 6d00 6100 6900 0000020 6e00 2800 2900 2000 7b00 0a00 2000 2000 0000040 2000 2000 7200 6500 7400 7500 7200 6e00 0000060 2000 3000 3b00 0a00 7d00 0a00 0000074 where you can see the BOM fffe. Feeding this to gcc (or g++) you get: ~/c> gcc b.cc b.cc:1:1: error: stray '\376' in program 1 | �� i n t m a i n ( ) { | ^ b.cc:1:2: error: stray '\377' in program 1 | �� i n t m a i n ( ) { | ^ b.cc:1:3: warning: null character(s) ignored 1 | �� i n t m a i n ( ) { | ^ b.cc:1:5: warning: null character(s) ignored etc. How does that qualify as "gcc honoring the BOM"? |
| David Brown <david.brown@hesbynett.no>: Jul 01 12:58PM +0200 On 01/07/2021 10:44, Juha Nieminen wrote: >> "groß" turn out right? > There are also many situations where using non-ascii characters in > string literals may not be related to language and internationalization. Good point. > (Of course, as noted in other posts in this thread, there is a > working solution to get around this, and it's the use of the \u > escape character.) Yes - but such workarounds are hideous compared to writing: printf("Temperature %.1f °C\n", 123.4); I am glad most of my code only has to compile with gcc, and I can ignore such portability matters. |
| "Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Jul 01 01:31PM +0200 On 30 Jun 2021 20:19, James Kuyper wrote: > UTF-16 and UTF-32, respectively, making octal escapes redundant with and > less convenient than the use of UCNs. But as he said, they do work for > such strings. You snipped some context, the example we're talking about. That decidedly does not work, in the sense of producing the intended string. Perhaps I can make you understand this by talking about source code in general. Yes, that example code is valid C++, so a conforming compiler shall compile it with no errors; and yes, that code has a well defined meaning - look, here's the C++ standard, it spells out what the meaning is. But no, it doesn't do what you intended. - Alf |
| Christian Gollwitzer <auriocus@gmx.de>: Jul 01 02:01PM +0200 Am 01.07.21 um 10:29 schrieb David Brown: > (since you have a German email address), with all the texts in German, > would you want to use internationalisation frameworks just to make > "groß" turn out right? I can see your point, but actually, most programs developed here in Germany are still written in English. This is true for comments and variable names etc., because there might be a non-German coworker involved, and mostly because people are simply used to English as "the computer language". I've seen German comments, variable names and literal strings only at the university in introductory programming courses etc. But admittedly, I never produced GUIs in C++, because there are easier options available - and these usually come with good Unicode support (e.g. Python). These smaller tools did not get i18n'ed. I still think that if I had to make a program with a German interface, it would make sense to write it with English strings and translate it with a tool - because then, adding French or Turkish later on would be easy. > The OP could also be working on embedded systems or some other code for > which having a single self-contained executable is important. OK yes there are certainly points where this approach is not suitable. Just wanted to bring another solution to the table. Ceterum censeo wchar_t esse inutilem ("besides, I maintain that wchar_t is useless") ;) Christian |
| David Brown <david.brown@hesbynett.no>: Jul 01 02:57PM +0200 On 01/07/2021 14:01, Christian Gollwitzer wrote: > computer language". I've seen German comments, variable names and > literal strings only at the university in introductory programming > courses etc. Sure. The same is true here in Norway. But it is not true everywhere. And even when you have English identifiers, comments, etc., the text strings you show to users are often in a language other than English. > But admittedly, I never produced GUIs in C++, because there > are easier options available - and these usually come with good Unicode > support (e.g. Python). These smaller tools did not get i18n'ed. I do that too. > it would make sense to write it with English strings and translate it > with a tool - because then, adding French or Turkish later on would be > easy. Most software is written for one or a few customers, and one language will always be sufficient. Of course, such software is also almost always written for one compiler and one target, and portability of source code is not an issue. This is particularly true if you have good modularisation in the code - the user-facing parts with the text strings are less likely to be re-used elsewhere than the more library-like code underneath. > OK yes there are certainly points where this approach is not suitable. > Just wanted to bring another solution to the table. > Ceterum censeo wchar_t esse inutilem ;) Agreed - and salt the ground it was built on, to save future generations from its curse! |
| Manfred <noname@add.invalid>: Jul 01 03:44PM +0200 On 7/1/2021 7:19 AM, Christian Gollwitzer wrote: > auxiliary translation file with a well defined encoding. At runtime the > ASCII string is merely a key into the table. Plus the added bonus that > you can support multiple languages. In Windows programming the need for wchar_t strings is relatively common, since this is its native character set. Most APIs are provided with both ASCII and WCHAR variants, however if you need more-than-plain-ASCII text support you almost invariably end up #define'ing UNICODE and thus default to the wide char variants - moreover some APIs are given for WCHAR strings only. In these cases, if you need string literals, they are best expressed as WCHAR strings directly to avoid unnecessary conversions at runtime. |
| Kli-Kla-Klawitter <kliklaklawitter69@gmail.com>: Jul 01 04:17PM +0200 Am 01.07.2021 um 11:36 schrieb Ralf Goertz: > b.cc:1:5: warning: null character(s) ignored > etc. > How does that qualify as "gcc honoring the BOM"? Use v1.01. |
| James Kuyper <jameskuyper@alumni.caltech.edu>: Jul 01 10:58AM -0400 On 7/1/21 12:42 AM, Juha Nieminen wrote: >> encoded then it might get the wrong character in the string. > The problem is that "\xC2\xA9" in UTF-8 is not the same thing as > "\xC2\xA9" in UTF-16 or UTF-32 (whichever wchar_t happens to be). Why would you use wchar_t if you care about Unicode? You should be using string literals with the u8, u, or U prefixes, and store/access the strings as arrays of char, char16_t, or char32_t, respectively. Such literals are guaranteed to be in UTF-8, UTF-16, or UTF-32 encoding, respectively. |
| Keith Thompson <Keith.S.Thompson+u@gmail.com>: Jul 01 10:13AM -0700 > Am 01.07.2021 um 11:36 schrieb Ralf Goertz: >> Am Thu, 1 Jul 2021 11:10:55 +0200 >> schrieb Kli-Kla-Klawitter <kliklaklawitter69@gmail.com>: [...] >>>> I know that. It's called a byte order mark. And gcc ignores it. >>> No, wrong - gcc honors it since vesion 1.01. >> I created this file b.cc: [...] >> How does that qualify as "gcc honoring the BOM"? > Use v1.01. You think you're funny. You're not. -- Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com Working, but not speaking, for Philips void Void(void) { Void(); } /* The recursive call of the void */ |
| Kli-Kla-Klawitter <kliklaklawitter69@gmail.com>: Jul 01 07:27PM +0200 Am 01.07.2021 um 19:13 schrieb Keith Thompson: >>> How does that qualify as "gcc honoring the BOM"? >> Use v1.01. > You think you're funny. You're not. v1.01 does honor the BOM. |
| Keith Thompson <Keith.S.Thompson+u@gmail.com>: Jul 01 11:21AM -0700 >>> Use v1.01. >> You think you're funny. You're not. > v1.01 does honor the BOM. At the risk of giving the impression I'm taking you seriously, the oldest version of gcc available from gnu.org is 1.42, released in 1992. I've seen no evidence that gcc v1.01 would have honored the BOM, but it doesn't matter, since that version is obsolete and unavailable. I conclude that you are a troll. -- Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com Working, but not speaking, for Philips void Void(void) { Void(); } /* The recursive call of the void */ |
| Real Troll <real.troll@trolls.com>: Jul 01 06:45PM On 01/07/2021 19:21, Keith Thompson wrote: > I conclude that you are a troll. It takes one troll to know another. This is an expert opinion of a Real Troll! |
| You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |