Thursday, July 1, 2021

Digest for comp.lang.c++@googlegroups.com - 20 updates in 2 topics

Vir Campestris <vir.campestris@invalid.invalid>: Jul 01 09:14PM +0100

On 28/06/2021 07:44, Bonita Montero wrote:
>> demon.
 
> dynamic_cast<> is slow and mostly the things you do with
> it could be done faster through a virtual function call.
 
Mostly.
 
I just did a line count on our codebase; we have about 1 dynamic cast
for every 10k lines. Compare with static_cast - 1 in 300.
 
(This is over several million lines of cpp files.)
 
I don't know what all the calls are, but the pattern I am most familiar
with is: This pointer from my database points to an interface. It might
be a type 1 object, in which case do ONE(). It might be a type 2, in
which case do TWO().
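 
Roughly like this, with made-up Interface/TypeOne/TypeTwo names and
ONE()/TWO() standing in for the real work:
 
struct Interface { virtual ~Interface() = default; };
struct TypeOne : Interface { void ONE() { /* ... */ } };
struct TypeTwo : Interface { void TWO() { /* ... */ } };

// Dispatch on the concrete type behind the interface pointer.
void handle(Interface* p) {
    if (auto* one = dynamic_cast<TypeOne*>(p))
        one->ONE();
    else if (auto* two = dynamic_cast<TypeTwo*>(p))
        two->TWO();
}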
 
Andy
"Öö Tiib" <ootiib@hot.ee>: Jul 01 02:16PM -0700

On Thursday, 1 July 2021 at 23:14:24 UTC+3, Vir Campestris wrote:
> with is: This pointer from my database points to an interface. It might
> be a type 1 object, in which case do ONE(). It might be a type 2, in
> which case do TWO().
 
In some circumstances one can use a typeid check plus static_cast,
or just dynamic_cast alone; doing it with anything else will result
in more code and will likely be less efficient too.
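 
For instance, reusing the hypothetical Interface/TypeOne/TypeTwo from
the previous post, the typeid variant would be roughly:
 
#include <typeinfo>

// Note: typeid compares the exact dynamic type, so unlike dynamic_cast
// this will not match classes further derived from TypeOne or TypeTwo.
void handle(Interface& obj) {
    if (typeid(obj) == typeid(TypeOne))
        static_cast<TypeOne&>(obj).ONE();
    else if (typeid(obj) == typeid(TypeTwo))
        static_cast<TypeTwo&>(obj).TWO();
}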
Juha Nieminen <nospam@thanks.invalid>: Jul 01 04:42AM


> \x works in wide string literal too, and puts in a character with that
> value. The difference is that if the wide string type isn't unicode
> encoded then it might get the wrong character in the string.
 
The problem is that "\xC2\xA9" in UTF-8 is not the same thing as
"\xC2\xA9" in UTF-16 or UTF-32 (whichever wchar_t happens to be).
 
"\uXXXX", however, ought to work regardless because it specifies the
actual Unicode code point you want, rather than its encoding.
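 
For example, with the copyright sign U+00A9:
 
const char*    a =  "\xC2\xA9";  // the two bytes that form the UTF-8 encoding of ©
const wchar_t* b = L"\xC2\xA9";  // TWO wide characters, U+00C2 and U+00A9 ("Â©"), not ©
const wchar_t* c = L"\u00A9";    // the single code point ©, in whatever encoding
                                 // wchar_t uses (UTF-16 or UTF-32)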
Christian Gollwitzer <auriocus@gmx.de>: Jul 01 07:19AM +0200

Am 30.06.21 um 09:57 schrieb Juha Nieminen:
 
> const wchar_t* str = L"???";
 
> In the *source code* that string literal may be eg. UTF-8 encoded. However,
> the compiler needs to convert it to wide chars.
 
I think it is best to avoid wide strings. Now that doesn't help you if
you need them to call native Windows functions which insist on wchar_t.
 
I'm still wondering why you need to put a Unicode string in the source
code at all. Could you use an i18n feature of Windows to look up the real
string? I'm not an expert on i18n on Windows, but using GNU gettext, you
would write some ASCII equivalent thing in the code and then have an
auxiliary translation file with a well defined encoding. At runtime the
ASCII string is merely a key into the table. Plus the added bonus that
you can support multiple languages.
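 
Sketched very roughly (the "myapp" domain and the catalog path are made up):
 
#include <libintl.h>
#include <clocale>
#include <cstdio>
#define _(s) gettext(s)

int main() {
    std::setlocale(LC_ALL, "");
    bindtextdomain("myapp", "/usr/share/locale");  // made-up domain and path
    textdomain("myapp");
    std::puts(_("File not found"));  // ASCII key; the .mo catalog supplies the translation
}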
 
Christian
Ralf Goertz <me@myprovider.invalid>: Jul 01 09:19AM +0200

Am Wed, 30 Jun 2021 12:37:55 +0200
> > are assumed to be utf16be.
 
> UTF-16-files have a byte-header which helps the compiler to
> distinguish ASCII-files and UTF-16-files.
 
I know that. It's called a byte order mark. And gcc ignores it.
David Brown <david.brown@hesbynett.no>: Jul 01 10:29AM +0200

On 01/07/2021 07:19, Christian Gollwitzer wrote:
> ASCII string is merely a key into the table. Plus the added bonus that
> you can support multiple languages.
 
>     Christian
 
Code can require non-ASCII characters without needing internationalisation.
gettext and the like are certainly useful, but they are very heavy
tools compared to a fixed string or small table of strings in the code.
If you are writing a program for use in a single company in Germany
(since you have a German email address), with all the texts in German,
would you want to use internationalisation frameworks just to make
"groß" turn out right?
 
The OP could also be working on embedded systems or some other code for
which having a single self-contained executable is important.
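 
For that kind of code, a table as trivial as, say,
 
// Plain UTF-8 literals, fine with gcc's default source and execution charsets.
static const char* const messages[] = {
    "Temperatur zu groß",
    "Datei nicht gefunden",
};
 
compiled straight into the binary is often all that is needed.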
Juha Nieminen <nospam@thanks.invalid>: Jul 01 08:44AM

> (since you have a German email address), with all the texts in German,
> would you want to use internationalisation frameworks just to make
> "groß" turn out right?
 
There are also many situations where using non-ASCII characters in
string literals may not be related to language and internationalization.
After all, Unicode contains loads of characters that are not related
to spoken languages, such as math symbols, and lots of other types
of symbols which are universal and don't require any sort of
internationalization. Sometimes these symbols may be used all on
their own, sometimes as part of text (eg. in labels and titles).
 
Also, unit tests for code supporting Unicode may benefit from
being able to use string literals with non-ASCII characters.
(Of course, as noted in other posts in this thread, there is a
working solution to get around this, namely the use of the \u
escape sequence.)
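 
For instance, something along these lines, with the micro sign as an
arbitrary example:
 
#include <cstring>

// µ (U+00B5) should come out as the two UTF-8 bytes C2 B5.
static_assert(sizeof(u8"\u00B5") == 3, "two UTF-8 code units plus the terminating NUL");

bool micro_sign_encodes_correctly() {
    return std::memcmp(u8"\u00B5", "\xC2\xB5", 2) == 0;
}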
Kli-Kla-Klawitter <kliklaklawitter69@gmail.com>: Jul 01 11:10AM +0200

Am 01.07.2021 um 09:19 schrieb Ralf Goertz:
 
>> UTF-16-files have a byte-header which helps the compiler to
>> distinguish ASCII-files and UTF-16-files.
 
> I know that. It's called a byte order mark. And gcc ignores it.
 
No, wrong - gcc has honored it since version 1.01.
Ralf Goertz <me@myprovider.invalid>: Jul 01 11:36AM +0200

Am Thu, 1 Jul 2021 11:10:55 +0200
> >> distinguish ASCII-files and UTF-16-files.
 
> > I know that. It's called a byte order mark. And gcc ignores it.
 
> No, wrong - gcc has honored it since version 1.01.
 
I created this file b.cc:
 
int main() {
return 0;
}
 
using vi with
 
:set fileencoding=utf16
:set bomb
 
Then
 
~/c> file b.cc
b.cc: C source, Unicode text, UTF-16, big-endian text
 
or
 
~/c> od -h b.cc
0000000 fffe 6900 6e00 7400 2000 6d00 6100 6900
0000020 6e00 2800 2900 2000 7b00 0a00 2000 2000
0000040 2000 2000 7200 6500 7400 7500 7200 6e00
0000060 2000 3000 3b00 0a00 7d00 0a00
0000074
 
where you can see the BOM fffe. Feeding this to gcc (or g++) you get:
 
~/c> gcc b.cc
b.cc:1:1: error: stray '\376' in program
1 | �� i n t m a i n ( ) {
| ^
b.cc:1:2: error: stray '\377' in program
1 | �� i n t m a i n ( ) {
| ^
b.cc:1:3: warning: null character(s) ignored
1 | �� i n t m a i n ( ) {
| ^
b.cc:1:5: warning: null character(s) ignored
 
etc.
 
How does that qualify as "gcc honoring the BOM"?
David Brown <david.brown@hesbynett.no>: Jul 01 12:58PM +0200

On 01/07/2021 10:44, Juha Nieminen wrote:
>> "groß" turn out right?
 
> There are also many situations where using non-ascii characters in
> string literals may not be related to language and internationalization.
 
Good point.
 
> (Of course, as noted in other posts in this thread, there is a
> working solution to get around this, and it's the use of the \u
> escape character.)
 
Yes - but such workarounds are hideous compared to writing:
 
printf("Temperature %.1f °C\n", 123.4);
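 
instead of, say,
 
printf("Temperature %.1f \u00B0C\n", 123.4);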
 
 
I am glad most of my code only has to compile with gcc, and I can ignore
such portability matters.
"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Jul 01 01:31PM +0200

On 30 Jun 2021 20:19, James Kuyper wrote:
> UTF-16 and UTF-32, respectively, making octal escapes redundant with and
> less convenient than the use of UCNs. But as he said, they do work for
> such strings.
 
You snipped some context, the example we're talking about.
 
That decidedly does not work, in the sense of producing the intended string.
 
Perhaps I can make you understand this by talking about source code in
general. Yes, that example code is valid C++, so a conforming compiler
shall compile it with no errors; and yes, that code has a well-defined
meaning - the C++ standard spells out exactly what that meaning is. But
no, it doesn't do what you intended.
 
- Alf
Christian Gollwitzer <auriocus@gmx.de>: Jul 01 02:01PM +0200

Am 01.07.21 um 10:29 schrieb David Brown:
> (since you have a German email address), with all the texts in German,
> would you want to use internationalisation frameworks just to make
> "groß" turn out right?
 
I can see your point, but actually, most programs developed here in
Germany are still written in English. This is true for comments and
variable names etc., because there might be a non-German coworker
involved, and mostly because people are simply used to English as "the
computer language". I've seen German comments, variable names and
literal strings only at the university in introductory programming
courses etc. But admittedly, I never produced GUIs in C++, because there
are easier options available - and these usually come with good Unicode
support (e.g. Python). These smaller tools did not get i18n'ed.
I still think that if I had to make a program with a German interface,
it would make sense to write it with English strings and translate it
with a tool - because then, adding French or Turkish later on would be
easy.
 
 
> The OP could also be working on embedded systems or some other code for
> which having a single self-contained executable is important.
 
OK, yes, there are certainly cases where this approach is not suitable.
Just wanted to bring another solution to the table.
 
Ceterum censeo wchar_t esse inutilem ;)
 
Christian
David Brown <david.brown@hesbynett.no>: Jul 01 02:57PM +0200

On 01/07/2021 14:01, Christian Gollwitzer wrote:
> computer language". I've seen German comments, variable names and
> literal strings only at the university in introductory programming
> courses etc.
 
Sure. The same is true here in Norway. But it is not true everywhere.
And even when you have English identifiers, comments, etc., the text
strings you show to users are often in a language other than English.
 
> But admittedly, I never produced GUIs in C++, because there
> are easier options available - and these usually come with good Unicode
> support (e.g. Python). These smaller tools did not get i18n'ed.
 
I do that too.
 
> it would make sense to write it with English strings and translate it
> with a tool - because then, adding French or Turkish later on would be
> easy.
 
Most software is written for one or a few customers, and one language
will always be sufficient. Of course, such software is also almost
always written for one compiler and one target, and portability of
source code is not an issue. This is particularly true if you have good
modularisation in the code - the user-facing parts with the text strings
are less likely to be re-used elsewhere than the more library-like code
underneath.
 
 
> OK yes there are certainly points where this approach is not suitable.
> Just wanted to bring another solution to the table.
 
> Ceterum censeo wchar_t esse inutilem ;)
 
Agreed - and salt the ground it was built on, to save future generations
from its curse!
Manfred <noname@add.invalid>: Jul 01 03:44PM +0200

On 7/1/2021 7:19 AM, Christian Gollwitzer wrote:
> auxiliary translation file with a well defined encoding. At runtime the
> ASCII string is merely a key into the table. Plus the added bonus that
> you can support multiple languages.
 
In Windows programming the need for wchar_t strings is relatively
common, since that is its native character set. Most APIs are provided
in both ASCII and WCHAR variants; however, if you need more than
plain-ASCII text support you almost invariably end up #define'ing
UNICODE and thus defaulting to the wide-char variants - moreover, some
APIs are provided for WCHAR strings only.
In these cases, if you need string literals they are best expressed as
WCHAR strings directly, to avoid unnecessary conversions at runtime.
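 
A made-up example; MessageBoxW takes wide strings, so the literals can
be passed as-is:
 
#define UNICODE
#define _UNICODE
#include <windows.h>

void report_error() {
    // Wide literals go straight to the W API, no conversion at runtime.
    MessageBoxW(nullptr, L"Datei nicht gefunden", L"Fehler", MB_OK);
}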
 
 
Kli-Kla-Klawitter <kliklaklawitter69@gmail.com>: Jul 01 04:17PM +0200

Am 01.07.2021 um 11:36 schrieb Ralf Goertz:
> b.cc:1:5: warning: null character(s) ignored
 
> etc.
 
> How does that qualify as "gcc honoring the BOM"?
 
Use v1.01.
James Kuyper <jameskuyper@alumni.caltech.edu>: Jul 01 10:58AM -0400

On 7/1/21 12:42 AM, Juha Nieminen wrote:
>> encoded then it might get the wrong character in the string.
 
> The problem is that "\xC2\xA9" in UTF-8 is not the same thing as
> "\xC2\xA9" in UTF-16 or UTF-32 (whichever wchar_t happens to be).
 
Why would you use wchar_t if you care about Unicode? You should be using
string literals with either the u8, u, or U prefix, and store/access
the strings as arrays of char, char16_t, or char32_t, respectively. Such
literals are guaranteed to be in UTF-8, UTF-16, or UTF-32 encoding,
respectively.
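 
For example, with the copyright sign again:
 
auto a = u8"\u00A9";  // UTF-8: bytes C2 A9 (array of char pre-C++20, char8_t since)
auto b =  u"\u00A9";  // UTF-16: the single char16_t code unit 0x00A9
auto c =  U"\u00A9";  // UTF-32: the single char32_t code unit 0x000000A9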
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Jul 01 10:13AM -0700

> Am 01.07.2021 um 11:36 schrieb Ralf Goertz:
>> Am Thu, 1 Jul 2021 11:10:55 +0200
>> schrieb Kli-Kla-Klawitter <kliklaklawitter69@gmail.com>:
[...]
>>>> I know that. It's called a byte order mark. And gcc ignores it.
 
>>> No, wrong - gcc has honored it since version 1.01.
>> I created this file b.cc:
[...]
>> How does that qualify as "gcc honoring the BOM"?
 
> Use v1.01.
 
You think you're funny. You're not.
 
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */
Kli-Kla-Klawitter <kliklaklawitter69@gmail.com>: Jul 01 07:27PM +0200

Am 01.07.2021 um 19:13 schrieb Keith Thompson:
>>> How does that qualify as "gcc honoring the BOM"?
 
>> Use v1.01.
 
> You think you're funny. You're not.
 
v1.01 does honor the BOM.
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Jul 01 11:21AM -0700


>>> Use v1.01.
>> You think you're funny. You're not.
 
> v1.01 does honor the BOM.
 
At the risk of giving the impression I'm taking you seriously, the
oldest version of gcc available from gnu.org is 1.42, released in 1992.
 
I've seen no evidence that gcc v1.01 would have honored the BOM, but it
doesn't matter, since that version is obsolete and unavailable.
 
I conclude that you are a troll.
 
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */
Real Troll <real.troll@trolls.com>: Jul 01 06:45PM

On 01/07/2021 19:21, Keith Thompson wrote:
 
> I conclude that you are a troll.
 
It takes one troll to know another. This is an expert opinion of a Real Troll!