soft and program: Digest for comp.lang.c++@googlegroups.com

legal UTF-8 characters in Identifiers ? - 10 Updates
How to refer to my base class object from derived (without cast?) - 5 Updates
Would this be a good idea? - 3 Updates

legal UTF-8 characters in Identifiers ?

"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Mar 29 01:42PM +0200

On 2023-03-27 7:59 PM, Jason Vas Dias wrote:
> 8707 2203 (1 1) ∃ 'THERE EXISTS'
> 8708 2204 (1 1) ∄ 'THERE DOES NOT EXIST'

> Why can't I use these characters in identifiers ?

Unless changed in the most recent versions, g++ directly only allows
ASCII in identifiers, though it does support Unicode escapes (formally
universal character names) that denote non-ASCII characters.

The g++ behavior is -- or was -- essentially the rules of C, instead
of the rules of C++.

I'm being careful talking about versions because others have posted
commentary indicating that the g++ compiler they use, supports non-ASCII
characters in identifiers. I'm pretty sure they're wrong about that. But
there is the possibility of such support having been added recently.

> Would there be any support for submitting an RFC to the standards groups
> to allow implementations to have some sort of local UTF-8 character
> white-list policy ?

That discussion is now on mailing-lists. Originally it was in the now
defunct Usenet group comp.std.c++; then it was moved to Google groups;
then to mailing lists. Modulo possible changes (like, moved again) you
can find those lists at <url: http://isocpp.org>, somewhere.

> ie. I'd like to develop extensions for GCC and Clang that will allow them
> to load a UTF-8 character White-List file , then these characters would be
> allowed regardless of what the standards say about valid identifier characters.

Well the one I sorely miss is the up-arrow, ↑, to denote exponentiation.

But then, even if it was allowed, the usual technique of `%OP%` for
pseudo-operator yields precedence like % (remainder) which is not ideal.
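
For readers who have not met the `%OP%` technique, it can be sketched roughly as follows. This is one common way to write it, not necessarily what the poster had in mind; POW, pow_tag and pow_lhs are names invented for this sketch, and only non-negative integer exponents are handled.

```cpp
// Sketch of a named pseudo-operator built from two overloads of %.
struct pow_tag {};                 // marker type for the operator name
constexpr pow_tag POW{};

struct pow_lhs { double base; };   // holds the left operand after "base % POW"

// First half: "base % POW" captures the left operand.
constexpr pow_lhs operator%(double base, pow_tag) { return pow_lhs{base}; }

// Second half: "(base % POW) % exponent" does the actual work.
constexpr double operator%(pow_lhs lhs, int exponent) {
    double result = 1.0;
    for (int i = 0; i < exponent; ++i) result *= lhs.base;  // exponent >= 0 assumed
    return result;
}

static_assert(2.0 %POW% 3 == 8.0, "2 to the power 3");
static_assert(1.0 + 2.0 %POW% 3 == 9.0, "% binds tighter than +");
static_assert(2.0 * 3.0 %POW% 2 == 36.0, "but not tighter than *");
```

Because `*` and `%` share a precedence level and group left to right, `2.0 * 3.0 %POW% 2` is evaluated as `(2.0 * 3.0) %POW% 2`, giving 36 rather than the mathematically expected 18. That is the precedence problem described in the post.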

Instead C++ should get support for exponentiation operator, possibly
`**` like Python.

> Next step is getting a complete dump of what GCC and Clang consider to
> be valid UTF-8 identifier characters, which is not as straightforward as it
> should be.

g++ is presumably dead easy; see above. :(

> I'd most appreciate any helpful advice / informative comments.

- Alf

David Brown <david.brown@hesbynett.no>: Mar 29 02:23PM +0200

On 29/03/2023 13:42, Alf P. Steinbach wrote:

> Unless changed in the most recent versions, g++ directly only allows
> ASCII in identifiers, though it does support Unicode escapes (formally
> universal character names) that denote non-ASCII characters.

It changed with gcc 10, which is about 3 years old. (I don't know if
you consider that "recent" or not.)

> The g++ behavior is -- or was -- essentially the rules of C, instead
> of the rules of C++.

C and C++ are, AFAIK, basically aligned on this - C has supported
"universal character names" since C99, and C++ had it in C++98.
Different compilers have had different levels of support, with clang
being relatively early in having full UTF-8 input support while gcc only
supported escape sequences (for C and C++) until gcc 10.

> commentary indicating that the g++ compiler they use, supports non-ASCII
> characters in identifiers. I'm pretty sure they're wrong about that. But
> there is the possibility of such support having been added recently.

Version 10 is the magic number.

> pseudo-operator yields precedence like % (remainder) which is not ideal.

> Instead C++ should get support for exponentiation operator, possibly
> `**` like Python.

That's a feature many have called for, in C and C++, spanning decades.

>> straightforward as it
>> should be.

> g++ is presumably dead easy; see above. :(

It's not particularly easy. GCC adds $ to the ASCII characters it
accepts as letters in identifiers (for most targets - there are some
targets for which $ is critically significant for the generated
assembly). Other than that, it follows the standard, with letters
defined in the XID_Start and XID_Continue classes in Unicode:

<https://www.unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers>
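
As an illustration of the rule described above, here is a minimal sketch. It assumes a UTF-8 source file and GCC 10 or later (or Clang); the names π and circle_area are made up for this example. The Greek letter π (U+03C0) is a letter in XID_Start and so is accepted, whereas ∃ (U+2203) and ↑ (U+2191) are a mathematical symbol and an arrow, not letters, and are not accepted.

```cpp
// Sketch only: needs a UTF-8 source file and GCC 10+ (or Clang).
// U+03C0 (GREEK SMALL LETTER PI) is a letter, so it may start an identifier.
constexpr double π = 3.141592653589793;

// The universal character name \u03c0 spells the very same identifier;
// per the thread, this is the spelling g++ before version 10 required.
constexpr double circle_area(double r) { return \u03c0 * r * r; }

static_assert(circle_area(1.0) == π, "both spellings name one variable");
```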

Muttley@dastardlyhq.com: Mar 29 01:38PM

On Wed, 29 Mar 2023 13:42:15 +0200
>Instead C++ should get support for exponentiation operator, possibly
>`**` like Python.

A pity Ritchie/Kernighan decided to use ^ for xor. I guess they wanted to keep
the bitwise operators as a single char but $ or @ would have been preferable.

Muttley@dastardlyhq.com: Mar 29 01:41PM

On Wed, 29 Mar 2023 14:23:25 +0200
>> Instead C++ should get support for exponentiation operator, possibly
>> `**` like Python.

>That's a feature many have called for, in C and C++, spanning decades.

Given how long it took to (officially) include 0b for binary literals I won't
be holding my breath. The amount of time that it could have personally saved
me in the past by not having to convert binary into hex and back would be
significant.
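
For reference, the 0b prefix became standard in C++14 (it was a compiler extension before that), together with the ' digit separator. A small sketch with made-up constant names:

```cpp
// C++14: the 0b prefix and the ' digit separator are both standard.
constexpr unsigned flags_binary = 0b1010'0101;
constexpr unsigned flags_hex    = 0xA5;

static_assert(flags_binary == flags_hex, "same value, no manual conversion");
```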

"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Mar 29 07:03PM +0200

On 2023-03-29 2:23 PM, David Brown wrote:
>> universal character names) that denote non-ASCII characters.

> It changed with gcc 10, which is about 3 years old. (I don't know if
> you consider that "recent" or not.)

Oh. Thanks.

"
[C:\root\temp]
> g++ --version
g++ (GCC) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[C:\root\temp]
> bash
alf@Alf-Windows-PC:/mnt/c/root/temp$ g++ --version
g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
"

So how does that go for Ubuntu, like "sudo apt update something something"?

> Different compilers have had different levels of support, with clang
> being relatively early in having full UTF-8 input support while gcc only
> supported escape sequences (for C and C++) until gcc 10.

I meant the rules for identifiers.

[snip]
> assembly). Other than that, it follows the standard, with letters
> defined in the XID_Start and XID_Continue classes in Unicode:

> <https://www.unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers>

Most if not all extant compilers support `$`. Which I know because Herb
Sutter, the C++ standardization committee chair, once tried /using/ it
for some library stuff, and unlike my earlier efforts he got a lot of
people to try it and give feedback on it. The main problem wasn't
compilers but that some companies used `$` in their own preprocessing.

Probably the reason $ is not there in the C++ basic character set is the
destructively silly political argumentation that resulted in an
"international" version of ASCII with the not-used-ever-since-by-anybody
"international currency symbol" ¤ instead; <url:
https://en.wikipedia.org/wiki/Currency_sign_(typography)#History>.

The same kind of sensitivity-oriented people that now are removing scary
wording from Roald Dahl's novels and are on their way to ban "1984".

Mumble, mumble...

Anyway thanks!

I'll at least update the Windows MinGW version of g++, probably as easy
as downloading the Nuwen distro (maintained by STL over at Microsoft).

- Alf

David Brown <david.brown@hesbynett.no>: Mar 29 08:25PM +0200

On 29/03/2023 19:03, Alf P. Steinbach wrote:

>> It changed with gcc 10, which is about 3 years old. (I don't know if
>> you consider that "recent" or not.)

> Oh. Thanks.

No problem. Some of us have an unhealthily detailed knowledge of these
things - it's nice when that knowledge helps someone!

You can always test on <https://godbolt.org>.

> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> "

> So how does that go for Ubuntu, like "sudo apt update something something"?

I believe gcc 10 is in the Ubuntu 20.04 packages:

apt update
apt install gcc-10

You'll still have standard gcc 9.x by default, and run gcc 10 as g++-10.

>> being relatively early in having full UTF-8 input support while gcc
>> only supported escape sequences (for C and C++) until gcc 10.

> I meant the rules for identifiers.

So did I - that's what I was talking about (not run-time character set
support).

> for some library stuff, and unlike my earlier efforts he got a lot of
> people to try it and give feedback on it. The main problem wasn't
> compilers but that some companies used `$` in their own preprocessing.

There are dozens of C++ compilers (and hundreds of C compilers) in use,
most of which neither you nor Sutter will have ever used, because they
are for embedded systems. Some support $, others do not. gcc supports
it, but some targets might not like it in their assembler. In
particular, many assemblers use $ to indicate hex literals, others use
it for register names. You might find that "a$" is fine, but "$0" is not.

Of course most code doesn't need to be fully portable, and $ in
identifiers will work fine for any toolchain most programmers are likely
to meet.
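
A minimal sketch of the point about `$`, assuming GCC or Clang on a common desktop target. The names price$ and double_price are invented for the example, and nothing here is guaranteed by the standard.

```cpp
// '$' in an identifier is a compiler extension, not standard C++.
// It is accepted by GCC and Clang on common desktop targets; a pedantic
// warning level may complain, and some embedded toolchains may reject it.
constexpr int price$ = 42;

constexpr int double_price() { return 2 * price$; }

static_assert(double_price() == 84, "the identifier behaves like any other");
```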

> "international" version of ASCII with the not-used-ever-since-by-anybody
> "international currency symbol" ¤ instead; <url:
> https://en.wikipedia.org/wiki/Currency_sign_(typography)#History>.

No, it is about compatibility with assemblers - when the C basic
character set was determined, $ was in heavy but inconsistent use in
many assemblers of the time. Allowing it as a letter in identifiers
would have made things a lot more complicated.

"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Mar 29 09:07PM +0200

On 2023-03-29 8:25 PM, David Brown wrote:
> character set was determined, $ was in heavy but inconsistent use in
> many assemblers of the time. Allowing it as a letter in identifiers
> would have made things a lot more complicated.

Oh. I remember '@' for PDP-11 assembly and '$' for HP-3000 system
functions, but nothing about '$' for assembly. Learned something. :)

- Alf

Sam <sam@email-scan.com>: Mar 29 03:22PM -0400

Alf P. Steinbach writes:

> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> "

> So how does that go for Ubuntu, like "sudo apt update something something"?

I don't have an Ubuntu box, at this moment, but it should be "sudo apt
install g++-10". Ubuntu packages multiple versions of gcc, you would use
g++-10 to compile, etc…

scott@slp53.sl.home (Scott Lurndal): Mar 29 07:32PM

>> would have made things a lot more complicated.

>Oh. I remember '@' for PDP-11 assembly and '$' for HP-3000 system
>functions, but nothing about '$' for assembly. Learned something. :)

VMS heavily used $ in system call names at source level (e.g. C/Pascal SYS$QIOW)
and in MACRO-32 identifiers ($QIOW).

"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Mar 29 11:16PM +0200

On 2023-03-29 9:22 PM, Sam wrote:

> I don't have an Ubuntu box, at this moment, but it should be "sudo apt
> install g++-10". Ubuntu packages multiple versions of gcc, you would use
> g++-10 to compile, etc…

Ah. Better. I should really have fixed the prompt and got an X-server
running in WSL (I know it's possible), and so on, but.

alf@Alf-Windows-PC:/mnt/c/root/temp$ (g++ --version; g++-10 --version) | grep "++"
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
g++-10 (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0

- Alf

How to refer to my base class object from derived (without cast?)

Charlie R <charlier33@wp.pl>: Mar 29 10:05AM

From the derived class, I want to access my base class - the actual
*object* of the base-class part of this.

One thing that DOES work is to static_cast the (this) pointer into the
base class type, but that seems very sketchy (is this even always
correct?).

Example - how to fix the 2 "fail" lines?

#include <iostream>

class base {
public:
    operator int() const { return 123; }
};

class child : public base {
public:
    void show() {
        std::cout << this->base::operator int() << "\n";  // works
        std::cout << this->base << "\n";                   // fails - I want to refer to my parent object
        base & parentobj = this->base;                     // fails too
        base & parentobj2 = *static_cast<base*>(this);     // works... ugly?
    }
};

int main() {
    child obj;
    obj.show();
}

Ralf Fassel <ralfixx@gmx.de>: Mar 29 12:13PM +0200

* Charlie R <charlier33@wp.pl>
| From the derived class, I want to access my base class - the actual
| *object* of the base-class part of this.

| One thing that DOES work is to static_cast the (this) pointer into the
| base class type, but that seems very sketchy (is this even always
| correct?).

| Example - how to fix the 2 "fail" lines?
--<snip-snip>--
| std::cout << this->base << "\n"; // fails - I want to refer to my parent object

std::cout << this->operator int() << "\n";
std::cout << operator int() << "\n";

| base & parentobj = this->base; // fails too

base & parentobj = *this;
std::cout << parentobj.operator int() << "\n";
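
The suggestion above can be put into a compilable sketch. The member names via_reference and via_cast are invented here. Binding a base reference to *this is an implicit derived-to-base conversion, and a static_cast to an accessible, unambiguous base performs the same conversion explicitly, so both are well-defined.

```cpp
struct base {
    constexpr operator int() const { return 123; }
};

struct child : base {
    // No cast needed: *this converts implicitly to a reference to its base part.
    constexpr int via_reference() const {
        const base& parent = *this;
        return parent;                           // calls base::operator int
    }

    // The explicit form from the original question gives the same result.
    constexpr int via_cast() const {
        return static_cast<const base&>(*this);  // calls base::operator int
    }
};

static_assert(child().via_reference() == 123, "implicit conversion to base");
static_assert(child().via_cast() == 123, "static_cast to base");
```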

HTH
R'

Charlie R <charlier33@wp.pl>: Mar 29 10:21AM

Wed, 29 Mar 2023 12:13:14 +0200, Ralf Fassel:

> parent object

> std::cout << this->operator int() << "\n";
> std::cout << operator int()

thanks.
Though, what when we have 2 base classes and both have int() conversion
operator?

Is there no direct way of saying "take my base object X" in an expression?
Is using the static_cast<type_of_base_class*>(this) the valid way?

Sam <sam@email-scan.com>: Mar 29 06:51AM -0400

Charlie R writes:

> thanks.
> Though, what when we have 2 base classes and both have int() conversion
> operator?

base1 &obj = *this;

or

base2 &obj = *this;

Then obj.operator int().

> Is there no direct way of saying "take my base object X" in an expression?
> Is using the static_cast<type_of_base_class*>(this) the valid way?

A cast is the only way to convert X to Y, in an expression.

"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Mar 29 01:46PM +0200

On 2023-03-29 12:21 PM, Charlie R wrote:
> operator?

> Is there no direct way of saying "take my base object X" in an expression?
> Is using the static_cast<type_of_base_class*>(this) the valid way?

You can and should then do

Base::operator int()

... or in external code

obj.Base::operator int()

- Alf
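
A short sketch of the two-base case discussed in this thread, using hypothetical classes base1 and base2 that both provide a conversion to int. It shows the qualified call from inside the class, the reference binding, and the qualified call from external code.

```cpp
struct base1 { constexpr operator int() const { return 1; } };
struct base2 { constexpr operator int() const { return 2; } };

struct child : base1, base2 {
    // Qualified name picks the conversion operator of one specific base.
    constexpr int first() const { return base1::operator int(); }

    // Binding a reference selects the base2 subobject without a cast.
    constexpr int second() const {
        const base2& b = *this;
        return b;
    }
};

static_assert(child().first() == 1, "base1 selected by qualified name");
static_assert(child().second() == 2, "base2 selected by reference binding");
// External code can use the same qualified syntax on an object.
static_assert(child().base2::operator int() == 2, "qualified call from outside");
```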

Would this be a good idea?

Charlie R <charlier33@wp.pl>: Mar 29 10:13AM

Thu, 16 Mar 2023 11:38:20 +0100, Bo Persson:

>> It would only be a small gain in efficiency but might be useful.

> There is already a proposal for this:

> https://wg21.link/p2169

std::lock_guard _(mutex1);
...
std::lock_guard _(mutex2);
...
auto [x, y, _] = f();
auto [a, b, _] = g();

I wonder how a debugger should show me these 4 placeholders in the end when I
look at local variables. Probably some weird internal names, like with
lambdas. Would that even be a problem, or what to do there...?

Muttley@dastardlyhq.com: Mar 29 10:25AM

On Wed, 29 Mar 2023 10:13:39 -0000 (UTC)

>I wonder how a debugger should show me these 4 placeholders in the end when I
>look at local variables. Probably some weird internal names, like with
>lambdas. Would that even be a problem, or what to do there...?

Perhaps the same way it deals with temporaries now? Whatever that is, I don't
actually know how any of the debuggers represent them.

Bo Persson <bo@bo-persson.se>: Mar 29 01:23PM +0200

On 2023-03-29 at 12:13, Charlie R wrote:

> I wonder how a debugger should show me these 4 placeholders in the end when I
> look at local variables. Probably some weird internal names, like with
> lambdas. Would that even be a problem, or what to do there...?

If you want them to have nice names, you are free to insert those names.
Using _ as a name means that you don't care about those objects.

So why care about things we don't care about? :-)
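
For comparison, here is a sketch of what the same code looks like without the proposed placeholder, assuming C++17 structured bindings. The functions f and g are hypothetical stand-ins for the ones in the post, and ignored_f and ignored_g are names invented here: every ignored binding in a scope needs its own distinct name, and those names are what a debugger would list.

```cpp
#include <tuple>

// Hypothetical stand-ins for f() and g() from the post.
constexpr std::tuple<int, int, int> f() { return std::make_tuple(1, 2, 3); }
constexpr std::tuple<int, int, int> g() { return std::make_tuple(10, 20, 30); }

constexpr int sum_of_used_values() {
    // Without the placeholder, two ignored bindings in one scope
    // must have two different names.
    auto [x, y, ignored_f] = f();
    auto [a, b, ignored_g] = g();
    return x + y + a + b;
}

static_assert(sum_of_used_values() == 33, "1 + 2 + 10 + 20");
```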

Wednesday, March 29, 2023

Digest for comp.lang.c++@googlegroups.com - 18 updates in 3 topics
