soft and program: Digest for comp.lang.c++@googlegroups.com

single object - 2 Updates
Is this safe? - 23 Updates

Armando di Matteo <armando.dimatteo87@gmail.com>: Feb 21 09:13AM -0800

Kunal Goswami wrote:

> How can we ensure that only one object is created for a class, and that a second creation does not happen?

> Note - don't use counter .

Would a bool count as a counter?

e.g.

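#include <cassert>
class MyClass {
    // C++17 inline variable: a non-const static member cannot otherwise
    // be initialized inside the class.
    inline static bool instantiated = false;
public:
    MyClass() { assert(!instantiated); instantiated = true; }
    ~MyClass() { instantiated = false; }
};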

would that be okay?
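
For comparison, a minimal sketch of the common function-local-static approach, which uses neither a counter nor a flag. The class name `OnlyOne` and the accessor `instance()` are made up for illustration; this is one possible reading of the question, not an answer given in the thread.

```cpp
#include <cassert>

// Hypothetical class: the only way to obtain an object is instance().
class OnlyOne {
public:
    // The single object is created on the first call. Since C++11 the
    // initialization of a function-local static is thread-safe.
    static OnlyOne& instance() {
        static OnlyOne the_instance;
        return the_instance;
    }

    // Copying would create a second object, so it is forbidden.
    OnlyOne(const OnlyOne&) = delete;
    OnlyOne& operator=(const OnlyOne&) = delete;

private:
    OnlyOne() = default;  // only instance() may construct
};
```

Unlike the assert-based version, a second construction is rejected at compile time rather than at run time.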

Muttley@dastardlyhq.com: Feb 21 05:14PM

On Tue, 21 Feb 2023 09:13:14 -0800 (PST)
> ~MyClass() { instantiated = false; }
>}

>would that be okay?

Don't help the kids with their coursework.

Is this safe?

Tony Oliver <guinness.tony@gmail.com>: Feb 20 04:14PM -0800

On Monday, 20 February 2023 at 19:16:53 UTC, Chris Vine wrote:
> points. Since you are in Germany, one example is the eszett (ß),
> which is one code point in lower case but two code points in upper
> case (SS or SZ) by old/traditional orthography.

The OP specifically used an ASCII string; no mention was made of UTF-8.

Why did you need to bring it up?

Tony Oliver <guinness.tony@gmail.com>: Feb 20 04:17PM -0800

On Monday, 20 February 2023 at 20:57:05 UTC, Chris Vine wrote:
> > > be in the ASCII subset.
> > ... But toupper() also works with UTF-8.

> You are ill informed. It only works for the ASCII subset of UTF-8.

And the example given in the OP is, indeed, ASCII.

Again, why are you (irrelevantly) bringing UTF-8 into this?

Manu Raju <MR@invalid.invalid>: Feb 21 01:57AM

On 21/02/2023 00:17, Tony Oliver wrote:

> Again, why are you (irrelevantly) bringing UTF-8 into this?

The question was "is this safe", and Chris told him indirectly that
it is not safe because it doesn't work for the full set of UTF-8. The OP
might not even have thought of UTF-8, but an answer has to be given to
the question asked.

Richard Damon <Richard@Damon-Family.org>: Feb 20 09:18PM -0500

> std::string s = "hello";
> for(auto &c: s) c = toupper(c);
> std::cout << s << std::endl;

One possible issue is that the type of c will be char&, and if char is signed
and the string contains any characters with the sign bit set,
toupper will have undefined behavior.

You need to cast the value of c to unsigned char before passing it to toupper
(the call will then convert it to int).

This is a case where auto gives you problems, as if you intend to be
able to handle other versions of string built on types other than char,
you need to know the unsigned equivalent for them. The localized toupper
can handle that sort of operation (as it doesn't handle the -1 case, so
it takes a charT parameter, not an int).

As others have mentioned, it also assumes that your string is using a
single byte encoding for characters (like plain ASCII, or the old
"code-page" style text strings), as multi-byte encoding can't be handled
by this form of toupper().
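
A minimal sketch of the localized overload mentioned above, assuming the two-argument `std::toupper` from `<locale>`. The helper name `to_upper_copy` and the choice of the classic "C" locale as default are illustrative assumptions, not something from the thread.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: upper-cases any std::basic_string<CharT> with the
// two-argument std::toupper from <locale>, which takes a CharT directly.
template <class CharT>
std::basic_string<CharT> to_upper_copy(
        std::basic_string<CharT> text,
        const std::locale& loc = std::locale::classic()) {
    for (CharT& ch : text) {
        ch = std::toupper(ch, loc);  // no cast to unsigned char written here
    }
    return text;
}
```

Because the function is a template, the caller has to pass an actual string object, e.g. `to_upper_copy(std::string("hello"))` or `to_upper_copy(std::wstring(L"hello"))`. It still converts one code unit at a time, so the multi-byte limitation applies unchanged.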

Bonita Montero <Bonita.Montero@gmail.com>: Feb 21 05:36AM +0100

Am 20.02.2023 um 21:56 schrieb Chris Vine:

> You are ill informed. It only works for the ASCII subset of UTF-8.

toupper only applies to a - z and thereby works with UTF-8.

Andrey Tarasevich <andreytarasevich@hotmail.com>: Feb 20 08:38PM -0800

> std::string s = "hello";
> for(auto &c: s) c = toupper(c);
> std::cout << s << std::endl;

No, it is not safe in the general case. Functions from the <cctype> group
generally require an argument that is either non-negative (representable as
unsigned char) or `EOF`. Otherwise, the behavior is undefined.

This means that when you pass `char` values to these functions you
better be sure that your `char` is unsigned or, at least, that all
`char` values you are passing are non-negative.

If you are not sure of that, you'll be better off explicitly casting the
argument to `unsigned char`:

for(auto &c: s) c = toupper((unsigned char) c);

--
Best regards,
Andrey

James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 20 11:48PM -0500

On 2/20/23 19:14, Tony Oliver wrote:
>> case (SS or SZ) by old/traditional orthography.

> The OP specifically used an ASCII string; no mention was made of UTF-8.

> Why did you need to bring it up?

I think that the answer to the original question is that it is
definitely safe, because it contained only characters for which
toupper() is guaranteed to work (regardless of encoding). Chris was
incorrect to suggest otherwise.
However, Bonita asked "why should this not work?" - in other words, how
could there be any room for doubt? And the answer to that is indeed to
point out that, with a different string, it might not work, so it is a
legitimate question to ask if it will work with this particular string.
Note that it's unnecessary to invoke UTF-8 specifically; any encoding
that has MB_LEN_MAX > 1 can run into the same problem.
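
A small sketch of that "different string" case, assuming UTF-8 input and the default "C" locale that a program starts in. The helper name `upper_bytes` is made up for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: byte-wise upper-casing with the <cctype> toupper.
// The cast to unsigned char keeps the call defined for bytes >= 0x80.
std::string upper_bytes(std::string text) {
    for (char& c : text) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return text;
}
```

With the UTF-8 bytes of "straße" (`"stra\xC3\x9F" "e"`), the ASCII letters are upper-cased but the two bytes of ß pass through unchanged, so the result is "STRAßE" rather than "STRASSE". Nothing crashes, but the result is not a correct upper-casing.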

"Öö Tiib" <ootiib@hot.ee>: Feb 21 12:05AM -0800

On Monday, 20 February 2023 at 21:16:53 UTC+2, Chris Vine wrote:
> require between 1 and 5 bytes (code units) to encode it, upper and
> lower case representations can occupy different numbers of code
> points.

From November 2003 <https://datatracker.ietf.org/doc/html/rfc3629>
all 5- and 6-byte sequences were removed, so a UTF-8 code point now
takes at most 4 bytes. But as a "character" can be made of several code
points there are seemingly no limits. Flag of Scotland (🏴󠁧󠁢󠁳󠁣󠁴󠁿) takes 28
bytes.
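
The byte and code point counts can be checked with a short sketch. It assumes the input is valid UTF-8, and the helper name `count_code_points` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a string assumed to hold
// valid UTF-8, by counting every byte that is not a continuation byte.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        // Continuation bytes have the bit pattern 10xxxxxx.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

The flag is the sequence U+1F3F4, five tag characters (U+E0067, U+E0062, U+E0073, U+E0063, U+E0074) and U+E007F: seven code points of 4 bytes each, which gives the 28 bytes mentioned above.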

Muttley@dastardlyhq.com: Feb 21 09:14AM

On Mon, 20 Feb 2023 18:34:34 +0100
>> for(auto &c: s) c = toupper(c);
>> std::cout << s << std::endl;

>Why should this not work ?

Because I don't know where the reference is pointing to. It's been standard
with strings not to alter the contents returned by c_str(), so is this a
similar case or is it an analogue of just doing s[<index>], which is guaranteed
safe?

Muttley@dastardlyhq.com: Feb 21 09:15AM

On Tue, 21 Feb 2023 01:57:15 +0000
>it is not safe because it doesn't work in full set of UTF-8. The OP
>might not even have thought of UTF-8 but answer has to be given to
>the question asked.

I was concerned whether the code would corrupt the string or crash the
program. I couldn't give a f**k about UTF8.

Muttley@dastardlyhq.com: Feb 21 09:18AM

On Mon, 20 Feb 2023 20:38:24 -0800

>If you are not sure of that, you'll be better off explicitly casting the
>argument to `unsigned char`:

> for(auto &c: s) c = toupper((unsigned char) c);

I can't believe I needed to explain what I meant by safe. I was not talking
about character encoding FFS, I simply used toupper as an example.

Ok, is this safe:

for(auto &c: s) c = 'x';

Or will it screw up the string object or have no effect in some cases just
as altering the contents returned by the c_str() pointer can sometimes have?

Chris Vine <vine24683579@gmail.com>: Feb 21 02:19AM -0800

On Tuesday, 21 February 2023 at 04:35:06 UTC, Bonita Montero wrote:
> toupper only applies to a - z ...

That's wrong. Its behaviour is locale specific, and for example with a
locale which uses the ISO-8859-1 codeset, will work for lower case
characters outside the a-z range which have an upper case
ISO-8859-1 representation.

Paavo Helde <eesnimi@osa.pri.ee>: Feb 21 12:31PM +0200

>> the question asked.

> I was concerned whether the code would corrupt the string or crash the
> program. I couldn't give a f**k about UTF8.

We understood that. Alas, there is no need for such concerns: a
std::string owns its own memory and does not wrap any external memory.
So we are having fun by nitpicking unrelated aspects.

FYI: for wrapping there is std::string_view, but your example would not
compile with s/string/string_view/.

Bonita Montero <Bonita.Montero@gmail.com>: Feb 21 02:16PM +0100

Am 21.02.2023 um 11:19 schrieb Chris Vine:

> On Tuesday, 21 February 2023 at 04:35:06 UTC, Bonita Montero wrote:

>> toupper only applies to a - z ...

> That's wrong. Its behaviour is locale specific, ...

This ...

template< class CharT >
CharT toupper( CharT ch, const locale& loc );

... is locale-specific, not C's toupper.

Ralf Goertz <me@myprovider.invalid>: Feb 21 02:40PM +0100

Am Tue, 21 Feb 2023 14:16:23 +0100

> template< class CharT >
> CharT toupper( CharT ch, const locale& loc );

> ... is locale-specific, not C's toupper.

"man toupper" disagrees:

NAME
toupper, toupper_l — transliterate lowercase characters
to uppercase

SYNOPSIS
#include <ctype.h>

int toupper(int c);
int toupper_l(int c, locale_t locale);

DESCRIPTION
For toupper(): The functionality described on this ref-
erence page is aligned with the ISO C standard. Any con-
flict between the requirements described here and the
ISO C standard is unintentional. This volume of
POSIX.1‐2017 defers to the ISO C standard.

The toupper() and toupper_l() functions have as a domain
a type int, the value of which is representable as an
unsigned char or the value of EOF. If the argument has
any other value, the behavior is undefined.

If the argument of toupper() or toupper_l() represents a
lowercase letter, and there exists a corresponding up-
percase letter as defined by character type information
in the current locale or in the locale represented by
locale, respectively (category LC_CTYPE), the result
shall be the corresponding uppercase letter.
…

Chris Vine <vine24683579@gmail.com>: Feb 21 05:51AM -0800

On Tuesday, 21 February 2023 at 13:15:22 UTC, Bonita Montero wrote:

> template< class CharT >
> CharT toupper( CharT ch, const locale& loc );

> ... is locale-specific, not C's toupper.

No. Here is what the standard says about toupper in <cctype>:

[cctype.syn]/1: "The contents and meaning of the header <cctype> are
the same as the C standard library header <ctype.h>."

C11, 7.4.2.2/3, the toupper function: "If the argument is a character
for which islower is true and there are one or more corresponding
characters, as specified by the current locale, for which isupper is
true, the toupper function returns one of the corresponding
characters (always the same one for any given locale) ..."

I don't have a more recent version of the C standard but I doubt
it has changed.
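
A hedged sketch for trying this out. The helper name `toupper_in_locale` is made up, and which locale names are installed is system-dependent, so the function reports failure instead of assuming a name exists.

```cpp
#include <cassert>
#include <cctype>
#include <clocale>
#include <optional>
#include <string>

// Hypothetical helper: calls the <cctype> toupper under a named LC_CTYPE
// locale and then restores the previous one. Returns std::nullopt if the
// locale is not available. Not thread-safe: setlocale is process-wide.
std::optional<int> toupper_in_locale(const char* locale_name, unsigned char ch) {
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    // Copy the name: the returned buffer may be overwritten by later calls.
    const std::string saved = current ? current : "C";
    if (std::setlocale(LC_CTYPE, locale_name) == nullptr) {
        return std::nullopt;
    }
    const int result = std::toupper(ch);
    std::setlocale(LC_CTYPE, saved.c_str());
    return result;
}
```

In the "C" locale the byte 0xE9 comes back unchanged. Under an ISO-8859-1 locale, where 0xE9 is é, the quoted wording suggests the result would be 0xC9 (É); the exact locale name to pass (something like "en_US.ISO-8859-1") is an assumption that varies between systems.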

Paavo Helde <eesnimi@osa.pri.ee>: Feb 21 03:58PM +0200

21.02.2023 15:16 Bonita Montero kirjutas:

> template< class CharT >
> CharT toupper( CharT ch, const locale& loc );

> ... is locale-specific, not C's toupper.

Seriously? A major headache with C is that a lot of functions like
toupper() or sprintf() are locale-specific, but the locale is
unpredictable and uncontrollable for library code loaded in a
multithreaded process.

"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Feb 21 03:12PM +0100

> for(auto &c: s) c = 'x';

> Or will it screw up the string object or have no effect in some cases just
> as altering the contents returned by the c_str() pointer can sometimes have?

Demonstrates the need for making examples short and to the point, to not
introduce extraneous issues.

The assignment is safe.

`c` is a reference to an item in the string, within the string bounds,
i.e. it can't be a reference to the null-item that since C++11 is
effectively required past the end of the string.

Not what you're asking, but the common convention in C++ is to separate
the type specification from the variable, i.e. `auto& c` not `auto &c`.

But on the third and gripping hand, in this case I'd write `char& c`,
since there's no advantage in `auto` here and it obscures things.

- Alf

Muttley@dastardlyhq.com: Feb 21 02:53PM

On Tue, 21 Feb 2023 15:12:39 +0100
>> as altering the contents returned by the c_str() pointer can sometimes have?

>Demonstrates the need for making examples short and to the point, to not
>introduce extraneous issues.

3 lines is short and I didn't introduce any extraneous issues. Other posters
created them.

>`c` is a reference to an item in the string, within the string bounds,
>i.e. it can't be a reference to the null-item that since C++11 is
>effectively required past the end of the string.

Ok. However in the past with some implementations of std::string, modifying
the contents of the buffer returned by c_str() can have undefined consequences.
I simply wondered if this was a similar situation.

>Not what you're asking, but the common convention in C++ is to separate
>the type specification from the variable, i.e. `auto& c` not `auto &c`.

Whose common convention?

>But on the third and gripping hand, in this case I'd write `char& c`,
>since there's no advantage in `auto` here and it obscures things.

That's ironic coming from you. :)

Muttley@dastardlyhq.com: Feb 21 02:55PM

On Tue, 21 Feb 2023 15:58:21 +0200
>toupper() or sprintf() are locale-specific, but the locale is
>unpredictable and uncontrollable for library code loaded in a
>multithreaded process.

Why would multithreading make any difference? Environment variables such as
LC_CTYPE are process wide.

Andrey Tarasevich <andreytarasevich@hotmail.com>: Feb 21 08:15AM -0800

> for(auto &c: s) c = 'x';

> Or will it screw up the string object or have no effect in some cases just
> as altering the contents returned by the c_str() pointer can sometimes have?

What you do need to explain is the underlying roots/origins of your
question. For anyone familiar with the language the above question will
seem trivial, meaning that people will normally assume that you meant
something more elaborate than the above.

Yes, the above is fine.

And, BTW, the original `c_str()` issue is no longer relevant. Starting
from C++11 `c_str()` points to the same location as `data()`, i.e. it is
required to point to the same controlled sequence that you access by all
other means.

--
Best regards,
Andrey
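
A small sketch that checks these guarantees on a given implementation. The helper name `c_str_sees_changes` is made up for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: modifies a string through references, then reports
// whether c_str() and data() still expose the same, modified buffer.
bool c_str_sees_changes(std::string s) {
    for (char& c : s) {
        c = 'x';
    }
    const bool same_buffer = (s.c_str() == s.data());
    const bool terminated = (s.c_str()[s.size()] == '\0');
    // strspn returns the length of the leading run made up only of 'x'.
    const bool all_x = (std::strspn(s.c_str(), "x") == s.size());
    return same_buffer && terminated && all_x;
}
```

On a conforming C++11 or later implementation this should return true for any input, including the empty string.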

Paavo Helde <eesnimi@osa.pri.ee>: Feb 21 07:02PM +0200

>> multithreaded process.

> Why would multithreading make any difference? Environment variables such as
> LC_CTYPE are process wide.

Exactly. The locale is process wide, so whenever I want to do something
which might require a different locale than the current process-wide
locale (e.g. producing CSV/XML/JSON files, formatting program code,
etc), I cannot use any functions like toupper() or sprintf() which use
the process-wide locale. I also cannot use setlocale() because this
might disturb other threads.

There are alternative functions where one can specify the locale
explicitly, typically ending with the "_l" suffix, but these are
relative latecomers and not so universally supported. On Windows the
names tend to be different, which complicates things further.

The locale-dependent functions tend to be quite slow anyway. In C++ we
finally have some fast formatting support like std::to_chars(), but this
was a long wait.
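
A minimal sketch of locale-independent integer formatting with `std::to_chars` from `<charconv>` (C++17). The helper name `format_int` and the buffer size are illustrative choices.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper: formats an int without consulting any locale.
std::string format_int(int value) {
    char buffer[16];  // enough for a 32-bit int including the sign
    const std::to_chars_result r =
        std::to_chars(buffer, buffer + sizeof buffer, value);
    if (r.ec != std::errc{}) {
        return {};  // buffer too small (not expected for a 32-bit int)
    }
    return std::string(buffer, r.ptr);  // r.ptr is one past the last digit
}
```

Because `std::to_chars` never reads the global locale, the output is the same in every thread regardless of what `setlocale()` has been called with.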

The locale concept originates from 40 years back, when it was assumed
that all information in text format is meant for visual consumption by a
human sitting near the same machine. This has not been the case for a
long time.

Not to mention that the locale mechanisms are nowhere near up to the
task of supporting real i18n.

Muttley@dastardlyhq.com: Feb 21 05:13PM

On Tue, 21 Feb 2023 19:02:33 +0200
>etc), I cannot use any functions like toupper() or sprintf() which use
>the process-wide locale. I cannot also use setlocale() because this
>might disturb other threads.

Make your program multi process then. Problem solved.


Tuesday, February 21, 2023
