- single object - 2 Updates
- Is this safe? - 23 Updates
Armando di Matteo <armando.dimatteo87@gmail.com>: Feb 21 09:13AM -0800 Kunal Goswami wrote: > How can we assure that only one object is created for a class. And when second do not happen. > Note - don't use counter . Would a bool count as a counter? e.g.

    #include <cassert>

    class MyClass {
        static bool instantiated;  // declaration only; defined after the class
    public:
        MyClass() { assert(!instantiated); instantiated = true; }
        ~MyClass() { instantiated = false; }
    };
    bool MyClass::instantiated = false;

would that be okay? |
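A variant of the bool-flag idea, offered only as a sketch: `assert` is compiled out when `NDEBUG` is defined, so this version throws instead. The class name `SingleInstance` and the `exists()` helper are hypothetical, and the flag is not thread-safe.

```cpp
#include <stdexcept>

// Hypothetical class: at most one live object at a time, tracked by a bool.
class SingleInstance {
public:
    SingleInstance() {
        // Unlike assert(), this check also runs in release builds.
        if (alive) throw std::logic_error("SingleInstance already exists");
        alive = true;
    }
    // Only runs for fully constructed objects, so a failed second
    // construction does not reset the flag.
    ~SingleInstance() { alive = false; }

    // Copies would bypass the constructor check, so forbid them.
    SingleInstance(const SingleInstance&) = delete;
    SingleInstance& operator=(const SingleInstance&) = delete;

    static bool exists() { return alive; }

private:
    static bool alive;
};

bool SingleInstance::alive = false;
```

Constructing a second object while the first is alive throws `std::logic_error`; once the first is destroyed, a new one can be created again.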
Muttley@dastardlyhq.com: Feb 21 05:14PM On Tue, 21 Feb 2023 09:13:14 -0800 (PST) > ~MyClass() { instantiated = false; } >} >would that be okay? Don't help the kids with their coursework. |
Tony Oliver <guinness.tony@gmail.com>: Feb 20 04:14PM -0800 On Monday, 20 February 2023 at 19:16:53 UTC, Chris Vine wrote: > points. Since you are in Germany, one example is the esszet (ß), > which is one code point in lower case but two code points in upper > case (SS or SZ) by old/traditional orthography. The OP specifically used an ASCII string; no mention was made of UTF-8. Why did you need to bring it up? |
Tony Oliver <guinness.tony@gmail.com>: Feb 20 04:17PM -0800 On Monday, 20 February 2023 at 20:57:05 UTC, Chris Vine wrote: > > > be in the ASCII subset. > > ... But toupper() also works with UTF-8. > You are ill informed. It only works for the ASCII subset of UTF-8. And the example given in the OP is, indeed, ASCII. Again, why are you (irrelevantly) bringing UTF-8 into this? |
Manu Raju <MR@invalid.invalid>: Feb 21 01:57AM On 21/02/2023 00:17, Tony Oliver wrote: > Again, why are you (irrelevantly) bringing UTF-8 into this? The question was about "is this safe" and Chris told him indirectly that it is not safe because it doesn't work in full set of UTF-8. The OP might not even have thought of UTF-8 but answer has to be given to the question asked. |
Richard Damon <Richard@Damon-Family.org>: Feb 20 09:18PM -0500 > std::string s = "hello"; > for(auto &c: s) c = toupper(c); > std::cout << s << std::endl; One possible issue is that the type of c will be char&, and if char is signed and the string contains any characters with the sign bit set, toupper will have undefined behavior. You need to cast the value c to unsigned char before passing it to toupper (which will then convert it to int). This is a case where auto gives you problems: if you intend to handle other versions of string built on types other than char, you need to know the unsigned equivalent for them. The localized toupper can handle that sort of operation (as it doesn't handle the -1 case, it takes a charT parameter, not an int). As others have mentioned, it also assumes that your string is using a single byte encoding for characters (like plain ASCII, or the old "code-page" style text strings), as multi-byte encoding can't be handled by this form of toupper(). |
Bonita Montero <Bonita.Montero@gmail.com>: Feb 21 05:36AM +0100 Am 20.02.2023 um 21:56 schrieb Chris Vine: > You are ill informed. It only works for the ASCII subset of UTF-8. toupper only applies to a - z and thereby works with UTF-8. |
Andrey Tarasevich <andreytarasevich@hotmail.com>: Feb 20 08:38PM -0800 > std::string s = "hello"; > for(auto &c: s) c = toupper(c); > std::cout << s << std::endl; No, it is not safe in the general case. Functions from the <cctype> group generally require either a non-negative argument or `EOF`. Otherwise, the behavior is undefined. This means that when you pass `char` values to these functions you had better be sure that your `char` is unsigned or, at least, that all `char` values you are passing are non-negative. If you are not sure of that, you'll be better off explicitly casting the argument to `unsigned char`: for(auto &c: s) c = toupper((unsigned char) c); -- Best regards, Andrey |
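The cast described above could be wrapped up roughly like this. It is a sketch only; the function name `to_upper_bytes` is made up for illustration, and it still works byte by byte, so it is only meaningful for single-byte encodings.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: upper-cases each byte using the current C locale.
std::string to_upper_bytes(std::string s) {
    for (char& c : s) {
        // Convert to unsigned char first so a negative char value is never
        // passed to std::toupper (that would be undefined behavior).
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}
```

With the default "C" locale, `to_upper_bytes("hello")` gives `"HELLO"` and non-letters are left alone.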
James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 20 11:48PM -0500 On 2/20/23 19:14, Tony Oliver wrote: >> case (SS or SZ) by old/traditional orthography. > The OP specifically used an ASCII string; no mention was made of UTF-8. > Why did you need to bring it up? I think that the answer to the original question is that it is definitely safe, because it contained only characters for which toupper() is guaranteed to work (regardless of encoding). Chris was incorrect to suggest otherwise. However, Bonita asked "why should this not work?" - in other words, how could there be any room for doubt? And the answer to that is indeed to point out that, with a different string, it might not work, so it is a legitimate question to ask if it will work with this particular string. Note that it's unnecessary to invoke UTF-8 specifically; any encoding that has MB_LEN_MAX > 1 can run into the same problem. |
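To make the multi-byte point concrete, here is a small sketch of a UTF-8 encoder. The function name `append_utf8` is hypothetical and the input is assumed to be a valid code point (no surrogates, at most U+10FFFF). It shows that ß (U+00DF) becomes two bytes, neither of which a per-byte toupper() can map to anything useful.

```cpp
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one code point to out.
// Assumes cp is a valid Unicode scalar value.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {                      // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {              // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {            // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                              // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

Encoding U+00DF this way yields the two bytes 0xC3 0x9F, while 'a' stays a single byte.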
"Öö Tiib" <ootiib@hot.ee>: Feb 21 12:05AM -0800 On Monday, 20 February 2023 at 21:16:53 UTC+2, Chris Vine wrote: > require between 1 and 5 bytes (code units) to encode it, upper and > lower case representations can occupy different numbers of code > points. From November 2003 <https://datatracker.ietf.org/doc/html/rfc3629> all 5 and 6 byte sequences were removed and so UTF-8 code point is now up to 4 bytes. But as a "character" can be made of several code points there are seemingly no limits. Flag of Scotland (🏴) takes 28 bytes. |
Muttley@dastardlyhq.com: Feb 21 09:14AM On Mon, 20 Feb 2023 18:34:34 +0100 >> for(auto &c: s) c = toupper(c); >> std::cout << s << std::endl; >Why should this not work ? Because I don't know where the reference is pointing to. It's been standard with strings not to alter the contents returned by c_str() so is this a similar case or is it an analogue of just doing s[<index>] which is guaranteed safe? |
Muttley@dastardlyhq.com: Feb 21 09:15AM On Tue, 21 Feb 2023 01:57:15 +0000 >it is not safe because it doesn't work in full set of UTF-8. The OP >might not even have thought of UTF-8 but answer has to be given to >the question asked. I was concerned whether the code would corrupt the string or crash the program. I couldn't give a f**k about UTF8. |
Muttley@dastardlyhq.com: Feb 21 09:18AM On Mon, 20 Feb 2023 20:38:24 -0800 >If you are not sure of that, you'll be better off explicitly casting the >argument to `unsigned char`: > for(auto &c: s) c = toupper((unsigned char) c); I can't believe I needed to explain what I meant by safe. I was not talking about character encoding FFS, I simply used toupper as an example. Ok, is this safe: for(auto &c: s) c = 'x'; Or will it screw up the string object or have no effect in some cases just as altering the contents returned by the c_str() pointer can sometimes have? |
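For what it's worth, the loop in question can be checked directly. This is just a sketch with made-up helper names (`overwrite_with_x`, `terminator_intact`): the range-for visits exactly the `size()` characters of the string, so the length and the terminating null seen through `c_str()` are untouched.

```cpp
#include <string>

// Hypothetical helper: the loop from the post, applied to a copy.
std::string overwrite_with_x(std::string s) {
    for (char& c : s) c = 'x';   // c refers to s[0] .. s[size()-1] only
    return s;
}

// Hypothetical helper: checks the null terminator exposed by c_str().
bool terminator_intact(const std::string& s) {
    return s.c_str()[s.size()] == '\0';
}
```

Calling `overwrite_with_x("hello")` gives a 5-character string of x's whose terminator is still in place.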
Chris Vine <vine24683579@gmail.com>: Feb 21 02:19AM -0800 On Tuesday, 21 February 2023 at 04:35:06 UTC, Bonita Montero wrote: > toupper only applies to a - z ... That's wrong. Its behaviour is locale specific, and for example with a locale which uses the ISO-8859-1 codeset, will work for lower case characters outside the a-z range which have an upper case ISO-8859-1 representation. |
Paavo Helde <eesnimi@osa.pri.ee>: Feb 21 12:31PM +0200 >> the question asked. > I was concerned whether the code would corrupt the string or crash the > program. I couldn't give a f**k about UTF8. We understood that. Alas, there is no need for such concerns, a std::string owns its own memory and does not wrap any external memory. So we are having fun by nitpicking unrelated aspects. FYI: for wrapping there is std::string_view, but your example would not compile with s/string/string_view/. |
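A rough illustration of the ownership point, assuming C++17 for `std::string_view`; the function name `copy_is_independent` is invented for the example. A `std::string` copies the characters into its own storage, while a `std::string_view` only refers to someone else's and only hands out `const char&`, which is why the loop would not compile with it.

```cpp
#include <iterator>
#include <string>
#include <string_view>
#include <type_traits>

// std::string iterators give writable references...
static_assert(std::is_same_v<
                  std::iterator_traits<std::string::iterator>::reference, char&>,
              "string elements are writable through iterators");
// ...whereas std::string_view iterators only give read-only ones.
static_assert(std::is_same_v<
                  std::iterator_traits<std::string_view::iterator>::reference,
                  const char&>,
              "string_view elements are read-only");

// Hypothetical helper: modifying the string leaves the source array alone.
bool copy_is_independent() {
    const char source[] = "hello";
    std::string owned = source;       // copies the characters
    std::string_view view = source;   // refers to source, owns nothing
    for (char& c : owned) c = 'x';
    return owned == "xxxxx" && view == "hello";
}
```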
Bonita Montero <Bonita.Montero@gmail.com>: Feb 21 02:16PM +0100 Am 21.02.2023 um 11:19 schrieb Chris Vine: > On Tuesday, 21 February 2023 at 04:35:06 UTC, Bonita Montero wrote: >> toupper only applies to a - z ... > That's wrong. Its behaviour is locale specific, ... This ... template< class CharT > CharT toupper( CharT ch, const locale& loc ); ... is locale-specific, not C's toupper. |
Ralf Goertz <me@myprovider.invalid>: Feb 21 02:40PM +0100 Am Tue, 21 Feb 2023 14:16:23 +0100 > template< class CharT > > CharT toupper( CharT ch, const locale& loc ); > ... is locale-specific, not C's toupper. "man toupper" disagrees: NAME toupper, toupper_l - transliterate lowercase characters to uppercase SYNOPSIS #include <ctype.h> int toupper(int c); int toupper_l(int c, locale_t locale); DESCRIPTION For toupper(): The functionality described on this reference page is aligned with the ISO C standard. Any conflict between the requirements described here and the ISO C standard is unintentional. This volume of POSIX.1‐2017 defers to the ISO C standard. The toupper() and toupper_l() functions have as a domain a type int, the value of which is representable as an unsigned char or the value of EOF. If the argument has any other value, the behavior is undefined. If the argument of toupper() or toupper_l() represents a lowercase letter, and there exists a corresponding uppercase letter as defined by character type information in the current locale or in the locale represented by locale, respectively (category LC_CTYPE), the result shall be the corresponding uppercase letter. … |
Chris Vine <vine24683579@gmail.com>: Feb 21 05:51AM -0800 On Tuesday, 21 February 2023 at 13:15:22 UTC, Bonita Montero wrote: > template< class CharT > > CharT toupper( CharT ch, const locale& loc ); > ... is locale-specific, not C's toupper. No. Here is what the standard says about toupper in <cctype>: [cctype.syn]/1: "The contents and meaning of the header <cctype> are the same as the C standard library header <ctype.h>." C11, 7.4.2.2/3, the toupper function: "If the argument is a character for which islower is true and there are one or more corresponding characters, as specified by the current locale, for which isupper is true, the toupper function returns one of the corresponding characters (always the same one for any given locale) ..." I don't have a more recent version of the C standard but I doubt it has changed. |
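Since both forms depend on a locale, one way to pin the behaviour down is to pass the locale explicitly. A sketch, with the made-up name `to_upper_classic`, using the `<locale>` overload quoted above together with `std::locale::classic()` (the "C" locale), so the result does not depend on whatever the process-wide locale happens to be:

```cpp
#include <locale>
#include <string>

// Hypothetical helper: upper-cases using the classic "C" locale explicitly,
// independent of the global C or C++ locale.
std::string to_upper_classic(std::string s) {
    const std::locale& c_locale = std::locale::classic();
    for (char& c : s) {
        // Two-argument overload from <locale>; takes char, not int,
        // so no unsigned char cast is needed here.
        c = std::toupper(c, c_locale);
    }
    return s;
}
```

This still operates on one `char` at a time, so the earlier caveats about multi-byte encodings apply unchanged.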
Paavo Helde <eesnimi@osa.pri.ee>: Feb 21 03:58PM +0200 21.02.2023 15:16 Bonita Montero kirjutas: > template< class CharT > > CharT toupper( CharT ch, const locale& loc ); > ... is locale-specific, not C's toupper. Seriously? A major headache with C is that a lot of functions like toupper() or sprintf() are locale-specific, but the locale is unpredictable and uncontrollable for library code loaded in a multithreaded process. |
"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Feb 21 03:12PM +0100 > for(auto &c: s) c = 'x'; > Or will it screw up the string object or have no effect in some cases just > as altering the contents returned by the c_str() pointer can sometimes have? Demonstrates the need for making examples short and to the point, to not introduce extraneous issues. The assignment is safe. `c` is a reference to an item in the string, within the string bounds, i.e. it can't be a reference to the null-item that since C++11 is effectively required past the end of the string. Not what you're asking, but the common convention in C++ is to separate the type specification from the variable, i.e. `auto& c` not `auto &c`. But on the third and gripping hand, in this case I'd write `char& c`, since there's no advantage in `auto` here and it obscures things. - Alf |
Muttley@dastardlyhq.com: Feb 21 02:53PM On Tue, 21 Feb 2023 15:12:39 +0100 >> as altering the contents returned by the c_str() pointer can sometimes have? >Demonstrates the need for making examples short and to the point, to not >introduce extraneous issues. 3 lines is short and I didn't introduce any extraneous issues. Other posters created them. >`c` is a reference to an item in the string, within the string bounds, >i.e. it can't be a reference to the null-item that since C++11 is >effectively required past the end of the string. Ok. However in the past with some implementations of std::string, modifying the contents of the buffer returned by c_str() can have undefined consequences. I simply wondered if this was a similar situation. >Not what you're asking, but the common convention in C++ is to separate >the type specification from the variable, i.e. `auto& c` not `auto &c`. Whose common convention? >But on the third and gripping hand, in this case I'd write `char& c`, >since there's no advantage in `auto` here and it obscures things. That's ironic coming from you. :) |
Muttley@dastardlyhq.com: Feb 21 02:55PM On Tue, 21 Feb 2023 15:58:21 +0200 >toupper() or sprintf() are locale-specific, but the locale is >unpredictable and uncontrollable for library code loaded in a >multithreaded process. Why would multithreading make any difference? Environment variables such as LC_CTYPE are process wide. |
Andrey Tarasevich <andreytarasevich@hotmail.com>: Feb 21 08:15AM -0800 > for(auto &c: s) c = 'x'; > Or will it screw up the string object or have no effect in some cases just > as altering the contents returned by the c_str() pointer can sometimes have? What you do need to explain is the underlying roots/origins of your question. For anyone familiar with the language the above question will seem trivial, meaning that people will normally assume that you meant something more elaborate than the above. Yes, the above is fine. And, BTW, the original `c_str()` issue is no longer relevant. Starting from C++11 `c_str()` points to the same location as `data()`, i.e. it is required to point to the same controlled sequence that you access by all other means. -- Best regards, Andrey |
Paavo Helde <eesnimi@osa.pri.ee>: Feb 21 07:02PM +0200 >> multithreaded process. > Why would multithreading make any difference? Environment variables such as > LC_CTYPE are process wide. Exactly. The locale is process wide, so whenever I want to do something which might require a different locale than the current process-wide locale (e.g. producing CSV/XML/JSON files, formatting program code, etc), I cannot use any functions like toupper() or sprintf() which use the process-wide locale. I also cannot use setlocale() because this might disturb other threads. There are alternative functions where one can specify the locale explicitly, typically ending with the "_l" suffix, but these are relative latecomers and not so universally supported. On Windows the names tend to be different, which complicates things further. The locale-dependent functions tend to be quite slow anyway. In C++ we finally have some fast formatting support like std::to_chars(), but this was a long wait. The locale concept originates from 40 years back, when it was assumed that all information in text format is meant for visual consumption by a human sitting near the same machine. This has not been the case for a long time. Not to mention that the locale mechanisms are nowhere near up to the task of supporting real i18n. |
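As an illustration of the locale-independent route, here is a sketch using `std::to_chars` from C++17's `<charconv>`. The wrapper name `format_int` is made up; only the integer overload is shown, since floating-point support arrived later in some standard libraries.

```cpp
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper: formats an int without consulting any locale.
std::string format_int(int value) {
    char buf[16];  // enough for a 32-bit int including the sign
    const auto result = std::to_chars(buf, buf + sizeof buf, value);
    if (result.ec != std::errc{}) {
        return {};  // buffer too small; not expected for int with 16 chars
    }
    return std::string(buf, result.ptr);
}
```

Because `std::to_chars` never looks at the global locale, the output is the same whatever `setlocale()` was called with elsewhere in the process.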
Muttley@dastardlyhq.com: Feb 21 05:13PM On Tue, 21 Feb 2023 19:02:33 +0200 >etc), I cannot use any functions like toupper() or sprintf() which use >the process-wide locale. I cannot also use setlocale() because this >might disturb other threads. Make your program multi process then. Problem solved. |