soft and program: Digest for comp.lang.c++@googlegroups.com

Is this safe? - 5 Updates

Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 23 01:57PM -0800

>>> zero and 255 would result in UB.

>>-1 and 255, since EOF was explicitly an allowed value.

> Not in SVR4.2, see msg <W7NJL.30343$Kqu2.1845@fx01.iad>

Yes in SVR4.2. The code in the cited article allows for a -1 argument.

extern unsigned char __ctype[];
[...]
#define isalpha(c) ((__ctype + 1)[c] & (_U | _L))

Adding 1 to the array address allows for an index of -1.

(The standard requires EOF to have a negative value. The table-offset
trick above is a good reason for it to be exactly -1, and I've never
heard of an implementation where EOF != -1.)
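
The offset trick can be sketched in a few lines of C++. This is only an illustration, not the SVR4.2 source: the names kUpper, kLower, CtypeTable, ctype_table and my_isalpha are hypothetical, the letter ranges assume an ASCII-like character set, and the static_assert records the assumption that EOF is -1.

```cpp
#include <climits>
#include <cstdio>   // EOF

// Hypothetical flag bits; not the real _U and _L values.
constexpr unsigned char kUpper = 0x01;
constexpr unsigned char kLower = 0x02;

// Assumption of this sketch: EOF is exactly -1.
static_assert(EOF == -1, "this sketch assumes EOF == -1");

// Slot 0 is reserved for index -1 (EOF); slots 1..UCHAR_MAX+1 hold the
// flags for character codes 0..UCHAR_MAX.
struct CtypeTable {
    unsigned char flags[UCHAR_MAX + 2] = {};
    CtypeTable() {
        // Assumes 'A'..'Z' and 'a'..'z' are contiguous (true for ASCII).
        for (int c = 'A'; c <= 'Z'; ++c) flags[c + 1] |= kUpper;
        for (int c = 'a'; c <= 'z'; ++c) flags[c + 1] |= kLower;
    }
};

inline const unsigned char* ctype_table() {
    static const CtypeTable table;
    return table.flags;
}

// In bounds for c == -1 and for 0 <= c <= UCHAR_MAX. Any other argument
// indexes outside the array, which is undefined behavior.
inline bool my_isalpha(int c) {
    return ((ctype_table() + 1)[c] & (kUpper | kLower)) != 0;
}
```

With this layout, my_isalpha(EOF) reads slot 0 and yields false, while my_isalpha(-2) would read before the start of the array.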

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */

Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 23 02:19PM -0800

> or greater than UCHAR_MAX would make it less efficient, and would only benefit
> code that has undefined behavior. Many implementations provide such safety only in
> a special debugging mode.

An implementation could make _ctype_table cover values from SCHAR_MIN to
UCHAR_MAX and use an SCHAR_MIN offset when indexing it. That would make
it well defined for any value within the range of signed char, char, or
unsigned char. No implementations are *required* to do this, but any
that do will avoid crashing when passing arbitrary char values to the
is*() and to*() functions.
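
A minimal sketch of such a table, under stated assumptions: the names kDigit, WideCtypeTable and wide_isdigit are hypothetical, and only the digit flag is populated. It is not taken from any real library.

```cpp
#include <climits>

constexpr unsigned char kDigit = 0x04;  // hypothetical flag bit

// One slot for every value in [SCHAR_MIN, UCHAR_MAX].
struct WideCtypeTable {
    unsigned char flags[UCHAR_MAX - SCHAR_MIN + 1] = {};
    WideCtypeTable() {
        // '0'..'9' are guaranteed to have consecutive values.
        for (int c = '0'; c <= '9'; ++c) flags[c - SCHAR_MIN] |= kDigit;
        // A fuller table would also decide what the negative slots mean,
        // for example by mirroring the flags of the matching unsigned
        // char value. Here they are simply left as zero.
    }
};

// In bounds for every c in [SCHAR_MIN, UCHAR_MAX], so a plain char can be
// passed without a cast even when char is signed.
inline bool wide_isdigit(int c) {
    static const WideCtypeTable table;
    return (table.flags[c - SCHAR_MIN] & kDigit) != 0;
}
```

The cost is a somewhat larger table (384 entries instead of 257 when char has 8 bits) and one subtraction per lookup, which a real implementation can fold into a pre-offset pointer.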

GNU's glibc appears to do something like this.

I'd like to see a future standard require well defined behavior for all
values from SCHAR_MIN to UCHAR_MAX.

(There could be a problem treating -1 as EOF and 255 as the letter 'ÿ'.
I'm tempted to argue that the special treatment of EOF has outlived its
usefulness, but I'm not suggesting a breaking change.)

Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 23 02:31PM -0800

>> rubbish.

> Nope. "Returning rubbish" would be an example of _unspecified
> behavior_. Undefined is a wholly different thing.

Crashing, returning rubbish, returning a sensible result, and
making demons fly out of your nose are *all* permitted consequences
of undefined behavior.

Unspecified behavior is limited to two or more possibilities
that are always (C) or usually (C++) specified by the standard.
Implementation-defined behavior is unspecified behavior where
the implementation must document its choice. The standard never
includes nasal demons as one of the possibilities.
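
To make the distinction concrete, here is a small sketch with one example of each kind. The function names are made up for illustration. The signedness of plain char is implementation-defined, and the order in which function arguments are evaluated is unspecified.

```cpp
#include <climits>
#include <string>

// Implementation-defined: whether plain char is signed. The implementation
// must document its choice, and CHAR_MIN reflects it.
inline bool plain_char_is_signed() { return CHAR_MIN < 0; }

// Helper that records the order in which it is called.
inline int note(std::string& log, char tag) {
    log += tag;
    return 0;
}

inline int add(int a, int b) { return a + b; }

// Unspecified: the two arguments may be evaluated in either order, and the
// implementation need not document which one it picks.
inline std::string argument_order() {
    std::string log;
    (void)add(note(log, 'a'), note(log, 'b'));
    return log;  // either "ab" or "ba"; both are conforming
}
```

In both cases the set of possible outcomes is bounded, which is what separates them from undefined behavior.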

Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 23 02:41PM -0800

> A number of languages do add their own characters for the digits,
> besides the basic Arabic numerals included in the standard character
> set.

5.2.1 (I'm using the n1570 C standard draft) does not say that
characters outside the basic character set can be digits. In
enumerating the characters that are included in the basic source and
execution character sets, it says:

the 10 decimal *digits*

0 1 2 3 4 5 6 7 8 9

The word "digits" is in italics, so this is the definition of the word.
If I'm reading it correctly, a character like '²' (superscript two)
might be in the extended character set, but it cannot be a "digit" in
the meaning used in the standard.

Similarly:

A *letter* is an uppercase letter or a lowercase letter as defined
above; in this International Standard the term does not include
other characters that are letters in other alphabets.

where the "above" includes a list of the 26 uppercase and 26 lowercase
Latin letters.

The isupper and islower functions can return a true result either for a
*letter* or for other locale-specific characters. isdigit() is not
locale-specific; it tests only for "any decimal-digit character (as
defined in 5.2.1)".
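
Because the standard guarantees that the ten decimal digits have consecutive values, the same test can be written as a range check. The sketch below uses hypothetical helper names and shows both forms; the second one casts to unsigned char before calling std::isdigit.

```cpp
#include <cctype>

// The ten decimal digits are consecutive in every conforming character
// set, so a plain range check is portable.
constexpr bool is_decimal_digit(char c) {
    return c >= '0' && c <= '9';
}

// The same test through <cctype>. The cast keeps the argument within the
// range of unsigned char even when plain char is signed.
inline bool is_decimal_digit_ctype(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

static_assert(is_decimal_digit('0') && is_decimal_digit('9'),
              "both ends of the digit range are digits");
static_assert(!is_decimal_digit('a'), "letters are not digits");
```

A character such as '²' from an extended character set falls outside the '0'..'9' range, so is_decimal_digit rejects it.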

Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 23 02:48PM -0800

scott@slp53.sl.home (Scott Lurndal) writes:
[...]
> there was no need to ever pass the value assigned to the EOF macro.

> if (isascii(c) && isdigit(c))

> is using the API in the manner in which it was designed.

Perhaps, but isascii() was never included in the C or C++ standard
(neither of which excludes EBCDIC or other character sets).

The is*() and to*() functions can safely handle the value returned
by getchar(), which is an int either in the range of unsigned char
or equal to EOF. They cannot safely handle arbitrary values in
a string.
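
The usual workaround for string data is to convert each char to unsigned char before the call. A minimal sketch, with hypothetical function names:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Uppercase a copy of s. Each char is converted to unsigned char first, so
// the argument to std::toupper is always within [0, UCHAR_MAX].
inline std::string to_upper_copy(std::string s) {
    for (char& ch : s) {
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    }
    return s;
}

// Count alphabetic characters, using the same cast for std::isalpha.
inline std::size_t count_alpha(const std::string& s) {
    std::size_t n = 0;
    for (char ch : s) {
        if (std::isalpha(static_cast<unsigned char>(ch))) ++n;
    }
    return n;
}
```

Without the cast, a byte such as '\xFF' in a string becomes a negative int on implementations where plain char is signed, and passing it to std::toupper or std::isalpha is undefined behavior unless the value happens to equal EOF.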

The undefined behavior for negative values other than EOF is clearly
stated in the standard, so any program that fails because of it
is a buggy program -- but I suggest that it's also a misfeature,
and arguably a bug, in the standard itself.

I wouldn't mind seeing a future standard require plain char to be
unsigned. I wonder if there are any strong arguments against that.
(Yes, it would require some work for compiler and library implementers.)

Thursday, February 23, 2023