- Is this safe? - 5 Updates
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 23 01:57PM -0800

>>> zero and 255 would result in UB.
>>-1 and 255, since EOF was explicitly an allowed value.
> Not in SVR4.2, see msg <W7NJL.30343$Kqu2.1845@fx01.iad>

Yes in SVR4.2. The code in the cited article allows for a -1 argument.

    extern unsigned char __ctype[];
    [...]
    #define isalpha(c) ((__ctype + 1)[c] & (_U | _L))

Adding 1 to the array address allows for an index of -1.

(The standard requires EOF to have a negative value. This is a good
reason for it to be exactly -1, and I've never heard of an
implementation where EOF != -1.)

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */
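The offset-by-one indexing in the quoted isalpha macro can be sketched roughly as follows. This is only an illustration, not the SVR4.2 source: the names my_ctype, my_ctype_init, my_isalpha, MY_U and MY_L are hypothetical, the flag values are invented, and the sketch assumes EOF == -1 (checked at run time).

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>   /* EOF */

/* Hypothetical flag bits; the real _U and _L values are not shown in the post. */
#define MY_U 0x01    /* uppercase letter */
#define MY_L 0x02    /* lowercase letter */

/* Slot 0 is reserved for EOF; slots 1..UCHAR_MAX+1 hold values 0..UCHAR_MAX. */
static unsigned char my_ctype[1 + UCHAR_MAX + 1];

/* Mark the 26 uppercase and 26 lowercase Latin letters. */
static void my_ctype_init(void)
{
    const char *p;

    assert(EOF == -1);   /* assumption of this sketch, not a standard guarantee */
    for (p = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; *p != '\0'; p++)
        my_ctype[1 + (unsigned char)*p] |= MY_U;
    for (p = "abcdefghijklmnopqrstuvwxyz"; *p != '\0'; p++)
        my_ctype[1 + (unsigned char)*p] |= MY_L;
}

/* Valid for c == EOF or 0 <= c <= UCHAR_MAX; anything else reads out of bounds. */
static int my_isalpha(int c)
{
    return (my_ctype + 1)[c] & (MY_U | MY_L);
}
```

After calling my_ctype_init(), my_isalpha(EOF) reads slot 0 and yields 0, while an argument such as -2 would index before the start of the array, which is the undefined behavior under discussion.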
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 23 02:19PM -0800

> or greater than UCHAR_MAX would make it less efficient, and would only benefit
> code that has undefined behavior. Many implementations provide such safety only in
> a special debugging mode.

An implementation could make _ctype_table cover values from SCHAR_MIN to
UCHAR_MAX and use an SCHAR_MIN offset when indexing it. That would make
it well defined for any value within the range of signed char, char, or
unsigned char.

No implementations are *required* to do this, but any that do will avoid
crashing when passing arbitrary char values to the is*() and to*()
functions. GNU's glibc appears to do something like this.

I'd like to see a future standard require well defined behavior for all
values from SCHAR_MIN to UCHAR_MAX. (There could be a problem treating
-1 as EOF and 255 as the letter 'ÿ'. I'm tempted to argue that the
special treatment of EOF has outlived its usefulness, but I'm not
suggesting a breaking change.)
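A table spanning SCHAR_MIN to UCHAR_MAX, as suggested above, might look something like the sketch below. It is not glibc's implementation; wide_table, wide_table_init, wide_isdigit and the flag W_DIGIT are hypothetical names, and the choice to leave -1 reserved for EOF is an assumption made for illustration.

```c
#include <assert.h>
#include <limits.h>

#define W_DIGIT 0x01   /* hypothetical flag bit */

/* One slot for every value from SCHAR_MIN through UCHAR_MAX. */
static unsigned char wide_table[UCHAR_MAX - SCHAR_MIN + 1];

static void wide_table_init(void)
{
    /* t[SCHAR_MIN] is wide_table[0], so t[c] is valid for SCHAR_MIN..UCHAR_MAX. */
    unsigned char *t = wide_table - SCHAR_MIN;
    const char *p;
    int c;

    for (p = "0123456789"; *p != '\0'; p++)
        t[(unsigned char)*p] |= W_DIGIT;

    /* Mirror each negative value onto its unsigned char equivalent so that a
       signed char and the same byte read as unsigned char classify alike.
       -1 is skipped here and stays reserved for EOF, which is exactly the
       -1 versus 255 conflict mentioned in the post. */
    for (c = SCHAR_MIN; c < -1; c++)
        t[c] = t[(unsigned char)c];
}

/* Defined for every value in SCHAR_MIN..UCHAR_MAX. */
static int wide_isdigit(int c)
{
    assert(c >= SCHAR_MIN && c <= UCHAR_MAX);
    return (wide_table - SCHAR_MIN)[c] & W_DIGIT;
}
```

The cost is SCHAR_MAX + 1 extra table slots; in exchange, passing a plain char with a negative value indexes valid storage instead of memory before the table.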
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 23 02:31PM -0800

>> rubbish.

> Nope. "Returning rubbish" would be an example of _unspecified
> behavior_. Undefined is a wholly different thing.

Crashing, returning rubbish, returning a sensible result, and making
demons fly out of your nose are *all* permitted consequences of
undefined behavior.

Unspecified behavior is limited to two or more possibilities that are
always (C) or usually (C++) specified by the standard.
Implementation-defined behavior is unspecified behavior where the
implementation must document its choice. The standard never includes
nasal demons as one of the possibilities.
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 23 02:41PM -0800

> A number of languages do add their own characters for the digits,
> besides the basic arabic numerals included in the standard character
> set.

5.2.1 (I'm using the n1570 C standard draft) does not say that
characters outside the basic character set can be digits. In
enumerating the characters that are included in the basic source and
execution character sets, it says:

    the 10 decimal *digits*
    0 1 2 3 4 5 6 7 8 9

The word "digits" is in italics, so this is the definition of the word.
If I'm reading it correctly, a character like '²' (superscript two)
might be in the extended character set, but it cannot be a "digit" in
the meaning used in the standard.

Similarly:

    A *letter* is an uppercase letter or a lowercase letter as defined
    above; in this International Standard the term does not include
    other characters that are letters in other alphabets.

where the "above" includes a list of the 26 uppercase and 26 lowercase
Latin letters.

The isupper and islower functions can return a true result either for a
*letter* or for other locale-specific characters. isdigit() is not
locale-specific; it tests only for "any decimal-digit character (as
defined in 5.2.1)".
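Because isdigit() is limited to the ten basic digits, and 5.2.1 also requires their values to be consecutive, a plain range test should agree with it for every unsigned char value. The sketch below illustrates that; is_basic_digit and check_against_isdigit are hypothetical helper names, not anything from the thread.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* True only for the ten decimal digits of the basic character set.
   Unlike isdigit(), this is safe for any int argument. */
static int is_basic_digit(int c)
{
    return c >= '0' && c <= '9';
}

/* Compare the range test with isdigit() over its whole valid domain. */
static void check_against_isdigit(void)
{
    int c;

    for (c = 0; c <= UCHAR_MAX; c++)
        assert((isdigit(c) != 0) == is_basic_digit(c));
}
```

A value such as 0xB2 (superscript two in Latin-1) fails both tests, consistent with the reading of 5.2.1 given above.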
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 23 02:48PM -0800

scott@slp53.sl.home (Scott Lurndal) writes:
[...]
> there was no need to ever pass the value assigned to the EOF macro.

> if (isascii(c) && isdigit(c))

> is using the API in the manner in which it was designed.

Perhaps, but isascii() was never included in the C or C++ standard
(neither of which excludes EBCDIC or other character sets).

The is*() and to*() functions can safely handle the value returned by
getchar(), which is an int either in the range of unsigned char or equal
to EOF. They cannot safely handle arbitrary values in a string.

The undefined behavior for negative values other than EOF is clearly
stated in the standard, so any program that fails because of it is a
buggy program -- but I suggest that it's also a misfeature, and arguably
a bug, in the standard itself.

I wouldn't mind seeing a future standard require plain char to be
unsigned. I wonder if there are any strong arguments against that.
(Yes, it would require some work for compiler and library implementers.)
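Under the current rules, the usual way to apply is*() to the characters of a string is to convert each one to unsigned char first. A minimal sketch, with count_digits as a hypothetical helper name:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count the decimal digits in a NUL-terminated string. */
static size_t count_digits(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Where plain char is signed, *s may be negative. Converting to
           unsigned char puts the argument in the range isdigit() requires. */
        if (isdigit((unsigned char)*s))
            n++;
    }
    return n;
}
```

Without the cast, a byte such as 0xFF in the string could reach isdigit() as a negative value on implementations where plain char is signed, which is the undefined case described above.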