Tuesday, February 28, 2023

Digest for comp.lang.c++@googlegroups.com - 19 updates in 2 topics

Lynn McGuire <lynnmcguire5@gmail.com>: Feb 28 03:35PM -0600

"Will Carbon Replace C++?" by Manuel Rubio
https://semaphoreci.com/blog/carbon
 
"The last CppNorth 2022 was announced with Chandler Carruth scheduled to
give a keynote, where he showed the results of a new scientific
experiment. The keynote was titled: Carbon Language: An experimental
successor to C++. But why do we need a successor and where did this idea
come from?"
 
Here we go again.
 
Lynn
Muttley@dastardlyhq.com: Feb 28 08:38AM

On Mon, 27 Feb 2023 18:31:35 +0100
>suspect you have never learned anything about serious programming, but
>merely had a few courses in C and C++ - that would explain why you don't
>understand basic concepts such as specifications.
 
I've written C and C++ in aerospace including Misra C to SIL4 standard and
even that doesn't spoon feed the developer to bounds check their array indexes
as it's a given that any half decent developer would do it where required.
 
I've probably forgotten more about writing secure software than you ever knew
but if being patronising makes you feel better then go for it.
 
>and fix the bugs, but obsessing about pointless checks for one
>particular type of potential bugs is no help to anyone. It's just an
>excuse for not thinking and not paying attention to what you are doing.
 
Whatever you say. I'm surprised you even post to a C++ group as why would you
need the safety benefits of the language when clearly you're a master of
unchecked accesses? Real programmers don't bounds check arrays or check for
valid pointer values before they use them - just do a Nike, right?
"Öö Tiib" <ootiib@hot.ee>: Feb 28 01:33AM -0800

> need the safety benefits of the language when clearly you're a master of
> unchecked accesses? Real programmers don't bounds check arrays or check for
> valid pointer values before they use them - just do a Nike, right?
 
It is not a safety benefit that out-of-bounds access is undefined behavior.
We have to use tools like valgrind, debug versions of the standard library and
sanitisers to catch bounds access errors during testing. However, if
everybody started writing bounds-check code at every place where an
array is accessed, the code would become unreadable.
Also, what does your code do when it realizes that an index that should not
be out of bounds is out of bounds? You can't fix the bug at run time.
 
In other, checked languages the programming practice is very similar
to that. They just get guaranteed exceptions or crashes always, not only
when a debug tool is being applied.
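
C++ offers both behaviours side by side, which makes the contrast concrete.
A minimal sketch, assuming a hypothetical helper named read_is_rejected:
std::vector::at() is specified to throw std::out_of_range on a bad index,
while operator[] with the same index is undefined behaviour. The catch is
there only to make the guaranteed failure observable, not as a recommended
way to handle the bug.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper for illustration only.
// Returns true if reading values.at(index) threw std::out_of_range.
bool read_is_rejected(const std::vector<int>& values, std::size_t index)
{
    try {
        (void)values.at(index);   // at() always checks the index
        return false;
    } catch (const std::out_of_range&) {
        return true;              // guaranteed outcome for a bad index
    }
    // By contrast, values[index] with a bad index is undefined behaviour:
    // it may crash, may appear to work, and is only reliably caught by
    // tools such as sanitisers or a debug standard library.
}
```

In real code the exception would normally be left to propagate, since the
bad index is a programming error that cannot be repaired at run time.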
 
For example, if someone writes a catch for ArrayIndexOutOfBoundsException
in Java then review will question it ... was it for "safety"? Let it propagate
and crash; you can't fix programming errors at run time. Does program logic rely
on that exception? Rethink that logic ... it is rather ugly.
 
It is similar in Swift. All array accesses return optional values there. Yet a guard
block to detect that the result is not nil is rare, as is passing the optional
on to others to suffer. Instead it will typically be force-unwrapped immediately
with !, which crashes the program when the optional is nil. Code will be full of
those "potential crashes", yet with hundreds of thousands of users per month
the app store reports zero crashes.
Muttley@dastardlyhq.com: Feb 28 10:17AM

On Tue, 28 Feb 2023 01:33:15 -0800 (PST)
>sanitisers to catch bounds access errors during testing. However, if
>everybody started writing bounds-check code at every place where an
>array is accessed, the code would become unreadable.
 
I'm not suggesting it's done everywhere, but the majority of bugs in C programs
are caused by out of bounds accesses either via arrays or pointers so it's
obviously not done enough, and with attitudes like Mr Brown's claiming they never
need to bother because their code is so perfect it's hardly surprising.
"Öö Tiib" <ootiib@hot.ee>: Feb 28 02:43AM -0800

> are caused by out of bounds accesses either via arrays or pointers so it's
> obviously not done enough, and with attitudes like Mr Brown's claiming they never
> need to bother because their code is so perfect it's hardly surprising.
 
Yes, a lot of bugs are due to out-of-bounds accesses. No, that does not
indicate that the code lacks checks. It indicates that unit tests are missing,
testing is inadequate and/or that the tools I mentioned are not used. So
bugs are not found and not fixed. Or let me reiterate: what does your
code do when it realizes that an index that should not be out of bounds is
out of bounds? You can't fix the bug at run time.
Malcolm McLean <malcolm.arthur.mclean@gmail.com>: Feb 28 03:19AM -0800


> I've written C and C++ in aerospace including Misra C to SIL4 standard and
> even that doesn't spoon feed the developer to bounds check their array indexes
> as it's a given that any half decent developer would do it where required.
 
It's a question of "defensive programming".
 
With the code fragment
void deleteentry(OBJECT *obj, int N, int index)
{
    if (index >= 0 && index < N) {
        /* do deletion */
    }
}
 
We suppress a crash if index is out of bounds. But that makes it harder to find the
place where the index calculation goes wrong. So it's very debatable whether to keep
the test in.
Richard Damon <Richard@Damon-Family.org>: Feb 28 08:10AM -0500

On 2/28/23 6:19 AM, Malcolm McLean wrote:
 
> We suppress a crash if index is out of bounds. But that makes it harder to find the
> place where the index calculation goes wrong. So it's very debatable whether to keep
> the test in.
 
Well, the answer to that is we should have an else clause that at
minimum logs the error, and perhaps core dumps and ends.
 
Of course, that is only appropriate if the function isn't supposed to be
called with bad indexes.
 
Sometimes the API is defined to be defensive, and "out of bounds" are
considered "legal" and to be quietly ignored, but, as you imply, that
makes it harder to find errors that are creating the bad indexes.
 
APIs that require the caller to be sure of the validity of the
parameters let us better locate where problems occur. That ALLOWS (but
doesn't require) the adding of test code to recheck and core dump on
errors, and then allows its removal for production.
 
A properly written program will never invoke undefined behavior, so the
extra checks aren't needed. But allowing them to be noisy helps get
the program into that state of being properly written.
 
APIs that require quiet ignoring just get in the way.
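
The "noisy else clause" idea can be sketched roughly as follows. This is only
an illustration under stated assumptions: delete_entry and the std::vector<int>
container are hypothetical stand-ins for the deleteentry/OBJECT fragment quoted
above, and stderr stands in for whatever log a real project uses. The abort is
compiled out when NDEBUG is defined, mirroring the "remove for production"
option.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the quoted deleteentry() fragment.
// Returns true if an entry was removed.
bool delete_entry(std::vector<int>& entries, int index)
{
    if (index >= 0 && index < static_cast<int>(entries.size())) {
        entries.erase(entries.begin() + index);
        return true;
    }

    // The caller broke the contract: be noisy instead of quietly ignoring it.
    std::fprintf(stderr, "delete_entry: index %d out of range [0, %zu)\n",
                 index, entries.size());
#ifndef NDEBUG
    std::abort();   // debug builds: stop here so the faulty caller is easy to find
#endif
    return false;   // builds with NDEBUG: logged, then ignored
}
```

Whether the release build should log, abort, or do nothing is exactly the
policy question argued in this thread; the sketch only shows that the choice
can be made in one place.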
Malcolm McLean <malcolm.arthur.mclean@gmail.com>: Feb 28 05:42AM -0800

On Tuesday, 28 February 2023 at 13:10:46 UTC, Richard Damon wrote:
> > the test in.
> Well, the answer to that is we should have an else clause that at
> minimum logs the error, and perhaps core dumps and ends.
 
Generally you can't log the error, at least in a production setting.
For example my code is ultimately released as a drawing package for artists.
It's not acceptable to log an error, because that would generate calls to
customer support which we wouldn't be able to handle effectively. That's
quite common. In any sort of consumer / end user setting, it's not usually
appropriate to involve the customer in debugging.
Richard Damon <Richard@Damon-Family.org>: Feb 28 09:39AM -0500

On 2/28/23 8:42 AM, Malcolm McLean wrote:
> customer support which we wouldn't be able to handle effectively. That's
> quite common. In any sort of consumer / end user setting, it's not usually
> appropriate to involve the customer in debugging.
 
No, you can still generally "log" the error, you just do it in a way
that the customer can't normally see. You likely want the log to auto
clear after some time, but then if the customer DOES have an issue and
contacts support, they can do something to retrieve recent problem reports.
 
Sort of like your car, there are lots of errors that might occur that
get put into the internal log that don't bring up an "idiot" light, and
at service time they can check for problems, or if a light does come on,
they can look at the log to diagnose.
Malcolm McLean <malcolm.arthur.mclean@gmail.com>: Feb 28 07:12AM -0800

On Tuesday, 28 February 2023 at 14:39:37 UTC, Richard Damon wrote:
> get put into the internal log that don't bring up an "idiot" light, and
> at service time they can check for problems, or if a light does come on,
> they can look at the log to diagnose.
 
The decision to set up that sort of log is taken by the people who design the
system, in that case probably the systems architect appointed by the automobile
manufacturer. He might then provide an interface so that third party
programmers can access the system and generate reports.
But I can't really set up a logging system myself, in the absence of a package-wide
system. We could make it work, of course, but it would be inappropriate. Though in
fact we are in the process of developing a company wide logging system to log
the usage patterns of our portions of the software. I couldn't unilaterally decide
to extend or abuse that to log programming errors, however.
In most programming, you are seldom working on a one man project where you
are taking all the decisions yourself.
David Brown <david.brown@hesbynett.no>: Feb 28 04:37PM +0100


> I've written C and C++ in aerospace including Misra C to SIL4 standard and
> even that doesn't spoon feed the developer to bounds check their array indexes
> as it's a given that any half decent developer would do it where required.
 
I've done Misra and SIL too - and some of the rules involved are
directly stupid and counter-productive. (Most are good.) For good
quality, reliable hardware and software, following Misra and/or SIL
reduces reliability. Much of it is just about arse-covering - saying
you followed these rules so that you won't get sued if something else
goes wrong.
 
Blindly following rules - such as adding bounds checks for every array
access - is just a small step above blind stupidity.
 
Write the code correctly, and you won't be adding extra checks. You'll
have checks where checks are useful because you are not sure of the
validity of the data at that point. But re-checking data that you know
is valid is simply /wrong/. It can be encouraged by lawyers, but that
doesn't stop it making the software less reliable in total.
 
(For some kinds of aerospace work you must take into account the
possibility of unpredictable errors due to radiation and single-event
upsets. Extra checks in software is, again, the wrong place to handle
these - but sometimes it is not practical to handle them in the correct
place with better hardware, ECC, redundancy, etc. Extra checks may then
be better than nothing.)
 
 
> I've probably forgotten more about writing secure software than you ever knew
> but if being patronising makes you feel better then go for it.
 
You have apparently forgotten the most important part. Make good, clear
specifications for your code, and stick to them. In serious high
reliability code, it is not unusual to /prove/ functions are correct
according to the specifications - but you will at least expect
comprehensive testing and code reviews. When the specifications for
each function are correct, and each function follows the specifications,
the complete system is correct. And it does not need extra pointless
checks - they just hide problems, muddle the code, and make
comprehensive system testing impossible.
 
So if you want a function with the spec:
 
// Function: small_power_of_ten
// Inputs: uint32_t x, x >= 0, x <= 6
// Outputs: uint32_t y = 10 ^ x
 
then you can implement it as:
 
uint32_t small_power_of_ten(uint32_t x) {
    constexpr uint32_t powers[7] = {
        1, 10, 100, 1000,
        10000, 100000, 1000000
    };
    return powers[x];
}
 
You should /not/ be adding an extra check for the array bounds in
small_power_of_ten. You should be checking that any function that calls
this one does so with a valid input. No function should ever call
"small_power_of_ten" with a value above 6, because there is no
specification for what will happen in such a case - that would clearly
be a bug in the calling code.
 
And if you think it would be more useful in your code to have a function
that accepted a wider range of inputs, then you should specify it:
 
// Function: small_power_of_ten_checked
// Inputs: uint32_t x
// Outputs: uint32_t y = 10 ^ x if x <= 6
// throws an exception if x > 6
 
That's a different function entirely.
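
One possible implementation of that second specification, as a sketch only.
The spec says "throws an exception" without naming a type, so the use of
std::out_of_range here is an assumption. The checked function does the one
test its spec requires and then delegates to the unchecked one.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Unchecked version: the precondition x <= 6 is the caller's responsibility.
std::uint32_t small_power_of_ten(std::uint32_t x)
{
    static constexpr std::uint32_t powers[7] = {
        1, 10, 100, 1000,
        10000, 100000, 1000000
    };
    return powers[x];
}

// Checked version: accepts any x and throws for x > 6, as its spec states.
// std::out_of_range is an assumed choice of exception type.
std::uint32_t small_power_of_ten_checked(std::uint32_t x)
{
    if (x > 6) {
        throw std::out_of_range("small_power_of_ten_checked: x > 6");
    }
    return small_power_of_ten(x);
}
```

Here the check is part of the specified behaviour, so it can be tested like
any other behaviour of the function.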
 
 
> need the safety benefits of the language when clearly you're a master of
> unchecked accesses? Real programmers don't bounds check arrays or check for
> valid pointer values before they use them - just do a Nike, right?
 
You really do not understand, do you?
 
In good code, there are no out-of-bounds accesses. But that is achieved
by having code generate correct (and therefore valid) indices, not by
checking them before use and trying to somehow magically have the called
function fix the bug in the calling function.
David Brown <david.brown@hesbynett.no>: Feb 28 04:45PM +0100

On 28/02/2023 14:42, Malcolm McLean wrote:
 
>>> We suppress a crash if index is out of bounds. But that makes it harder to find the
>>> place where the index calculation goes wrong. So it's very debatable whether to keep
>>> the test in.
 
Indeed. You might have extra tests during testing and debugging phases,
to help track errors. And sometimes you have no option but to release
code that you don't know for sure is correct - programmers don't always
get to decide on release plans.
 
>> Well, the answer to that is we should have an else clause that at
>> minimum logs the error, and perhaps core dumps and ends.
 
> Generally you can't log the error, at least in a production setting.
 
Generally, there is no "generally". There are endless varieties of what
might be called a "production setting", with an almost endless variety
of how to handle situations like this.
 
> customer support which we wouldn't be able to handle effectively. That's
> quite common. In any sort of consumer / end user setting, it's not usually
> appropriate to involve the customer in debugging.
 
Right, so it is better to let the customers continue to pay for and use
buggy software than to learn about and fix the problems? That might be
appropriate in some cases - and certainly I appreciate that it rarely
helps to have lots of people report the same bug to customer support.
David Brown <david.brown@hesbynett.no>: Feb 28 04:50PM +0100

On 28/02/2023 11:43, Öö Tiib wrote:
> bugs are not found and not fixed. Or let me reiterate: what does your
> code do when it realizes that an index that should not be out of bounds is
> out of bounds? You can't fix the bug at run time.
 
Exactly.
 
I have nothing against appropriate checks at the right places. I am
against thoughtless and pointless checks that generally can't be tested,
and can't do anything useful if the checks fail. The correct answer is
to make sure the calling code is developed correctly. If you can't do
that - perhaps the calling code is written by a different group - /then/
extra checks might be useful and you must specify how failures are
handled (logs, crashes, hope for the best, email the developers' boss,
etc.).
 
Thus public APIs for a high-level library can often include extra checks.
 
Low-level code should not, except when it is a useful aid to debugging.
Muttley@dastardlyhq.com: Feb 28 03:56PM

On Tue, 28 Feb 2023 16:37:57 +0100
>> as it's a given that any half decent developer would do it where required.
 
>I've done Misra and SIL too - and some of the rules involved are
>directly stupid and counter-productive. (Most are good.) For good
 
There are a few odd choices yes.
 
>reduces reliability. Much of it is just about arse-covering - saying
>you followed these rules so that you won't get sued if something else
>goes wrong.
 
Arse covering comes into it but a lot of it is very sensible.
 
>possibility of unpredictable errors due to radiation and single-event
>upsets. Extra checks in software is, again, the wrong place to handle
 
If it's a CPU register that gets zapped, software is the only practical place to
handle it.
 
>comprehensive testing and code reviews. When the specifications for
>each function are correct, and each function follows the specifications,
>the complete system is correct. And it does not need extra pointless
 
A function can follow specifications to the letter and still have hidden
bugs. And if you make the specifications so in depth as to account for every
eventuality then you end up debugging the specifications as well as the
software. Something formal methods advocates tend to forget.
 
>small_power_of_ten. You should be checking that any function that calls
>this one does so with a valid input. No function should ever call
>"small_power_of_ten" with a value above 6, because there is no
 
And no one should ever steal either so I don't need a lock on my doors, right?
 
>specification for what will happen in such a case - that would clearly
>be a bug in the calling code.
 
It's also a bug in your code in that it doesn't catch it.
 
>You really do not understand, do you?
 
I think I understand better than you. You seem to think complicated code can
be provenly correct. It can't without a huge amount of time and effort and
even then, it's only as good as your spec in the first place.
 
>by having code generate correct (and therefore valid) indices, not by
>checking them before use and trying to somehow magically have the called
>function fix the bug in the calling function.
 
face <- palm
Richard Damon <Richard@Damon-Family.org>: Feb 28 11:29AM -0500

On 2/28/23 10:12 AM, Malcolm McLean wrote:
> to extend or abuse that to log programming errors, however.
> In most programming, you are seldom working on a one man project where you
> are taking all the decisions yourself.
 
But if you are working on a project, you are not a passive slave to that
project, but can suggest improvements to the infrastructure.
 
Yes, if the improvements are rejected, then it isn't your problem.
 
I suppose that is the difference between a developer, and a coder.
"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Feb 28 05:40PM +0100

> need the safety benefits of the language when clearly you're a master of
> unchecked accesses? Real programmers don't bounds check arrays or check for
> valid pointer values before they use them - just do a Nike, right?
 
In case you're not just trolling, to get more background on the "no
precondition check" view I recommend diving into Bertrand Meyer's
classic book Object-Oriented Software Construction, in particular
chapter 11 "Design by Contract: building reliable software" and chapter
12 "When the contract is broken: exception handling".
 
With Eiffel (BM's own language) one can turn checking on or off, sort of
like `assert`s in C++.
 
The ideal BM argues for relies heavily on that and other language and
tool support that just isn't there for C++ in general.
 
However, there are reasons why C++ is still popular and BM's newer
more ideal language Eiffel never gained widespread popularity, and IMO a
main one is that C++ is very much about dealing with an imperfect real
world, including imperfect language and tool support.
 
And that includes that not everything needs to be automated: with a
little extra effort we can do without the conveniences of Eiffel, and
then (with that little extra effort) "no checking" can be practical.
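
The comparison with `assert` can be made concrete with a small sketch. It
reuses the small_power_of_ten example from earlier in the thread under a
hypothetical name, small_power_of_ten_asserted. The precondition is checked
while NDEBUG is undefined and disappears entirely when the code is compiled
with -DNDEBUG, which is roughly the on/off switch described for Eiffel.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical variant for illustration.
// Precondition: x <= 6. Checked only while NDEBUG is not defined.
std::uint32_t small_power_of_ten_asserted(std::uint32_t x)
{
    assert(x <= 6 && "precondition violated: x must be <= 6");

    static constexpr std::uint32_t powers[7] = {
        1, 10, 100, 1000,
        10000, 100000, 1000000
    };
    return powers[x];   // with NDEBUG defined this is the only work done
}
```

Unlike Eiffel's contracts, the assertion is not part of the function's
declared interface, so callers still have to learn the precondition from the
documentation.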
 
 
- Alf
 
Link: https://en.wikipedia.org/wiki/Object-Oriented_Software_Construction
David Brown <david.brown@hesbynett.no>: Feb 28 06:34PM +0100

>> upsets. Extra checks in software is, again, the wrong place to handle
 
> If it's a CPU register that gets zapped, software is the only practical place to
> handle it.
 
You are joking, right?
 
If there is a realistic possibility of cpu registers getting zapped, and
the system is vital, you use redundant hardware. You use something like
a PPC or Cortex-R device with dual lock-step processors, or some other
external redundancy checking hardware. You don't cross your fingers and
hope that the error happens to affect one register bit at one point in
time, and that other register bits are not just as likely to be hit
sooner or later. Remember, it is just as likely for the cpu register
holding the index value to get hit just /after/ your check as just
/before/ your check - or the register holding the array base could be
hit, or the program counter, or anything else. Extra software checks
can sometimes marginally improve your reliability, but only marginally -
and it is usually /extremely/ difficult to quantify.
 
>> the complete system is correct. And it does not need extra pointless
 
> A function can follow specifications to the letter and still have hidden
> bugs.
 
The /specification/ can have bugs or be inappropriate in some way. The
/hardware/ can have bugs. But if a piece of code gives the specified
output and effects for the specified inputs, then it is correct and
bug-free.
 
> And if you make the specifications so in depth as to account for every
> eventuality then you end up debugging the specifications as well as the
> software. Something formal methods advocates tend to forget.
 
You split things up into manageable pieces.
 
And yes, you need to get the specifications right - that is also part of
the job of software design. But again, you divide into parts and you
divide the responsibilities (even if it is the same person doing
everything). Get the specifications right for a function, then
implement the function trusting the specifications. If you can't trust
the specifications, you are not ready to code the function.
 
 
>> this one does so with a valid input. No function should ever call
>> "small_power_of_ten" with a value above 6, because there is no
 
> And no one should ever steal either so I don't need a lock on my doors, right?
 
You lock your front door, because that forms a boundary between areas of
trust. You don't lock your in-house doors because you trust those
inside the house.
 
Check the parameters for calls from external sources, not the inside ones.
 
>> specification for what will happen in such a case - that would clearly
>> be a bug in the calling code.
 
> It's also a bug in your code in that it doesn't catch it.
 
No, no, no. Establish your responsibilities and who is doing what task.
The calling function must guarantee that it provides inputs according
to the callee's specification. The callee assumes these are valid. It
is not the callee's job to figure out what the caller did wrong - that
would be impossible! (Again, as always, you might be able to provide
some help during debugging.)
 
 
> I think I understand better than you. You seem to think complicated code can
> be provenly correct. It can't without a huge amount of time and effort and
> even then, it's only as good as your spec in the first place.
 
Oh, I agree that code cannot be better than the specifications. And I
agree that complicated code (or complicated specifications) make things
much more difficult. That's why you divide things up into parts - both
code and specifications - and handle them individually. That's why you
don't waste time and effort making things needlessly extra complicated
by duplicating effort or mixing work that is supposed to be done in the
separate tasks. Let the caller do its job right, and let the callee
trust its inputs - don't put the responsibility of getting the inputs
right into the wrong task or function, or into /both/ tasks or functions.
 
Malcolm McLean <malcolm.arthur.mclean@gmail.com>: Feb 28 11:31AM -0800

On Tuesday, 28 February 2023 at 15:45:47 UTC, David Brown wrote:
> buggy software than to learn about and fix the problems? That might be
> appropriate in some cases - and certainly I appreciate that it rarely
> helps to have lots of people report the same bug to customer support.
 
We're a commercial company. Whilst of course we want to be honest, and we do want our
software to be as good as possible, ultimately we're about making money. And customer
support costs money. We offer good customer support for things that the support team
can support, for example we can advise customers about their workflow and potentially
save them very large amounts of artists' time, we can suggest tools which the customer
has not found. But the support team can't fix bugs. They have to report bugs to the
developers, who are us, and then a fix will cost at least a thousand pounds. Now the
issue with logging errors isn't so much that the bug will be reported, as that the
customer thinks he's done us a favour because he's diligently noted down the error
report and recorded it. So he expects a fix. And that will cost a thousand pounds. And
if he doesn't get it, he'll think badly of us. And he knows if exactly the same bug
crops up again, which might be after the "fix". A customer is a client, not a friend.
You do try to be friendly and helpful. But ultimately his interests are not your
interests.
David Brown <david.brown@hesbynett.no>: Feb 28 08:59PM +0100

On 28/02/2023 20:31, Malcolm McLean wrote:
> same bug crops up again, which might be after the "fix". A customer
> is a client, not a friend. You do try to be friendly and helpful. But
> ultimately his interests are not your interests.
 
Yes, I appreciate that, and it's fair enough - time and effort cost
money. It just looked a little like you were hiding problems, to avoid
customer services being bothered!
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

Monday, February 27, 2023

Digest for comp.lang.c++@googlegroups.com - 6 updates in 1 topic

Muttley@dastardlyhq.com: Feb 27 08:43AM

On Sat, 25 Feb 2023 19:13:47 +0100
 
>The check is in the compiler. But it is not part of the printf function
>- which is not built into the compiler in any way, and is not used in
>Scott's code here.
 
So how does putting "format(printf" there work if it doesn't know about printf?
Muttley@dastardlyhq.com: Feb 27 08:44AM

On Sat, 25 Feb 2023 19:15:37 +0100
>write good quality code that works correctly, great. I haven't seen
>much evidence so far of you having any ability to learn, but I'll keep
>hoping.
 
So you generally don't bother doing array bounds checking and you patronise
me about not being able to write good code??
 
Please, look in the mirror sometime when you get a chance.
David Brown <david.brown@hesbynett.no>: Feb 27 10:31AM +0100

>> - which is not built into the compiler in any way, and is not used in
>> Scott's code here.
 
> So how does putting "format(printf" there work if it doesn't know about printf?
 
You misunderstand.
 
A compiler can /know/ about a standard library function, without having
an /implementation/ of it. An actual implementation of "printf" is
not "built in" to gcc (or any other compiler that I have seen). But a
knowledge of the specification of "printf" might be.
 
The compiler has no idea where the output of "printf" ends up, or how it
will get there - that is up to the library that implements printf, the
OS that runs the program, and many other factors.
 
But it /does/ know that if you call printf with the format string
"%i%s", then you need two extra parameters - the first one of which is
an "int" (after default promotion for variadic functions), and the
second one is a string.
 
So a compiler can check that you have the right parameters to "printf"
even if the function is not built into the compiler.
 
 
In particular with the case of gcc, the compiler has a checking feature
for printf-style format strings (and scanf, and a couple of other
formats). The declaration of "printf" in <stdio.h> in standard
libraries used with gcc will have "__attribute__((format(printf..." in
their declaration. The parameter type checking is built into the
compiler, the function implementation is not.
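
The same attribute can be put on a user-written function, which shows the
split between checking and implementation. A minimal sketch, assuming GCC or
Clang: format_message and the PRINTF_LIKE macro are hypothetical names, and
the real formatting is still done by the library's vsnprintf.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// GCC and Clang understand this attribute; other compilers get an empty macro.
#if defined(__GNUC__) || defined(__clang__)
#  define PRINTF_LIKE(fmt_index, first_arg_index) \
       __attribute__((format(printf, fmt_index, first_arg_index)))
#else
#  define PRINTF_LIKE(fmt_index, first_arg_index)
#endif

// Hypothetical helper: parameter 1 is the format string and the variadic
// arguments start at parameter 2. The attribute only enables compile-time
// checking of calls; it adds no formatting code.
std::string format_message(const char* fmt, ...) PRINTF_LIKE(1, 2);

std::string format_message(const char* fmt, ...)
{
    char buffer[256];
    std::va_list args;
    va_start(args, fmt);
    // The library does the real work; longer output is truncated to the buffer.
    std::vsnprintf(buffer, sizeof buffer, fmt, args);
    va_end(args);
    return std::string(buffer);
}

// A mismatched call such as format_message("%i%s", "oops") is the kind of
// thing the compiler can then diagnose when format warnings are enabled.
```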
 
(gcc has knowledge of a few other aspects of printf and friends -
sometimes it can figure out optimisations in certain specific cases.
It might know, for example, some relations between varieties like
fprintf and printf, and it tries to do some buffer size checks on
sprintf and snprintf. But for anything non-trivial, it calls the
external library function to do the real work.)
David Brown <david.brown@hesbynett.no>: Feb 27 10:40AM +0100

>> hoping.
 
> So you generally don't bother doing array bounds checking and you patronise
> me about not being able to write good code??
 
Yes.
 
I write efficient code for small devices. Why should I waste processor
time checking something that I know is correct?
 
Good programming involves splitting tasks into subtasks, where each
subtask will do its own job. Subtasks, or functions, will have
specifications - sometimes written out explicitly, sometimes (especially
for small local functions) obvious. Every function can assume its
preconditions are satisfied when it is called - it is the job of the
/calling/ function to get that right.
 
Having each bit of code doing the job of other bits of code is just a
waste of everyone's time and effort, and leads to pointless redundancy
(not useful redundancy) and untestable and unmaintainable code. And it
encourages others to do a bad job - the guy writing the calling function
can be lazy because he thinks the called function will do all the work.
 
I don't do array bounds checking. I write code that accesses arrays
with the index value that is correct for the requirements of the program
- additional checks are therefore worse than useless.
Muttley@dastardlyhq.com: Feb 27 10:24AM

On Mon, 27 Feb 2023 10:40:55 +0100
>I don't do array bounds checking. I write code that accesses arrays
>with the index value that is correct for the requirements of the program
>- additional checks are therefore worse than useless.
 
I wonder how many C programmers thought that only to see their masterpiece
die in a heap?
 
Unless you're only doing array access in a loop up to array size or via literals
then I suspect your code will come unstuck one day.
David Brown <david.brown@hesbynett.no>: Feb 27 06:31PM +0100

> die in a heap?
 
> Unless you're only doing array access in a loop up to array size or via literals
> then I suspect your code will come unstuck one day.
 
I suspect you haven't a clue about how I work and what my code does. I
suspect you have never learned anything about serious programming, but
merely had a few courses in C and C++ - that would explain why you don't
understand basic concepts such as specifications.
 
I could be wrong, of course - maybe you have had decent courses but did
not understand them, or simply think that you know better than others.
 
In the meantime, I will continue to write code that does what it is
specified to do. Of course I make mistakes sometimes, and have to find
and fix the bugs, but obsessing about pointless checks for one
particular type of potential bugs is no help to anyone. It's just an
excuse for not thinking and not paying attention to what you are doing.

Sunday, February 26, 2023

Digest for comp.lang.c++@googlegroups.com - 1 update in 1 topic

Daniel <danielaparker@gmail.com>: Feb 26 09:50AM -0800

On Friday, February 24, 2023 at 12:48:51 PM UTC-5, Alf P. Steinbach wrote:
> > to 80 characters, as many coding standards and style guidelines
> > recommend.
> Even better, that you updated to more modern software. :-o
 
I believe this is the first time that I've taken issue with Tim Rentsch, but,
yes :-)
 
Daniel

Saturday, February 25, 2023

Digest for comp.lang.c++@googlegroups.com - 14 updates in 1 topic

Tim Rentsch <tr.17687@z991.linuxsc.com>: Feb 24 03:24PM -0800


> and such aspects are documented for isupper (and others) but /not/ for
> isdigit. That distinction is pointless unless isdigit is supposed to
> be an exception.
 
I think you may be misreading what is being said here (meaning in
section 7.4 paragraph 2). The behavior of all <ctype.h> functions
is (potentially) affected by the current locale. The functions that
are noted are _only_ those functions whose locale-specific aspects
occur in locales other than the "C" locale (but must not occur in
the "C" locale). A function that does not have a note might still
have locale-specific aspects, and in fact may have locale-specific
aspects even for the "C" locale. For example, in section 7.4.1.4,
there is no mention of the "C" locale, but the function iscntrl()
tests for a locale-specific set of characters; thus iscntrl() may
yield nonzero for control characters besides those mentioned in
section 5.2.1 paragraph 3, even when called while the "C" locale is
in effect.
 
None of the above changes what isdigit() is allowed to do, which is
specified as exactly the ten characters listed as decimal digits in
section 5.2.1 paragraph 3.
"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Feb 25 06:47AM +0100

On 2023-02-24 6:39 PM, David Brown wrote:
> some uses of the C++ "new" operator require inclusion of a standard
> library header, and both languages have standard library functions that
> cannot be implemented purely in the language itself.
 
#include <initializer_list> // std::initializer_list, for use in range based `for`.
#include <new>              // operator new, for placement `new` expressions.
#include <typeinfo>         // std::type_info, for using `typeid`.
 
#if CPP_VERSION >= CPP20
#   include <compare>       // for checking result of spaceship operator `<=>`.

Friday, February 24, 2023

Digest for comp.lang.c++@googlegroups.com - 16 updates in 3 topics

Tim Rentsch <tr.17687@z991.linuxsc.com>: Feb 24 08:48AM -0800


> Well, now I have.
 
> From the link:
 
> #define EOF (-2) /**< End-of-file (usually from read) */
 
On implementations that have 8-bit chars, a reasonable choice for
EOF is -SCHAR_MAX-2.
Muttley@dastardlyhq.com: Feb 24 05:05PM

On Fri, 24 Feb 2023 18:20:30 +0200
>Eastern Arabic numerals, it would need to be able to recognize these
>numbers. These numbers would be used in "mathematical operations by the
>compiler" and would thus matter "from a programming POV".
 
man isdigit:
 
DESCRIPTION
The isdigit() function tests for a decimal digit character. Regardless
of locale, this includes the following characters only:
 
``0''``1''``2''``3''``4''
``5''``6''``7''``8''``9''
 
The isnumber() function behaves similarly to isdigit(), but may recognize
additional characters, depending on the current locale setting.
Muttley@dastardlyhq.com: Feb 24 05:09PM

On Fri, 24 Feb 2023 17:20:32 +0100
 
>> We're not talking about the language, we're talking about API functions.
>> the is*() functions unlike printf are not built in to the compiler.
 
>I don't know what distinction you are trying to make, but I really think
 
The distinction is that with built in functions the compiler can tell you
if you're passing daft values. Eg: many compilers will warn you if you
try to pass a non pointer to %s in printf which they couldn't do at compile
time if printf was a normal library function.
 
>> I'm aware how it does work. I'm also saying that crashing is a bug.
 
>Yes - the bug is in the code that calls the function with invalid inputs.
 
No - the bug is in the library function not bounds checking indexes into
an array.
 
Do you think not bounds checking a C array index is professional code?
A simple yes/no will suffice.
James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 24 12:16PM -0500

On 2/24/23 11:20, David Brown wrote:
...
> the C standards, which do not distinguish between the "core language"
> and the "standard library" - both aspects are part of the C programming
> language.
 
That's not quite the case. Section 6 is titled "Language", and section 7
is titled "Library", so it does distinguish them - but it provides
specifications for both, and mandates that in a conforming
implementation, they work together to meet the requirements imposed upon
the implementation.
 
Since we are discussing the C standard in comp.lang.c++ (without having
veered off-topic - the C++ standard defines the behavior of the <cctype>
functions only by cross-referencing the definitions of the <ctype.h>
functions in the C standard), it should be noted that it isn't quite the
same in the C++ standard. Sections 16-32 of the C++ standard each have a
title referring to the C++ standard library, but sections 5-15, which
describe the C++ language, don't say so in their titles. However,
section 4.2p1-2 does explicitly describe this organizational structure
Paavo Helde <eesnimi@osa.pri.ee>: Feb 24 07:24PM +0200

> ``5''``6''``7''``8''``9''
 
> The isnumber() function behaves similarly to isdigit(), but may recognize
> additional characters, depending on the current locale setting.
 
Where did you see "isdigit" in my last response? If not, why bring it up
again? Compilers written in other languages might even not have
isdigit(), but would still need to parse numbers in whatever target
language they are compiling.
scott@slp53.sl.home (Scott Lurndal): Feb 24 05:32PM

>if you're passing daft values. Eg: many compilers will warn you if you
>try to pass a non pointer to %s in printf which they couldn't do at compile
>time if printf was a normal library function.
 
Why do you think that?
 
void log(const char *, ...)
__attribute__((format(printf, 2, 3)));
size_t trace(const char *, ...)
__attribute__((format(printf, 2, 3)));
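
A minimal sketch of the same attribute on a free function, assuming GCC or Clang. The helper name format_message and the PRINTF_LIKE macro are made up for illustration. For a free function the format string is argument 1 and the variadic arguments start at argument 2; the indices 2, 3 fit non-static member functions, where the implicit `this` counts as argument 1.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// The attribute is a GCC/Clang extension, so it is guarded here.
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

// Hypothetical helper: an ordinary user-written function whose
// arguments the compiler checks against the format string.
std::string format_message(const char* fmt, ...) PRINTF_LIKE(1, 2);

std::string format_message(const char* fmt, ...)
{
    char buffer[256];
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buffer, sizeof buffer, fmt, args);
    va_end(args);
    return buffer;
}

// With -Wformat (enabled by -Wall) a call such as
//     format_message("%s", 42);
// is diagnosed at compile time.
```

The attribute sits on a separate declaration because GCC does not accept it after the declarator of a function definition.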
David Brown <david.brown@hesbynett.no>: Feb 24 06:39PM +0100

On 24/02/2023 18:16, James Kuyper wrote:
> title referring to the C++ standard library, but sections 5-15, which
> describe the C++ language, don't say so in their titles. However,
> section 4.2p1-2 does explicitly describe this organizational structure
 
While there are different sections of the standards that cover the
library in detail, the standard libraries (in both C and C++) are
intertwined with the "core language" to some extent. For example, the C
"sizeof" operator returns a type that is declared in library headers,
some uses of the C++ "new" operator require inclusion of a standard
library header, and both languages have standard library functions that
cannot be implemented purely in the language itself.
 
Sometimes it is useful to distinguish a bit between "core language"
features and "standard library" features, and sometimes compilers and
standard libraries are implemented by separate groups. But they are
still defined in the same standards documents, still part of an
"implementation", and of very limited use if considered separately.
David Brown <david.brown@hesbynett.no>: Feb 24 06:42PM +0100

> if you're passing daft values. Eg: many compilers will warn you if you
> try to pass a non pointer to %s in printf which they couldn't do at compile
> time if printf was a normal library function.
 
Did you not understand anything I wrote? Or are you just refusing to
learn in the mistaken belief that learning makes you look bad?
 
 
> an array.
 
> Do you think not bounds checking a C array index is professional code?
> A simple yes/no will suffice.
 
Do you intend to stop asking stupid questions? A simple yes/no will
suffice.
 
Oh, wait, some questions don't make sense as they stand - and certainly
don't make sense in the context of a yes/no answer.
"james...@alumni.caltech.edu" <jameskuyper@alumni.caltech.edu>: Feb 24 09:46AM -0800

On Friday, February 24, 2023 at 12:06:13 PM UTC-5, Mut...@dastardlyhq.com wrote:
...
> of locale, this includes the following characters only:
 
> ``0''``1''``2''``3''``4''
> ``5''``6''``7''``8''``9''
 
That is perfectly correct, but it is less authoritative than the citations
from the C standard which have already been posted to this thread,
and say the same thing (though less directly).
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 24 10:03AM -0800

David Brown <david.brown@hesbynett.no> writes:
[...]
> headers, some uses of the C++ "new" operator require inclusion of a
> standard library header, and both languages have standard library
> functions that cannot be implemented purely in the language itself.
 
I mostly agree, but I don't think sizeof is a good example.
 
The language-defined sizeof operator yields a result of an
implementation-defined unsigned integer type. The library provides a
name for that type, but a program can use sizeof without referring to
that name.
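
A small C++ sketch of that point: no header is included, and the type yielded by sizeof is used without ever spelling size_t. The alias name size_result is an assumption made for this example.

```cpp
// No headers are included on purpose.

// The type of a sizeof expression, obtained without naming size_t.
using size_result = decltype(sizeof(int));

// An unsigned type wraps around, so converting -1 gives its largest value.
constexpr bool size_result_is_unsigned = size_result(-1) > size_result(0);
static_assert(size_result_is_unsigned, "sizeof yields an unsigned type");

// The alias is usable like any other integer type.
constexpr size_result bytes_in_four_ints = 4 * sizeof(int);
```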
 
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */
David Brown <david.brown@hesbynett.no>: Feb 24 07:59PM +0100

On 24/02/2023 19:03, Keith Thompson wrote:
> implementation-defined unsigned integer type. The library provides a
> name for that type, but a program can use sizeof without referring to
> that name.
 
Fair point.
Tim Rentsch <tr.17687@z991.linuxsc.com>: Feb 24 08:34AM -0800

> char *dest_pointer = dest;
 
> while (*source != '\0') *dest_pointer++ = *source_pointer++;
 
> I would call that moving the data, not processing the data.
 
I think it's fair to say that any effort that involves activity
by the processor may be called processing.
 
Furthermore your earlier message says "displays". The code shown
doesn't do any displaying.
"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Feb 24 06:48PM +0100

On 2023-02-24 4:51 PM, Tim Rentsch wrote:
> facts, you would simply limit the line lengths of your postings
> to 80 characters, as many coding standards and style guidelines
> recommend.
 
Even better, that you updated to more modern software. :-o
 
- Alf
Tim Rentsch <tr.17687@z991.linuxsc.com>: Feb 24 08:55AM -0800

> occurrences of "Hello, world!" in a given program point at the same
> location in memory, and to make "Hello, world!" + 7 == "world!". Both
> optimizations have actually been implemented by many implementations.
 
It's true that two or more string literals may have some bytes in
common, but that's not because writing to a byte in a string
literal has undefined behavior; it's because there is an explicit
statement in the C++ standard (and also the C standard) that allows
it. If anything the implication goes the other direction: because
pre-standard C implementations stored multiple string literals in
the same memory, when C was standardized it was pretty much a
necessity that storing into the bytes of a string literal had to
be undefined behavior.
 
Nitpick: a string literal need not be a string (as the C standard
points out, in a footnote IIRC).
Tim Rentsch <tr.17687@z991.linuxsc.com>: Feb 24 09:03AM -0800

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
 
[string literals may overlap]
 
 
> I hadn't realized until now that the standard allows two evaluations of
> the same string literal to refer to distinct objects (and it's difficult
> to imagine an implementation choice that would make them distinct).
 
IIANM the statement about two evaluations is present in C++
but not in C.
"james...@alumni.caltech.edu" <jameskuyper@alumni.caltech.edu>: Feb 24 09:43AM -0800

On Friday, February 24, 2023 at 11:55:33 AM UTC-5, Tim Rentsch wrote:
> the same memory, when C was standardized it was pretty much a
> necessity that storing into the bytes of a string literal had to
> be undefined behavior.
 
True. But the fact that storing into the bytes of a string literal has undefined
behavior is what makes storing multiple string literals in overlapping memory
workable. We're really saying the same thing, from two different points of view.
 
> Nitpick: a string literal need not be a string (as the C standard
> points out, in a footnote IIRC).
 
I agree, but not, I suspect, in the sense that you mean it. A string literal is a
source-code feature. The corresponding string that I was referring to is
created at run-time, so they can't be the same thing.
That array is guaranteed to be null-terminated, and therefore always
contains at least one string, which might be empty (as in ""). The value of a
string literal not used to initialize an array is a pointer to the first element of
that array, which is also necessarily the start of the first (and possibly only,
possibly empty) string contained in that array. It is that string which I was
referring to when I mentioned the "corresponding string".
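
A short sketch of the "need not be a string" nitpick, using a literal with an embedded null character. The function names are hypothetical. The array created for the literal holds more than one string, and the pointer value designates only the first of them.

```cpp
#include <cstddef>
#include <cstring>

// The literal "ab\0cd" creates an array of 6 chars:
// 'a', 'b', '\0', 'c', 'd', plus the terminating '\0'.
constexpr std::size_t literal_array_size = sizeof("ab\0cd");

// strlen stops at the first null character, so it sees only "ab".
inline std::size_t first_string_length()
{
    return std::strlen("ab\0cd");
}

// A second string starts right after the embedded null character.
inline const char* second_string()
{
    const char* p = "ab\0cd";
    return p + 3;
}
```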

Thursday, February 23, 2023

Digest for comp.lang.c++@googlegroups.com - 5 updates in 1 topic

Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 23 01:57PM -0800

>>> zero and 255 would result in UB.
 
>>-1 and 255, since EOF was explicitly an allowed value.
 
> Not in SVR4.2, see msg <W7NJL.30343$Kqu2.1845@fx01.iad>
 
Yes in SVR4.2. The code in the cited article allows for a -1 argument.
 
extern unsigned char __ctype[];
[...]
#define isalpha(c) ((__ctype + 1)[c] & (_U | _L))
 
Adding 1 to the array address allows for an index of -1.
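
A minimal sketch of that offset trick, not the SVR4.2 source. The names ctype_flags and sketch_isalpha are invented, EOF is assumed to be -1, and the letter loops assume an ASCII-like character set where A-Z and a-z are contiguous.

```cpp
constexpr unsigned char kUpperFlag = 0x01;
constexpr unsigned char kLowerFlag = 0x02;

// 257 entries: slot 0 is for -1 (EOF), slots 1..256 for values 0..255.
inline const unsigned char* ctype_flags()
{
    static unsigned char table[257] = {};
    static bool initialised = false;
    if (!initialised) {
        for (int c = 'A'; c <= 'Z'; ++c) table[c + 1] = kUpperFlag;
        for (int c = 'a'; c <= 'z'; ++c) table[c + 1] = kLowerFlag;
        initialised = true;
    }
    return table;
}

// Valid for c == -1 and for 0 <= c <= 255. Any other value indexes
// outside the table, which is undefined behaviour.
inline int sketch_isalpha(int c)
{
    return (ctype_flags() + 1)[c] & (kUpperFlag | kLowerFlag);
}
```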
 
(The standard requires EOF to have a negative value. This is a good
reason for it to be exactly -1, and I've never heard of an
implementation where EOF != -1.)
 
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 23 02:19PM -0800

> or greater than UCHAR_MAX would make it less efficient, and would only benefit
> code that has undefined behavior. Many implementations provide such safety only in
> a special debugging mode.
 
An implementation could make _ctype_table cover values from SCHAR_MIN to
UCHAR_MAX and use an SCHAR_MIN offset when indexing it. That would make
it well defined for any value within the range of signed char, char, or
unsigned char. No implementations are *required* to do this, but any
that do will avoid crashing when passing arbitrary char values to the
is*() and to*() functions.
 
GNU's glibc appears to do something like this.
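
A rough sketch of such a widened table, written from the description above and not taken from glibc. The names wide_digit_table and wide_is_digit are hypothetical.

```cpp
#include <climits>

// One slot for every value from SCHAR_MIN up to UCHAR_MAX.
constexpr int kWideTableSize = UCHAR_MAX - SCHAR_MIN + 1;

inline const unsigned char* wide_digit_table()
{
    static unsigned char table[kWideTableSize] = {};
    static bool initialised = false;
    if (!initialised) {
        // '0'..'9' are guaranteed to be contiguous.
        for (int c = '0'; c <= '9'; ++c) table[c - SCHAR_MIN] = 1;
        initialised = true;
    }
    return table;
}

// Defined for SCHAR_MIN <= c <= UCHAR_MAX, so a plain (possibly
// negative) char can be passed without a cast. Values outside that
// range are still out of bounds.
inline int wide_is_digit(int c)
{
    return wide_digit_table()[c - SCHAR_MIN];
}
```

In this sketch -1 simply maps to "not a digit", so the EOF/255 ambiguity does not matter for this one classification.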
 
I'd like to see a future standard require well defined behavior for all
values from SCHAR_MIN to UCHAR_MAX.
 
(There could be a problem treating -1 as EOF and 255 as the letter 'ÿ'.
I'm tempted to argue that the special treatment of EOF has outlived its
usefulness, but I'm not suggesting a breaking change.)
 
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 23 02:31PM -0800

>> rubbish.
 
> Nope. "Returning rubbish" would be an example of _unspecified
> behavior_. Undefined is a wholly different thing.
 
Crashing, returning rubbish, returning a sensible result, and
making demons fly out of your nose are *all* permitted consequences
of undefined behavior.
 
Unspecified behavior is limited to two or more possibilities
that are always (C) or usually (C++) specified by the standard.
Implementation-defined behavior is unspecified behavior where
the implementation must document its choice. The standard never
includes nasal demons as one of the possibilities.
 
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 23 02:41PM -0800


> A number of languages do add their own characters for the digits,
> besides the basic Arabic numerals included in the standard character
> set.
 
5.2.1 (I'm using the n1570 C standard draft) does not say that
characters outside the basic character set can be digits. In
enumerating the characters that are included in the basic source and
execution character sets, it says:
 
the 10 decimal *digits*
 
0 1 2 3 4 5 6 7 8 9
 
The word "digits" is in italics, so this is the definition of the word.
If I'm reading it correctly, a character like '²' (superscript two)
might be in the extended character set, but it cannot be a "digit" in
the meaning used in the standard.
 
Similarly:
 
A *letter* is an uppercase letter or a lowercase letter as defined
above; in this International Standard the term does not include
other characters that are letters in other alphabets.
 
where the "above" includes a list of the 26 uppercase and 26 lowercase
Latin letters.
 
The isupper and islower functions can return a true result either for a
*letter* or for other locale-specific characters. isdigit() is not
locale-specific; it tests only for "any decimal-digit character (as
defined in 5.2.1)".
 
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Feb 23 02:48PM -0800

scott@slp53.sl.home (Scott Lurndal) writes:
[...]
> there was no need to ever pass the value assigned to the EOF macro.
 
> if (isascii(c) && isdigit(c))
 
> is using the API in the manner in which it was designed.
 
Perhaps, but isascii() was never included in the C or C++ standard
(neither of which excludes EBCDIC or other character sets).
 
The is*() and to*() functions can safely handle the value returned
by getchar(), which is an int either in the range of unsigned char
or equal to EOF. They cannot safely handle arbitrary values in
a string.
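
A minimal sketch of the usual workaround when walking a string: convert each char to unsigned char before calling isdigit. The function name count_digits is made up for this example.

```cpp
#include <cctype>
#include <string>

inline int count_digits(const std::string& text)
{
    int count = 0;
    for (char ch : text) {
        // Convert to unsigned char first so the value passed to isdigit
        // is always in the range 0..UCHAR_MAX, even if char is signed.
        if (std::isdigit(static_cast<unsigned char>(ch))) {
            ++count;
        }
    }
    return count;
}
```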
 
The undefined behavior for negative values other than EOF is clearly
stated in the standard, so any program that fails because of it
is a buggy program -- but I suggest that it's also a misfeature,
and arguably a bug, in the standard itself.
 
I wouldn't mind seeing a future standard require plain char to be
unsigned. I wonder if there are any strong arguments against that.
(Yes, it would require some work for compiler and library implementers.)
 

Digest for comp.lang.c++@googlegroups.com - 25 updates in 2 topics

"james...@alumni.caltech.edu" <jameskuyper@alumni.caltech.edu>: Feb 22 09:21PM -0800

> "james...@alumni.caltech.edu" <james...@alumni.caltech.edu> wrote:
> >On Tuesday, February 21, 2023 at 9:53:23 AM UTC-5, Mut...@dastardlyhq.com
> >wrote:
...
> Absolute rubbish. Why would UTF8 have anything to do with "safety"? Safety
> means will the program crash or have hidden bugs, not whether a string gets
> translated into uppercase properly or not which would be immediately obvious.
 
Not getting the expected results can make other parts of the program malfunction.
There are certainly more dangerous problems a program can have, but there are more
reasons to worry about that issue than about the one that was your actual concern.
"james...@alumni.caltech.edu" <jameskuyper@alumni.caltech.edu>: Feb 22 09:30PM -0800

> On Tue, 21 Feb 2023 10:58:31 -0800 (PST)
> "james...@alumni.caltech.edu" <james...@alumni.caltech.edu> wrote:
...
> >parsed as a structured binding declaration, and in that context & would be
> No it wouldn't. Whitespace is not significant in C++ (ok, apart from the > >
> vs >>) template syntax hack up until 2011.
 
If white-space between tokens were significant after translation phase 5, the way in
which people used it wouldn't qualify as a "convention", but as a necessity for
correct code. Such conventions are chosen to make code easier for humans to
understand, not because they are needed to ensure that compilers handle the code
correctly.
 
Note: white-space within tokens is always significant, and white-space between
tokens can be significant in translation phase 4 and earlier.
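
A small sketch of white-space mattering during preprocessing (translation phase 4). The macro and function names are invented for illustration. The `#` operator keeps white-space between the tokens of its argument as a single space, so the two calls below yield different strings.

```cpp
#include <cstring>

#define STRINGIFY(x) #x

// White-space between the argument's tokens survives as one space.
inline const char* spaced()   { return STRINGIFY(a   +   b); }

// No white-space between the tokens, so none appears in the result.
inline const char* unspaced() { return STRINGIFY(a+b); }

// White-space also decides what kind of macro is being defined:
//   #define F(x) ...   function-like, '(' directly follows the name
//   #define F (x) ...  object-like, replacement list starts with "(x)"
```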
Paavo Helde <eesnimi@osa.pri.ee>: Feb 23 09:39AM +0200


> Absolute rubbish. Why would UTF8 have anything to do with "safety"? Safety
> means will the program crash or have hidden bugs, not whether a string gets
> translated into uppercase properly or not which would be immediately obvious.
 
 
Because you asked: I have seen isdigit() crashing hard on negative
values, would not be surprised if toupper() would behave the same in
some implementation.
 
In case of a multi-byte UTF-8 character encoded in a std::string all its
bytes would have a negative value if 'char' is signed on that platform.
So there.
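
A short sketch of that observation, assuming 8-bit char. The helper names are hypothetical. Every byte of a multi-byte UTF-8 sequence has its top bit set, so it is negative wherever plain char is signed.

```cpp
#include <limits>
#include <string>

// True if plain char is a signed type on this implementation.
constexpr bool plain_char_is_signed = std::numeric_limits<char>::is_signed;

// Counts bytes with the top bit set (0x80 and above).
inline int count_high_bytes(const std::string& text)
{
    int count = 0;
    for (char ch : text) {
        if (static_cast<unsigned char>(ch) >= 0x80) ++count;
    }
    return count;
}

// Counts bytes whose value as plain char is negative.
inline int count_negative_chars(const std::string& text)
{
    int count = 0;
    for (char ch : text) {
        if (ch < 0) ++count;
    }
    return count;
}
```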
Muttley@dastardlyhq.com: Feb 23 09:25AM

On Wed, 22 Feb 2023 22:33:09 +0100
>> means will the program crash or have hidden bugs, not whether a string gets
>> translated into uppercase properly or not which would be immediately obvious.
 
>The UB includes that the program can crash.
 
Feel free to explain how toupper could crash.
Muttley@dastardlyhq.com: Feb 23 09:29AM

On Wed, 22 Feb 2023 22:34:37 +0100
>> vs >>) template syntax hack up until 2011.
 
>Either you missed the point, or you understood and deliberately snipped
>what you quoted to create a misleading impression.
 
I didn't miss the point at all. You implied that the positioning of the
whitespace in a declaration makes a difference. It doesn't.
Muttley@dastardlyhq.com: Feb 23 09:31AM

On Wed, 22 Feb 2023 21:30:04 -0800 (PST)
>correctly.
 
>Note: white-space within tokens is always significant, and white-space between
>tokens can be significant in translation phase 4 and earlier.
 
Quite obviously a program with no whitespace won't compile. That doesn't mean
the whitespace is significant in the programming sense.
Muttley@dastardlyhq.com: Feb 23 09:35AM

On Thu, 23 Feb 2023 09:39:04 +0200
>> means will the program crash or have hidden bugs, not whether a string gets
>> translated into uppercase properly or not which would be immediately obvious.
 
>Because you asked: I have seen isdigit() crashing hard on negative
 
That's clearly a library bug. All bets are off when they exist.
 
>In case of a multi-byte UTF-8 character encoded in a std::string all its
>bytes would have a negative value if 'char' is signed on that platform.
>So there.
 
One would assume unsigned would be used internally.
Paavo Helde <eesnimi@osa.pri.ee>: Feb 23 12:30PM +0200

>>> translated into uppercase properly or not which would be immediately obvious.
 
>> Because you asked: I have seen isdigit() crashing hard on negative
 
> That's clearly a library bug. All bets are off when they exist.
 
What makes you think so? The C standard clearly says in 7.4 (Character
handling <ctype.h>):
 
"In all cases the argument is an int, the value of which shall be
representable as an unsigned char or shall equal the value of the macro
EOF. If the argument has any other value, the behavior is undefined."
 
Undefined behavior may or may not involve a program crash.
 
>> bytes would have a negative value if 'char' is signed on that platform.
>> So there.
 
> One would assume unsigned would be used internally.
 
Alas, std::string is standardized to use plain char, whose signedness is
implementation dependent.
Muttley@dastardlyhq.com: Feb 23 10:34AM

On Thu, 23 Feb 2023 12:30:39 +0200
>representable as an unsigned char or shall equal the value of the macro
>EOF. If the argument has any other value, the behavior is undefined."
 
>Undefined behavior may or may not involve a program crash.
 
I would still consider a crash to be a bug. Undefined would just be returning
rubbish. I can't even figure out HOW it would crash since all it's doing is
 
return (c >= '0' && c <= '9')
 
unless there's some obscure way of doing that test even faster.
"Öö Tiib" <ootiib@hot.ee>: Feb 23 02:47AM -0800

> >> translated into uppercase properly or not which would be immediately obvious.
 
> >Because you asked: I have seen isdigit() crashing hard on negative
 
> That's clearly a library bug. All bets are off when they exist.
 
Nope, the standard matters only as a specification. Read the licence
agreements of the compilers you use or something. Those spell out all the
warranties that you actually get, and that is where the interesting part
of our work only starts. No bets are off ... in practice we may need to use
clearly and provably defective implementations for developing
financial applications that people use daily and trust blindly without
thinking. We get paid well for that.
 
About 15 years ago one of my teams helped program a particular
point-of-sale credit card terminal using gcc, which produced a binary
that rebooted the terminal in the situation that Paavo described.
The POS could talk the native language of the card owner (which might
contain no Latin characters), got certified by EMV
(eurocard-mastercard-visa), and I saw it in use only a few years ago.
Paavo Helde <eesnimi@osa.pri.ee>: Feb 23 01:20PM +0200

>> EOF. If the argument has any other value, the behavior is undefined."
 
>> Undefined behavior may or may not involve a program crash.
 
> I would still consider a crash to be a bug.
 
Sure, but the bug would be in your code.
 
> Undefined would just be returning
> rubbish. I can't even figure out HOW it would crash since all it's doing is
 
> return (c >= '0' && c <= '9')
 
Nope, because it's not known at compile time which characters should
be considered digits. It might do something like
 
    if (c == (EOF)) {
        return 0;  // EOF is never a digit
    } else {
        lock_current_locale();
        int result = get_current_locale()->isdigit_map[c];
        unlock_current_locale();
        return result;
    }
 
where isdigit_map is a 256-element array provided by the locale.
Richard Damon <Richard@Damon-Family.org>: Feb 23 07:18AM -0500

> rubbish. I can't even figure out HOW it would crash since all it's doing is
 
> return (c >= '0' && c <= '9')
 
> unless there's some obscure way of doing that test even faster.
 
The issue is that to support locales, things like isdigit might be
implemented as
 
    int isdigit(int c) {
        return _prop_table[c+1] & DIGIT_PROPERTY;
    }
 
where _prop_table gets set to a table based on the current locale, which
might define additional characters that are digits.
David Brown <david.brown@hesbynett.no>: Feb 23 03:02PM +0100


>> Undefined behavior may or may not involve a program crash.
 
> I would still consider a crash to be a bug. Undefined would just be returning
> rubbish.
 
Undefined behaviour means there is no defined behaviour - crashing is
entirely plausible. It doesn't matter what /you/ think about it. The C
standard is quite clear about this - if you pass a valid argument to
isdigit(), as specified in the standard, you'll get a valid result. If
you pass something invalid, all bets are off and whatever happens is
/your/ problem.
 
This is so fundamental to the whole concept of programming that I am
regularly surprised by people who call themselves programmers, yet fail
to comprehend it. A function has a specified input domain, and a
specified result or behaviour for inputs in that domain. Move outside
that input domain, and you are in the realm of nonsense. You don't
expect particular behaviour from 1/0 - maybe you'll get a random value,
maybe you'll get a crash. Why you think calling isdigit() with an
invalid input should have some guarantees is beyond me. "Garbage in,
garbage out" applies to behaviour, not just values, and has been
understood since Babbage designed the first programmable mechanical
computer.
 
So yes, there is a bug - it's in /your/ code if you pass an invalid
value to the function.
 
 
> I can't even figure out HOW it would crash since all it's doing is
 
> return (c >= '0' && c <= '9')
 
> unless there's some obscure way of doing that test even faster.
 
There are other ways that can be faster (depending on details of
processor, cache uses, and other aspects). The traditional
implementation of the <ctype.h> classification functions involves lookup
tables, and it is quite reasonable for a negative value to lead to
things going horribly wrong.
Ben Bacarisse <ben.usenet@bsb.me.uk>: Feb 23 02:49PM

> to comprehend it. A function has a specified input domain, and a specified
> result or behaviour for inputs in that domain. Move outside that input
> domain, and you are in the realm of nonsense.
 
This is formally true, but I think we can also legitimately ask to what
extent a function's domain (and the corresponding returned values) are
reasonable and helpful.
 
> some guarantees is beyond me. "Garbage in, garbage out" applies to
> behaviour, not just values, and has been understood since Babbage designed
> the first programmable mechanical computer.
 
Those of us with a background in languages like C are not going to be
confused, but I bet almost everyone who comes to C from a more modern
language will be astonished by what you have to do to get isdigit to
work safely. To have a character testing function that does not work
for all the values of the language's char type is, well, bonkers.
 
In Haskell, a program won't even compile unless isDigit is called with
an argument of type Char, and the result is defined to be exactly one
of True or False for all values of that type.
 
And that brings up another trap for the unwary: C's isdigit returns
something that is only vaguely Boolean. For example, you can't test if
char c1, c2; are both digits or neither are digits with
 
isdigit(c1) == isdigit(c2)
 
because the value indicating "yes" is not guaranteed to be anything
other than "not zero". Instead you'd write
 
!isdigit((unsigned char)c1) == !isdigit((unsigned char)c2)
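
The same comparison as a small C++ helper, as a sketch. The name same_digit_status is invented here. Each result is first reduced to a bool, so two different nonzero "yes" values still compare equal.

```cpp
#include <cctype>

// True when both characters are digits or neither is.
inline bool same_digit_status(char c1, char c2)
{
    // isdigit only promises zero or nonzero, so normalise to bool.
    const bool d1 = std::isdigit(static_cast<unsigned char>(c1)) != 0;
    const bool d2 = std::isdigit(static_cast<unsigned char>(c2)) != 0;
    return d1 == d2;
}
```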
 
I think an occasional nod to how we have got used to such nonsense is
merited!
 
--
Ben.
Andrey Tarasevich <andreytarasevich@hotmail.com>: Feb 23 07:05AM -0800


> Feel free to explain how toupper could crash.
 
There's no such concept as "explaining" undefined behavior.
 
Yet, it could be very simple. The implementation treats `toupper` as an
intrinsic, and the compiler explicitly generates a "CRASH NOW!!11"
instruction for invalid arguments. Let's say that in their
implementation of `toupper` it would result in a negligible performance
penalty or no penalty at all.
 
GCC is well-known to do such things, for one example.
 
--
Best regards,
Andrey
Andrey Tarasevich <andreytarasevich@hotmail.com>: Feb 23 07:16AM -0800


>> Undefined behavior may or may not involve a program crash.
 
> I would still consider a crash to be a bug. Undefined would just be returning
> rubbish.
 
Nope. "Returning rubbish" would be an example of _unspecified behavior_.
Undefined is a wholly different thing.
 
--
Best regards,
Andrey
David Brown <david.brown@hesbynett.no>: Feb 23 04:25PM +0100

On 23/02/2023 15:49, Ben Bacarisse wrote:
 
> This is formally true, but I think we can also legitimately ask to what
> extent a function's domain (and the corresponding returned values) are
> reasonable and helpful.
 
Sure. You could, for example, argue that "isdigit" would be better
designed if it were to return "false" on any int value outside of the
current valid range. But it might be less efficient if it had such an
extended domain - do you optimise for maximum efficiency for programmers
who are able to read and follow specifications and write correct code,
or do you optimise for minimal surprise for programmers who can't or
won't follow the specifications? I'd say that for C, it's the former -
let those who don't understand the importance of following
specifications use a different language more suited to their needs,
wants and skills. There is a time and a place for making functions with
maximal input domains and controlled handling of nonsensical inputs -
low-level functions like "isdigit" are not such cases.
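
As a rough sketch of what such an extended domain could look like, here
is a hypothetical wrapper (`is_digit_total` is an invented name, not
part of any standard library) that returns false for every int outside
the range the standard function accepts. The extra comparison is the
cost being discussed:

```cpp
#include <cassert>
#include <cctype>
#include <climits>
#include <cstdio>

// Hypothetical wrapper: defined for every int value.
bool is_digit_total(int c)
{
    // std::isdigit is only defined for EOF and for values that are
    // representable as unsigned char, so reject everything else here.
    if (c != EOF && (c < 0 || c > UCHAR_MAX)) {
        return false;
    }
    return std::isdigit(c) != 0;
}

// Usage example: out-of-range values yield false instead of undefined
// behavior.
void usage_example()
{
    assert(is_digit_total('7'));
    assert(!is_digit_total('x'));
    assert(!is_digit_total(INT_MIN));
}
```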
 
> language will be astonished by what you have to do to get isdigit to
> work safely. To have a character testing function that does not work
> for all the values of the language's char type is, well, bonkers.
 
IMHO the concept of "character" in C and C++ is a mess these days. It
was perhaps inevitable, given the history of the languages, the
development of characters, and the overriding requirement for backwards
compatibility. The notion of "signed characters" and "unsigned
characters" is insane. The jumble of "wide characters" and various
varieties of UTF formats is confusing at best. There are various
character sets - source, execution, basic, extended, whatever (the terms
seem to change regularly, especially in C++). Sometimes these are the
same, sometimes different. Some of the different character types are
the same size as others, but have different interpretations. Sometimes
they have the same interpretations, but are still distinct. Some of the
standard C library functions work only on 7-bit ASCII, some will work
with UTF-8 as well. Some of them work with "int" parameters instead of
more logical "char" types, and support non-character values (like EOF)
in functions that appear to take character parameters. And some
functions treat EOF as a normal character.
 
And if you are coming to C from pretty much any other language, you'll
be shocked at the rudimentary "string" support.
 
So while I agree that it might be surprising to find that "isdigit()" is
not defined for all possible values of all character types, I think it
would be /way/ down the list.
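
For reference, a minimal sketch of the usual way to stay inside the
defined domain when starting from a plain char, which may be signed and
therefore negative. The helper names `is_digit_char` and
`to_upper_char` are hypothetical; only `std::isdigit` and
`std::toupper` are standard:

```cpp
#include <cassert>
#include <cctype>

// Hypothetical helper: converting to unsigned char first guarantees the
// argument is representable as unsigned char, as <cctype> requires.
bool is_digit_char(char c)
{
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Same idea for case conversion; the int result is narrowed back to char.
char to_upper_char(char c)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Usage example with ordinary characters.
void usage_example()
{
    assert(is_digit_char('4'));
    assert(!is_digit_char('q'));
    assert(to_upper_char('q') == 'Q');
}
```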
 
This is not a criticism of C - different languages are better and worse
for different things. But if you are working in C, you have to learn C
- you can't just assume it is like whatever other languages you have
used. And if C and C++ were to try to be like other languages, such as
by specifying values for every int value in "isdigit" calls, or having
"isdigit" throw a C++ exception on bad values, you'd lose some of the
aspects that make C and C++ important and useful languages.
 
 
> !isdigit((unsigned char)c1) == !isdigit((unsigned char)c2)
 
> I think an occasional nod to how we have got used to such nonsense is
> merited!
 
Yes, absolutely.
"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Feb 23 04:34PM +0100

> rubbish. I can't even figure out HOW it would crash since all it's doing is
 
> return (c >= '0' && c <= '9')
 
> unless there's some obscure way of doing that test even faster.
 
An MS runtime library example, but apparently this is old code, not for
the current version:
 
<url:
https://github.com/ojdkbuild/tools_toolchain_sdk10_1607/blob/master/Source/10.0.14393.0/ucrt/convert/isctype.cpp#L34-L39>
 
// The _chvalidator function is called by the character classification functions
// in the debug CRT. This function tests the character argument to ensure that
// it is not out of range. For performance reasons, this function is not used
// in the retail CRT.
#if defined _DEBUG
 
extern "C" int __cdecl _chvalidator(int const c, int const mask)
{
    _ASSERTE(c >= -1 && c <= 255);
    return _chvalidator_l(nullptr, c, mask);
}
 
extern "C" int __cdecl _chvalidator_l(_locale_t const locale, int const c, int const mask)
{
    _ASSERTE(c >= -1 && c <= 255);
 
    _LocaleUpdate locale_update(locale);
 
    int const index = (c >= -1 && c <= 255) ? c : -1;
 
    return locale_update.GetLocaleT()->locinfo->_public._locale_pctype[index] & mask;
}