Wednesday, November 6, 2019

Digest for comp.lang.c++@googlegroups.com - 25 updates in 3 topics

"Öö Tiib" <ootiib@hot.ee>: Nov 05 03:49PM -0800

On Tuesday, 5 November 2019 23:48:46 UTC+2, Manfred wrote:
> programming error: even if this is true I think that unsigned overflow
> should have defined behavior (and wrap) rather than being handled as an
> error by the compiler.
 
I just listed facts that a lot of people agree with, like
"in the majority of cases overflow (even unsigned) is a programming error".
I did not say what to conclude from these facts here.
 
 
> > Physically damaged, disconnected or short-circuited temperature sensor
> > can no way repair or reconnect itself.
> Undoubtedly, but that's not what I wrote.
 
Ok.
 
> My point is that rather than using NaNs the hardware or driver should
> raise specific error signals (like some error code on the control I/O
> port, or at the API level) instead.
 
The device has to operate on incomplete data, and a saturating silent
NaN works perfectly as that missing part of the data. A driver that
keeps panicking, throwing up and signalling has to be killed to reduce
the disturbance.
Panic solves nothing, regardless of whether you are Schwarzenegger or not. ;)
Bonita Montero <Bonita.Montero@gmail.com>: Nov 06 07:08AM +0100

>> And That's not how computers work.
 
> That is utterly irrelevant.
 
You can rely on p0907r0 being included in an upcoming standard,
and then all implementations will have std::numeric_limits<signed...>::
is_modulo set to true; so g++ must drop the shown optimization.
There are so many language properties that represent how a CPU
logically works, why not this property?
David Brown <david.brown@hesbynett.no>: Nov 06 09:30AM +0100

On 06/11/2019 07:08, Bonita Montero wrote:
 
>> That is utterly irrelevant.
 
> You can rely on that p0907r0 will be included in an upcoming standard
> and all implementations will have std::numeric_limits<signed...>::
 
Have you actually /read/ the paper, and its subsequent revisions (we are
now on p0907r4) ?
 
<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r4.html>
 
Signed integer overflow remains undefined behaviour. This is what the
majority of the committee, the majority of compiler vendors, and the
majority of users want.
 
> is_modulo to be set to true; so g++ must drop the shown optimization.
 
"is_modulo" can be (but doesn't need to be) set to true if the
implementation gives signed integer arithmetic wrapping semantics.
 
/If/ an implementation has is_modulo set true for signed types, then you
are correct that it can't do the kind of optimisations I showed (or many
other optimisations). gcc, clang and MSVC currently have is_modulo
false for signed integer types, and do not guarantee wrapping behaviour.
This is fine, and the way it should be. (gcc and clang leave it false
even under "-fwrapv", which is also fine.)
 
> There are so many language-properties that represent how a CPU
> logically works, why not this property?
 
C and C++ are high level languages, abstracted from the underlying cpu.
 
And it has already been explained to you why undefined signed integer
overflow is a good idea.
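A small illustration (my own sketch, not from the paper) of what that
undefined behaviour buys the optimiser:

```cpp
#include <cassert>

// Because signed overflow is undefined, gcc and clang may assume that
// "x + 1 > x" always holds for int and fold this whole function to
// "return true", with no addition and no branch. Under -fwrapv they
// must keep the comparison, since x == INT_MAX would wrap to INT_MIN
// and the result would then be false.
bool plus_one_is_bigger(int x)
{
    return x + 1 > x;
}
```

With undefined-overflow semantics the compiled function is a constant;
with wrapping semantics it cannot be simplified at all.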
Bonita Montero <Bonita.Montero@gmail.com>: Nov 06 09:57AM +0100

>> There are so many language-properties that represent how a CPU
>> logically works, why not this property?
 
> C and C++ are high level languages, abstracted from the underlying cpu.
 
C isn't high-level and C++ is high-level as well as low-level.
And the issue we're talking about is low-level.
Manfred <noname@add.invalid>: Nov 06 01:33PM +0100

On 11/5/2019 8:45 PM, Paavo Helde wrote:
 
>> With the same reasoning you could say that unsigneds might never
>> wrap; but in fact they're specified to wrap.
 
> In retrospect, this (wrapping unsigneds) looks like a major design mistake.
 
No, it isn't.
 
 
> IMO, wrapping integers (signed or unsigned) are an example of
> "optimization which nobody asked for", and they are there basically only
> because the hardware happened to support such operations.
 
Look at the following code and see for yourself how efficient it is to
check for integer overflow if unsigned integers do wrap.
Achieving the same would be much more verbose (and less efficient) if
unsigned overflow were not defined behavior.
 
(taken from http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1969.htm)
 
char* make_pathname (const char *dir, const char *fname, const char *ext)
{
    size_t dirlen = strlen (dir);
    size_t filelen = strlen (fname);
    size_t extlen = strlen (ext);
 
    size_t pathlen = dirlen;
 
    // detect and handle integer wrapping
    if ( (pathlen += filelen) < filelen
      || (pathlen += extlen) < extlen
      || (pathlen += 3) < 3)
        return 0;
 
    char *p, *path = malloc (pathlen);
    if (!path)
        return 0;
 
    p = memcpy (path, dir, dirlen);
    p [dirlen] = '/';
 
    p = memcpy (p + dirlen + 1, fname, filelen);
    p [filelen] = '.';
 
    memcpy (p + filelen + 1, ext, extlen + 1);
 
    return path;
}
Bonita Montero <Bonita.Montero@gmail.com>: Nov 06 01:51PM +0100

>           || (pathlen += extlen) < extlen
>           || (pathlen += 3) < 3)
>           return 0;
 
Sorry, but when are paths longer than size_t can represent?
"Öö Tiib" <ootiib@hot.ee>: Nov 06 05:26AM -0800

On Wednesday, 6 November 2019 00:00:16 UTC+2, David Brown wrote:
 
> Agreed (where "trap" could mean any kind of notification, exception,
> error log, etc.). But this is something you might only want during
> debugging - it is of significant efficiency cost.
 
Indeed, the majority of programming errors should be found during
debugging.
 
 
> I like that in debugging or finding problems - with tools like
> sanitizers. But I would not want that in normal code. With this kind
> of semantics, the compiler can't even simplify "x + 1 - 1" to "x".
 
Maybe it can and maybe it can't; that depends on the wording.
I have not really thought through how to word the semantics
precisely. The major purpose is to get errors when the program
actually stores a value into a type where it does not fit (IOW
really overflows).
 
An analogous argument is that automatic storage overflow may
not be trappable in principle, since that would disallow optimizing
recursions (that exhaust the stack) into loops (that don't). The
rules can still likely be worded in a way that allows an
implementation not to trap when it manages to get the job done
without exhausting automatic storage somehow.
 
> turn it into the most efficient results. I intentionally use an
> optimising compiler for C and C++ programming - when efficiency doesn't
> matter, I'll program in Python where integers grow to avoid overflow.
 
I do almost the same, but I think of some of it slightly differently.
The exact formula is unfortunately more important in programming
than its clarity and intuitiveness for the reader.
 
For example, we need to calculate the average of two values A and B.
Mathematically there are lots of ways to calculate it, and which is
most intuitive may depend on the meaning of A and B. Like:
1) (A + B) / 2
2) A / 2 + B / 2
3) A + (B - A) / 2
4) B + (A - B) / 2
etc.
But in software these can be very different expressions, because
they have different potential overflows and/or losses of
accuracy. Until something helps to reduce that issue, it is
all about exactly that formula, period.
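A sketch of that point with int (the names midpoint_naive and
midpoint_safe are my own): formulas 1 and 3 are mathematically equal
but overflow differently.

```cpp
#include <climits>

// (A + B) / 2 can overflow even when the average itself is
// representable: for a = INT_MAX - 1, b = INT_MAX, the sum a + b
// overflows (undefined behaviour). A + (B - A) / 2 avoids that
// overflow for A <= B, because B - A and the final sum both fit.
int midpoint_naive(int a, int b) { return (a + b) / 2; }
int midpoint_safe(int a, int b)  { return a + (b - a) / 2; }
```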
 
As for efficiency, it is anyway often uncertain until it is
shown where the bottlenecks are. Often it can only be
shown by profiling the product with realistic worst-case loads
of data, and then it is usually a small subset of the code that
can change the overall efficiency.
 
Python I use less not because of its performance but
because I have failed to use it scalably. Lots of little script
programs are great, but when any of those starts to grow bigger,
my productivity with it drops. The same size feels
nonsensically unimportant for C++. Somehow in C++ I have
learned to separate different concerns and to abstract details
away, but not in Python.
 
> the operations that have the behaviour, not the types. However, I can't
> see a convenient way to specify overflow behaviour on operations - using
> types is the best balance between flexibility and legible code.
 
I mean totally new "advanced" operators like (A +% B) or (C +^ D).
Yes, there will be precedence (and maybe associativity etc.)
to define, but that is business as usual and not some show-stopper
issue. In some languages (like Swift) it is done and it seems to
work fine.
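In C++ such operators can only be approximated with named helpers (a
sketch; wrap_add and checked_add are my own names, and
__builtin_add_overflow is a gcc/clang extension, not standard C++):

```cpp
#include <cstdint>
#include <optional>

// Wrapping add, like Swift's &+ : do the addition in unsigned, which
// is defined to wrap, then convert back. (The unsigned-to-signed
// conversion is modular since C++20; implementation-defined before.)
std::int32_t wrap_add(std::int32_t a, std::int32_t b)
{
    return static_cast<std::int32_t>(
        static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b));
}

// Checked add: report overflow instead of wrapping or trapping.
std::optional<std::int32_t> checked_add(std::int32_t a, std::int32_t b)
{
    std::int32_t r;
    if (__builtin_add_overflow(a, b, &r))  // gcc/clang builtin
        return std::nullopt;
    return r;
}
```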
David Brown <david.brown@hesbynett.no>: Nov 06 02:37PM +0100

On 06/11/2019 13:33, Manfred wrote:
>           || (pathlen += extlen) < extlen
>           || (pathlen += 3) < 3)
>           return 0;
 
That is just silly, in all sorts of ways.
 
First, decide if the function is an "internal" function where you can
trust the parameters, and have undefined behaviour if assumptions don't
hold, or an "external" function where you have to check the validity of
the parameters.
 
If it is internal, you know the lengths of the passed strings will not
sum to more than 4G - or you don't care if someone does something
ridiculous. (And on most modern systems, size_t is 64-bit - overflowing
here would require 16 EB ram for storing the strings.)
 
If it is external, the checking is too little - if you have char*
pointers from an unknown source, you should be wary about running
strlen() on them because you don't know if it will actually end with a 0
in a reasonable limit.
 
 
You only need to check for overflow if it is possible for the
calculations to overflow. If the operands are too small to cause an
overflow, there will not be an overflow.
 
And until you are talking about large integers for cryptography or that
sort of thing, adding up realistic numbers will not overflow a 64-bit type.
 
So /if/ you have an old 32-bit size_t system, and /if/ you have
maliciously crafted parameters that point to huge strings (and you'll
have to make them point within the same string - you don't get over 4 GB
user memory address space with 32-bit size_t), then you can do your
adding up using 64-bit types and you get zero risk of overflow.
 
 
uint_least64_t dirlen = strlen (dir);
uint_least64_t filelen = strlen (fname);
uint_least64_t extlen = strlen (ext);
 
uint_least64_t pathlen = dirlen + filelen + extlen;
 
if (pathlen > MAX_SANE_PATHLENGTH) return 0;
 
 
There are times when unsigned wrapping overflow is useful. This is not
one of them.
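For completeness: gcc and clang also provide checked-arithmetic
builtins, so the length computation can report overflow without relying
on wrapping at all (my own sketch, not a drop-in for the glibc example):

```cpp
#include <cstring>
#include <cstddef>

// Overflow-checked total length using __builtin_add_overflow
// (a gcc/clang extension); returns 0 on overflow, like the original.
std::size_t path_length(const char *dir, const char *fname, const char *ext)
{
    std::size_t pathlen;
    if (__builtin_add_overflow(std::strlen(dir), std::strlen(fname), &pathlen)
        || __builtin_add_overflow(pathlen, std::strlen(ext), &pathlen)
        || __builtin_add_overflow(pathlen, std::size_t{3}, &pathlen))
        return 0;
    return pathlen;  // room for '/', '.' and the terminating 0
}
```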
Manfred <noname@add.invalid>: Nov 06 03:38PM +0100

On 11/6/2019 2:37 PM, David Brown wrote:
>>           || (pathlen += 3) < 3)
>>           return 0;
 
> That is just silly, in all sorts of ways.
 
You realize that this comes from the glibc maintainers, don't you?
You can say they wrote silly code for this example (I don't), but I
doubt there are many people more knowledgeable about this kind of
matter than they are.
 
Moreover, I took this as an example of detection of integer overflow.
The fact that it happens to be about pathname strings is irrelevant to
this discussion.
 
> trust the parameters, and have undefined behaviour if assumptions don't
> hold, or an "external" function where you have to check the validity of
> the parameters.
 
This example was written about code safety, so yes, I believe it is
pretty clear the assumption is that strings come from an external source.
Obviously this applies to string /contents/; the pointers themselves can
only be internal to the program (can't they?), so there is no need to
check for null pointers.
On the other hand, the contents of the strings are checked by ensuring
that the results of strlen and their combination are valid. This is
ensured /exactly/ by making use of unsigned wrapping behavior.
 
> pointers from an unknown source, you should be wary about running
> strlen() on them because you don't know if it will actually end with a 0
> in a reasonable limit.
 
This code handles C strings, so there is no way to check for their
length other than running strlen.
The fact that you seem to miss is that it is exactly thanks to the check
that you call "silly" that it is ensured that they "actually end with a
0 in a reasonable limit".
 
We could argue about what happens with /read/ access to a
non-0-terminated string, but I would simply assume that the strings are
0 terminated, since the function is going to be called by some other
part of the program that can take care that there is a 0 at the end of
the buffer. What is not guaranteed is that the strings actually contain
pathnames, and don't contain very long malicious text instead (e.g. they
could come from stdin).
That risk is avoided by the code you call silly.
So, no, there is not too little checking.
 
> overflow, there will not be an overflow.
 
> And until you are talking about large integers for cryptography or that
> sort of thing, adding up realistic numbers will not overflow a 64-bit type.
 
In fact cryptography is another example where unsigned wrap is useful,
but it would be much more complex (and off topic) to draw an example
from that area (not that I claim to be an expert in it).
 
And no, just assuming that "adding up realistic numbers will not
overflow a 64-bit type" is not what safe code is about.
 
> uint_least64_t extlen = strlen (ext);
 
> uint_least64_t pathlen = dirlen + filelen + extlen;
 
> if (big_size_t > MAX_SANE_PATHLENGTH) return 0;
 
You realize that this code is less efficient than the original one,
don't you?
And what would be the correct value for MAX_SANE_PATHLENGTH? Are you
aware of the trouble that has been caused by Windows MAX_PATH?
 
The example I posted achieves the same level of safety, using less
resources, and allowing for the maximum string length that the system
can /safely/ handle (don't miss the check after malloc). What more do
you want?
 
 
> There are times when unsigned wrapping overflow is useful. This is not
> one of them.
 
I suggest you read the code again (and its source - it is instructive).
Bonita Montero <Bonita.Montero@gmail.com>: Nov 06 03:53PM +0100

> You can say they wrote silly code for this example (I don't), but I
> doubt there are many more knowledgeable people about this kind of
> matter than them.
 
Yes, this is useless code.
James Kuyper <jameskuyper@alumni.caltech.edu>: Nov 06 10:06AM -0500

On 11/6/19 9:38 AM, Manfred wrote:
...
> And no, just assuming that "adding up realistic numbers will not
> overflow a 64-bit type" is not what safe code is about.
 
Assuming it: no. Verifying it: yes. If you validate your inputs, you can
often place upper and lower limits on the value of an expression
calculated from those inputs. If those limits fall within the range that
is guaranteed to be representable in the expression's type, it is
perfectly legitimate not to bother including an overflow check.
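A sketch of that idea (the bound kMaxInput is my own, made-up limit):

```cpp
#include <cstdint>
#include <optional>

// If each input is validated to lie in [0, 1'000'000], the sum is
// bounded by 3'000'000, far inside int32_t's range, so the addition
// itself needs no overflow check.
constexpr std::int32_t kMaxInput = 1'000'000;

std::optional<std::int32_t> sum3(std::int32_t a, std::int32_t b, std::int32_t c)
{
    for (std::int32_t v : {a, b, c})
        if (v < 0 || v > kMaxInput)
            return std::nullopt;  // reject out-of-range input up front
    return a + b + c;             // cannot overflow after validation
}
```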
Paavo Helde <myfirstname@osa.pri.ee>: Nov 06 05:54PM +0200

On 6.11.2019 14:33, Manfred wrote:
 
> memcpy (p + filelen + 1, ext, extlen + 1);
 
> return path;
> }
 
Seriously?
 
std::string make_pathname(const std::string& dir,
const std::string& fname, const std::string& ext)
{
return dir + "/" + fname + "." + ext;
}
 
No need to check for any overflows.
 
Not to speak of the fact that there cannot be an overflow in the first
place, because if pathlen overflowed, the three strings dir, fname and
ext would not fit in the process memory anyway.
 
Not to speak of the fact that the time lost on a more explicit overflow
check would be zero or unmeasurable compared to any file access itself,
or even compared to the malloc() call in the same function.
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Nov 06 05:00PM +0100

On 06.11.2019 15:38, Manfred wrote:
> You can say they wrote silly code for this example (I don't), but I
> doubt there are many more knowledgeable people about this kind of matter
> than them.
 
David has a point that with 32-bit `size_t` there's no way to have
separate strings whose lengths sum to >= 4G.
 
So some of the arguments have to point within the same superlong string
in order for the checking to end up at `return 0;`.
 
Whether it's silly to try to give well-defined behavior also for such an
unlikely case: maybe silly when one just codes up something for limited
use and with limited time, but probably not silly when one's crafting
widely used library code.
 
I.e. the context, what it's made for, "glibc", is important.
 
However I think the appeal to authority, "glibc /maintainers/", is a
fallacious argument.
 
 
- Alf
Manfred <noname@add.invalid>: Nov 06 05:47PM +0100

On 11/6/2019 5:00 PM, Alf P. Steinbach wrote:
>> matter than them.
 
> David has a point that with 32-bit `size_t` there's no way to have
> separate strings whose lengths sum to >= 4G.
 
I should check the details (if I had the time and will to do it), but
even if this is true for the physical memory address space, if I
remember correctly the 386 has a way larger virtual memory addressing
space: it does have segmenting capability, even if most OSs never used it.
I don't remember whether it is possible for the 386 to address more than
4G within a single process, though. Theoretically it is nonetheless
possible, using segments, for a 32-bit architecture to have the
lengths sum up to more than 4G.
 
More practically, the example was about code safety, and so the
possibility of malicious usage has to be assumed, hence the need for the
check (at least for the +3 part).
 
 
> I.e. the context, what it's made for, "glibc", is important.
 
> However I think the appeal to authority, "glibc /maintainers/", is a
> fallacious argument.
 
It would be if it were only an appeal to authority.
After giving context (and yes, pointing out that this example was not
just rubbish taken from some dump on the internet), my argument in the
followup went into the substance of the matter.
 
Bonita Montero <Bonita.Montero@gmail.com>: Nov 06 05:54PM +0100

> I remember correctly the 386 has way larger virtual memory addressing
> space: it does have segmenting capability, even if most OSs never used
> it.
 
And there is an operating-system using the glibc in a segmented
environment?
 
> 4G within a single process, though. Theoretically it is nonetheless
> possible, using segments, to have a 32-bit architecture wherein the
> lengths sum up to more than 4G.
 
I think it would be rather stupid to continue the segmented behaviour
of the 286 protected mode with the 386 protected mode, although it is
hypothetically possible. Also because 32-bit machines almost never
had more memory than 4GB.
Manfred <noname@add.invalid>: Nov 06 06:15PM +0100

On 11/6/2019 4:54 PM, Paavo Helde wrote:
>     return dir + "/" + fname + "." + ext;
> }
 
> No need to check for any overflows.
 
How do you think that overflow check is done inside std::string?
 
 
> Not to speak about that the time lost for a more explicit check for
> overflow would be zero or unmeasurable, compared to any file access
> itself, or even when compared to the malloc() call in the same function.
 
What do you mean with "more explicit check for overflow"?
 
Assuming you know the variables are 32-bit unsigned, I suppose you could do
 
if (pathlen < 0xFFFFFFFF - filelen)
{
    pathlen += filelen;
}
else
{
    return 0;
}
 
and then repeat, but honestly I don't see the benefit of it compared to
the above (for a start, you are introducing a dependency on the specific
integer size).
Or you can cast to a wider type, but then you are not solving the
problem, you are only moving it forward, and I still wouldn't see the
benefit.
 
Besides, this is about /integer/ overflow checks, so it could apply to
computations other than memory sizes.
Bonita Montero <Bonita.Montero@gmail.com>: Nov 06 06:33PM +0100

> What do you mean with "more explicit check for overflow"?
 
Concatenating strings in C++ with the + operator is reliable.
"Öö Tiib" <ootiib@hot.ee>: Nov 06 10:55AM -0800

On Wednesday, 6 November 2019 19:16:08 UTC+2, Manfred wrote:
> > }
 
> > No need to check for any overflows.
 
> How do you think that overflow check is done inside std::string?
 
All standard library writers are rather good programmers.
Obviously they have something where it is easy to see from afar that it
can in no way overflow. Likely it is some short inline member, called
when the size is supposed to grow, that does the check:
 
if (max_size() - size() < size_to_add)
    throw std::length_error(text_to_throw);
 
Why don't you look into any of implementations in your computer?
Bo Persson <bo@bo-persson.se>: Nov 06 08:00PM +0100

On 2019-11-06 at 17:54, Bonita Montero wrote:
> of the 286 protected mode with the 386 protected mode, although it is
> hypothetically possible. Also because the 32-bit-machnise almost never
> had more memory than 4GB.
 
The original problem wasn't only about memory.
 
The designers of Windows NT *did* briefly consider adding support for
more than one 4GB segment in a program.
 
However, to load a new segment you first have to swap the old 4GB
segment out to disk. And they couldn't see PC hard disks ever becoming
that large. :-)
 
 
Bo Persson
Bonita Montero <Bonita.Montero@gmail.com>: Nov 06 08:13PM +0100

> The designers of Windows NT *did* briefly consider adding support for
> more than one 4GB segment in a program.
 
They actually have it today for a small segment in which the
thread information block resides:
https://en.wikipedia.org/wiki/Win32_Thread_Information_Block
 
> However, to load a new segment you first have to swap the old 4GB
> segment out to disk.
 
That's not necessarily true.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Nov 06 12:03AM -0800

On 10/24/2019 9:02 PM, Ian Collins wrote:
>> wait() family of functions to detect when a child dies.
 
> Or simply use the platform's service management framework rather than
> reinventing it!
 
That works. There is usually a basic start/resume/pause/shutdown
protocol wrt services. I am just fond of creating a little webserver for
each main service.
queequeg@trust.no1 (Queequeg): Nov 06 01:16PM

>>will the process be stopped before raise() returns? My test shows that
>>yes, but I don't know if it's guaranteed or only a coincidence.
 
> Yes, it is guaranteed.
 
Ok, thanks.
 
--
https://www.youtube.com/watch?v=9lSzL1DqQn0
Bonita Montero <Bonita.Montero@gmail.com>: Nov 06 11:43AM +0100

I just found that alloca() with MSVC isn't as fast as it could be.
alloca() calls a function called __chkstk which touches the pages down
the stack to trigger Windows' overcommitting of stacks. That's because
Windows is only able to allocate new pages to a stack when the pages
are touched successively down the stack, i.e. you'll get an exception
if you skip a page.
So I came to the conclusion to write a little class that has a static
internal buffer with two template parameters: first the type of the
internal static array and second the size of the array. The constructor
takes a parameter which will be the final size of the container;
if it is larger than the second template parameter, an external array
will be allocated via new T[n].
The allocation will have more overhead than an alloca(), but my idea
is that if there are a larger number of entries, the processing time
on the entries will outweigh the allocation.
I'm asking myself if there's a class in boost or another well-known
class library that implements the same pattern.
 
So here's the code:
 
#pragma once
#include <cstddef>
#include <utility>
#include <stdexcept>
#include <algorithm>
 
template<typename T, std::size_t N>
struct overflow_array
{
    overflow_array( std::size_t n );
    ~overflow_array();
    T &operator []( std::size_t i );
    T &front();
    T &back();
    T *data();
    T *begin();
    T *end();
    T const *cbegin() const;
    T const *cend() const;
    void resize( std::size_t n );
 
private:
    T m_array[N];
    T *m_external;
    T *m_begin,
      *m_end;
};
 
template<typename T, std::size_t N>
inline
overflow_array<T, N>::overflow_array( std::size_t n )
{
    if( n <= N )
    {
        m_external = nullptr;
        m_begin = m_array;
        return;
    }
    m_external = new T[n];
    m_begin = m_external;
    m_end = m_external + n;
}
 
template<typename T, std::size_t N>
inline
overflow_array<T, N>::~overflow_array()
{
    if( m_external )
        delete []m_external;
}
 
template<typename T, std::size_t N>
inline
T &overflow_array<T, N>::operator []( std::size_t i )
{
    return m_begin[i];
}
 
template<typename T, std::size_t N>
inline
T &overflow_array<T, N>::front()
{
    return *m_begin;
}
 
template<typename T, std::size_t N>
inline
T &overflow_array<T, N>::back()
{
    return m_end[-1];
}
 
template<typename T, std::size_t N>
inline
T *overflow_array<T, N>::data()
{
    return m_begin;
}
 
template<typename T, std::size_t N>
inline
T *overflow_array<T, N>::begin()
{
    return m_begin;
}
 
template<typename T, std::size_t N>
inline
T *overflow_array<T, N>::end()
{
    return m_end;
}
 
template<typename T, std::size_t N>
inline
T const *overflow_array<T, N>::cbegin() const
{
    return m_begin;
}
 
template<typename T, std::size_t N>
inline
T const *overflow_array<T, N>::cend() const
{
    return m_end;
}
 
template<typename T, std::size_t N>
inline
void overflow_array<T, N>::resize( std::size_t n )
{
    if( n <= N )
        return;
    T *newExternal = new T[n];
    std::copy( m_begin, m_end, newExternal );
    delete []m_external;
    m_external = newExternal;
    m_begin = newExternal;
    m_end = newExternal + n;
}
Bonita Montero <Bonita.Montero@gmail.com>: Nov 06 11:46AM +0100

>     {
>         m_external = nullptr;
>         m_begin = m_array;
m_end = m_array + n;
Paavo Helde <myfirstname@osa.pri.ee>: Nov 06 01:18PM +0200

On 6.11.2019 12:43, Bonita Montero wrote:
> on the entires will outweigh the allocation.
> I'm asking myself if there's a class in boost or another well-known
> classlib that implements the same pattern.
 
Looks like boost::container::small_vector:
"https://www.boost.org/doc/libs/1_71_0/doc/html/boost/container/small_vector.html"