Tuesday, February 19, 2019

Digest for comp.lang.c++@googlegroups.com - 10 updates in 2 topics

Manfred <noname@add.invalid>: Feb 19 04:35PM +0100

On 2/19/2019 4:01 PM, Bart wrote:
> example. How is it treating the top bit? It doesn't matter. The x86 will
> set flags both for an unsigned add, and a signed one. The resulting bit
> pattern is the same in both cases.
 
I know, but in the case of 'signed' It Happens to Work™, under the
assumption of an underlying two's complement representation.
In the case of 'unsigned' It Just Works™, guaranteed by the standard and
with no assumptions required.
When dealing with any kind of workable logic (math, algorithms, etc.),
being able to drop an assumption is added value.
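A minimal sketch of the distinction (assuming 32-bit int/unsigned int and
two's complement hardware; the values echo the ones used later in this
thread):
 
  unsigned int a = 1, b = 2;
  unsigned int u = a - b;   // guaranteed by the standard: wraps modulo 2^32,
                            // giving 0xFFFFFFFF on every conforming implementation
 
  int c = 2000000000, d = 1000000000;
  int s = c + d;            // undefined behaviour: signed overflow, even though
                            // typical hardware produces the bit pattern of
                            // 3000000000u, which reads back as -1294967296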
 
 
> C however (and presumably C++ inherits) says that the result is
> undefined for the second 'add' operation, even if it is always well-defined
> for x86.
 
You know that C as a language and x86 as an architecture are two
different things.
 
 
> No signed arithmetic is going on. But if you interpret the 4294967293
> result as two's complement, with the top bit as a sign bit, then you
> will get -3 if printed.
 
This too is based on the assumption of two's complement.
 
 
> Similarly, the c+d in the first example gives -1294967296, but
> 3000000000 if the same bit pattern is interpreted as an unsigned value.
 
As confirmation of the earlier argument: here -1294967296 as a result
falls into the category of 'rubbish', while 3000000000 does not.
 
 
> The two representations are closely related, in that the corresponding
> bit patterns are interchangeable, but C refuses to acknowledge that.
 
The two representations are interchangeable under the assumption of
two's complement. C has made the choice of lifting this constraint.
 
 
> That might be because two's complement representation is not universal,
> but it means some C compiler ending up doing unexpected things even on
> the 99.99% of machines which do use it.
 
I wouldn't look at it this way. I believe the rationale is that
wrapping overflow has negligible use for signed types, while it has
clear value for unsigned ones (see earlier example).
 
More than the compiler doing unexpected things, the point is that C
requires the programmer to pay attention to details.
It is not distracted-friendly.
Paavo Helde <myfirstname@osa.pri.ee>: Feb 19 10:31PM +0200

On 19.02.2019 21:57, Bart wrote:
> upper limit of +127. But what about (120+20)-20?
 
> This should end up back as 120 using two's complement, but according to
> C, it's undefined because an intermediate value overflows.
 
And rightly so. Try e.g. (120+20)/2 - 20: with those imaginary 8-bit
wrap-around rules the result would be -78, not the expected 50.
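Worked through under those hypothetical 8-bit wrapping rules:
 
  120 + 20 = 140, which wraps to 140 - 256 = -116
  -116 / 2 = -58
  -58 - 20 = -78     (instead of the mathematically expected 140/2 - 20 = 50)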
 
In C++ 8-bit ints get promoted to normal ints, so the above is not a
real example, but there are actual problems with larger numbers.
Effectively you are saying that some sloppy code is OK because it
accidentally happens to work most of the time, whereas some other very
similar sloppy code is not OK.
 
By the same token, one could claim that writing to the location just past
a dynamic array with an odd number of elements is OK, as the memory
allocator would leave an unused alignment gap there anyway.
Jorgen Grahn <grahn+nntp@snipabacken.se>: Feb 19 08:54PM

On Tue, 2019-02-19, Öö Tiib wrote:
> On Monday, 18 February 2019 20:56:42 UTC+2, David Brown wrote:
...
>> should be dropped in favour of the sanitizer, which is a more modern and
>> flexible alternative and which is actively maintained.
 
> Sanitizers sound like debugging options.
 
But that's what you want, isn't it?
 
> Why two almost equal features are developed into same tool?
 
I imagine someone at Google got funding for developing the ideas
behind ASan, did it, and got it into clang and then GCC. This guy
wasn't interested in removing the legacy mechanism, and the GCC
people weren't interested in delaying the work by insisting that it be
harmonized with -ftrapv.
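For reference, the two mechanisms as they are invoked today (a rough
sketch; option spellings assume a recent gcc or clang):
 
  g++ -ftrapv prog.cpp
      # legacy option: signed overflow traps and aborts the program
  g++ -fsanitize=signed-integer-overflow prog.cpp
      # sanitizer: the overflow is reported at run time with its source location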
 
Personally I'm more annoyed by ASan not being well integrated into
GCC: the Google-style user interface with annoying colors, and the
documentation on a wiki somewhere.
 
Still, I've seen much worse things happen to software.
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Manfred <noname@add.invalid>: Feb 19 09:55PM +0100

On 2/19/2019 8:57 PM, Bart wrote:
>> As a confirmation of earlier arguments, here -1294967296 as a result
>> falls into the category of 'rubbish', 3000000000 does not.
 
> So just like 1u-2u, but that one apparently is not classed as rubbish.
 
1u-2u is rubbish in conventional arithmetic, but it is not in wrapping
arithmetic.
Any programmer knows that the result is UINT_MAX (0xFFFFFFFF for 32-bit
unsigned int), and this is what C guarantees.
 
>> clear value for unsigned ones (see earlier example).
 
> After 40 years of experiencing exactly that behaviour on a fair number
> of machines and languages, it's what I've come to expect.
 
I'll rephrase: C requires, as with most program behaviour, that wrapping
be used intentionally by the programmer.
Since intentional use of wrapping implies that all bits follow the same
semantics, the unsigned type is the correct choice.
 
Using wrapping arithmetic with a signed type is not correct - purely
from a binary logic point of view, because the top bit would arbitrarily
be supposed to behave like the others - thus C assumes that a careful
programmer will not use it with signed types, and leaves compiler writers
the freedom to use this feature space however they like, by specifying
undefined behavior.
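A typical intentional use of unsigned wrapping (a minimal sketch; the
free-running tick counter is a hypothetical scenario):
 
  #include <cstdint>
 
  // Elapsed ticks from a free-running 32-bit counter. Correct even after
  // 'now' has wrapped past 'start', because unsigned subtraction is modular.
  std::uint32_t elapsed(std::uint32_t start, std::uint32_t now) {
      return now - start;
  }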
 
I would guess that since this has been clearly and intentionally
specified in the standard, at least /some/ use for this 'undefinedness'
must have been foreseen by the committee.
Manfred <noname@add.invalid>: Feb 19 05:43PM +0100

On 2/18/2019 7:56 PM, David Brown wrote:
> only work on benchmark" claims - the people writing them are widely
> dispersed with totally different kinds of employers.  In particular, the
> gcc developers fall into several categories:
 
I believe you about the categories below, but I find it hard to believe
that benchmarks don't matter.
In fact whenever results from different compilers are presented,
performance is one of the key metrics that is discussed, which is
obvious since correctness of the generated binary is no issue for any
self-respecting compiler.
 
Moreover, most occurrences of "undefined behavior" are justified by the
increased performance of compiler optimizations.
 
> chips, so their concern is that you (the programmer) get the best from
> the compiler and their chip.  Benchmarks for the compiler don't matter -
> support for the chip matters.
 
But they care that the chip is properly supported by the compiler, and
"properly" includes performance.
 
> provide tools and services to developers.  They want programmers to be
> happy with the tools - they don't care if you use a different compiler
> instead.
 
You may have a point in this category. I would say that more than making
/programmers/ happy, they want to make software project /managers/ happy,
which brings into the picture features like time-to-market, man-hours
cost etc., all goals that come before performance of the final product.
This is IMHO one reason why in many benchmarks MSVC usually comes after
clang and gcc (in that order).
 
 
> 3. Those working for big users, like Google and Facebook.  They don't
> care about benchmarks - they care about performance on their own software.
 
True, but they know that in order to get performance out of their
application software, they need compilers that perform well.
I believe clang developers know this very well.
 
 
> 4. The independent and volunteer developers.  They care about the
> quality of their code, and making something worthwhile - they don't care
> about benchmark performances.
 
Not really. Performance may not be their first goal (compared to e.g.
being the first to support the latest technology for the open source
community), but it is surely one of the goals, especially after first
introducing such new technologies.
David Brown <david.brown@hesbynett.no>: Feb 19 10:34PM +0100

On 19/02/2019 17:43, Manfred wrote:
>> the gcc developers fall into several categories:
 
> I believe you about the categories below, but I find it hard to believe
> that benchmarks don't matter.
 
I am not saying that benchmarks don't matter at all - but they are of
less consequence than other factors. Also, the current trend appears to
be moving away from synthetic benchmarks to using real applications.
Look at benchmarks on www.phoronix.com for examples.
 
> performance is one of the key metrics that is discussed, which is
> obvious since correctness of the generated binary is no issue for any
> self-respecting compiler.
 
Yes, speed is important - and it can be an easy number to show for
comparisons. But I really don't think it is the biggest factor in
deciding which compiler to use (or which compiler to buy) - people look
at how it supports the target they want, how it supports the languages
and language versions they want, what other tools are available, ease of
use, static error checking, extensions, documentation, support,
compatibility, speed of compilation. Compiler writers - and compiler
sellers - know this.
 
They also know that a compiler that generates unexpected nonsense from
code that works fine on other compilers will not be popular. Sometimes
they do optimise based on undefined behaviour anyway, either because the
gains are significant enough, or because they simply don't think much
code will be adversely affected.
 
 
> Moreover, most occurrences of "undefined behavior" are usually justified
> by increased performance of compiler optimizations.
 
That is one point, yes. Another is that it can help find mistakes. A
compiler can (with static checks or run-time checks) warn about signed
integer overflow, because it is not allowed in C. In a language that
allows it with wrapping, the tools have to accept it and can't help you
find the problem - despite the fact that it is almost certainly a bug in
the code.
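For instance, gcc and clang can instrument the code through the
undefined-behaviour sanitizer so that the overflow is reported at run
time (a minimal sketch; the file name is made up):
 
  // overflow.cpp -- compile with: g++ -fsanitize=signed-integer-overflow -g overflow.cpp
  #include <climits>
 
  int main() {
      int x = INT_MAX;
      return x + 1;   // signed overflow: undefined behaviour, flagged by the sanitizer
  }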
 
And for many types of undefined behaviour, there simply is no sensible
choice of what would be an expected and appropriate defined behaviour.
That is the case for signed integer overflow - there is /no/ sensible
alternative. The current situation in C makes "int" far more like real
mathematical integers than would be possible with any given choice of
behaviour - for example, (x + 1 > x) is always true in C and in
mathematics, but is not true in, say, Java.
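A small sketch of what that buys the optimiser (assuming 32-bit int and
unsigned int):
 
  // With signed int the compiler may assume the addition never overflows,
  // so this can legitimately be folded to 'return true;'.
  bool always(int x)         { return x + 1 > x; }
 
  // With unsigned there is no such licence: when x == UINT_MAX the sum
  // wraps to 0, so the comparison must actually be evaluated.
  bool sometimes(unsigned x) { return x + 1 > x; }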
 
Arguably there are some things that are undefined behaviour in C that
could have been fully defined or implementation defined, such as shift
operator behaviours. Some compilers /do/ define such behaviour fully.
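Typical shift cases the standards leave undefined, even though most
hardware would produce some result (assuming 32-bit int):
 
  unsigned a = 1u << 32;   // undefined: shift count >= width of the (promoted) operand
  unsigned b = 1u << -1;   // undefined: negative shift count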
 
Other kinds of undefined behaviour, such as out-of-bounds array access,
cannot possibly be anything other than undefined behaviour. There is
nothing a compiler could do here without adding very significant
overhead to the code. If you don't like a language that has undefined
behaviour, then C and C++ are not for you.
 
>> - support for the chip matters.
 
> But they care that the chip is properly supported by the compiler, and
> "properly" includes performance.
 
Yes. But they are interested in performance on real code, not
benchmarks - and they are only interested if the code compiles to give
the required results.
 
> cost etc., all goals that come prior to performance of the final product.
> This is IMHO one reason for which in many benchmarks MSVC comes usually
> after clang and gcc (in this order)
 
A good project manager is only happy when his/her programmers are happy.
I am not claiming that all managers are good, of course, and sometimes
their job may involve persuading the programmers to change what they are
happy with. But whoever decides, benchmark speeds should not be the
deciding factor (though speed on the developers' real code may be).
 
 
> True, but they know that in order to get performance out of their
> application software, they need performing compilers.
> I believe clang developers know this very well.
 
Yes. But again, it is performance on real code that is important, not
artificial benchmarks.
 
> being the first to support the latest technology for the open source
> community), but sure it is one of the goals, especially after first
> introducing such new technologies.
 
Again, synthetic benchmark performance is not the goal.
David Brown <david.brown@hesbynett.no>: Feb 19 10:35PM +0100

On 19/02/2019 16:11, Bart wrote:
>> compilers - do you think they should work like Pascal or Fortan
>> compilers?  It is a different language.
 
> No, I meant at odds even with other C compilers.
 
Undefined behaviour means there is no definition of what the code should
do. Different compilers can implement it in different ways and give
different results. That does not mean that one compiler is "right" and
the other "wrong" - the source code is wrong, and both compilers are right.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 19 02:30PM -0800


> Your reddit critic quoted text, presumably from your code that he was
> commenting about, that doesn't match any of the code you've posted on
> this thread.
 
Huh? Wow. I posted the same ct_rwmutex here that is in the reddit post:
 
https://pastebin.com/raw/xCBHY9qd
 
The ct_rwmutex is the exact same one I posted here, in this thread:
 
https://groups.google.com/d/msg/comp.lang.c++/q4dZJFQxpdg/hzHBsE1YBQAJ
 
The original post in this thread was about boiling down the math to
address the question over on Reddit. There is no UB, and no integer
overflow or underflow, as long as the number of readers never exceeds
LONG_MAX.
 
 
 
 
> The topic of this thread being the validity of his comments,
> I've no particular interest in diving deeply into code that isn't the
> code he was commenting about.
 
It is the exact same algorithm I posted here, in this very thread.
 
https://groups.google.com/d/msg/comp.lang.c++/q4dZJFQxpdg/hzHBsE1YBQAJ
 
And on here in the reddit thread:
 
https://www.reddit.com/r/cpp/comments/are68n/experimental_readwrite_mutex/
 
Where is it different? I just boiled down the code into a simple form
that shows how m_count will never be less than -LONG_MAX, nor greater
than LONG_MAX, and how it can take any value in between. There is no UB.
Also, wrt the lock() function, count will never be negative.
 
 
> I'm certainly not interested in your code
> for its own sake. I just scanned your code, made a few comments, and
> cut everything not directly relevant to the comments I made.
 
But you cut a very important part of the lock() function. There can
only be a single thread at a time that takes writer access to my rwmutex.
You cannot just cut something out of a sensitive synchronization algorithm
and pretend it never happened!
 
 
> Note - another indication that he was commenting on different code than
> I was looking at is his comment that m_count is never decremented. The
> code you presented does decrement m_count.
 
He seems to be a troll: he cuts out important context, and asks
questions as if that cut context never existed. Wow. He must be doing it
on purpose. He said m_count never decrements because he does not know how
to comprehend the algorithm, or he is doing it on purpose.
 
[...]
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 19 03:00PM -0800

On 2/19/2019 6:14 AM, Ralf Goertz wrote:
>> from 0xFFFFFFFF to 0x00000000 can make sense.
 
> Does that really matter? If I have a signed short set to its maximum
> possible value (32767 here) then add 1, I get -32768.
 
Fwiw, if I allowed my ct_rwmutex algorithm to break the bounds of a long
wrt -LONG_MAX and LONG_MAX, it would not work at all. LONG_MAX + 1, or
-LONG_MAX - 1, would ruin everything! UB aside for a moment... It would
screw up the counting, and things would start to deadlock all over the
place.
 
So in my algorithm, there is no overflow or underflow, by design. It is
a key aspect that makes the algorithm correct. This boils down to the
reason why my algorithm cannot deal with _more_ than LONG_MAX readers
hitting it at the same time.
 
 
 
jameskuyper@alumni.caltech.edu: Feb 19 12:27PM -0800


> std::vector<unsigned int>::const_iterator operator()(unsigned int i, unsigned int j) const {
> assert(j > i);
 
> if ((j-i) % 4 != 0 || (j-i) / 4 != l || (i-2) % 4 != 0)
 
(j-i)%4 != 0 || (j-i)/4 != 1 is equivalent to
!((j-i)%4 == 0 && (j-i)/4 == 1) (de Morgan's laws).
 
The meaning of the % operator is defined by the following requirement,
if and only if a/b is representable in the result type (7.6.5p4):
 
(a/b)*b + (a%b) == a
 
Therefore, (j-i)%4 == 0 && (j-i)/4 == 1 implies that
 
1*4 + 0 == j-i
 
Therefore you can simplify that condition to
 
if (j != i + 4 || (i-2) % 4 != 0)
 
Does that re-write correctly express the condition you intended to
express? If not, then (assuming I carried out the above simplifications
correctly) neither does your original.
 
Note also that for positive i, (i-2)%4==0 is equivalent to i%4 == 2 (for
negative values of i, it's equivalent to i%4 == -2). If you know that i
is always positive, that could also be used to simplify your code.
 
> return k.end();
 
> return std::find(k.begin(), k.end(), (i-2) / 4) - k.begin();
 
std::find() returns an iterator of the same type as passed into it.
Subtracting k.begin() from that iterator gives a value of the type
std::vector<unsigned int>::difference_type, which must be a signed
integer type.
 
> }
> };
 
> Here is the revised code. Basically, what I want is to output the index of a value inside the vector k, if it exists. The presence of this value is determined by the if statement. I'm not sure if the type of const_iterator is correct, but I think the rest is correct.
 
Your function should be returning
std::vector<unsigned int>::difference_type. Such a type will not, in
general, be implicitly convertible to an iterator type such as the one
you actually specified.
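A minimal sketch of what the revised function might look like with that
return type, assuming the simplified condition above and a member
std::vector<unsigned int> named k (the enclosing class name is made up;
k.size() plays the "not found" role, which is what
std::find(...) - k.begin() yields when the value is absent):
 
  #include <algorithm>
  #include <cassert>
  #include <vector>
 
  struct Lookup {   // hypothetical name; the original class was not shown in full
      std::vector<unsigned int> k;
 
      std::vector<unsigned int>::difference_type
      operator()(unsigned int i, unsigned int j) const {
          assert(j > i);
          if (j != i + 4 || (i - 2) % 4 != 0)
              return k.end() - k.begin();   // == k.size(): value cannot be present
          return std::find(k.begin(), k.end(), (i - 2) / 4) - k.begin();
      }
  };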
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.
