Tuesday, February 19, 2019

Digest for comp.lang.c++@googlegroups.com - 25 updates in 3 topics

rgladkik@sfu.ca: Feb 19 10:06AM -0800

Sorry for the late reply and thanks to those who responded. I agree that what I had programmed made no sense. I'm a beginner.
 
struct NonlocalBonds {
    std::vector<unsigned int> k;
    unsigned int l;

    std::vector<unsigned int>::const_iterator operator()(unsigned int i, unsigned int j) const {
        assert(j > i);

        if ((j - i) % 4 != 0 || (j - i) / 4 != l || (i - 2) % 4 != 0)
            return k.end();

        return std::find(k.begin(), k.end(), (i - 2) / 4) - k.begin();
    }
};
 
Here is the revised code. Basically, what I want is to output the index of a value inside the vector k, if it exists. The presence of this value is determined by the if statement. I'm not sure if the type of const_iterator is correct, but I think the rest is correct.
rgladkik@sfu.ca: Feb 19 10:13AM -0800

I don't know if this is the best method to accomplish what I want to do.
Jorgen Grahn <grahn+nntp@snipabacken.se>: Feb 19 08:22PM

> index of a value inside the vector k, if it exists. The presence of
> this value is determined by the if statement. I'm not sure if the
> type of const_iterator is correct, but I think the rest is correct.
 
I still don't understand your goal, but with the above I can at least
make partial suggestions. Does this capture what you mean?
 
std::vector<unsigned>::const_iterator
operator() (unsigned i, unsigned j) const {
    unsigned val = f(i, j);
    return std::find(k.begin(), k.end(), val);
}
 
I broke out the calculation of the value to a separate helper function
f() so it doesn't distract.
 
You seem to be a bit undecided about whether you want to return an
iterator into k, or an index into k. Sometimes one works best, sometimes
the other -- it depends on what the calling code wants. If you return an
index, you have to decide what to return if you don't find the value.
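Both options can be sketched side by side (a hedged illustration, not the original poster's code; `Finder` and `NOT_FOUND` are invented names):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Finder {
    std::vector<unsigned> k;

    // Variant 1: return an iterator; k.end() signals "not found".
    std::vector<unsigned>::const_iterator find_iter(unsigned val) const {
        return std::find(k.begin(), k.end(), val);
    }

    // Variant 2: return an index; a sentinel value signals "not found".
    static constexpr std::size_t NOT_FOUND = static_cast<std::size_t>(-1);

    std::size_t find_index(unsigned val) const {
        auto it = std::find(k.begin(), k.end(), val);
        return it == k.end() ? NOT_FOUND
                             : static_cast<std::size_t>(it - k.begin());
    }
};
```

The iterator form composes well with other standard algorithms; the index form is handy when the position is stored or serialized, at the cost of inventing a sentinel.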
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 03:40PM -0800

> happens if the following code is executed at a time when that is the
> case?
 
>> if (m_count.fetch_add(1, std::memory_order_release) < 0)
 
That would be unlocking a reader. Without contention from a writer this is:
______________________
m_count = LONG_MAX
 
// reader
 
lock_shared():
fetch_add(m_count, -1);
m_count = LONG_MAX - 1
 
 
// read access
 
 
unlock_shared():
fetch_add(m_count, 1);
m_count = LONG_MAX
______________________
 
 
You snipped way too much of my code.
 
 
 
 
> You said that m_count can have any value in the range -LONG_MAX to
> LONG_MAX, inclusive. Under what circumstances would m_count be negative?
 
When there is a writer.
 
 
> What happens if the following code is executed at a time when that is
> the case?
 
>> long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,
 
This gains writer access and can only be executed by a single thread.
You just snipped the part where we gain a writer lock. Why did you snip
so much context? Only a single thread can decrement m_count by LONG_MAX.
Look:
_________________
// WRITE, more hefty
void lock()
{
    m_wrlock.lock();

    long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,
        std::memory_order_acquire);

    if (count < RWMUTEX_COUNT_MAX)
    {
        long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count,
            std::memory_order_acquire);

        if (rdwake + RWMUTEX_COUNT_MAX - count)
        {
            m_wrwset.dec();
        }
    }
}

// write unlock
void unlock()
{
    long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,
        std::memory_order_release);

    if (count < 0)
    {
        m_rdwset.add(-count);
    }

    m_wrlock.unlock();
}
_________________
 
lock()
// write
unlock()
 
 
 
 
> You said that m_count can have any value in the range -LONG_MAX to
> LONG_MAX, inclusive. Under what circumstances would m_count be positive?
 
m_count is initialized to LONG_MAX. It means no activity.
 
 
> What happens if the following code is executed at a time when that is
> the case?
 
>> long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,
 
This can only be executed when we unlock write access, and that can only
occur when the calling thread has previously locked for write access.
 
Remember:
 
lock()
// write
unlock()
 
 
 
> actually come up. However, your response to the person who asked the
> very same question on Reddit (in less detail) just denigrated his
> understanding without bothering to explain what those reasons are.
 
You should take a deeper look.
James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 18 03:22PM -0500

On 2/17/19 23:23, Chris M. Thomasson wrote:
> There is a sort of funny discussion going on over on reddit about my
> read/write mutex. Some people think it has UB wrt an integer overflow or
> underflow. I boiled it down to the following simple program:
...
> Is there any UB in there? Some on reddit seem to think so. They are most
> likely trolling me. Little shi%'s.
 
 
I see no undefined behavior in the code you've shown. In general, I'm
less reliable when I say there's no problem than I am when saying that
there is a problem - but I've got David and Ben backing me up on this.
Are you sure the code you've given us is the same code they're referring to?
 
It would help a great deal if you could explain the arguments they've
given for thinking it's undefined. However, it's generally very
difficult to accurately summarize an argument that hasn't convinced you,
even if it's a valid argument that should have convinced you - your own
biases get in the way of accurately summarizing it.
It would be better yet if you could give us a link so we can review the
discussion and determine the validity of their arguments ourselves.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 02:12PM -0800

On 2/18/2019 12:22 PM, James Kuyper wrote:
> biases get in the way of accurately summarizing it.
> It would be better yet if you could give us a link so we can review the
> discussion and determine the validity of their arguments ourselves.
 
It has to come from confusion about fetch_add on Reddit. Some apparently
think that count can be negative. It will never be negative. So, I
boiled it down to a case that shows what happens when a write access
notices a batch of n readers. It simulates how the writer obtains n in
order to properly wait for the n readers.
________________________
#include <iostream>
#include <climits>


long fetch_add(long& gcount, long addend)
{
    long lcount = gcount;
    gcount += addend;
    return lcount;
}


int main()
{
    long m_count = LONG_MAX;
    std::cout << "m_count = " << m_count << "\n";

    // simulate three concurrent readers
    fetch_add(m_count, -3);
    std::cout << "m_count = " << m_count << ", 3 readers\n";

    // now m_count = LONG_MAX - 3

    // simulate a writer. There can only be a single writer.
    long count = fetch_add(m_count, -LONG_MAX);
    std::cout << "m_count = " << m_count << ", 3 readers in write mode\n";

    if (count < LONG_MAX)
    {
        std::cout << "\nReaders Detected!\n";
        long readers = LONG_MAX - count;
        std::cout << "count = " << count << "\n";
        std::cout << "readers = " << readers << "\n";
    }

    return 0;
}
________________________
 
 
There can only be a single writer here, because this is modeling the
point where a writer detects n readers. count will never be negative.
 
 
Fwiw, here is my algorithm:
________________________
// Chris M. Thomasson's Experimental Read/Write Mutex
// Yeah, it is pretty damn fat wrt the state, however
// it has some interesting properties...
// The state can be compressed a bit...
// btw, it has no loops...
// Take a look at the lock_shared and unlock_shared functions

#define RWMUTEX_COUNT_MAX LONG_MAX

struct ct_rwmutex
{
    // shared state
    std::atomic<long> m_wrstate;
    std::atomic<long> m_count;
    std::atomic<long> m_rdwake;

    ct_slow_semaphore m_rdwset;
    ct_slow_semaphore m_wrwset;
    ct_fast_mutex m_wrlock;


    ct_rwmutex() :
        m_wrstate(1),
        m_count(RWMUTEX_COUNT_MAX),
        m_rdwake(0),
        m_rdwset(0),
        m_wrwset(0) {
    }


    // READ, pretty slim...
    void lock_shared()
    {
        if (m_count.fetch_add(-1, std::memory_order_acquire) < 1)
        {
            m_rdwset.dec();
        }
    }

    void unlock_shared()
    {
        if (m_count.fetch_add(1, std::memory_order_release) < 0)
        {
            if (m_rdwake.fetch_add(-1, std::memory_order_acq_rel) == 1)
            {
                m_wrwset.inc();
            }
        }
    }


    // WRITE, more hefty
    void lock()
    {
        m_wrlock.lock();

        long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,
            std::memory_order_acquire);

        if (count < RWMUTEX_COUNT_MAX)
        {
            long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count,
                std::memory_order_acquire);

            if (rdwake + RWMUTEX_COUNT_MAX - count)
            {
                m_wrwset.dec();
            }
        }
    }

    // write unlock
    void unlock()
    {
        long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,
            std::memory_order_release);

        if (count < 0)
        {
            m_rdwset.add(-count);
        }

        m_wrlock.unlock();
    }
};
________________________
 
Can you notice any integer overflow and/or underflow? I cannot.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 02:31PM -0800

> supported by the fact that your main critic quotes code, presumably
> an excerpt from your code, that doesn't match any code you've
> provided in this thread. Could you please provide that code?
 
The only way my algorithm can get into a condition where m_count can
equal -LONG_MAX is if LONG_MAX readers concurrently acquire read access.
This is fine. So, in my example I showed 3 readers; this takes m_count
to -3. Try it with LONG_MAX readers. Now, this is critical: If _more_
than LONG_MAX readers hit my algorithm at the same time, then integer
overflow and underflow _will_ occur. So, as long as the number of
readers never exceeds LONG_MAX, there is no UB.
 
Basically, the math in my algorithm requires that the number of reader
threads never exceeds LONG_MAX. Btw, have you seen a computer that had
more than LONG_MAX threads running? Perhaps 64-bit LONG_MAX?
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 02:44PM -0800

On 2/18/2019 2:31 PM, Chris M. Thomasson wrote:
>>> On 2/18/2019 6:34 AM, Ben Bacarisse wrote:
>>>> "Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>
>>>> writes:
[...]
> equal -LONG_MAX is if LONG_MAX readers concurrently acquire read access.
> This is fine. So, in my example I showed 3 readers, this takes m_count
> to -3.
 
It goes to -3 when a writer subtracts LONG_MAX from m_count. This is in
regard to the boiled down code:
_________________________
m_count = LONG_MAX
 
fetch_add(m_count, -3) // three readers
 
m_count = LONG_MAX - 3
 
count = fetch_add(m_count, -LONG_MAX); // a single writer
 
count = LONG_MAX - 3
 
m_count = -3
 
 
readers = LONG_MAX - count = 3 = perfect!
_________________________
 
 
okay, now take this into account wrt LONG_MAX readers...
_________________________
m_count = LONG_MAX
 
fetch_add(m_count, -LONG_MAX) // LONG_MAX readers
 
m_count = LONG_MAX - LONG_MAX = 0 // Okay, we are full of readers!
 
count = fetch_add(m_count, -LONG_MAX); // a single writer
 
count = 0
 
m_count = -LONG_MAX
 
readers = LONG_MAX - count = LONG_MAX = Nice! :^D
_________________________
 
 
Only a single writer is allowed to perform the:
 
count = fetch_add(m_count, -LONG_MAX);
 
mutation.
 
 
 
 
James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 18 06:06PM -0500

On 2/18/19 17:50, Chris M. Thomasson wrote:
> On 2/18/2019 2:45 PM, jameskuyper@alumni.caltech.edu wrote:
...
 
> m_count is initialized to LONG_MAX in the constructor of ct_rwmutex. Any
> time that ct_rwmutex::m_count equals LONG_MAX means that there are no
> readers or writers, or activity whatsoever.
 
So a call to unlock_shared() when there are no readers, writers, or
activity, produces signed overflow.
 
I asked about three different cases. You responded to my second case
with an answer to my first case, and didn't respond to either of the
other two cases.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 03:16PM -0800

On 2/18/2019 3:06 PM, James Kuyper wrote:
>> readers or writers, or activity whatsoever.
 
> So a call to unlock_shared() when there are no readers, writers, or
> activity, produces signed overflow.
 
Why in the world would a thread call unlock_shared() when it did not
call lock_shared() first? Your question makes no sense to me. The only
time you call unlock_shared() is to unlock read access that was
previously acquired with lock_shared():
__________________
lock_shared()
// read activity
unlock_shared()
 
 
lock()
// write activity
unlock()
__________________
 
Think about it for a moment.
 
 
> I asked about three different cases. You responded to my second case
> with an answer to my first case, and didn't respond to either of the
> other two cases.
 
Working on another benchmark right now. A little time constrained. It
will have some comments.
Paavo Helde <myfirstname@osa.pri.ee>: Feb 19 08:39AM +0200

On 18.02.2019 22:23, James Kuyper wrote:
>> by the compiler-flag.
 
> I've no doubt that he understood exactly what you were asking for. His
> point is that this solution is better than the one you were asking for.
 
Not necessarily better, as I do not see much value in wrap-around
behavior in general, but 'unsigned' already exists and its
wrap-around behavior is guaranteed by the standards.
 
What Bonita asked for was a signed int with defined wrap-around
behavior. But what would be the use of it? What's so special about the
number -2147483648? Not to speak about the complications arising in
non-two's-complement implementations, hypothetical or not.
 
For unsigned types at least the wrap-over appears at number 0, which is
pretty special, the binary representation is fixed, simple and maps well
to the behavior of existing hardware bit registers. So it makes a bit
more sense.
 
But TBH, in my mind also the unsigned wrap-around is mostly a misfeature
just hiding some bugs and allowing some too clever folks to cut some
corners. In a language like C++ it should be a library feature (and of
course also allow for other ranges than 0..256^N).
David Brown <david.brown@hesbynett.no>: Feb 19 09:01AM +0100

On 19/02/2019 07:39, Paavo Helde wrote:
> wrap-around behavior is guaranteed by the standards.
 
> What Bonita asked for was a signed int with defined wrap-around
> behavior.
 
Actually, I am not sure that is what he asked for. I think he asked for
a way to turn off checking, not necessarily to define the behaviour.
Disabling the sanitizer of a particular function is the best I could offer.
 
> But what would be the use of it? What's so special about the
> number -2147483648? Not to speak about the complications arising in
> non-two's-complement implementations, hypothetical or not.
 
That is, IMHO, a key point in all this. People often say they want
wrapping signed integers because they don't want undefined behaviour -
but wrapping behaviour is almost invariably /wrong/. The underlying
hardware does not have wrapping signed overflow because two's complement
wrapping is desirable - it has it because it is cheap, simple, is easy
to extend (think of doing 32-bit arithmetic on 8-bit processors), and
re-uses the same hardware logic as wrapping unsigned overflow. Some
languages, like java and "gcc -fwrapv" C, make signed overflow defined
behaviour with wrapping - that just hides problems and means the
compiler can't be of any help with warnings or run-time checks that
might have helped you find your bugs.
 
And having a defined behaviour limits the optimiser - it limits how
expressions can be re-arranged and simplified.
 
> pretty special, the binary representation is fixed, simple and maps well
> to the behavior of existing hardware bit registers. So it makes a bit
> more sense.
 
Unsigned overflow wrapping is usually a mistake too, but it is perhaps a
little more useful than with signed. And it is useful to have /some/
way to get wrapping overflow for the rare occasions when it is needed.
 
> just hiding some bugs and allowing some too clever folks to cut some
> corners. In a language like C++ it should be a library feature (and of
> course also allow for other ranges than 0..256^N).
 
Agreed.
 
Surely it shouldn't be too hard to make a standardised library for
integers with a selection of traits that could be chosen, such as sizes
and behaviour on overflow (undefined behaviour, unspecified result,
wrapping, saturation, exception, errno, fatal error, etc.)
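A minimal sketch of what one overflow trait from such a library might look like (all names invented for illustration): a wrapping 32-bit signed integer whose addition is carried out in unsigned arithmetic, where wraparound is well defined.

```cpp
#include <cstdint>
#include <limits>

// Toy "wrapping" trait: signed addition done in uint32_t, where
// wraparound is defined, then converted back to int32_t. That
// conversion is value-preserving modulo 2^32 since C++20
// (implementation-defined before that).
struct wrapping_i32 {
    std::int32_t v;

    friend wrapping_i32 operator+(wrapping_i32 a, wrapping_i32 b) {
        std::uint32_t r = static_cast<std::uint32_t>(a.v) +
                          static_cast<std::uint32_t>(b.v);
        return { static_cast<std::int32_t>(r) };
    }
};

// A saturating variant could clamp to min()/max() instead, and a
// trapping variant could check and throw; the point is that the
// overflow policy becomes a type-level choice, not a compiler flag.
```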
Bart <bc@freeuk.com>: Feb 19 11:03AM

On 19/02/2019 08:01, David Brown wrote:
>> more sense.
 
> Unsigned overflow wrapping is usually a mistake too, but it is perhaps a
> little more useful than with signed.
 
That's the question that should be asked: what's so special about
unsigned over signed that overflow is tolerated with one but not the other?
 
People might use unsigned for various reasons, and probably not often
that they actually want modular arithmetic. I might use it to get a more
useful numeric range than the same width signed.
 
There are few things a language can do, but what is really undesirable
is the compiler (it tends not to be the language), deciding that the
programmer could never have been so crass as to deliberately allow
signed overflow, so it assumes it can't happen and takes the opportunity
to do something entirely unexpected and unintuitive.
 
More importantly, especially with C compilers (as there are necessarily
more of these around), exhibiting behaviour which is at odds with most
other compilers. Or sometimes even the same one, if that behaviour is
only turned on with optimisation.
David Brown <david.brown@hesbynett.no>: Feb 19 01:05PM +0100

On 19/02/2019 12:03, Bart wrote:
 
> That's the question that should be asked: what's so special about
> unsigned over signed that overflow is tolerated with one but not the
> other.
 
I agree. I suppose the answer might be that wrapping is occasionally
useful, and it can be easily supported for unsigned types in C (unlike
for signed types, where it could at most be implementation dependent),
so it is then reasonable to support wrapping on unsigned types.
 
Personally, I'd prefer that usually unsigned overflow was also undefined
- as my unsigned arithmetic seldom needs to wrap, that would let me get
the advantages of undefined behaviour in optimisation.
 
> programmer could never have been so crass as to deliberately allow
> signed overflow, so it assumes it can't happen and takes the opportunity
> to do something entirely unexpected and unintuitive.
 
It is the language in this case. C assumes the programmer writes
correct code. /Compilers/ may choose not to make that assumption in
some cases.
 
> more of these around), to exhibit behaviour which is at odds with most
> other compilers. Or sometimes even the same one, if that behaviour is
> only turned on with optimisation.
 
Of course C compilers exhibit behaviour that is different from other
compilers - do you think they should work like Pascal or Fortran
compilers? It is a different language.
 
And of course they exhibit different behaviour with different flags.
But if your code is accurate, then the results will be the same
regardless of optimisation (barring compiler bugs, obviously).
 
If your code fails when optimising, due to signed integer overflow
issues, then it will also fail when not optimising - unless you
seriously misunderstand the language (in which case there is little hope
for any compiler to guess what you might have meant). The symptoms
might be different, but the results will be wrong.
Manfred <noname@add.invalid>: Feb 19 01:54PM +0100

On 2/19/2019 12:03 PM, Bart wrote:
>> little more useful than with signed.
 
> That's the question that should be asked: what's so special about
> unsigned over signed that overflow is tolerated with one but not the other.
 
Trivial observation, but it answers the question: in unsigned types
the first bit is a digit bit; in signed types it is a sign bit.
This makes the unsigned type suitable for bit-oriented arithmetic, which
is obviously not the same as usual arithmetic, and in which wrapping from
0xFFFFFFFF to 0x00000000 can make sense.
 
A good example is given in:
 
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1969.htm
 
under the section "Testing Return Values For Errors"
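The pattern that section describes can be sketched like this (the `parse` function is invented for illustration): an error is reported as (size_t)-1, which wraps to the largest possible unsigned value, so a single range check rejects both errors and oversized results.

```cpp
#include <cstddef>

// Hypothetical API: returns the number of bytes parsed,
// or (size_t)-1 on error.
constexpr std::size_t PARSE_ERROR = static_cast<std::size_t>(-1);

std::size_t parse(const char* s)
{
    return s ? 3 : PARSE_ERROR;  // stand-in body for illustration
}

bool parsed_ok(const char* s, std::size_t limit)
{
    // Because (size_t)-1 wraps to the maximum value, one comparison
    // catches both the error sentinel and any oversized result.
    return parse(s) <= limit;
}
```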
Ralf Goertz <me@myprovider.invalid>: Feb 19 03:14PM +0100

Am Tue, 19 Feb 2019 13:54:49 +0100
> This makes the unsigned type suitable for bit-oriented arithmetic,
> that is obviously not the same as usual arithmetic, in which wrapping
> from 0xFFFFFFFF to 0x00000000 can make sense.
 
Does that really matter? If I have a signed short set to its maximum
possible value (32767 here) and then add 1, I get -32768. Modulo 65536 (the
same modulus that unsigned short has) this is the same as 32768. So with
signed you get just another representation of the integers modulo 2^n,
where n is the number of bits. Some mathematicians prefer one, others the
other. No big deal. No?
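That congruence can be checked directly. The increment below is done in uint16_t, where wraparound is well defined; reinterpreting the result as int16_t is guaranteed to be value-preserving modulo 2^16 only since C++20 (implementation-defined earlier, though universal on two's complement hardware).

```cpp
#include <cstdint>

// Increment with well-defined 16-bit wraparound.
std::uint16_t next16(std::uint16_t x)
{
    return static_cast<std::uint16_t>(x + 1u);
}
```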
Bart <bc@freeuk.com>: Feb 19 03:01PM

On 19/02/2019 12:54, Manfred wrote:
>> other.
 
> Trivial observation, but that answers the question: in unsigned types
> the first bit is a digit bit. In signed types it is a sign bit.
 
It is open to interpretation:
 
unsigned int a=2000000000,b=1000000000;
signed int c=2000000000,d=1000000000;
a+b;
c+d;
 
Both of these operations generate the same x86 'add' operation for
example. How is it treating the top bit? It doesn't matter. The x86 will
set flags both for an unsigned add, and a signed one. The resulting bit
pattern is the same in both cases.
 
C however (and presumably C++ inherits this) says that the result is
undefined for the second 'add' operation, even though it is always
well-defined on x86.
 
But take this case:
 
unsigned int a=5, b=2;
a-b;
b-a;
 
C says that both of these are well-defined, even though a-b gives you
the expected 3, while b-a gives you 4294967293.
 
No signed arithmetic is going on. But if you interpret the 4294967293
result as two's complement, with the top bit as a sign bit, then you
will get -3 if printed.
 
Similarly, the c+d in the first example gives -1294967296, but
3000000000 if the same bit pattern is interpreted as an unsigned value.
 
The two representations are closely related, in that the corresponding
bit patterns are interchangeable, but C refuses to acknowledge that.
 
That might be because two's complement representation is not universal,
but it means some C compilers end up doing unexpected things even on
the 99.99% of machines which do use it.
Bart <bc@freeuk.com>: Feb 19 03:11PM

On 19/02/2019 12:05, David Brown wrote:
 
> Of course C compilers exhibit behaviour that is different from other
> compilers - do you think they should work like Pascal or Fortan
> compilers? It is a different language.
 
No, I meant at odds even with other C compilers.
Manfred <noname@add.invalid>: Feb 19 04:35PM +0100

On 2/19/2019 4:01 PM, Bart wrote:
> example. How is it treating the top bit? It doesn't matter. The x86 will
> set flags both for an unsigned add, and a signed one. The resulting bit
> pattern is the same in both cases.
 
I know, but in the case of 'signed' It Happens to Work™, under the
assumption of an underlying two's complement representation.
In the case of 'unsigned' It Just Works™, guaranteed by the standard and
with no assumptions required.
When dealing with any kind of workable logic (math, algorithms, etc.)
the possibility of dropping an assumption is an added value.
 
 
> C however (and presumably C++ inherits) says that the result is
> undefined for the second 'add' operation, even it is always well-defined
> for x86.
 
You know that C as a language and x86 as an architecture are two
different things.
 
 
> No signed arithmetic is going on. But if you interpret the 4294967293
> result as two's complement, with the top bit as a sign bit, then you
> will get -3 if printed.
 
This too is based on the assumption of two's complement.
 
 
> Similarly, the c+d in the first example gives -1294967296, but
> 3000000000 if the same bit pattern is interpreted as an unsigned value.
 
As a confirmation of earlier arguments, here -1294967296 as a result
falls into the category of 'rubbish', 3000000000 does not.
 
 
> The two representations are closely related, in that the corresponding
> bit patterns are interchangeable, but C refuses to acknowledge that.
 
The two representations are interchangeable under the assumption of
two's complement. C has made the choice of lifting this constraint.
 
 
> That might be because two's complement representation is not universal,
> but it means some C compiler ending up doing unexpected things even on
> the 99.99% of machines which do use it.
 
I wouldn't look at it this way. I believe the rationale is that
wrapping overflow has negligible use for signed types, while it has
clear value for unsigned ones (see earlier example).
 
More than the compiler doing unexpected things, it is that C requires
the programmer to pay attention to details.
It is not distracted-friendly.
"Öö Tiib" <ootiib@hot.ee>: Feb 19 08:12AM -0800

On Monday, 18 February 2019 20:56:42 UTC+2, David Brown wrote:
 
> I am sorry, but you appear to be confused. The "Meltdown and Spectre"
> stuff is a hardware problem, not a software problem. Compiler writers
> have tried to find ways to work around the hardware flaw.
 
The hardware vendors are major employers of those compiler writers.
 
> only work on benchmark" claims - the people writing them are widely
> dispersed with totally different kinds of employers. In particular, the
> gcc developers fall into several categories:
 
Note that "they only work on benchmark" is not a quote of mine. I can't
defend positions of the straw men that you build. ;-) Will you tell next
that I wrote somewhere that "I do not care about performance"?
 
> chips, so their concern is that you (the programmer) get the best from
> the compiler and their chip. Benchmarks for the compiler don't matter -
> support for the chip matters.
 
So the marketing statements are not mostly about performance of those
chips, the likes of "Meltdown and Spectre" are not a result of overeager
optimization of those, and none of it is measured with benchmarks
compiled on those compilers?
 
> provide tools and services to developers. They want programmers to be
> happy with the tools - they don't care if you use a different compiler
> instead.
 
The programmers they target are not overly concerned with performance?
My impression is that programmers start to post about performance before
figuring out how to turn the optimizations on.
 
> 3. Those working for big users, like Google and Facebook. They don't
> care about benchmarks - they care about performance on their own software.
 
They do not use compilers to build their software and so do not care about
compiler optimizations?
 
> 4. The independent and volunteer developers. They care about the
> quality of their code, and making something worthwhile - they don't care
> about benchmark performances.
 
Again there are developers who don't care about performance?
 
> with newer versions, but not more than that. And those that pick a
> compiler for its speed, do so based on the speed for their own source
> code, not for some benchmark.
 
However, it seems that there are only a few weirdos like me who think that
it does not matter how fast the wrong answers are calculated, and who
consider it better when those wrong answers are not calculated at all.
 
> money. Who would profit from making compilers focused on benchmark
> performance as the main goal, with a disregard for support for existing
> C or C++ sources?
 
What conspiracy theory? Where did I say that they disregard support for
existing source code? If we follow the money, then Google, Apple, Microsoft
and Oracle have tons of their own C and C++ source code that they want to
be performant and don't want to break, but they want developers to use
Go, Swift, C# or Java. So indeed they might want to "extend" crappy
"features" and "optimizations" into C++ that they won't ever use in
and with their own code.
 
> you can often find these early and fix them, but sometimes the flaws are
> discovered quite late. Hardware flaws are harder to fix - but very easy
> for amateurs to condemn once they are found.
 
The cache, branch prediction and speculative execution are performance
optimizations piled together. It can be tricky to get such a pile
correct, and if correctness is prioritized below performance then defects
slip through. The same things happen with compiler optimizations.
 
> code in the manner the programmer appeared to expect. But for most
> undefined behaviour, it is hard or impossible to guess what the
> programmer expected - that is the nature of undefined behaviour.
 
What is so controversial about the behavior that the programmer expects
in the case of -ftrapv? Fortunately it has been obvious that I want
division-by-zero to trap (even on MIPS and ARM, without special
compiler options), but that might also change out of the blue when
a way to "optimize" it is discovered, and then I suspect we will need
to add some -fplease-dont-remove-divide-by-zero-trap.
 
> particular, it does not trap in all cases. Personally, I think the flag
> should be dropped in favour of the sanitizer, which is a more modern and
> flexible alternative and which is actively maintained.
 
Sanitizers sound like debugging options. Why are two almost equal features
developed in the same tool? By such logic, one day
-fsanitize=signed-integer-overflow will also become "poor"
and "unreliable", and then some third feature will be the "correct"
way to make programs crash on signed integer overflow. With
feature creep, after a while nothing is reliable.

> usually aim to write correct code, and have that correct code run as
> fast as reasonably possible. If you want software that is full of
> run-time checks, you don't program in C or C++.
 
See? We have modern, branch-predicting and eagerly executing hardware
with megabytes of cache, yet the reasonability of -ftrapv usage is
questionable. Sure, it may not be for omnipotent programmers, who can
dismiss it without even measuring what it costs, but I am fallible and
have never met anyone almighty.
How is it so self-evident that -ftrapv is an unreasonable option?
Correctness of behavior is more important than performance of behavior,
and incorrect behavior is often worse than no behavior whatsoever.
 
> In C and C++, you can always manually add any checks you want. With
> C++, you can make your own types that do checking in the manner that
> suits your needs.
 
Why? I agree that signed integer overflow is a programming error, and
so if the integer calculations happen to be important then -ftrapv
handles it correctly (when it happens to work).
Also, how can I be sure that they don't "optimize" my manual checks
away? After a while it can be like with null pointers: one pass optimizes
the dereference away, another optimizes the null pointer check away (since
it was "after" the dereference), and the result is that neither does the
dereference crash nor does the check work.
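The null-pointer scenario above can be sketched like this (a hypothetical function, not code from this thread). Because the dereference happens first, the compiler is entitled to assume p is non-null and may delete the later check:

```cpp
int read_flag(const int* p)
{
    int v = *p;           // dereference: compiler may now assume p != nullptr
    if (p == nullptr) {   // ...so this check may legally be optimized away
        return -1;
    }
    return v;             // calling read_flag(nullptr) is UB either way
}
```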
Manfred <noname@add.invalid>: Feb 19 05:43PM +0100

On 2/18/2019 7:56 PM, David Brown wrote:
> only work on benchmark" claims - the people writing them are widely
> dispersed with totally different kinds of employers.  In particular, the
> gcc developers fall into several categories:
 
I believe you about the categories below, but I find it hard to believe
that benchmarks don't matter.
In fact whenever results from different compilers are presented,
performance is one of the key metrics that is discussed, which is
obvious since correctness of the generated binary is no issue for any
self-respecting compiler.
 
Moreover, most occurrences of "undefined behavior" are justified
by the increased performance of compiler optimizations.
 
> chips, so their concern is that you (the programmer) get the best from
> the compiler and their chip.  Benchmarks for the compiler don't matter -
> support for the chip matters.
 
But they care that the chip is properly supported by the compiler, and
"properly" includes performance.
 
> provide tools and services to developers.  They want programmers to be
> happy with the tools - they don't care if you use a different compiler
> instead.
 
You may have a point in this category. I would say that more than making
/programmers/ happy, they want to make software project /managers/ happy,
which brings into the picture features like time-to-market, man-hours
cost, etc., all goals that come before performance of the final product.
This is IMHO one reason why in many benchmarks MSVC usually comes
after clang and gcc (in that order).
 
 
> 3. Those working for big users, like Google and Facebook.  They don't
> care about benchmarks - they care about performance on their own software.
 
True, but they know that in order to get performance out of their
application software, they need performing compilers.
I believe clang developers know this very well.
 
 
> 4. The independent and volunteer developers.  They care about the
> quality of their code, and making something worthwhile - they don't care
> about benchmark performances.
 
Not really. Performance may not be their first goal (compared to e.g.
being the first to support the latest technology for the open source
community), but it is surely one of the goals, especially after first
introducing such new technologies.
jameskuyper@alumni.caltech.edu: Feb 19 10:04AM -0800

On Monday, February 18, 2019 at 6:40:35 PM UTC-5, Chris M. Thomasson wrote:
> On 2/18/2019 2:45 PM, jameskuyper@alumni.caltech.edu wrote:
...
> > different from the code that has been presented makes me unwilling to
> > invest much time on the code that has been presented. However, a quick
> > scan lead me to the following questions:
...
> You snipped way too much of my code.
 
Your reddit critic quoted text, presumably from the code he was
commenting about, that doesn't match any of the code you've posted on
this thread. The topic of this thread being the validity of his comments,
I've no particular interest in diving deeply into code that isn't the
code he was commenting about. I'm certainly not interested in your code
for its own sake. I just scanned your code, made a few comments, and
cut everything not directly relevant to the comments I made.
 
Note - another indication that he was commenting on different code than
I was looking at is his comment that m_count is never decremented. The
code you presented does decrement m_count. If the code he was commenting
on also had a decrement in it, then when he made that comment your best
response would have been to point out where that decrement was. Why
didn't you respond in that fashion? If it didn't have a decrement, then
one of the dangerous cases is not when you have LONG_MAX simultaneous
lockers, but whenever you've had a total of LONG_MAX lockers, whether or
not they've ever been unlocked. For programs that run 24/7 for long
periods of time (which some do), that could be a significant problem.
 
I've no idea whether or not this is the case, since I haven't seen the
code he was commenting on.
 
...
> ... Why did you snip
> so much context?
 
As explained above.
 
> > You said that m_count can have any value in the range -LONG_MAX to
> > LONG_MAX, inclusive. Under what circumstances would m_count be positive?
 
> m_count is initialized to LONG_MAX. It means no activity.
 
For the second time in a row, you responded to a question about m_count
being positive, by addressing only the possibility that m_count is
LONG_MAX. What about values for m_count from 1 to LONG_MAX-1?
 
The code I saw did nothing to prevent a call to fetch_add() when m_count
happens to have a value that triggers overflow inside fetch_add(). Your
responses all seem to take the form of saying that it never makes sense
to call the relevant functions when m_count has such a value. If that is
true (and for the reasons I've given above, I haven't bothered checking
whether it is), then explaining the correct rules for using your package
so as to avoid signed integer overflow would have been a much better
response than asserting, without explanation, that your critics have
failed to understand your package.
 
...
> > very same question on Reddit (in less detail) just denigrated his
> > understanding without bothering to explain what those reasons are.
 
> You should take a deeper look.
 
Post the code that your critic was commenting on, and I might bother
examining it more carefully.
 
Thinking about this issue led me to another point. Could you describe
what m_count means? I'm looking for a description that remains accurate
before and after you add LONG_MAX to m_count. Similarly, it should
remain accurate before and after you subtract LONG_MAX from m_count.
To put it in another, more formal way: can you define a class invariant
involving m_count which remains unchanged after calling the member
function that adds LONG_MAX to it? It should also remain unchanged after
calling the member function that subtracts LONG_MAX from it.
I'm not saying that this can't be done, only that I'm having a hard
time imagining how it could be done. My imagination has often failed
me in similar circumstances.
 
I might be able to figure this out on my own from close examination of
your code, but for the reasons given above, I'm unwilling to perform
such an examination. In any event, I shouldn't have to - a description
of what m_count means should be part of the documentation for your
design of this class. Not everyone bothers thinking explicitly in terms
of class invariants (I certainly can't claim that I do so on every
occasion), but well-designed classes should have such invariants, even
if the designer wasn't thinking explicitly about them.
Bart <bc@freeuk.com>: Feb 19 07:57PM

On 19/02/2019 15:35, Manfred wrote:
> with no assumptions required.
> When dealing with any kind of workable logic (math, algorithms, etc.)
> the possibility of dropping an assumption is an added value.
 
I wonder how many programs would stop working if they were recompiled
for a processor that didn't use two's complement (assume everything else
needed to run them is the same).
 
>> result as two's complement, with the top bit as a sign bit, then you
>> will get -3 if printed.
 
> This too is based on the assumption of two's complement.
 
I think I said I chose x86 as an example, but it's a VERY common example
of how things actually work.
 
Do you have any instance of actual machines that don't use two's
complement? I'd imagine they would either be obsolete, be odd
micro-controllers, or be some IBM mainframe.
 
However, looking at the numeric limits of various other languages for a
32-bit signed int type:
 
C# -2**31 to +2**31-1
Java -2**31 to +2**31-1
Rust believed to be -2**31 to +2**31-1
Go -2**31 to +2**31-1
D not specified that I could see
 
Where known, they all assume two's complement, regardless of machine.
And all, including D, seem to have agreed on widths of 8/16/32/64 bits.
 
C doesn't even stipulate that.
 
Can you blame anyone for wanting a language to acknowledge something
that is pretty much universal?
 
> As a confirmation of earlier arguments, here -1294967296 as a result
> falls into the category of 'rubbish', 3000000000 does not.
 
So just like 1u-2u, but that one apparently is not classed as rubbish.
 
> I wouldn't look at this this way. I believe the rationale is that
> wrapping overflow has negligible use for signed types, while it has
> clear value for unsigned ones (see earlier example).
 
After 40 years of experiencing exactly that behaviour on a fair number
of machines and languages, it's what I've come to expect.
 
Take 8-bit signed to keep the numbers simple: (120+20) will overflow the
upper limit of +127. But what about (120+20)-20?
 
This should end up back as 120 using two's complement, but according to
C, it's undefined because an intermediate value overflows.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 01:43PM -0800


>> Here is the code: Still have to try it out on GCC in c++17 mode.
 
>> Can anybody run it?
 
>> https://pastebin.com/raw/xCBHY9qd
[...]
 
> Testing: Chris M. Thomasson's Experimental Read/Write Mutex
 
> msec = 40027
> shared.m_state = 160000000
 
[...]
 
> Testing: Chris M. Thomasson's Experimental Read/Write Mutex
 
> msec = 37518
> shared.m_state = 160000000
[...]
> VS SHARED MUTEX:
 
> Ran for 15 minutes then i stopped it. It's not even close.
 
Wow. Thank you so much for taking the time to test it for yourself. I
really do appreciate it. It seems that some of the std::shared_mutex
impls are built around a pure mutex/condvar, so they only have a slow
path (always blocking). My algorithm tries to create fast (non-blocking)
paths that can cleverly "skip" calls into the "slow" paths, so to speak.
 
I am working on another type of benchmark code. Will get back to you.
 
Thanks again.
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 08:55PM -0800

On 2/17/2019 3:06 PM, Chris M. Thomasson wrote:
 
> Writers would push and pop items from the list.
 
> How many reads and writes can be completed per-second, per-thread if the
> test was run for a fixed amount of time...
 
Fwiw, here is such a test using Relacy Race Detector, you know, just to
keep things in line:
 
https://groups.google.com/d/msg/lock-free/zzZX4fvtG04/-SFh9r_JAQAJ
 
https://pastebin.com/raw/xCBHY9qd
(link to raw code)
 
Readers iterate all the nodes in struct ct_simple_stack.
 
Writers push and pop from it.
 
Next step is coding it up in pure 100% standard C++11. Well, C++17 wrt
comparing it against std::shared_mutex... ;^)
 
_________________________
// Experimental Read-Write Mutex Test
// More "realistic" test, in Relacy...
// By: Chris M. Thomasson
//_______________________________________________
 
 
//#define RL_DEBUGBREAK_ON_ASSERT
//#define RL_MSVC_OUTPUT
//#define RL_FORCE_SEQ_CST
//#define RL_GC
 
 
#include <relacy/relacy_std.hpp>
#include <iostream>
 
 
// Simple macro based redirection of the verbose std membars.
#define CT_MB_ACQ std::memory_order_acquire
#define CT_MB_REL std::memory_order_release
#define CT_MB_RLX std::memory_order_relaxed
#define CT_MB_ACQ_REL std::memory_order_acq_rel
#define CT_MB_SEQ_CST std::memory_order_seq_cst
#define CT_MB(mp_0) std::atomic_thread_fence(mp_0)
 
 
#define READERS 4
#define WRITERS 2
#define THREADS (READERS + WRITERS)
#define ITERS 3
 
 
 
 
////////////////
 
 
// bare bones mutex/condvar based semaphore
struct ct_slow_semaphore
{
VAR_T(unsigned long) m_state;
std::mutex m_mutex;
std::condition_variable m_cond;
 
ct_slow_semaphore(unsigned long state) : m_state(state) {}
 
void inc()
{
m_mutex.lock($);
++VAR(m_state);
m_mutex.unlock($);
m_cond.notify_one($);
}
 
void add(unsigned long addend)
{
m_mutex.lock($);
VAR(m_state) += addend;
m_mutex.unlock($);
m_cond.notify_all($);
}
 
void dec()
{
m_mutex.lock($);
while (VAR(m_state) == 0) m_cond.wait(m_mutex, $);
--VAR(m_state);
m_mutex.unlock($);
}
};
 
 
 
 
// bin-sema
struct ct_auto_reset_event
{
bool m_state;
std::mutex m_mutex;
std::condition_variable m_cond;
 
ct_auto_reset_event() : m_state(false) {}
 
void signal()
{
m_mutex.lock($);
m_state = true;
m_cond.notify_one($);
m_mutex.unlock($);
}
 
void wait()
{
m_mutex.lock($);
while (m_state == false) m_cond.wait(m_mutex, $);
m_state = false; // auto-reset
m_mutex.unlock($);
}
};
 
 
 
 
// just a layer over an auto-reset event
struct ct_fast_mutex
{
std::atomic<unsigned int> m_state;
ct_auto_reset_event m_waitset;
 
ct_fast_mutex() : m_state(0) {}
 
void lock()
{
if (m_state.exchange(1, std::memory_order_acquire))
{
while (m_state.exchange(2, std::memory_order_acquire))
{
m_waitset.wait();
}
}
}
 
void unlock()
{
if (m_state.exchange(0, std::memory_order_release) == 2)
{
m_waitset.signal();
}
}
};
 
 
 
// Chris M. Thomasson's Experimental Read/Write Mutex
// Yeah, it is pretty damn fat wrt the state, however
// it has some interesting properties...
// The state can be compressed a bit...
// btw, it has no loops...
// Take a look at the lock_shared and unlock_shared functions
 
#define RWMUTEX_COUNT_MAX LONG_MAX
 
struct ct_rwmutex
{
// shared state
std::atomic<long> m_wrstate;
std::atomic<long> m_count;
std::atomic<long> m_rdwake;
 
ct_slow_semaphore m_rdwset;
ct_slow_semaphore m_wrwset;
ct_fast_mutex m_wrlock;
 
 
ct_rwmutex() :
m_wrstate(1),
m_count(RWMUTEX_COUNT_MAX),
m_rdwake(0),
m_rdwset(0),
m_wrwset(0) {
}
 
 
// READ, pretty slim...
void lock_shared()
{
if (m_count.fetch_add(-1, std::memory_order_acquire) < 1)
{
m_rdwset.dec();
}
}
 
void unlock_shared()
{
if (m_count.fetch_add(1, std::memory_order_release) < 0)
{
if (m_rdwake.fetch_add(-1, std::memory_order_acq_rel) == 1)
{
m_wrwset.inc();
}
}
}
 
 
// WRITE, more hefty
void lock()
{
m_wrlock.lock();
 
long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,
std::memory_order_acquire);
 
if (count < RWMUTEX_COUNT_MAX)
{
long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count,
std::memory_order_acquire);
 
if (rdwake + RWMUTEX_COUNT_MAX - count)
{
m_wrwset.dec();
}
}
}
 
// write unlock
void unlock()
{
long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,
std::memory_order_release);
 
if (count < 0)
{
m_rdwset.add(-count);
}
 
m_wrlock.unlock();
}
};
 
 
 
////////////////
 
 
struct ct_simple_stack
{
struct node
{
VAR_T(node*) m_next;
VAR_T(unsigned int) m_tid;
 
node(unsigned int tid) : m_tid(tid) {}
};
 
VAR_T(node*) m_head;
 
ct_simple_stack() : m_head(nullptr) {}
 
void push(node* n)
{
VAR(n->m_next) = VAR(m_head);
VAR(m_head) = n;
}
 
node* pop()
{
node* n = VAR(m_head);
 
if (n)
{
VAR(m_head) = VAR(n->m_next);
}
 
return n;
}
 
 
node* flush()
{
node* n = VAR(m_head);
VAR(m_head) = nullptr;
return n;
}
 
};
 
 
void ct_destroy_nodes(ct_simple_stack& s)
{
ct_simple_stack::node* n = s.flush();
 
while (n)
{
ct_simple_stack::node* next = VAR(n->m_next);
delete n;
n = next;
}
}
 
 
 
// Relacy Stack Test...
struct ct_fast_mutex_test
: rl::test_suite<ct_fast_mutex_test, THREADS>
{
ct_rwmutex g_rwmutex;
ct_simple_stack g_stack;
 
void before()
{
 
}
 
void after()
{
ct_destroy_nodes(g_stack);
}
 
// reader
void reader(unsigned int tidx)
{
g_rwmutex.lock_shared();
 
for (unsigned int i = 0; i < ITERS; ++i)
{
ct_simple_stack::node* h = VAR(g_stack.m_head);
 
while (h)
{
ct_simple_stack::node* next = VAR(h->m_next);
 
if (i % 512 == 0)
{
rl::yield(1, $);
}
 
h = next;
}
}
 
g_rwmutex.unlock_shared();
}
 
 
// writer
void writer(unsigned int tidx)
{
g_rwmutex.lock();
g_stack.push(new ct_simple_stack::node(tidx));
g_stack.push(new ct_simple_stack::node(tidx));
g_rwmutex.unlock();
 
g_rwmutex.lock();
ct_simple_stack::node* n = g_stack.pop();
g_rwmutex.unlock();
 
if (n)
{
// destroy
delete n;
}
}
 
 
void thread(unsigned int tidx)
{
if (tidx < READERS)
{
reader(tidx);
}
 
else
{
writer(tidx);
}
}
};
 
 
 
// Test away... Or fly? Humm...
int main()
{
{
rl::test_params p;
 
p.iteration_count = 10000000;
//p.execution_depth_limit = 33333;
//p.search_type = rl::sched_bound;
//p.search_type = rl::fair_full_search_scheduler_type;
//p.search_type = rl::fair_context_bound_scheduler_type;
 
rl::simulate<ct_fast_mutex_test>(p);
}
 
return 0;
}
_________________________