- Turning bits into integers - 3 Updates
- is there any UB in this integer math... - 20 Updates
- an experimental read/write mutex... - 2 Updates
rgladkik@sfu.ca: Feb 19 10:06AM -0800

Sorry for the late reply and thanks to those who responded. I agree that
what I had programmed made no sense. I'm a beginner.

struct NonlocalBonds {
  std::vector<unsigned int> k;
  unsigned int l;

  std::vector<unsigned int>::const_iterator operator()(unsigned int i,
                                                       unsigned int j) const {
    assert(j > i);
    if ((j - i) % 4 != 0 || (j - i) / 4 != l || (i - 2) % 4 != 0)
      return k.end();
    return std::find(k.begin(), k.end(), (i - 2) / 4) - k.begin();
  }
};

Here is the revised code. Basically, what I want is to output the index of
a value inside the vector k, if it exists. The presence of this value is
determined by the if statement. I'm not sure if the type of const_iterator
is correct, but I think the rest is correct. |
rgladkik@sfu.ca: Feb 19 10:13AM -0800 I don't know if this is the best method to accomplish what I want to do. |
Jorgen Grahn <grahn+nntp@snipabacken.se>: Feb 19 08:22PM > index of a value inside the vector k, if it exists. The presence of > this value is determined by the if statement. I'm not sure if the > type of const_iterator is correct, but I think the rest is correct. I still don't understand your goal, but with the above I can at least make partial suggestions. Does this capture what you mean? std::vector<unsigned>::const_iterator operator() (unsigned i, unsigned j) const { unsigned val = f(i, j); return std::find(k.begin(), k.end(), val); } I broke out the calculation of the value to a separate helper function f() so it doesn't distract. You seem to be a bit undecided about if you want to return an iterator into k, or an index in k. Sometimes one works best, sometimes the other -- it depends on what the calling code wants. If you return an index, you have to decide what to return if you don't find the value. /Jorgen -- // Jorgen Grahn <grahn@ Oo o. . . \X/ snipabacken.se> O o . |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 03:40PM -0800

> happens if the following code is executed at a time when that is the
> case?
>> if (m_count.fetch_add(1, std::memory_order_release) < 0)

That would be unlocking a reader. Without contention from a writer this is:

______________________
m_count = LONG_MAX

// reader
lock_shared(): fetch_add(m_count, -1);
m_count = LONG_MAX - 1

// read access

unlock_shared(): fetch_add(m_count, 1);
m_count = LONG_MAX
______________________

You snipped way too much of my code.

> You said that m_count can have any value in the range -LONG_MAX to
> LONG_MAX, inclusive. Under what circumstances would m_count be negative?

When there is a writer.

> What happens if the following code is executed at a time when that is
> the case?
>> long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,

This gains writer access and can only be executed by a single thread. You
just snipped the part where we gain a writer lock. Why did you snip so
much context? Only a single thread can decrement m_count by LONG_MAX. Look:

_________________
// WRITE, more hefty
void lock() {
    m_wrlock.lock();

    long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,
                                   std::memory_order_acquire);

    if (count < RWMUTEX_COUNT_MAX) {
        long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count,
                                         std::memory_order_acquire);

        if (rdwake + RWMUTEX_COUNT_MAX - count) {
            m_wrwset.dec();
        }
    }
}

// write unlock
void unlock() {
    long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,
                                   std::memory_order_release);

    if (count < 0) {
        m_rdwset.add(-count);
    }

    m_wrlock.unlock();
}
_________________

lock()
// write
unlock()

> You said that m_count can have any value in the range -LONG_MAX to
> LONG_MAX, inclusive. Under what circumstances would m_count be positive?

m_count is initialized to LONG_MAX. It means no activity.

> What happens if the following code is executed at a time when that is
> the case?
>> long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,

This can only be executed when we unlock write access, which can only
occur when the calling thread has previously locked for write access.
Remember:

lock()
// write
unlock()

> actually come up. However, your response to the person who asked the
> very same question on Reddit (in less detail) just denigrated his
> understanding without bothering to explain what those reasons are.

You should take a deeper look. |
James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 18 03:22PM -0500

On 2/17/19 23:23, Chris M. Thomasson wrote:
> There is a sort of funny discussion going on over on reddit about my
> read/write mutex. Some people think it has UB wrt an integer overflow or
> underflow. I boiled it down to the following simple program:
...
> Is there any UB in there? Some on reddit seem to think so. They are most
> likely trolling me. Little shi%'s.

I see no undefined behavior in the code you've shown. In general, I'm less
reliable when I say there's no problem than I am when saying that there is
a problem - but I've got David and Ben backing me up on this.

Are you sure the code you've given us is the same code they're referring
to? It would help a great deal if you could explain the arguments they've
given for thinking it's undefined. However, it's generally very difficult
to accurately summarize an argument that hasn't convinced you, even if
it's a valid argument that should have convinced you - your own biases get
in the way of accurately summarizing it.

It would be better yet if you could give us a link so we can review the
discussion and determine the validity of their arguments ourselves. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 02:12PM -0800

On 2/18/2019 12:22 PM, James Kuyper wrote:
> biases get in the way of accurately summarizing it.
> It would be better yet if you could give us a link so we can review the
> discussion and determine the validity of their arguments ourselves.

It has to come from confusion about fetch_add on Reddit. Some apparently
think that count can be negative. It will never be negative. So, I boiled
it down to a case that shows what happens when a write access notices a
batch of n readers. It simulates how the writer obtains n in order to
properly wait for the n readers.

________________________
#include <iostream>
#include <climits>

long fetch_add(long& gcount, long addend)
{
    long lcount = gcount;
    gcount += addend;
    return lcount;
}

int main()
{
    long m_count = LONG_MAX;
    std::cout << "m_count = " << m_count << "\n";

    // simulate three concurrent readers
    fetch_add(m_count, -3);
    std::cout << "m_count = " << m_count << ", 3 readers\n";
    // now m_count = LONG_MAX - 3

    // simulate a writer. There can only be a single writer.
    long count = fetch_add(m_count, -LONG_MAX);
    std::cout << "m_count = " << m_count << ", 3 readers in write mode\n";

    if (count < LONG_MAX)
    {
        std::cout << "\nReaders Detected!\n";
        long readers = LONG_MAX - count;
        std::cout << "count = " << count << "\n";
        std::cout << "readers = " << readers << "\n";
    }

    return 0;
}
________________________

There can only be a single writer here, because this is modeling the point
where a writer detects n readers. count will never be negative.

Fwiw, here is my algorithm:

________________________
// Chris M. Thomassons Experimental Read/Write Mutex
// Yeah, it is pretty damn fat wrt the state, however
// it has some interesting properties...
// The state can be compressed a bit...
// btw, it has no loops...
// Take a look at the lock_shared and unlock_shared functions

#define RWMUTEX_COUNT_MAX LONG_MAX

struct ct_rwmutex {
    // shared state
    std::atomic<long> m_wrstate;
    std::atomic<long> m_count;
    std::atomic<long> m_rdwake;

    ct_slow_semaphore m_rdwset;
    ct_slow_semaphore m_wrwset;
    ct_fast_mutex m_wrlock;

    ct_rwmutex() :
        m_wrstate(1),
        m_count(RWMUTEX_COUNT_MAX),
        m_rdwake(0),
        m_rdwset(0),
        m_wrwset(0) {
    }

    // READ, pretty slim...
    void lock_shared() {
        if (m_count.fetch_add(-1, std::memory_order_acquire) < 1) {
            m_rdwset.dec();
        }
    }

    void unlock_shared() {
        if (m_count.fetch_add(1, std::memory_order_release) < 0) {
            if (m_rdwake.fetch_add(-1, std::memory_order_acq_rel) == 1) {
                m_wrwset.inc();
            }
        }
    }

    // WRITE, more hefty
    void lock() {
        m_wrlock.lock();

        long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,
                                       std::memory_order_acquire);

        if (count < RWMUTEX_COUNT_MAX) {
            long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count,
                                             std::memory_order_acquire);

            if (rdwake + RWMUTEX_COUNT_MAX - count) {
                m_wrwset.dec();
            }
        }
    }

    // write unlock
    void unlock() {
        long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,
                                       std::memory_order_release);

        if (count < 0) {
            m_rdwset.add(-count);
        }

        m_wrlock.unlock();
    }
};
________________________

Can you notice any integer overflow and/or underflow? I cannot. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 02:31PM -0800

> supported by the fact that your main critic quotes code, presumably
> an excerpt from your code, that doesn't match any code you've
> provided in this thread. Could you please provide that code?

The only way my algorithm can get into a condition where m_count can equal
-LONG_MAX is if LONG_MAX readers concurrently acquire read access. This is
fine. So, in my example I showed 3 readers, which takes m_count to -3. Try
it with LONG_MAX readers.

Now, this is critical: If _more_ than LONG_MAX readers hit my algorithm at
the same time, then integer overflow and underflow _will_ occur. So, as
long as the number of readers never exceeds LONG_MAX, there is no UB.
Basically, the math in my algorithm requires that the number of reader
threads never exceeds LONG_MAX.

Btw, have you seen a computer that had more than LONG_MAX threads running?
Perhaps 64-bit LONG_MAX? |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 02:44PM -0800

On 2/18/2019 2:31 PM, Chris M. Thomasson wrote:
>>> On 2/18/2019 6:34 AM, Ben Bacarisse wrote:
>>>> "Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>
>>>> writes: [...]
> equal -LONG_MAX is if LONG_MAX readers concurrently acquire read access.
> This is fine. So, in my example I showed 3 readers, this takes m_count
> to -3.

It goes to -3 when a writer subtracts LONG_MAX from m_count. This is in
regard to the boiled down code:

_________________________
m_count = LONG_MAX

fetch_add(m_count, -3) // three readers
m_count = LONG_MAX - 3

count = fetch_add(m_count, -LONG_MAX); // a single writer
count = LONG_MAX - 3
m_count = -3

readers = LONG_MAX - count = 3 = perfect!
_________________________

okay, now take this into account wrt LONG_MAX readers...

_________________________
m_count = LONG_MAX

fetch_add(m_count, -LONG_MAX) // LONG_MAX readers
m_count = LONG_MAX - LONG_MAX = 0
// Okay, we are full of readers!

count = fetch_add(m_count, -LONG_MAX); // a single writer
count = 0
m_count = -LONG_MAX

readers = LONG_MAX - count = LONG_MAX = Nice! :^D
_________________________

Only a single writer is allowed to perform the:

count = fetch_add(m_count, -LONG_MAX);

mutation. |
James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 18 06:06PM -0500

On 2/18/19 17:50, Chris M. Thomasson wrote:
> On 2/18/2019 2:45 PM, jameskuyper@alumni.caltech.edu wrote:
...
> m_count is initialized to LONG_MAX in the constructor of ct_rwmutex. Any
> time that ct_rwmutex::m_count equals LONG_MAX means that there are no
> readers or writers, or activity whatsoever.

So a call to unlock_shared() when there are no readers, writers, or
activity produces signed overflow.

I asked about three different cases. You responded to my second case with
an answer to my first case, and didn't respond to either of the other two
cases. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 03:16PM -0800

On 2/18/2019 3:06 PM, James Kuyper wrote:
>> readers or writers, or activity whatsoever.
> So a call to unlock_shared() when there are no readers, writers, or
> activity, produces signed overflow.

Why in the world would a thread call unlock_shared() when it did not call
lock_shared() first? Your question makes no sense to me. The only time you
call unlock_shared() is to unlock read access that was previously acquired
with lock_shared():

__________________
lock_shared()
// read activity
unlock_shared()

lock()
// write activity
unlock()
__________________

Think about it for a moment.

> I asked about three different cases. You responded to my second case
> with an answer to my first case, and didn't respond to either of the
> other two cases.

Working on another benchmark right now. A little time constrained. It will
have some comments. |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 19 08:39AM +0200

On 18.02.2019 22:23, James Kuyper wrote:
>> by the compiler-flag.
> I've no doubt that he understood exactly what you were asking for. His
> point is that this solution is better than the one you were asking for.

Not necessarily better, as I do not see much value in the wrap-around
behavior in general, but 'unsigned' already exists and its wrap-around
behavior is guaranteed by the standards.

What Bonita asked for was a signed int with defined wrap-around behavior.
But what would be the use of it? What's so special about the number
-2147483648? Not to speak of the complications arising in
non-two's-complement implementations, hypothetical or not.

For unsigned types at least the wrap-around appears at the number 0, which
is pretty special; the binary representation is fixed, simple, and maps
well to the behavior of existing hardware bit registers. So it makes a bit
more sense.

But TBH, in my mind the unsigned wrap-around is also mostly a misfeature,
just hiding some bugs and allowing some too-clever folks to cut some
corners. In a language like C++ it should be a library feature (and of
course also allow for other ranges than 0..256^N). |
David Brown <david.brown@hesbynett.no>: Feb 19 09:01AM +0100 On 19/02/2019 07:39, Paavo Helde wrote: > wrap-around behavior is guaranteed by the standards. > What Bonita asked for was a signed int with defined wrap-around > behavior. Actually, I am not sure that is what he asked for. I think he asked for a way to turn off checking, not necessarily to define the behaviour. Disabling the sanitizer of a particular function is the best I could offer. > But what would be the use of it? What's so special about the > number -2147483648? Not to speak about the complications arising in > non-two's-complement implementations, hypothetical or not. That is, IMHO, a key point in all this. People often say they want wrapping signed integers because they don't want undefined behaviour - but wrapping behaviour is almost invariably /wrong/. The underlying hardware does not have wrapping signed overflow because two's complement wrapping is desirable - it has it because it is cheap, simple, is easy to extend (think of doing 32-bit arithmetic on 8-bit processors), and re-uses the same hardware logic as wrapping unsigned overflow. Some languages, like java and "gcc -fwrapv" C, make signed overflow defined behaviour with wrapping - that just hides problems and means the compiler can't be of any help with warnings or run-time checks that might have helped you find your bugs. And having a defined behaviour limits the optimiser - it limits how expressions can be re-arranged and simplified. > pretty special, the binary representation is fixed, simple and maps well > to the behavior of existing hardware bit registers. So it makes a bit > more sense. Unsigned overflow wrapping is usually a mistake too, but it is perhaps a little more useful than with signed. And it is useful to have /some/ way to get wrapping overflow for the rare occasions when it is needed. > just hiding some bugs and allowing some too clever folks to cut some > corners. 
> In a language like C++ it should be a library feature (and of
> course also allow for other ranges than 0..256^N).

Agreed. Surely it shouldn't be too hard to make a standardised library
for integers with a selection of traits that could be chosen, such as
sizes and behaviour on overflow (undefined behaviour, unspecified result,
wrapping, saturation, exception, errno, fatal error, etc.) |
Bart <bc@freeuk.com>: Feb 19 11:03AM

On 19/02/2019 08:01, David Brown wrote:
>> more sense.
> Unsigned overflow wrapping is usually a mistake too, but it is perhaps a
> little more useful than with signed.

That's the question that should be asked: what's so special about unsigned
over signed that overflow is tolerated with one but not the other?

People might use unsigned for various reasons, and probably not often
because they actually want modular arithmetic. I might use it to get a
more useful numeric range than the same-width signed type.

There are few things a language can do, but what is really undesirable is
the compiler (it tends not to be the language) deciding that the
programmer could never have been so crass as to deliberately allow signed
overflow, so it assumes it can't happen and takes the opportunity to do
something entirely unexpected and unintuitive.

More importantly, especially with C compilers (as there are necessarily
more of these around), to exhibit behaviour which is at odds with most
other compilers. Or sometimes even the same one, if that behaviour is only
turned on with optimisation. |
David Brown <david.brown@hesbynett.no>: Feb 19 01:05PM +0100

On 19/02/2019 12:03, Bart wrote:
> That's the question that should be asked: what's so special about
> unsigned over signed that overflow is tolerated with one but not the
> other.

I agree. I suppose the answer might be that wrapping is occasionally
useful, and it can be easily supported for unsigned types in C (unlike for
signed types, where it could at most be implementation dependent), so it
is then reasonable to support wrapping on unsigned types. Personally, I'd
prefer that unsigned overflow was usually undefined too - as my unsigned
arithmetic seldom needs to wrap, that would let me get the advantages of
undefined behaviour in optimisation.

> programmer could never have been so crass as to deliberately allow
> signed overflow, so it assumes it can't happen and takes the opportunity
> to do something entirely unexpected and unintuitive.

It is the language in this case. C assumes the programmer writes correct
code. /Compilers/ may choose not to make that assumption in some cases.

> more of these around), to exhibit behaviour which is at odds with most
> other compilers. Or sometimes even the same one, if that behaviour is
> only turned on with optimisation.

Of course C compilers exhibit behaviour that is different from other
compilers - do you think they should work like Pascal or Fortran
compilers? It is a different language. And of course they exhibit
different behaviour with different flags.

But if your code is accurate, then the results will be the same regardless
of optimisation (barring compiler bugs, obviously). If your code fails
when optimising, due to signed integer overflow issues, then it will also
fail when not optimising - unless you seriously misunderstand the language
(in which case there is little hope for any compiler to guess what you
might have meant). The symptoms might be different, but the results will
be wrong. |
Manfred <noname@add.invalid>: Feb 19 01:54PM +0100 On 2/19/2019 12:03 PM, Bart wrote: >> little more useful than with signed. > That's the question that should be asked: what's so special about > unsigned over signed that overflow is tolerated with one but that the other. Trivial observation, but that answers the question: in unsigned types the first bit is a digit bit. In signed types it is a sign bit. This makes the unsigned type suitable for bit-oriented arithmetic, that is obviously not the same as usual arithmetic, in which wrapping from 0xFFFFFFFF to 0x00000000 can make sense. A good example is given in: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1969.htm under the section "Testing Return Values For Errors" |
Ralf Goertz <me@myprovider.invalid>: Feb 19 03:14PM +0100

Am Tue, 19 Feb 2019 13:54:49 +0100
> This makes the unsigned type suitable for bit-oriented arithmetic,
> that is obviously not the same as usual arithmetic, in which wrapping
> from 0xFFFFFFFF to 0x00000000 can make sense.

Does that really matter? If I have a signed short set to its maximum
possible value (32767 here) and then add 1, I get -32768. Modulo 65536
(the same modulus that unsigned short has) this is the same as 32768. So
with signed you get just another representation of the integers modulo
2^n, where n is the number of bits. Some mathematicians prefer one, others
the other. No big deal. No? |
Bart <bc@freeuk.com>: Feb 19 03:01PM

On 19/02/2019 12:54, Manfred wrote:
>> other.
> Trivial observation, but that answers the question: in unsigned types
> the first bit is a digit bit. In signed types it is a sign bit.

It is open to interpretation:

    unsigned int a=2000000000, b=1000000000;
    signed int c=2000000000, d=1000000000;
    a+b;
    c+d;

Both of these operations generate the same x86 'add' operation, for
example. How is it treating the top bit? It doesn't matter. The x86 will
set flags both for an unsigned add and a signed one. The resulting bit
pattern is the same in both cases.

C however (and presumably C++ inherits this) says that the result is
undefined for the second 'add' operation, even though it is always
well-defined for x86. But take this case:

    unsigned int a=5, b=2;
    a-b;
    b-a;

C says that both of these are well-defined, even though a-b gives you the
expected 3, while b-a gives you 4294967293. No signed arithmetic is going
on. But if you interpret the 4294967293 result as two's complement, with
the top bit as a sign bit, then you will get -3 if printed.

Similarly, the c+d in the first example gives -1294967296, but 3000000000
if the same bit pattern is interpreted as an unsigned value.

The two representations are closely related, in that the corresponding bit
patterns are interchangeable, but C refuses to acknowledge that. That
might be because two's complement representation is not universal, but it
means some C compilers end up doing unexpected things even on the 99.99%
of machines which do use it. |
Bart <bc@freeuk.com>: Feb 19 03:11PM On 19/02/2019 12:05, David Brown wrote: > Of course C compilers exhibit behaviour that is different from other > compilers - do you think they should work like Pascal or Fortan > compilers? It is a different language. No, I meant at odds even with other C compilers. |
Manfred <noname@add.invalid>: Feb 19 04:35PM +0100

On 2/19/2019 4:01 PM, Bart wrote:
> example. How is it treating the top bit? It doesn't matter. The x86 will
> set flags both for an unsigned add, and a signed one. The resulting bit
> pattern is the same in both cases.

I know, but in the case of 'signed' It Happens to Work™, under the
assumption of an underlying two's complement representation. In the case
of 'unsigned' It Just Works™, guaranteed by the standard and with no
assumptions required. When dealing with any kind of workable logic (math,
algorithms, etc.) the possibility of dropping an assumption is an added
value.

> C however (and presumably C++ inherits) says that the result is
> undefined for the second 'add' operation, even it is always well-defined
> for x86.

You know that C as a language and x86 as an architecture are two different
things.

> No signed arithmetic is going on. But if you interpret the 4294967293
> result as two's complement, with the top bit as a sign bit, then you
> will get -3 if printed.

This too is based on the assumption of two's complement.

> Similarly, the c+d in the first example gives -1294967296, but
> 3000000000 if the same bit pattern is interpreted as an unsigned value.

As a confirmation of earlier arguments, here -1294967296 as a result falls
into the category of 'rubbish'; 3000000000 does not.

> The two representations are closely related, in that the corresponding
> bit patterns are interchangeable, but C refuses to acknowledge that.

The two representations are interchangeable under the assumption of two's
complement. C has made the choice of lifting this constraint.

> That might be because two's complement representation is not universal,
> but it means some C compiler ending up doing unexpected things even on
> the 99.99% of machines which do use it.

I wouldn't look at it this way.
I believe the rationale is that wrapping overflow has negligible use for signed types, while it has clear value for unsigned ones (see earlier example). More than the compiler doing unexpected things, it is that C requires the programmer to pay attention to details. It is not distracted-friendly. |
"Öö Tiib" <ootiib@hot.ee>: Feb 19 08:12AM -0800

On Monday, 18 February 2019 20:56:42 UTC+2, David Brown wrote:
> I am sorry, but you appear to be confused. The "Meltdown and Spectre"
> stuff is a hardware problem, not a software problem. Compiler writers
> have tried to find ways to work around the hardware flaw.

The hardware vendors are major employers of those compiler writers.

> only work on benchmark" claims - the people writing them are widely
> dispersed with totally different kinds of employers. In particular, the
> gcc developers fall into several categories:

Note that "they only work on benchmark" is not a quote of mine. I can't
defend positions of straw-men that you build. ;-) Will you next tell me
that I wrote somewhere that "I do not care about performance"?

> chips, so their concern is that you (the programmer) get the best from
> the compiler and their chip. Benchmarks for the compiler don't matter -
> support for the chip matters.

Are the marketing statements not mostly about performance of those chips?
Are the likes of "Meltdown and Spectre" not the result of overeager
optimizing of those chips? And even when it seems to be so, is none of it
measured with benchmarks compiled on those compilers?

> provide tools and services to developers. They want programmers to be
> happy with the tools - they don't care if you use a different compiler
> instead.

The programmers they target are not overly concerned with performance? My
impression is that programmers start to post about performance before
figuring out how to turn the optimizations on.

> 3. Those working for big users, like Google and Facebook. They don't
> care about benchmarks - they care about performance on their own software.

They do not use compilers to build their software, and so do not care
about compiler optimizations?

> 4. The independent and volunteer developers.
Again, there are developers who don't care about performance?

> with newer versions, but not more than that. And those that pick a
> compiler for its speed, do so based on the speed for their own source
> code, not for some benchmark.

However, it seems that there are only a few weirdos like me who think that
it does not matter how fast the wrong answers are calculated, and who
consider it better when those wrong answers are not calculated at all.

> money. Who would profit from making compilers focused on benchmark
> performance as the main goal, with a disregard for support for existing
> C or C++ sources?

What conspiracy theory? Where did I say that they disregard support for
existing source code? If we follow the money, then Google, Apple,
Microsoft and Oracle have tons of their own C and C++ source code that
they want to be performant and don't want to break, but they want
developers to use Go, Swift, C# or Java. So indeed they might want to
"extend" crappy "features" and "optimizations" into C++ that they won't
ever use in and with their own code.

> you can often find these early and fix them, but sometimes the flaws are
> discovered quite late. Hardware flaws are harder to fix - but very easy
> for amateurs to condemn once they are found.

The cache, branch prediction and speculative execution are performance
optimizations piled together. It can be tricky to get such a pile correct,
and if correctness is prioritized below performance then defects slip
through. The same things happen with compiler optimizations.

> code in the manner the programmer appeared to expect. But for most
> undefined behaviour, it is hard or impossible to guess what the
> programmer expected - that is the nature of undefined behaviour.

What is so controversial about the behavior that a programmer expects in
the case of -ftrapv?
Fortunately, it has been obvious that I want division-by-zero to trap
(even on MIPS and ARM, without special compiler options), but that might
also change out of the blue when a way to "optimize" it is discovered, and
then I suspect we will need to add some
-fplease-dont-remove-divide-by-zero-trap.

> particular, it does not trap in all cases. Personally, I think the flag
> should be dropped in favour of the sanitizer, which is a more modern and
> flexible alternative and which is actively maintained.

Sanitizers sound like debugging options. Why are two almost equal features
developed in the same tool? By that logic, one day
-fsanitize=signed-integer-overflow will also become "poor" and
"unreliable", and then some third feature will be the "correct" way to
make programs crash on signed integer overflow. With such feature creep,
after a while nothing is reliable.

> usually aim to write correct code, and have that correct code run as
> fast as reasonably possible. If you want software that is full of
> run-time checks, you don't program in C or C++.

See? We have modern, branch-predicting and eagerly executing hardware with
megabytes of cache, but the reasonability of -ftrapv usage is
questionable. Sure, it may not be for omnipotent programmers who need not
even measure what it costs, but I am fallible and have never met anyone
almighty. How is it so self-evident that -ftrapv is an unreasonable
option? Correctness of behavior is more important than performance of
behavior, and incorrect behavior is often worse than no behavior
whatsoever.

> In C and C++, you can always manually add any checks you want. With
> C++, you can make your own types that do checking in the manner that
> suits your needs.

Why? I agree that signed integer overflow is a programming error, and so
if the integer calculations happen to be important then -ftrapv handles it
correctly (when it happens to work). Also, how can I be sure that they
don't "optimize" my manual checks away?
After a while it can be like with null pointers: one pass optimizes the
dereference away, another optimizes the null pointer check away (since it
was "after" the dereference), and the result is that neither does the
dereference crash nor does the check work. |
Manfred <noname@add.invalid>: Feb 19 05:43PM +0100

On 2/18/2019 7:56 PM, David Brown wrote:
> only work on benchmark" claims - the people writing them are widely
> dispersed with totally different kinds of employers. In particular, the
> gcc developers fall into several categories:

I believe you about the categories below, but I find it hard to believe
that benchmarks don't matter. In fact, whenever results from different
compilers are presented, performance is one of the key metrics discussed,
which is obvious since correctness of the generated binary is no issue for
any self-respecting compiler. Moreover, most occurrences of "undefined
behavior" are usually justified by the increased performance of compiler
optimizations.

> chips, so their concern is that you (the programmer) get the best from
> the compiler and their chip. Benchmarks for the compiler don't matter -
> support for the chip matters.

But they care that the chip is properly supported by the compiler, and
"properly" includes performance.

> provide tools and services to developers. They want programmers to be
> happy with the tools - they don't care if you use a different compiler
> instead.

You may have a point in this category. I would say that more than making
/programmers/ happy, they want to make software project /managers/ happy,
which brings into the picture features like time-to-market, man-hours
cost, etc., all goals that come prior to performance of the final product.
This is IMHO one reason why in many benchmarks MSVC usually comes after
clang and gcc (in this order).

> 3. Those working for big users, like Google and Facebook. They don't
> care about benchmarks - they care about performance on their own software.

True, but they know that in order to get performance out of their
application software, they need performing compilers. I believe clang
developers know this very well.

> 4. The independent and volunteer developers.
They care about the > quality of their code, and making something worthwhile - they don't care > about benchmark performances. Not really. Performance may not be their first goal (compared to e.g. being the first to support the latest technology for the open source community), but it is surely one of the goals, especially after first introducing such new technologies. |
jameskuyper@alumni.caltech.edu: Feb 19 10:04AM -0800 On Monday, February 18, 2019 at 6:40:35 PM UTC-5, Chris M. Thomasson wrote: > On 2/18/2019 2:45 PM, jameskuyper@alumni.caltech.edu wrote: ... > > different from the code that has been presented makes me unwilling to > > invest much time on the code that has been presented. However, a quick > > scan lead me to the following questions: ... > You snipped way to much of my code. Your reddit critic quoted text, presumably from your code that he was commenting about, that doesn't match any of the code you've posted on this thread. Since the topic of this thread is the validity of his comments, I've no particular interest in diving deeply into code that isn't the code he was commenting about. I'm certainly not interested in your code for its own sake. I just scanned your code, made a few comments, and cut everything not directly relevant to the comments I made. Note - another indication that he was commenting on different code than I was looking at is his comment that m_count is never decremented. The code you presented does decrement m_count. If the code he was commenting on also had a decrement in it, then when he made that comment your best response would have been to point out where that decrement was. Why didn't you respond in that fashion? If it didn't have a decrement, then one of the dangerous cases is not when you have LONG_MAX simultaneous lockers, but whenever you've had a total of LONG_MAX lockers, whether or not they've ever been unlocked. For programs that run 24/7 for long periods of time (which some do), that could be a significant problem. I've no idea whether or not this is the case, since I haven't seen the code he was commenting on. ... > ... Why did you snip > so much context? As explained above. > > You said that m_count can have any value in the range -LONG_MAX to > > LONG_MAX, inclusive. Under what circumstances would m_count be positive? > m_count is initialized to LONG_MAX. It means no activity. 
For the second time in a row, you responded to a question about m_count being positive by addressing only the possibility that m_count is LONG_MAX. What about values for m_count from 1 to LONG_MAX-1? The code I saw did nothing to prevent a call to fetch_add() when m_count happens to have a value that triggers overflow inside fetch_add(). Your responses all seem to take the form of saying that it never makes sense to call the relevant functions when m_count has such a value. If that is true (and for the reasons I've given above, I haven't bothered checking whether it is), then explaining the correct rules for using your package so as to avoid signed integer overflow would have been a much better response than asserting, without explanation, that your critics have failed to understand your package. ... > > very same question on Reddit (in less detail) just denigrated his > > understanding without bother to explain what those reasons are. > You should take a deeper look. Post the code that your critic was commenting on, and I might bother examining it more carefully. Thinking about this issue led me to another point. Could you describe what m_count means? I'm looking for a description that remains accurate before and after you add LONG_MAX to m_count. Similarly, it should remain accurate before and after you subtract LONG_MAX from m_count. To put it another, more formal way: can you define a class invariant involving m_count which remains unchanged after calling the member function that adds LONG_MAX to it? It should also remain unchanged after calling the member function that subtracts LONG_MAX from it. I'm not saying that this can't be done, only that I'm having a hard time imagining how it could be done. My imagination has often failed me in similar circumstances. I might be able to figure this out on my own from close examination of your code, but for the reasons given above, I'm unwilling to perform such an examination. 
In any event, I shouldn't have to - a description of what m_count means should be part of the documentation for your design of this class. Not everyone bothers thinking explicitly in terms of class invariants (I certainly can't claim that I do so on every occasion), but well-designed classes should have such invariants, even if the designer wasn't thinking explicitly about them. |
Bart <bc@freeuk.com>: Feb 19 07:57PM On 19/02/2019 15:35, Manfred wrote: > with no assumptions required. > When dealing with any kind of workable logic (math, algorithms, etc.) > the possibility of dropping an assumption is an added value. I wonder how many programs would stop working if they were recompiled for a processor that didn't use two's complement (assuming everything else needed to run them is the same). >> result as two's complement, with the top bit as a sign bit, then you >> will get -3 if printed. > This too is based on the assumption of two's complement. I think I said I chose x86 as an example, but it's a VERY common example of how things actually work. Do you have any instance of actual machines that don't use two's complement? I'd imagine they would either be obsolete, be odd micro-controllers, or be some IBM mainframe. However, looking at the numeric limits of various other languages for a 32-bit signed int type:

C#    -2**31 to +2**31-1
Java  -2**31 to +2**31-1
Rust  believed to be -2**31 to +2**31-1
Go    -2**31 to +2**31-1
D     not specified that I could see

Where known, they all assume two's complement, regardless of machine. And all, including D, seem to have agreed on widths of 8/16/32/64 bits. C doesn't even stipulate that. Can you blame anyone for wanting a language to acknowledge something that is pretty much universal? > As a confirmation of earlier arguments, here -1294967296 as a result > falls into the category of 'rubbish', 3000000000 does not. So just like 1u-2u, but that one apparently is not classed as rubbish. > I wouldn't look at this this way. I believe the rationale is that > wrapping overflow has negligible use for signed types, while it has > clear value for unsigned ones (see earlier example). After 40 years of experiencing exactly that behaviour on a fair number of machines and languages, it's what I've come to expect. Take 8-bit signed to keep the numbers simple: (120+20) will overflow the upper limit of +127. 
But what about (120+20)-20? This should end up back as 120 using two's complement, but according to C, it's undefined because an intermediate value overflows. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 01:43PM -0800 >> Here is the code: Still have to try it out on GCC in c++17 mode. >> Can anybody run it? >> https://pastebin.com/raw/xCBHY9qd [...] > Testing: Chris M. Thomasson's Experimental Read/Write Mutex > msec = 40027 > shared.m_state = 160000000 [...] > Testing: Chris M. Thomasson's Experimental Read/Write Mutex > msec = 37518 > shared.m_state = 160000000 [...] > VS SHARED MUTEX: > Ran for 15 minutes then i stopped it. It's not even close. Wow. Thank you so much for taking the time to test it for yourself. I really do appreciate it. It seems that some of the std::shared_mutex impls are built around a pure mutex/condvar, so they only have a slow path (always blocking). My algorithms try to create fast (non-blocking) paths that can cleverly "skip" calls into the "slow" paths, so to speak. I am working on another type of benchmark code. Will get back to you. Thanks again. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 18 08:55PM -0800 On 2/17/2019 3:06 PM, Chris M. Thomasson wrote: > Writers would push and pop items from the list. > How many reads and writes can be completed per-second, per-thread if the > test was run for a fixed amount of time... Fwiw, here is such a test using Relacy Race Detector, you know, just to keep things in line: https://groups.google.com/d/msg/lock-free/zzZX4fvtG04/-SFh9r_JAQAJ https://pastebin.com/raw/xCBHY9qd (link to raw code) Readers iterate all the nodes in struct ct_simple_stack. Writers push and pop from it. Next step is coding it up in pure 100% standard C++11. Well, C++17 wrt comparing it against std::shared_mutex... ;^)

_________________________

// Experimental Read-Write Mutex Test
// More "realistic" test, in Relacy...
// By: Chris M. Thomasson
//_______________________________________________

//#define RL_DEBUGBREAK_ON_ASSERT
//#define RL_MSVC_OUTPUT
//#define RL_FORCE_SEQ_CST
//#define RL_GC

#include <relacy/relacy_std.hpp>
#include <iostream>
#include <climits> // for LONG_MAX, used by RWMUTEX_COUNT_MAX below

// Simple macro based redirection of the verbose std membars.
#define CT_MB_ACQ std::memory_order_acquire
#define CT_MB_REL std::memory_order_release
#define CT_MB_RLX std::memory_order_relaxed
#define CT_MB_ACQ_REL std::memory_order_acq_rel
#define CT_MB_SEQ_CST std::memory_order_seq_cst
#define CT_MB(mp_0) std::atomic_thread_fence(mp_0)

#define READERS 4
#define WRITERS 2
#define THREADS (READERS + WRITERS)
#define ITERS 3

////////////////
// bare bones mutex/condvar based semaphore
struct ct_slow_semaphore
{
    VAR_T(unsigned long) m_state;
    std::mutex m_mutex;
    std::condition_variable m_cond;

    ct_slow_semaphore(unsigned long state) : m_state(state) {}

    void inc()
    {
        m_mutex.lock($);
        ++VAR(m_state);
        m_mutex.unlock($);
        m_cond.notify_one($);
    }

    void add(unsigned long addend)
    {
        m_mutex.lock($);
        VAR(m_state) += addend;
        m_mutex.unlock($);
        m_cond.notify_all($);
    }

    void dec()
    {
        m_mutex.lock($);
        while (VAR(m_state) == 0) m_cond.wait(m_mutex, $);
        --VAR(m_state);
        m_mutex.unlock($);
    }
};

// bin-sema
struct ct_auto_reset_event
{
    bool m_state;
    std::mutex m_mutex;
    std::condition_variable m_cond;

    ct_auto_reset_event() : m_state(false) {}

    void signal()
    {
        m_mutex.lock($);
        m_state = true;
        m_cond.notify_one($);
        m_mutex.unlock($);
    }

    void wait()
    {
        m_mutex.lock($);
        while (m_state == false) m_cond.wait(m_mutex, $);
        m_state = false; // auto-reset
        m_mutex.unlock($);
    }
};

// just a layer over an auto-reset event
struct ct_fast_mutex
{
    std::atomic<unsigned int> m_state;
    ct_auto_reset_event m_waitset;

    ct_fast_mutex() : m_state(0) {}

    void lock()
    {
        if (m_state.exchange(1, std::memory_order_acquire))
        {
            while (m_state.exchange(2, std::memory_order_acquire))
            {
                m_waitset.wait();
            }
        }
    }

    void unlock()
    {
        if (m_state.exchange(0, std::memory_order_release) == 2)
        {
            m_waitset.signal();
        }
    }
};

// Chris M. Thomasson's Experimental Read/Write Mutex
// Yeah, it is pretty damn fat wrt the state, however
// it has some interesting properties...
// The state can be compressed a bit...
// btw, it has no loops...
// Take a look at the lock_shared and unlock_shared functions

#define RWMUTEX_COUNT_MAX LONG_MAX

struct ct_rwmutex
{
    // shared state
    std::atomic<long> m_wrstate;
    std::atomic<long> m_count;
    std::atomic<long> m_rdwake;

    ct_slow_semaphore m_rdwset;
    ct_slow_semaphore m_wrwset;
    ct_fast_mutex m_wrlock;

    ct_rwmutex() :
        m_wrstate(1),
        m_count(RWMUTEX_COUNT_MAX),
        m_rdwake(0),
        m_rdwset(0),
        m_wrwset(0)
    {
    }

    // READ, pretty slim...
    void lock_shared()
    {
        if (m_count.fetch_add(-1, std::memory_order_acquire) < 1)
        {
            m_rdwset.dec();
        }
    }

    void unlock_shared()
    {
        if (m_count.fetch_add(1, std::memory_order_release) < 0)
        {
            if (m_rdwake.fetch_add(-1, std::memory_order_acq_rel) == 1)
            {
                m_wrwset.inc();
            }
        }
    }

    // WRITE, more hefty
    void lock()
    {
        m_wrlock.lock();

        long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,
            std::memory_order_acquire);

        if (count < RWMUTEX_COUNT_MAX)
        {
            long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count,
                std::memory_order_acquire);

            if (rdwake + RWMUTEX_COUNT_MAX - count)
            {
                m_wrwset.dec();
            }
        }
    }

    // write unlock
    void unlock()
    {
        long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,
            std::memory_order_release);

        if (count < 0)
        {
            m_rdwset.add(-count);
        }

        m_wrlock.unlock();
    }
};

////////////////

struct ct_simple_stack
{
    struct node
    {
        VAR_T(node*) m_next;
        VAR_T(unsigned int) m_tid;

        node(unsigned int tid) : m_tid(tid) {}
    };

    VAR_T(node*) m_head;

    ct_simple_stack() : m_head(nullptr) {}

    void push(node* n)
    {
        VAR(n->m_next) = VAR(m_head);
        VAR(m_head) = n;
    }

    node* pop()
    {
        node* n = VAR(m_head);

        if (n)
        {
            VAR(m_head) = VAR(n->m_next);
        }

        return n;
    }

    node* flush()
    {
        node* n = VAR(m_head);
        VAR(m_head) = nullptr;
        return n;
    }
};

void ct_destroy_nodes(ct_simple_stack& s)
{
    ct_simple_stack::node* n = s.flush();

    while (n)
    {
        ct_simple_stack::node* next = VAR(n->m_next);
        delete n;
        n = next;
    }
}

// Relacy Stack Test...
struct ct_fast_mutex_test : rl::test_suite<ct_fast_mutex_test, THREADS>
{
    ct_rwmutex g_rwmutex;
    ct_simple_stack g_stack;

    void before()
    {
    }

    void after()
    {
        ct_destroy_nodes(g_stack);
    }

    // reader
    void reader(unsigned int tidx)
    {
        g_rwmutex.lock_shared();

        for (unsigned int i = 0; i < ITERS; ++i)
        {
            ct_simple_stack::node* h = VAR(g_stack.m_head);

            while (h)
            {
                ct_simple_stack::node* next = VAR(h->m_next);

                if (i % 512 == 0)
                {
                    rl::yield(1, $);
                }

                h = next;
            }
        }

        g_rwmutex.unlock_shared();
    }

    // writer
    void writer(unsigned int tidx)
    {
        g_rwmutex.lock();
        g_stack.push(new ct_simple_stack::node(tidx));
        g_stack.push(new ct_simple_stack::node(tidx));
        g_rwmutex.unlock();

        g_rwmutex.lock();
        ct_simple_stack::node* n = g_stack.pop();
        g_rwmutex.unlock();

        if (n)
        {
            // destroy
            delete n;
        }
    }

    void thread(unsigned int tidx)
    {
        if (tidx < READERS)
        {
            reader(tidx);
        }
        else
        {
            writer(tidx);
        }
    }
};

// Test away... Or fly? Humm...

int main()
{
    {
        rl::test_params p;
        p.iteration_count = 10000000;
        //p.execution_depth_limit = 33333;
        //p.search_type = rl::sched_bound;
        //p.search_type = rl::fair_full_search_scheduler_type;
        //p.search_type = rl::fair_context_bound_scheduler_type;

        rl::simulate<ct_fast_mutex_test>(p);
    }

    return 0;
}
_________________________ |
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |