- is there any UB in this integer math... - 10 Updates
- new benchmark for my read/write algorithm... - 15 Updates
Paavo Helde <myfirstname@osa.pri.ee>: Feb 21 11:56AM +0200 On 21.02.2019 11:24, Öö Tiib wrote: > a) With -ftrapv our automated crash reporting shows 0.2% crashes per > usage. The patch is up within a week. > b) Without it 0.2% runs result with incorrect answers. This does not follow. As pointed out by others, the intermediate overflows can still produce correct results when balanced with opposite overflows or cast to unsigned. Based on my limited experience, over half of -ftrapv crashes would be false alarms. Depends on the codebase, of course. Now it would be up to the management to decide whether the problems caused by false alarms on customer sites justify the earlier catching of non-false alarms. |
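[Editorial illustration, not code from the thread: a minimal sketch of the kind of false alarm Paavo describes. The intermediate addition overflows, so -ftrapv would abort, yet the same computation done in unsigned arithmetic wraps and wraps back to the intended answer. The values are arbitrary.]

#include <cstdio>
#include <climits>

int main() {
    int a = INT_MAX;
    int b = 100;
    // Signed form: a + b overflows, so "a + b - b" is UB and -ftrapv traps,
    // even though the intended final value (a) fits in an int.
    // Unsigned form: the addition wraps and the subtraction wraps back,
    // so the result is well defined and equals a.
    unsigned ua = static_cast<unsigned>(a);
    unsigned ub = static_cast<unsigned>(b);
    int result = static_cast<int>(ua + ub - ub);  // value fits in int, so the conversion is fine
    std::printf("a = %d, result = %d\n", a, result);
}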
Manfred <noname@add.invalid>: Feb 19 09:55PM +0100 On 2/19/2019 8:57 PM, Bart wrote: >> As a confirmation of earlier arguments, here -1294967296 as a result >> falls into the category of 'rubbish', 3000000000 does not. > So just like 1u-2u, but that one apparently is not classed as rubbish. 1u-2u is rubbish in conventional arithmetic, but it is not in wrapping arithmetic. Any programmer knows that the result is 0xFFFFFFFF, and this is what C guarantees. >> clear value for unsigned ones (see earlier example). > After 40 years of experiencing exactly that behaviour on a fair number > of machines and languages, it's what I'm come to expect. I'll rephrase: C requires, as it does for most program behaviours, that wrapping be intentional for the programmer to use. Since an intentional use of wrapping implies that all bits follow the same semantics, the unsigned type is the correct choice. Using wrapping arithmetic with a signed type is not correct - purely from a binary logic point of view, because the top bit would be arbitrarily assumed to behave like the others - thus C assumes that a careful programmer will not use it with signed types, and leaves compiler writers free to use this feature space however they like, by specifying undefined behavior. I would guess that since this has been clearly and intentionally specified in the standard, at least /some/ use for this 'undefinedness' must have been foreseen by the committee. |
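[Editorial illustration: a two-line check of the unsigned case, assuming the usual 32-bit unsigned int.]

#include <cstdio>

int main() {
    unsigned x = 1u - 2u;            // well defined: unsigned arithmetic is modulo 2^N
    std::printf("%u  %#x\n", x, x);  // 4294967295  0xffffffff with 32-bit unsigned
}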
Manfred <noname@add.invalid>: Feb 19 04:35PM +0100 On 2/19/2019 4:01 PM, Bart wrote: > example. How is it treating the top bit? It doesn't matter. The x86 will > set flags both for an unsigned add, and a signed one. The resulting bit > pattern is the same in both cases. I know, but in the case of 'signed' It Happens to Work™, under the assumption of an underlying two's complement representation. In the case of 'unsigned' It Just Works™, guaranteed by the standard and with no assumptions required. When dealing with any kind of workable logic (math, algorithms, etc.) the possibility of dropping an assumption is an added value. > C however (and presumably C++ inherits) says that the result is > undefined for the second 'add' operation, even it is always well-defined > for x86. You know that C as a language and x86 as an architecture are two different things. > No signed arithmetic is going on. But if you interpret the 4294967293 > result as two's complement, with the top bit as a sign bit, then you > will get -3 if printed. This too is based on the assumption of two's complement. > Similarly, the c+d in the first example gives -1294967296, but > 3000000000 if the same bit pattern is interpreted as an unsigned value. As a confirmation of earlier arguments, here -1294967296 as a result falls into the category of 'rubbish', 3000000000 does not. > The two representations are closely related, in that the corresponding > bit patterns are interchangeable, but C refuses to acknowledge that. The two representations are interchangeable under the assumption of two's complement. C has made the choice of lifting this constraint. > That might be because two's complement representation is not universal, > but it means some C compiler ending up doing unexpected things even on > the 99.99% of machines which do use it. I wouldn't look at it this way. I believe the rationale is that wrapping overflow has negligible use for signed types, while it has clear value for unsigned ones (see earlier example). More than the compiler doing unexpected things, it is that C requires the programmer to pay attention to details. It is not distracted-friendly. |
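[Editorial illustration of the bit-pattern point being argued, done in well-defined unsigned arithmetic and then reinterpreted. The conversion back to a signed type is implementation-defined before C++20 and modulo 2^32 from C++20 on; the values are the ones from the thread.]

#include <cstdio>
#include <cstdint>

int main() {
    uint32_t c = 1000000000u, d = 2000000000u;
    uint32_t bits = c + d;                           // 3000000000, well defined
    int32_t as_signed = static_cast<int32_t>(bits);  // -1294967296 on a two's complement machine
    std::printf("unsigned: %lu  signed view: %ld\n",
                static_cast<unsigned long>(bits), static_cast<long>(as_signed));
}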
Paavo Helde <myfirstname@osa.pri.ee>: Feb 19 10:31PM +0200 On 19.02.2019 21:57, Bart wrote: > upper limit of +127. But what about (120+20)-20? > This should end up back as 120 using two's complement, but according to > C, it's undefined because an intermediate value overflows. And rightly so. Try e.g. (120+20)/2 - 20: with those imaginary 8-bit wrap-around rules the result would be -78, not the expected 50. In C++ 8-bit ints get promoted to normal ints, so the above is not a real example, but there are actual problems with larger numbers. Effectively you are saying that some sloppy code is OK because it accidentally happens to work most of the time, whereas some other very similar sloppy code is not OK. By the same token one could claim that writing to a location after a dynamic array with an odd number of elements is OK as the memory allocator would leave an unused alignment gap at that place anyway. |
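[Editorial illustration of that imaginary 8-bit machine. Real int8_t arithmetic is promoted to int in C++, so the wrap-around is simulated here by truncating each intermediate result back to 8 bits; that truncation is implementation-defined before C++20, modulo 256 from C++20 on.]

#include <cstdio>
#include <cstdint>

static int8_t wrap8(int v) { return static_cast<int8_t>(v); }  // truncate back to 8 bits

int main() {
    int8_t a = 120, b = 20, c = 20;
    int wrapped = wrap8(wrap8(wrap8(a + b) / 2) - c);  // 120+20 wraps to -116, /2 gives -58, -20 gives -78
    int expected = (a + b) / 2 - c;                    // promoted to int: 50
    std::printf("wrapping: %d, expected: %d\n", wrapped, expected);
}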
David Brown <david.brown@hesbynett.no>: Feb 20 12:26PM +0100 On 19/02/2019 17:12, Öö Tiib wrote: > Note that "they only work on benchmark" is not quote of mine. I can't > defend positions of straw-men that you build. ;-) Will you tell next > that I wrote somewhere that "I do not care about performance"? I did not mean to imply that you used those words - I meant it as a general sort of remark that I have seen made many times. Some people are convinced that compiler makers are concerned only about the speed of benchmarks, not how their compiler works for other code. > and the likes of "Meltdown and Spectre" are not result of overeager > optimizing of those and even when it seems to be so then nothing of it > is measured with benchmarks compiled on those compilers? Meltdown and Spectre are both /hardware/ problems. Can we at least agree on that? Compilers have been able to add additional code to reduce the effect of the problems - but this is /not/ because the compilers were previously generating incorrect or suspect code. (And while it's easy to see in hindsight that Intel should have been more careful with speculation across privilege levels, it was harder to see that in advance. There is no way to make a system where code from multiple sources can run and share resources, without information leakage at least in theory.) >> happy with the tools - they don't care if you use a different compiler >> instead. > The programmers they target are not overly concerned with performance? Of course performance is important to many people - compiler writers and compiler users included. Performance on /benchmarks/ is of smaller concern, especially of artificial benchmarks. Programmers want /their/ programs to run fast. > My impression is that programmers start to post about performance before > figuring out how to turn the optimizations on. It is often the case that people who understand less, post more. (That is in no way a comment about anyone posting here, in case you feel I was trying to be insulting.) >> care about benchmarks - they care about performance on their own software. > They do not use compilers to build their software and so do not care about > compiler optimizations? I really don't understand your point. I said they /do/ care about performance - and therefore also about compiler optimisations. >> quality of their code, and making something worthwhile - they don't care >> about benchmark performances. > Again there are developers who don't care about performance? I think you have been misunderstanding my point all along here. I apologise if I have been unclear. Yes, compiler writers care about performance and optimisations. I have said this all along. No, compiler writers are /not/ solely concerned about performance and optimisations. It is merely one of many factors. The same applies to people choosing compilers. And it is definitely not the case that compiler writers care /primarily/ about the speed on artificial benchmark programs (like SPEC benchmarks). They do not write their compilers with an aim of getting top marks on these benchmarks. That does not mean they don't run them sometimes, and compare figures (especially if there are regressions). It just means that they do not use them as driving forces. It is not the case that compiler writers are always looking for new ways to take undefined behaviour in source code and turn it into a mess in the object code. 
Rather, it /is/ the case that compiler writers have no responsibility to try to generate "correct" results from incorrect source code. They have /never/ had that responsibility. From the days of the first C compilers 40+ years ago, signed integer overflow has been undefined behaviour and compilers could assume it never happened if that helped optimisation. Code that assumes wrapping behaviour is either tied to a specific compiler (if the compiler documents that behaviour), or it is /wrong/. As compilers have got more sophisticated, there have been more occasions where incorrect code had predictable behaviour in the past. That is why compiler writers add extra "this code has mistakes, but worked with the behaviour of the old compiler" flags like "-fwrapv" or "-fno-strict-aliasing". > However it seems that there are only few weirdos like me who think that > it does not matter how fast the wrong answers are calculated and consider > it better when those wrong answers are not calculated at all. /I/ don't care how fast wrong answers are calculated. I just don't expect compilers to calculate right answers from wrong code. And I do care how fast right answers are calculated from correct code. >> C or C++ sources? > What conspiracy theory? Where did I say that they disregard support for > existing source code? You wrote "Since compiler writers are people with extremely benchmark-oriented head shape" and then suggested it is because that's what their employers tell them to do - /that/ is the "conspiracy theory". > Go, Swift, C# or Java. So indeed they might want to "extend" crappy > "features" and "optimizations" into C++ that they won't ever use in > and with their own code. You think Microsoft would intentionally introduce broken features or mistakes into their C++ compilers to push people towards using C# ? > optimizations piled together. That can be tricky to get such a pile > correct and if to prioritize correctness below performance then defects > slip through. Same things do happen with compiler optimizations. My understanding of the hardware problems here is that the operation was perfectly correct - but there was information leakage. A certain degree of information leakage is inevitable when there are shared resources, but this situation could be exploited more conveniently than many others. Compiler optimisations are a different game altogether. The main similarity seems to be "they are both hard, so someone might make a mistake". > compiler options) but that might also change out of blue when > a way to "optimize" it will be discovered, and then we need to > add some -fplease-dont-remove-divide-by-zero-trap I suspect. "-ftrapv" was somewhat poorly conceived and specified in the first place (IMHO), and has only got worse. Originally compilers were relatively straightforward - if your code said "x = y + z;", you'd get an "add" assembler instruction. "-ftrapv" just meant adding a "trap if overflow flag set" instruction after it. Then things got complicated - what about "x = y * 5", that can be done with a single "lea" instruction (on x86)? What about "x = y + z - y;" - do you simplify without trapping, or do you detect overflows on the intermediary values? The original idea was to have a cheap and simple way to spot overflow errors, but the practice is a good deal more complicated, and subject to a lot of opinions as to what kinds of checks should be done. 
So if I wanted code to check for overflows, I'd do it specifically in the situations where it was appropriate - using __builtin_add_overflow and friends, or more manual checking (using C++ classes to keep it all neat). And for debugging or testing, -fsanitize=undefined is a good choice. >> should be dropped in favour of the sanitizer, which is a more modern and >> flexible alternative and which is actively maintained. > Sanitizers sound like debugging options. Yes. > Why two almost equal features > are developed into same tool? -ftrapv is a debugging tool. It is not really appropriate for final binaries - it is too unpredictable, and plays havoc with code optimisation. In particular, expressions involving several parts are either significantly slower as each bit needs to be handled separately, or you don't get full overflow checking. (I have no idea where the lines are drawn in practice in gcc.) > and "unreliable" and then some third feature will be the "correct" > way to make programs to crash on signed integer overflow. With > feature creep after a while nothing is reliable. There is /no/ "correct way to make programs crash on undefined behaviour". There never has been, and never will be. The best you can get are tools to help you find the bugs in your program when testing and debugging. Maybe one day other tools will replace the sanitize options. Options like "-ftrapv" have never been about changing the semantics of C to give a defined behaviour to signed integer overflow. (This is different from -fwrapv.) If you want to complain about poor or misleading documentation in the gcc manual page, then I can agree with that. C and C++ are languages which trust the programmer - they expect you to write code that is correct and will not misbehave. They have little or no checking unless you add it manually, which is how you can get efficient code. But there is nothing hindering you from manual checks - especially in C++ where you can make classes with operators to simplify the process. (Yes, I know that means you can get overflow checking in your own code, but can't easily add it to existing code.) > How is it so self-evident that -ftrapv is unreasonable option? > Correctness of behavior is more important than performance of behavior > and incorrect behavior is often worse than no behavior whatsoever. "-ftrapv" does not turn incorrect behaviour into correct behaviour. At best, it helps you spot the incorrect behaviour so that you can fix the program. (And that's a useful thing for testing and debugging.) > dereference away, other optimizes null pointer check away (since it > was "after" dereference) and result is that neither dereference does > crash nor check work. You can be sure it works by writing the code correctly. The "null pointer" incident you are alluding to was a serious bug in the Linux source code - changes to the compiler optimisation affected the consequences of the bug, but did not introduce the bug. Compilers assume that the programmer obeys the rules of the language - including the "thou shalt not dereference a null pointer" rule. |
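[Editorial illustration of the explicit-check style described above, using GCC/Clang's __builtin_add_overflow (a documented builtin on those compilers) so the check lives only where it is wanted; the values are arbitrary.]

#include <cstdio>
#include <climits>

int main() {
    int a = INT_MAX, b = 1, sum;
    if (__builtin_add_overflow(a, b, &sum)) {   // returns true if the addition overflowed
        std::puts("overflow detected - handle it here");
    } else {
        std::printf("sum = %d\n", sum);
    }
}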
David Brown <david.brown@hesbynett.no>: Feb 21 10:58PM +0100 On 21/02/2019 18:35, Öö Tiib wrote: >> In what world are these checks "free" ? > Free in sense that I do not have to write and to use classes that do > same thing and likely even less efficiently than those two options. Classes could be more flexible and more efficient (certainly more efficient than the -ftrapv code), and give you accurate control of when you use overflow detection instead of an "all or nothing" choice. But I agree that it takes effort to make and use them. > I have repeatedly expressed my lack of knowledge about any beings in > this universe who can write correct programs. Only "programmers" who > do not write incorrect programs are those who do not write programs. It is entirely possible to eliminate large classes of bugs - at least on certain kinds of programming. For most of my work, I know that my code has /no/ bugs with dynamic memory allocation or freeing. It has /no/ bugs on signed integer overflow. It never has divide by zeros, or shifts of negative numbers. It does not have code that accidentally uses variables without initialising them. I can't promise I have no out-of-bounds array accesses, but they should be extraordinarily rare - as will buffer overflows. And even if you assume that it is plausible that you have signed integer overflow, how do you think -ftrapv improves matters? Clearly it could be useful in testing and debugging - that is what it is for. But in released code, who would be happy with a crash and an overflow error message? I could understand this more if we were talking about a type of error that is harder to avoid, or harder to catch in testing - things like race conditions and deadlocks in multi-threaded code. But it is /easy/ to avoid signed integer overflow. It is /easy/ to write your code in a way that it can't happen, or that can detect it appropriately and respond sensibly, rather than crashing. >> that your car breaks down on the journey. > So you would prefer b) or have some kind of c) that results with something > better in that situation or what you were trying to say? Sorry - it looks like you have cut part of your answer here. I am not sure what you meant to write. |
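[Editorial illustration, not David's actual code: a rough sketch of the kind of small checked class he mentions. The check applies only where you opt in, and the response to overflow is under your control rather than being an all-or-nothing crash.]

#include <climits>
#include <stdexcept>
#include <cstdio>

class checked_int {
    int v_;
public:
    explicit checked_int(int v) : v_(v) {}
    int value() const { return v_; }
    checked_int operator+(checked_int rhs) const {
        // Pre-check in wider arithmetic so the signed overflow never actually happens.
        long long r = static_cast<long long>(v_) + rhs.v_;
        if (r > INT_MAX || r < INT_MIN)
            throw std::overflow_error("checked_int: addition overflow");
        return checked_int(static_cast<int>(r));
    }
};

int main() {
    try {
        checked_int x(INT_MAX);
        checked_int y = x + checked_int(1);   // throws instead of invoking UB
        std::printf("%d\n", y.value());
    } catch (const std::overflow_error& e) {
        std::printf("caught: %s\n", e.what());
    }
}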
Rosario19 <Ros@invalid.invalid>: Feb 22 11:05AM +0100 On Sun, 17 Feb 2019 20:23:13 -0800, "Chris M. Thomasson" > gcount += addend; > return lcount; >} The behaviour cannot be pinned down because long is not a fixed-size type: is it 32 bits? Is it 64? And even if it were fixed at 64 bits, for example, is the + operation defined by the standard for all values in the range, or is overflow undefined? |
Jorgen Grahn <grahn+nntp@snipabacken.se>: Feb 22 12:33PM On Thu, 2019-02-21, David Brown wrote: > <https://godbolt.org> and try some samples with optimisation on (-O1 or > -O2), with and without -"ftrapv" and > "-fsanitize=signed-integer-overflow". Let me give you a couple of examples: ... > In what world are these checks "free" ? Devil's advocate: when you have way more computing power than you need, you can spend it on things like this. ... >> leave -ftrapv on forever. > In my world, a program that crashes with a message "overflow detected" > is /not/ correct. But at least you don't get incorrect calculations, which seems to be his point. It's not hard to see what you both mean. /Jorgen -- // Jorgen Grahn <grahn@ Oo o. . . \X/ snipabacken.se> O o . |
David Brown <david.brown@hesbynett.no>: Feb 22 01:31PM +0100 On 22/02/2019 12:05, Öö Tiib wrote: > tells that they found 43 sites of undefined integer overflows in > Microsoft SafeInt library (that is supposed to be such classes and > functions). What can I say? You are right that most software has bugs - and that many projects don't have a development methodology that is strong enough to stop such preventable bugs. Integer overflows are just bugs like anything else - but they are preventable bugs as long as you take reasonable care. As for the particular case of MS's "SafeInt" library - let me just say that finding 43 bugs in it does not harm their reputation in my eyes. I am only familiar with one name on that paper - John Regehr. I have read other publications by him. He seems to make a living from making condescending and sarcastic criticisms of C and C++ without really appreciating the point of the languages. He is particularly critical of compilers that support experienced developers and assume the programmers actually understand the language they are using - he appears to think C programmers believe the language is just Java without classes, and compilers should behave accordingly. Still, the paper had some interesting points. One surprising case is the expression "char c = CHAR_MAX; c++;" which they say varies according to compiler. "The question is: Does c get promoted to int before being incremented? If so, the behaviour is well-defined. We found disagreement between compiler vendors' implementations of this construct". The C standards are quite clear on the matter - "c++;" has the result of the original value of c, and the side effect of "c = c + 1;", for which c is first promoted to "int". If CHAR_MAX = INT_MAX, this is undefined - otherwise it is well defined. Converting back to "char" is implementation-defined if "char" is signed, well-defined modulo conversion if it is unsigned. The paper rightly points out that some programmers /don't/ understand that signed integer overflow is undefined behaviour - or that they "know" the rules but think it is safe to break them and assume wrapping behaviour. (Typically this is because they have tested some code and seen that it works - for that code, with that compiler version, and those flags.) This is one reason why gcc has the "-fwrapv" flag - if you are faced with third-party code and you are not sure the programmer understands that signed integers do not wrap, you can use the "-fwrapv" flag. If the code is good, it will make no difference to the results (but have some impact on the efficiency). If the code is bad, it will run the way the programmer expected. > Everybody. The sole correct handling of programming defects is to fix > those. Crashes get fixed fastest and so the total count of that defect > ever manifesting in practice will be smallest. Wrong. An alternative handling of defects is to minimise the damage, to maximise the use of the imperfect program. It is a well established "fact" that all complex programs have bugs in them. If they were all to stop with an error report on each bug, the software would be useless until all bugs were eliminated. That might be nice in a perfect world - in the real world, however, it would mean waiting decades for testing and development to finish on big systems. People complain about Windows - imagine how much more they would complain if it blue-screened for every minor bug. 
And imagine the "fun" if your aeroplane flight controller stopped with an error message because of a bug in a function for a warning light. Even during testing, stopping dead on a bug is not necessarily the best choice - it may be better to collect more data first. >> respond sensibly, rather than crashing. > Misbehavior on case of unusually large numbers for example because > of corrupt data can be quite tricky to test and quite easy to miss. If your data comes from a source where it could be bad, then you check it. > Yes. I am still confused, it does read like you are not discussing the > practical example above that I gave (and to what you seem to respond > with above). I think the direction of the discussion has slid somewhat (no doubt it is my fault), and we may not be discussing the same things. Perhaps it is time to wind it down a bit? |
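[Editorial illustration, returning to the "char c = CHAR_MAX; c++;" case discussed above: a tiny sketch of the promotion rule, assuming the usual 8-bit signed char and 32-bit int. The conversion of 128 back into char is the implementation-defined part, typically yielding -128.]

#include <cstdio>
#include <climits>

int main() {
    char c = CHAR_MAX;
    int promoted = c + 1;  // the arithmetic is done in int after promotion, so no overflow here
    c++;                   // same promoted arithmetic, then converted back to char
    std::printf("computed as int: %d, stored back in char: %d\n", promoted, static_cast<int>(c));
}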
David Brown <david.brown@hesbynett.no>: Feb 22 04:35PM +0100 On 22/02/2019 13:33, Jorgen Grahn wrote: >> In what world are these checks "free" ? > Devil's advocate: when you have way more computing power than you > need, you can spend it on things like this. In such cases, C or C++ is probably not the language you want. And even if it /is/ the language you want, you would be better using integer classes that are checked and where you have clear and user-defined control of what happens on overflow, rather than using a poorly specified and imperfectly implemented compiler-specific flag. (Though apparently you should avoid using MS's SafeInt library!) >> is /not/ correct. > But at least you don't get incorrect calculations, which seems to be > his point. Agreed. But "-ftrapv" is not, as far as I can see, a good way to get that. > It's not hard to see what you both mean. Yes. |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 20 10:01PM +0200 On 20.02.2019 21:24, Bonita Montero wrote: >>> volatile DWORD m_dwRecursionCount; >> 'volatile' is neither needed nor sufficient for multithreading code. > It is just to document that it may change asyncronously. If so, I see they are accessed without proper locking all over the place. For example, there is no memory barrier after assigning to m_dwOwningThreadId in Lock() and before potentially reading it in another Lock() in another thread, so there is no guarantee the other thread ever sees the changed value. However, now I looked at the MSDN site and saw that by Microsoft rules indeed volatile causes additional memory barriers when compiled with /volatile:ms (which is the default, except for ARM). So my original comment was wrong, volatile is indeed needed for this code to work, but the code is restricted not just to Windows, but also requires the MSVC compiler and its /volatile:ms regime. |
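[Editorial comparison only - this is not Bonita's code: a sketch of how those two fields could be declared so the acquire/release ordering comes from the language itself rather than from MSVC's /volatile:ms extension. The struct and helper names are hypothetical.]

#include <atomic>
#include <cstdint>

struct rw_lock_state {
    std::atomic<uint32_t> owning_thread_id{0};
    std::atomic<uint32_t> recursion_count{0};
};

// Writer publishing ownership, another thread checking it:
inline void publish_owner(rw_lock_state& s, uint32_t tid) {
    s.owning_thread_id.store(tid, std::memory_order_release);
}
inline bool is_owner(const rw_lock_state& s, uint32_t tid) {
    return s.owning_thread_id.load(std::memory_order_acquire) == tid;
}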
Bonita Montero <Bonita.Montero@gmail.com>: Feb 20 09:34PM +0100 > If so, I see they are accessed without proper locking all over the > place. The flags for the reader- and writer-count and writer-flag (sign bit) are the lock themselves. And the other two members are only modified by the writer that owns the mutex. > For example, there is no memory barrier after assigning to > m_dwOwningThreadId in Lock() .... That's not explicitly necessary because the Xchg-Intrinsic before has acquire and release behaviour. > ... and before potentially reading it in another Lock() > in another thread, so there is no guarantee the other I think you are referring to this code at the beginning of Lock(): if( m_dwOwningThreadId == (dwCurrentThreadId = GetCurrentThreadId()) ) { m_dwRecursionCount++; return; } There is absolutely no barrier necessary here! If the thread that does this check is the same one that owns the mutex, everything is ok because the barrier is the Xchg in the outer Lock(). If the thread doesn't own the mutex, the m_dwOwningThreadId may change at any time and may have any value different from the current thread id. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 12:46PM -0800 On 2/20/2019 4:50 AM, Paavo Helde wrote: >> (0D8B00Ch)] > What I see from here is that apparently you are running in 32-bit mode. > Who would do this in year 2019? Recompile in 64-bit and measure again! OUCH! ARGH.... Will do. Thanks. ;^o |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 12:53PM -0800 On 2/20/2019 10:25 AM, Bonita Montero wrote: > } > } > I'm pretty sure you won't understand the code. Why not? I need to give it a good read, and then test it out in Relacy Race Detector: http://www.1024cores.net/home/relacy-race-detector > But the performance should be unbeatable. ;-) Nice. Okay, I will try to get it up and running in Relacy. It can emulate Windows sync primitives, pretty nice. Once I get it running in there, to make sure it passes tests, then I can add it to my benchmark code. It will be fun. Thanks for the algorithm Bonita. :^) There is a main different between your algorithm and mine, notice that I avoid the use of CAS and loops. I am looking forward to giving your algorithm a go. Thanks again. :^) |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 12:41PM -0800 On 2/20/2019 10:20 PM, Bonita Montero wrote: > And you disbled recursive locking. I think if it is possible to > do shared locking recursively implicitly it should be able to to > exclusive locking explicitly. After I get the weakest memory model, it is going to be fun to test your work against Windows SRW. It should beat it because of the way your are waiting on the semaphore and event. In other words, they are fire and forget, and not hooked up to any looping. Nice. :^) |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 12:11AM -0800 On 2/19/2019 11:59 PM, Ian Collins wrote: >> #define CRUDE_CACHE_PAD 256 // crude cache pad >> #define TEST_DURATION_SEC 60 // number of seconds for the test > General whinge: these ugly macros are bad form in C++! Yeah... Habit from C. Wrt, THREADS being 8, well, yeah. That can get gnarly. Sorry about that. > $ g++ -Ofast x.cc -lpthread -std=c++17 > $ ./a.out > Testing: std::shared_mutex [...] > writes_per_tick = 8 > Ticks = 60.0055 > ___________________________________ [...] > $ g++ -Ofast x.cc -lpthread -std=c++17 > $ ./a.out > Testing: Chris M. Thomasson's Experimental Read/Write Mutex [...] > writes_per_tick = 61 > Ticks = 60.0135 > ___________________________________ Ahh, mine loses here by 98 reads per-second. Yet wins wrt writers by 53 reads a second. Humm... This is definitely different than my other benchmark. Need to think about this... Thanks for giving it a go Ian. :^) |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 01:21AM -0800 On 2/19/2019 10:49 PM, Chris M. Thomasson wrote: [..] > ___________________________________ [...] > unsigned int m_tid; > node(unsigned int tid) : m_tid(tid) {} > }; [...] Humm... The ct_simple_stack::node::m_next should probably be a volatile pointer... > { > shared.m_std_rwmutex.lock_shared(); > ct_simple_stack::node* n = shared.m_stack.m_head; /////////////////////////// > ct_simple_stack::node* next = n->m_next; > n = next; > } /////////////////////////// You know... I am wondering if the optimizer can omit the while loop above. Shi%. > shared.m_reads += reads; > shared.m_std_rwmutex.unlock(); > } [...] Let me change it to a volatile pointer and run the tests again... I edited the code at: https://pastebin.com/raw/1QtPCGhV Just reload the page and you should see version 0.1 pop up. The new code will always have Testing Version 0.1 in front of it. Here are my much different timings: Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex ___________________________________ Raw Reads: 54195 Raw Writes: 3232 reads_per_tick = 902 writes_per_tick = 53 Ticks = 60.0432 ___________________________________ Testing Version 0.1: std::shared_mutex ___________________________________ Raw Reads: 23452 Raw Writes: 1873 reads_per_tick = 390 writes_per_tick = 31 Ticks = 60.0513 ___________________________________ Very different. Mine is still beating std::shared_mutex on my end. Still have to check if std::shared_mutex is using SRW on my end to address Paavo. Sorry about that. But, can you try running Version 0.1? |
Bonita Montero <Bonita.Montero@gmail.com>: Feb 20 07:29PM +0100 And just ignore that there is no error-handling for CreateEvent, CreateSemaphore, SetEvent and ReleaseSemaphore. This is just to show the basic concept. And no, there are no additional barriers necessary! |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 20 02:50PM +0200 On 20.02.2019 11:57, Chris M. Thomasson wrote: > 00D7C8F7 call dword ptr [__imp__AcquireSRWLockShared@4 (0D8B00Ch)] What I see from here is that apparently you are running in 32-bit mode. Who would do this in year 2019? Recompile in 64-bit and measure again! |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 20 09:22PM +0200 On 20.02.2019 20:25, Bonita Montero wrote: > // sign-bit: readers active > volatile DWORD m_dwOwningThreadId; > volatile DWORD m_dwRecursionCount; 'volatile' is neither needed nor sufficient for multithreading code. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 03:28PM -0800 >> be performed in this test at 60 seconds. The number of threads is >> variable and is determined by std::hardware_concurrency * THREADS, set >> at 8 in the test. So on my system the setup is: [...] Wrt Version 0.1, remember my first ct_rwmutex that did not use ct_fast_mutex, but instead ct_fast_semaphore with an initial count of 1? Now, it would be beneficial to test against this implementation as well. Also, another poster, Bonita showed another algorithm that uses CAS, kind of like Windows SRW. It has a lot of explicit looping, my algorithm has no loops in ct_rwmutex. Fwiw, it seems like my benchmark code can be used to test many read/write algorithms. Humm... |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 20 02:26PM -0800 On 2/20/2019 4:50 AM, Paavo Helde wrote: >> (0D8B00Ch)] > What I see from here is that apparently you are running in 32-bit mode. > Who would do this in year 2019? Recompile in 64-bit and measure again! Just tested with a x64 build, and here are the results wrt version 0.1: Testing Version 0.1: Chris M. Thomasson's Experimental Read/Write Mutex ___________________________________ cpu_threads_n = 4 threads_n = 32 writers = 16 readers = 16 test duration = 60 seconds ___________________________________ ___________________________________ Raw Reads: 19658 Raw Writes: 1516 reads_per_tick = 327 writes_per_tick = 25 Ticks = 60.0547 ___________________________________ Testing Version 0.1: std::shared_mutex ___________________________________ cpu_threads_n = 4 threads_n = 32 writers = 16 readers = 16 test duration = 60 seconds ___________________________________ ___________________________________ Raw Reads: 8241 Raw Writes: 1114 reads_per_tick = 137 writes_per_tick = 18 Ticks = 60.1291 ___________________________________ Nice! Humm... Iirc, unsigned long on Windows is only 32-bit? My algorithm is beating 64-bit SRW on my end using only 32-bit atomics. ;^) My algorithm is getting 190 more reads per-second on a linked shared data-structure that has NODE_PRIME nodes, or 1,000,000 nodes... So each read actually iterated NODE_PRIME nodes. NICE! |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 01:04PM -0800 > writes_per_tick = 451 > Ticks = 60.0368 > ___________________________________ [...] Thank you! I need to try to organize all of the various results into a single database. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 02:50PM -0800 On 2/21/2019 12:41 PM, Chris M. Thomasson wrote: > work against Windows SRW. It should beat it because of the way your are > waiting on the semaphore and event. In other words, they are fire and > forget, and not hooked up to any looping. Nice. :^) Ahh, my waits on these "slow-paths" are fire and forget as well. In that sense, they are "similar". This is a good property to have Bonita. |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Feb 21 12:26PM -0800 On 2/20/2019 10:20 PM, Bonita Montero wrote: >> = lOldRWCounters) > Your additional barrier is not necessary here and in the other places! > And m_lRWCounters doesn't need to be an atomic variable. Of course you are correct on this point. :^) Fwiw, I just quickly mocked it up using seq_cst. Now, I can add fine-grain membars. Fwiw, I can basically already see what is needed and where in your algorithm. I need to add them to your Interlocked*'s as well. The next step will be your algorithm using the weakest memory model that I can possibly get, and still still pass Relacy testing. Then it will be ready for testing in my read/write benchmark. Nice. :^) Does this sound okay to you Bonita? > And you disbled recursive locking. I think if it is possible to > do shared locking recursively implicitly it should be able to to > exclusive locking explicitly. I disabled recursion simply because my test does not need any, and it would just add a little extra overhead to your algorithm. So, why have it in there? Do you want me to model your recursion with Relacy? I personally don't see a real need right now, but will do it. Well, perhaps I am "biased" against recursive locking in the first place. Some think it is a bit "evil"? ;^) |