"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Nov 08 03:39PM -0800 On 11/8/2022 3:29 AM, Juha Nieminen wrote: > of, and genuinely made programming easier. C++14 and C++17 fixed and patched > many of the minor problems and defects that turned out to exist in C++11, > so C++17 felt like "what C++11 should have been in the first place". [...] I was really excited and happy when C++ finally made atomics and membars part of the actual standard, C++11 iirc. Before that, I would have to code these things up in assembly language. |
"Öö Tiib" <ootiib@hot.ee>: Nov 09 12:22AM -0800 On Tuesday, 8 November 2022 at 15:26:21 UTC+2, Stuart Redmann wrote: > > std::vector is magical class now. > Could you elaborate this? Or maybe post a link? A quick web search yields > nothing usable :-/ Before C++11 an element of vector had to be CopyAssignable and CopyConstructible and that was constraining but doable. After C++11 it was required to be Erasable and more constraints were split between individual methods. At the same time the object lifetime grew dim and complicated so it was unclear how the requirements can be met without magic when just requiring Erasable. In C++14 it felt still fixable by correcting some problems. C++17 however added explicit undefined behaviors to nasty places (that compilers can exploit in optimizations) and so made quite explicit that what is required takes magic. The std::launder() added does look from afar that it might help but on closer inspection does not. For example the P0532 of Nikolai Josuttis elaborates how launder does not help at all on example of vector. |
David Brown <david.brown@hesbynett.no>: Nov 09 09:29AM +0100

On 09/11/2022 00:39, Chris M. Thomasson wrote:
>> patched
>> many of the minor problems and defects that turned out to exist in C++11,
>> so C++17 felt like "what C++11 should have been in the first place".

I agree with that. A challenge for C++ is that even when a new and better feature is added, the older and clumsier methods still have to be supported. This also means that syntax can be awkward because it can't conflict with existing syntax, and the details get more complex all the time.

> I was really excited and happy when C++ finally made atomics and membars
> part of the actual standard, C++11 iirc. Before that, I would have to
> code these things up in assembly language.

Standard atomics would be great if they worked for my targets. The gcc implementations (and I haven't seen any others) for "advanced" use (read-modify-write, or sizes larger than a standard register) are completely broken for single-core systems, and even on multi-core systems they are limited if you use thread priorities. The trouble with them is that no one has addressed the elephant in the room - in general, you need OS support and locks to implement large atomics.
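
One way to see where a given toolchain draws that line (my sketch, not David's code) is to ask the atomics themselves; when the answer is false, the operations are implemented with a lock of some kind (on gcc, inside libatomic, which may need -latomic at link time):

    #include <atomic>
    #include <cstdint>
    #include <iostream>

    struct Big { std::uint64_t w[4]; };   // deliberately wider than any single CAS

    int main()
    {
        std::cout << std::boolalpha
                  << "atomic<uint32_t> always lock-free: "
                  << std::atomic<std::uint32_t>::is_always_lock_free << '\n'
                  << "atomic<uint64_t> always lock-free: "
                  << std::atomic<std::uint64_t>::is_always_lock_free << '\n'
                  << "atomic<Big>      always lock-free: "
                  << std::atomic<Big>::is_always_lock_free << '\n';
    }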
Sam <sam@email-scan.com>: Nov 09 07:59AM -0500

Juha Nieminen writes:
> C++20, however, doesn't feel like this anymore. It has a few new features
> that genuinely help in programming, but most of it feels like just adding
> features for the sake of adding them. C++23 even more so.

Or the features were specifically added to make sucky operating systems suck a little less. Specifically: co-routines. Microsoft hijacked the standardization process to push through co-routines, because real multiple execution threads on MS-Windows blow chunks, and the OS can only implement co-routines in a passable manner.
scott@slp53.sl.home (Scott Lurndal): Nov 09 02:39PM

> The trouble with
> them is that no one has addressed the elephant in the room - in general,
> you need OS support and locks to implement large atomics.

IFF the target architecture doesn't have a comprehensive set of atomic access instructions, perhaps. ARMv8 LSE, for example, has individual instructions for most of the gcc atomic intrinsics (e.g. __sync_fetch_and_add will generate a single LDADD atomic instruction). The instructions support the common arithmetic operations (add, or, etc).

Before LSE, the ARMv8 implementations were built using the arm LL/SC equivalent (load exclusive/store exclusive) instructions.
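
At the source level that looks something like the sketch below (my example, not Scott's); whether GCC emits an LDAXR/STLXR retry loop or a single LDADDAL depends on the target options (e.g. -march=armv8.1-a; newer GCCs can also pick an LSE path at run time via out-of-line helpers):

    #include <cstdint>

    std::uint32_t counter;

    std::uint32_t bump()
    {
        // Atomically adds 1 and returns the previous value.
        return __sync_fetch_and_add(&counter, 1u);
    }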
David Brown <david.brown@hesbynett.no>: Nov 09 05:05PM +0100

On 09/11/2022 15:39, Scott Lurndal wrote:
> arithmetic operations (add, or, etc).
> Before LSE, the ARMv8 implementations were built using the arm
> LL/SC equivalent (load exclusive/store exclusive) instructions.

You are more familiar with the details of these things than most people, so I hope you (or someone else) will correct me if my logic below is wrong.

There's no problem when the target has a single unbreakable instruction for the action. And LL/SC are fine for atomic loads or stores of different sizes. But LL/SC is not sufficient for read-modify-write sequences of a size larger than can be handled by a single atomic instruction.

Imagine you have a processor that can atomically read or write an unsigned integer type "uint". Your sequence for "uint_inc" will be:

    retry:
        load link x = *p
        x++
        if (store conditional *p = x fails) goto retry

If two processes try this, they can interleave and be started or stopped without trouble - the result will be an atomic increment.

Now consider a double-sized type containing two "uint" fields:

    retry:
        load link x_lo = *p
        x_hi = *(p + 1)
        x_lo++
        if (!x_lo) x_hi++
        if (store conditional *p = x_lo fails) goto retry
        *(p + 1) = x_hi

If the process executing this is stopped after the first write, and a second process is run that calls a similar function, then the new process will see a half-changed value for the object, resulting in a corrupted object. Resumption of the first process will half-change the value again. Different combinations of using "store conditional" on the two stores will result in similar problems.

The only way to make a multi-unit RMW operation work is if other processes are /blocked/ from breaking in during the actual write sequence. Reads and the calculation can be retried, but not the writes - they must be made an unbreakable sequence. And that, in general, means a lock and OS support to ensure that the locking process gets to finish.

The gcc implementation of atomic operations (larger than can be handled with a single instruction) uses simple user-space spin locks (the lock can be accessed atomically - with an LL/SC sequence, for the ARM). If one process tries to access the atomic while another process has the lock, it will spin - running a busy wait loop. As long as these processes are running on different cores, there's no problem with one core running a few rounds of a tight loop while another core does a quick load or store. Given that contention is rare and cores are often plentiful, this results in a very efficient atomic operation.

But it can deadlock - a process could take the spin lock and then get descheduled by the OS, and other threads wanting the lock could be activated. If these fill up the cores (maybe you have multiple threads all using the same supposedly lock-free atomic structure), you are screwed. And if you have only one core (like almost all microcontrollers), and the thread that has the lock is interrupted by an interrupt routine that wants to access the same atomic variable, you are /really/ screwed. This can happen with such simple code as a 64-bit atomic counter in an interrupt routine that is also accessed atomically from a background task. It's very unlikely that you'll hit a problem, but it is possible. To me, that is useless - atomics need guaranteed forward progress.
That means the std::atomic<> stuff needs to use OS-level locks for advanced cases that can't be handled directly by instructions or LL/SC sequences, or, for a microcontroller, you'd want to disable interrupts around the access. The alternative is to refuse to compile the operations and only support atomics that are smaller or simpler.
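
Two hedged sketches of what that means in practice (mine, not David's code). First, the spin-lock approach that libatomic conceptually takes for an over-sized object - fine across cores, but exactly the thing that can live-lock on a single core:

    #include <atomic>
    #include <cstdint>

    struct Wide { std::uint64_t lo, hi; };        // too big for one atomic instruction

    std::atomic_flag wide_lock = ATOMIC_FLAG_INIT;
    Wide wide_counter{0, 0};

    void wide_inc()
    {
        while (wide_lock.test_and_set(std::memory_order_acquire))
            ;                                     // busy-wait; assumes the lock holder keeps running
        if (++wide_counter.lo == 0)               // carry into the high word
            ++wide_counter.hi;
        wide_lock.clear(std::memory_order_release);
    }

Second, the single-core microcontroller alternative: briefly mask interrupts around the read-modify-write. The irq_save_and_disable()/irq_restore() names are hypothetical - every vendor spells these differently (e.g. CMSIS's __disable_irq()):

    #include <cstdint>

    extern "C" unsigned irq_save_and_disable();   // hypothetical target-specific primitives
    extern "C" void irq_restore(unsigned state);

    volatile std::uint64_t tick_count;            // shared between an ISR and background code

    void tick_inc()
    {
        unsigned s = irq_save_and_disable();
        std::uint64_t v = tick_count;             // 64-bit RMW, now indivisible w.r.t. interrupts
        tick_count = v + 1;
        irq_restore(s);
    }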
scott@slp53.sl.home (Scott Lurndal): Nov 09 05:58PM

> There's no problem when the target has a single unbreakable instruction
> for the action. And LL/SC are fine for atomic loads or stores of
> different sizes.

Here's the code generated by GCC for q = __sync_fetch_and_add(&q, 1u);

Without LSE (atomics) support:

    401034: 885ffc60 ldaxr w0, [x3]
    401038: 11000401 add   w1, w0, #0x1
    40103c: 8804fc61 stlxr w4, w1, [x3]
    401040: 35ffffa4 cbnz  w4, 401034 <main+0x34>

With LSE (atomics) support:

    12c: b8e10001 ldaddal w1, w1, [x0]

> if (!x_lo) x_hi++
> if (store conditional *p = x_lo fails) goto retry
> *(p + 1) = x_hi

For such sequences, one uses the LL/SC as a spinlock; acquire the spinlock, perform the non-atomic operation and release the spinlock. On uniprocessor systems, alternate mechanisms like disabling interrupts are the common solution. Although in this case, using a wider type if available is a better option.

> descheduled by the OS, and other threads wanting the lock could be
> activated. If these fill up the cores (maybe you have multiple threads
> all using the same supposedly lock-free atomic structure), you are screwed.

This is a typical priority inheritance problem.

> And if you have only one core (like almost all microcontrollers), and
> the thread that has the lock is interrupted by an interrupt routine that
> wants to access the same atomic variable, you are /really/ screwed.

To be fair, the programmer should be aware of these issues and not use mechanisms subject to deadlock. As noted above, the typical solution is to disable interrupts during a critical section.

> It's very unlikely that you'll hit a problem, but it is possible.

Famous last words, indeed.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Nov 09 11:47AM -0800 On 11/9/2022 12:29 AM, David Brown wrote: > systems it is limited if you use thread priorities. The trouble with > them is that no one has addressed the elephant in the room - in general, > you need OS support and locks to implement large atomics. Are you referring to double-width compare-and-swap (DWCAS)? C++ should be able to handle it directly using the processors instruction set. Say C++ on a modern 64 bit x64 system, well CMPXCHG16B should be used for a double word. Double word in the sense that they are two _contiguous_ words. In other words, a lock-free CAS of a double word on a 64 bit x64 should use CMPXCHG16B. |
scott@slp53.sl.home (Scott Lurndal): Nov 09 08:43PM

> double word. Double word in the sense that they are two _contiguous_
> words. In other words, a lock-free CAS of a double word on a 64 bit x64
> should use CMPXCHG16B.

David works with low-end embedded processors, as I understand it, with limited and/or restricted instruction sets.
olcott <polcott2@gmail.com>: Nov 09 01:12PM -0600

    void D(void (*x)())
    {
      H(x, x);
    }

If H correctly simulates D, would the simulated D ever stop running without being aborted?

A PhD computer scientist seems to believe that (ignoring stack overflow) D correctly simulated by H would stop running and terminate normally without having its simulation aborted.

*Is he correct or just trolling me*

--
Copyright 2022 Pete Olcott

"Talent hits a target no one else can hit;
Genius hits a target no one else can see." Arthur Schopenhauer
Michael S <already5chosen@yahoo.com>: Nov 08 03:37PM -0800

> > implementations on all x86-64 systems that I ever tested.
> The MinGW (that is, MinGW64) g++ implementation was not reasonable, and
> probably isn't today either.

IMO, you are confused. From the very beginning, mingw64's std::chrono::steady_clock has been based on QueryPerformanceCounter(). It's not the best possible clock available on Windows, but it is certainly not broken. It has sub-microsecond resolution, typically ~331 ns.

Most likely you had seen std::chrono::high_resolution_clock broken (and this one is broken indeed) and remembered that *something* is broken, but forgot what is broken and what not.

> PS: I'm sorry that I by mistake sent essentially this reply via mail,
> presumably to an unused spam-dump e-mail address, but. Not sure how
> Thunderbird managed to trick me into doing that.

That's o.k. This mail address is not totally unused, but it is used maybe once or twice per year.
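
A quick check (my sketch, not Michael's) of what the clock advertises; note that the advertised tick period is only the representation's granularity, not necessarily the ~331 ns QueryPerformanceCounter resolution mentioned above:

    #include <chrono>
    #include <iostream>

    int main()
    {
        using clock = std::chrono::steady_clock;
        std::cout << std::boolalpha
                  << "is_steady: " << clock::is_steady << '\n'
                  << "tick period: " << clock::period::num << '/'
                  << clock::period::den << " s\n";

        auto t0 = clock::now();
        auto t1 = clock::now();
        std::cout << "back-to-back now() delta: "
                  << std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count()
                  << " ns\n";
    }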
"Öö Tiib" <ootiib@hot.ee>: Nov 09 06:14AM -0800 On Wednesday, 9 November 2022 at 01:37:23 UTC+2, Michael S wrote: > Most likely you had seen std::chrono::high_resolution_clock broken (and > this one is broken indeed) and remembered that *something* is broken, > but forgot what is broken and what not. From that steady_clock we are supposed to get time since arbitrary time point of past. No years and months that OP had. That is useful only in context of checking durations of works or time-ordering of events of (threads of) current process. If used as (parts of) file names (like OP seemed to want) the ordering and time distances become wrong with restart of system. |