- merge-sort with alloca()-buffer - 2 Updates
- Multi-threading question - 5 Updates
- C++17 and alignment... - 2 Updates
Bonita Montero <Bonita.Montero@gmail.com>: Jan 22 06:59PM +0100

> Did you read my first message at all?

You:

> It's hard to compete with introsort for the vast majority of input.

So introsort is bad advice here.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 22 02:22PM -0800

On 1/21/2021 12:10 AM, Bonita Montero wrote:
>> No, they are not.
...
> They aren't recommended, i.e. they should be replaced by inline-code
> as most as possible. Macros tend to be cryptic and can't be debugged.

Agreed.

Bonita Montero <Bonita.Montero@gmail.com>: Jan 22 07:00PM +0100

> And I don't know why you bother asking at all, because
> you seem completely unwilling to read any answers.

Your introsort advice was bad advice.

Bonita Montero <Bonita.Montero@gmail.com>: Jan 22 08:01PM +0100

> | atomic_spin_nop ();
> That's just a PAUSE-instruction

I just tested the "performance" of the PAUSE instruction:

    #include <chrono>
    #include <cstddef>
    #include <immintrin.h>  // _mm_pause; also works with MSVC, where <intrin.h> was used originally
    #include <iostream>

    using namespace std;
    using namespace chrono;

    int main()
    {
        using hrc_tp = time_point<high_resolution_clock>;
        hrc_tp start = high_resolution_clock::now();
        // Issue one million PAUSE instructions back to back.
        for( size_t i = 0; i != 1'000'000; ++i )
            _mm_pause();
        // Average time per iteration in nanoseconds.
        double ns = duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / 1'000'000.0;
        cout << ns << endl;
    }

But for me (Ryzen Threadripper 3990X) each PAUSE iteration takes only
0.7ns. That doesn't make sense to me, because the instruction is supposed
to delay spin loops so that power is saved, and so that the spinning has a
close timing relationship to a nearly immediate wakeup from the kernel,
which makes it likely that kernel calls are avoided. But glibc has a static
upper limit of 100 iterations while spinning, and I strongly doubt that any
kernel could deliver a futex call and wakeup that fast.

So can anyone here verify this on Intel hardware? Maybe Intel CPUs have
more appropriate PAUSE delays.

But there's also TPAUSE, as I see, which waits until the timestamp counter
reaches a certain value. But glibc does use PAUSE:

    #define atomic_spin_nop() __asm ("pause")

Bonita Montero <Bonita.Montero@gmail.com>: Jan 22 08:54PM +0100

> But there's also TPAUSE as I see which waits until the timestamp
> -counter reaches a certain value. But glibc does use PAUSE:
> #define atomic_spin_nop() __asm ("pause")

I just found out that no AMD CPU supports TPAUSE. What a mess. That's
such a useful instruction. I thought I could use it for my shared lock.

Ian Collins <ian-news@hotmail.com>: Jan 23 10:30AM +1300

On 23/01/2021 06:31, Juha Nieminen wrote:
>> I don't know how to ask for this somewhere else.
> And I don't know why you bother asking at all, because you seem
> completely unwilling to read any answers.

As I said a while back, you are arguing with someone who can't even
quote correctly. You are wasting your time.

--
Ian.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 22 02:21PM -0800

On 1/22/2021 8:40 AM, Bonita Montero wrote:
>> But there are spinlocks in the kernel, but that's a facility
>> not provided to the userland.
> I just see that there are POSIX userland spinlocks ([*]).

Then there are adaptive locks that are a hybrid of a spinlock and a
blocking mutex. They can spin a couple of times using, say, a backoff
scheme. This might be a single PAUSE instruction wrt x86, or even some
more exotic exponential backoffs. They spin a while before resorting to
going into kernel land to actually block.

> said that userland-spinlocks are a bad idea.
> [*]
> https://pubs.opengroup.org/onlinepubs/009696699/functions/pthread_spin_lock.html

100% pure spin locks can use a variety of backoff schemes. Have you ever
experimented with them? I have.

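For readers following along, here is a minimal sketch of a pure spin lock with capped exponential backoff, roughly the kind of scheme described above. It is not code from any poster. The names `backoff_spinlock`, `cpu_relax` and the cap `max_pauses` are made up for illustration, and the cap of 1024 is an arbitrary assumption, not a tuned value. On x86-64 the backoff step is a PAUSE via `_mm_pause`; elsewhere it falls back to `std::this_thread::yield`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
inline void cpu_relax() { _mm_pause(); }                 // PAUSE on x86-64
#else
inline void cpu_relax() { std::this_thread::yield(); }   // portable fallback
#endif

// Hypothetical illustration: test-and-test-and-set spin lock whose
// wait between retries doubles up to a fixed cap.
class backoff_spinlock {
public:
    void lock() {
        unsigned pauses = 1;
        for (;;) {
            // Try to take the lock; acquire pairs with the release in unlock().
            if (!locked_.exchange(true, std::memory_order_acquire))
                return;
            // Lock is held: wait on plain loads, backing off exponentially.
            while (locked_.load(std::memory_order_relaxed)) {
                for (unsigned i = 0; i != pauses; ++i)
                    cpu_relax();
                if (pauses < max_pauses)
                    pauses *= 2;
            }
        }
    }

    bool try_lock() {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() {
        assert(locked_.load(std::memory_order_relaxed)); // must be held
        locked_.store(false, std::memory_order_release);
    }

private:
    static constexpr unsigned max_pauses = 1024; // assumed cap, not tuned
    std::atomic<bool> locked_{false};
};
```

The class meets the BasicLockable requirements, so it can be used with `std::lock_guard<backoff_spinlock>`. An adaptive lock in the sense described above would give up spinning after some number of rounds and block in the kernel instead; that part is left out here.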
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 22 02:02PM -0800

On 1/22/2021 12:10 AM, Bonita Montero wrote:
>> Why not?
> Because from the language-perspective a structure only needs to be
> aligned to the structure element with the largest alignment-requirement.

Keep in mind that I am over-aligning on purpose. This allows me to do
some fun things. For instance... Think about being able to get at a page
"header" by simply rounding down any address in said page to a page
boundary. This is extremely useful. Think about a memory allocator where
a free consists of rounding down to a page boundary, where a header exists
that has a lock-free stack. Now, I can just add the freed memory to that
list. Over-aligning has many uses.

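The rounding-down trick can be sketched as follows. This is only an illustration under stated assumptions, not the poster's allocator: the page size of 4096 bytes, the `page` layout, the `free_count` field and the helper `header_of` are all hypothetical, and the lock-free stack in the header is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t page_size = 4096; // assumed page size

// Hypothetical page layout: a small header followed by the payload.
// alignas makes every page object start on a page boundary.
struct alignas(page_size) page {
    struct header {
        std::size_t free_count; // stand-in for real bookkeeping
    };
    header hdr;
    unsigned char payload[page_size - sizeof(header)];
};
static_assert(sizeof(page) == page_size, "page must occupy exactly one page");

// Find the header of the page that contains p by clearing the low
// address bits. Only valid for pointers into a live page object.
inline page::header* header_of(void* p) {
    assert(p != nullptr);
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    auto base = addr & ~static_cast<std::uintptr_t>(page_size - 1);
    return &reinterpret_cast<page*>(base)->hdr;
}
```

With C++17, a plain `new page` should go through the aligned allocation functions and return page-aligned storage, so `header_of` works for heap pages as well as for static or automatic ones. The integer round trip through `std::uintptr_t` is implementation-defined, though it behaves as expected on mainstream platforms.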
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 22 02:10PM -0800

On 1/21/2021 11:43 PM, David Brown wrote:
> And then there is the possibility of using the low bits in an aligned
> address for odd purposes - the bigger the alignment, the more bits you
> have to play with.

Exactly. I remember using the low bits for a very special reference
counter. I also remember using them to embed state for a mutex. I used
them for many other things, but that was a while ago. 15+ years. However,
all of the code was non-standard. I like the fact that C++17 allows for
over-alignment, and that it integrates it directly with new and delete.

I might port an old proxy garbage collector thing I did way back into
C++17. I have some older memory allocators that could simply round freed
memory down to a page boundary where a header is, and link it into a
lock-free list. This allowed the granularity of a memory block to be the
size of a pointer. Pretty nice, and fast.

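As a rough illustration of using the low bits of an aligned address, here is a small tagged-pointer sketch. It is not the reference counter or mutex mentioned above. The names `tagged_ptr` and `node` and the 64-byte alignment are assumptions chosen for the example; with 64-byte alignment the low 6 bits of every valid address are zero and can carry a small value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration: packs a small tag into the low Bits bits
// of a pointer to T, which must be aligned to at least 2^Bits bytes.
template <class T, unsigned Bits>
class tagged_ptr {
    static_assert((std::uintptr_t{1} << Bits) <= alignof(T),
                  "T is not aligned strictly enough for this many tag bits");
    static constexpr std::uintptr_t mask = (std::uintptr_t{1} << Bits) - 1;

public:
    tagged_ptr(T* p, unsigned tag)
        : bits_(reinterpret_cast<std::uintptr_t>(p) | tag) {
        assert((reinterpret_cast<std::uintptr_t>(p) & mask) == 0); // low bits free
        assert(tag <= mask);                                       // tag fits
    }

    // Strip the tag to recover the original pointer.
    T* ptr() const { return reinterpret_cast<T*>(bits_ & ~mask); }

    // Extract the value stored in the low bits.
    unsigned tag() const { return static_cast<unsigned>(bits_ & mask); }

private:
    std::uintptr_t bits_;
};

// Over-aligned on purpose: 64-byte alignment leaves 6 low bits for the tag.
struct alignas(64) node {
    int value;
};
```

For example, `tagged_ptr<node, 6> tp(&n, 37);` stores the tag 37 next to the address of `n`, and `tp.ptr()` and `tp.tag()` recover both. A lock-free use would keep the packed word in a `std::atomic<std::uintptr_t>` so pointer and tag change together in one compare-and-swap. The pointer-to-integer round trip is implementation-defined, as in the old non-standard code.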