- About FlushProcessWriteBuffers() and IPIs.. - 1 Update
Horizon68 <horizon@horizon.com>: Feb 09 08:24AM -0800

Hello,

About FlushProcessWriteBuffers() and IPIs.

It seems that the implementation of sys_membarrier in Linux 4.3 is too slow. Starting with kernel 4.14, there is a new command, MEMBARRIER_CMD_PRIVATE_EXPEDITED, that enables a much faster implementation of the syscall using IPIs (inter-processor interrupts). For details, and for how Userspace RCU also uses IPIs, read the following:

membarrier system call performance and the future of Userspace RCU on Linux
https://lttng.org/blog/2018/01/15/membarrier-system-call-performance-and-userspace-rcu/

Cache-coherency protocols do not use IPIs, and as a user-space developer you normally do not care about IPIs at all; you are mostly interested in the cost of cache coherency itself. However, the Win32 API provides a function, FlushProcessWriteBuffers(), that issues IPIs to all processors in the affinity mask of the current process. You can use it to investigate the cost of IPIs.

When I ran a simple synthetic test on a dual-core machine, I obtained the following numbers:

- 420 cycles is the minimum cost of FlushProcessWriteBuffers() on the issuing core.
- 1600 cycles is the mean cost of FlushProcessWriteBuffers() on the issuing core.
- 1300 cycles is the mean cost of FlushProcessWriteBuffers() on the remote core.

Note that, as far as I understand, the function issues an IPI to the remote core, the remote core acknowledges it with another IPI, and the issuing core waits for that acknowledgment before returning. IPIs also have the indirect cost of flushing the processor pipeline.
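To get a rough feel for the cost on Linux, a measurement along the following lines could be used. This is only a minimal sketch, not the test that produced the numbers above: it times the membarrier private expedited command rather than FlushProcessWriteBuffers(), it reports wall-clock nanoseconds rather than cycles, and the helper names membarrier_call, register_private_expedited and mean_membarrier_ns are made up for illustration. It assumes kernel headers from 4.14 or later.

```cpp
// Linux-only sketch: time the membarrier private expedited command.
#include <cassert>
#include <chrono>
#include <linux/membarrier.h>  // MEMBARRIER_CMD_* constants
#include <sys/syscall.h>       // __NR_membarrier
#include <unistd.h>            // syscall()

// Hypothetical helper: invoke membarrier through syscall(2).
static long membarrier_call(int cmd, unsigned int flags) {
    return syscall(__NR_membarrier, cmd, flags);
}

// Hypothetical helper: returns true if the kernel reports support for the
// private expedited command and the process registered for it successfully.
bool register_private_expedited() {
    long mask = membarrier_call(MEMBARRIER_CMD_QUERY, 0);
    if (mask < 0) return false;  // syscall missing or blocked
    if (!(mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED)) return false;
    return membarrier_call(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) == 0;
}

// Hypothetical helper: mean wall-clock cost in nanoseconds of one call on the
// issuing thread, or -1.0 if the command is unavailable or fails.
double mean_membarrier_ns(int iterations) {
    assert(iterations > 0);
    if (!register_private_expedited()) return -1.0;

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        if (membarrier_call(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) != 0) {
            return -1.0;
        }
    }
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Calling mean_membarrier_ns(100000) from a single-threaded program mostly measures syscall overhead, because the kernel has no other running threads of the process to interrupt. To approximate the issuing-core cost discussed above, the process would need busy threads on the other cores while the measurement runs.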
My C++ synchronization objects library has been updated. I have invented and added the scalable Fast_RWLock and the scalable Fast_RWLockX. They are better than scalable asymmetric RWLocks that use IPIs: they are costless on the reader side, they do not use IPIs on the writer side, and they are starvation-free, so they are really powerful. They now work on both Windows and Linux. I have thoroughly tested my C++ synchronization objects library, and I think it is now much more stable and fast.

You can read about it and download it from my website here:

https://sites.google.com/site/scalable68/c-synchronization-objects-library

The source code is inside my zip files here (the files are called Fast_RWLockX.pas and LW_Fast_RWLockX.pas):

https://sites.google.com/site/scalable68/scalable-rwlock

Thank you,
Amine Moulay Ramdane.