soft and program: Digest for comp.programming.threads@googlegroups.com

Sunday, February 10, 2019

Digest for comp.programming.threads@googlegroups.com - 1 update in 1 topic


About FlushProcessWriteBuffers() and IPIs..

Horizon68 <horizon@horizon.com>: Feb 09 08:24AM -0800

Hello..

About FlushProcessWriteBuffers() and IPIs..

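It seems that the sys_membarrier implementation introduced in Linux 4.3 is
too slow: its MEMBARRIER_CMD_SHARED command waits for a scheduler RCU grace
period (synchronize_sched()) instead of interrupting the other CPUs directly.
Starting with kernel 4.14, a new command, MEMBARRIER_CMD_PRIVATE_EXPEDITED,
enables a much faster implementation of the system call that uses IPIs
(inter-processor interrupts).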

See
https://lttng.org/blog/2018/01/15/membarrier-system-call-performance-and-userspace-rcu/

for some details.

The same post, "membarrier system call performance and the future of
Userspace RCU on Linux", also explains how Userspace RCU relies on
membarrier, and therefore on IPIs when the expedited command is used.
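As a rough illustration of the expedited command, here is a minimal Linux-only sketch in C++. glibc provides no membarrier() wrapper, so it goes through syscall(2) with the constants from <linux/membarrier.h>. The helper names (sys_membarrier, private_expedited_supported, and so on) are hypothetical. Per the membarrier(2) man page, a process must register with MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED before it may issue MEMBARRIER_CMD_PRIVATE_EXPEDITED.

```cpp
// Linux-only sketch of the expedited private membarrier (kernel 4.14+).
// glibc has no membarrier() wrapper, so the raw system call is used.
#include <cassert>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

// Thin wrapper around membarrier(2); returns the raw result (-1 on error).
static long sys_membarrier(int cmd, unsigned int flags) {
    return syscall(__NR_membarrier, cmd, flags);
}

// True if the running kernel advertises MEMBARRIER_CMD_PRIVATE_EXPEDITED.
bool private_expedited_supported() {
    long mask = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);
    return mask >= 0 && (mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED) != 0;
}

// The process must register once before issuing expedited private barriers.
bool register_private_expedited() {
    return sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) == 0;
}

// Executes a memory barrier on every running thread of this process;
// current kernels do this by sending IPIs to the CPUs those threads run on.
bool private_expedited_barrier() {
    return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) == 0;
}
```

A caller would typically query and register once at startup, then call private_expedited_barrier() only on the rarely taken slow path of an asymmetric algorithm, which is where the IPI cost is paid.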

Cache-coherency protocols do not use IPIs, and as a user-space developer
you normally do not care about IPIs at all; you are mostly interested in
the cost of cache coherency itself. However, the Win32 API provides a
function, FlushProcessWriteBuffers(), that issues IPIs to all processors
in the affinity mask of the current process. You can use it to
investigate the cost of IPIs.

When I ran a simple synthetic test on a dual-core machine, I obtained
the following numbers:

420 cycles is the minimum cost of the FlushProcessWriteBuffers()
function on the issuing core.

1600 cycles is the mean cost of the FlushProcessWriteBuffers() function
on the issuing core.

1300 cycles is the mean cost of the FlushProcessWriteBuffers() function
on the remote core.
Note that, as far as I understand, the function sends an IPI to the
remote core, the remote core acknowledges it with another IPI, and the
issuing core waits for that acknowledgment IPI before returning.

IPIs also have the indirect cost of flushing the processor pipeline.
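The test code behind the cycle counts above is not shown. A minimal sketch of a similar measurement on the issuing core might look like this, assuming wall-clock nanoseconds from std::chrono::steady_clock are acceptable in place of cycle counts (the clock's own overhead is included in every sample). The names measure_calls, CallCost and measure_flush_process_write_buffers are hypothetical; only the FlushProcessWriteBuffers() call itself is the real Win32 function.

```cpp
// Sketch: time a call repeatedly and report minimum and mean duration.
// Uses std::chrono nanoseconds rather than CPU cycles; the clock's own
// overhead is included in every sample.
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>
#ifdef _WIN32
#define NOMINMAX  // keep <windows.h> from defining min/max macros
#include <windows.h>
#endif

struct CallCost {
    double min_ns;   // fastest observed call
    double mean_ns;  // average over all calls
};

// Calls f() `iterations` times and records the minimum and mean wall-clock time.
template <class F>
CallCost measure_calls(F f, int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    double min_ns = std::numeric_limits<double>::infinity();
    double total_ns = 0.0;
    for (int i = 0; i < iterations; ++i) {
        const auto start = Clock::now();
        f();
        const auto stop = Clock::now();
        const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
        min_ns = std::min(min_ns, ns);
        total_ns += ns;
    }
    return CallCost{min_ns, total_ns / iterations};
}

#ifdef _WIN32
// Windows only: cost of FlushProcessWriteBuffers() as seen by the issuing thread.
CallCost measure_flush_process_write_buffers(int iterations) {
    return measure_calls([] { FlushProcessWriteBuffers(); }, iterations);
}
#endif
```

To compare with cycle counts, multiply the nanosecond figures by the core clock rate in GHz. This sketch only measures the issuing side; the remote-core cost would additionally need a busy thread on the other core timing its own slowdown.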

My C++ synchronization objects library has been updated. I have
invented and added the scalable Fast_RWLock and the scalable
Fast_RWLockX, and they are better than the scalable asymmetric RWLocks
that use IPIs: they are costless on the reader side, they don't use IPIs
on the writer side, and they are starvation-free, so they are really
powerful. They now work on both Windows and Linux. I have tested my C++
synchronization objects library thoroughly, and I think it is now much
more stable and fast.
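For comparison, here is a minimal sketch of the general asymmetric reader-writer lock technique that IPI-based designs rely on. It is not the Fast_RWLock or Fast_RWLockX algorithm, whose source is only in the linked zip files. Assumptions: Linux, each reader thread passes its own distinct slot index (a hypothetical API), and the writer's process-wide barrier is membarrier's MEMBARRIER_CMD_PRIVATE_EXPEDITED. If the kernel does not offer that command, the sketch falls back to a full fence on both sides. With membarrier, readers pay only a compiler barrier and the writer pays for the IPIs.

```cpp
// Sketch of the generic asymmetric reader-writer lock technique (Linux only).
// NOT the Fast_RWLock / Fast_RWLockX algorithm; slot indices are a hypothetical API.
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <mutex>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

class AsymmetricRWLock {
public:
    static constexpr int kMaxReaders = 64;  // hypothetical fixed number of reader slots

    AsymmetricRWLock() {
        // Use the IPI-based heavy barrier only if the kernel offers it and registration succeeds.
        long mask = syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0);
        expedited_ = mask >= 0 && (mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED) != 0 &&
                     syscall(__NR_membarrier, MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) == 0;
    }

    // Reader: each reader thread must use its own slot in [0, kMaxReaders).
    void read_lock(int slot) {
        assert(slot >= 0 && slot < kMaxReaders);
        for (;;) {
            slots_[slot].active.store(true, std::memory_order_relaxed);
            light_barrier();
            if (!writer_pending_.load(std::memory_order_acquire)) return;  // fast path
            // A writer announced itself: step aside and wait until it is done.
            slots_[slot].active.store(false, std::memory_order_release);
            while (writer_pending_.load(std::memory_order_acquire)) {}
        }
    }

    void read_unlock(int slot) {
        slots_[slot].active.store(false, std::memory_order_release);
    }

    // Writer: announce intent, force a barrier on every CPU running one of our
    // threads, then wait until no reader slot is active.
    void write_lock() {
        writer_mutex_.lock();  // writers are serialized among themselves
        writer_pending_.store(true, std::memory_order_relaxed);
        heavy_barrier();
        for (Slot& s : slots_)
            while (s.active.load(std::memory_order_acquire)) {}
    }

    void write_unlock() {
        writer_pending_.store(false, std::memory_order_release);
        writer_mutex_.unlock();
    }

    bool uses_membarrier() const { return expedited_; }

private:
    struct alignas(64) Slot {  // one cache line per reader to avoid false sharing
        std::atomic<bool> active{false};
    };

    // Reader side: compiler-only barrier when the writer can use membarrier,
    // otherwise a full hardware fence.
    void light_barrier() const {
        if (expedited_) std::atomic_signal_fence(std::memory_order_seq_cst);
        else std::atomic_thread_fence(std::memory_order_seq_cst);
    }

    // Writer side: membarrier sends IPIs to the CPUs running this process's threads.
    void heavy_barrier() const {
        if (!expedited_) {
            std::atomic_thread_fence(std::memory_order_seq_cst);
        } else if (syscall(__NR_membarrier, MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) != 0) {
            std::abort();  // readers rely on this barrier; continuing would be unsafe
        }
    }

    bool expedited_ = false;
    std::atomic<bool> writer_pending_{false};
    std::mutex writer_mutex_;
    Slot slots_[kMaxReaders];
};
```

The design is a Dekker-style handshake: either the writer's barrier makes a reader's slot store visible before the writer scans the slots, or the reader's following load sees writer_pending_ and the reader backs off.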

You can read about it and download it from my website here:

https://sites.google.com/site/scalable68/c-synchronization-objects-library

The source code is inside my zip files here (the files are called
Fast_RWLockX.pas and LW_Fast_RWLockX.pas):

https://sites.google.com/site/scalable68/scalable-rwlock

Thank you,
Amine Moulay Ramdane.
