aminer68@gmail.com: Dec 12 01:44PM -0800 Hello, I am a white Arab who has lived in Canada since 1989, so read my following post about Canada: Canada ranked 3rd best country in the world for education. Read more here: https://dailyhive.com/toronto/canada-ranked-best-country-education-2019 Canada is also becoming very attractive with its remarkable economy; read more to see it: Why so many Silicon Valley companies are moving to Vancouver, Canada https://www.straight.com/tech/1261681/why-so-many-silicon-valley-companies-are-moving-vancouver And look at this interesting video about the Canadian economy: The Remarkable Economy of Canada https://www.youtube.com/watch?v=uRnIHyI02EU Thank you, Amine Moulay Ramdane. |
aminer68@gmail.com: Dec 12 12:33PM -0800 Hello, More precision about the Linux expedited sys_membarrier() and the Windows FlushProcessWriteBuffers(); read again: I have just read the following webpage: https://lwn.net/Articles/636878/ It is interesting and it says:
---
Results in liburcu:
Operations in 10s, 6 readers, 2 writers:
memory barriers in reader: 1701557485 reads, 3129842 writes
signal-based scheme: 9825306874 reads, 5386 writes
sys_membarrier expedited: 6637539697 reads, 852129 writes
sys_membarrier non-expedited: 7992076602 reads, 220 writes
---
Look at how powerful "sys_membarrier expedited" is. As you have noticed, I have already implemented my scalable asymmetric RWLocks that use the Windows FlushProcessWriteBuffers(); they are called Fast_RWLockX and LW_Fast_RWLockX. They are limited to 400 threads by default, but you can manually extend the maximum number of threads by setting the NbrThreads parameter of the constructor. You have to start your threads once and for all and keep working with them; do not start a new thread each time and then exit from it. Fast_RWLockX and LW_Fast_RWLockX do not use any atomic operations or StoreLoad-style memory barriers on the reader side, so they are scalable and very fast, and I will soon port them to Linux, where they will support both the expedited and non-expedited sys_membarrier. You can download my inventions of scalable asymmetric RWLocks, which use IPIs and are costless on the reader side, from here: https://sites.google.com/site/scalable68/scalable-rwlock Cache-coherency protocols do not use IPIs, and as a user-space developer you normally do not care about IPIs at all; one is mostly interested in the cost of cache coherency itself. However, the Win32 API provides a function that issues IPIs to all processors (in the affinity mask of the current process): FlushProcessWriteBuffers(). You can use it to investigate the cost of IPIs.
The FlushProcessWriteBuffers() API does the following:
- Implicitly executes a full memory barrier on all other processors.
- Generates an interprocessor interrupt (IPI) to all processors that are part of the current process affinity.
- Uses the IPI to "synchronously" signal all processors.
- Guarantees the visibility of write operations performed on one processor to the other processors.
- It is supported since Windows Vista and Windows Server 2008.
To investigate the cost of IPIs, I ran a simple synthetic test on a quad-core machine and obtained the following numbers:
- 420 cycles: minimum cost of FlushProcessWriteBuffers() on the issuing core.
- 1600 cycles: mean cost of FlushProcessWriteBuffers() on the issuing core.
- 1300 cycles: mean cost of FlushProcessWriteBuffers() on a remote core.
Note that, as far as I understand, the function issues an IPI to the remote core, the remote core acks it with another IPI, and the issuing core waits for the ack IPI before returning. The IPIs also have the indirect cost of flushing the processor pipeline. Thank you, Amine Moulay Ramdane. |
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.programming.threads+unsubscribe@googlegroups.com. |