- About McCabe's Cyclomatic Complexity - 1 Update
- About the strategy of "work depth-first; steal breadth-first".. - 1 Update
- My new scalable algorithm is here.. - 1 Update
- About modeling and detection of race conditions and deadlocks in parallel programming.. - 1 Update
- Here are the events that cause the store buffer to drain on x86 CPUs - 1 Update
aminer68@gmail.com: Jan 20 01:40PM -0800 Hello, High "complexity" in software code translates directly into low readability and high maintenance costs, so read "McCabe's Cyclomatic Complexity and Why We Don't Use It": https://www.cqse.eu/en/blog/mccabe-cyclomatic-complexity/ Thank you, Amine Moulay Ramdane. |
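As a quick illustration (not taken from the article above): for structured code, McCabe's metric comes out to the number of branch points plus one, so the hypothetical function below has cyclomatic complexity 4.

```cpp
#include <string>

// Hypothetical example: three branch points (the three `if`s),
// so McCabe's cyclomatic complexity is 3 + 1 = 4. Each extra
// branch adds one more linearly independent path to test.
std::string classify(int score) {
    if (score < 0)   return "invalid";    // branch 1
    if (score >= 90) return "excellent";  // branch 2
    if (score >= 50) return "pass";       // branch 3
    return "fail";
}
```

The article's point is that this single number says little by itself: a flat chain of simple branches like this one is far easier to read and maintain than deeply nested branches with the same count.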
aminer68@gmail.com: Jan 20 12:54PM -0800 Hello, About the strategy of "work depth-first; steal breadth-first".. I have just read the following webpage: Why Too Many Threads Hurts Performance, and What to do About It https://www.codeguru.com/cpp/sample_chapter/article.php/c13533/Why-Too-Many-Threads-Hurts-Performance-and-What-to-do-About-It.htm I have also just watched the following interesting video about the Go scheduler and Go concurrency: Dmitry Vyukov — Go scheduler: Implementing language with lightweight concurrency https://www.youtube.com/watch?v=-K11rY57K7k And I have just read the following webpage about the Threadpool of Microsoft .NET 4.0: https://blogs.msdn.microsoft.com/jennifer/2009/06/26/work-stealing-in-net-4-0/ As you notice, the first link above describes the strategy of "work depth-first; steal breadth-first". But we have to be careful: this strategy, which is advantageous for cache locality, works best for recursive algorithms. A thread takes the first task, and since the algorithm is recursive it pushes the child tasks onto its local work-stealing queue; the other threads then start to steal from that queue, so the work gets distributed correctly. When you instead start many tasks iteratively, there will be much more contention on the work-stealing queue, and this is a weakness of the strategy. And when the algorithm is not recursive and the threads receive their tasks from the global queue, there will be high contention on the global queue, which is not good either. MIT's Cilk, the Go scheduler, the Microsoft Threadpool and Intel® C++ TBB all use this strategy of "work depth-first; steal breadth-first", so as you notice, they give more preference to cache locality than to scalability.
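The mechanism behind "work depth-first; steal breadth-first" can be sketched as a per-worker deque: the owner pushes and pops child tasks at one end (LIFO, depth-first, good cache locality), while idle thieves take from the other end (FIFO, breadth-first, grabbing the oldest and usually largest remaining subtrees). This is only a minimal mutex-based sketch for illustration, not the author's Threadpool and not the lock-free Chase-Lev deque that Cilk, Go and TBB actually use:

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <optional>

// Minimal sketch of a work-stealing deque. A real scheduler keeps one
// of these per worker thread and uses a lock-free design; the mutex
// here just keeps the idea visible.
class WorkStealingDeque {
    std::deque<std::function<void()>> tasks_;
    std::mutex m_;
public:
    // Owner pushes newly spawned child tasks at the back.
    void push(std::function<void()> t) {
        std::lock_guard<std::mutex> g(m_);
        tasks_.push_back(std::move(t));
    }
    // Owner pops from the back: LIFO, i.e. works depth-first.
    std::optional<std::function<void()>> pop() {
        std::lock_guard<std::mutex> g(m_);
        if (tasks_.empty()) return std::nullopt;
        auto t = std::move(tasks_.back());
        tasks_.pop_back();
        return t;
    }
    // A thief takes from the front: FIFO, i.e. steals breadth-first.
    std::optional<std::function<void()>> steal() {
        std::lock_guard<std::mutex> g(m_);
        if (tasks_.empty()) return std::nullopt;
        auto t = std::move(tasks_.front());
        tasks_.pop_front();
        return t;
    }
};
```

The contention point described above is visible in this sketch: if many tasks are submitted iteratively to one deque, every push, pop and steal fights over the same queue.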
But in my following invention of a Threadpool that scales very well, I give more preference to scalability than to cache locality: https://sites.google.com/site/scalable68/an-efficient-threadpool-engine-with-priorities-that-scales-very-well Also, when you are doing IO with my Threadpool, you can use asynchronous IO by starting a thread dedicated to IO to be more efficient, or you can start another instance of my Threadpool and use it for the tasks that do IO; you can use the same method when threads of my Threadpool are waiting or sleeping. As for recursion and the stack-overflow problem, you can convert your function from recursive to iterative to solve it. And to be able to serve a great number of internet connections or TCP/IP socket connections, you can use my Threadpool with my powerful object-oriented stackful coroutines library for Delphi and FreePascal here: https://sites.google.com/site/scalable68/object-oriented-stackful-coroutines-library-for-delphi-and-freepascal Thank you, Amine Moulay Ramdane. |
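The recursive-to-iterative conversion mentioned above can be sketched with an explicit heap-allocated stack, so that deep divide-and-conquer no longer risks overflowing the fixed-size thread stack. This hypothetical `range_sum` is only an illustration of the technique, not code from the Threadpool:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Divide-and-conquer sum of the integers in [lo, hi), written without
// recursion: pending subranges live in a std::vector on the heap
// instead of as call frames on the thread stack, so the "recursion"
// depth is bounded by heap memory, not by the stack size.
std::uint64_t range_sum(std::uint64_t lo, std::uint64_t hi) {
    std::uint64_t total = 0;
    std::vector<std::pair<std::uint64_t, std::uint64_t>> pending{{lo, hi}};
    while (!pending.empty()) {
        auto [a, b] = pending.back();
        pending.pop_back();
        if (b - a <= 1) {             // base case of the recursion
            if (b > a) total += a;
            continue;
        }
        std::uint64_t mid = a + (b - a) / 2;
        pending.push_back({a, mid});  // what the recursive calls would do,
        pending.push_back({mid, b});  // kept on the heap instead
    }
    return total;
}
```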
aminer68@gmail.com: Jan 20 12:51PM -0800 Hello, My new scalable algorithm is here.. As you have noticed, I am an inventor of many scalable algorithms and their implementations, and after thinking some more I believe I have again invented a new scalable algorithm. I will explain it; first read my following previous writing: ----------------------------------------------------------- About parallel programming and concurrency.. Look at the following concurrency abstractions from Microsoft: https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.waitany?view=netframework-4.8 https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.waitall?view=netframework-4.8 I will soon implement the WaitAny() and WaitAll() concurrency abstractions for Delphi and FreePascal, with a timeout in milliseconds of course, and they will work with my efficient implementation of a Future, so you will be able to wait for many futures with WaitAny() and WaitAll(). As for task cancellation like in Microsoft's TPL, I think it is not a good abstraction, because how do you know when you have to efficiently cancel a task or tasks? So task cancellation is not such an efficient abstraction, and I will not implement it; I think WaitAny() and WaitAll() on Futures with a timeout in milliseconds are good concurrency abstractions. -------------------------------------------------------------- My new algorithm is a WaitAny() that is fully "scalable": it can be called from multiple threads and remain scalable. And my WaitAny() and WaitAll() will work with Futures, with Threads, with Event objects and such, and they will be portable and scalable. Thank you, Amine Moulay Ramdane. |
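To make the WaitAny() abstraction concrete (this is not the author's scalable algorithm), here is a minimal polling sketch over `std::future` with a timeout in milliseconds. A scalable design would instead have every task signal one shared completion event, avoiding both the polling and contention when many threads wait at once:

```cpp
#include <chrono>
#include <future>
#include <vector>

// Naive WaitAny() sketch: returns the index of the first future that
// becomes ready within timeout_ms, or -1 on timeout. It simply polls
// each future with wait_for(); fine for illustration, not scalable.
template <typename T>
int wait_any(std::vector<std::future<T>>& futures, int timeout_ms) {
    auto deadline = std::chrono::steady_clock::now()
                  + std::chrono::milliseconds(timeout_ms);
    do {
        for (std::size_t i = 0; i < futures.size(); ++i)
            if (futures[i].wait_for(std::chrono::milliseconds(1))
                    == std::future_status::ready)
                return static_cast<int>(i);
    } while (std::chrono::steady_clock::now() < deadline);
    return -1;  // timed out, like the milliseconds timeout described above
}
```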
aminer68@gmail.com: Jan 20 07:49AM -0800 Hello, About modeling and detection of race conditions and deadlocks in parallel programming.. I have just taken a further look at the following Delphi project called DelphiConcurrent, by an engineer called Moualek Adlene from France: https://github.com/moualek-adlene/DelphiConcurrent/blob/master/DelphiConcurrent.pas And I have just taken a look at the following webpage of Dr. Dobb's Journal: Detecting Deadlocks in C++ Using a Locks Monitor https://www.drdobbs.com/detecting-deadlocks-in-c-using-a-locks-m/184416644 I think that both of them use techniques that are not as good as analysing deadlocks in parallel applications with Petri nets: the two methods above only address locks, mutexes and reader-writer locks, but not semaphores, event objects and other synchronization objects, so they fall short. This is why I have written a tutorial that shows my methodology of analysing and detecting deadlocks in parallel applications with Petri nets. My methodology is more sophisticated because it is a generalization: it models the broader range of synchronization objects with Petri nets, and I will soon add other synchronization objects to the tutorial. You have to look at it, here it is: https://sites.google.com/site/scalable68/how-to-analyse-parallel-applications-with-petri-nets You have to get the powerful Tina software to run the Petri net examples inside my tutorial; here is the powerful Tina software: http://projects.laas.fr/tina/ Also, to detect race conditions in parallel programming, take a look at the following new tutorial that uses the powerful Spin tool: https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html This is how you will get much more professional at detecting deadlocks and race conditions in parallel programming. Thank you, Amine Moulay Ramdane. |
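The classic case that both lock monitors and Petri-net models catch is the lock-ordering deadlock: thread A acquires m1 then m2 while thread B acquires m2 then m1. In a Petri-net model each mutex is a place holding one token and each acquisition is a transition, and the deadlock shows up as a reachable marking in which no transition is enabled. As a hedged sketch of one standard fix (standard C++, not from the tutorial above): acquire both mutexes atomically with `std::lock`, so neither thread ever holds one lock while waiting for the other.

```cpp
#include <mutex>
#include <thread>

std::mutex m1, m2;
int shared_counter = 0;

// Deadlock-free update of state guarded by two mutexes: std::lock
// acquires m1 and m2 together using a deadlock-avoidance algorithm,
// and the adopt_lock guards only take ownership for RAII release.
void transfer() {
    std::lock(m1, m2);
    std::lock_guard<std::mutex> g1(m1, std::adopt_lock);
    std::lock_guard<std::mutex> g2(m2, std::adopt_lock);
    ++shared_counter;
}

// Two threads hammering transfer() in opposite roles would deadlock
// almost immediately if each took the two locks one at a time in a
// different order; with std::lock they cannot.
void hammer(int iterations) {
    for (int i = 0; i < iterations; ++i) transfer();
}
```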
aminer68@gmail.com: Jan 20 07:19AM -0800 Hello, Here are the events that cause the store buffer to drain on x86 CPUs. In Section 11.10 of Volume 3 of the Intel SW Developer's Guide, the following list of store-buffer-draining events is given:
- Generation of an exception and/or interrupt
- Execution of a serializing instruction (CPUID, IRET, and RSM are the only non-privileged serializing instructions)
- Execution of an I/O instruction
- Execution of a LOCK operation
- Execution of the BINIT operation (an external reset operation using the BINIT pin)
- Execution of an SFENCE instruction
- Execution of an MFENCE instruction
So as you notice, a LOCK-prefixed instruction also causes the store buffer to drain, and I am using it in my parallel software projects. Also read this about the store buffer: "However under x86-TSO, the stores are cached in the store buffers, a load consult only shared memory and the store buffer of the given thread, wich means it can load data from memory and ignore values from the other thread." Read more here: https://books.google.ca/books?id=C2R2DwAAQBAJ&pg=PA127&lpg=PA127&dq=immediately+visible+and+m+fence+and+store+buffer+and+x86&source=bl&ots=yfGI17x1YZ&sig=ACfU3U2EYRawTkQmi3s5wY-sM7IgowDlWg&hl=en&sa=X&ved=2ahUKEwi_nq3duYPkAhVDx1kKHYoyA5UQ6AEwAnoECAgQAQ#v=onepage&q=immediately%20visible%20and%20m%20fence%20and%20store%20buffer%20and%20x86&f=false Thank you, Amine Moulay Ramdane. |
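The practical consequence of store-buffer draining can be seen in the classic store-buffering litmus test: without a drain, each core's load can miss the other core's still-buffered store, and both threads read 0. Sequentially consistent atomics, which compilers map to a LOCK-prefixed instruction or an MFENCE on x86, drain the store buffer between the store and the load, so at least one thread must observe the other's store. A minimal sketch in standard C++ (not tied to any particular project):

```cpp
#include <atomic>
#include <thread>

// Store-buffering litmus test. Thread 0 runs {x = 1; r0 = y} and
// thread 1 runs {y = 1; r1 = x}. Under x86-TSO with plain stores the
// outcome r0 == r1 == 0 is allowed (both stores sit in store buffers
// while the loads read memory). With seq_cst ordering the store
// buffer is drained first, so r0 == r1 == 0 is forbidden.
bool both_read_zero() {
    std::atomic<int> x{0}, y{0};
    int r0 = -1, r1 = -1;
    std::thread t0([&] {
        x.store(1, std::memory_order_seq_cst);
        r0 = y.load(std::memory_order_seq_cst);
    });
    std::thread t1([&] {
        y.store(1, std::memory_order_seq_cst);
        r1 = x.load(std::memory_order_seq_cst);
    });
    t0.join();
    t1.join();
    return r0 == 0 && r1 == 0;  // must never be true with seq_cst
}
```

Changing both memory orders to `std::memory_order_relaxed` lets the forbidden outcome reappear on real x86 hardware, which is exactly the store-buffer effect the quoted passage describes.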