- About McCabe's Cyclomatic Complexity - 1 Update
- About the strategy of "work depth-first; steal breadth-first".. - 1 Update
- My new scalable algorithm is here.. - 1 Update
- About modeling and detection of race conditions and deadlocks in parallel programming.. - 1 Update
- Here are the events that cause the store buffer to drain on x86 CPUs - 1 Update
aminer68@gmail.com: Jan 20 01:40PM -0800 Hello, High "complexity" in software code translates directly into low readability and high maintenance costs, so read "McCabe's Cyclomatic Complexity and Why We Don't Use It": https://www.cqse.eu/en/blog/mccabe-cyclomatic-complexity/ Thank you, Amine Moulay Ramdane. |
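As a quick illustration (not taken from the article above): for structured code, McCabe's metric comes out to the number of branch points plus one, so the hypothetical function below has cyclomatic complexity 4.

```cpp
#include <string>

// Hypothetical example: three branch points (the three `if`s),
// so McCabe's cyclomatic complexity is 3 + 1 = 4. Each extra
// branch adds one more linearly independent path to test.
std::string classify(int score) {
    if (score < 0)   return "invalid";    // branch 1
    if (score >= 90) return "excellent";  // branch 2
    if (score >= 50) return "pass";       // branch 3
    return "fail";
}
```

The article's point is that this single number says little by itself: a flat chain of simple branches like this one is far easier to read and maintain than deeply nested branches with the same count.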
aminer68@gmail.com: Jan 20 12:54PM -0800 Hello, About the strategy of "work depth-first; steal breadth-first".. I have just read the following webpage: Why Too Many Threads Hurts Performance, and What to do About It https://www.codeguru.com/cpp/sample_chapter/article.php/c13533/Why-Too-Many-Threads-Hurts-Performance-and-What-to-do-About-It.htm I have also just watched the following interesting video about the Go scheduler and Go concurrency: Dmitry Vyukov — Go scheduler: Implementing language with lightweight concurrency https://www.youtube.com/watch?v=-K11rY57K7k And I have just read the following webpage about the Threadpool of Microsoft .NET 4.0: https://blogs.msdn.microsoft.com/jennifer/2009/06/26/work-stealing-in-net-4-0/ As you notice, the first link above describes the strategy of "work depth-first; steal breadth-first". But we have to be careful: this strategy, which is advantageous for cache locality, works best for recursive algorithms. A thread takes the first task, and since the algorithm is recursive it pushes the child tasks onto its local work-stealing queue; the other threads then start to steal from that queue, so the work gets distributed correctly. When you instead start many tasks iteratively, there will be much more contention on the work-stealing queue, and this is a weakness of the strategy. And when the algorithm is not recursive and the threads receive their tasks from the global queue, there will be high contention on the global queue, which is not good either. MIT's Cilk, the Go scheduler, the Microsoft Threadpool and Intel® C++ TBB all use this strategy of "work depth-first; steal breadth-first", so as you notice, they give more preference to cache locality than to scalability.
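The mechanism behind "work depth-first; steal breadth-first" can be sketched as a per-worker deque: the owner pushes and pops child tasks at one end (LIFO, depth-first, good cache locality), while idle thieves take from the other end (FIFO, breadth-first, grabbing the oldest and usually largest remaining subtrees). This is only a minimal mutex-based sketch for illustration, not the author's Threadpool and not the lock-free Chase-Lev deque that Cilk, Go and TBB actually use:

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <optional>

// Minimal sketch of a work-stealing deque. A real scheduler keeps one
// of these per worker thread and uses a lock-free design; the mutex
// here just keeps the idea visible.
class WorkStealingDeque {
    std::deque<std::function<void()>> tasks_;
    std::mutex m_;
public:
    // Owner pushes newly spawned child tasks at the back.
    void push(std::function<void()> t) {
        std::lock_guard<std::mutex> g(m_);
        tasks_.push_back(std::move(t));
    }
    // Owner pops from the back: LIFO, i.e. works depth-first.
    std::optional<std::function<void()>> pop() {
        std::lock_guard<std::mutex> g(m_);
        if (tasks_.empty()) return std::nullopt;
        auto t = std::move(tasks_.back());
        tasks_.pop_back();
        return t;
    }
    // A thief takes from the front: FIFO, i.e. steals breadth-first.
    std::optional<std::function<void()>> steal() {
        std::lock_guard<std::mutex> g(m_);
        if (tasks_.empty()) return std::nullopt;
        auto t = std::move(tasks_.front());
        tasks_.pop_front();
        return t;
    }
};
```

The contention point described above is visible in this sketch: if many tasks are submitted iteratively to one deque, every push, pop and steal fights over the same queue.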
But in my following invention of a Threadpool that scales very well, I give more preference to scalability than to cache locality: https://sites.google.com/site/scalable68/an-efficient-threadpool-engine-with-priorities-that-scales-very-well Also, when you are doing IO with my Threadpool, you can use asynchronous IO by starting a thread dedicated to IO to be more efficient, or you can start another instance of my Threadpool and use it for the tasks that do IO; you can use the same method when threads of my Threadpool are waiting or sleeping. As for recursion and the stack-overflow problem, you can convert your function from recursive to iterative to solve it. And to be able to serve a great number of internet connections or TCP/IP socket connections, you can use my Threadpool with my powerful object-oriented stackful coroutines library for Delphi and FreePascal here: https://sites.google.com/site/scalable68/object-oriented-stackful-coroutines-library-for-delphi-and-freepascal Thank you, Amine Moulay Ramdane. |
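The recursive-to-iterative conversion mentioned above can be sketched with an explicit heap-allocated stack, so that deep divide-and-conquer no longer risks overflowing the fixed-size thread stack. This hypothetical `range_sum` is only an illustration of the technique, not code from the Threadpool:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Divide-and-conquer sum of the integers in [lo, hi), written without
// recursion: pending subranges live in a std::vector on the heap
// instead of as call frames on the thread stack, so the "recursion"
// depth is bounded by heap memory, not by the stack size.
std::uint64_t range_sum(std::uint64_t lo, std::uint64_t hi) {
    std::uint64_t total = 0;
    std::vector<std::pair<std::uint64_t, std::uint64_t>> pending{{lo, hi}};
    while (!pending.empty()) {
        auto [a, b] = pending.back();
        pending.pop_back();
        if (b - a <= 1) {             // base case of the recursion
            if (b > a) total += a;
            continue;
        }
        std::uint64_t mid = a + (b - a) / 2;
        pending.push_back({a, mid});  // what the recursive calls would do,
        pending.push_back({mid, b});  // kept on the heap instead
    }
    return total;
}
```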
aminer68@gmail.com: Jan 20 12:51PM -0800 Hello, My new scalable algorithm is here.. As you have noticed, I am an inventor of many scalable algorithms and their implementations, and after thinking some more I believe I have again invented a new scalable algorithm. I will explain it; first read my following previous writing: ----------------------------------------------------------- About parallel programming and concurrency.. Look at the following concurrency abstractions from Microsoft: https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.waitany?view=netframework-4.8 https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.waitall?view=netframework-4.8 I will soon implement the WaitAny() and WaitAll() concurrency abstractions for Delphi and FreePascal, with a timeout in milliseconds of course, and they will work with my efficient implementation of a Future, so you will be able to wait for many futures with WaitAny() and WaitAll(). As for task cancellation like in Microsoft's TPL, I think it is not a good abstraction, because how do you know when you have to efficiently cancel a task or tasks? So task cancellation is not such an efficient abstraction, and I will not implement it; I think WaitAny() and WaitAll() on Futures with a timeout in milliseconds are good concurrency abstractions. -------------------------------------------------------------- My new algorithm is a WaitAny() that is fully "scalable": it can be called from multiple threads and remain scalable. And my WaitAny() and WaitAll() will work with Futures, with Threads, with Event objects and such, and they will be portable and scalable. Thank you, Amine Moulay Ramdane. |
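To make the WaitAny() abstraction concrete (this is not the author's scalable algorithm), here is a minimal polling sketch over `std::future` with a timeout in milliseconds. A scalable design would instead have every task signal one shared completion event, avoiding both the polling and contention when many threads wait at once:

```cpp
#include <chrono>
#include <future>
#include <vector>

// Naive WaitAny() sketch: returns the index of the first future that
// becomes ready within timeout_ms, or -1 on timeout. It simply polls
// each future with wait_for(); fine for illustration, not scalable.
template <typename T>
int wait_any(std::vector<std::future<T>>& futures, int timeout_ms) {
    auto deadline = std::chrono::steady_clock::now()
                  + std::chrono::milliseconds(timeout_ms);
    do {
        for (std::size_t i = 0; i < futures.size(); ++i)
            if (futures[i].wait_for(std::chrono::milliseconds(1))
                    == std::future_status::ready)
                return static_cast<int>(i);
    } while (std::chrono::steady_clock::now() < deadline);
    return -1;  // timed out, like the milliseconds timeout described above
}
```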
aminer68@gmail.com: Jan 20 07:49AM -0800 Hello, About modeling and detection of race conditions and deadlocks in parallel programming.. I have just taken a further look at the following Delphi project called DelphiConcurrent, by an engineer called Moualek Adlene from France: https://github.com/moualek-adlene/DelphiConcurrent/blob/master/DelphiConcurrent.pas And I have just taken a look at the following webpage of Dr. Dobb's Journal: Detecting Deadlocks in C++ Using a Locks Monitor https://www.drdobbs.com/detecting-deadlocks-in-c-using-a-locks-m/184416644 I think that both of them use techniques that are not as good as analysing deadlocks in parallel applications with Petri nets: the two methods above only address locks, mutexes and reader-writer locks, but not semaphores, event objects and other synchronization objects, so they fall short. This is why I have written a tutorial that shows my methodology of analysing and detecting deadlocks in parallel applications with Petri nets. My methodology is more sophisticated because it is a generalization: it models the broader range of synchronization objects with Petri nets, and I will soon add other synchronization objects to the tutorial. You have to look at it, here it is: https://sites.google.com/site/scalable68/how-to-analyse-parallel-applications-with-petri-nets You have to get the powerful Tina software to run the Petri net examples inside my tutorial; here is the powerful Tina software: http://projects.laas.fr/tina/ Also, to detect race conditions in parallel programming, take a look at the following new tutorial that uses the powerful Spin tool: https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html This is how you will get much more professional at detecting deadlocks and race conditions in parallel programming. Thank you, Amine Moulay Ramdane. |
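The classic case that both lock monitors and Petri-net models catch is the lock-ordering deadlock: thread A acquires m1 then m2 while thread B acquires m2 then m1. In a Petri-net model each mutex is a place holding one token and each acquisition is a transition, and the deadlock shows up as a reachable marking in which no transition is enabled. As a hedged sketch of one standard fix (standard C++, not from the tutorial above): acquire both mutexes atomically with `std::lock`, so neither thread ever holds one lock while waiting for the other.

```cpp
#include <mutex>
#include <thread>

std::mutex m1, m2;
int shared_counter = 0;

// Deadlock-free update of state guarded by two mutexes: std::lock
// acquires m1 and m2 together using a deadlock-avoidance algorithm,
// and the adopt_lock guards only take ownership for RAII release.
void transfer() {
    std::lock(m1, m2);
    std::lock_guard<std::mutex> g1(m1, std::adopt_lock);
    std::lock_guard<std::mutex> g2(m2, std::adopt_lock);
    ++shared_counter;
}

// Two threads hammering transfer() in opposite roles would deadlock
// almost immediately if each took the two locks one at a time in a
// different order; with std::lock they cannot.
void hammer(int iterations) {
    for (int i = 0; i < iterations; ++i) transfer();
}
```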
aminer68@gmail.com: Jan 20 07:19AM -0800 Hello, Here are the events that cause the store buffer to drain on x86 CPUs. In Section 11.10 of Volume 3 of the Intel SW Developer's Guide, the following list of store-buffer-draining events is given:
- Generation of an exception and/or interrupt
- Execution of a serializing instruction (CPUID, IRET, and RSM are the only non-privileged serializing instructions)
- Execution of an I/O instruction
- Execution of a LOCK operation
- Execution of the BINIT operation (an external reset operation using the BINIT pin)
- Execution of an SFENCE instruction
- Execution of an MFENCE instruction
So as you notice, a LOCK-prefixed instruction also causes the store buffer to drain, and I am using it in my parallel software projects. Also read this about the store buffer: "However under x86-TSO, the stores are cached in the store buffers, a load consult only shared memory and the store buffer of the given thread, wich means it can load data from memory and ignore values from the other thread." Read more here: https://books.google.ca/books?id=C2R2DwAAQBAJ&pg=PA127&lpg=PA127&dq=immediately+visible+and+m+fence+and+store+buffer+and+x86&source=bl&ots=yfGI17x1YZ&sig=ACfU3U2EYRawTkQmi3s5wY-sM7IgowDlWg&hl=en&sa=X&ved=2ahUKEwi_nq3duYPkAhVDx1kKHYoyA5UQ6AEwAnoECAgQAQ#v=onepage&q=immediately%20visible%20and%20m%20fence%20and%20store%20buffer%20and%20x86&f=false Thank you, Amine Moulay Ramdane. |
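The practical consequence of store-buffer draining can be seen in the classic store-buffering litmus test: without a drain, each core's load can miss the other core's still-buffered store, and both threads read 0. Sequentially consistent atomics, which compilers map to a LOCK-prefixed instruction or an MFENCE on x86, drain the store buffer between the store and the load, so at least one thread must observe the other's store. A minimal sketch in standard C++ (not tied to any particular project):

```cpp
#include <atomic>
#include <thread>

// Store-buffering litmus test. Thread 0 runs {x = 1; r0 = y} and
// thread 1 runs {y = 1; r1 = x}. Under x86-TSO with plain stores the
// outcome r0 == r1 == 0 is allowed (both stores sit in store buffers
// while the loads read memory). With seq_cst ordering the store
// buffer is drained first, so r0 == r1 == 0 is forbidden.
bool both_read_zero() {
    std::atomic<int> x{0}, y{0};
    int r0 = -1, r1 = -1;
    std::thread t0([&] {
        x.store(1, std::memory_order_seq_cst);
        r0 = y.load(std::memory_order_seq_cst);
    });
    std::thread t1([&] {
        y.store(1, std::memory_order_seq_cst);
        r1 = x.load(std::memory_order_seq_cst);
    });
    t0.join();
    t1.join();
    return r0 == 0 && r1 == 0;  // must never be true with seq_cst
}
```

Changing both memory orders to `std::memory_order_relaxed` lets the forbidden outcome reappear on real x86 hardware, which is exactly the store-buffer effect the quoted passage describes.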