- About my new invention: scalable reference counting with efficient support for weak references - 1 Update
- About my ParallelFor() that scales very well, which uses my efficient Threadpool that scales very well - 1 Update
- My portable and efficient implementation of a future in Delphi and FreePascal was updated to version 1.2 - 1 Update
- Here are my new variants of scalable RWLocks, which are powerful - 1 Update
aminer68@gmail.com: Aug 07 02:24PM -0700 Hello, About my new invention, scalable reference counting with efficient support for weak references: As you have noticed, I have provided you with my new invention, a scalable reference counting algorithm with efficient support for weak references; here it is: https://sites.google.com/site/scalable68/scalable-reference-counting-with-efficient-support-for-weak-references

But I have to explain something: there is an "atomic" increment in my algorithm when a started thread first automatically "registers" itself, but the algorithm is "scalable" after this first registration step, and the started thread does not register again. Even so, it is really powerful.

I have now enhanced my algorithm into a new, fully scalable algorithm that does not need this atomic increment when a started thread first automatically registers, so it will be fully scalable. I will perhaps sell this new fully scalable algorithm and its Delphi implementation to Embarcadero, Microsoft, Google, or similar software companies; this is why I will soon contact the Embarcadero software company. Thank you, Amine Moulay Ramdane. |
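The register-once-then-count-locally scheme described above can be sketched as follows. This is a minimal Python illustration of the general idea, not the author's Delphi implementation: the class name and slot mechanism are my assumptions, weak-reference support is omitted, and a real implementation would use per-slot atomic counters with cache-line padding instead of a registration lock.

```python
import threading

class ScalableRefCount:
    """Sketch: per-thread counter slots. Each thread pays one
    synchronized registration step on first use (modelling the single
    atomic increment described above), then updates only its own slot."""

    def __init__(self, slots=64):
        self._slots = [0] * slots          # one counter per registered thread
        self._next = 0                     # next slot to hand out
        self._reg_lock = threading.Lock()  # taken only once per thread
        self._local = threading.local()

    def _slot(self):
        if not hasattr(self._local, "idx"):
            with self._reg_lock:           # first-use registration only
                self._local.idx = self._next % len(self._slots)
                self._next += 1
        return self._local.idx

    def incref(self):
        self._slots[self._slot()] += 1     # contention-free after registration

    def decref(self):
        self._slots[self._slot()] -= 1

    def count(self):
        return sum(self._slots)            # true count is the sum of all slots
```

The scalability comes from the fact that, after registration, threads write to distinct slots and never contend on a single shared counter; only the (rare) exact-count query touches every slot.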
About my ParallelFor() that scales very well, which uses my efficient Threadpool that scales very well
aminer68@gmail.com: Aug 07 12:32PM -0700 Hello, About my ParallelFor() that scales very well, which uses my efficient Threadpool that scales very well: With ParallelFor() you have to:

1- Ensure sufficient work. Each iteration of a loop involves a certain amount of work, so you have to ensure a sufficient amount of work; read below about the "grainsize" parameter that I have implemented.

2- Consider the loop schedule. In OpenMP, one basic characteristic of a loop schedule is whether it is static or dynamic:

• In a static schedule, the choice of which thread performs a particular iteration is purely a function of the iteration number and the number of threads. Each thread performs only the iterations assigned to it at the beginning of the loop.

• In a dynamic schedule, the assignment of iterations to threads can vary at runtime from one execution to another. Not all iterations are assigned to threads at the start of the loop; instead, each thread requests more iterations after it has completed the work already assigned to it.

My ParallelFor(), since it uses my efficient Threadpool that scales very well, relies on round-robin scheduling together with work stealing, and I think that this is sufficient.

Read the rest: My Threadpool engine with priorities is really powerful because it scales very well on multicore and NUMA systems, and it comes with a ParallelFor() that also scales very well on multicore and NUMA systems.
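The combination of round-robin placement and work stealing mentioned above can be sketched as follows. This is an illustrative Python model, not the author's Threadpool engine: a production pool would use per-queue locks or lock-free deques rather than the single global lock used here for simplicity, and priorities are omitted.

```python
import threading
from collections import deque

class StealingPool:
    """Sketch: tasks are placed round-robin across per-worker queues;
    an idle worker steals from another queue when its own is empty."""

    def __init__(self, nworkers=4):
        self.queues = [deque() for _ in range(nworkers)]
        self.lock = threading.Lock()   # simplification: one lock for all queues
        self.next = 0

    def submit(self, task):
        with self.lock:                # round-robin: rotate the target queue
            self.queues[self.next].append(task)
            self.next = (self.next + 1) % len(self.queues)

    def _take(self, i):
        with self.lock:
            if self.queues[i]:
                return self.queues[i].popleft()  # own queue: take from the front
            for q in self.queues:                # otherwise steal from a victim
                if q:
                    return q.pop()               # steal from the opposite end
        return None                              # everything is drained

    def run(self):
        def worker(i):
            while True:
                task = self._take(i)
                if task is None:
                    return
                task()
        threads = [threading.Thread(target=worker, args=(i,))
                   for i in range(len(self.queues))]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
```

Round-robin placement spreads work evenly up front (like a static schedule), while stealing rebalances at runtime when iterations take uneven time (like a dynamic schedule).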
You can download it from: https://sites.google.com/site/scalable68/an-efficient-threadpool-engine-with-priorities-that-scales-very-well

Here is the explanation of my ParallelFor() that scales very well. I have implemented a ParallelFor() with the following method:

procedure ParallelFor(nMin, nMax: integer; aProc: TParallelProc; GrainSize: integer = 1; Ptr: pointer = nil; pmode: TParallelMode = pmBlocking; Priority: TPriorities = NORMAL_PRIORITY);

The nMin and nMax parameters of ParallelFor() are the minimum and maximum integer values of the loop variable, the aProc parameter is the procedure to call, and the GrainSize integer parameter works as follows: the grainsize sets a minimum threshold for parallelization. A rule of thumb is that grainsize iterations should take at least 100,000 clock cycles to execute. For example, if a single iteration takes 100 clocks, then the grainsize needs to be at least 1,000 iterations. When in doubt, do the following experiment:

1- Set the grainsize parameter higher than necessary. The grainsize is specified in units of loop iterations. If you have no idea how many clock cycles an iteration might take, start with grainsize=100,000. The rationale is that each iteration normally requires at least one clock. In most cases, step 3 will guide you to a much smaller value.

2- Run your algorithm.

3- Iteratively halve the grainsize parameter and see how much the algorithm slows down or speeds up as the value decreases.

A drawback of setting the grainsize too high is that it can reduce parallelism. For example, if the grainsize is 1000 and the loop has 2000 iterations, the ParallelFor() method distributes the loop across only two processors, even if more are available.
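The grainsize behaviour described above, including the two-chunk example, can be illustrated with a small sketch. This assumes (as the text suggests) that the inclusive range [nMin, nMax] is split into chunks of GrainSize iterations, each chunk becoming one parallel task:

```python
def chunks(n_min, n_max, grainsize):
    """Split the inclusive range [n_min, n_max] into chunks of
    `grainsize` iterations (the last chunk may be smaller)."""
    lo = n_min
    while lo <= n_max:
        hi = min(lo + grainsize - 1, n_max)
        yield (lo, hi)
        lo = hi + 1

# Grainsize 1000 over 2000 iterations yields only two chunks,
# hence at most two processors are kept busy:
print(list(chunks(1, 2000, 1000)))   # [(1, 1000), (1001, 2000)]
```

Halving the grainsize doubles the number of chunks, which exposes more parallelism but also doubles the per-chunk scheduling overhead; the experiment above searches for the sweet spot between the two.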
And you can pass a parameter in Ptr as a pointer to ParallelFor(); you can set the pmode parameter to pmBlocking so that ParallelFor() is blocking, or to pmNonBlocking so that it is non-blocking; and the Priority parameter is the priority of ParallelFor(). Look inside the test.pas example to see how to use it. Thank you, Amine Moulay Ramdane. |
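A rough Python analogue of the ParallelFor() signature and its blocking/non-blocking modes is sketched below. This is not the author's Delphi code: parameter names are Python-ised, the Priority parameter is omitted, and each submitted task is assumed to cover one grainsize chunk.

```python
from concurrent.futures import ThreadPoolExecutor, wait

_pool = ThreadPoolExecutor(max_workers=4)   # stands in for the threadpool engine

def parallel_for(n_min, n_max, proc, grainsize=1, ptr=None, blocking=True):
    """Sketch of ParallelFor(): run proc(i, ptr) for every i in
    [n_min, n_max], one task per chunk of `grainsize` iterations."""
    futures = []
    lo = n_min
    while lo <= n_max:
        hi = min(lo + grainsize - 1, n_max)
        def run(lo=lo, hi=hi):              # bind the chunk bounds per task
            for i in range(lo, hi + 1):
                proc(i, ptr)
        futures.append(_pool.submit(run))
        lo = hi + 1
    if blocking:                            # pmBlocking: wait for completion
        wait(futures)
    return futures                          # pmNonBlocking: caller waits later
```

In the non-blocking mode the caller gets the futures back immediately and can overlap other work before waiting on them, which mirrors the pmNonBlocking behaviour described above.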
aminer68@gmail.com: Aug 07 12:12PM -0700 Hello, My portable and efficient implementation of a future in Delphi and FreePascal was updated to version 1.2. I have just enhanced it so that it can easily be used to compose the Active Object pattern using my efficient Threadpool engine with priorities that scales very well. You can download it from: https://sites.google.com/site/scalable68/a-portable-and-efficient-implementation-of-a-future-in-delphi-and-freepascal And you can download my new efficient Threadpool engine with priorities that scales very well, version 3.83, from: https://sites.google.com/site/scalable68/an-efficient-threadpool-engine-with-priorities-that-scales-very-well Thank you, Amine Moulay Ramdane. |
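The Active Object composition mentioned above (a future paired with a worker that executes enqueued calls) can be sketched generically as follows. This is a textbook Python illustration of the pattern, not the author's Delphi/FreePascal implementation:

```python
import queue
import threading
from concurrent.futures import Future

class ActiveObject:
    """Sketch of the Active Object pattern: method calls are enqueued
    and executed by one private worker thread; each call immediately
    returns a future that is fulfilled when the call completes."""

    def __init__(self):
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            fn, args, fut = self._q.get()
            if fn is None:                  # shutdown sentinel
                return
            try:
                fut.set_result(fn(*args))
            except Exception as exc:        # propagate errors via the future
                fut.set_exception(exc)

    def call(self, fn, *args):
        fut = Future()
        self._q.put((fn, args, fut))
        return fut                          # caller decides when to wait

    def shutdown(self):
        self._q.put((None, (), None))
        self._worker.join()
```

Usage: `f = obj.call(expensive_fn, arg)` returns at once, and `f.result()` blocks only when the answer is actually needed, which is exactly the decoupling the pattern provides.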
aminer68@gmail.com: Aug 07 11:16AM -0700 Hello, Here are my new variants of scalable RWLocks, which are powerful.

Author: Amine Moulay Ramdane

Description: A fast, scalable, starvation-free, fair, and lightweight Multiple-Readers-Exclusive-Writer lock called LW_RWLockX; the scalable LW_RWLockX does spin-wait. Also a fast, scalable, starvation-free, and fair Multiple-Readers-Exclusive-Writer lock called RWLockX; the scalable RWLockX doesn't spin-wait but uses my portable SemaMonitor and portable event objects, so it is energy efficient.

The parameter of the constructors is the size of the array of the readers: if the size of the array is equal to the number of parallel readers, it will be scalable, but if the number of readers is greater than the size of the array, you will start to have contention. Please look at the source code of my scalable algorithms to understand.

I have used the following hash function to make my new variants of RWLocks scalable:

---
function DJB2aHash(key: int64): uint64;
var
  i: integer;
  key1: uint64;
begin
  Result := 5381;
  for i := 1 to 8 do
  begin
    key1 := (key shr ((i-1)*8)) and $00000000000000ff;
    Result := ((Result shl 5) xor Result) xor key1;
  end;
end;
---

You can download them from: https://sites.google.com/site/scalable68/new-variants-of-scalable-rwlocks

Thank you, Amine Moulay Ramdane. |
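For reference, here is a Python port of the DJB2a variant above, together with a hypothetical reader_slot() helper showing how a thread id could be hashed onto a slot of the readers array; the helper name and the modulo mapping are my assumptions, not taken from the library.

```python
MASK64 = (1 << 64) - 1   # emulate Pascal's uint64 wraparound

def djb2a_hash(key):
    """Python port of the Pascal DJB2aHash above: xor-fold the eight
    bytes of a 64-bit key into a 64-bit hash."""
    h = 5381
    for i in range(8):
        byte = (key >> (i * 8)) & 0xFF          # same as (i-1)*8 for i in 1..8
        h = (((h << 5) ^ h) ^ byte) & MASK64    # (Result shl 5) xor Result xor key1
    return h

def reader_slot(thread_id, array_size):
    # Hypothetical helper: each reader hashes its thread id to one slot
    # of the readers array, so distinct readers mostly touch distinct
    # slots and the read side stays scalable.
    return djb2a_hash(thread_id) % array_size
```

This illustrates the contention point made above: with more concurrent readers than array slots, the pigeonhole principle forces some readers to share a slot, and those readers contend.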
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.programming.threads+unsubscribe@googlegroups.com. |