- We have to be smart, please read again... - 1 Update
- The benchmark results and my conclusion - 1 Update
- Please run this benchmark - 6 Updates
- About Delphi and FreePascal - 4 Updates
- HashStringList is here... - 1 Update
- Parallel Sort library version 3.32 is here... - 1 Update
- StringTree is here... - 1 Update
Ramine <ramine@1.1>: Feb 13 07:43PM -0800

Hello,

We have to be smart, please read what follows...

I said before that my parallel heapsort is more cache efficient and that this is why it scales almost perfectly on an 8-core machine, but I think I made a mistake. I have just looked carefully at my parallel heapsort and noticed that it contains two string compares, whereas my parallel quicksort contains one string compare in its partition function. Since the string compare is the more expensive operation, Amdahl's law says the parallel heapsort will scale almost perfectly on 8-core machines, but I don't think it will scale on machines with more than 8 cores. I also think all my parallel algorithms have the same cache efficiency.

So by nature parallel sort algorithms such as parallel mergesort, parallel quicksort and parallel heapsort have a scalability limit of about 8X, and they do not scale beyond that as more cores are added. The solution is to implement a NUMA-aware parallel sort algorithm so that it scales across more and more NUMA nodes...

Thank you,
Amine Moulay Ramdane.
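For readers who want to plug numbers into Amdahl's law themselves, here is a minimal Python sketch. It is not code from the Parallel Sort library; the function names are made up for illustration, and the 7.36 figure on 8 cores is the heapsort result posted elsewhere in this digest. Amdahl's law only models a fixed serial fraction, so it says nothing about memory bandwidth, NUMA effects or hyper-threading.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def parallel_fraction_from_speedup(speedup: float, cores: int) -> float:
    """Invert Amdahl's law: the parallel fraction implied by a measured speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    # Input taken from the thread: a reported scalability of 7.36 on 8 cores.
    p = parallel_fraction_from_speedup(7.36, 8)
    print(f"implied parallel fraction: {p:.4f}")
    # Extrapolation under the (strong) assumption that p stays constant.
    for n in (8, 16, 32, 64):
        print(f"{n:3d} cores -> predicted speedup {amdahl_speedup(p, n):.2f}")
```

Whether the extrapolated values are reachable in practice depends on factors outside the model, so treat them as an upper bound under the stated assumption rather than a prediction.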
Ramine <ramine@1.1>: Feb 13 07:12PM -0800

Hello,

Here are the results of my benchmark on an 8-core machine, posted on Usenet by Melzzzzz. Read them carefully, and read my conclusion below:

===
Please press a key to exit...:
[bmaxa@maxa-pc aminer]$ taskset -c 0,1,2,3,4,5,6,7 wine test.exe
Number of cores is: 8
Scalability with parallel mergesort on 8 cores is: 3.23
Time of parallel mergesort on 8 cores is: 298091 microseconds
Number of cores is: 8
Scalability with parallel quicksort on 8 cores is: 3.52
Time of parallel quicksort on 8 cores is: 348340 microseconds
Number of cores is: 8
Scalability with parallel heapsort on 8 cores is: 7.36
Time of parallel heapsort on 8 cores is: 807979 microseconds
===

I think I have finally understood my parallel algorithms. Look at my parallel heapsort results:

Number of cores is: 8
Scalability with parallel heapsort on 8 cores is: 7.36
Time of parallel heapsort on 8 cores is: 807979 microseconds

I think that my parallel heapsort algorithm is by nature more cache efficient, and that this is why it scales very well on more and more cores. So if you have more than 8 cores, I think the parallel heapsort of my Parallel Sort library will be a better choice than the other parallel algorithms, such as my parallel mergesort and my parallel quicksort.

The benchmark results also tell us an important thing: the parallel mergesort and parallel quicksort of my Parallel Sort library are by nature much less cache efficient than my parallel heapsort.

Thank you Melzzzzz, you are a great guy for helping me run the benchmark.

Thank you,
Amine Moulay Ramdane.
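The benchmark reports two numbers per algorithm, a scalability factor and a wall-clock time, and they answer different questions. The Python sketch below reads them together. It assumes that "scalability" means single-core time divided by multi-core time, which the thread does not state explicitly; the function and variable names are hypothetical.

```python
# (algorithm, scalability, parallel time in microseconds) on 8 cores,
# copied from the benchmark output posted in the thread.
RESULTS_8_CORES = [
    ("mergesort", 3.23, 298091),
    ("quicksort", 3.52, 348340),
    ("heapsort", 7.36, 807979),
]


def implied_serial_time_us(scalability: float, parallel_time_us: int) -> float:
    # ASSUMPTION: scalability = serial_time / parallel_time.
    return scalability * parallel_time_us


def fastest(results):
    """Name of the algorithm with the lowest measured parallel time."""
    return min(results, key=lambda row: row[2])[0]


def best_scaling(results):
    """Name of the algorithm with the highest scalability factor."""
    return max(results, key=lambda row: row[1])[0]


if __name__ == "__main__":
    for name, scal, t_us in RESULTS_8_CORES:
        serial = implied_serial_time_us(scal, t_us)
        print(f"{name:10s} parallel {t_us:7d} us, implied serial {serial:10.0f} us")
    print("fastest on 8 cores:", fastest(RESULTS_8_CORES))
    print("best scaling:      ", best_scaling(RESULTS_8_CORES))
```

Under that assumption, the algorithm with the best scalability factor is not the one with the shortest run time in these numbers, so both columns are worth reporting when comparing the three sorts.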
Ramine <ramine@1.1>: Feb 13 05:43PM -0800

Hello,

I have implemented a benchmark for my Parallel Sort library and I want to test it on other computers, so please help me do it. The benchmark runs on Windows and tests the parallel mergesort of my Parallel Sort library. Please download it, run it, and report back your processor model, how many L2 caches you have if possible, and the output of the benchmark. You can download it from here:

https://sites.google.com/site/aminer68/benchmark-for-parallel-sort-library

To download it, click on the small arrow to the right of the "test.zip" text on your screen...

Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Feb 13 06:14PM -0800

On 2/13/2015 5:43 PM, Ramine wrote:
> "test.zip" text on your screen...
> Thank you,
> Amine Moulay Ramdane.

If you have more than 4 cores on your computer, it will be really interesting to see the result of my benchmark on it. If you have an Intel i7 CPU, it will also be interesting to see the result on it...

So please run the benchmark and help me see the results of my Parallel Sort library.

Thank you,
Amine Moulay Ramdane.
Melzzzzz <mel@zzzzz.com>: Feb 14 12:34AM +0100

On Fri, 13 Feb 2015 17:43:16 -0800
> "test.zip" text on your screen...
> Thank you,
> Amine Moulay Ramdane.

[bmaxa@maxa-pc aminer]$ taskset -c 0,1,2,3 wine test.exe
Number of cores is: 4
Scalability on 4 cores is: 2.42
Please press a key to exit...:
[bmaxa@maxa-pc aminer]$ taskset -c 0,1,2,3,4,5,6,7 wine test.exe
Number of cores is: 8
Scalability on 8 cores is: 3.30
Please press a key to exit...:
[bmaxa@maxa-pc aminer]$
[bmaxa@maxa-pc aminer]$ inxi -C
CPU: Quad core Intel Core i7-4790 (-HT-MCP-) cache: 8192 KB
     clock speeds: max: 4000 MHz 1: 3958 MHz 2: 3977 MHz 3: 3995 MHz 4: 3994 MHz 5: 4000 MHz 6: 3968 MHz 7: 3998 MHz 8: 3924 MHz
[bmaxa@maxa-pc aminer]$
Ramine <ramine@1.1>: Feb 13 06:53PM -0800

Hello,

Melzzzzz, I have just implemented and uploaded a full benchmark that tests all the sorting algorithms. Can you please download the new benchmark and run it again on your 4-core computer and on your 8-core computer?

Here it is:

https://sites.google.com/site/aminer68/benchmark-for-parallel-sort-library

Thank you,
Amine Moulay Ramdane.
Melzzzzz <mel@zzzzz.com>: Feb 14 12:57AM +0100

On Fri, 13 Feb 2015 18:53:05 -0800
> this one will test all the sorting algorithms, can you download the
> new benchmark again and test it with your computers that have 4 cores
> and the other that have 8 cores.

It's i7 4790. 4 cores with HT

> https://sites.google.com/site/aminer68/benchmark-for-parallel-sort-library
> Thank you,
> Amine Moulay Ramdane.

[bmaxa@maxa-pc aminer]$ taskset -c 0,1,2,3 wine test.exe
Number of cores is: 4
Scalability with parallel mergesort on 4 cores is: 2.38
Time of parallel mergesort on 4 cores is: 304864 microseconds
Number of cores is: 4
Scalability with parallel quicksort on 4 cores is: 2.64
Time of parallel quicksort on 4 cores is: 334282 microseconds
Number of cores is: 4
Scalability with parallel heapsort on 4 cores is: 4.30
Time of parallel heapsort on 4 cores is: 753477 microseconds
Please press a key to exit...:
[bmaxa@maxa-pc aminer]$ taskset -c 0,1,2,3,4,5,6,7 wine test.exe
Number of cores is: 8
Scalability with parallel mergesort on 8 cores is: 3.23
Time of parallel mergesort on 8 cores is: 298091 microseconds
Number of cores is: 8
Scalability with parallel quicksort on 8 cores is: 3.52
Time of parallel quicksort on 8 cores is: 348340 microseconds
Number of cores is: 8
Scalability with parallel heapsort on 8 cores is: 7.36
Time of parallel heapsort on 8 cores is: 807979 microseconds
Please press a key to exit...:
Ramine <ramine@1.1>: Feb 13 07:08PM -0800

Hello,

I think I have finally understood my parallel algorithms. Look at my parallel heapsort results:

Number of cores is: 8
Scalability with parallel heapsort on 8 cores is: 7.36
Time of parallel heapsort on 8 cores is: 807979 microseconds

I think that my parallel heapsort algorithm is by nature more cache efficient, and that this is why it scales very well on more and more cores. So if you have more than 8 cores, I think the parallel heapsort of my Parallel Sort library will be a better choice than the other parallel algorithms, such as my parallel mergesort and my parallel quicksort.

The benchmark results also tell us an important thing: the parallel mergesort and parallel quicksort of my Parallel Sort library are by nature much less cache efficient than my parallel heapsort.

Thank you Melzzzzz, you are a great guy for helping me run the benchmark.

Amine Moulay Ramdane.
Ramine <ramine@1.1>: Feb 13 03:10PM -0800

Hello,

As you have noticed, I am implementing my libraries with the Delphi and FreePascal compilers. I must say that the Object Pascal language is fantastic: it allowed me, for example, to write 2000 lines of "stable" StringTree code in one day. It is really amazing how efficient the Delphi and FreePascal language is. What is even more amazing is that Object Pascal is so easy that I have not used a debugger in any of my projects that I think are now stable; I only used a few writeln() calls and that's all. So I think FreePascal and Delphi are powerful compilers.

I have also tested the new 64-bit FPC from here:

ftp://ftp.freepascal.org/pub/fpc/snapshot/trunk/

I ran the SciMark2 benchmark with it, and the more optimized 64-bit compiler gives really amazing performance.

For Visual C++ 32 bit it gives:

Using 2.00 seconds min time per kenel.
Composite Score: 701.59
FFT Mflops: 519.89 (N=1024)
SOR Mflops: 622.40 (100 x 100)
MonteCarlo: Mflops: 101.07
Sparse matmult Mflops: 893.77 (N=1000, nz=5000)
LU Mflops: 1370.82 (M=100, N=100)

For C# it gives:

Composite Score: 531.97 MFlops
FFT : 501.18 - (1024)
SOR : 711.62 - (100x100)
Monte Carlo : 31.85
Sparse MatMult : 553.74 - (N=1000, nz=5000)
LU : 861.49 - (100x100)

And for FreePascal 64 bit (from the trunk that was optimized):

Composite Score MFlops: 581.20
FFT Mflops: 404.46 (N=1024)
SOR Mflops: 717.01 (100 x 100)
MonteCarlo: Mflops: 113.12
Sparse matmult Mflops: 810.49 (N=1000, nz=5000)
LU Mflops: 860.90 (M=100, N=100)

So all in all that's good news for FreePascal and Delphi...

Thank you,
Amine Moulay Ramdane.
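The SciMark2 composite score is the arithmetic mean of the five kernel scores, so the three result sets above can be sanity-checked with a few lines of Python. This is only a sketch; the dictionary and function names are chosen here for illustration, and the numbers are copied from the post.

```python
from statistics import mean

# Kernel scores in Mflops, in the order FFT, SOR, Monte Carlo, sparse matmult, LU.
KERNEL_SCORES = {
    "Visual C++ 32 bit": [519.89, 622.40, 101.07, 893.77, 1370.82],
    "C#": [501.18, 711.62, 31.85, 553.74, 861.49],
    "FreePascal 64 bit": [404.46, 717.01, 113.12, 810.49, 860.90],
}

# Composite scores as reported in the post.
REPORTED_COMPOSITE = {
    "Visual C++ 32 bit": 701.59,
    "C#": 531.97,
    "FreePascal 64 bit": 581.20,
}


def composite(scores):
    """Composite score as the plain average of the kernel scores."""
    return mean(scores)


if __name__ == "__main__":
    for name, scores in KERNEL_SCORES.items():
        print(f"{name:18s} computed {composite(scores):8.2f}  "
              f"reported {REPORTED_COMPOSITE[name]:8.2f}")
```

Because the composite is a plain average, one large kernel score such as LU can dominate it, which is worth keeping in mind when comparing compilers on the composite alone.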
Bonita Montero <Bonita.Montero@gmail.com>: Feb 13 10:02PM +0100

Ramine wrote:
> For Visual C++ 32 bit it gives:
                 ^^
> And for FreePascal 64 bit (from the trunk that was optimized)
                     ^^

Comparing 32-bit and 64-bit code is comparing apples and pears. 64-bit code has more registers and is therefore faster. I'll bet 64-bit C++ code optimized the same way as the Pascal code will outperform the Pascal code because of the better C++ compilers. And that is ignoring that C++ is the smarter language.
Ramine <ramine@1.1>: Feb 13 04:23PM -0800

Hello,

I have compiled the SciMark2 benchmark with a newer 64-bit gcc (mingw64) at -O2 optimization, and it gives a composite score of:

Composite Score: 760.43

The composite score of 64-bit FreePascal (the optimized version from the trunk) is:

Composite Score MFlops: 581.20

So FreePascal is slower than gcc mingw64 by only 24%, which is great news for FreePascal.

Thank you,
Amine Moulay Ramdane.
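The 24% figure follows from the two composite scores; the short Python sketch below shows the arithmetic. The helper names are hypothetical. Note that "X is slower than Y by" and "Y is faster than X by" use different baselines, so they give different percentages for the same pair of scores.

```python
def percent_slower(slower_score: float, faster_score: float) -> float:
    """How far the lower score falls short, relative to the higher score."""
    return (1.0 - slower_score / faster_score) * 100.0


def percent_faster(faster_score: float, slower_score: float) -> float:
    """How far the higher score exceeds the lower one, relative to the lower score."""
    return (faster_score / slower_score - 1.0) * 100.0


if __name__ == "__main__":
    fpc, gcc = 581.20, 760.43  # composite scores quoted in the post
    print(f"FreePascal is {percent_slower(fpc, gcc):.1f}% slower than gcc")
    print(f"gcc is {percent_faster(gcc, fpc):.1f}% faster than FreePascal")
```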
Melzzzzz <mel@zzzzz.com>: Feb 13 10:25PM +0100

On Fri, 13 Feb 2015 22:02:38 +0100
> -code will outperform the pascal-code because of the better
> C++-compilers.
> And just ignoring, that C++ is the smarter language.

For reference:

[bmaxa@maxa-pc sci]$ java jnt.scimark2.commandline

SciMark 2.0a

Composite Score: 2266.6183001230893
FFT (1024): 1389.279726779432
SOR (100x100): 1653.3291557147395
Monte Carlo : 877.5985388498531
Sparse matmult (N=1000, nz=5000): 1828.6989227937966
LU (100x100): 5584.185156477624

java.vendor: Oracle Corporation
java.version: 1.7.0_76
os.arch: amd64
os.name: Linux
os.version: 3.18.6-1-MANJARO

[bmaxa@maxa-pc sci]$ gcc -Wall -O3 -march=native *.c -o scimark2 -lm
[bmaxa@maxa-pc sci]$ ./scimark2
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using 2.00 seconds min time per kenel.
Composite Score: 2813.76
FFT Mflops: 2292.31 (N=1024)
SOR Mflops: 2446.00 (100 x 100)
MonteCarlo: Mflops: 658.08
Sparse matmult Mflops: 3028.21 (N=1000, nz=5000)
LU Mflops: 5644.21 (M=100, N=100)

[bmaxa@maxa-pc sci]$ clang -Wall -O3 -march=native *.c -o scimark2 -lm
[bmaxa@maxa-pc sci]$ ./scimark2
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using 2.00 seconds min time per kenel.
Composite Score: 2743.54
FFT Mflops: 1640.43 (N=1024)
SOR Mflops: 1875.82 (100 x 100)
MonteCarlo: Mflops: 629.14
Sparse matmult Mflops: 2904.31 (N=1000, nz=5000)
LU Mflops: 6668.00 (M=100, N=100)
[bmaxa@maxa-pc sci]$
Ramine <ramine@1.1>: Feb 13 02:48PM -0800

Hello,

While implementing StringTree I first used TStringList, but it was too slow. I then used THashedStringList from the inifiles unit and found it slow as well. This is why I decided to implement a much faster HashStringList, one that is even faster than THashedStringList from the inifiles unit. I have benchmarked my new HashStringList and it gives really amazing performance, so I advise you to use it. You will find it inside the StringTree zip file...

You can download my new HashStringList by downloading the StringTree zip file from:

https://sites.google.com/site/aminer68/stringtree

I have implemented the necessary methods. It works with all Delphi versions and with FreePascal, and it compiles to both 32-bit and 64-bit binaries.

Thank you,
Amine Moulay Ramdane.
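To illustrate why a hashed string list can beat a plain one for lookups, here is a small Python sketch of the general idea: keep the strings in insertion order and maintain a hash index from each string to its position. This is not the author's HashStringList and does not mirror its API; the class and method names are invented for the example.

```python
class HashedStringListSketch:
    """Ordered list of unique strings with a hash index for fast lookup.

    Illustration only: a plain list must scan its items to find a string,
    while the dict index answers the same question in O(1) on average.
    """

    def __init__(self):
        self._items = []   # strings in insertion order
        self._index = {}   # string -> position in self._items

    def add(self, s: str) -> int:
        """Add s if it is new and return its position."""
        pos = self._index.get(s)
        if pos is None:
            pos = len(self._items)
            self._items.append(s)
            self._index[s] = pos
        return pos

    def index_of(self, s: str) -> int:
        """Position of s, or -1 if absent (no linear scan needed)."""
        return self._index.get(s, -1)

    def __getitem__(self, pos: int) -> str:
        return self._items[pos]

    def __len__(self) -> int:
        return len(self._items)


if __name__ == "__main__":
    names = HashedStringListSketch()
    for word in ("alpha", "beta", "gamma", "beta"):
        names.add(word)
    print(len(names), names.index_of("gamma"), names.index_of("delta"))
```

The sketch ignores deletion and duplicate entries, both of which complicate keeping the index consistent with the list.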
Ramine <ramine@1.1>: Feb 13 02:18PM -0800

Hello,

I have updated my Parallel Sort library to version 3.32. I have just corrected a minor bug and stress tested the library, and I think it is now really stable.

You can download my Parallel Sort library version 3.32 from:

https://sites.google.com/site/aminer68/parallel-sort-library

But don't forget to put the "cmem" unit first in your "uses" statement when you use my Parallel Sort library with the Delphi graphical user interface. That is mandatory, and it is also mandatory with my other parallel libraries.

Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Feb 13 01:59PM -0800

Hello,

As I promised, I have finally implemented a very fast StringTree that will also be used to design a kind of graphical interface for my parallel archiver. I worked hard and wrote the 2000 lines of StringTree code in one day, then took another day to stress test it, and I think it is now stable. Please read the description below...

Authors: Amine Moulay Ramdane and Kjell Hasthi (it is based on Kjell Hasthi's unit)

Description:

The TStringTree class implements a non-visual tree structure like the one found in TreeView. TStringTree is a class for handling a tree-structured stringlist. It is very similar to a directory structure, and it uses the familiar terms "directories" and "files" instead of nodes and child nodes.

This unit is based on Kjell Hasthi's unit, but I have redesigned and enhanced it considerably. It is now much faster than Kjell Hasthi's unit, and it also uses my Parallel Sort library and my faster HashStringList.

Please look at the test.pas demo inside the zip file; compile and execute it...

You can download StringTree from:

https://sites.google.com/site/aminer68/stringtree

Download the zip file called stringtree_xe.zip for Delphi XE versions; the other zip file, stringtree.zip, is for FreePascal and Delphi 7 to 2007.

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Win, Linux and Mac (x86).

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for Delphi mode....

Required Delphi 7 to 2007 switches: -$H+ -DDelphi

Required Delphi XE switches: -$H+ -DXE

The define options inside defines.inc are:

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems
{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems

Thank you,
Amine Moulay Ramdane.
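The "directories and files" model described above can be pictured with a short Python sketch. It only illustrates the data structure in general terms; it is not a port of TStringTree, and the class name, method names and the "/" path separator are all assumptions made for the example.

```python
class StringTreeSketch:
    """Non-visual tree of strings using "directories" and "files".

    Illustration only; names and the "/" separator are hypothetical.
    """

    def __init__(self):
        self._root = self._new_dir()

    @staticmethod
    def _new_dir():
        return {"dirs": {}, "files": []}

    def _walk(self, parts, create=False):
        """Follow directory names from the root, optionally creating them."""
        node = self._root
        for name in parts:
            if create and name not in node["dirs"]:
                node["dirs"][name] = self._new_dir()
            node = node["dirs"][name]  # raises KeyError if the directory is missing
        return node

    def add_file(self, path: str) -> None:
        """Add a file, creating intermediate directories as needed."""
        *dirs, filename = path.strip("/").split("/")
        self._walk(dirs, create=True)["files"].append(filename)

    def list_dir(self, path: str = ""):
        """Return (sorted subdirectory names, sorted file names) for a directory."""
        parts = [p for p in path.strip("/").split("/") if p]
        node = self._walk(parts)
        return sorted(node["dirs"]), sorted(node["files"])


if __name__ == "__main__":
    tree = StringTreeSketch()
    for p in ("docs/readme.txt", "docs/api/index.txt", "top.txt"):
        tree.add_file(p)
    print(tree.list_dir())
    print(tree.list_dir("docs"))
```

Here each directory keeps its subdirectories in a dict, so resolving a path costs one hash lookup per component; sorting happens only when a directory is listed.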
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.programming.threads+unsubscribe@googlegroups.com. |