- Optical processor startup claims to outperform GPUs for AI workloads - 1 Update
- I correct a typo, please read again - 1 Update
- I think i have just made a mistake in my previous post - 1 Update
- I was just reading about the following paper about Lock Cohorting - 1 Update
- My Scalable reference counting with efficient support for weak references version 1.11 is here.. - 1 Update
- About my new Scalable reference counting with efficient support for weak references version 1.1 - 1 Update
- I want to share with you this beautiful song - 1 Update
- I correct a typo, please read again.. - 1 Update
Sky89 <Sky89@sky68.com>: Apr 29 12:25AM -0400

Hello..

Optical processor startup claims to outperform GPUs for AI workloads

Fathom Computing, a startup founded by brothers William and Michael Andregg, is aiming to design an optical computer for neural networks. As performance gains in traditional transistor-based CPUs have stagnated over the last few years, alternative computing paradigms such as quantum computers have been gaining traction. While optical computers are not a new concept (Coherent Optical Computers was published in 1972), they have been relegated to university research laboratories for decades.

The Fathom prototype performs mathematical operations by encoding numbers into light, according to a profile in Wired. The light is then passed through a series of lenses and other optical components; the measured result of this process is the calculated result of the operation.

The Fathom prototype is not a general-purpose processor; it is designed to compute specific types of linear algebra operations. Specifically, Fathom is targeting the long short-term memory type of recurrent neural networks, as well as non-recurrent feedforward neural networks. This mirrors trends in quantum computing: systems produced by D-Wave are similarly targeted toward quantum annealing rather than general processing.

In a recent blog post, Fathom indicated that they are still two years away from launching, but that they expect their platform to "significantly outperform state-of-the-art GPUs." The first systems will be available as a cloud service for researchers working in artificial intelligence.

Read more here:

https://www.techrepublic.com/article/optical-processor-startup-claims-to-outperform-gpus-for-ai-workloads/

Thank you, Amine Moulay Ramdane. |
Sky89 <Sky89@sky68.com>: Apr 28 11:12PM -0400

Hello..

I correct a typo, please read again:

I have thought more about the following paper about Lock Cohorting:

A General Technique for Designing NUMA Locks:

http://groups.csail.mit.edu/mag/a13-dice.pdf

I think it is not such a "bright" idea: NUMA locks that use Lock Cohorting do not optimize the inside of the critical sections they protect, and the code inside those critical sections may still transfer data between different NUMA nodes, which can cancel the gains obtained from the cohorting. So I don't think I will implement Lock Cohorting.

This is why I have instead invented my scalable AMLock and scalable MLock; please read for example about my scalable MLock here:

https://sites.google.com/site/aminer68/scalable-mlock

You can download my scalable MLock for C++ by downloading my C++ synchronization objects library, which contains some of my "inventions", here:

https://sites.google.com/site/aminer68/c-synchronization-objects-library

The Delphi and FreePascal version of my scalable MLock is here:

https://sites.google.com/site/aminer68/scalable-mlock

Thank you, Amine Moulay Ramdane. |
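[Editor's note] For readers who have not seen the paper, here is a minimal sketch in FreePascal of the cohort-lock idea under discussion: a per-NUMA-node local lock combined with one global lock, where a releasing thread hands the lock to a same-node waiter up to a bounded number of times before releasing it globally. This is an illustration written for this digest, not the paper's implementation and not the MLock code; the node count, the MaxPasses bound, the LocalWaiters flag, and the plain test-and-set locks are all simplifying assumptions (a real cohort lock detects local waiters through the local lock's own state).

program CohortLockSketch;

{$mode objfpc}

const
  MaxNodes  = 4;   // assumed number of NUMA nodes (hypothetical)
  MaxPasses = 64;  // bound on consecutive same-node hand-offs

type
  TNodeState = record
    LocalLock: LongInt;   // 0 = free, 1 = held (simple test-and-set lock)
    Passes: LongInt;      // consecutive local hand-offs so far
    OwnsGlobal: Boolean;  // does this node currently hold the global lock?
  end;

var
  GlobalLock: LongInt = 0;
  Nodes: array[0..MaxNodes - 1] of TNodeState;

procedure SpinAcquire(var L: LongInt);
begin
  // spin until the test-and-set succeeds
  while InterlockedExchange(L, 1) <> 0 do
    ThreadSwitch;
end;

procedure CohortAcquire(Node: Integer);
begin
  // first win among the threads of this node...
  SpinAcquire(Nodes[Node].LocalLock);
  // ...then compete across nodes, unless this node already owns the global lock
  if not Nodes[Node].OwnsGlobal then
  begin
    SpinAcquire(GlobalLock);
    Nodes[Node].OwnsGlobal := True;
  end;
end;

procedure CohortRelease(Node: Integer; LocalWaiters: Boolean);
begin
  // hand the lock to a same-node waiter while the pass bound allows it
  if LocalWaiters and (Nodes[Node].Passes < MaxPasses) then
    Inc(Nodes[Node].Passes)
  else
  begin
    Nodes[Node].Passes := 0;
    Nodes[Node].OwnsGlobal := False;
    InterlockedExchange(GlobalLock, 0); // let other nodes in
  end;
  InterlockedExchange(Nodes[Node].LocalLock, 0);
end;

begin
  CohortAcquire(0);
  CohortRelease(0, True);  // the global lock stays on node 0
  CohortAcquire(0);        // no cross-node contention this time
  CohortRelease(0, False); // release globally
  WriteLn('cohort lock sketch done');
end.

Note that this sketch only reduces how often the lock itself migrates between nodes; as the post above points out, the data touched inside the critical section can still move across nodes, and cohorting does nothing about that.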
Sky89 <Sky89@sky68.com>: Apr 28 07:11PM -0400

Hello..

I think I have just made a mistake in my previous post:

I was just reading the following paper about Lock Cohorting:

A General Technique for Designing NUMA Locks:

http://groups.csail.mit.edu/mag/a13-dice.pdf

I think that this Lock Cohorting better optimizes the cache and so on, so I think it is great, and I will implement it in C++, Delphi, and FreePascal.

Thank you, Amine Moulay Ramdane. |
Sky89 <Sky89@sky68.com>: Apr 28 06:57PM -0400

Hello,

I was just reading the following paper about Lock Cohorting:

A General Technique for Designing NUMA Locks:

http://groups.csail.mit.edu/mag/a13-dice.pdf

I have noticed that they are testing on NUMA systems other than Intel's. I think Intel NUMA systems are much more optimized: the cost of transferring data between NUMA nodes on Intel systems is "only" about 1.6X the cost of a local NUMA node access, so don't bother with Lock Cohorting on Intel NUMA systems.

This is why I have invented my scalable AMLock and scalable MLock; please read for example about my scalable MLock here:

https://sites.google.com/site/aminer68/scalable-mlock

You can download my scalable MLock for C++ by downloading my C++ synchronization objects library, which contains some of my "inventions", here:

https://sites.google.com/site/aminer68/c-synchronization-objects-library

The Delphi and FreePascal version of my scalable MLock is here:

https://sites.google.com/site/aminer68/scalable-mlock

Thank you, Amine Moulay Ramdane. |
Sky89 <Sky89@sky68.com>: Apr 28 05:08PM -0400

Hello,

My Scalable reference counting with efficient support for weak references version 1.11 is here..

There was no bug in version 1.1; in version 1.11 I have simply switched the variables "head" and "tail" in my scalable reference counting algorithm.

You can download my Scalable reference counting with efficient support for weak references version 1.11 from:

https://sites.google.com/site/aminer68/scalable-reference-counting-with-efficient-support-for-weak-references

Thank you, Amine Moulay Ramdane. |
Sky89 <Sky89@sky68.com>: Apr 28 04:36PM -0400

Hello..

About my new Scalable reference counting with efficient support for weak references version 1.1:

Weak references support is done by hooking the TObject.FreeInstance method, so every object destruction is noticed, and if a weak reference for that object exists, it gets removed from the internal dictionary where all weak references are stored. While this works, I am aware that it is a hacky approach, and it might not work if someone overrides the FreeInstance method and does not call inherited.

You can download and read about my new scalable reference counting with efficient support for weak references version 1.1 from:

https://sites.google.com/site/aminer68/scalable-reference-counting-with-efficient-support-for-weak-references

Thank you, Amine Moulay Ramdane. |
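[Editor's note] To make the described mechanism concrete, here is a simplified FreePascal sketch of the same bookkeeping. The real library hooks TObject.FreeInstance at runtime (via code detours) so that every class is covered; this sketch instead overrides FreeInstance on a base class, which shows the dictionary-removal idea without any code patching. All names here (TWeakTracked, TWeakRef, Registry) are invented for the illustration, and the registry is not thread-safe.

program WeakRefSketch;

{$mode objfpc}

uses
  classes;

type
  // base class whose destruction clears matching weak references;
  // the real library instead patches TObject.FreeInstance so that
  // all classes are covered
  TWeakTracked = class(TObject)
  public
    procedure FreeInstance; override;
  end;

  TWeakRef = class
  private
    FTarget: TWeakTracked;
  public
    constructor Create(ATarget: TWeakTracked);
    destructor Destroy; override;
    function IsAlive: Boolean;
    property Target: TWeakTracked read FTarget;
  end;

var
  Registry: TFPList; // all live TWeakRef instances (not thread-safe)

procedure TWeakTracked.FreeInstance;
var
  i: Integer;
begin
  // every destruction is noticed here: clear any weak reference
  // that still points at this object
  for i := 0 to Registry.Count - 1 do
    if TWeakRef(Registry[i]).FTarget = Self then
      TWeakRef(Registry[i]).FTarget := nil;
  inherited FreeInstance; // actually releases the memory
end;

constructor TWeakRef.Create(ATarget: TWeakTracked);
begin
  inherited Create;
  FTarget := ATarget;
  Registry.Add(Pointer(Self));
end;

destructor TWeakRef.Destroy;
begin
  Registry.Remove(Pointer(Self));
  inherited Destroy;
end;

function TWeakRef.IsAlive: Boolean;
begin
  Result := FTarget <> nil;
end;

var
  Obj: TWeakTracked;
  W: TWeakRef;
begin
  Registry := TFPList.Create;
  Obj := TWeakTracked.Create;
  W := TWeakRef.Create(Obj);
  WriteLn(W.IsAlive); // TRUE: the object is still alive
  Obj.Free;           // FreeInstance clears the weak reference
  WriteLn(W.IsAlive); // FALSE: no dangling pointer
  W.Free;
  Registry.Free;
end.

The post's caveat applies here in the same way: the cleanup only runs if every FreeInstance override in the class hierarchy calls inherited, which is why the library resorts to runtime patching of TObject itself.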
Sky89 <Sky89@sky68.com>: Apr 28 04:07PM -0400

Hello..

I want to share with you this beautiful song:

Laid Back - Sunshine Reggae

https://www.youtube.com/watch?v=bNowU63PF5E

Thank you, Amine Moulay Ramdane. |
Sky89 <Sky89@sky68.com>: Apr 28 03:34PM -0400

Hello..

I correct a typo, please read again..

My new Scalable reference counting with efficient support for weak references version 1.1 is here..

I have enhanced my scalable algorithm and now it is much more powerful: the implementation now also works as a "scalable" counter that supports both "increment" and "decrement", using two scalable counting networks. Please take a look at the new scalable algorithm implementation inside the source code..

You can download my new scalable reference counting with efficient support for weak references version 1.1 from:

https://sites.google.com/site/aminer68/scalable-reference-counting-with-efficient-support-for-weak-references

Description:

This is my scalable reference counting with efficient support for weak references. Since problems that cannot be solved without weak references are rare, the library scales very well. The reference counting is implemented using scalable counting networks that completely eliminate false sharing, so it is fully scalable on multicore and manycore processors, and the algorithm is optimized. The library works on both Windows and Linux (x86), and it is easy to port to Mac OS X.

I have modified my scalable algorithm: as you will notice, I am no longer using decrement with support for antitokens in the balancers of the scalable counting networks; I am only using an "increment". Please look at the new algorithm inside the zip file; I think it is working correctly. Also notice that the returned value of the _Release() method is valid when it is equal to 0.

I have optimized it further: I now use only tokens, and no antitokens, in the balancers of the scalable counting networks, so only increment is supported, not decrement, and you have to be smart to make that work correctly, which is what I have done. Look at the AMInterfacedObject.pas file inside the zip file: it uses the counting_network_next_value() function, which increments the scalable counting network by 1. The _AddRef() method is simple: it increments by 1 to add a reference to the object. But look inside the _Release() method: it calls counting_network_next_value() three times, and my invention is to call counting_network_next_value(cn1) first inside _Release() to make the algorithm work. Debug it and you will notice that the algorithm is working correctly; I have debugged it and I think it is correct.

I have to prove my scalable reference counting algorithm, as with a mathematical proof, so I will argue it logically, as in PhD papers:

You will find the code of my scalable reference counting inside AMInterfacedObject.pas inside the zip file.

If you look inside the code, there are two methods, _AddRef() and _Release(), and I am using two scalable counting networks; think of them as counters. In the _AddRef() method I execute the following:

v1 := counting_network_next_value(cn1);

cn1 is the scalable counting network, and counting_network_next_value() is a function that increments the scalable counting network by 1.
In the _Release() method I execute the following:

v2 := counting_network_next_value(cn1);
v1 := counting_network_next_value(cn2);
v1 := counting_network_next_value(cn2);

So my scalable algorithm is "smart", and the logical argument is that I call counting_network_next_value(cn1) first in the above. This allows the algorithm to work correctly, because we advance cn1 by 1 to obtain the value of cn1, and the other threads also advance cn1 by 1 inside _Release(); it is the last thread to advance cn1 by 1 that makes the reference counter equal to 0. The _AddRef() method works the same way and is easy to reason about, so the algorithm works. Please look more carefully at my algorithm and you will notice that it works as I have just argued.

Please also read the following to understand better.

Here are the parameters of the constructor:

The first parameter is the width of the scalable counting networks, which is what permits my scalable reference counting algorithm to be scalable. This parameter must be between 1 and 31; it is currently 4. It is an exponent, so the width is 2^4 = 16, and you pass this counting-network width as the n in the following formula:

(n*log(n)*(1+log(n)))/4

The log in the formula is in base 2. This formula gives the number of gates of the scalable counting networks; replacing n by 16 gives (16 * 4 * (1 + 4)) / 4 = 80 gates, which means the scalable counting networks scale up to 80 cores, and beyond 80 cores you will start to see contention.

The second parameter is a boolean that tells whether reference counting is used or not; it defaults to true, meaning that reference counting is used.

About the weak references support: the Weak<T> type supports assignment from and to T, which makes it usable as if you had a variable of T. It has the IsAlive property to check whether the reference is still valid and not a dangling pointer, and the Target property can be used if you want access to members of the reference.

Note the use of the IsAlive property on our weak reference: it tells us whether the referenced object is still available, and it provides a safe way to get a concrete reference to the parent.

I have ported the efficient weak references support to Linux by implementing efficient code hooking; look at the DSharp.Core.Detour.pas file for Linux, which I have written, to see how I implemented it in the Linux library.

Please look at the example.dpr and test.pas demos to see how weak references work, etc.

Call the _AddRef() and _Release() methods to manually increment or decrement the number of references to the object.

Platform: Windows and Linux (x86)

Language: FPC Pascal v3.1.x+ / Delphi 2007+: http://www.freepascal.org/

Required FPC switches: -O3 -Sd (-Sd is for Delphi mode)

Required Delphi switches: -$H+ -DDelphi

For Delphi XE versions and Delphi Tokyo, use the -DXE switch.

The defines options inside defines.inc are:

{$DEFINE CPU32} for 32-bit systems

{$DEFINE CPU64} for 64-bit systems

Thank you, Amine Moulay Ramdane. |
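[Editor's note] As a reading aid, here is a small single-threaded FreePascal sketch of the arithmetic described above, with one plain interlocked counter standing in for each scalable counting network (the scalability is gone, but the token bookkeeping is the same). Under this reading, cn1 counts addrefs plus releases and cn2 receives two tokens per release, so cn1 - cn2 equals addrefs minus releases and reaches 0 exactly at the last release. This is an interpretation sketched for this digest, not the code from AMInterfacedObject.pas; the names MyAddRef and MyRelease are invented to avoid confusion with the library's _AddRef() and _Release().

program TwoCounterRefCountSketch;

{$mode objfpc}

var
  cn1: LongInt = 0; // advanced once per MyAddRef and once per MyRelease
  cn2: LongInt = 0; // advanced twice per MyRelease

// stand-in for the scalable counting network: a single interlocked counter
function counting_network_next_value(var cn: LongInt): LongInt;
begin
  Result := InterlockedIncrement(cn);
end;

function MyAddRef: LongInt;
begin
  Result := counting_network_next_value(cn1);
end;

function MyRelease: LongInt;
var
  v1, v2: LongInt;
begin
  v2 := counting_network_next_value(cn1); // cn1 = addrefs + releases
  v1 := counting_network_next_value(cn2);
  v1 := counting_network_next_value(cn2); // cn2 = 2 * releases
  Result := v2 - v1; // addrefs - releases: 0 only at the last release
end;

begin
  MyAddRef;           // reference count is now 1
  MyAddRef;           // reference count is now 2
  WriteLn(MyRelease); // prints 1: references remain
  WriteLn(MyRelease); // prints 0: the last reference was dropped
end.

Note that this checks only the sequential bookkeeping; whether the same invariant holds under concurrent _Release() calls is exactly what the post asks readers to verify against the actual implementation in the zip file.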
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.programming.threads+unsubscribe@googlegroups.com. |