- Optical processor startup claims to outperform GPUs for AI workloads - 1 Update
- I correct a typo, please read again - 1 Update
- I think i have just made a mistake in my previous post - 1 Update
- I was just reading about the following paper about Lock Cohorting - 1 Update
- My Scalable reference counting with efficient support for weak references version 1.11 is here.. - 1 Update
- About my new Scalable reference counting with efficient support for weak references version 1.1 - 1 Update
- I want to share with you this beautiful song - 1 Update
- I correct a typo, please read again.. - 1 Update
Sky89 <Sky89@sky68.com>: Apr 29 12:25AM -0400

Hello..

Optical processor startup claims to outperform GPUs for AI workloads

Fathom Computing, a startup founded by brothers William and Michael Andregg, is aiming to design an optical computer for neural networks. As performance gains in traditional transistor-based CPUs have stagnated over the last few years, alternative computing paradigms such as quantum computers have been gaining traction. While optical computers are not a new concept (Coherent Optical Computers was published in 1972), they have been relegated to university research laboratories for decades.

The Fathom prototype performs mathematical operations by encoding numbers into light, according to a profile in Wired. The light is then passed through a series of lenses and other optical components; the measured result of this process is the calculated result of the operation.

The Fathom prototype is not a general-purpose processor; it is designed to compute specific types of linear algebra operations. Specifically, Fathom is targeting the long short-term memory type of recurrent neural networks, as well as non-recurrent feedforward neural networks. This mirrors trends in quantum computing: systems produced by D-Wave are similarly targeted toward quantum annealing rather than general processing.

In a recent blog post, Fathom indicated that they are still two years away from launching, but that they expect their platform to "significantly outperform state-of-the-art GPUs." The first systems will be available as a cloud service for researchers working in artificial intelligence.

Read more here:

https://www.techrepublic.com/article/optical-processor-startup-claims-to-outperform-gpus-for-ai-workloads/

Thank you, Amine Moulay Ramdane. |
Sky89 <Sky89@sky68.com>: Apr 28 11:12PM -0400

Hello..

I correct a typo, please read again:

I have thought more about the following paper about Lock Cohorting:

A General Technique for Designing NUMA Locks:

http://groups.csail.mit.edu/mag/a13-dice.pdf

I think it is not such a "bright" idea: NUMA locks that use Lock Cohorting do not optimize the inside of the critical sections they protect, and the code inside those critical sections may still transfer data between different NUMA nodes, which can cancel the gains obtained from the cohorting. So I don't think I will implement Lock Cohorting.

This is why I have instead invented my scalable AMLock and scalable MLock; please read for example about my scalable MLock here:

https://sites.google.com/site/aminer68/scalable-mlock

You can download my scalable MLock for C++ by downloading my C++ synchronization objects library, which contains some of my "inventions", here:

https://sites.google.com/site/aminer68/c-synchronization-objects-library

The Delphi and FreePascal version of my scalable MLock is here:

https://sites.google.com/site/aminer68/scalable-mlock

Thank you, Amine Moulay Ramdane. |
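[Editor's note] For readers who have not seen the paper, here is a minimal sketch in FreePascal of the cohort-lock idea under discussion: a per-NUMA-node local lock combined with one global lock, where a releasing thread hands the lock to a same-node waiter up to a bounded number of times before releasing it globally. This is an illustration written for this digest, not the paper's implementation and not the MLock code; the node count, the MaxPasses bound, the LocalWaiters flag, and the plain test-and-set locks are all simplifying assumptions (a real cohort lock detects local waiters through the local lock's own state).

program CohortLockSketch;

{$mode objfpc}

const
  MaxNodes  = 4;   // assumed number of NUMA nodes (hypothetical)
  MaxPasses = 64;  // bound on consecutive same-node hand-offs

type
  TNodeState = record
    LocalLock: LongInt;   // 0 = free, 1 = held (simple test-and-set lock)
    Passes: LongInt;      // consecutive local hand-offs so far
    OwnsGlobal: Boolean;  // does this node currently hold the global lock?
  end;

var
  GlobalLock: LongInt = 0;
  Nodes: array[0..MaxNodes - 1] of TNodeState;

procedure SpinAcquire(var L: LongInt);
begin
  // spin until the test-and-set succeeds
  while InterlockedExchange(L, 1) <> 0 do
    ThreadSwitch;
end;

procedure CohortAcquire(Node: Integer);
begin
  // first win among the threads of this node...
  SpinAcquire(Nodes[Node].LocalLock);
  // ...then compete across nodes, unless this node already owns the global lock
  if not Nodes[Node].OwnsGlobal then
  begin
    SpinAcquire(GlobalLock);
    Nodes[Node].OwnsGlobal := True;
  end;
end;

procedure CohortRelease(Node: Integer; LocalWaiters: Boolean);
begin
  // hand the lock to a same-node waiter while the pass bound allows it
  if LocalWaiters and (Nodes[Node].Passes < MaxPasses) then
    Inc(Nodes[Node].Passes)
  else
  begin
    Nodes[Node].Passes := 0;
    Nodes[Node].OwnsGlobal := False;
    InterlockedExchange(GlobalLock, 0); // let other nodes in
  end;
  InterlockedExchange(Nodes[Node].LocalLock, 0);
end;

begin
  CohortAcquire(0);
  CohortRelease(0, True);  // the global lock stays on node 0
  CohortAcquire(0);        // no cross-node contention this time
  CohortRelease(0, False); // release globally
  WriteLn('cohort lock sketch done');
end.

Note that this sketch only reduces how often the lock itself migrates between nodes; as the post above points out, the data touched inside the critical section can still move across nodes, and cohorting does nothing about that.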
Sky89 <Sky89@sky68.com>: Apr 28 07:11PM -0400

Hello..

I think I have just made a mistake in my previous post:

I was just reading the following paper about Lock Cohorting:

A General Technique for Designing NUMA Locks:

http://groups.csail.mit.edu/mag/a13-dice.pdf

I think that this Lock Cohorting better optimizes the cache and so on, so I think it is great, and I will implement it in C++, Delphi, and FreePascal.

Thank you, Amine Moulay Ramdane. |
Sky89 <Sky89@sky68.com>: Apr 28 06:57PM -0400

Hello,

I was just reading the following paper about Lock Cohorting:

A General Technique for Designing NUMA Locks:

http://groups.csail.mit.edu/mag/a13-dice.pdf

I have noticed that they are testing on NUMA systems other than Intel's. I think Intel NUMA systems are much more optimized: the cost of transferring data between NUMA nodes on Intel systems is "only" about 1.6X the cost of a local NUMA node access, so don't bother with Lock Cohorting on Intel NUMA systems.

This is why I have invented my scalable AMLock and scalable MLock; please read for example about my scalable MLock here:

https://sites.google.com/site/aminer68/scalable-mlock

You can download my scalable MLock for C++ by downloading my C++ synchronization objects library, which contains some of my "inventions", here:

https://sites.google.com/site/aminer68/c-synchronization-objects-library

The Delphi and FreePascal version of my scalable MLock is here:

https://sites.google.com/site/aminer68/scalable-mlock

Thank you, Amine Moulay Ramdane. |
Sky89 <Sky89@sky68.com>: Apr 28 05:08PM -0400

Hello,

My Scalable reference counting with efficient support for weak references version 1.11 is here..

There was no bug in version 1.1; in version 1.11 I have simply switched the variables "head" and "tail" in my scalable reference counting algorithm.

You can download my Scalable reference counting with efficient support for weak references version 1.11 from:

https://sites.google.com/site/aminer68/scalable-reference-counting-with-efficient-support-for-weak-references

Thank you, Amine Moulay Ramdane. |
Sky89 <Sky89@sky68.com>: Apr 28 04:36PM -0400

Hello..

About my new Scalable reference counting with efficient support for weak references version 1.1:

Weak references support is done by hooking the TObject.FreeInstance method, so every object destruction is noticed, and if a weak reference for that object exists, it gets removed from the internal dictionary where all weak references are stored. While this works, I am aware that it is a hacky approach, and it might not work if someone overrides the FreeInstance method and does not call inherited.

You can download and read about my new scalable reference counting with efficient support for weak references version 1.1 from:

https://sites.google.com/site/aminer68/scalable-reference-counting-with-efficient-support-for-weak-references

Thank you, Amine Moulay Ramdane. |
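[Editor's note] To make the described mechanism concrete, here is a simplified FreePascal sketch of the same bookkeeping. The real library hooks TObject.FreeInstance at runtime (via code detours) so that every class is covered; this sketch instead overrides FreeInstance on a base class, which shows the dictionary-removal idea without any code patching. All names here (TWeakTracked, TWeakRef, Registry) are invented for the illustration, and the registry is not thread-safe.

program WeakRefSketch;

{$mode objfpc}

uses
  classes;

type
  // base class whose destruction clears matching weak references;
  // the real library instead patches TObject.FreeInstance so that
  // all classes are covered
  TWeakTracked = class(TObject)
  public
    procedure FreeInstance; override;
  end;

  TWeakRef = class
  private
    FTarget: TWeakTracked;
  public
    constructor Create(ATarget: TWeakTracked);
    destructor Destroy; override;
    function IsAlive: Boolean;
    property Target: TWeakTracked read FTarget;
  end;

var
  Registry: TFPList; // all live TWeakRef instances (not thread-safe)

procedure TWeakTracked.FreeInstance;
var
  i: Integer;
begin
  // every destruction is noticed here: clear any weak reference
  // that still points at this object
  for i := 0 to Registry.Count - 1 do
    if TWeakRef(Registry[i]).FTarget = Self then
      TWeakRef(Registry[i]).FTarget := nil;
  inherited FreeInstance; // actually releases the memory
end;

constructor TWeakRef.Create(ATarget: TWeakTracked);
begin
  inherited Create;
  FTarget := ATarget;
  Registry.Add(Pointer(Self));
end;

destructor TWeakRef.Destroy;
begin
  Registry.Remove(Pointer(Self));
  inherited Destroy;
end;

function TWeakRef.IsAlive: Boolean;
begin
  Result := FTarget <> nil;
end;

var
  Obj: TWeakTracked;
  W: TWeakRef;
begin
  Registry := TFPList.Create;
  Obj := TWeakTracked.Create;
  W := TWeakRef.Create(Obj);
  WriteLn(W.IsAlive); // TRUE: the object is still alive
  Obj.Free;           // FreeInstance clears the weak reference
  WriteLn(W.IsAlive); // FALSE: no dangling pointer
  W.Free;
  Registry.Free;
end.

The post's caveat applies here in the same way: the cleanup only runs if every FreeInstance override in the class hierarchy calls inherited, which is why the library resorts to runtime patching of TObject itself.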
Sky89 <Sky89@sky68.com>: Apr 28 04:07PM -0400

Hello..

I want to share with you this beautiful song:

Laid Back - Sunshine Reggae

https://www.youtube.com/watch?v=bNowU63PF5E

Thank you, Amine Moulay Ramdane. |
Sky89 <Sky89@sky68.com>: Apr 28 03:34PM -0400

Hello..

I correct a typo, please read again..

My new Scalable reference counting with efficient support for weak references version 1.1 is here..

I have enhanced my scalable algorithm and now it is much more powerful: the implementation now also works as a "scalable" counter that supports both "increment" and "decrement", using two scalable counting networks. Please take a look at the new scalable algorithm implementation inside the source code..

You can download my new scalable reference counting with efficient support for weak references version 1.1 from:

https://sites.google.com/site/aminer68/scalable-reference-counting-with-efficient-support-for-weak-references

Description:

This is my scalable reference counting with efficient support for weak references. Since problems that cannot be solved without weak references are rare, the library scales very well. The reference counting is implemented using scalable counting networks that completely eliminate false sharing, so it is fully scalable on multicore and manycore processors, and the algorithm is optimized. The library works on both Windows and Linux (x86), and it is easy to port to Mac OS X.

I have modified my scalable algorithm: as you will notice, I am no longer using decrement with support for antitokens in the balancers of the scalable counting networks; I am only using an "increment". Please look at the new algorithm inside the zip file; I think it is working correctly. Also notice that the returned value of the _Release() method is valid when it is equal to 0.

I have optimized it further: I now use only tokens, and no antitokens, in the balancers of the scalable counting networks, so only increment is supported, not decrement, and you have to be smart to make that work correctly, which is what I have done. Look at the AMInterfacedObject.pas file inside the zip file: it uses the counting_network_next_value() function, which increments the scalable counting network by 1. The _AddRef() method is simple: it increments by 1 to add a reference to the object. But look inside the _Release() method: it calls counting_network_next_value() three times, and my invention is to call counting_network_next_value(cn1) first inside _Release() to make the algorithm work. Debug it and you will notice that the algorithm is working correctly; I have debugged it and I think it is correct.

I have to prove my scalable reference counting algorithm, as with a mathematical proof, so I will argue it logically, as in PhD papers:

You will find the code of my scalable reference counting inside AMInterfacedObject.pas inside the zip file.

If you look inside the code, there are two methods, _AddRef() and _Release(), and I am using two scalable counting networks; think of them as counters. In the _AddRef() method I execute the following:

v1 := counting_network_next_value(cn1);

cn1 is the scalable counting network, and counting_network_next_value() is a function that increments the scalable counting network by 1.
In the _Release() method I execute the following:

v2 := counting_network_next_value(cn1);
v1 := counting_network_next_value(cn2);
v1 := counting_network_next_value(cn2);

So my scalable algorithm is "smart", and the logical argument is that I call counting_network_next_value(cn1) first in the above. This allows the algorithm to work correctly, because we advance cn1 by 1 to obtain the value of cn1, and the other threads also advance cn1 by 1 inside _Release(); it is the last thread to advance cn1 by 1 that makes the reference counter equal to 0. The _AddRef() method works the same way and is easy to reason about, so the algorithm works. Please look more carefully at my algorithm and you will notice that it works as I have just argued.

Please also read the following to understand better.

Here are the parameters of the constructor:

The first parameter is the width of the scalable counting networks, which is what permits my scalable reference counting algorithm to be scalable. This parameter must be between 1 and 31; it is currently 4. It is an exponent, so the width is 2^4 = 16, and you pass this counting-network width as the n in the following formula:

(n*log(n)*(1+log(n)))/4

The log in the formula is in base 2. This formula gives the number of gates of the scalable counting networks; replacing n by 16 gives (16 * 4 * (1 + 4)) / 4 = 80 gates, which means the scalable counting networks scale up to 80 cores, and beyond 80 cores you will start to see contention.

The second parameter is a boolean that tells whether reference counting is used or not; it defaults to true, meaning that reference counting is used.

About the weak references support: the Weak<T> type supports assignment from and to T, which makes it usable as if you had a variable of T. It has the IsAlive property to check whether the reference is still valid and not a dangling pointer, and the Target property can be used if you want access to members of the reference.

Note the use of the IsAlive property on our weak reference: it tells us whether the referenced object is still available, and it provides a safe way to get a concrete reference to the parent.

I have ported the efficient weak references support to Linux by implementing efficient code hooking; look at the DSharp.Core.Detour.pas file for Linux, which I have written, to see how I implemented it in the Linux library.

Please look at the example.dpr and test.pas demos to see how weak references work, etc.

Call the _AddRef() and _Release() methods to manually increment or decrement the number of references to the object.

Platform: Windows and Linux (x86)

Language: FPC Pascal v3.1.x+ / Delphi 2007+: http://www.freepascal.org/

Required FPC switches: -O3 -Sd (-Sd is for Delphi mode)

Required Delphi switches: -$H+ -DDelphi

For Delphi XE versions and Delphi Tokyo, use the -DXE switch.

The defines options inside defines.inc are:

{$DEFINE CPU32} for 32-bit systems

{$DEFINE CPU64} for 64-bit systems

Thank you, Amine Moulay Ramdane. |
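[Editor's note] As a reading aid, here is a small single-threaded FreePascal sketch of the arithmetic described above, with one plain interlocked counter standing in for each scalable counting network (the scalability is gone, but the token bookkeeping is the same). Under this reading, cn1 counts addrefs plus releases and cn2 receives two tokens per release, so cn1 - cn2 equals addrefs minus releases and reaches 0 exactly at the last release. This is an interpretation sketched for this digest, not the code from AMInterfacedObject.pas; the names MyAddRef and MyRelease are invented to avoid confusion with the library's _AddRef() and _Release().

program TwoCounterRefCountSketch;

{$mode objfpc}

var
  cn1: LongInt = 0; // advanced once per MyAddRef and once per MyRelease
  cn2: LongInt = 0; // advanced twice per MyRelease

// stand-in for the scalable counting network: a single interlocked counter
function counting_network_next_value(var cn: LongInt): LongInt;
begin
  Result := InterlockedIncrement(cn);
end;

function MyAddRef: LongInt;
begin
  Result := counting_network_next_value(cn1);
end;

function MyRelease: LongInt;
var
  v1, v2: LongInt;
begin
  v2 := counting_network_next_value(cn1); // cn1 = addrefs + releases
  v1 := counting_network_next_value(cn2);
  v1 := counting_network_next_value(cn2); // cn2 = 2 * releases
  Result := v2 - v1; // addrefs - releases: 0 only at the last release
end;

begin
  MyAddRef;           // reference count is now 1
  MyAddRef;           // reference count is now 2
  WriteLn(MyRelease); // prints 1: references remain
  WriteLn(MyRelease); // prints 0: the last reference was dropped
end.

Note that this checks only the sequential bookkeeping; whether the same invariant holds under concurrent _Release() calls is exactly what the post asks readers to verify against the actual implementation in the zip file.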
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.programming.threads+unsubscribe@googlegroups.com. |