Wednesday, June 17, 2020

Digest for comp.programming.threads@googlegroups.com - 1 update in 1 topic

aminer68@gmail.com: Jun 16 03:02PM -0700

Hello,
 
 
About beating Moore's Law with software..
 
bmoore has responded to me with the following:
 
https://groups.google.com/forum/#!topic/soc.culture.china/Uu15FIknU0s
 
As you can see, he is asking me the following:
 
"Are you talking about beating Moore's Law with software?"
 
But I think there are constraints such as the following:
 
"Modern programing environments contribute to the problem of software bloat by placing ease of development and portable code above speed or memory usage. While this is a sound business model in a commercial environment, it does not make sense where IT resources are constrained. Languages such as Java, C-Sharp, and Python have opted for code portability and software development speed above execution speed and memory usage, while modern data storage and transfer standards such as XML and JSON place flexibility and readability above efficiency."
 
Read the following:
 
https://smallwarsjournal.com/jrnl/art/overcoming-death-moores-law-role-software-advances-and-non-semiconductor-technologies
 
There also remains the following way to beat Moore's Law:
 
"Improved Algorithms
 
Hardware improvements mean little if software cannot effectively use the resources available to it. The Army should shape future software algorithms by funding basic research on improved software algorithms to meet its specific needs. The Army should also search for new algorithms and techniques which can be applied to meet specific needs and develop a learning culture within its software community to disseminate this information."
 
 
As for scalable algorithms, as you know, I am a white Arab who has invented
many scalable algorithms and their implementations. Read my following thoughts to see it:
 
About my new invention that is a scalable algorithm..
 
I am a white Arab, and I think I am smart,
something like a genius, because I have again just invented
a new scalable algorithm. But first I will briefly talk about the following best scalable reader-writer lock inventions. The first one is the following:
 
Scalable Read-mostly Synchronization Using Passive Reader-Writer Locks
 
https://www.usenix.org/system/files/conference/atc14/atc14-paper-liu.pdf
 
You will notice that it has two weaknesses: first, it targets the TSO hardware memory model, and second, the writer latency is very expensive when there are few readers.
 
And here is the other best scalable reader-writer lock invention of Facebook:
 
SharedMutex is a reader-writer lock. It is small, very fast, scalable
on multi-core
 
Read here:
 
https://github.com/facebook/folly/blob/master/folly/SharedMutex.h
 
 
But you will notice that the weakness of this scalable reader-writer lock is that its priority can only be configured as follows:
 
SharedMutexReadPriority gives priority to readers,
SharedMutexWritePriority gives priority to writers.
 
 
So the weakness of this scalable reader-writer lock is that
you can have starvation with it.
 
So this is why I have just invented a scalable algorithm,
a scalable reader-writer lock that is better than the above: it is starvation-free, it is fair, and it has a small writer latency.
 
So I think mine is the best, and I will sell many of my scalable algorithms to software companies such as Microsoft, Google or Embarcadero..
 
 
What does it mean to be an inventor of many scalable algorithms?
 
The Holy Grail of parallel programming is to provide good speedup while
hiding or avoiding the pitfalls of concurrency. You have to understand this to understand what I am doing: I am an inventor of
many scalable algorithms and their implementations. But how can we define the kind of inventor that I am? I think there are the following kinds of inventors: PhD researchers like Albert Einstein, and engineers like Nikola Tesla. I think I am the Nikola Tesla kind of inventor. I am not a PhD researcher like Albert Einstein; I am like an engineer who invented many scalable algorithms and their implementations, so I am like the following inventor, Nikola Tesla:
 
https://en.wikipedia.org/wiki/Nikola_Tesla
 
But I think that both the PhD researchers who are inventors and the engineers who are inventors are powerful.
 
You have to understand deeply what it takes to invent my scalable algorithms
and their implementations to understand that it is powerful.
Here is an example: I have invented a scalable algorithm, a scalable Mutex that is remarkable and that is the Holy Grail of scalable Locks. It has the following characteristics; read my following thoughts
to understand:
 
About fair and unfair locking..
 
I have just read the following post by a lead engineer at Amazon:
 
Highly contended and fair locking in Java
 
https://brooker.co.za/blog/2012/09/10/locking.html
 
As you can see, you can use unfair locking, which can lead to starvation, or fair locking, which is slower than unfair locking.
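To make the fair side of that trade-off concrete, here is a minimal sketch of a ticket lock, which grants the lock in strict FIFO order. It is written in Python for illustration only. The class name `TicketLock` is hypothetical, and this is neither the lock from the linked article nor the Fast Mutex described in this post.

```python
import threading


class TicketLock:
    """FIFO-fair lock: threads are served strictly in ticket order.

    Illustrative sketch only. The names are hypothetical.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0   # next ticket to hand out
        self._now_serving = 0   # ticket currently allowed to hold the lock

    def acquire(self):
        with self._cond:
            my_ticket = self._next_ticket
            self._next_ticket += 1
            # Wait until every earlier ticket has been served.
            while self._now_serving != my_ticket:
                self._cond.wait()
            return my_ticket

    def release(self):
        with self._cond:
            self._now_serving += 1
            # Wake all waiters; only the holder of the next ticket proceeds.
            self._cond.notify_all()


if __name__ == "__main__":
    lock = TicketLock()
    ticket = lock.acquire()
    print("holding the lock with ticket", ticket)
    lock.release()
```

No thread can starve here, because a later arrival can never overtake an earlier ticket. The price is the one the article describes: the lock must be handed to one specific waiter, even when another thread is already running and could take it faster.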
 
I think that Microsoft synchronization objects such as the Windows critical section use unfair locking, so they can still have starvation.
 
But I think this is not the right way to do it. I am an inventor, and I have invented a scalable Fast Mutex that is much more powerful, because with my Fast Mutex you can tune the "fairness" of the lock. My Fast Mutex is capable of more than that; read about it in my following thoughts:
 
More about research and software development..
 
I have just looked at the following new video:
 
Why is coding so hard...
 
https://www.youtube.com/watch?v=TAAXwrgd1U8
 
I understand this video, but I have to explain my work:
 
I am not like the tech lead in the video above, because I am also an "inventor" who has invented many scalable algorithms and their implementations. I am also inventing effective abstractions. Here is an example:
 
Read the following post by the senior research scientist Dave Dice:
 
Preemption tolerant MCS locks
 
https://blogs.oracle.com/dave/preemption-tolerant-mcs-locks
 
As you can see, he is trying to invent a new lock that is preemption tolerant, but his lock lacks some important characteristics. This is why I have just invented a new Fast Mutex that is adaptive and much better. I think mine is the "best", and I think you will not find it anywhere. My new Fast Mutex has the following characteristics:
 
1- Starvation-free
2- Tunable fairness
3- It keeps its cache-coherence traffic very low
4- Very good fast-path performance
5- Good preemption tolerance
6- It is faster than the scalable MCS lock
 
This is how I am an "inventor". I have also invented other scalable algorithms, such as scalable reference counting with efficient support for weak references, a fully scalable Threadpool, and a fully scalable FIFO queue, and I have
also invented other scalable algorithms and their implementations. I think I will sell some of them to Microsoft, Google, Embarcadero or similar software companies.
 
And here is my other previous new invention of a scalable algorithm:
 
I have just read the following PhD paper about the invention called counting networks, which are better than software combining trees:
 
Counting Networks
 
http://people.csail.mit.edu/shanir/publications/AHS.pdf
 
And I have read the following PhD paper:
 
http://people.csail.mit.edu/shanir/publications/HLS.pdf
 
As you can see, they say in the conclusion that:
 
"Software combining trees and counting networks which are the only techniques we observed to be truly scalable"
 
But I have found that this counting networks algorithm is not generally scalable, and I have the logical proof. This is why I have just come up with a new invention that enhances the counting networks algorithm to be generally scalable. And I think I will sell my new algorithm
of generally scalable counting networks to Microsoft, Google, Embarcadero or similar software companies.
 
So you have to be careful with the current counting networks algorithm, which is not generally scalable.
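For readers who have not met counting networks, here is a small sketch of the classic width-4 bitonic counting network used as a shared counter. It is written in Python for illustration only. It follows the textbook construction and not the enhanced algorithm claimed in this post. The names `Balancer` and `Bitonic4Counter` and the wire layout are assumptions made for this example.

```python
import itertools
import threading


class Balancer:
    """Sends successive tokens alternately to its top and bottom output wire."""

    def __init__(self, top, bottom):
        self._toggle = itertools.cycle((top, bottom))
        self._lock = threading.Lock()

    def traverse(self):
        with self._lock:
            return next(self._toggle)


class Bitonic4Counter:
    """Shared counter built from a width-4 bitonic counting network (sketch)."""

    WIDTH = 4
    # Three layers of balancers, each given as (top wire, bottom wire).
    LAYERS = (((0, 1), (2, 3)), ((0, 3), (1, 2)), ((0, 1), (2, 3)))

    def __init__(self):
        self._layers = []
        for pairs in self.LAYERS:
            by_wire = {}
            for top, bottom in pairs:
                by_wire[top] = by_wire[bottom] = Balancer(top, bottom)
            self._layers.append(by_wire)
        # Output wire i hands out the values i, i + 4, i + 8, ...
        self._next_value = list(range(self.WIDTH))
        self._wire_locks = [threading.Lock() for _ in range(self.WIDTH)]

    def get_and_increment(self, input_wire):
        wire = input_wire
        for by_wire in self._layers:
            wire = by_wire[wire].traverse()
        with self._wire_locks[wire]:
            value = self._next_value[wire]
            self._next_value[wire] += self.WIDTH
        return value


if __name__ == "__main__":
    counter = Bitonic4Counter()
    print([counter.get_and_increment(w) for w in (0, 3, 1, 2, 2)])
```

The idea is that threads entering on different input wires mostly touch different balancers, so contention is spread over the network instead of landing on a single counter word. How well that holds as the number of threads grows is the scalability question raised above.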
 
My other new invention is my scalable reference counting and here it is:
 
https://sites.google.com/site/scalable68/scalable-reference-counting-with-efficient-support-for-weak-references
 
And here is my newest invention of a scalable algorithm:
 
My Scalable RWLock that works across processes and threads was updated to version 4.62
 
Now I think it is working correctly on both Windows and Linux..
 
You can download it from my website here:
 
https://sites.google.com/site/scalable68/scalable-rwlock-that-works-accross-processes-and-threads
 
More about me as an inventor of many scalable algorithms..
 
I am a white Arab and I think I am like a genius, because I have invented many scalable algorithms and their implementations. Look, for example, at my newest invention of a scalable algorithm here:
 
https://sites.google.com/site/scalable68/scalable-rwlock-that-works-accross-processes-and-threads
 
As you can see, you have to be like a genius to be able to invent
my above scalable algorithm of a scalable RWLock, because it has the following characteristics:
 
1- It is scalable
2- It is starvation-free
3- It is fair
4- It can be used across processes and threads
5- It can be used as a scalable Lock across processes and threads by using my scalable AMLock, which is FIFO fair on the writers side, or it can be used as a scalable RWLock.
 
On the writers side I am using my scalable FIFO-fair Lock, which is called scalable AMLock.
 
Here is why scalable Locks are really important:
 
https://queue.acm.org/detail.cfm?id=2698990
 
So all in all it is a really good invention of mine.
 
Read my previous thoughts:
 
Here is how to use my new invention that is my scalable RWLock
across processes:
 
Just create a scalable rwlock object in one process by calling the constructor with a name, like this:
 
scalable_rwlock.create('amine');
 
 
And you can use the scalable rwlock object from another process by calling the constructor with the same name, like this:
 
scalable_rwlock.create('amine');
 
 
So as you can see, I have abstracted it efficiently..
 
 
Read the rest of my previous thoughts:
 
My new invention of a Scalable RWLock that works across processes and threads is here, and now it works on both Windows and Linux..
 
Please download my source code and take a look at how I make it work across processes by using the FNV1a hash on both the process ID and the thread ID. FNV1a has good dispersion, and the FNV1a hash also permits my RWLock to be scalable.
 
 
You can download it from my website here:
 
https://sites.google.com/site/scalable68/scalable-rwlock-that-works-accross-processes-and-threads
 
Description:
 
This is my invention of a fast, scalable, starvation-free, fair and lightweight Multiple-Readers-Exclusive-Writer Lock called LW_RWLockX. It works across processes and threads.
 
The parameters of the constructor are as follows. The first parameter is the name of the scalable RWLock to be used across processes; if the name is empty, the lock will only be used across threads. The second parameter is the size of the array of readers: if the size of the array is equal to the number of parallel readers, the lock will be scalable, but if the number of readers is greater than the size of the array, you will start to have contention. The third parameter is the size of the array of my scalable Lock called AMLock; the number of threads can go beyond the size of the AMLock array. Please look at the source code of my scalable algorithms to understand.
 
I have also used my following implementation of FNV1a hash function to make my new variants of RWLocks scalable (since FNV1a is a hash algorithm that has good dispersion):
 
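function FNV1aHash(key: Int64): UInt64;

var
  i: Integer;
  key1: UInt64;

const
  FNV_offset_basis: UInt64 = 14695981039346656037;
  FNV_prime: UInt64 = 1099511628211;

begin
  // FNV-1a hash over the 8 bytes of key, lowest byte first.
  // The multiplication is meant to wrap modulo 2^64.
  Result := FNV_offset_basis;

  for i := 1 to 8 do
  begin
    // Extract byte number i of the key.
    key1 := (key shr ((i-1)*8)) and $00000000000000ff;
    // Fold the byte in: xor first, then multiply (the FNV-1a order).
    Result := (Result xor key1) * FNV_prime;
  end;
end;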
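The role of the readers array and of the hash can be sketched as follows, in Python for illustration only. This is not the LW_RWLockX source. It swaps in the simpler "one lock per reader slot" pattern, and it reproduces neither the fairness guarantees, the AMLock, nor the cross-process support. `SlottedRWLock` and `reader_slots` are hypothetical names. `fnv1a_64` mirrors the Pascal function above.

```python
import threading

FNV_OFFSET_BASIS = 14695981039346656037
FNV_PRIME = 1099511628211
MASK64 = (1 << 64) - 1


def fnv1a_64(key):
    """FNV-1a over the 8 bytes of a 64-bit key, lowest byte first."""
    h = FNV_OFFSET_BASIS
    for i in range(8):
        byte = (key >> (8 * i)) & 0xFF
        h = ((h ^ byte) * FNV_PRIME) & MASK64  # wrap to 64 bits like UInt64
    return h


class SlottedRWLock:
    """Reader-writer lock with one small lock per reader slot (sketch).

    A reader takes only the slot its thread ID hashes to. A writer takes
    every slot, always in the same order, so writers cannot deadlock.
    A thread must release from the thread that acquired.
    """

    def __init__(self, reader_slots=8):
        self._slots = [threading.Lock() for _ in range(reader_slots)]

    def _my_slot(self):
        index = fnv1a_64(threading.get_ident()) % len(self._slots)
        return self._slots[index]

    def acquire_read(self):
        self._my_slot().acquire()

    def release_read(self):
        self._my_slot().release()

    def acquire_write(self):
        for slot in self._slots:
            slot.acquire()

    def release_write(self):
        for slot in reversed(self._slots):
            slot.release()


if __name__ == "__main__":
    rw = SlottedRWLock(reader_slots=4)
    rw.acquire_read()
    print("reading on slot", fnv1a_64(threading.get_ident()) % 4)
    rw.release_read()
```

Readers that hash to different slots never touch the same lock, which is why a readers array at least as large as the number of parallel readers avoids contention. Readers that collide on a slot serialize on it, which is the contention described above once readers outnumber slots.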
 
- Platform: Windows, Unix and Linux on x86
 
Required FPC switches: -O3 -Sd
 
-Sd for delphi mode....
 
Required Delphi switches: -$H+ -DDelphi
 
For Delphi XE-XE7 and Delphi tokyo use the -DXE switch
 
You can configure it as follows from inside defines.inc file:
 
{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems
{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems
 
--'
 
I am a white Arab, and why have I invented scalable RWLocks and scalable Locks?
 
Because Transactional memory has a disadvantage, and
here it is:
 
About Hardware Transactional Memory:
 
"As someone who has used TSX to optimize synchronization primitives, you can expect to see a ~15-20% performance increase, if (big if) your program is heavy on disjoint data access, i.e. a lock is needed for correctness, but conflicts are rare in practice. If you have a lot of threads frequently writing the same cache lines, you are probably going to see worse performance with TSX as opposed to traditional locking. It helps to think about TSX as transparently performing optimistic concurrency control, which is actually pretty much how it is implemented under the hood."
 
Read more here:
 
https://news.ycombinator.com/item?id=8169697
 
 
So as you can see, HTM (hardware transactional memory) and TM cannot replace locks when doing IO or when we have a highly contended critical section.
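The "optimistic concurrency control" behaviour mentioned in the quote can be imitated in software. The sketch below is written in Python for illustration only. It is a plain version-check-and-retry loop and not TSX or any real transactional memory API. `VersionedCell` and `optimistic_update` are hypothetical names.

```python
import threading


class VersionedCell:
    """A value plus a version number that changes on every commit."""

    def __init__(self, value):
        self._lock = threading.Lock()  # held only for the short read/commit steps
        self._value = value
        self._version = 0

    def read(self):
        with self._lock:
            return self._value, self._version

    def try_commit(self, expected_version, new_value):
        with self._lock:
            if self._version != expected_version:
                return False  # someone else committed first: conflict
            self._value = new_value
            self._version += 1
            return True


def optimistic_update(cell, fn):
    """Apply fn to the cell without holding a lock during fn; retry on conflict.

    Returns the number of retries that were needed.
    """
    retries = 0
    while True:
        value, version = cell.read()
        new_value = fn(value)  # the "transaction" body runs unlocked
        if cell.try_commit(version, new_value):
            return retries
        retries += 1


if __name__ == "__main__":
    cell = VersionedCell(10)
    optimistic_update(cell, lambda v: v * 2)
    print(cell.read())
```

When threads rarely touch the same cell, almost every attempt commits on the first try. When many threads write the same cell, attempts keep failing and being redone, which matches the quoted warning about frequent writes to the same cache lines. A body that performs IO cannot simply be rerun, which is one reason a lock is still needed in that case.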
 
 
Read the rest:
 
 
I have just read the following article that appeared in C/C++ Users Journal, 23(3), March 2005
 
The Trouble With Locks
 
http://gotw.ca/publications/mill36.htm
 
 
And here are my thoughts about how to avoid deadlocks and race conditions
in lock-based systems:
 
https://community.idera.com/developer-tools/general-development/f/getit-and-third-party/71464/about-turing-completeness-and-parallel-programming
 
Also, I don't agree with him about the composability of lock-based systems;
read the following to understand:
 
"About composability of lock-based systems now:
 
Design your systems to be composable. Among the more galling claims of the detractors of lock-based systems is the notion that they are somehow uncomposable:
 
"Locks and condition variables do not support modular programming," reads one typically brazen claim, "building large programs by gluing together smaller programs[:] locks make this impossible."9 The claim, of course, is incorrect. For evidence one need only point at the composition of lock-based systems such as databases and operating systems into larger systems that remain entirely unaware of lower-level locking.
 
There are two ways to make lock-based systems completely composable, and each has its own place. First (and most obviously), one can make locking entirely internal to the subsystem. For example, in concurrent operating systems, control never returns to user level with in-kernel locks held; the locks used to implement the system itself are entirely behind the system call interface that constitutes the interface to the system. More generally, this model can work whenever a crisp interface exists between software components: as long as control flow is never returned to the caller with locks held, the subsystem will remain composable.
 
Second (and perhaps counterintuitively), one can achieve concurrency and
composability by having no locks whatsoever. In this case, there must be
no global subsystem state—subsystem state must be captured in per-instance state, and it must be up to consumers of the subsystem to assure that they do not access their instance in