comp.programming.threads@googlegroups.com | Google Groups
Unsure why you received this message? You previously subscribed to digests from this group, but we haven't been sending them for a while. We fixed that, but if you don't want to get these messages, send an email to comp.programming.threads+unsubscribe@googlegroups.com.
- I think I have found what intelligence is - 1 Update
- Read again - 1 Update
- What's intelligence - 1 Update
- Let's talk about it now - 1 Update
- About parallel programming - 1 Update
- An Architect way of thinking on parallel programming - 1 Update
- More about scalability - 2 Updates
- More optimization of the reader-writer algorithms - 1 Update
- More about reader-writer algorithms - 1 Update
- About the distributed reader-writer mutex - 2 Updates
- Here is my proof again - 2 Updates
- Here is my proof - 3 Updates
- Scalable distributed sequential lock version 1.11 - 1 Update
Ramine <ramine@1.1>: Dec 07 08:40PM -0800

Hello...

I think I have found what intelligence is... When you look at neural networks, you will notice that there are neural networks that can "learn", so that the speed of "recognition" becomes faster. Problem solving in intelligence looks like this, because the way the neurons of the brain are interconnected builds a different kind of architecture in each person's brain, and the way the neurons are interconnected can also make the brain very fast. The searching mechanism of the brain is not like breadth-first search or depth-first search: the neurons of the brain are interconnected in ways that build faster architectures which do not use the same searching techniques as breadth-first search and depth-first search, but use different ways of interconnecting the neurons of the brain so that the solution to a problem can come very fast, and this is what we call intelligence.

Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 07:25PM -0800

Hello,

I have come to an interesting subject... I have thought for a long time about what exactly intelligence is, and here is my explanation of what intelligence is...

When you see and say in mathematics: A < B and B < C => A < C, I think problem solving in intelligence is in great part due to and influenced by the "speed" of the brain, because a fast brain will breadth-first search and depth-first search the above mathematical implication faster: the brain will see A < B faster and B < C faster and will see their implication A < C faster. That is the big part of problem solving in intelligence, because the act of seeing the "bigger" or the "smaller" is something easy for humans. So I think that humans compose problem solving in intelligence in a divide-and-conquer way, but the "big" factor that influences problem solving in intelligence is the "speed" of the brain.

This is my explanation of what intelligence is.

Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 07:22PM -0800

Hello,

I have come to an interesting subject... I have thought for a long time about what exactly intelligence is, and here is my explanation of what intelligence is...

When you see and say in mathematics: A < B and B < C => A < C, I think problem solving in intelligence is in great part due to and influenced by the "speed" of the brain, because a fast brain will breadth-first search and depth-first search the above mathematical implication faster: the brain will see A < B faster and B < C faster and will see their implication A < C faster. That is the big part of problem solving in intelligence, because the act of seeing the "bigger" or the "smaller" is something easy for humans. So I think that humans compose problem solving in intelligence in a divide-and-conquer way, but the "big" factor that influences problem solving in intelligence is the "speed" of the brain.

This is my explanation of what intelligence is.

Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 06:42PM -0800

Hello,

You will say that I am a smart person, and that this is why I have understood parallel programming the way I have understood it... Perhaps... but when I took IQ tests I noticed that I am not good at the "speed" of solving problems, but I am good at "logic". I think Chris M. Thomasson and Dmitry Vyukov have faster brains than me, so if you give them a problem to solve they will breadth-first search the solution with their brain faster than me, and this is my weakness: I don't have as fast a brain as Chris M. Thomasson or Dmitry Vyukov, but I am good at "logic", and the IQ tests that I have passed have shown so.

Now I want to explain something important to you. As you have seen me on this forum, I have told you that I have a socialist way of thinking, because I am a socialist, so my socialist way of thinking has pushed me to favor a concept of mine that I call "intelligent easiness": intelligent easiness is the act of easing, as much as possible, knowledge, processes, learning processes and life, while keeping at the same time a high degree of quality. This is my socialist way of thinking, but we have to be frank: when you apply my concept of "intelligent easiness", it has the effect of sharing more and more wealth between humans, but not all people like my concept of "intelligent easiness", because its goal is to share wealth, so it is a kind of socialist way of thinking. This is why parallel programming seems hard to people even though parallel programming is an easy learning process: what you will notice is that people on the internet don't want you to understand parallel programming easily, so the good-quality knowledge that eases the learning process for you is not there, and this is the principal obstacle to learning parallel programming. People don't want to share wealth with you by applying my concept of "intelligent easiness", which also raises quality, because people today are more capitalistic; this is why the learning process of parallel programming took me years even though parallel programming is not so difficult.

Thank you for your time.

Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 06:00PM -0800

Hello...

My dear programmers and hardware engineers, as you have seen me here on this forum, I have shown you how to reason about parallel programming and synchronization mechanisms. Parallel programming is not hard, I think, since I have worked on it for some time now: to know what a race condition is is easy; to know what sequential consistency is is simple, and knowing how to make a correct sequentially consistent program on weak and strong memory models is also not so difficult; to know what a lock convoy or priority inversion is is easy, I think. An important thing to know in parallel programming is how to do scalability prediction from the code, by using CPU cycle calculations for the serial part and the parallel part; that is not so difficult either. And knowing how to optimize code by minimizing cache-coherence traffic is also not so difficult. So I don't think that programmers have to fear parallel programming, because the very important part is how to put your hands on quality information and quality explanations that ease the learning process about parallel programming for you; that is the good way to succeed in parallel programming, I think, and this is why you have seen me doing it here on this forum. What I have also tried to do is to ease parallel programming for you.

Thank you for your time,
Amine Moulay Ramdane.
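To make the point about minimizing cache-coherence traffic concrete, here is a minimal Free Pascal sketch of one common technique: padding per-thread data to the size of a cache line so that threads on different cores do not keep invalidating each other's cache lines (false sharing). The 64-byte cache-line size, the record layout and the slot count are illustrative assumptions, not code from my library:

program FalseSharingSketch;
{$mode objfpc}

const
  CacheLineSize = 64; // assumed cache-line size, typical on x86

type
  // One counter per thread, padded so that two counters never share a
  // cache line; this keeps cache-coherence traffic low when many threads
  // update their own counter concurrently.
  TPaddedCounter = record
    Value: Int64;
    Pad: array[0..CacheLineSize - SizeOf(Int64) - 1] of Byte;
  end;

var
  Counters: array[0..15] of TPaddedCounter; // one slot per worker thread

procedure IncrementSlot(Slot: Integer);
begin
  // Each thread touches only its own padded slot, so this write does not
  // invalidate the cache line holding another thread's counter.
  Inc(Counters[Slot].Value);
end;

begin
  IncrementSlot(0);
  WriteLn('counter 0 = ', Counters[0].Value);
end.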
Ramine <ramine@1.1>: Dec 07 05:26PM -0800

Hello,

This time, since I have mastered the reader-writer algorithms to a certain level, I will think as an architect about parallel programming to show you something important. Here it is:

As you have seen me say in my previous post, with the client-server mechanism that I gave you before we can eliminate the lock synchronization mechanism from the writer side, and we can make the writer side very cheap in terms of running time compared to the reader section. So there is a very important question that we can ask: will transactional memory, hardware or software, buy you much when designing a client-server parallel in-memory database, a client-server parallel AVL tree, or a client-server parallel red-black tree? I think it will not buy you much in terms of running time, because the client-server technique that I gave you above, which eliminates the lock synchronization from the writer side of the reader-writer algorithm, already makes the writer side very cheap in terms of running time compared to the reader section, which is the expensive part in this scenario. As I said before:

What I like about my scalable distributed sequential lock is that if, for example, you want to implement a really fast parallel in-memory database, a parallel red-black tree, or a parallel AVL tree, the reader section will be much more expensive than the writer section, because on the writer side you first write the data outside the writer section and inside the writer section you just copy your pointer to the data structure, but in the reader section you have to read the data from memory from inside the reader section, which makes the reader section more expensive. So, from Amdahl's law, this will make the parallel in-memory database that uses my scalable distributed sequential lock scale very well (on NUMA etc.), and it will make the parallel red-black tree or parallel AVL tree that uses my scalable distributed sequential lock scale very well (on NUMA etc.).

Hope you have understood my way of thinking as an architect about parallel programming...

Thank you,
Amine Moulay Ramdane.
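To illustrate the "cheap writer section" idea above, here is a minimal Free Pascal sketch of preparing the data outside the write section and publishing only a pointer inside it. The TCriticalSection used here is just a stand-in for the writer side of a reader-writer mechanism, and the record layout, the names and the leaked old snapshot are illustrative simplifications, not code from my scalable distributed sequential lock:

program PointerPublicationSketch;
{$mode objfpc}

uses
  SyncObjs;

type
  PSnapshot = ^TSnapshot;
  TSnapshot = record
    Key: Integer;
    Value: string;
  end;

var
  // Stand-in for the writer side of a reader-writer mechanism; the real
  // scalable distributed sequential lock is not reproduced here.
  WriteLock: TCriticalSection;
  Current: PSnapshot = nil;

procedure Publish(AKey: Integer; const AValue: string);
var
  NewSnap: PSnapshot;
begin
  // The expensive work happens outside the write section:
  New(NewSnap);
  NewSnap^.Key := AKey;
  NewSnap^.Value := AValue;

  // The write section itself only copies a pointer, so it stays very
  // cheap compared to a reader that must actually read the data.
  WriteLock.Acquire;
  try
    Current := NewSnap; // the old snapshot is simply leaked in this sketch
  finally
    WriteLock.Release;
  end;
end;

begin
  WriteLock := TCriticalSection.Create;
  Publish(1, 'hello');
  WriteLn(Current^.Value);
  WriteLock.Free;
end.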
Ramine <ramine@1.1>: Dec 07 04:38PM -0800

Hello,

What I like about my scalable distributed sequential lock is that if, for example, you want to implement a really fast parallel in-memory database, a parallel red-black tree, or a parallel AVL tree, the reader section will be much more expensive than the writer section, because on the writer side you first write the data outside the writer section and inside the writer section you just copy your pointer to the data structure, but in the reader section you have to read the data from memory from inside the reader section, which makes the reader section more expensive. So, from Amdahl's law, this will make the parallel in-memory database that uses my scalable distributed sequential lock scale very well (on NUMA etc.), and it will make the parallel red-black tree or parallel AVL tree that uses my scalable distributed sequential lock scale very well (on NUMA etc.).

That's very important to know.

Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 05:10PM -0800

Hello,

But you have to use this optimization:

If you want to implement an in-memory database with my scalable distributed sequential lock, and you want to optimize the writer side further, that is, if you want to eliminate the expensive full memory barrier of the lock on the writer side, what can you do about it? You can use, for example, a client-server mechanism: the data is put by the producer threads on a queue, and the consumer thread alone executes the writer section. This way you can delete the lock from the writer side, and this will make the writer section much, much faster.

Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 04:23PM -0800

Hello,

If you want to implement an in-memory database with my scalable distributed sequential lock, and you want to optimize the writer side further, that is, if you want to eliminate the expensive full memory barrier of the lock on the writer side, what can you do about it? You can use, for example, a client-server mechanism: the data is put by the producer threads on a queue, and the consumer thread alone executes the writer section. This way you can delete the lock from the writer side, and this will make the writer section much, much faster.

Thank you,
Amine Moulay Ramdane.
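Here is a minimal Free Pascal sketch of that client-server mechanism: producer threads push write requests onto a shared queue, and a single consumer thread is the only thread that ever runs the writer section. The queue type, the request record and the polling loop are illustrative simplifications (a real server would block on an event instead of sleeping), and this is not code from my library:

program ClientServerWriterSketch;
{$mode objfpc}

uses
  Classes, SysUtils;

type
  PWriteRequest = ^TWriteRequest;
  TWriteRequest = record
    Key: Integer;
    Value: Integer;
  end;

  // A single consumer thread drains the queue, so it is the only thread
  // that ever executes the writer section and the writer section itself
  // needs no lock of its own.
  TWriterThread = class(TThread)
  protected
    procedure Execute; override;
  end;

var
  Queue: TThreadList;        // producers push write requests here
  Stop: Boolean = False;     // simplified stop flag for the sketch

procedure Produce(AKey, AValue: Integer);
var
  Req: PWriteRequest;
begin
  New(Req);
  Req^.Key := AKey;
  Req^.Value := AValue;
  Queue.Add(Req);            // TThreadList serializes access to the list
end;

procedure TWriterThread.Execute;
var
  L: TList;
  Req: PWriteRequest;
begin
  while not Stop do
  begin
    Req := nil;
    L := Queue.LockList;
    try
      if L.Count > 0 then
      begin
        Req := PWriteRequest(L[0]);
        L.Delete(0);
      end;
    finally
      Queue.UnlockList;
    end;
    if Req <> nil then
    begin
      // Writer section: only this thread runs it, so updating the
      // in-memory database (or tree) here needs no writer-side lock.
      WriteLn('apply write: key=', Req^.Key, ' value=', Req^.Value);
      Dispose(Req);
    end
    else
      Sleep(1);              // a real server would block on an event instead
  end;
end;

var
  Writer: TWriterThread;
begin
  Queue := TThreadList.Create;
  Writer := TWriterThread.Create(False);
  Produce(1, 100);
  Produce(2, 200);
  Sleep(100);
  Stop := True;
  Writer.WaitFor;
  Writer.Free;
  Queue.Free;
end.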
Ramine <ramine@1.1>: Dec 07 03:49PM -0800

Hello,

Hello again, my dear programmers and hardware engineers... You have seen me talking about the reader-writer algorithms. I have played with the Dmitry Vyukov distributed reader-writer mutex, and I have implemented SIDT (Store Interrupt Descriptor Table Register), which returns the IDT base, to use it as a replacement for the Windows GetCurrentProcessorNumber() used by the distributed reader-writer algorithm, but I have noticed that SIDT doesn't work correctly, because from time to time it doesn't return any IDT base, so I have decided to stay with the Windows GetCurrentProcessorNumber() even if it is somewhat expensive.

So now my question is: where can my scalable distributed sequential lock be used, and are there any use cases for my new algorithm? If you are using a NUMA architecture, it will parallelize the data bus to and from memory, and this will be very good for databases, so a good use case for my scalable distributed sequential lock is, for example, implementing an in-memory database that is really fast and parallelized. But even if you don't have NUMA, you can still benefit from my scalable distributed sequential lock, because it doesn't use any full memory barrier, as the PThread reader-writer lock does and as the distributed reader-writer mutex of Dmitry Vyukov does when using an RWLock. If, for example, you have taken a look at my previous post, you will notice that on small to medium reader sections in read-mostly scenarios my scalable distributed sequential lock will have the same speed and throughput as Seqlock, but it will have much higher throughput than the Dmitry Vyukov distributed reader-writer mutex using an RWLock, and much, much higher throughput than the PThread reader-writer mutex. Other than that, my new algorithm of a scalable distributed sequential lock beats Seqlock on some characteristics: it doesn't starve and it doesn't livelock in scenarios with a greater percentage of writers, whereas Seqlock can livelock or starve.

So I hope you have understood the improvement that my new algorithm brings over Seqlock, and I think my new algorithm of a scalable distributed sequential lock is competitive with RCU in read-mostly scenarios.

Finally, my last word will be in the form of advice: please make sure you understand the proof of correctness that I have presented to you, and make sure to understand my previous posts very well, because in this era of multicore systems you must be able to understand parallel programming and synchronization algorithms. This is my advice.

You can download my scalable distributed sequential lock version 1.11 from:

https://sites.google.com/site/aminer68/scalable-distributed-sequential-lock

Thank you,
Amine Moulay Ramdane.
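For readers who want to see how GetCurrentProcessorNumber() is typically used in a distributed (per-core) reader-writer design, here is a minimal Free Pascal sketch for Windows: each core gets its own cache-line-padded slot, and a reader touches only the slot of the core it is running on. The slot record, the 64-byte padding and the ReaderEnterSketch routine are illustrative assumptions, not code from my library or from Dmitry Vyukov's:

program PerCoreSlotSketch;
{$mode objfpc}

uses
  Windows;

// Declared explicitly so the sketch does not depend on the RTL's Windows
// unit exposing it; the API exists on Windows Vista and later.
function GetCurrentProcessorNumber: DWORD; stdcall;
  external 'kernel32.dll' name 'GetCurrentProcessorNumber';

const
  CacheLineSize = 64; // assumed cache-line size

type
  TSlot = record
    Readers: LongInt;
    Pad: array[0..CacheLineSize - SizeOf(LongInt) - 1] of Byte;
  end;

var
  Slots: array of TSlot;

function CurrentSlotIndex: Integer;
begin
  // Readers running on different cores land on different slots, so their
  // reader counters live on different cache lines.
  Result := Integer(GetCurrentProcessorNumber) mod Length(Slots);
end;

procedure ReaderEnterSketch;
begin
  // Only the reader-entry side is sketched here; a full distributed
  // reader-writer lock would also scan every slot on the writer side.
  InterLockedIncrement(Slots[CurrentSlotIndex].Readers);
end;

var
  Info: TSystemInfo;
begin
  GetSystemInfo(Info);
  SetLength(Slots, Info.dwNumberOfProcessors);
  ReaderEnterSketch;
  WriteLn('slots = ', Length(Slots), ', current slot = ', CurrentSlotIndex);
end.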
Ramine <ramine@1.1>: Dec 07 01:32PM -0800

Hello,

http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/distributed-reader-writer-mutex

I have just done a calculation, using Amdahl's law, on small reader sections (like when you are parallelizing red-black trees or AVL trees) for the distributed reader-writer mutex above using an rwlock and the GetCurrentProcessorNumber() of Windows, and I have compared it to Seqlock: at 4 cores Seqlock has 15 times the throughput of the distributed reader-writer mutex above, at 8 cores Seqlock has 4 times the throughput of the distributed reader-writer mutex above, at 16 cores Seqlock has 2 times the throughput of the distributed reader-writer mutex above, and at 32 cores Seqlock has the same throughput as the distributed reader-writer mutex above.

So Seqlock and my scalable distributed reader-writer lock are much faster on small reader sections than the distributed reader-writer mutex above.

Thank you,
Amine Moulay Ramdane.
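To show the shape of such a calculation, here is a minimal Free Pascal sketch that applies Amdahl's law with the synchronization overhead treated as the serial part of each operation. The cycle counts in it are made-up placeholders, not the measurements behind the numbers above, so it only illustrates the method, not the exact 15x/4x/2x results:

program AmdahlComparisonSketch;
{$mode objfpc}

// Speedup from Amdahl's law, with the synchronization overhead treated as
// the serial part of each operation and the reader section as the part
// that runs in parallel across cores.
function Speedup(OverheadCycles, ReaderCycles: Double; Cores: Integer): Double;
var
  SerialFraction: Double;
begin
  SerialFraction := OverheadCycles / (OverheadCycles + ReaderCycles);
  Result := 1.0 / (SerialFraction + (1.0 - SerialFraction) / Cores);
end;

const
  CoreCounts: array[0..3] of Integer = (4, 8, 16, 32);
  // Made-up placeholder costs, not measurements:
  ReaderSectionCycles = 200.0;  // a small reader section
  SeqlockOverhead     = 20.0;   // two sequence loads, no atomic store
  RWLockOverhead      = 400.0;  // lock/unlock pair with full memory barriers

var
  I: Integer;
begin
  for I := 0 to High(CoreCounts) do
    WriteLn(CoreCounts[I]:2, ' cores: Seqlock speedup ',
      Speedup(SeqlockOverhead, ReaderSectionCycles, CoreCounts[I]):0:2,
      ', rwlock-based speedup ',
      Speedup(RWLockOverhead, ReaderSectionCycles, CoreCounts[I]):0:2);
end.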
Ramine <ramine@1.1>: Dec 07 01:45PM -0800

Hello...

This calculation was done for read-mostly scenarios.

Thank you,
Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 11:57AM -0800

Hello,

I have added more to the sequential consistency proof, and I have corrected some typos, so please read again... Here is my proof:

Look at the source code: when Fcount5^.fcount5 equals 0 on the writer side, the reader side will run in Seqlock mode. I don't need to prove Seqlock, because my Seqlock algorithm is correct; just compare it to the other Seqlock algorithms and you will notice it. What I need to prove is the case when Fcount6^.fcount6 modulo "the number of cores" is equal to 0, which means that we are in distributed mode. When we are in distributed mode, the reader side of my algorithm will either enter the distributed reader-writer lock or wait for the distributed reader-writer lock to exit. If the reader side enters the distributed reader-writer lock and Fcount6^.fcount6 has not changed by RUnlock(), the reader thread will exit with a "true" value; if Fcount6^.fcount6 has changed, the RUnlock() method will catch it and roll back with the Seqlock mechanism. Now, if Fcount6^.fcount6 modulo "the number of cores" is equal to 0 on RLock() and the reader side waits for the writer side to enter the distributed reader-writer lock, the reader side will afterwards enter the distributed reader-writer lock, and if Fcount6^.fcount6 modulo "the number of cores" has changed by RUnlock(), the reader side will roll back in Seqlock mode; if not, RUnlock() will exit with the value "true". So, all in all, my algorithm is correct.

About the sequential consistency of my scalable distributed sequential lock: my algorithm works on the x86 architecture, and I think it is correct. Look at the source code of the WLock() method: since I am using a Ticket spinlock with a proportional backoff on the writer side, the Ticket spinlock uses a "lock add" assembler instruction to increment a counter in the Enter() method of the ticket spinlock, and this "lock add" assembler instruction is a barrier for stores and loads on x86, and the stores of Fcount6^.fcount6 and FCount5^.fcount5 are not reordered with the stores of the writer section, so the WLock() method is sequentially consistent and correct. Now look at WUnlock(): we don't need an "sfence", because stores in WUnlock() are not reordered with older stores on x86, so the WUnlock() method is sequentially consistent and correct. Now look at the RLock() method: the loads inside the RLock() method are not reordered with the loads of the reader section, and on RUnlock(), the loads of RUnlock() are not reordered with older loads of the critical section, so all in all I think my algorithm is sequentially consistent and correct on x86.

So be confident, because I have reasoned correctly, and I think my algorithm is correct and is a powerful synchronization mechanism that can replace RCU and that can replace Seqlock, because it beats Seqlock.

So this was my proof that my algorithm is correct.

You can download my scalable distributed sequential lock version 1.11 from:

https://sites.google.com/site/aminer68/scalable-distributed-sequential-lock

Read this about my scalable distributed sequential lock:

Disclaimer: My software is provided on an "as-is" basis, with no warranties, express or implied. The entire risk and liability of using it is yours. Any damages resulting from the use or misuse of this software will be the responsibility of the user.

Thank you,
Amine Moulay Ramdane.
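For readers who have not seen one, here is a minimal Free Pascal sketch of a ticket spinlock with proportional backoff. It is not the ticket spinlock from my library: the backoff constant is arbitrary, the atomic read uses InterLockedExchangeAdd(..., 0) instead of inline assembler, and the InterLockedIncrement calls compile to locked instructions on x86, which play the "lock add" barrier role mentioned above:

program TicketLockSketch;
{$mode objfpc}

type
  TTicketLock = record
    NextTicket: LongInt;  // incremented atomically by each acquirer
    NowServing: LongInt;  // advanced by the releaser
  end;

procedure TicketLockInit(var L: TTicketLock);
begin
  L.NextTicket := 0;
  L.NowServing := 0;
end;

procedure TicketLockEnter(var L: TTicketLock);
var
  MyTicket, Serving, I: LongInt;
begin
  // InterLockedIncrement compiles to a locked add on x86, which acts as
  // a full barrier for the stores and loads that follow.
  MyTicket := InterLockedIncrement(L.NextTicket) - 1;
  repeat
    // Atomic read of NowServing (add of zero), which also keeps the
    // compiler from hoisting the load out of the loop.
    Serving := InterLockedExchangeAdd(L.NowServing, 0);
    if Serving = MyTicket then
      Break;
    // Proportional backoff: spin longer the further back in the queue we are.
    for I := 1 to (MyTicket - Serving) * 100 do
      ;
  until False;
end;

procedure TicketLockLeave(var L: TTicketLock);
begin
  InterLockedIncrement(L.NowServing);
end;

var
  Lock: TTicketLock;
begin
  TicketLockInit(Lock);
  TicketLockEnter(Lock);
  WriteLn('inside the writer section');
  TicketLockLeave(Lock);
end.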
Ramine <ramine@1.1>: Dec 07 12:10PM -0800

On 12/7/2014 11:57 AM, Ramine wrote:
> the loads inside the RLock() method are not reordered with the loads of
> the reader section, and on RUnlock(), the loads of RUnlock() are not
> reordered with older loads of the critical section, so all in all

I meant "reader section" here, not "critical section".
Ramine <ramine@1.1>: Dec 07 11:19AM -0800

Hello,

I wrote in my previous post:

"Now i think my algorithm is correct."

You will say: "Amine, in computer science we need proof, so can you prove your algorithm is correct?"

Here is my proof:

Look at the source code: when Fcount5^.fcount5 equals 0 on the writer side, the reader side will run in Seqlock mode. I don't need to prove Seqlock, because my Seqlock algorithm is correct; just compare it to the other Seqlock algorithms and you will notice it. What I need to prove is the case when Fcount6^.fcount6 modulo "the number of cores" is equal to 0, which means that we are in distributed mode. When we are in distributed mode, the reader side of my algorithm will either enter the distributed reader-writer lock or wait for the distributed reader-writer lock to exit. If the reader side enters the distributed reader-writer lock and Fcount6^.fcount6 has not changed by RUnlock(), the reader thread will exit with a "true" value; if Fcount6^.fcount6 has changed, the RUnlock() method will catch it and roll back with the Seqlock mechanism. Now, if Fcount6^.fcount6 modulo "the number of cores" is equal to 0 on RLock() and the reader side waits for the writer side to enter the distributed reader-writer lock, the reader side will afterwards enter the distributed reader-writer lock, and if Fcount6^.fcount6 modulo "the number of cores" has hanged on RUnlock(), the reader side will roll back in Seqlock mode; if not, RUnlock() will exit with the value "true". So, all in all, my algorithm is correct.

About the sequential consistency of my scalable distributed sequential lock: my algorithm works on the x86 architecture, and I think it is correct. Look at the source code of the WLock() method: since I am using a Ticket spinlock with a proportional backoff on the writer side, the Ticket spinlock uses a "lock add" assembler instruction to increment a counter in the Enter() method of the ticket spinlock, and this "lock add" assembler instruction is a barrier for stores and loads on x86, so the WLock() method is sequentially consistent and correct. Now look at WUnlock(): we don't need an "sfence", because stores are not reordered with stores on x86, so the WUnlock() method is sequentially consistent and correct. Now look at the RLock() method: the loads inside the RLock() method are not reordered with the loads of the reader section, and on RUnlock(), the loads of RUnlock() are not reordered with older loads of the critical section, so all in all I think my algorithm is sequentially consistent and correct on x86.

So be confident, because I have reasoned correctly, and I think my algorithm is correct and is a powerful synchronization mechanism that can replace RCU and that can replace Seqlock, because it beats Seqlock.

So this was my proof that my algorithm is correct.

You can download my scalable distributed sequential lock version 1.11 from:

https://sites.google.com/site/aminer68/scalable-distributed-sequential-lock

Thank you,
Amine Moulay Ramdane.
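The "roll back in Seqlock mode" behaviour described above is the classic seqlock read-and-retry pattern. Here is a minimal Free Pascal sketch of that pattern alone, with a single writer assumed and without the distributed mode; the variable names are illustrative and this is not the reader side of my library:

program SeqlockReaderSketch;
{$mode objfpc}

var
  Sequence: LongInt = 0;       // even: no writer active; odd: writer active
  SharedA, SharedB: LongInt;

// A single writer is assumed in this sketch; several writers would need a
// lock (for example a ticket spinlock) around WriterUpdate.
procedure WriterUpdate(A, B: LongInt);
begin
  InterLockedIncrement(Sequence);  // sequence becomes odd: writer active
  SharedA := A;
  SharedB := B;
  InterLockedIncrement(Sequence);  // sequence becomes even again
end;

procedure ReaderSnapshot(out A, B: LongInt);
var
  SeqBefore, SeqAfter: LongInt;
begin
  repeat
    SeqBefore := InterLockedExchangeAdd(Sequence, 0);  // atomic read
    A := SharedA;
    B := SharedB;
    SeqAfter := InterLockedExchangeAdd(Sequence, 0);
    // Retry (the "rollback") if a writer was active or finished a write
    // while we were reading.
  until (SeqBefore = SeqAfter) and (SeqBefore mod 2 = 0);
end;

var
  X, Y: LongInt;
begin
  WriterUpdate(1, 2);
  ReaderSnapshot(X, Y);
  WriteLn(X, ' ', Y);
end.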
Ramine <ramine@1.1>: Dec 07 11:29AM -0800

Hello,

Read this about my scalable distributed sequential lock:

Disclaimer: My software is provided on an "as-is" basis, with no warranties, express or implied. The entire risk and liability of using it is yours. Any damages resulting from the use or misuse of this software will be the responsibility of the user.

Thank you.

Amine Moulay Ramdane.
Ramine <ramine@1.1>: Dec 07 11:36AM -0800

On 12/7/2014 11:19 AM, Ramine wrote:
> the reader side waits for the writer side to enter the distributed
> reader-writer lock, the reader side will afterwards enter the distributed
> reader-writer lock, and if Fcount6^.fcount6 modulo "the number of cores"
> has hanged

A typo: I mean "has changed", not "has hanged".
Ramine <ramine@1.1>: Dec 07 10:46AM -0800

Hello,

I have updated my scalable distributed sequential lock to version 1.11; I have just corrected a bug. Before, my algorithm was doing this on the writer side:

In WLock() it was doing this:

If Fcount5^.fcount5=0 then fcount4^.fcount4:=Fcount4^.fcount4+1;

And in WUnlock() it was doing this:

If Fcount5^.fcount5=0 then fcount4^.fcount4:=Fcount4^.fcount4+1;

This introduced a bug: if we are just before "FCount6^.fcount6 mod nbrcores" becomes equal to 0, and RLock() enters the reader section without calling dw.RLock(), and the writer side enters distributed mode and enters the writer section, then both will succeed in entering and RUnlock() will return true, and this is a bug. To correct this bug we must delete the "If Fcount5^.fcount5=0" in front of the "fcount4^.fcount4:=Fcount4^.fcount4+1", like this:

In WLock() it must now do this:

fcount4^.fcount4:=Fcount4^.fcount4+1;

And in WUnlock() it must now do this:

fcount4^.fcount4:=Fcount4^.fcount4+1;

Now I think my algorithm is correct.

You can download the updated version 1.11 from:

https://sites.google.com/site/aminer68/scalable-distributed-sequential-lock

Thank you,
Amine Moulay Ramdane.
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it, send an email to comp.programming.threads+unsubscribe@googlegroups.com.