Thursday, July 4, 2019

Digest for comp.lang.c++@googlegroups.com - 8 updates in 6 topics

Horizon68 <horizon@horizon.com>: Jul 04 02:44PM -0700

Hello..
 
 
About the Active object pattern..
 
I think the proxy and scheduler of the Active object pattern are
embellishments, not essential. The core of the idea is simply a queue of
closures executed on a thread (or threads) other than the client's, and
you will notice that you can do the same thing as the Active object
pattern, and more, by using my powerful "invention": an efficient
Threadpool engine with priorities that scales very well, which you can
download from here:
 
https://sites.google.com/site/scalable68/an-efficient-threadpool-engine-with-priorities-that-scales-very-well
 
 
This Threadpool of mine is really powerful because it scales very well
on multicore and NUMA systems, and it also comes with a ParallelFor()
that scales very well on multicore and NUMA systems.
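
To illustrate the core idea in C++ terms (a queue of closures drained by
a separate worker thread), here is a minimal sketch; the class and
member names below are illustrative only and are not part of the
Threadpool engine linked above:

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Minimal "active object" core: clients enqueue closures, a worker
// thread executes them one by one. Illustration only.
class ActiveObject {
public:
    ActiveObject() : worker_([this] { run(); }) {}

    ~ActiveObject() {
        post([this] { done_ = true; });   // poison task: stops the loop
        worker_.join();
    }

    // Enqueue a closure; it runs later on the worker thread.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        while (!done_) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return !queue_.empty(); });
                task = std::move(queue_.front());
                queue_.pop();
            }
            task();                        // execute outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    bool done_ = false;                    // only touched on the worker thread
    std::thread worker_;
};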
 
Here is the explanation of my ParallelFor() that scales very well. The
method is:
 
procedure ParallelFor(nMin, nMax: integer; aProc: TParallelProc;
                      GrainSize: integer = 1; Ptr: pointer = nil;
                      pmode: TParallelMode = pmBlocking;
                      Priority: TPriorities = NORMAL_PRIORITY);
 
The nMin and nMax parameters of ParallelFor() are the minimum and
maximum integer values of the loop variable, the aProc parameter is the
procedure to call, and the GrainSize integer parameter works as follows:
 
The grainsize sets a minimum threshold for parallelization.
 
A rule of thumb is that grainsize iterations should take at least
100,000 clock cycles to execute.
 
For example, if a single iteration takes 100 clocks, then the grainsize
needs to be at least 1000 iterations. When in doubt, do the following
experiment:
 
1- Set the grainsize parameter higher than necessary. The grainsize is
specified in units of loop iterations.
If you have no idea of how many clock cycles an iteration might take,
start with grainsize=100,000.
 
The rationale is that each iteration normally requires at least one
clock cycle. In most cases, step 3 will guide you to a much
smaller value.
 
2- Run your algorithm.
 
3- Iteratively halve the grainsize parameter and see how much the
algorithm slows down or speeds up as the value decreases.
 
A drawback of setting a grainsize too high is that it can reduce
parallelism. For example, if the grainsize is 1000 and the loop has 2000
iterations, the ParallelFor() method distributes the loop across only
two processors, even if more are available.
 
And you can pass a parameter to ParallelFor() through Ptr as a pointer,
and you can set the pmode parameter to pmBlocking so that ParallelFor()
is blocking, or to pmNonBlocking so that it is non-blocking; the
Priority parameter is the priority of ParallelFor(). Look inside the
test.pas example to see how to use it.
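
For readers who want a rough C++ picture of how a grain size splits an
index range across threads, here is a small, hypothetical sketch; it is
not the library's API and it ignores priorities, NUMA placement and the
non-blocking mode:

#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

// Calls body(i) for every i in [nMin, nMax], splitting the range into
// chunks of roughly grainSize iterations, one chunk per worker thread.
void parallel_for(int nMin, int nMax,
                  const std::function<void(int)>& body,
                  int grainSize = 1)
{
    const int total = nMax - nMin + 1;
    if (total <= 0) return;

    // With grainSize = 1000 and 2000 iterations there are only 2 chunks,
    // so at most 2 cores do useful work (the drawback described above).
    const int chunks = std::max(1, total / std::max(1, grainSize));
    const int step   = (total + chunks - 1) / chunks;

    std::vector<std::thread> workers;
    for (int start = nMin; start <= nMax; start += step) {
        const int stop = std::min(nMax, start + step - 1);
        workers.emplace_back([start, stop, &body] {
            for (int i = start; i <= stop; ++i) body(i);
        });
    }
    for (auto& t : workers) t.join();   // blocking behaviour, like pmBlocking
}

For example, parallel_for(1, 1000000, [](int i) { /* work on i */ }, 100000);
runs the million iterations in chunks of 100,000, matching the tuning
advice above.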
 
 
 
 
Thank you,
Amine Moulay Ramdane.
Horizon68 <horizon@horizon.com>: Jul 04 02:41PM -0700

Hello,
 
Read this:
 
 
What about garbage collection?
 
Read what Chris Lattner, a serious specialist, said:
 
"One thing that I don't think is debatable is that the heap compaction
behavior of a GC (which is what provides the heap fragmentation win) is
incredibly hostile for cache (because it cycles the entire memory space
of the process) and performance predictability."
 
"Not relying on GC enables Swift to be used in domains that don't want
it - think boot loaders, kernels, real time systems like audio
processing, etc."
 
"GC also has several *huge* disadvantages that are usually glossed over:
while it is true that modern GC's can provide high performance, they can
only do that when they are granted *much* more memory than the process
is actually using. Generally, unless you give the GC 3-4x more memory
than is needed, you'll get thrashing and incredibly poor performance.
Additionally, since the sweep pass touches almost all RAM in the
process, they tend to be very power inefficient (leading to reduced
battery life)."
 
Read more here:
 
https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160208/009422.html
 
Here is Chris Lattner's Homepage:
 
http://nondot.org/sabre/
 
And here is Chris Lattner's resume:
 
http://nondot.org/sabre/Resume.html#Tesla
 
 
This is why I have invented the following scalable algorithm and its
implementation, which makes Delphi and FreePascal more powerful:
 
My invention, my scalable reference counting with efficient
support for weak references version 1.35, is here..
 
Here I am again: I have just updated my scalable reference counting with
efficient support for weak references to version 1.35. I have just added
a TAMInterfacedPersistent that is a scalable reference counted version,
and now I think I have made it complete and powerful.
 
Because I have just read the following web page:
 
https://www.codeproject.com/Articles/1252175/Fixing-Delphis-Interface-Limitations
 
But I don't agree with the author of the above web page, because I think
you have to understand the "spirit" of Delphi; here is why:
 
A component is supposed to be owned and destroyed by something else,
"typically" a form (and "typically" means: in "most" cases, which is
the most important thing to understand). In that scenario, reference
counting is not used.
 
If you pass a component as an interface reference, it would be very
unfortunate if it was destroyed when the method returns.
 
Therefore, reference counting in TComponent has been removed.
 
That is also why I have just added TAMInterfacedPersistent to my invention.
 
To use scalable reference counting with Delphi and FreePascal, just
replace TInterfacedObject with my TAMInterfacedObject, which is the
scalable reference counted version, and replace TInterfacedPersistent
with my TAMInterfacedPersistent, which is the scalable reference counted
version. You will find both my TAMInterfacedObject and my
TAMInterfacedPersistent inside the AMInterfacedObject.pas file. To learn
how to use weak references, please take a look at the demo that I have
included called example.dpr, and look inside my zip file at the tutorial
about weak references. To learn how to use delegation, take a look at
the demo that I have included called test_delegation.pas, and take a
look inside my zip file at the tutorial about delegation, which teaches
you how to use delegation.
 
I think my scalable reference counting with efficient support for weak
references is stable and fast. It works on both Windows and Linux, and
it scales on multicore and NUMA systems. You will not find it in C++ or
Rust, and I don't think you will find it anywhere else. You have to know
that this invention of mine solves the problem of dangling pointers and
the problem of memory leaks, and my reference counting is "scalable".
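
For readers of this group who want the general idea in standard C++
terms: std::shared_ptr / std::weak_ptr show what "reference counting
with support for weak references" buys you, namely no dangling access
once the last strong reference is gone. This is only an analogy; the
standard counters are not the scalable counters described above.

#include <iostream>
#include <memory>

struct Resource {
    ~Resource() { std::cout << "Resource destroyed\n"; }
};

int main() {
    std::weak_ptr<Resource> weak;
    {
        auto strong = std::make_shared<Resource>();   // strong count = 1
        weak = strong;                                // weak ref, count unchanged
        if (auto locked = weak.lock())                // safe promotion to strong
            std::cout << "Resource still alive\n";
    }                                                 // last strong ref released here
    if (weak.expired())                               // no dangling pointer access
        std::cout << "Resource already freed; the weak ref detects it\n";
}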
 
And please read the readme file inside the zip file, which I have just
extended to help you understand more.
 
You can download my new scalable reference counting with efficient
support for weak references version 1.35 from:
 
https://sites.google.com/site/scalable68/scalable-reference-counting-with-efficient-support-for-weak-references
 
 
Thank you,
Amine Moulay Ramdane.
Horizon68 <horizon@horizon.com>: Jul 04 01:45PM -0700

Hello...
 
 
Disadvantages of Actor model:
 
1- Not all languages easily enforce immutability
 
Erlang, the language that first popularized actors, has immutability at
its core, but Java and Scala (actually the JVM) do not enforce
immutability.
 
 
2- Still pretty complex
 
Actors are based on an asynchronous model of programming, which is not
so straightforward and easy to model in all scenarios; it is
particularly difficult to handle errors and failure scenarios.
 
 
3- Does not prevent deadlock or starvation
 
Two actors can each be waiting for a message from the other; thus you
have a deadlock just like with locks, although it is much easier to
debug (see the sketch after this list). With transactional memory,
however, you are guaranteed to be deadlock free.
 
 
4- Not so efficient
 
Because of enforced immutability, and because many actors have to take
turns sharing the same thread, actors won't be as efficient as
lock-based concurrency.
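
Here is the sketch referred to in point 3: a minimal, hypothetical C++
illustration of two "actors" that each wait for a message from the other
before sending their own, so neither ever makes progress (blocking
queues stand in for mailboxes):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// A blocking mailbox: receive() waits until a message is available.
struct Mailbox {
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::string> q;

    void send(std::string msg) {
        std::lock_guard<std::mutex> lock(m);
        q.push(std::move(msg));
        cv.notify_one();
    }
    std::string receive() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !q.empty(); });
        std::string msg = q.front();
        q.pop();
        return msg;
    }
};

int main() {
    Mailbox a, b;
    // Each actor waits to receive before it sends: neither ever proceeds.
    std::thread actorA([&] { a.receive(); b.send("from A"); });
    std::thread actorB([&] { b.receive(); a.send("from B"); });
    actorA.join();   // never returns: this is the deadlock
    actorB.join();
}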
 
 
Conclusion:
 
Lock-based concurrency is the most efficient.
 
 
More about Message Passing Process Communication Model and Shared Memory
Process Communication Model:
 
 
An advantage of shared memory model is that memory communication is
faster as compared to the message passing model on the same machine.
 
However, shared memory model may create problems such as synchronization
and memory protection that need to be addressed.
 
Message passing's major flaw is the inversion of control: it is a moral
equivalent of gotos in unstructured programming (it's about time
somebody said that message passing is considered harmful).
 
Also some research shows that the total effort to write an MPI
application is significantly higher than that required to write a
shared-memory version of it.
 
 
 
 
Thank you,
Amine Moulay Ramdane.
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Jul 04 10:29PM +0100

On 04/07/2019 21:45, Horizon68 wrote:
> Message passing's major flaw is the inversion of control–it is a moral
> equivalent of gotos in un-structured programming (it's about time somebody
> said that message passing is considered harmful).
 
Inversion of control is a good thing, not a bad thing, and when combined
with message passing, complex systems can be easily reduced to a collection
of simpler sub-systems. Such systems can be considered orthogonal to
concurrent or data oriented systems, i.e. the two approaches can be used
together if the design of the system (and the system designer) isn't
fucktarded.
 
So, no, message passing is NOT considered harmful, only a fucktard would
think that.
 
/Flibble
 
 
--
"Snakes didn't evolve, instead talking snakes with legs changed into
snakes." - Rick C. Hodgin
 
"You won't burn in hell. But be nice anyway." – Ricky Gervais
 
"I see Atheists are fighting and killing each other again, over who
doesn't believe in any God the most. Oh, no..wait.. that never happens." –
Ricky Gervais
 
"Suppose it's all true, and you walk up to the pearly gates, and are
confronted by God," Bryne asked on his show The Meaning of Life. "What
will Stephen Fry say to him, her, or it?"
"I'd say, bone cancer in children? What's that about?" Fry replied.
"How dare you? How dare you create a world to which there is such misery
that is not our fault. It's not right, it's utterly, utterly evil."
"Why should I respect a capricious, mean-minded, stupid God who creates a
world that is so full of injustice and pain. That's what I would say."
Horizon68 <horizon@horizon.com>: Jul 04 02:23PM -0700

Hello,
 
 
More about Hardware transactional memory, and now about the
disadvantages of Intel TSX:
 
Here is also something interesting to read about hardware transactional
memory that is Intel TSX:
 
TSX does not guarantee forward progress, so there must always be a
fallback non-TSX pathway. (Complex transactions might always abort, even
without any contention, because they overflow the speculation buffer.
Even transactions that could run in theory might livelock forever if you
don't have the right pauses to allow forward progress, so the fallback
path is needed then too.)
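
A minimal sketch of that transaction-plus-fallback pattern using the RTM
intrinsics from <immintrin.h> (assuming a CPU and compiler with RTM
support, e.g. gcc or clang with -mrtm); a production lock-elision scheme
would also check inside the transaction that the fallback lock is not held:

#include <immintrin.h>
#include <mutex>

std::mutex fallback_lock;   // the mandatory non-TSX pathway
long counter = 0;

void increment() {
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        // Transactional path: aborts if another core writes this cache
        // line, if the working set spills out of L1, on a context
        // switch, a system call, etc.
        ++counter;
        _xend();
    } else {
        // Fallback path: plain locking guarantees forward progress.
        std::lock_guard<std::mutex> guard(fallback_lock);
        ++counter;
    }
}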
 
TSX works by keeping a speculative set of registers and processor state.
It tracks all reads done in the speculation block, and enqueues all
writes to be delayed until the transaction ends. The memory tracking of
the transaction is currently done using the L1 cache and the standard
cache line protocols. This means contention is only detected at cache
line granularity, so you have the standard "false sharing" issue.
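
Because conflicts are detected at cache-line granularity, two logically
independent variables that share a line can abort each other's
transactions. A common mitigation in C++ (assuming 64-byte cache lines)
is to align hot data to separate lines:

// Two counters updated from different threads/transactions. Without the
// alignment they could share one 64-byte cache line, so a write to one
// would spuriously conflict with a transaction touching the other,
// exactly the "false sharing" issue described above.
struct alignas(64) PaddedCounter {
    long value = 0;   // alignas(64) gives each counter its own cache line
};

PaddedCounter counter_for_core0;
PaddedCounter counter_for_core1;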
 
If your transaction reads a cache line, then any write to that cache
line by another core causes the transaction to abort. (reads by other
cores do not cause an abort).
 
If your transaction writes a cache line, then any read or write by
another core causes the transaction to abort.
 
If your transaction aborts, then any cache lines written are evicted
from L1. If any of the cache lines involved in the transaction are
evicted during the transaction (eg. if you touch too much memory, or
another core locks that line), the transaction is aborted.
 
TSX seems to allow quite a large working set (up to the size of L1?).
Obviously, the more memory you touch, the more likely you are to abort
due to contention.
 
Obviously you will get aborts from anything "funny" that's not just
plain code and memory access. Context switches, IO, kernel calls, etc.
will abort transactions.
 
At the moment, TSX is quite slow, even if there's no contention and you
don't do anything in the block. There's a lot of overhead. Using TSX
naively may slow down even threaded code. Getting significant
performance gains from it is non-trivial.
 
Read more here:
 
http://cbloomrants.blogspot.ca/2014/11/11-12-14-intel-tsx-notes.html
 
Thank you,
Amine Moulay Ramdane.
Horizon68 <horizon@horizon.com>: Jul 04 02:15PM -0700

Hello..
 
 
Read the following paper about the disadvantages of Transactional memory:
 
 
"Hardware-only (HTM) suffers from two major impediments:
high implementation and verification costs lead to design
risks too large to justify on a niche programming model;
hardware capacity constraints lead to significant performance
degradation when overflow occurs, and proposals for managing overflows
(for example, signatures) incur false positives that add
complexity to the programming model.
 
Therefore, from an industrial perspective, HTM designs have to provide
more benefits for the cost, on a more diverse set of workloads (with
varying transactional characteristics) for hardware designers to
consider implementation."
 
etc.
 
"We observed that the TM programming model itself, whether implemented
in hardware or software, introduces complexities that limit the expected
productivity gains, thus reducing the current incentive for migration to
transactional programming, and the justification at present for anything
more than a small amount of hardware support."
 
 
Read more here:
 
http://pages.cs.wisc.edu/~cain/pubs/cascaval_cacm08.pdf
 
 
 
Thank you,
Amine Moulay Ramdane.
G G <gdotone@gmail.com>: Jul 03 09:21PM -0700

#define _PROTOTYPE(function, params) function params
 
 
how does this work?
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jul 04 06:38AM +0200

On 04.07.2019 06:21, G G wrote:
> #define _PROTOTYPE(function, params) function params
 
> how does this work?
 
It defines a pure text substitution rule, called a macro.
 
`params` is intended to be a parenthesized list of arguments.
 
Note that names starting with underscore followed by uppercase are
reserved to the implementation, so if this is not code from the C++
implementation it's UB.
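
To make the substitution concrete, a small illustrative example of what
the preprocessor does with it (the function name here is made up):

#define _PROTOTYPE(function, params) function params

// The macro simply expands to its two arguments, one after the other, so
_PROTOTYPE(int add, (int a, int b));
// expands to:
//     int add (int a, int b);
//
// Older portability layers used this indirection so that the same header
// could also be built for pre-ANSI compilers, typically with something like
//     #define _PROTOTYPE(function, params) function()
// which drops the parameter list.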
 
 
Cheers!,
 
- Alf
