Saturday, March 24, 2018

Digest for comp.programming.threads@googlegroups.com - 5 updates in 5 topics

computer45 <computer45@cyber.com>: Mar 23 04:47PM -0400

Hello,
 
Read this:
 
 
WeeFence: Toward Making Fences Free in TSO
 
"Today's fences can be quite expensive. If, instead, they were
largely free, software could benefit substantially: programmers could
write faster fine-grained concurrent algorithms, and C++ and Java
compilers could guarantee SC at little cost.
 
In this paper, we have presented WFence, a fence that is very
cheap because it allows post-fence accesses to skip it. Such accesses
can typically complete and retire before the pre-fence writes
have drained from the write buffer. If an incorrect access reordering
is about to happen, the hardware stalls for a short period to avoid
it. In addition, WFence is compatible with the use of conventional
fences in the same program.
 
We presented the WFence design for TSO, and compared it to
a conventional fence with speculation for 8-processor simulations.
We ran parallel kernels that contain explicit fences and parallel
applications that do not. For the kernels, WFence eliminated nearly
all of the fence stall, reducing the kernels' execution time by an
average of 11%. For the applications, a conservative compiler algorithm
placed fences in the code to guarantee SC. Then, on average,
WFences reduced the resulting fence overhead from 38% of the
applications' execution time to 2% (in a centralized WFence design),
or from 36% to 5% (in a distributed WFence design).
Overall, the resulting cheap fence can be a good help for parallel
programming. In our future work, we plan to optimize the distributed
GRT design for the case where a WFence maps to multiple
GRT modules."
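
To see why TSO code needs a fence at all, here is a minimal sketch of the classic store-buffering litmus test, run on a toy TSO model in Python. It models a conventional fence that waits for the store buffer to drain, not the WFence hardware from the paper. The instruction encoding and the names explore and sb_outcomes are assumptions made up for this illustration.

```python
def explore(programs, pcs, buffers, memory, regs, results, seen):
    """Enumerate every execution of a toy TSO machine (hypothetical model).

    Each thread owns a FIFO store buffer. Stores enter the buffer, loads
    read the newest matching buffered store or else memory, and a fence
    may execute only once the thread's own buffer is empty.
    """
    key = (tuple(pcs),
           tuple(tuple(b) for b in buffers),
           tuple(sorted(memory.items())),
           tuple(sorted(regs.items())))
    if key in seen:
        return
    seen.add(key)

    finished = all(pc == len(p) for pc, p in zip(pcs, programs))
    if finished and not any(buffers):
        results.add(tuple(sorted(regs.items())))
        return

    for t, prog in enumerate(programs):
        # Step kind 1: the oldest buffered store of thread t reaches memory.
        if buffers[t]:
            var, val = buffers[t][0]
            new_buffers = [list(b) for b in buffers]
            new_buffers[t].pop(0)
            new_memory = dict(memory)
            new_memory[var] = val
            explore(programs, pcs, new_buffers, new_memory, regs, results, seen)

        # Step kind 2: thread t executes its next instruction.
        if pcs[t] < len(prog):
            op = prog[pcs[t]]
            new_pcs = list(pcs)
            new_pcs[t] += 1
            if op[0] == "store":
                new_buffers = [list(b) for b in buffers]
                new_buffers[t].append((op[1], op[2]))
                explore(programs, new_pcs, new_buffers, memory, regs, results, seen)
            elif op[0] == "load":
                _, var, reg = op
                val = memory.get(var, 0)
                for bvar, bval in buffers[t]:  # later entries are newer
                    if bvar == var:
                        val = bval
                new_regs = dict(regs)
                new_regs[reg] = val
                explore(programs, new_pcs, buffers, memory, new_regs, results, seen)
            elif op[0] == "fence" and not buffers[t]:
                explore(programs, new_pcs, buffers, memory, regs, results, seen)


def sb_outcomes(with_fence):
    """Return the set of (r0, r1) results of the store-buffering test."""
    fence = [("fence",)] if with_fence else []
    thread0 = [("store", "x", 1)] + fence + [("load", "y", "r0")]
    thread1 = [("store", "y", 1)] + fence + [("load", "x", "r1")]
    results = set()
    explore([thread0, thread1], [0, 0], [[], []], {}, {}, results, set())
    return {(dict(r)["r0"], dict(r)["r1"]) for r in results}


if __name__ == "__main__":
    print("no fence:  ", sorted(sb_outcomes(False)))
    print("with fence:", sorted(sb_outcomes(True)))
```

Without the fence, the model allows both threads to read 0, which sequential consistency forbids. With the fence, that outcome disappears, but each thread must wait for its buffer to drain before loading. That wait is the fence stall the paper aims to remove.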
 
 
Read more here:
 
http://iacoma.cs.uiuc.edu/iacoma-papers/isca13_2.pdf
 
 
 
Thank you,
Amine Moulay Ramdane.
computer45 <computer45@cyber.com>: Mar 23 04:13PM -0400

Hello....
 
HPE's Superdome X: The Mission-Critical Scale-Up x86 Platform
for SAP, Oracle, and SQL Server
 
"As to scalability, vendors are attempting to grow the performance of
their high-end systems in as much a linear fashion as possible, with
every doubling of sockets. While 8 sockets have become the norm, HPE's
Superdome X scales to 16 sockets and, as such, HPE claims to hold
several performance world records for x86 servers with this platform.
 
HPE is focused on helping Oracle Database customers reduce complexity
and licensing costs with various attributes of the Superdome X,
including the fact that multicore processors are priced as
(number of cores)*(multicore factor) processors, where the multicore
factor is 0.5 for x86 processors versus 1.00 for modern processors
commonly used in Unix systems. Also, the cost of maintenance
and support tends to be lower on a modern scale-up x86 system than on a
legacy Unix system. HPE is also trying to migrate customers on Oracle
RAC, a popular but costly scale-out clustering solution that
delivers high availability, to a scale-up approach on the Superdome X,
which — HPE believes — would deliver similar levels of availability but
at lower cost."
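
The licensing formula quoted above is simple arithmetic, sketched here in Python. The function name licensed_processors and the figure of 24 cores per socket are assumptions for illustration only. Only the 16 sockets and the factors 0.5 and 1.00 come from the quoted text.

```python
def licensed_processors(cores, multicore_factor):
    """Processors to license = (number of cores) * (multicore factor)."""
    return cores * multicore_factor


if __name__ == "__main__":
    sockets = 16            # Superdome X maximum, from the quoted text
    cores_per_socket = 24   # hypothetical value, not from the source
    cores = sockets * cores_per_socket

    print("x86, factor 0.5: ", licensed_processors(cores, 0.5))
    print("Unix, factor 1.0:", licensed_processors(cores, 1.0))
```

Under these assumed numbers, the x86 system is licensed for half as many processors as a system with the same core count at factor 1.00.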
 
 
Read more here:
 
https://h20195.www2.hpe.com/v2/getpdf.aspx/4AA5-5789ENW.pdf
 
 
 
Thank you,
Amine Moulay Ramdane.
computer45 <computer45@cyber.com>: Mar 23 03:02PM -0400

Hello..
 
Read this:
 
 
I have posted in my previous post that:
 
Memory Models: x86 is TSO, TSO is "Good"
 
But here are more benchmarks comparing relaxed memory models to the SC
model:
 
 
Read carefully:
 
Architecture Support and Scalability Analysis of Memory
Consistency Models in Network-on-Chip based Systems
 
"From the experiments, it can also be observed that the reduction in the
relative execution times under relaxed memory models like WC, RC, and
PRC over the SC model is expected to be 3.33 percentage points
approximately for each doubling of the core count up to 128 cores. The
relaxed memory models could exploit the increasing parallelism in
the larger networks beyond the 64-cores by reordering and pipelining the
memory operations. Similarly, for the TSO and PSO models Figure 4.37
suggests that this can be projected to 1.0 percentage points for each
doubling of the core count up to 128 cores. In general, the performance
gain of the relaxed memory consistency models can be even
higher than the experimental values when the system size is further
increased."
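
The per-doubling figures in this quote can be turned into a rough projection, sketched here in Python. It assumes the quoted rate applies uniformly to every doubling between two core counts, which the quote does not state. The function name projected_reduction and the starting core counts are hypothetical.

```python
import math


def projected_reduction(points_per_doubling, base_cores, target_cores,
                        max_cores=128):
    """Projected extra reduction in relative execution time over SC,
    in percentage points, when growing from base_cores to target_cores.

    Assumes the same per-doubling rate over the whole range. The quoted
    projection stops at 128 cores, so larger targets are rejected.
    """
    if not 0 < base_cores <= target_cores <= max_cores:
        raise ValueError("core counts must satisfy 0 < base <= target <= max")
    doublings = math.log2(target_cores / base_cores)
    return points_per_doubling * doublings


if __name__ == "__main__":
    # 3.33 points per doubling for WC, RC, and PRC; 1.0 for TSO and PSO.
    print("WC/RC/PRC, 64 -> 128 cores:", projected_reduction(3.33, 64, 128))
    print("TSO/PSO,   64 -> 128 cores:", projected_reduction(1.0, 64, 128))
    print("WC/RC/PRC, 16 -> 128 cores:", projected_reduction(3.33, 16, 128))
```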
 
 
Read more here:
 
https://www.diva-portal.org/smash/get/diva2:602640/FULLTEXT01.pdf
 
 
 
Thank you,
Amine Moulay Ramdane.
computer45 <computer45@cyber.com>: Mar 23 02:46PM -0400

Hello,
 
 
Read this:
 
Memory Models: x86 is TSO, TSO is "Good"
 
Read more here:
 
https://jakob.engbloms.se/archives/1435
 
 
Thank you,
Amine Moulay Ramdane.
computer45 <computer45@cyber.com>: Mar 23 02:03PM -0400

Hello,
 
Read this:
 
 
Best Practices for Low Latency Systems
 
It's been 8 years since Google noticed that an extra 500ms of latency
dropped traffic by 20% and Amazon realized that 100ms of extra latency
dropped sales by 1%. Ever since then developers have been racing to the
bottom of the latency curve, culminating in front-end developers
squeezing every last millisecond out of their JavaScript, CSS, and even
HTML.
 
Read more here:
 
https://codedependents.com/2014/01/27/11-best-practices-for-low-latency-systems/
 
 
Thank you,
Amine Moulay Ramdane.