- WeeFence: Toward Making Fences Free in TSO - 1 Update
- HPE's Superdome X: The Mission-Critical Scale-Up x86 Platform for SAP, Oracle, and SQL Server - 1 Update
- But here are more benchmarks on relaxed memory models compared to the SC model - 1 Update
- Memory Models: x86 is TSO, TSO is "Good" - 1 Update
- Best Practices for Low Latency Systems - 1 Update
computer45 <computer45@cyber.com>: Mar 23 04:47PM -0400

Hello,

Read this:

WeeFence: Toward Making Fences Free in TSO

"Today's fences can be quite expensive. If, instead, they were largely free, software could benefit substantially: programmers could write faster fine-grained concurrent algorithms, and C++ and Java compilers could guarantee SC at little cost. In this paper, we have presented WFence, a fence that is very cheap because it allows post-fence accesses to skip it. Such accesses can typically complete and retire before the pre-fence writes have drained from the write buffer. If an incorrect access reordering is about to happen, the hardware stalls for a short period to avoid it. In addition, WFence is compatible with the use of conventional fences in the same program. We presented the WFence design for TSO, and compared it to a conventional fence with speculation for 8-processor simulations. We ran parallel kernels that contain explicit fences and parallel applications that do not. For the kernels, WFence eliminated nearly all of the fence stall, reducing the kernels' execution time by an average of 11%. For the applications, a conservative compiler algorithm placed fences in the code to guarantee SC. Then, on average, WFences reduced the resulting fence overhead from 38% of the applications' execution time to 2% (in a centralized WFence design), or from 36% to 5% (in a distributed WFence design). Overall, the resulting cheap fence can be a good help for parallel programming. In our future work, we plan to optimize the distributed GRT design for the case where a WFence maps to multiple GRT modules."

Read more here: http://iacoma.cs.uiuc.edu/iacoma-papers/isca13_2.pdf

Thank you,
Amine Moulay Ramdane.
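To make concrete why a fence is needed under TSO at all, here is a minimal sketch of the classic store-buffering litmus test. It is a toy model written for this digest, not the simulator used in the paper: `tso_outcomes`, the instruction tuples, and the variable and register names are all hypothetical. The `fence` it models is a conventional one that waits for the write buffer to drain, which is the stall that WFence is designed to avoid.

```python
def tso_outcomes(programs, variables=("x", "y")):
    """Return every final register assignment reachable under a toy TSO model."""
    start = (
        tuple(0 for _ in programs),                # program counter per thread
        tuple(() for _ in programs),               # FIFO store buffer per thread
        tuple((v, 0) for v in sorted(variables)),  # shared memory, all zero
        (),                                        # registers written so far
    )
    seen, results, stack = set(), set(), [start]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        if all(pc == len(prog) for pc, prog in zip(pcs, programs)):
            results.add(regs)  # all loads have executed, so registers are final
            continue
        for t, prog in enumerate(programs):
            if bufs[t]:
                # Drain the oldest buffered store of thread t into memory.
                var, val = bufs[t][0]
                new_mem = tuple(sorted({**dict(mem), var: val}.items()))
                new_bufs = bufs[:t] + (bufs[t][1:],) + bufs[t + 1:]
                stack.append((pcs, new_bufs, new_mem, regs))
            if pcs[t] == len(prog):
                continue
            op = prog[pcs[t]]
            new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if op[0] == "store":
                # Stores enter the local buffer; other threads cannot see them yet.
                new_bufs = bufs[:t] + (bufs[t] + ((op[1], op[2]),),) + bufs[t + 1:]
                stack.append((new_pcs, new_bufs, mem, regs))
            elif op[0] == "load":
                # Loads read the newest matching buffered store, else memory.
                val = dict(mem)[op[1]]
                for var, buffered in bufs[t]:
                    if var == op[1]:
                        val = buffered
                new_regs = tuple(sorted(regs + ((op[2], val),)))
                stack.append((new_pcs, bufs, mem, new_regs))
            elif op[0] == "fence" and not bufs[t]:
                # A conventional fence retires only once the buffer is empty.
                stack.append((new_pcs, bufs, mem, regs))
    return results


store_buffering = [
    [("store", "x", 1), ("load", "y", "r0")],
    [("store", "y", 1), ("load", "x", "r1")],
]
fenced = [
    [("store", "x", 1), ("fence",), ("load", "y", "r0")],
    [("store", "y", 1), ("fence",), ("load", "x", "r1")],
]
both_zero = (("r0", 0), ("r1", 0))
print(both_zero in tso_outcomes(store_buffering))  # True: not an SC outcome
print(both_zero in tso_outcomes(fenced))           # False: fences restore SC here
```

Without fences both loads can read 0, an outcome no sequentially consistent interleaving allows. With a fence after each store, that outcome disappears, at the price of waiting for the buffer to drain.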
computer45 <computer45@cyber.com>: Mar 23 04:13PM -0400

Hello,

HPE's Superdome X: The Mission-Critical Scale-Up x86 Platform for SAP, Oracle, and SQL Server

"As to scalability, vendors are attempting to grow the performance of their high-end systems in as much a linear fashion as possible, with every doubling of sockets. While 8 sockets have become the norm, HPE's Superdome X scales to 16 sockets and, as such, HPE claims to hold several performance world records for x86 servers with this platform. HPE is focused on helping Oracle Database customers reduce complexity and licensing costs with various attributes of the Superdome X, including the fact that multicore processors are priced as (number of cores)*(multicore factor) processors, where the multicore factor is 0.5 for x86 processors versus 1.00 for modern processors commonly used in Unix systems. Also, the cost of maintenance and support tends to be lower on a modern scale-up x86 system than on a legacy Unix system. HPE is also trying to migrate customers on Oracle RAC, a popular but costly scale-out clustering solution that delivers high availability, to a scale-up approach on the Superdome X, which, HPE believes, would deliver similar levels of availability but at lower cost."

Read more here: https://h20195.www2.hpe.com/v2/getpdf.aspx/4AA5-5789ENW.pdf

Thank you,
Amine Moulay Ramdane.
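The licensing rule quoted above is simple arithmetic, and a small sketch may make it concrete. The two factors (0.5 and 1.0) come from the quoted text. The function name `licensed_processors` and the core count of 64 are hypothetical and chosen only for illustration, not taken from any Superdome X configuration.

```python
def licensed_processors(cores: int, multicore_factor: float) -> float:
    """Apply the quoted rule: (number of cores) * (multicore factor)."""
    return cores * multicore_factor


X86_FACTOR = 0.5   # factor quoted for x86 processors
UNIX_FACTOR = 1.0  # factor quoted for processors common in Unix systems

cores = 64  # hypothetical core count, for illustration only

print(licensed_processors(cores, X86_FACTOR))   # 32.0
print(licensed_processors(cores, UNIX_FACTOR))  # 64.0
```

Under these assumptions the same number of cores counts as half as many licensed processors on x86.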
computer45 <computer45@cyber.com>: Mar 23 03:02PM -0400

Hello,

In my previous post I pointed to:

Memory Models: x86 is TSO, TSO is "Good"

Here are more benchmarks comparing relaxed memory models with the SC model. Read carefully:

Architecture Support and Scalability Analysis of Memory Consistency Models in Network-on-Chip based Systems

"From the experiments, it can also be observed that the reduction in the relative execution times under relaxed memory models like WC, RC, and PRC over the SC model is expected to be 3.33 percentage points approximately for each doubling of the core count up to 128 cores. The relaxed memory models could exploit the increasing parallelism in the larger networks beyond the 64-cores by reordering and pipelining the memory operations. Similarly, for the TSO and PSO models Figure 4.37 suggests that this can be projected to 1.0 percentage points for each doubling of the core count up to 128 cores. In general, the performance gain of the relaxed memory consistency models can be even higher than the experimental values when the system size is further increased."

Read more here: https://www.diva-portal.org/smash/get/diva2:602640/FULLTEXT01.pdf

Thank you,
Amine Moulay Ramdane.
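As a rough reading aid for the "per doubling" figures quoted above, the sketch below turns them into a simple calculation. It assumes the gain grows linearly with the number of doublings, which is this digest's simplification and not a result from the thesis. The function name `projected_reduction` is hypothetical, and the quoted projection only covers core counts up to 128.

```python
import math


def projected_reduction(points_per_doubling: float, from_cores: int, to_cores: int) -> float:
    """Percentage points gained over SC, assuming a fixed gain per doubling."""
    doublings = math.log2(to_cores / from_cores)
    return points_per_doubling * doublings


RELAXED_POINTS = 3.33  # quoted figure for WC, RC, and PRC
TSO_PSO_POINTS = 1.0   # quoted figure for TSO and PSO

# One doubling, from 64 to 128 cores, as in the quoted range.
print(projected_reduction(RELAXED_POINTS, 64, 128))  # 3.33
print(projected_reduction(TSO_PSO_POINTS, 64, 128))  # 1.0
```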
computer45 <computer45@cyber.com>: Mar 23 02:46PM -0400

Hello,

Read this:

Memory Models: x86 is TSO, TSO is "Good"

Read more here: https://jakob.engbloms.se/archives/1435

Thank you,
Amine Moulay Ramdane.
computer45 <computer45@cyber.com>: Mar 23 02:03PM -0400

Hello,

Read this:

Best Practices for Low Latency Systems

It's been 8 years since Google noticed that an extra 500ms of latency dropped traffic by 20% and Amazon realized that 100ms of extra latency dropped sales by 1%. Ever since then developers have been racing to the bottom of the latency curve, culminating in front-end developers squeezing every last millisecond out of their JavaScript, CSS, and even HTML.

Read more here: https://codedependents.com/2014/01/27/11-best-practices-for-low-latency-systems/

Thank you,
Amine Moulay Ramdane.