- Here is my new variants of Scalable RWLocks that are powerful.. - 1 Update
- Performance of unaligned memory-accesses - 19 Updates
- In the end, rason will come - 1 Update
- In the end, rason will come - 3 Updates
- Why can't I understand what coroutines are? - 1 Update
aminer68@gmail.com: Aug 07 11:18AM -0700

Hello,

Here are my new variants of Scalable RWLocks, and they are powerful.

Author: Amine Moulay Ramdane

Description: A fast, scalable, starvation-free, fair and lightweight Multiple-Readers-Exclusive-Writer Lock called LW_RWLockX (the scalable LW_RWLockX does spin-wait), and a fast, scalable, starvation-free and fair Multiple-Readers-Exclusive-Writer Lock called RWLockX. The scalable RWLockX doesn't spin-wait but uses my portable SemaMonitor and portable event objects, so it is energy efficient.

The parameter of the constructors is the size of the array of readers: if the size of the array is equal to the number of parallel readers, the lock will be scalable, but if the number of readers is greater than the size of the array, you will start to have contention. Please look at the source code of my scalable algorithms to understand.

I have used the following hash function to make my new variants of RWLocks scalable:

---
function DJB2aHash(key: int64): uint64;
var
  i: integer;
  key1: uint64;
begin
  Result := 5381;
  for i := 1 to 8 do
  begin
    key1 := (key shr ((i-1)*8)) and $00000000000000ff;
    Result := ((Result shl 5) xor Result) xor key1;
  end;
end;
---

You can download them from:

https://sites.google.com/site/scalable68/new-variants-of-scalable-rwlocks

Thank you,
Amine Moulay Ramdane. |
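[Editorial aside: for readers who want a concrete picture of the general technique being described (hashing the thread id into a per-reader slot so that readers on different cores touch different cache lines), below is a minimal, hypothetical C++ sketch of that "distributed reader" pattern. It is NOT the author's code, which is at the link above; all names and details are illustrative, and it uses plain sequentially-consistent atomics rather than the tuned spin-wait/SemaMonitor variants described in the post.]

#include <array>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>

// Hypothetical sketch: each reader bumps a counter in its own
// cache-line-sized slot (chosen by hashing the thread id), so
// concurrent readers rarely share a cache line. A writer raises a
// flag and then waits for every slot to drain.
class DistributedRWLock {
    static constexpr std::size_t kSlots = 16;          // "size of the array of readers"
    struct alignas(64) Slot { std::atomic<int> readers{0}; };

    std::array<Slot, kSlots> slots_{};
    std::atomic<bool> writer_{false};

    std::size_t my_slot() const {
        return std::hash<std::thread::id>{}(std::this_thread::get_id()) % kSlots;
    }

public:
    void lock_shared() {
        Slot& s = slots_[my_slot()];
        for (;;) {
            s.readers.fetch_add(1);             // announce the reader first
            if (!writer_.load()) return;        // no writer pending: we hold the lock
            s.readers.fetch_sub(1);             // a writer is active: back off
            while (writer_.load()) std::this_thread::yield();
        }
    }
    void unlock_shared() { slots_[my_slot()].readers.fetch_sub(1); }

    void lock() {                               // exclusive writer
        bool expected = false;
        while (!writer_.compare_exchange_weak(expected, true)) {
            expected = false;
            std::this_thread::yield();
        }
        for (auto& s : slots_)                  // wait for in-flight readers to drain
            while (s.readers.load() != 0) std::this_thread::yield();
    }
    void unlock() { writer_.store(false); }
};

If the number of concurrent reader threads stays at or below kSlots, readers mostly touch disjoint cache lines; with more readers than slots, two readers can hash to the same slot and contend, which matches the contention behaviour the post describes.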
Jorgen Grahn <grahn+nntp@snipabacken.se>: Aug 07 12:37PM

On Wed, 2019-08-07, Bonita Montero wrote:
> I just wrote a little test that checks the performance of unaligned
> memory-accesses on x86 / Win32. I've run this code on my Ryzen 1800X:

What's the point of the exercise, in a C++ context? Unaligned access in portable code is always the result of a programming error.

/Jorgen

--
// Jorgen Grahn <grahn@  Oo  o.   .  .
\X/   snipabacken.se>   O  o   . |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 07 02:47PM +0200

> What's the point of the exercise, in a C++ context? Unaligned access
> in portable code is always the result of a programming error.

Pure theory. All platforms support unaligned access either directly through the CPU or through trapping by the operating system (very slow). |
David Brown <david.brown@hesbynett.no>: Aug 07 03:57PM +0200

On 07/08/2019 14:08, Bonita Montero wrote:
> ... is not what you'd like to tell. You wanted to tell that the
> operator is compiled in a way that the shifts and loads are bundled
> in a single load. So I misunderstood you.

I wrote what I intended to write, but you misunderstood. (That happens, sometimes, especially when you have to work with a second language. It's no problem.)

The compiler turns the shift-and-or code into optimised code using an unaligned access. I was surprised that MSVC could not do this optimisation - that compiler is often quite good at optimisations. |
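[Editorial aside: the "shift-and-or code" being discussed is the usual byte-by-byte way of reading a multi-byte value. A minimal example of my own (not the code from the thread) is below. GCC and Clang typically recognise this pattern and emit a single 32-bit load on targets where unaligned loads are safe; per the post above, MSVC reportedly does not.]

#include <cstdint>

// Read a 32-bit little-endian value one byte at a time. No unaligned
// pointer is ever dereferenced, so the behaviour is defined on every
// platform; a good optimiser can still fuse it into a single load.
std::uint32_t load_le32(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}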
David Brown <david.brown@hesbynett.no>: Aug 07 04:18PM +0200

On 07/08/2019 14:47, Bonita Montero wrote:
>> in portable code is always the result of a programming error.
> Pure theory. All platforms support unaligned access either directly
> through the CPU or through trapping by the operating system (very slow).

No, they don't. Some cpus support direct unaligned accesses. For others, various things could happen. On big OS's, you are likely to get a trap or exception causing the OS to kill your program with a fault - I can't imagine why an OS would bother simulating the unaligned access. On embedded systems, unaligned access may lead to a bus fault of some sort, halting the system or causing a restart. And on some systems that I have used, unaligned access will silently give you muddled reads and corrupting writes.

Unaligned access is always an error. Use code with shifts and masks, if you need it, or use memcpy. If your compiler isn't good enough to give you efficient enough code for your needs, get a better compiler. |
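[Editorial aside: the memcpy alternative mentioned here is worth showing explicitly; a small sketch of my own (not from the thread) follows. Mainstream compilers turn a fixed-size memcpy like this into a single load on targets that allow unaligned access, and fall back to byte copies elsewhere.]

#include <cstdint>
#include <cstring>

// Portable "unaligned read": copy the bytes into a properly aligned
// local variable and let the compiler pick the best instructions.
// The result is in the host's byte order.
std::uint32_t load_u32(const void* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}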
scott@slp53.sl.home (Scott Lurndal): Aug 07 02:46PM

> Try using unaligned addresses with several threads. Try doing a LOCK
> XADD on a location that straddles two cache lines, and is not aligned on
> a line, vs one that is aligned on a cache line, and properly padded.

Processor vendors work hard so that most unaligned accesses don't add significant additional latencies to the instructions. Our ARM64 processor generally has no perf difference between aligned and unaligned to DRAM (unaligned isn't supported to device memory).

Locked transactions on intel systems that straddle cache lines need to assert a system bus lock, which causes extreme performance degradation, particularly in NUMA systems. Don't do that. |
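[Editorial aside: to make the last point concrete, here is a small illustration of my own (64-byte cache lines are assumed) of keeping an atomic counter inside one cache line, so that a fetch_add - a LOCK XADD on x86 - never straddles two lines and never needs a system bus lock.]

#include <atomic>
#include <cstdint>

// Align (and implicitly pad) the counter to a 64-byte cache line so a
// locked read-modify-write always stays within a single line.
struct alignas(64) Counter {
    std::atomic<std::uint64_t> value{0};
};

void bump(Counter& c) {
    c.value.fetch_add(1, std::memory_order_relaxed);
}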
scott@slp53.sl.home (Scott Lurndal): Aug 07 02:47PM

> for performantly modifying data structures for persistence or
> transmission over the network. So this is clearly a unique
> advantage of the Intel-Architecture.

No, it's not unique. See AArch64. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 07 04:53PM +0200

For portability reasons almost any system tries to be compatible with x86 systems for unaligned accesses. But on some exotic systems which don't run a common operating system, or none at all, you might be right. Unaligned accesses are simply useful for data structures which are sent over the network or persisted on disk, to save the padding bytes. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 07 04:55PM +0200

>> transmission over the network. So this is clearly a unique
>> advantage of the Intel-Architecture.
> No, it's not unique. See AArch64.

Ok, except for atomicity: i.e. loads / stores aren't atomic and atomic RMW-instructions will always fault. |
scott@slp53.sl.home (Scott Lurndal): Aug 07 03:11PM

>> No, it's not unique. See AArch64.
> Ok, except for atomicity: i.e. loads / stores aren't atomic and
> atomic RMW-instructions will always fault.

B2.2: Atomicity is a feature of memory accesses, described as atomic accesses. The Arm architecture description refers to two types of atomicity, single-copy atomicity and multi-copy atomicity. In the Armv8 architecture, the atomicity requirements for memory accesses depend on the memory type, and whether the access is explicit or implicit. For more information, see: B2.2.1 Requirements for single-copy atomicity.

If ARMv8.4-LSE is implemented, all loads and stores are single-copy atomic when the following conditions are true:

· Accesses are unaligned to their data size but are aligned within a 16-byte quantity that is aligned to 16 bytes.
· Accesses are to Inner Write-Back, Outer Write-Back Normal cacheable memory.

Otherwise it is IMPLEMENTATION DEFINED whether loads and stores are single-copy atomic. |
David Brown <david.brown@hesbynett.no>: Aug 07 05:30PM +0200

/Please/ learn to use Usenet properly! Keep attributions, and quote an appropriate amount of context!

On 07/08/2019 16:53, Bonita Montero wrote:
> For portability reasons almost any system tries to be compatible with
> x86 systems for unaligned accesses.

Total nonsense. There is rarely any reason for wanting unaligned access, and no justification for using it in code that should be portable. (An implementation can use it under the hood, when implementing memcpy, or for the kind of optimisations I showed gcc doing. But you don't write unaligned accesses in the source code.)

Other cpu designs do not attempt to copy the x86. The great majority of code that is written that is reliant on a processor working like an x86 is written for Windows, and does not need to be portable to anything other than x86. The rest of the programming world is mostly either aimed at reasonable portability, such as across different *nix systems, or targeted at smaller embedded systems. For portable code, you don't care about unaligned accesses because you don't use them in the source code - you only care that the implementation handles your source code efficiently.

Many cpus implement unaligned accesses - because the designers think the balance between use and cost makes it appropriate. It is /not/ for compatibility with x86. And for processors that don't support unaligned access in hardware, no one would bother supporting it by software emulation except if it were required for /binary/ compatibility with other processors in the same family. I don't know of any systems where that applies.

> But on some exotic systems which
> don't run a common operating system or none at all you might be right.

These "exotic" systems far and away outnumber the PC's of this world. The programming world does not revolve around x86.

> Unaligned accesses are simply useful for data structures which are
> sent over the network or persisted on disk to save the padding bytes.

Nonsense. Proper programming is useful for data structures that are sent over the network or are stored on files. Use portable coding, or implementation-dependent coding (like "packed" structs), and let the compiler use whatever instructions are supported and most efficient for the platform. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 07 05:39PM +0200

> There is rarely any reason for wanting unaligned access, and
> no justification for using it in code that should be portable.

I didn't say that it's always good style, but a lot of code relies on it, so most platforms support it through the CPU or the OS. So the code formerly written for x86 will also run without changes.

> Many cpus implement unaligned accesses - because the designers think
> the balance between use and cost makes it appropriate. It is /not/
> for compatibility with x86.

I'll bet no one would have implemented this on other CPUs if there weren't a lot of code written on x86 machines that relies on this.

> These "exotic" systems far and away outnumber the PC's of this world.

Most of these trap unaligned accesses through the OS.

> implementation-dependent coding (like "packed" structs), and let the
> compiler use whatever instructions are supported and most efficient
> for the platform.

That's your taste of proper programming, but not the facts. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 07 05:41PM +0200

>> atomic RMW-instructions will always fault.
> B2.2:
> ...

I mis-worded what I wanted to tell: I simply wanted to say that _unaligned_ loads / stores are not atomic on ARM. |
scott@slp53.sl.home (Scott Lurndal): Aug 07 03:58PM

>> ...
> I mis-worded what I wanted to tell: I simply wanted to say that
> _unaligned_ loads / stores are not atomic on ARM.

But they are atomic on ARMv8.4. Which is what the text you elided showed. |
scott@slp53.sl.home (Scott Lurndal): Aug 07 04:07PM

> /Please/ learn to use Usenet properly! Keep attributions, and quote an
> appropriate amount of context!

Good luck with that, it's been tried before.

>> For portability reasons almost any system tries to be compatible with
>> x86 systems for unaligned accesses.
> Other cpu designs do not attempt to copy the x86.

Certainly not the instruction set, unless you consider AMD, Cyrix, Nat Semi, Harris, IBM, TI or Transmeta :-)

On the other hand, any processor vendor attempting to make a competing server processor will have to accommodate the standard programming methods used on Intel processors if they want to gain any market share, since most of the software would be ported from x86. That means things like unaligned accesses and providing something that looks like the intel strongly (program) ordered memory model are high on the desirable feature list.

AArch64 was specifically designed to be a competing server processor and thus supports unaligned accesses. The memory model is a bit weaker but generally provides program ordering; a small percentage of software ported from x86 may require some changes (unless it uses the appropriate C11 or C++14 capabilities).

> Many cpus implement unaligned accesses - because the designers think the
> balance between use and cost makes it appropriate. It is /not/ for
> compatibility with x86.

For ARMv8 it was _specifically_ for compatibility with X86(_64). |
David Brown <david.brown@hesbynett.no>: Aug 07 06:12PM +0200

On 07/08/2019 17:39, Bonita Montero wrote:
>> no justification for using it in code that should be portable.
> I didn't say that it's always good style, but a lot of code relies
> on it, so most platforms support it through the CPU or the OS.

Any code that relies on this is very badly written. It may be that it is common in the Windows world, where it is clear that many people have quite a poor knowledge of legal C and C++, and little concept or interest in writing clear, safe, and portable code. And it may be that MSVC, knowing that its users are often unaware of the details of their programming languages, is dumbed down to support such broken code. Attempts to use unaligned access on other compilers may fail in unexpected ways due to undefined behaviour.

> So the code formerly written for x86 will also run without changes.

Again, nonsense. The x86 is quite a "programmer friendly" ISA. It supports unaligned accesses, it has a strong memory model, it has support for many types of atomic operations. People do write code that is dependent on these features, and also dependent on compilers that have extra semantics to support non-portable coding (such as guaranteeing wrapping on signed integer overflow). Code that is written "assuming an x86 processor" will often have strange breakages on other platforms and other compilers - because the code is not portable C or C++.

Other processors do not copy these x86 features. These kinds of features can often be very expensive to implement (in terms of die size, power consumption, speed, etc.) and only exist in the x86 world because of backwards compatibility with the kind of badly written code that exists in the Windows world.

>> for compatibility with x86.
> I'll bet no one would have implemented this on other CPUs if there
> weren't a lot of code written on x86 machines that relies on this.

Bet whatever you like. But don't quit your day job.

>> These "exotic" systems far and away outnumber the PC's of this world.
> Most of these trap unaligned accesses through the OS.

Can you give any kind of a reference for even a single case where you know the OS will trap unaligned accesses and emulate them in software? If not, then I think we can dispense with the fantasy that OS's provide support for unaligned access when the cpu does not.

>> compiler use whatever instructions are supported and most efficient
>> for the platform.
> That's your taste of proper programming, but not the facts.

Quote the section in the C++ standards that says unaligned access is allowed in C++, and I'll believe you. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 07 06:33PM +0200

> Any code that relies on this is very badly written.

Depends on which platforms you target.

> It may be that it is common in the Windows world, where it is clear
> that many people have quite a poor knowledge of legal C and C++,

Unaligned accesses do not come from straight coding. You must code with special alignment directives or with pointer casting. So the developers that use unaligned accesses know what they're doing and they know the target platforms.

> integer overflow). Code that is written "assuming an x86 processor"
> will often have strange breakages on other platforms and other compilers
> - because the code is not portable C or C++.

Maybe it will break because of other features; but the unaligned accesses themselves are mostly de-facto portable to the target platforms.

> Can you give any kind of a reference for even a single case where you
> know the OS will trap unaligned accesses and emulate them in software?

It's just a tiny task to support this in an OS and it helps to run a lot of old code; so this is very likely.

>> That's your taste of proper programming, but not the facts.
> Quote the section in the C++ standards that says unaligned access is
> allowed in C++, and I'll believe you.

My statement was related to your taste of proper programming and not to the standard. You're simply one of those compulsive and intolerant programmers. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 07 06:38PM +0200

> provides program ordering; a small percentage of software ported from x86
> may require some changes (unless it uses the appropriate C11 or C++14
> capabilities).

I don't think that ARM is considering AArch64 implementations as a server competitor. It's just convenient to have unaligned loads / stores for persistence and network transfers. |
scott@slp53.sl.home (Scott Lurndal): Aug 07 05:34PM

>> capabilities).
> I don't think that ARM is considering AArch64 implementations as a
> server competitor.

It doesn't matter what you think. You can't even be troubled to properly attribute your posts.

AArch64 was specifically designed as a server-capable processor. I was there. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 07 08:07PM +0200

> properly attribute your posts.
> AArch64 was specifically designed as a server-capable processor.
> I was there.

All attempts to establish AArch64-based machines as servers have failed, e.g. this one: https://en.wikipedia.org/wiki/Calxeda

People simply want x86 and in rare cases SPARC or POWER. And I doubt that AArch64 was designed mainly with servers in mind. A 64-bit address space has advantages even on smartphones with <= 4GB RAM. |
ram@zedat.fu-berlin.de (Stefan Ram): Aug 07 04:52PM

> x = a/b
> versus
> x = a//b

In VBA it's

x = a/b

versus

x = a\b

which has the elegance of using the same symbols as used in mathematics. (A backslash-like symbol, in mathematics, is used for set difference, quotient group, and integral division.) |
Manfred <noname@add.invalid>: Aug 07 04:05PM +0200

Premise: I agree with you on most of your arguments about overflow.

I also note that the authors do explicitly define wrapping for signed integers (section 2, [basic.fundamental]), but also explicitly leave out of scope (section 3) /compiler/ behaviour with wrapping on signed integers, which makes the proposal inherently flawed, I believe. It sounds like they know there is a problem with the matter, yet they want signed integer wrapping as defined behaviour, but do not want to address the consequences of this decision.

On 8/6/2019 2:32 PM, David Brown wrote:
> /types/ for which overflow is defined (unsigned types) and types for
> which it is not defined. But overflow behaviour should be part of the
> operations, not the types.

Here I tend to disagree, or at least I am fine with the behaviour being attached to the types. You are probably recalling that in ASM overflow is part of the instruction, but I think this is because the ASM type system is obviously much more rudimentary than in higher level languages.

Programming languages are meant to translate human logic into machine instructions, and we are used to operations that behave differently depending on the type they are performed on - see e.g. addition on real and complex numbers. From this perspective, in binary arithmetic it does make sense that addition behaves differently for signed and unsigned integers.

On the other hand, having some sort of "+" and "ǂ" would complicate the syntax, and be overly redundant too: you would have to specify the behavior of non-wrapping addition on unsigned integers, and wrapping addition on signed integers as well - this would bring more confusion than help, IMHO. |
David Brown <david.brown@hesbynett.no>: Aug 07 05:00PM +0200

On 07/08/2019 16:05, Manfred wrote:
> It sounds like they know there is a problem with the matter, yet they
> want signed integer wrapping as defined behaviour, but do not want to
> address the consequences of this decision.

Yes. But the link was to the first draft of the proposal - I made another post with a link to a later version, where the idea of defining signed overflow is dropped.

>> operations, not the types.
> Here I tend to disagree, or at least I am fine with the behaviour being
> attached to the types.

I think it would be impractical, in general, to have overflow behaviour attached to operations rather than types - but it is the operations that have the behaviour. It would be entirely possible to put together classes and some operator overloads that would let you write things like:

int a, b, c;
c = a +wrapping+ b;
c = a -saturating- b;

and so on. But I suspect people would find that too verbose for most uses. Hence we have the current solution.

> You are probably recalling that in ASM overflow is part of the
> instruction, but I think this is because the ASM type system is
> obviously much more rudimentary than in higher level languages.

It was not what I was thinking of, no. (There are several reasons for assembly arithmetic operations working the way they do and having the flags they do, at least on some cpus.)

> and complex numbers.
> From this perspective, in binary arithmetic it does make sense that
> addition behaves differently for signed and unsigned integers.

Certainly some aspects of behaviour have to depend on the operand types and the result types. But the behaviour is not fully defined by them. In mathematics, when you divide two integers you can decide if you want the result rounded/truncated to an integer, or expressed as a rational, or perhaps as a real number. It is the operation that determines this, not the operand types. When you have two unsigned integers and subtract them, you could decide the result should be a signed integer rather than an unsigned integer - it is the operation that determines it.

Your choice of wrapping, saturating, trapping, ignoring overflow, etc., is a matter of the operation, independent of the types. For practical reasons (which I am mostly happy with), C says that when the operands are unsigned types (after integer promotion) the operation is carried out as a wrapping operation, while for signed types (after promotion), overflow is UB.

> behavior of non-wrapping addition on unsigned integers, and wrapping
> addition on signed integers as well - this would bring more confusion
> than help, IMHO.

I agree that practicality forces the language to use types to determine the operations you get from +, -, etc. But the overflow behaviour is part of the operation, not the type. |
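[Editorial aside: the `a +wrapping+ b` idea above can actually be approximated in today's C++ with a tag type and two operator overloads. The sketch below is purely illustrative (the names wrapping and wrap_lhs are mine, not from any proposal): it performs the addition in the corresponding unsigned type, which wraps modulo 2^N, then converts back.]

#include <limits>
#include <type_traits>

// Tag object that appears between the two '+' signs.
struct wrapping_t {};
inline constexpr wrapping_t wrapping{};

// Holds the left operand after `a + wrapping`.
template <typename T>
struct wrap_lhs { T lhs; };

template <typename T>
constexpr wrap_lhs<T> operator+(T lhs, wrapping_t) { return {lhs}; }

template <typename T>
constexpr T operator+(wrap_lhs<T> l, T rhs) {
    using U = std::make_unsigned_t<T>;
    // Unsigned arithmetic wraps modulo 2^N; converting back to T is
    // well defined (two's complement) since C++20 and
    // implementation-defined before that.
    return static_cast<T>(static_cast<U>(l.lhs) + static_cast<U>(rhs));
}

int main() {
    int a = std::numeric_limits<int>::max(), b = 1;
    int c = a + wrapping + b;   // wraps to INT_MIN instead of undefined behaviour
    return c == std::numeric_limits<int>::min() ? 0 : 1;
}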
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Aug 07 06:39PM +0200 On 07.08.2019 17:00, David Brown wrote: > I think it would be unpractical, in general, to have overflow behaviour > attached to operations rather than types - but it is the operations that > have the behaviour. Consider in Python, x = a/b versus x = a//b ... where the former is always, reliably, floating point division, and the latter is always, reliably, integer division. IMO that's nice. The C++ way with operator behavior influenced by types just trips up people. > c = a +wrapping+ b; > c = a -saturating- b; > and so on. For this I would consider the C# `checked` keyword. <url: https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/checked> There is AFAIK nothing corresponding to saturation, but by default integer arithmetic is modulo. Using the `checked` keyword like checked { c = a + b; } ... one specifies overflow checking, with a well defined exception, namely `System.OverflowException`, in case of overflow. The default in C++ is instead UB for overflow. So in C++ one could also have keyword `wrapping`. I.e, keywords/contexts `wrapping` and `checked`, plus maybe `unchecked` for the case where someone uses a compiler option to change default to `wrapping` or `checked`, but one really wants the possible optimization and efficiency of guaranteed UB. > But I suspect people would find that too verbose for most uses. Yeah, but it's like the old proof that doors are practically impossible, by envisioning one particular door that's obviously very impractical. For doors there is existence proof that they're not practically impossible. And ditto for type-independent reliable operator behavior, in particular the C# approach. > Hence we have the current solution. No, for sure. But more my opinion: it's more historical, that some decades ago the optimization possibilities one could give the compiler for stuff like this, did matter. Today there is existence proof, in particular of Java outperforming C++ in certain cases, that it not only does not matter but can be a directly counter-productive approach. [snip more] Cheers!, - Alf |
scott@slp53.sl.home (Scott Lurndal): Aug 07 02:41PM

>> obtain, which is likely to require use of the CPU. So why not just use
>> threads without AIO?
> It is better to use either async or blocking I/O but not to mix.

For the same file descriptor or for the program? Why? Use whatever is necessary to provide the required functionality and performance.

> difference there is if such translation from blocking to async I/O
> was made by glibc or ourselves (other than less work for
> ourselves when glibc did it)?

What's wrong with using blocking I/O for some files and async I/O for others? (Or even both on the same file, for that matter.) |