David Brown <david.brown@hesbynett.no>: Aug 29 08:14AM +0200

On 29/08/18 00:02, Vir Campestris wrote:
> 8085 the IPC was around 2/3, and the speed was similar to a Z80. Many
> instructions took 4 clocks, quite a lot an extra 3, and not many any
> more than that.

4 clocks per instruction means an IPC of 0.25, not "around 2/3". And if my memory serves, that was the minimum cycle count for an instruction on the Z80A. Unlike some processors of the time (like the 6502), there was no pipelining in the Z80A. The 4-clock instructions were the short register-to-register instructions (it had quite a few registers), but many instructions used prefixes, and memory reads and writes quickly added to the times. On the other hand, it did handle a fair number of 16-bit instructions and had some powerful addressing modes, which was a boost compared to other 8-bit devices of the era.

> 8 bit operands and no HW divide would also slow you down a bit!

HW divide is often overrated - division is rarely used, and in many older designs the hardware division is slow and very costly in real estate. (On one sub-family of the 68k, the hardware division instruction was dropped when it was discovered that software routines were faster!). HW multiply is another matter - and the Z80A did not have hardware multiplication.

Rosario19 <Ros@invalid.invalid>: Aug 29 09:23AM +0200

On Tue, 28 Aug 2018 15:49:33 +0100, Mr Flibble wrote:
>"I'd say, bone cancer in children?

Because people use carcinogenic substances, first of all - the more dangerous radiation, alpha, beta, and gamma. In 90% of cases I think it is a human choice, under human responsibility.

>that is not our fault. It's not right, it's utterly, utterly evil."
>"Why should I respect a capricious, mean-minded, stupid God who creates a
>world that is so full of injustice and pain. That's what I would say."

It's all a test; there is something to prove. The time is little here compared to Eternity.

boltar@cylonHQ.com: Aug 29 08:25AM

On Tue, 28 Aug 2018 17:18:42 +0100
>> see many takers frankly.
>The troll returns. You can't see many takers because you literally have
>no clue as to how to use C++ properly as evidenced by your previous posts

And it would seem you have no clue about human nature. For most sorting functions users already have to write the copy constructor and assignment operator along with the comparator; now with your sort they have to write the swapper too! What's left - the core sorting algorithm. Big deal, they might as well write that themselves as well! A quicksort or shell sort is all of 20 lines of code max, especially if you're not even going to bother to explain exactly what 2 of the functions in your swapper actually do. "Returns a sparse array" is not documentation.

>to this group. I wasn't suggesting that you should stop using OOP all
>together but that it is just one tool in toolbox and data-oriented design
>is making a comeback due to the nature of modern hardware. You obviously

Data-oriented design never went away among people doing to-the-metal coding. There's a good reason the core of most OSes and device drivers is written in C and assembler, not C++.

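For scale, a bare-bones in-place quicksort really is around that size. A rough sketch in C++ (Lomuto partition, no pivot selection or other refinements; a generic illustration, not the sort being argued about here):

    #include <cstddef>   // std::ptrdiff_t
    #include <utility>   // std::swap
    #include <vector>

    // Minimal in-place quicksort of v[lo..hi]; assumes T is copyable and has <.
    template <typename T>
    void quicksort(std::vector<T>& v, std::ptrdiff_t lo, std::ptrdiff_t hi)
    {
        if (lo >= hi) return;
        const T pivot = v[hi];                    // last element as pivot
        std::ptrdiff_t i = lo;
        for (std::ptrdiff_t j = lo; j < hi; ++j)
            if (v[j] < pivot)
                std::swap(v[i++], v[j]);          // grow the "less than pivot" prefix
        std::swap(v[i], v[hi]);                   // pivot into its final slot
        quicksort(v, lo, i - 1);
        quicksort(v, i + 1, hi);
    }

Called as quicksort(v, 0, static_cast<std::ptrdiff_t>(v.size()) - 1); it sorts ascending - about fifteen lines, in line with the estimate above.
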
boltar@cylonHQ.com: Aug 29 08:32AM

On Wed, 29 Aug 2018 08:14:46 +0200
>estate. (On one sub-family of the 68k, the hardware division
>instruction was dropped when it was discovered that software routines
>were faster!).

Sounds like they got the intern to write the microcode there. Software should never be faster than hardware to do the same thing on the same CPU.

>HW multiply is another matter - and the Z80A did not have hardware
>multiplication.

Well, to be pedantic it had both hardware and software division so long as you only wanted to do so in multiples of 2 :)

David Brown <david.brown@hesbynett.no>: Aug 29 10:52AM +0200

>> were faster!).
> Sounds like they got the intern to write the microcode there. Software should
> never be faster than hardware to do the same thing on the same CPU.

That sounds like you don't understand the nature of processor design. It is unusual to have a situation like this, but it happens. The basic order of events goes like this:

1. You have a design where everything takes quite a number of clock cycles. (This was from the beginning of the 1980's, remember.)

2. Doing division in software is very slow, so you implement a hardware divider. This is also slow - a fast divider would take an inordinate amount of space in an era where a 40K transistor count was big. But it is a lot faster than using software routines.

3. New family members take advantage of newer technologies and larger transistor counts - pipelining, wider buses and ALUs, caches and buffers all conspire to give you a tenfold or more increase in IPC for the important common instructions. The division hardware might get a revision improving its speed by 2 or 3 times - but a fast divider is still too big to justify.

4. At some point, the software is faster than the hardware divider. So the hardware divider is dropped.

5. Later, it becomes practical to have a hardware divider again - transistors are smaller, newer algorithms are available, and you can make a hardware divider that is a good deal faster than the software. For many big processors, you therefore have a fast hardware divider (but still much slower than most operations). For the 68k descendants, low power and low cost outweighed the benefits of a hardware divider that was rarely used in real software.

You see the same thing in other complex functions. Big processors used to have all sorts of transcendental floating point functions in hardware - now they are often done in software because that gives a better cost-benefit ratio.

>> multiplication.
> Well, to be pedantic it had both hardware and software division so long as
> you only wanted to do so in multiples of 2 :)

That is not what anyone means by hardware multiplier.

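To make the software side of that trade-off concrete, here is a sketch of the classic restoring (shift-and-subtract) division loop of the kind a runtime library falls back on when there is no usable divide instruction - plain register work that speeds up with every general IPC improvement, while a slow iterative divider block does not. The function name and operand widths are invented for the illustration; real library routines are considerably more optimised:

    #include <cstdint>
    #include <cstdio>

    // 32-bit unsigned restoring division: one trial subtraction per bit.
    // Division by zero is simply left as quotient 0, remainder 0 here.
    void soft_divide(std::uint32_t num, std::uint32_t den,
                     std::uint32_t& quot, std::uint32_t& rem)
    {
        quot = 0;
        std::uint64_t r = 0;                        // one bit wider than the operands
        if (den != 0) {
            for (int i = 31; i >= 0; --i) {
                r = (r << 1) | ((num >> i) & 1u);   // bring down the next bit of num
                if (r >= den) {                     // trial subtraction succeeds
                    r -= den;
                    quot |= (1u << i);              // set this quotient bit
                }
            }
        }
        rem = static_cast<std::uint32_t>(r);
    }

    int main()
    {
        std::uint32_t q, r;
        soft_divide(1000000007u, 12345u, q, r);
        std::printf("%u r %u (built-in: %u r %u)\n",
                    (unsigned)q, (unsigned)r,
                    (unsigned)(1000000007u / 12345u), (unsigned)(1000000007u % 12345u));
    }
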
boltar@cylonHQ.com: Aug 29 09:03AM

On Wed, 29 Aug 2018 10:52:08 +0200
>> Sounds like they got the intern to write the microcode there. Software should
>> never be faster than hardware to do the same thing on the same CPU.
>That sounds like you don't understand the nature of processor design.

I think you missed the point that anything that can be done in software can also be done in microcode on the die itself, otherwise you'd be claiming that software can execute magic hardware functions that the hardware itself can't access! Ergo, whoever wrote the microcode for the division royally fucked up.

>> Well, to be pedantic it had both hardware and software division so long as
>> you only wanted to do so in multiples of 2 :)
>That is not what anyone means by hardware multiplier.

*sigh* Yes, I know, hence the smiley. Did it need to be signposted? But non-rotational bit shifting is multiplying/dividing by 2 and was often used as a shortcut in assembler.

David Brown <david.brown@hesbynett.no>: Aug 29 11:20AM +0200

> also be done in microcode on the die itself, otherwise you'd be claiming that
> software can execute magic hardware functions that the hardware itself can't
> access! Ergo, whoever wrote the microcode for the division royally fucked up.

Again, it sounds like you don't understand the nature of processor design. Modern processors do not use microcode for most instructions - many do not have microcode at all. It is a /long/ time since cpu designs used microcode for basic register-ALU-register instructions.

Software has access to features that the hardware blocks do not - unless the hardware replicates such features. It can use the registers, the ALU, loop accelerators, caches, multiple execution units, register renames, and all the other smart hardware that makes cpus fast today. A hardware block cannot access any of these - because they are in use by the rest of the processor in parallel, and because the hardware interlocks and multiplexers needed to allow their usage would greatly affect the performance for all other code.

>>> you only wanted to do so in multiples of 2 :)
>> That is not what anyone means by hardware multiplier.
> *sigh* Yes, I know, hence the smiley. Did it need to be signposted?

Adding a smiley does not make an incorrect statement correct. If it had been remotely funny, interesting, observant or novel, it would have been fine.

> But non rotational bit shifting is multiplying/dividing by 2 and was often used
> as a short cut in assembler.

Yes, I think everyone already knows that.

boltar@cylonHQ.com: Aug 29 09:43AM

On Wed, 29 Aug 2018 11:20:26 +0200
>> access! Ergo, whoever wrote the microcode for the division royally fucked up.
>Again, it sounds like you don't understand the nature of processor design.
>Modern processors do not use microcode for most instructions - many do

Either you're stupid or you're just being an ass for the sake of arguing. Call it microcode, call it micro-ops, it's the same thing. What would you call the RISC-type instructions an x86 instruction gets converted into in Intel processors, for example?

>the rest of the processor in parallel, and because the hardware
>interlocks and multiplexers needed to allow their usage would greatly
>affect the performance for all other code.

Well that told me, clearly it's impossible to implement fast division in hardware then!

>>> That is not what anyone means by hardware multiplier.
>> *sigh* Yes, I know, hence the smiley. Did it need to be signposted?
>Adding a smiley does not make an incorrect statement correct. If it had

Except it wasn't incorrect, was it.

>been remotely funny, interesting, observant or novel, it would have
>been fine.

Do yourself a favour and pull that rod out of your backside. Unless you're just another aspie robot who doesn't get tongue in cheek.

>used
>> as a short cut in assembler.
>Yes, I think everyone already knows that.

You just said it was incorrect, do try and make your mind up. Get back to me when you've managed it.

Juha Nieminen <nospam@thanks.invalid>: Aug 29 10:16AM

> also be done in microcode on the die itself, otherwise you'd be claiming that
> software can execute magic hardware functions that the hardware itself can't
> access!

Hardware doesn't necessarily always implement the theoretically fastest implementation of complex operations. For example, multiplication (integer or floating point) can be done in one single clock cycle, but that requires a very large amount of chip space (because it requires a staggering amount of transistors). Making a compromise where multiplication takes 2 or 3 clock cycles reduces this physical chip area requirement exponentially.

> Ergo, whoever wrote the microcode for the division royally fucked up.

There may be other reasons why a hardware implementation of something might be slower than an alternative software implementation. One reason might be accuracy, or the exact type of operations that need to be performed in certain exceptional situations. Floating point calculations can be an example of this (where eg. IEEE standard-compliant calculations might require more complex operations than would be necessary for the application at hand).

Another good example is the RDRAND opcode in newer Intel processors. Depending on the processor, it can be 20 times slower than Mersenne Twister. On the other hand, there are reasons why it's slower.

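The RDRAND comparison is easy to reproduce. A rough sketch, assuming an x86-64 compiler that exposes the _rdrand64_step intrinsic from <immintrin.h> (e.g. gcc or clang built with -mrdrnd); the exact ratio depends heavily on the processor:

    #include <immintrin.h>   // _rdrand64_step
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <random>

    int main()
    {
        constexpr int N = 1000000;
        volatile std::uint64_t sink = 0;     // stop the loops being optimised away

        auto t0 = std::chrono::steady_clock::now();
        std::mt19937_64 mt{12345};
        for (int i = 0; i < N; ++i) sink = mt();

        auto t1 = std::chrono::steady_clock::now();
        for (int i = 0; i < N; ++i) {
            unsigned long long r;
            while (!_rdrand64_step(&r)) { }  // retry on the (rare) failure case
            sink = r;
        }
        auto t2 = std::chrono::steady_clock::now();

        using ms = std::chrono::duration<double, std::milli>;
        std::printf("mt19937_64: %.1f ms   RDRAND: %.1f ms\n",
                    ms(t1 - t0).count(), ms(t2 - t1).count());
    }
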
Juha Nieminen <nospam@thanks.invalid>: Aug 29 10:17AM

> Well that told me, clearly its impossible to implement fast division in
> hardware then!

Is it possible to implement fast *accurate* division in hardware? Is the software implementation giving the exact same result in every possible situation?

David Brown <david.brown@hesbynett.no>: Aug 29 12:51PM +0200

> Call it microcode, call it microops, its the same thing. What would you
> call the risc type instructions an x86 instruction gets converted into in
> intel processors for example?

Micro-ops. They are completely, totally and utterly different from microcode.

It's fine that you don't know about this sort of thing. Few people do - the details of cpu architecture are irrelevant to most C++ programmers. If you want to know more about this, I am happy to explain what microcode and micro-ops are. But please stop making wild assertions.

>> affect the performance for all other code.
> Well that told me, clearly its impossible to implement fast division in
> hardware then!

No, clearly it /is/ possible to implement fast division in hardware. But it is not necessarily cost-effective to do so. There is a vast difference between what is possible, and what is practical or sensible.

>>> *sigh* Yes, I know, hence the smiley. Did it need to be signposted?
>> Adding a smiley does not make an incorrect statement correct. If it had
> Except it wasn't incorrect was it.

Yes, it was. No matter what numbers I might want to multiply by or divide by, the Z80A did not have hardware multiplication or hardware division. Saying it can multiply and divide "in multiples of 2" does not change that. (It's not clear what you mean by "in multiples of 2". Perhaps you meant "by powers of 2". It would still be questionable how much hardware support the Z80A has for them, since it could only shift and rotate one step at a time in a single instruction.)

>> been fine.
> Do yourself a favour and pull that rod out of your backside. Unless you're
> just another aspie robot who doesn't get tongue in cheek.

Tongue in cheek is fine. But don't expect people to be particularly impressed by your insults.

>> Yes, I think everyone already knows that.
> You just said it was incorrect, do try and make your mind up. Get back to
> me when you've managed it.

You use bit-shifts for multiplying or dividing by 2 (being particularly careful with signs). That does not make a bit-shifter a multiplier or divider.

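The "careful with signs" caveat is concrete: integer division truncates toward zero, while an arithmetic right shift rounds toward negative infinity, so x / 2 and x >> 1 disagree for negative odd values. A small check (right-shifting a negative value is implementation-defined before C++20, but is an arithmetic shift on essentially every current compiler):

    #include <cstdio>

    int main()
    {
        for (int x : { 7, 6, -6, -7 }) {
            // -7 / 2 == -3 (truncation), but -7 >> 1 == -4 (floor).
            std::printf("x = %2d   x / 2 = %2d   x >> 1 = %2d\n", x, x / 2, x >> 1);
        }
    }
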
boltar@cylonHQ.com: Aug 29 11:03AM

On Wed, 29 Aug 2018 12:51:20 +0200
>> intel processors for example?
>Micro-ops. They are completely, totally and utterly different from
>microcode.

No, they're really not. An assembler instruction is broken down into lower-level instructions that are directly interpreted by the hardware in both cases. The only difference is micro-ops are a slightly higher level than microcode, but the paradigm is exactly the same.

>> just another aspie robot who doesn't get tongue in cheek.
>Tongue in cheek is fine. But don't expect people to be particularly
>impressed by your insults.

Can dish it out but can't take it? The usual story on usenet.

>You use bit-shifts for multiplying or dividing by 2 (being particularly
>careful with signs). That does not make a bit-shifter a multiplier or
>divider.

There is no difference between multiplying by 2 and shifting 1 bit to the left, or dividing by 2 and shifting 1 bit to the right, other than some processor-specific carry or overflow flag settings afterwards. Argue the toss all you want, it's not up for debate.

David Brown <david.brown@hesbynett.no>: Aug 29 01:04PM +0200

On 29/08/18 12:17, Juha Nieminen wrote:
> Is it possible to implement fast *accurate* division in hardware?
> Is the software implementation giving the exact same result in every
> possible situation?

Yes. There are many methods, with different balances between speed (both latency and throughput), die space, power requirements, and complexity. A single-precision floating point divider can be faster than a double-precision divider, but is less accurate in absolute terms - however, it will still be accurate for the resolution of the numbers provided.

And yes, these will give exactly the same results as a matching software implementation in every possible situation. For integer division, it's easy. For floating point, with rounding, it gets a bit more complicated - but it is all precisely specified in the IEEE standards (for floating point hardware following those standards - as most do).

There are other floating point operations and combinations of operations that are much worse. These can need quite complicated software libraries to get exactly matching results - libraries optimised for this matching rather than for speed. gcc (and probably other compilers) use such libraries so that they can do compile-time calculations that are bit-perfect results of the equivalent run-time calculations, even when the compiler and the target are different processors.

David Brown <david.brown@hesbynett.no>: Aug 29 01:08PM +0200

On 29/08/18 12:16, Juha Nieminen wrote:
>> access!
> Hardware doesn't necessarily always implement the theoretically fastest
> implementation of complex operations.

No, indeed.

> (because it requires a staggering amount of transistors). Making a
> compromise where multiplication takes 2 or 3 clock cycles reduces this
> physical chip area requirement exponentially.

Exactly.

> calculations can be an example of this (where eg. IEEE standard-compliant
> calculations might require more complex operations than would be necessary
> for the application at hand).

It is not uncommon, especially in processors with smaller dies, to have a hardware implementation for normal finite floating point operations, but throw a trap to software emulation for NaNs, denormals, or other more unusual values. That gives a good balance between speed for common operations without undue costs for rare ones.

> Another good example is the RDRAND opcode in newer Intel processors.
> Depending on the processor, it can be 20 times slower than Mersenne
> Twister. On the other hand, there are reasons why it's slower.

Yes - the opcode /looks/ like it gives more "random" numbers than a pseudo-random generator, but really it feeds out a sequence the NSA can predict... (No, I don't believe that.)

boltar@cylonHQ.com: Aug 29 11:25AM

On Wed, 29 Aug 2018 13:08:39 +0200
>Yes - the opcode /looks/ like it gives more "random" numbers than a
>pseudo-random generator, but really it feeds out a sequence the NSA can
>predict...

Surely it can't be too hard to implement a noise-based truly random number generator on a CPU by now?

David Brown <david.brown@hesbynett.no>: Aug 29 01:35PM +0200

> level instructions that are directly interpreted by the hardware in both
> cases. The only difference is micro ops are a slightly higher level than
> microcode but the paradigm is exactly the same.

Micro-ops are used in processors with complex CISC ISAs with variable-length instructions. The instructions in an ISA like the x86 are inherently complicated, mixing address calculations, loads, operations, stores, and register updates in the same instruction. This is painful to deal with in a cpu with pipelining, multiple execution units, speculative execution, etc. So the early stages of the instruction decode break down an instruction from something equivalent to:

    a += x[i++ + 4]

into

    r0 = i
    r1 = r0 + 4
    r2 = x[r1]
    r3 = a
    r4 = r3 + r2
    a = r4
    r5 = r0 + 1
    i = r5

Each of these is a RISC-style instruction with a single function, and will be encoded in a straightforward, easy-to-parse format, so that the rest of the processor looks like a RISC cpu. The details of the micro-op "instruction set" are often independent of cpu implementation details such as the number of execution units.

Microcode instructions are much lower level. They are not used in modern processors (except, possibly, modern implementations of some old designs). In a fully microcoded cpu, the source instructions are used to call routines in microcode, stored in a ROM. The microcode instructions themselves are very wide - they can be hundreds of bits wide - with a very direct connection to the exact hardware. These directly control things like the register-to-ALU multiplexers, latch enables, gate inputs, and other details. Microcode is impractical for normal operations on processors with multiple execution units.

Complicated modern processors - whether they are RISC ISA or use micro-ops to have a RISC core - can have a kind of microcode. This is basically routines stored in ROM (or flash or ram) for handling rare, complex operations - or sometimes operations that have a single ISA instruction but require multiple cycles (like a push/pop multiple register operation). The instructions in this "microcode" are of the same form as normal RISC (or micro-op) instructions, perhaps slightly extended with access to a few internal registers.

>> Tongue in cheek is fine. But don't expect people to be particularly
>> impressed by your insults.
> Can dish it out but can't take it? The usual story on usenet.

I haven't insulted you, unless you count quoting your own words back at you.

> left or dividing by 2 and shifting 1 bit to the right, other than some
> processor specific carry or overflow flag settings afterwards. Argue the toss
> all you want, its not up for debate.

No one is debating that. But suggesting that having a bit shift means you have a multiplier and a divider is not up for debate either - it was nonsense when you said it, and nonsense it remains. Smiley or no smiley.

David Brown <david.brown@hesbynett.no>: Aug 29 01:46PM +0200

>> predict...
> Surely it can't be too hard to implement a noise based truly random number
> generator on a CPU by now?

"True" random number generators have been designed using many different principles. I think thermal noise over a reverse-biased diode is one common method. Another is the interaction of unsynchronised oscillators. These can be good sources of entropy, but are not necessarily a good source of random numbers. Random numbers need a known distribution - typically, you want to start with a nice linear distribution and then shape it according to application needs. You also need your random data at a fast enough rate, again according to application need. A typical method of linearising or "whitening" your entropy source is to use it to seed a pseudo-random generator - such as a Mersenne twister.

When I say opcodes like RDRAND "look" more random, what I mean is that people often think such hardware sources are somehow more "random" than a purely software solution. In real usage, however, when people want "random" numbers they usually want one or both of two things - an unpredictable sequence (i.e., after rolling 2, 3, then 4, you can't tell what the next roll will be, nor can you guess what the roll before was), and a smooth distribution (i.e., rolling 1, 1, 1 should be as common as rolling 5, 2, 4). You can get the unpredictable sequences quite happily with a good pseudo-random generator regularly re-seeded from network traffic timings or other entropy sources - and these are often more linearly distributed than hardware sources.

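That "entropy source seeds a PRNG, the PRNG whitens and shapes" pattern is what the C++ standard library already packages up. A minimal sketch (std::random_device may itself be backed by RDRAND, an OS entropy pool, or - on some implementations - a PRNG):

    #include <cstdio>
    #include <random>

    int main()
    {
        std::random_device entropy;            // implementation-defined entropy source
        std::mt19937 gen{entropy()};           // Mersenne twister seeded from it

        // Shape the raw stream into the distribution the application wants.
        std::uniform_int_distribution<int> die(1, 6);
        for (int i = 0; i < 10; ++i)
            std::printf("%d ", die(gen));
        std::printf("\n");
    }
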
scott@slp53.sl.home (Scott Lurndal): Aug 29 01:10PM

>also be done in microcode on the die itself, otherwise you'd be claiming that
>software can execute magic hardware functions that the hardware itself can't
>access! Ergo, whoever wrote the microcode for the division royally fucked up.

What modern processor is microcoded? X86/AMD64 has a very small bit of microcode to handle certain non-performance-related management functions, but the math instructions are all implemented in gates. Our 64-core ARM64 processor has no microcode.

scott@slp53.sl.home (Scott Lurndal): Aug 29 01:12PM

>Call it microcode, call it microops, its the same thing. What would you
>call the risc type instructions an x86 instruction gets converted into in
>intel processors for example?

Instruction fission is in no way microcode, nor is instruction fusion (i.e. combining adjacent instructions, e.g. test + conditional branch, in the fetch stage).

scott@slp53.sl.home (Scott Lurndal): Aug 29 01:15PM

>> Well that told me, clearly its impossible to implement fast division in
>> hardware then!
>Is it possible to implement fast *accurate* division in hardware?

Of course it is. It's been possible for a couple of decades, with very low latency (< 5 cycles for many processors).

scott@slp53.sl.home (Scott Lurndal): Aug 29 01:20PM

>level instructions that are directly interpreted by the hardware in both
>cases. The only difference is micro ops are a slightly higher level than
>microcode but the paradigm is exactly the same.

That's a layman's description suitable for laymen. It's not what actually happens in the processor, however.

The processor, when it fetches certain instructions, will either pass them directly to the execution pipeline engines (subject to dependency analysis), or will fission them into multiple operations that can be executed in parallel by multiple engines in the pipeline, or will fuse multiple instructions into a single operation that can be executed by one of the pipeline engines. None of this is controlled by any form of programmable microcode - it's implemented directly in gates.

Juha Nieminen <nospam@thanks.invalid>: Aug 29 01:37PM

> When I say opcodes like RDRAND "look" more random, what I mean is that
> people often think such hardware sources are somehow more "random" than
> a purely software solution.

I suppose that it comes down to whether the stream of random numbers is deterministic (and completely predictable given the initial conditions), or whether the numbers are impossible to predict, no matter what information you have.

Any cryptographically strong PRNG is completely indistinguishable from a "true" source of randomness, using almost any form of measurement you may conjure. (Given two very large streams of numbers produced by both methods, it's impossible to tell for certain which one was generated with a software PRNG and which one is from a "true" source of randomness.)

However, as said, I suppose people instinctively object to the notion that PRNGs always produce the same results when the initial conditions are the same, and thus think of it as "less random".

David Brown <david.brown@hesbynett.no>: Aug 29 03:45PM +0200

On 29/08/18 15:37, Juha Nieminen wrote:
> is deterministic (and completely predictable given the initial conditions),
> or whether the numbers are impossible to predict, no matter what
> information you have.

Yes, that is a difference. Pseudo-random generators are deterministic, but (if they are good algorithms and wide enough numbers) unpredictable unless you know the seed numbers. True random sources can't be predicted at all. But as long as the seed numbers are kept safe (or change with real entropy), there is no way to distinguish the two.

> both methods, it's impossible to tell for certain which one was
> generated with a software PRNG and which one is from a "true" source
> of randomness.)

Exactly.

> However, as said, I suppose people instinctively object to the notion
> that PRNGs always produce the same results when the initial conditions
> are the same, and thus think of it as "less random".

Yes. (There are other newsgroups where there are people vastly more versed in randomness and cryptography than me, if you want to know more or discuss more.)

boltar@cylonHQ.com: Aug 29 01:49PM

On Wed, 29 Aug 2018 13:35:20 +0200
>On 29/08/18 13:03, boltar@cylonHQ.com wrote:
>> Can dish it out but can't take it? The usual story on usenet.
>I haven't insulted you, unless you count quoting your own words back at you.

Being patronising is being insulting, however much you'd like to pretend otherwise.

>No one is debating that. But suggesting that having a bit shift means
>you have a multiplier and a divider is not up for debate either - it was
>nonsense when you said it, and nonsense it remains. Smiley or no smiley.

Ok, I think we've now established you really don't understand what tongue in cheek actually means. Don't worry about it.

boltar@cylonHQ.com: Aug 29 01:58PM

On Wed, 29 Aug 2018 13:37:07 -0000 (UTC)
>However, as said, I suppose people instinctively object to the notion
>that PRNGs always produce the same results when the initial conditions
>are the same, and thus think of it as "less random".

That's because they're not random, they're chaotic - two entirely different things. A chaotic system given the EXACT same starting parameters WILL produce exactly the same outcome, though a tiny change in those parameters (a different seed) will produce an entirely different result. With a truly random system the starting parameters are irrelevant; the sequence cannot be made to repeat.

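That determinism is trivial to demonstrate with the standard Mersenne twister - same seed, same sequence; a slightly different seed gives an entirely different-looking stream (the seed values below are arbitrary):

    #include <cassert>
    #include <cstdio>
    #include <random>

    int main()
    {
        std::mt19937 a{42}, b{42};       // identical seeds
        std::mt19937 c{43};              // seed differs by one

        for (int i = 0; i < 1000; ++i)
            assert(a() == b());          // same starting parameters, same outcome

        std::printf("first output, seed 42: %lu   seed 43: %lu\n",
                    static_cast<unsigned long>(std::mt19937{42}()),
                    static_cast<unsigned long>(c()));
    }
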