David Brown <david.brown@hesbynett.no>: Aug 29 08:14AM +0200

On 29/08/18 00:02, Vir Campestris wrote:
> 8085 the IPC was around 2/3, and the speed was similar to a Z80. Many
> instructions took 4 clocks, quite a lot an extra 3, and not many any
> more than that.

4 clocks per instruction means an IPC of 0.25, not "around 2/3". And if my memory serves, that was the minimum cycle count for an instruction on the Z80A. Unlike some processors of the time (like the 6502), there was no pipelining in the Z80A. The 4-clock instructions were the short register-to-register instructions (it had quite a few registers), but many instructions used prefixes, and memory reads and writes quickly added to the times. On the other hand, it did handle a fair number of 16-bit instructions and had some powerful addressing modes, which was a boost compared to other 8-bit devices of the era.

> 8 bit operands and no HW divide would also slow you down a bit!

HW divide is often overrated - division is rarely used, and in many older designs the hardware division is slow and very costly in real estate. (On one sub-family of the 68k, the hardware division instruction was dropped when it was discovered that software routines were faster!). HW multiply is another matter - and the Z80A did not have hardware multiplication.

Rosario19 <Ros@invalid.invalid>: Aug 29 09:23AM +0200

On Tue, 28 Aug 2018 15:49:33 +0100, Mr Flibble wrote:
>"I'd say, bone cancer in children?

Because people use carcinogenic substances, first of all - the more dangerous radiation, alpha, beta, and gamma. In 90% of cases I think it is a human choice, under human responsibility.

>that is not our fault. It's not right, it's utterly, utterly evil."
>"Why should I respect a capricious, mean-minded, stupid God who creates a
>world that is so full of injustice and pain. That's what I would say."

It's all a test; there is something to prove. The time is little here compared to Eternity.

boltar@cylonHQ.com: Aug 29 08:25AM

On Tue, 28 Aug 2018 17:18:42 +0100
>> see many takers frankly.
>The troll returns. You can't see many takers because you literally have
>no clue as to how to use C++ properly as evidenced by your previous posts

And it would seem you have no clue about human nature. For most sorting functions users already have to write the copy constructor and assignment operator along with the comparator; now with your sort they have to write the swapper too! What's left - the core sorting algorithm. Big deal, they might as well write that themselves as well! A quicksort or shell sort is all of 20 lines of code max, especially if you're not even going to bother to explain exactly what 2 of the functions in your swapper actually do. "Returns a sparse array" is not documentation.

>to this group. I wasn't suggesting that you should stop using OOP all
>together but that it is just one tool in toolbox and data-oriented design
>is making a comeback due to the nature of modern hardware. You obviously

Data-oriented design never went away among people doing to-the-metal coding. There's a good reason the core of most OSes and device drivers is written in C and assembler, not C++.

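For scale, a bare-bones in-place quicksort really is around that size. A rough sketch in C++ (Lomuto partition, no pivot selection or other refinements; a generic illustration, not the sort being argued about here):

    #include <cstddef>   // std::ptrdiff_t
    #include <utility>   // std::swap
    #include <vector>

    // Minimal in-place quicksort of v[lo..hi]; assumes T is copyable and has <.
    template <typename T>
    void quicksort(std::vector<T>& v, std::ptrdiff_t lo, std::ptrdiff_t hi)
    {
        if (lo >= hi) return;
        const T pivot = v[hi];                    // last element as pivot
        std::ptrdiff_t i = lo;
        for (std::ptrdiff_t j = lo; j < hi; ++j)
            if (v[j] < pivot)
                std::swap(v[i++], v[j]);          // grow the "less than pivot" prefix
        std::swap(v[i], v[hi]);                   // pivot into its final slot
        quicksort(v, lo, i - 1);
        quicksort(v, i + 1, hi);
    }

Called as quicksort(v, 0, static_cast<std::ptrdiff_t>(v.size()) - 1); it sorts ascending - about fifteen lines, in line with the estimate above.
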
boltar@cylonHQ.com: Aug 29 08:32AM

On Wed, 29 Aug 2018 08:14:46 +0200
>estate. (On one sub-family of the 68k, the hardware division
>instruction was dropped when it was discovered that software routines
>were faster!).

Sounds like they got the intern to write the microcode there. Software should never be faster than hardware to do the same thing on the same CPU.

>HW multiply is another matter - and the Z80A did not have hardware
>multiplication.

Well, to be pedantic it had both hardware and software division so long as you only wanted to do so in multiples of 2 :)

David Brown <david.brown@hesbynett.no>: Aug 29 10:52AM +0200

>> were faster!).
> Sounds like they got the intern to write the microcode there. Software should
> never be faster than hardware to do the same thing on the same CPU.

That sounds like you don't understand the nature of processor design. It is unusual to have a situation like this, but it happens. The basic order of events goes like this:

1. You have a design where everything takes quite a number of clock cycles. (This was from the beginning of the 1980's, remember.)

2. Doing division in software is very slow, so you implement a hardware divider. This is also slow - a fast divider would take an inordinate amount of space in an era where a 40K transistor count was big. But it is a lot faster than using software routines.

3. New family members take advantage of newer technologies and larger transistor counts - pipelining, wider buses and ALUs, caches and buffers all conspire to give you a tenfold or more increase in IPC for the important common instructions. The division hardware might get a revision improving its speed by 2 or 3 times - but a fast divider is still too big to justify.

4. At some point, the software is faster than the hardware divider. So the hardware divider is dropped.

5. Later, it becomes practical to have a hardware divider again - transistors are smaller, newer algorithms are available, and you can make a hardware divider that is a good deal faster than the software. For many big processors, you therefore have a fast hardware divider (but still much slower than most operations). For the 68k descendants, low power and low cost outweighed the benefits of a hardware divider that was rarely used in real software.

You see the same thing in other complex functions. Big processors used to have all sorts of transcendental floating point functions in hardware - now they are often done in software because that gives a better cost-benefit ratio.

>> multiplication.
> Well, to be pedantic it had both hardware and software division so long as
> you only wanted to do so in multiples of 2 :)

That is not what anyone means by hardware multiplier.

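To make the software side of that trade-off concrete, here is a sketch of the classic restoring (shift-and-subtract) division loop of the kind a runtime library falls back on when there is no usable divide instruction - plain register work that speeds up with every general IPC improvement, while a slow iterative divider block does not. The function name and operand widths are invented for the illustration; real library routines are considerably more optimised:

    #include <cstdint>
    #include <cstdio>

    // 32-bit unsigned restoring division: one trial subtraction per bit.
    // Division by zero is simply left as quotient 0, remainder 0 here.
    void soft_divide(std::uint32_t num, std::uint32_t den,
                     std::uint32_t& quot, std::uint32_t& rem)
    {
        quot = 0;
        std::uint64_t r = 0;                        // one bit wider than the operands
        if (den != 0) {
            for (int i = 31; i >= 0; --i) {
                r = (r << 1) | ((num >> i) & 1u);   // bring down the next bit of num
                if (r >= den) {                     // trial subtraction succeeds
                    r -= den;
                    quot |= (1u << i);              // set this quotient bit
                }
            }
        }
        rem = static_cast<std::uint32_t>(r);
    }

    int main()
    {
        std::uint32_t q, r;
        soft_divide(1000000007u, 12345u, q, r);
        std::printf("%u r %u (built-in: %u r %u)\n",
                    (unsigned)q, (unsigned)r,
                    (unsigned)(1000000007u / 12345u), (unsigned)(1000000007u % 12345u));
    }
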
boltar@cylonHQ.com: Aug 29 09:03AM

On Wed, 29 Aug 2018 10:52:08 +0200
>> Sounds like they got the intern to write the microcode there. Software should
>> never be faster than hardware to do the same thing on the same CPU.
>That sounds like you don't understand the nature of processor design.

I think you missed the point that anything that can be done in software can also be done in microcode on the die itself, otherwise you'd be claiming that software can execute magic hardware functions that the hardware itself can't access! Ergo, whoever wrote the microcode for the division royally fucked up.

>> Well, to be pedantic it had both hardware and software division so long as
>> you only wanted to do so in multiples of 2 :)
>That is not what anyone means by hardware multiplier.

*sigh* Yes, I know, hence the smiley. Did it need to be signposted? But non-rotational bit shifting is multiplying/dividing by 2 and was often used as a shortcut in assembler.

David Brown <david.brown@hesbynett.no>: Aug 29 11:20AM +0200

> also be done in microcode on the die itself, otherwise you'd be claiming that
> software can execute magic hardware functions that the hardware itself can't
> access! Ergo, whoever wrote the microcode for the division royally fucked up.

Again, it sounds like you don't understand the nature of processor design. Modern processors do not use microcode for most instructions - many do not have microcode at all. It is a /long/ time since cpu designs used microcode for basic register-ALU-register instructions.

Software has access to features that the hardware blocks do not - unless the hardware replicates such features. It can use the registers, the ALU, loop accelerators, caches, multiple execution units, register renames, and all the other smart hardware that makes cpus fast today. A hardware block cannot access any of these - because they are in use by the rest of the processor in parallel, and because the hardware interlocks and multiplexers needed to allow their usage would greatly affect the performance for all other code.

>>> you only wanted to do so in multiples of 2 :)
>> That is not what anyone means by hardware multiplier.
> *sigh* Yes, I know, hence the smiley. Did it need to be signposted?

Adding a smiley does not make an incorrect statement correct. If it had been remotely funny, interesting, observant or novel, it would have been fine.

> But non rotational bit shifting is multiplying/dividing by 2 and was often used
> as a short cut in assembler.

Yes, I think everyone already knows that.

boltar@cylonHQ.com: Aug 29 09:43AM

On Wed, 29 Aug 2018 11:20:26 +0200
>> access! Ergo, whoever wrote the microcode for the division royally fucked up.
>Again, it sounds like you don't understand the nature of processor design.
>Modern processors do not use microcode for most instructions - many do

Either you're stupid or you're just being an ass for the sake of arguing. Call it microcode, call it micro-ops, it's the same thing. What would you call the RISC-type instructions an x86 instruction gets converted into in Intel processors, for example?

>the rest of the processor in parallel, and because the hardware
>interlocks and multiplexers needed to allow their usage would greatly
>affect the performance for all other code.

Well that told me, clearly it's impossible to implement fast division in hardware then!

>>> That is not what anyone means by hardware multiplier.
>> *sigh* Yes, I know, hence the smiley. Did it need to be signposted?
>Adding a smiley does not make an incorrect statement correct. If it had

Except it wasn't incorrect, was it.

>been remotely funny, interesting, observant or novel, it would have
>been fine.

Do yourself a favour and pull that rod out of your backside. Unless you're just another aspie robot who doesn't get tongue in cheek.

>used
>> as a short cut in assembler.
>Yes, I think everyone already knows that.

You just said it was incorrect, do try and make your mind up. Get back to me when you've managed it.

Juha Nieminen <nospam@thanks.invalid>: Aug 29 10:16AM

> also be done in microcode on the die itself, otherwise you'd be claiming that
> software can execute magic hardware functions that the hardware itself can't
> access!

Hardware doesn't necessarily always implement the theoretically fastest implementation of complex operations. For example, multiplication (integer or floating point) can be done in one single clock cycle, but that requires a very large amount of chip space (because it requires a staggering amount of transistors). Making a compromise where multiplication takes 2 or 3 clock cycles reduces this physical chip area requirement exponentially.

> Ergo, whoever wrote the microcode for the division royally fucked up.

There may be other reasons why a hardware implementation of something might be slower than an alternative software implementation. One reason might be accuracy, or the exact type of operations that need to be performed in certain exceptional situations. Floating point calculations can be an example of this (where eg. IEEE standard-compliant calculations might require more complex operations than would be necessary for the application at hand).

Another good example is the RDRAND opcode in newer Intel processors. Depending on the processor, it can be 20 times slower than Mersenne Twister. On the other hand, there are reasons why it's slower.

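The RDRAND comparison is easy to reproduce. A rough sketch, assuming an x86-64 compiler that exposes the _rdrand64_step intrinsic from <immintrin.h> (e.g. gcc or clang built with -mrdrnd); the exact ratio depends heavily on the processor:

    #include <immintrin.h>   // _rdrand64_step
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <random>

    int main()
    {
        constexpr int N = 1000000;
        volatile std::uint64_t sink = 0;     // stop the loops being optimised away

        auto t0 = std::chrono::steady_clock::now();
        std::mt19937_64 mt{12345};
        for (int i = 0; i < N; ++i) sink = mt();

        auto t1 = std::chrono::steady_clock::now();
        for (int i = 0; i < N; ++i) {
            unsigned long long r;
            while (!_rdrand64_step(&r)) { }  // retry on the (rare) failure case
            sink = r;
        }
        auto t2 = std::chrono::steady_clock::now();

        using ms = std::chrono::duration<double, std::milli>;
        std::printf("mt19937_64: %.1f ms   RDRAND: %.1f ms\n",
                    ms(t1 - t0).count(), ms(t2 - t1).count());
    }
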
Juha Nieminen <nospam@thanks.invalid>: Aug 29 10:17AM

> Well that told me, clearly its impossible to implement fast division in
> hardware then!

Is it possible to implement fast *accurate* division in hardware? Is the software implementation giving the exact same result in every possible situation?

David Brown <david.brown@hesbynett.no>: Aug 29 12:51PM +0200

> Call it microcode, call it microops, its the same thing. What would you
> call the risc type instructions an x86 instruction gets converted into in
> intel processors for example?

Micro-ops. They are completely, totally and utterly different from microcode.

It's fine that you don't know about this sort of thing. Few people do - the details of cpu architecture are irrelevant to most C++ programmers. If you want to know more about this, I am happy to explain what microcode and micro-ops are. But please stop making wild assertions.

>> affect the performance for all other code.
> Well that told me, clearly its impossible to implement fast division in
> hardware then!

No, clearly it /is/ possible to implement fast division in hardware. But it is not necessarily cost-effective to do so. There is a vast difference between what is possible, and what is practical or sensible.

>>> *sigh* Yes, I know, hence the smiley. Did it need to be signposted?
>> Adding a smiley does not make an incorrect statement correct. If it had
> Except it wasn't incorrect was it.

Yes, it was. No matter what numbers I might want to multiply by or divide by, the Z80A did not have hardware multiplication or hardware division. Saying it can multiply and divide "in multiples of 2" does not change that. (It's not clear what you mean by "in multiples of 2". Perhaps you meant "by powers of 2". It would still be questionable how much hardware support the Z80A has for them, since it could only shift and rotate one step at a time in a single instruction.)

>> been fine.
> Do yourself a favour and pull that rod out of your backside. Unless you're
> just another aspie robot who doesn't get tongue in cheek.

Tongue in cheek is fine. But don't expect people to be particularly impressed by your insults.

>> Yes, I think everyone already knows that.
> You just said it was incorrect, do try and make your mind up. Get back to
> me when you've managed it.

You use bit-shifts for multiplying or dividing by 2 (being particularly careful with signs). That does not make a bit-shifter a multiplier or divider.

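The "careful with signs" caveat is concrete: integer division truncates toward zero, while an arithmetic right shift rounds toward negative infinity, so x / 2 and x >> 1 disagree for negative odd values. A small check (right-shifting a negative value is implementation-defined before C++20, but is an arithmetic shift on essentially every current compiler):

    #include <cstdio>

    int main()
    {
        for (int x : { 7, 6, -6, -7 }) {
            // -7 / 2 == -3 (truncation), but -7 >> 1 == -4 (floor).
            std::printf("x = %2d   x / 2 = %2d   x >> 1 = %2d\n", x, x / 2, x >> 1);
        }
    }
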
boltar@cylonHQ.com: Aug 29 11:03AM

On Wed, 29 Aug 2018 12:51:20 +0200
>> intel processors for example?
>Micro-ops. They are completely, totally and utterly different from
>microcode.

No, they're really not. An assembler instruction is broken down into lower-level instructions that are directly interpreted by the hardware in both cases. The only difference is micro-ops are a slightly higher level than microcode, but the paradigm is exactly the same.

>> just another aspie robot who doesn't get tongue in cheek.
>Tongue in cheek is fine. But don't expect people to be particularly
>impressed by your insults.

Can dish it out but can't take it? The usual story on usenet.

>You use bit-shifts for multiplying or dividing by 2 (being particularly
>careful with signs). That does not make a bit-shifter a multiplier or
>divider.

There is no difference between multiplying by 2 and shifting 1 bit to the left, or dividing by 2 and shifting 1 bit to the right, other than some processor-specific carry or overflow flag settings afterwards. Argue the toss all you want, it's not up for debate.

David Brown <david.brown@hesbynett.no>: Aug 29 01:04PM +0200

On 29/08/18 12:17, Juha Nieminen wrote:
> Is it possible to implement fast *accurate* division in hardware?
> Is the software implementation giving the exact same result in every
> possible situation?

Yes. There are many methods, with different balances between speed (both latency and throughput), die space, power requirements, and complexity. A single-precision floating point divider can be faster than a double-precision divider, but is less accurate in absolute terms - however, it will still be accurate for the resolution of the numbers provided.

And yes, these will give exactly the same results as a matching software implementation in every possible situation. For integer division, it's easy. For floating point, with rounding, it gets a bit more complicated - but it is all precisely specified in the IEEE standards (for floating point hardware following those standards - as most do).

There are other floating point operations and combinations of operations that are much worse. These can need quite complicated software libraries to get exactly matching results - libraries optimised for this matching rather than for speed. gcc (and probably other compilers) use such libraries so that they can do compile-time calculations that are bit-perfect results of the equivalent run-time calculations, even when the compiler and the target are different processors.

David Brown <david.brown@hesbynett.no>: Aug 29 01:08PM +0200

On 29/08/18 12:16, Juha Nieminen wrote:
>> access!
> Hardware doesn't necessarily always implement the theoretically fastest
> implementation of complex operations.

No, indeed.

> (because it requires a staggering amount of transistors). Making a
> compromise where multiplication takes 2 or 3 clock cycles reduces this
> physical chip area requirement exponentially.

Exactly.

> calculations can be an example of this (where eg. IEEE standard-compliant
> calculations might require more complex operations than would be necessary
> for the application at hand).

It is not uncommon, especially in processors with smaller dies, to have a hardware implementation for normal finite floating point operations, but throw a trap to software emulation for NaNs, denormals, or other more unusual values. That gives a good balance between speed for common operations without undue costs for rare ones.

> Another good example is the RDRAND opcode in newer Intel processors.
> Depending on the processor, it can be 20 times slower than Mersenne
> Twister. On the other hand, there are reasons why it's slower.

Yes - the opcode /looks/ like it gives more "random" numbers than a pseudo-random generator, but really it feeds out a sequence the NSA can predict... (No, I don't believe that.)

boltar@cylonHQ.com: Aug 29 11:25AM

On Wed, 29 Aug 2018 13:08:39 +0200
>Yes - the opcode /looks/ like it gives more "random" numbers than a
>pseudo-random generator, but really it feeds out a sequence the NSA can
>predict...

Surely it can't be too hard to implement a noise-based truly random number generator on a CPU by now?

David Brown <david.brown@hesbynett.no>: Aug 29 01:35PM +0200

> level instructions that are directly interpreted by the hardware in both
> cases. The only difference is micro ops are a slightly higher level than
> microcode but the paradigm is exactly the same.

Micro-ops are used in processors with complex CISC ISAs with variable-length instructions. The instructions in an ISA like the x86 are inherently complicated, mixing address calculations, loads, operations, stores, and register updates in the same instruction. This is painful to deal with in a cpu with pipelining, multiple execution units, speculative execution, etc. So the early stages of the instruction decode break down an instruction from something equivalent to:

    a += x[i++ + 4]

into

    r0 = i
    r1 = r0 + 4
    r2 = x[r1]
    r3 = a
    r4 = r3 + r2
    a = r4
    r5 = r0 + 1
    i = r5

Each of these is a RISC-style instruction with a single function, and will be encoded in a straightforward, easy-to-parse format, so that the rest of the processor looks like a RISC cpu. The details of the micro-op "instruction set" are often independent of cpu implementation details such as the number of execution units.

Microcode instructions are much lower level. They are not used in modern processors (except, possibly, modern implementations of some old designs). In a fully microcoded cpu, the source instructions are used to call routines in microcode, stored in a ROM. The microcode instructions themselves are very wide - they can be hundreds of bits wide - with a very direct connection to the exact hardware. These directly control things like the register-to-ALU multiplexers, latch enables, gate inputs, and other details. Microcode is impractical for normal operations on processors with multiple execution units.

Complicated modern processors - whether they are RISC ISA or use micro-ops to have a RISC core - can have a kind of microcode. This is basically routines stored in ROM (or flash or ram) for handling rare, complex operations - or sometimes operations that have a single ISA instruction but require multiple cycles (like a push/pop multiple register operation). The instructions in this "microcode" are of the same form as normal RISC (or micro-op) instructions, perhaps slightly extended with access to a few internal registers.

>> Tongue in cheek is fine. But don't expect people to be particularly
>> impressed by your insults.
> Can dish it out but can't take it? The usual story on usenet.

I haven't insulted you, unless you count quoting your own words back at you.

> left or dividing by 2 and shifting 1 bit to the right, other than some
> processor specific carry or overflow flag settings afterwards. Argue the toss
> all you want, its not up for debate.

No one is debating that. But suggesting that having a bit shift means you have a multiplier and a divider is not up for debate either - it was nonsense when you said it, and nonsense it remains. Smiley or no smiley.

David Brown <david.brown@hesbynett.no>: Aug 29 01:46PM +0200

>> predict...
> Surely it can't be too hard to implement a noise based truly random number
> generator on a CPU by now?

"True" random number generators have been designed using many different principles. I think thermal noise over a reverse-biased diode is one common method. Another is the interaction of unsynchronised oscillators. These can be good sources of entropy, but are not necessarily a good source of random numbers. Random numbers need a known distribution - typically, you want to start with a nice linear distribution and then shape it according to application needs. You also need your random data at a fast enough rate, again according to application need. A typical method of linearising or "whitening" your entropy source is to use it to seed a pseudo-random generator - such as a Mersenne twister.

When I say opcodes like RDRAND "look" more random, what I mean is that people often think such hardware sources are somehow more "random" than a purely software solution. In real usage, however, when people want "random" numbers they usually want one or both of two things - an unpredictable sequence (i.e., after rolling 2, 3, then 4, you can't tell what the next roll will be, nor can you guess what the roll before was), and a smooth distribution (i.e., rolling 1, 1, 1 should be as common as rolling 5, 2, 4). You can get the unpredictable sequences quite happily with a good pseudo-random generator regularly re-seeded from network traffic timings or other entropy sources - and these are often more linearly distributed than hardware sources.

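That "entropy source seeds a PRNG, the PRNG whitens and shapes" pattern is what the C++ standard library already packages up. A minimal sketch (std::random_device may itself be backed by RDRAND, an OS entropy pool, or - on some implementations - a PRNG):

    #include <cstdio>
    #include <random>

    int main()
    {
        std::random_device entropy;            // implementation-defined entropy source
        std::mt19937 gen{entropy()};           // Mersenne twister seeded from it

        // Shape the raw stream into the distribution the application wants.
        std::uniform_int_distribution<int> die(1, 6);
        for (int i = 0; i < 10; ++i)
            std::printf("%d ", die(gen));
        std::printf("\n");
    }
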
scott@slp53.sl.home (Scott Lurndal): Aug 29 01:10PM

>also be done in microcode on the die itself, otherwise you'd be claiming that
>software can execute magic hardware functions that the hardware itself can't
>access! Ergo, whoever wrote the microcode for the division royally fucked up.

What modern processor is microcoded? X86/AMD64 has a very small bit of microcode to handle certain non-performance-related management functions, but the math instructions are all implemented in gates. Our 64-core ARM64 processor has no microcode.

scott@slp53.sl.home (Scott Lurndal): Aug 29 01:12PM

>Call it microcode, call it microops, its the same thing. What would you
>call the risc type instructions an x86 instruction gets converted into in
>intel processors for example?

Instruction fission is in no way microcode, nor is instruction fusion (i.e. combining adjacent instructions, e.g. test + conditional branch, in the fetch stage).

scott@slp53.sl.home (Scott Lurndal): Aug 29 01:15PM

>> Well that told me, clearly its impossible to implement fast division in
>> hardware then!
>Is it possible to implement fast *accurate* division in hardware?

Of course it is. It's been possible for a couple of decades, with very low latency (< 5 cycles for many processors).

scott@slp53.sl.home (Scott Lurndal): Aug 29 01:20PM

>level instructions that are directly interpreted by the hardware in both
>cases. The only difference is micro ops are a slightly higher level than
>microcode but the paradigm is exactly the same.

That's a layman's description suitable for laymen. It's not what actually happens in the processor, however.

The processor, when it fetches certain instructions, will either pass them directly to the execution pipeline engines (subject to dependency analysis), or will fission them into multiple operations that can be executed in parallel by multiple engines in the pipeline, or will fuse multiple instructions into a single operation that can be executed by one of the pipeline engines. None of this is controlled by any form of programmable microcode - it's implemented directly in gates.

Juha Nieminen <nospam@thanks.invalid>: Aug 29 01:37PM

> When I say opcodes like RDRAND "look" more random, what I mean is that
> people often think such hardware sources are somehow more "random" than
> a purely software solution.

I suppose that it comes down to whether the stream of random numbers is deterministic (and completely predictable given the initial conditions), or whether the numbers are impossible to predict, no matter what information you have.

Any cryptographically strong PRNG is completely indistinguishable from a "true" source of randomness, using almost any form of measurement you may conjure. (Given two very large streams of numbers produced by both methods, it's impossible to tell for certain which one was generated with a software PRNG and which one is from a "true" source of randomness.)

However, as said, I suppose people instinctively object to the notion that PRNGs always produce the same results when the initial conditions are the same, and thus think of it as "less random".

David Brown <david.brown@hesbynett.no>: Aug 29 03:45PM +0200

On 29/08/18 15:37, Juha Nieminen wrote:
> is deterministic (and completely predictable given the initial conditions),
> or whether the numbers are impossible to predict, no matter what
> information you have.

Yes, that is a difference. Pseudo-random generators are deterministic, but (if they are good algorithms and wide enough numbers) unpredictable unless you know the seed numbers. True random sources can't be predicted at all. But as long as the seed numbers are kept safe (or change with real entropy), there is no way to distinguish the two.

> both methods, it's impossible to tell for certain which one was
> generated with a software PRNG and which one is from a "true" source
> of randomness.)

Exactly.

> However, as said, I suppose people instinctively object to the notion
> that PRNGs always produce the same results when the initial conditions
> are the same, and thus think of it as "less random".

Yes. (There are other newsgroups where there are people vastly more versed in randomness and cryptography than me, if you want to know more or discuss more.)

boltar@cylonHQ.com: Aug 29 01:49PM

On Wed, 29 Aug 2018 13:35:20 +0200
>On 29/08/18 13:03, boltar@cylonHQ.com wrote:
>> Can dish it out but can't take it? The usual story on usenet.
>I haven't insulted you, unless you count quoting your own words back at you.

Being patronising is being insulting, however much you'd like to pretend otherwise.

>No one is debating that. But suggesting that having a bit shift means
>you have a multiplier and a divider is not up for debate either - it was
>nonsense when you said it, and nonsense it remains. Smiley or no smiley.

Ok, I think we've now established you really don't understand what tongue in cheek actually means. Don't worry about it.

boltar@cylonHQ.com: Aug 29 01:58PM

On Wed, 29 Aug 2018 13:37:07 -0000 (UTC)
>However, as said, I suppose people instinctively object to the notion
>that PRNGs always produce the same results when the initial conditions
>are the same, and thus think of it as "less random".

That's because they're not random, they're chaotic - two entirely different things. A chaotic system given the EXACT same starting parameters WILL produce exactly the same outcome, though a tiny change in those parameters (a different seed) will produce an entirely different result. With a truly random system the starting parameters are irrelevant; the sequence cannot be made to repeat.

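That determinism is trivial to demonstrate with the standard Mersenne twister - same seed, same sequence; a slightly different seed gives an entirely different-looking stream (the seed values below are arbitrary):

    #include <cassert>
    #include <cstdio>
    #include <random>

    int main()
    {
        std::mt19937 a{42}, b{42};       // identical seeds
        std::mt19937 c{43};              // seed differs by one

        for (int i = 0; i < 1000; ++i)
            assert(a() == b());          // same starting parameters, same outcome

        std::printf("first output, seed 42: %lu   seed 43: %lu\n",
                    static_cast<unsigned long>(std::mt19937{42}()),
                    static_cast<unsigned long>(c()));
    }
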