Wednesday, June 1, 2016

Digest for comp.lang.c++@googlegroups.com - 13 updates in 2 topics

Lynn McGuire <lmc@winsim.com>: Jun 01 04:36PM -0500

"C++ Performance: Common Wisdoms and Common "Wisdoms""
http://ithare.com/c-performance-common-wisdoms-and-common-wisdoms/
 
"Author: "No Bugs" Hare"
"Job Title: Sarcastic Architect"
"Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek"
 
I have got to admit, I am very guilty of premature optimization.
 
Lynn
Jerry Stuckle <jstucklex@attglobal.net>: May 31 08:24PM -0400

On 5/31/2016 4:37 PM, Ian Collins wrote:
> well as fiber. Intel 10/40 GbE chips are becoming mainstream on Xeon
> server boards. I use 10 GbE copper in my home network which will
> happily run at full speed.
 
And you didn't answer the question, either. How like a troll.
 
>> data concurrently at full speed.
 
> They can, common 8 port SAS/SATA controllers use 8xPCIe lanes which
> provide ample bandwidth for 8 or 16 SATA drives.
 
Try again. Only one controller can access the data bus at a time. 16 SATAs
cannot transfer data concurrently at full speed.
 
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
Ian Collins <ian-news@hotmail.com>: Jun 01 06:04PM +1200

On 06/ 1/16 12:24 PM, Jerry Stuckle wrote:
>> server boards. I use 10 GbE copper in my home network which will
>> happily run at full speed.
 
> And you didn't answer the question, either.
 
If you knew anything about the data centre space you would know about
100GbE networking.
 
 
>> They can, common 8 port SAS/SATA controllers use 8xPCIe lanes which
>> provide ample bandwidth for 8 or 16 SATA drives.
 
> Try again. Only one controller can access the data bus at a time.
 
And controllers have 8, 16 and even 24 channels.
 
> 16 SATAs
> cannot transfer data concurrently at full speed.
 
Well mine do.
 
There's an esoteric concept in the storage world you probably haven't
encountered in your 90s world called "RAID".
 
--
Ian
Juha Nieminen <nospam@thanks.invalid>: Jun 01 06:12AM

>> project, be my guest. But don't be telling people that it's a problem
>> for *all* C++ programmers, because it isn't. That's just a big fat lie.
 
> Just because you don't doesn't mean it's not important.
 
Then switch to another language, if it's so important to you.
How hard is that to understand?
 
My point is: Only a small fraction of C++ programmers need to work on
codebases that large. It's not a reason to tell them all that C++ is
bad because some megaproject X takes a long time to compile. That's a
completely stupid complaint. It doesn't affect the majority of people.
 
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
scott@slp53.sl.home (Scott Lurndal): Jun 01 12:55PM

>> provide ample bandwidth for 8 or 16 SATA drives.
 
>Try again. Only one controller can access the data bus at a time. 16 SATAs
>cannot transfer data concurrently at full speed.
 
The front-side bus has been obsolete for over a decade.
 
Modern hardware has no problem transferring from 16 SATA _controllers_
simultaneously.
legalize+jeeves@mail.xmission.com (Richard): Jun 01 04:34PM

[Please do not mail me a copy of your followup]
 
Even though I have Jerry in my KILL file, I am finding Ian's responses
quite enjoyable humor :).
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
The Terminals Wiki <http://terminals.classiccmp.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>
Jerry Stuckle <jstucklex@attglobal.net>: Jun 01 12:58PM -0400

On 6/1/2016 2:12 AM, Juha Nieminen wrote:
 
>> Just because you don't doesn't mean it's not important.
 
> Then switch to another language, if it's so important to you.
> How hard is that to understand?
 
Why should I switch when it's not important to you?
 
> bad because some megaproject X takes a long time to compile. That's a
> completely stupid complaint. It doesn't affect the majority of people.
 
> --- news://freenews.netfront.net/ - complaints: news@netfront.net ---
 
Just because you don't have a problem with it doesn't mean other
programmers don't. And yes, there are many programmers who work on
larger projects and it affects them. It doesn't take long.
 
It's not my problem that the biggest program you ever wrote was "Hello
World".
 
 
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
Jerry Stuckle <jstucklex@attglobal.net>: Jun 01 01:11PM -0400

On 6/1/2016 2:04 AM, Ian Collins wrote:
 
>> And you didn't answer the question, either.
 
> If you knew anything about the data centre space you would know about
> 100GbE networking.
 
I know about data center space. But that is limited to the data center.
How many places in the world have 100Gb/s networking external to a data
center?
 
>>> provide ample bandwidth for 8 or 16 SATA drives.
 
>> Try again. Only one controller can access the data bus at a time.
 
> And controllers have 8, 16 and even 24 channels.
 
Of which only one can access the data bus at a time.
 
>> 16 SATAs
>> cannot transfer data concurrently at full speed.
 
> Well mine do.
 
You only *think* yours do.
 
> There's an esoteric concept in the storage world you probably haven't
> encountered in your 90s world called "RAID".
 
That's right. And here are some esoteric concepts in the storage world
you probably haven't encountered in your 70's world:
 
Seek command
Get response
Read/write block()
Get response
 
Even on an SSD, the above takes time - several cycles to interpret and
process the commands and send the response. With a hard disk, even
RAID, it takes even longer. Now the data may already be in the RAID's
cache (in which case it will operate at speeds near to - but not quite
as fast as - an SSD), but even then the cache size is limited. And even
an 8 disk RAID can't keep up; eventually it will have to go to disk, and
even the fastest disks max out at around 175MB/s.
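 
Put numbers on it (round figures, not measurements): 16 SATA III ports are
nominally 16 x 600MB/s = 9.6GB/s of interface bandwidth, but eight spinning
disks behind them deliver at most about 8 x 175MB/s = 1.4GB/s once the cache
is exhausted - nowhere near the interface speed.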
 
Plus, if the controller uses DMA to transfer to memory, it has to
interleave memory access with the processor. Fine if the processor
isn't doing anything, but not so good if it is. And if it doesn't use
DMA, the processor has to perform the transfer.
 
You only THINK you are getting full speed. Even manufacturers admit the
data transfer rates they quote are theoretical maximums.
 
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
Jerry Stuckle <jstucklex@attglobal.net>: Jun 01 01:13PM -0400

On 6/1/2016 8:55 AM, Scott Lurndal wrote:
 
> The front-side bus has been obsolete for over a decade.
 
> Modern hardware has no problem transferring from 16 SATA _controllers_
> simultaneously.
 
Try again. Only one can access the bus at a time. You do not have 16
separate address buses to the same memory. And even the memory chips
only have one address and one data bus.
 
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
scott@slp53.sl.home (Scott Lurndal): Jun 01 05:37PM


>Try again. Only one can access the bus at a time. You do not have 16
>separate address buses to the same memory. And even the memory chips
>only have one address and one data bus.
 
 
What bus? Everything is either connected point to point through a non-blocking
crossbar, or in the case of Intel processors, all elements sit on a pair
of rings. The bandwidth of the ring is sufficient to support all active
data sources and sinks at the same time. That means all devices
can transfer data simultaneously to memory (and the cache hierarchy will snoop
the accesses and invalidate any cached data along the way). There are multiple
memory controllers and multiple DIMMs on the memory subsystem side, and
sufficient bandwidth to support the crossbar and/or ring structures in
the uncore/RoC. Your key word for the day is "interleave".
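 
Back-of-envelope, with round numbers of my own choosing (four DDR4-2400
channels and 16 drives each sustaining the ~550MB/s practical SATA III
ceiling), the drives don't come close to what the memory side can absorb:
 
  // Rough bandwidth comparison - illustrative numbers, not a benchmark.
  #include <cstdio>
 
  int main() {
      constexpr double gb          = 1e9;     // bytes per GB
      constexpr double mem_channel = 19.2e9;  // DDR4-2400: 2400 MT/s * 8 bytes
      constexpr int    channels    = 4;
      constexpr double mem_bw      = mem_channel * channels;  // ~76.8 GB/s
      constexpr double sata_drive  = 550e6;   // practical SATA III per drive
      constexpr int    drives      = 16;
      constexpr double drive_bw    = sata_drive * drives;     // ~8.8 GB/s
      std::printf("drives %.1f GB/s vs memory %.1f GB/s (%.0f%%)\n",
                  drive_bw / gb, mem_bw / gb, 100.0 * drive_bw / mem_bw);
      return 0;
  }
 
That's roughly a tenth of the memory bandwidth before the caches absorb
anything, which is why interleaving works.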
scott@slp53.sl.home (Scott Lurndal): Jun 01 05:58PM

>Get response
>Read/write block()
>Get response
 
Actually, the 70's were the last point in time when anyone,
ever, did separate seeks and read/writes. They've been combined
into a single operation for 30+ years on every hardware type that
matters (IDE, SATA, SCSI, SAS, FC, NVMe). Even in the 70's, they
were combined for everyone except IBM's CKD (Burroughs stopped using
discrete seeks in the late 60's).
 
 
>Even on an SSD, the above takes time - several cycles to interpret and
>process the commands and send the response.
 
Even on an SSD attached to a bog-standard SATA controller, that's not
true. One submits a FIS (Frame Information Structure) to the controller
that identifies the direction of transfer and the desired starting sector
on the device. The controller will DMA data to/from memory independently
of the CPU, even handling non-physically-contiguous memory regions. The
driver can queue many FIS's to the controller hardware. A simple port
multiplier with a handful of modern SSD's can saturate 6Gbps
third-generation SATA lanes.
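 
If you want to see how little the host actually does per command, here's a
rough C++ rendering of the 20-byte Register Host-to-Device FIS from the
SATA/AHCI specs (layout sketch only; the field and function names are mine,
not from any driver):
 
  #include <cstdint>
 
  // Register FIS - host to device. Layout per the SATA spec; names are
  // illustrative, this is not a driver.
  struct RegH2DFis {
      std::uint8_t fis_type;          // 0x27 = Register H2D
      std::uint8_t pmport_c;          // bits 0-3: PM port, bit 7: command
      std::uint8_t command;           // ATA command, e.g. 0x25 = READ DMA EXT
      std::uint8_t feature_low;
      std::uint8_t lba0, lba1, lba2;  // LBA bits 0-23
      std::uint8_t device;            // bit 6 = LBA addressing
      std::uint8_t lba3, lba4, lba5;  // LBA bits 24-47
      std::uint8_t feature_high;
      std::uint8_t count_low, count_high;
      std::uint8_t icc;
      std::uint8_t control;
      std::uint8_t reserved[4];
  };
  static_assert(sizeof(RegH2DFis) == 20, "Register H2D FIS is 20 bytes");
 
  // Build a READ DMA EXT for 'count' sectors starting at 48-bit 'lba'.
  RegH2DFis make_read(std::uint64_t lba, std::uint16_t count) {
      RegH2DFis f{};
      f.fis_type = 0x27;
      f.pmport_c = 0x80;              // this is a command FIS
      f.command  = 0x25;              // READ DMA EXT
      f.device   = 0x40;              // LBA mode
      f.lba0 = lba & 0xff;            f.lba1 = (lba >> 8) & 0xff;
      f.lba2 = (lba >> 16) & 0xff;    f.lba3 = (lba >> 24) & 0xff;
      f.lba4 = (lba >> 32) & 0xff;    f.lba5 = (lba >> 40) & 0xff;
      f.count_low  = count & 0xff;
      f.count_high = (count >> 8) & 0xff;
      return f;
  }
 
The driver drops something like that, plus a scatter/gather list of buffer
addresses, into one of the controller's command slots; the controller moves
the payload by DMA and the CPU never touches it.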
 
Modern systems use NVMe controllers instead of SATA; NVMe has a new driver
interface that provides even lower overhead (up to 64K queues, each up to
64K entries deep) and much better throughput with higher bandwidth.
 
https://en.wikipedia.org/wiki/NVM_Express
 
 
>interleave memory access with the processor. Find if the processor
>isn't doing anything, but not so good if it is. And if it doesn't use
>DMA, the processor has to perform the transfer.
 
You really don't understand processor design. Firstly, the caches
(and most systems see 90%+ hit rates) somewhat isolate the processors from
the memory subsystem. Secondly, the bandwidth to the memory subsystem
is designed from the start to be sufficient to support simultaneous access
from the cache subsystem (refills and evictions of dirty lines) and
the I/O subsystem hardware.
 
When we were building supercomputers, where memory bandwidth is king,
the requirement was something like 1 byte of bandwidth per flop (a machine
sustaining N floating-point operations per second wants roughly N bytes per
second of memory and interconnect bandwidth), and we were able to do that
using QDR InfiniBand as the interconnect with modern memory controllers.
 
We regularly measure the actual data rates to ensure that we meet the line
rate under load. Our 40Gbps interfaces get 36+Gbps for TCP packets
(the 64b/66b encoding and TCP headers eat some of the bandwidth), measured and sustained,
with full processor utilization (there are 48 cores).
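 
The arithmetic roughly checks out if you account for the overheads the way
they're described above (assuming a 1500-byte MTU and TCP timestamps):
40 * 64/66 is about 38.8Gbps after line encoding, and a full-size frame
carries ~1448 payload bytes out of ~1538 on the wire (preamble,
Ethernet/IP/TCP headers, FCS, inter-frame gap), about 94%. 38.8 * 0.94
lands right around 36.5Gbps.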
Jerry Stuckle <jstucklex@attglobal.net>: Jun 01 04:08PM -0400

On 6/1/2016 1:58 PM, Scott Lurndal wrote:
> matters (IDE, SATA, SCSI, SAS, FC, NvME). Even in the 70's, they
> were combined for everyone except IBM's CKD (Burroughs stopped using
> discrete seeks in the late 60's).
 
Sometimes yes, sometimes no. Both exist, and are used.
 
> driver can queue many FIS's to the controller hardware. A simple port
> multiplier with a handful of modern SSD's can saturate 6Gbps
> third-generation SATA lanes.
 
That is correct. But it still takes several cycles to do so. The
response is not immediate. Just like most processor instructions take
multiple cycles.
 
> interface that provides even lower overhead (64k queues, 64k entries deep)
> and much better throughput with higher bandwidth.
 
> https://en.wikipedia.org/wiki/NVM_Express
 
Which does not change the facts.
 
> is designed from the start to be sufficient to support simultaneous access
> from the cache subsystem (refills and evictions of dirty lines) and
> the I/O subsystem hardware.
 
Oh, yes, I understand it all right. Your 90% hit rate is for
instruction retrieval, not data access. And even then, the caches must
be loaded, which requires memory access. And no memory system supports
concurrent access.
 
The maximum speed can occur in bursts - but even the disk manufacturers
admit it's only a theoretical maximum, not real-world.
 
> the requirement was something like 1 byte per flop, and we were able
> to do that using QDR infiniband as the interconnect with modern memory
> controllers.
 
1 byte per floating operation per second? That doesn't even make sense.
 
> rate under load. Our 40Gbps interfaces get 36+Gbps for TCP packets
> (the 64b/66b encoding and TCP headers eat some of the bandwidth), measured and sustained,
> with full processor utilization (there are 48 cores).
 
Yes, the packets may be 36Gbps, but the data rate is not. And it is not
the 40Gbps you previously claimed.
 
And for how long did you measure it? A few milliseconds?
 
And it still doesn't measure up to mainframe speeds.
 
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
Jerry Stuckle <jstucklex@attglobal.net>: Jun 01 04:22PM -0400

On 6/1/2016 1:37 PM, Scott Lurndal wrote:
> memory controllers and multiple DIMMs on the memory subsystem side, and
> sufficient bandwidth to support the crossbar and/or ring structures in
> the uncore/RoC. Your key word for the day is "interleave".
 
Here's something simple for you to understand. It's not a complete
description, but I'm not going to write a novel on how memory works.
 
Memory chips have two sets of lines. They have address lines and they
have data lines. The address lines are used to access a particular
memory location, and the data lines are used to read or write to that
location.
 
Now, the memory chips in the computer are connected together. Some
address lines are wired directly in parallel, while others go through
additional circuitry to select the desired chip(s). But all chips share
at least some address lines. The same goes for data lines - chips can share
the same data lines, but only one chip is active at a time on any given
data line. These are called the address and data buses, respectively.
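 
To make that selection circuitry concrete, here's a toy decode (a made-up
bit split with made-up names, not any particular chipset) showing how a
flat physical address turns into the signals that drive those lines:
 
  #include <cstdint>
  #include <cstdio>
 
  // Toy DRAM address decode - illustrative only; real memory controllers
  // pick (and often hash) these field positions per platform.
  struct DramAddress {
      unsigned column;   // column within the open row
      unsigned bank;     // bank inside the selected chips
      unsigned row;      // row driven onto the shared address lines
      unsigned rank;     // which set of chips gets its chip-select asserted
  };
 
  DramAddress decode(std::uint64_t phys) {
      DramAddress a;
      a.column = (phys >> 3)  & 0x3FF;    // 10 column bits (8-byte accesses)
      a.bank   = (phys >> 13) & 0x7;      // 3 bank bits
      a.row    = (phys >> 16) & 0xFFFF;   // 16 row bits
      a.rank   = (phys >> 32) & 0x3;      // 2 chip-select bits
      return a;
  }
 
  int main() {
      DramAddress a = decode(0x123456789ULL);
      std::printf("rank %u, bank %u, row %#x, column %#x\n",
                  a.rank, a.bank, a.row, a.column);
      return 0;
  }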
 
Only one device (including the CPUs) can place an address on the bus at
any one time. Devices can interleave - but that means that all other
devices must wait while that device is accessing the memory. It doesn't
make any difference how many DIMMs and how many memory controllers there
are - the chip's design restricts access to one unit at a time.
 
So no, you cannot run all of your I/O concurrently at full speed - no
matter what you claim.
 
But then it's just what I would expect from you based on your earlier
claims.
 
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.
