Friday, July 7, 2023

Digest for comp.lang.c++@googlegroups.com - 25 updates in 3 topics

Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 09:02AM +0200

On 06.07.2023 at 09:25, Bonita Montero wrote:
 
> Even with a billion file descriptors the access time would
> be the same, i.e. O(1).
 
In theory, but in practice there's a caching effect if the
hashtable fits into the cache:
 
#include <iostream>
#include <unordered_map>
#include <random>
#include <chrono>
#include <atomic>

using namespace std;
using namespace chrono;

atomic<size_t> aSum( 0 ); // sink so the summation can't be optimized away

int main()
{
    constexpr size_t
        TO = 0x100000000,
        ROUNDS = 10'000'000;
    unordered_map<size_t, size_t> map;
    mt19937_64 mt;
    for( size_t n = 1, b = 0; n <= TO; b = n, n *= 2 )
    {
        // grow the table to n entries, then time random lookups
        for( size_t i = b; i != n; ++i )
            map.emplace( i, i );
        uniform_int_distribution<size_t> uid( 0, n - 1 );
        size_t sum = 0;
        auto start = high_resolution_clock::now();
        for( size_t r = ROUNDS; r--; )
            sum += map[uid( mt )];
        double ns = duration_cast<nanoseconds>( high_resolution_clock::now() -
            start ).count() / (double)ROUNDS;
        ::aSum.fetch_add( sum );
        cout << hex << n << ": " << ns << endl;
    }
}
 
These are the results on an AMD 7950X (table sizes printed in hex):
 
1: 6.82776
2: 6.82223
4: 6.82411
8: 6.82552
10: 6.82525
20: 6.82547
40: 6.82349
80: 6.84252
100: 6.84796
200: 10.5177
400: 9.12652
800: 12.9653
1000: 15.0953
2000: 16.1944
4000: 17.4853
8000: 11.3426
10000: 13.4169
20000: 14.9261
40000: 18.6082
80000: 31.3227
100000: 63.0259
200000: 87.1134
400000: 103.188
800000: 141.211
1000000: 200.26
2000000: 308.339
4000000: 178.041
8000000: 225.446
10000000: 514.778
20000000: 223.162
40000000: 230.186
80000000: 834.93
100000000: 202.915
 
The number of steps to reach the value is always the same,
i.e. O(1), but the latency of each memory access differs
greatly with the size of the hashtable.
David Brown <david.brown@hesbynett.no>: Jul 07 10:03AM +0200

On 06/07/2023 20:50, Bonita Montero wrote:
>> That is well within the budget for a high-end server.
 
> And which server application on a server of any size can saturate
> a 100GbE-link ?
 
A file server with a couple of fast SSDs could do it.  Remember, these
are 100 Gbps links, not 100 GBps.
 
If you have a 32 core server, that's an average of about 3 Gbps per core.
 
Usually, however, saturation of the link is not the issue - just like
CPU clock speeds, it is often the peaks that matter. You don't
(typically) buy a 100 Gb link because you want to send 45 TB over the
next hour, you buy it so that you can send 1 GB in a tenth of a second.
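
A quick back-of-the-envelope check of these figures (my sketch; raw
line rate only, ignoring Ethernet/IP/TCP overhead):

#include <iostream>

int main()
{
    const double link_gbps     = 100.0;                 // 100 GbE line rate
    const double bytes_per_sec = link_gbps * 1e9 / 8.0; // = 12.5 GB/s
    const int    cores         = 32;

    std::cout << "throughput:   " << bytes_per_sec / 1e9 << " GB/s\n"
              << "per core:     " << link_gbps / cores << " Gbit/s\n"
              << "1 GB burst:   " << 1e9 / bytes_per_sec << " s\n"
              << "in one hour:  " << bytes_per_sec * 3600.0 / 1e12 << " TB\n";
}

That prints 12.5 GB/s, 3.125 Gbit/s per core, 0.08 s for a 1 GB burst,
and 45 TB in an hour - consistent with the figures above.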
 
 
AMD's latest server chips have 128 cores, and Ampere One chips have 192
cores - per socket. Do you think people with a 192 core CPU are going
to be happy with a 10 Gb link? And do you think people would be making
and selling such processors, and servers containing them, if they were
not useful?
 
Note that I said /high-end/ server. Most servers won't need anything
like that for their application sides. But if you have a cluster, and
storage external to the server, you'll easily find such speeds to be
useful. And plenty of systems in data centres, HPC systems, cloud
hosting, etc., will have links like that - or faster.
Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 11:01AM +0200

On 07.07.2023 at 10:03, David Brown wrote:
 
> A file server with a couple of fast SSDs could do it. ...
 
No one uses a fileserver with 12.5 GB/s.
 
> If you have a 32 core server, that's an average of about 3 Gbps per core.
 
LOL. You must have been bathed too hot by your mother.
Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 12:53PM +0200

On 07.07.2023 at 10:03, David Brown wrote:
 
> A file server with a couple of fast SSDs could do it.  Remember, these
> are 100 Gbps links, not 100 GBps.
 
And one thing which just came to my mind: file servers don't have any
additional load beyond I/O, so separating the flows wouldn't hurt.
scott@slp53.sl.home (Scott Lurndal): Jul 07 02:05PM

>On 07.07.2023 at 10:03, David Brown wrote:
 
>> A file server with a couple of fast SSDs could do it. ...
 
>No one uses a fileserver with 12.5 GB/s.
 
Your computing experiences seem very limited.
 
Our lab fileservers (using 25Gb, 40Gb and 100Gb network adapters)
serve hundreds of high-end multicore servers performing RTL simulations
24x7 using NFS.
 
<The childish insult disappeared>
Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 05:30PM +0200

On 07.07.2023 at 16:05, Scott Lurndal wrote:
 
> Our lab fileservers (using 25Gb, 40Gb and 100Gb network adapters)
> serve hundreds of high-end multicore servers performing RTL simulations
> 24x7 using NFS.
 
I don't believe you at all, that's pure fantasy.
But fileservers don't have high CPU load anyway.
So manual segmentation wouldn't hurt, all the more
because in a LAN segment you have jumbo frames.
kalevi@kolttonen.fi (Kalevi Kolttonen): Jul 07 03:53PM

>> serve hundreds of high-end multicore servers performing RTL simulations
>> 24x7 using NFS.
 
> I don't believe you at all, that's pure fantasy.
 
Do you happen to have any good reasons why he would lie
about their lab?
 
br,
KK
Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 06:00PM +0200

On 07.07.2023 at 17:53, Kalevi Kolttonen wrote:
 
> Do you happen to have any good reasons why he would lie
> about their lab?
 
100GbE is a backbone technology, maybe for linking switches
that fan out to lower-speed links. For which application would
you need a fileserver that can supply 12.5 GB/s?
David Brown <david.brown@hesbynett.no>: Jul 07 06:02PM +0200

On 07/07/2023 17:30, Bonita Montero wrote:
>> serve hundreds of high-end multicore servers performing RTL simulations
>> 24x7 using NFS.
 
> I don't believe you at all, that's pure fantasy.
 
Do you think Scott is deliberately lying here? That's quite the accusation.
 
It would appear you know practically nothing about servers or
networking, especially for more demanding uses. That's fine, of course
- no one knows about everything, and most people have little interest in
such things unless they actually need to know about them. However, it
is absurd to suggest that just because /you/ can't imagine how such
systems might be used, they don't exist. And it is arrogant and
obnoxious to accuse those who /do/ know, and /do/ use such systems, of
lying about it.
 
 
 
kalevi@kolttonen.fi (Kalevi Kolttonen): Jul 07 04:14PM


> 100GbE is a backbone technology, maybe for linking switches
> that fan out to lower-speed links. For which application would
> you need a fileserver that can supply 12.5 GB/s?
 
He did describe his lab setup by saying that there are
"hundreds of high-end multicore servers performing RTL
simulations 24x7 using NFS".
 
I have no idea what RTL is; perhaps it is something
to do with CPUs, I don't know. But it is obvious that what
they are doing is no joke.
 
br,
KK
James Kuyper <jameskuyper@alumni.caltech.edu>: Jul 07 12:41PM -0400

On 7/7/23 12:14, Kalevi Kolttonen wrote:
...
> "hundreds of high-end multicore servers performing RTL
> simulations 24x7 using NFS".
 
> I have no idea what RTL is, ...
Scott can tell us, of course, but it might be one of these:
<https://en.wikipedia.org/wiki/RTL#Electronics>
scott@slp53.sl.home (Scott Lurndal): Jul 07 05:00PM


>He did describe his lab setup by saying that there are
>"hundreds of high-end multicore servers performing RTL
>simulations 24x7 using NFS".
 
RTL (Register Transfer Language) aka HDL (Hardware Description Language)
aka Verilog. A hardware description language used
to design advanced (3nm process in our case) processor chips, with
lots of cores (36 or more) and the aforementioned hardware accelerator
blocks. The simulations actually simulate every gate, clock by clock, under various
directed and randomized test cases to ensure that the very expensive 3nm
process masks create a working chip. Each block is verified (simulated)
individually, groups of blocks are simulated together, and various
full-chip simulations (the entire design) are also run. Then there are
the back-end tasks - floorplanning and PnR (Place and Route) to locate each
RTL block, totaling many billions of transistors. Then there is timing closure[*],
often with margins in the tens-of-picoseconds range.
 
[*] Ensuring that signal propagation in any particular combinatorial
path doesn't exceed the clock interval.
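
A toy slack calculation to make the timing-closure footnote concrete
(my illustration - the clock and delay figures below are invented, not
from Scott's designs):

#include <iostream>

int main()
{
    const double clock_ghz     = 3.0;                // hypothetical target clock
    const double period_ps     = 1000.0 / clock_ghz; // ~333 ps per cycle
    const double path_delay_ps = 290.0;              // worst combinatorial path (made up)
    const double setup_ps      = 20.0;               // flip-flop setup time (made up)

    // Slack is what's left of the period after the signal has propagated
    // and the capturing flip-flop's setup requirement is met.
    const double slack_ps = period_ps - path_delay_ps - setup_ps;
    std::cout << "period " << period_ps << " ps, slack " << slack_ps << " ps\n";
}

Slack >= 0 means the path meets timing; here it is roughly 23 ps, i.e.
a margin in the tens of picoseconds as described above.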
 
I'd also point out that the major customers for server grade processors
are AWS, Azure, Google and Facebook; all of which use 25/40GbE (past) or 100GbE (present)
rack to rack and rack to egress/ingress pipes, and at least 10Gb to each server in the rack. As
David pointed out, with 192 core processors now available, 192 guest VMs can
easily saturate a 100GbE link on a server.
 
Meanwhile, 400GbE optical PHY modules became available a couple of years ago,
signaling that 200GbE and 400GbE deployments are soon to follow.
Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 07:01PM +0200

On 07.07.2023 at 19:00, Scott Lurndal wrote:
 
> the back-end tasks - floorplanning and PnR (Place and Route) to locate each
> RTL block, totaling many billions of transistors. Then there is timing closure[*],
> often with margins in the tens-of-picoseconds range.
 
And why should that need such large and fast transfers?
Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 07:01PM +0200

On 07.07.2023 at 18:02, David Brown wrote:
>>> 24x7 using NFS.
 
>> I don't believe you at all, that's pure fantasy.
 
> Do you think Scott is deliberately lying here? ...
 
Yes.
 
Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 07:02PM +0200

On 07.07.2023 at 18:14, Kalevi Kolttonen wrote:
 
> He did describe his lab setup by saying that there are
> "hundreds of high-end multicore servers performing RTL
> simulations 24x7 using NFS".
 
Simulations needing constant I/O at such a speed???
 
scott@slp53.sl.home (Scott Lurndal): Jul 07 05:16PM

>>> 24x7 using NFS.
 
>> I don't believe you at all, that's pure fantasy.
 
>Do you think Scott is deliberately lying here? That's quite the accusation.
 
Indeed.
 
>obnoxious to accuse those who /do/ know, and /do/ use such systems, of
>lying about it.
 
>> But fileservers don't have high CPU load anyway.
 
I would argue that this isn't actually the case. Consider,
for example, file servers that support deduplication.
 
https://www.netapp.com/data-management/what-is-data-deduplication/
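
To illustrate why deduplication eats CPU, here is a minimal sketch of
the general technique (my illustration, not NetApp's actual
implementation; std::hash stands in for a real content hash such as
SHA-256):

#include <cstddef>
#include <iostream>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

struct DedupStore
{
    std::unordered_map<std::size_t, std::size_t> index; // content hash -> block id
    std::vector<std::string> blocks;                    // unique payloads

    // Returns the id under which the block is stored; hashes every
    // incoming block (the per-write CPU cost) and stores it only if unseen.
    std::size_t write( std::string_view block )
    {
        std::size_t h = std::hash<std::string_view>{}( block );
        auto [it, inserted] = index.try_emplace( h, blocks.size() );
        if( inserted )
            blocks.emplace_back( block );
        return it->second;
    }
};

int main()
{
    DedupStore store;
    std::cout << store.write( "hello" ) << ' '
              << store.write( "world" ) << ' '
              << store.write( "hello" ) << '\n'; // prints: 0 1 0
}

A real filer would use a cryptographic digest and verify block contents
on a hit (a plain 64-bit hash can collide), plus compression and
checksumming - all CPU work done before anything reaches a disk.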
kalevi@kolttonen.fi (Kalevi Kolttonen): Jul 07 05:31PM

> RTL (Register Transfer Language)
 
When reading GCC documentation many years ago, I came
across this RTL:
 
https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gccint/RTL.html
 
Does it have anything to do with the RTL you guys use?
 
br,
KK
David Brown <david.brown@hesbynett.no>: Jul 07 07:32PM +0200

On 07/07/2023 19:01, Bonita Montero wrote:
>> closure[*],
>> often with margins in the tens-of-picoseconds range.
 
> And why should that need such large and fast transfers?
 
Since you haven't a clue what RTL simulation involves, while Scott
clearly does, why do you feel qualified to doubt him?
 
RTL simulations for these kinds of chips are /massive/ tasks. (Note that
the "36 or more cores" refers to the chips designed at Scott's company -
not the servers used for simulation, which will have as many cores as
economically feasible.) These chips will have billions of gates, which
need to be simulated. My guess is that they will be using many
thousands of processor cores spread across hundreds of machines for
doing the work, each machine having perhaps half a terabyte of ram.
Since you are simulating a single chip, each server can only simulate
part of it (in time and space), and you need a great deal of
communication between the machines to accurately model the interactions
between parts. So there will be very fast networking between the servers.
 
Then there are the files and databases involved, and output files. The
design files will be measured in gigabytes, as will the output files,
and there are hundreds of simulation servers accessing these files from
central repositories. Of course they will need massive bandwidths on
the file servers. And you will want the lowest practical latencies for
the file serving - even though file serving takes relatively little
processing power, you want to spread the different clients around the
cores of the server (with RSS) to minimise latencies. You do not want
everything bottlenecked through one core.
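
The same idea can be expressed in software: with SO_REUSEPORT on Linux,
each core runs its own accept loop on one shared port and the kernel
spreads incoming connections across them - roughly what RSS does for
packet queues in the NIC. A minimal, Linux-only sketch (my
illustration; error handling and thread-to-core pinning omitted, port
9000 is arbitrary):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <thread>
#include <vector>

// One accept loop per worker; the kernel hashes incoming connections
// across all sockets bound to the same port with SO_REUSEPORT.
static void serve_on_core( unsigned core, unsigned short port )
{
    int fd = socket( AF_INET, SOCK_STREAM, 0 );
    int one = 1;
    setsockopt( fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof one );
    sockaddr_in addr {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl( INADDR_ANY );
    addr.sin_port = htons( port );
    bind( fd, reinterpret_cast<sockaddr *>( &addr ), sizeof addr );
    listen( fd, 128 );
    for( ; ; )
    {
        int client = accept( fd, nullptr, nullptr );
        if( client < 0 )
            break;
        std::printf( "connection handled by worker %u\n", core );
        close( client ); // a real server would serve the request here
    }
}

int main()
{
    unsigned cores = std::thread::hardware_concurrency();
    std::vector<std::thread> workers;
    for( unsigned c = 0; c != cores; ++c )
        workers.emplace_back( serve_on_core, c, (unsigned short)9000 );
    for( std::thread &w : workers )
        w.join();
}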
 
All this is, of course, very expensive. But if it saves a single failed
tape-out and prototype run of the chips, it will pay for itself.
Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 07:58PM +0200

On 07.07.2023 at 19:32, David Brown wrote:
 
> part of it (in time and space), and you need a great deal of
> communication between the machines to accurately model the interactions
> between parts.  So there will be very fast networking between the servers.
 
If you did that with constant I/O on the data, rather than in RAM,
you would simulate almost nothing. Scott is lying.
kalevi@kolttonen.fi (Kalevi Kolttonen): Jul 07 06:09PM

> Scott is lying.
 
I am beginning to understand why some people have
killfiled you. I have never killfiled anybody and
never will, but please stop those ridiculous
accusations immediately. It is utter foolishness.
 
Based on several wise comments, there is absolutely
no reason to doubt Scott Lurndal's sincerity.
I am sure you are the only one who thinks he is lying.
 
br,
KK
Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 08:17PM +0200

On 07.07.2023 at 20:09, Kalevi Kolttonen wrote:
 
> killfiled you. I have never killfiled anybody and
> never will, but please stop those ridiculous
> accusations immediately. It is utter foolishness.
 
Think about the gap between I/O latency and cache/memory latency.
Imagine what the speed of a calculation on a data set constantly
fetched via I/O would be.
kalevi@kolttonen.fi (Kalevi Kolttonen): Jul 07 06:34PM


> Think about the gap between I/O latency and cache/memory latency.
> Imagine what the speed of a calculation on a data set constantly
> fetched via I/O would be.
 
No, I will not. Instead I will freely admit that
those demanding CPU simulations involving high-end
hardware devices are totally outside my very small
set of knowledge. It is specialized expert knowledge
that only a handful of people have. Ordinary folks
never get access to labs like that.
 
It is obvious that Scott is one of those experts.
Calling him a liar is stone cold crazy... Oh no!
 
Again, why would he lie? To show off his skills,
or to brag about the workings of his advanced lab?
 
I seem to remember that Scott has been working in
the CS field ever since the big Burroughs machines
were hot. It means he has *decades of experience*.
 
This is my last comment concerning this matter.
 
br,
KK
wij <wyniijj5@gmail.com>: Jul 07 07:23AM -0700

I just skimmed the "Error handling" section of https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines
It is really blink-talk, all the time till now. It reads to me as (just one point):
because errors are rare, throwing them (and forgetting) is justified.
I say practical code has to be written for all possible branches; rare or not
is not the issue. Take a basic example of reading an integer from the standard
input, for instance:
 
int num;
std::cin >> num;
 
Of course, this piece is mostly for demo. But what if we want practical code
to read an integer from the standard input? I think the answer is 'no way' for
application developers. stdc++ library developers keep playing blind and cheating.
One possible reason: stdc++ is for checking and demoing the features of C++
itself, not really for application developers.
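
For what it's worth, handling every branch of "read an integer" is
possible, just verbose. A minimal sketch (my code, not from the
guidelines) that checks the stream state, rejects trailing garbage, and
reports failure to the caller instead of throwing and forgetting:

#include <iostream>
#include <optional>
#include <sstream>
#include <string>

// Reads one integer from a line of input; every failure branch returns
// std::nullopt instead of being ignored.
std::optional<int> read_int( std::istream &in )
{
    std::string line;
    if( !std::getline( in, line ) ) // EOF or stream error
        return std::nullopt;
    std::istringstream iss( line );
    int value;
    if( !(iss >> value) )           // no integer at the front, or overflow
        return std::nullopt;
    char extra;
    if( iss >> extra )              // trailing garbage such as "42abc"
        return std::nullopt;
    return value;
}

int main()
{
    if( auto num = read_int( std::cin ) )
        std::cout << "got " << *num << '\n';
    else
        std::cout << "invalid input\n";
}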
"Öö Tiib" <ootiib@hot.ee>: Jul 07 09:37AM -0700

On Friday, 7 July 2023 at 17:23:36 UTC+3, wij wrote:
 
> int num;
> std::cin >> num;
 
> Of course, this piece is mostly for demo.
 
It is unclear what you mean by that example. Yes, the majority of software
does not communicate by reading text from standard input with C++ streams.
 
> But what if we want practical code
> to read an integer from the standard input? I think the answer is 'no way' for
> application developers.
 
Why do you think so? The C++ streams can be and are useful. Just not always.
 
> stdc++ library developers keep playing blind and cheating.
> One possible reason: stdc++ is for checking and demoing the features of C++
> itself, not really for application developers.
 
You mean GNU C++ library implementers? On the contrary ... most contributors
are rather good and well paid C++ developers. They manage to implement
what the standard requires, and quite well IMHO.
Muttley@dastardlyhq.com: Jul 07 07:15AM

On Thu, 6 Jul 2023 16:53:13 -0000 (UTC)
>>linux and unix are 99% interchangeable. Yes the kernels differ but then the
>>solaris kernel is very different to AIX which is different to HP-UX.
 
>This (preceding) paragraph is totally insane and was totally unnecessary.
 
So far all of your posts fit that criterion.