Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 09:02AM +0200

On 06.07.2023 at 09:25, Bonita Montero wrote:

> Even with a billion file descriptors the access time would
> be the same, i.e. O(1).

In theory, but in practice there's a caching effect if the hashtable
fits into the cache:

#include <iostream>
#include <unordered_map>
#include <random>
#include <chrono>
#include <atomic>

using namespace std;
using namespace chrono;

atomic<size_t> aSum( 0 );

int main()
{
    constexpr size_t TO = 0x100000000, ROUNDS = 10'000'000;
    unordered_map<size_t, size_t> map;
    mt19937_64 mt;
    for( size_t n = 1, b = 0; n <= TO; n *= 2, b = n )
    {
        // since b is set to the already doubled n, this loop only inserts
        // in the first pass; afterwards the table is populated lazily by
        // operator[] below, which default-inserts missing keys
        for( size_t i = b; i != n; ++i )
            map.emplace( i, i );
        uniform_int_distribution<size_t> uid( 0, n - 1 );
        size_t sum = 0;
        auto start = high_resolution_clock::now();
        for( size_t r = ROUNDS; r--; )
            sum += map[uid( mt )];
        double ns = duration_cast<nanoseconds>( high_resolution_clock::now()
            - start ).count() / (double)ROUNDS;
        // publish sum through an atomic so the compiler can't drop the loop
        ::aSum.fetch_add( sum );
        cout << hex << n << ": " << ns << endl;
    }
}

These are the results on an AMD 7950X (key range in hex, nanoseconds
per lookup):

1: 6.82776
2: 6.82223
4: 6.82411
8: 6.82552
10: 6.82525
20: 6.82547
40: 6.82349
80: 6.84252
100: 6.84796
200: 10.5177
400: 9.12652
800: 12.9653
1000: 15.0953
2000: 16.1944
4000: 17.4853
8000: 11.3426
10000: 13.4169
20000: 14.9261
40000: 18.6082
80000: 31.3227
100000: 63.0259
200000: 87.1134
400000: 103.188
800000: 141.211
1000000: 200.26
2000000: 308.339
4000000: 178.041
8000000: 225.446
10000000: 514.778
20000000: 223.162
40000000: 230.186
80000000: 834.93
100000000: 202.915

The number of steps to reach a value is always the same, but the
memory access time differs greatly with the size of the hashtable.

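(A rough way to relate the knees in those numbers to the cache
hierarchy - roughly 1 MiB of L2 per core and 64 MiB of L3 on a 7950X -
is to estimate the table's memory footprint. This is only a sketch:
the per-node overhead of unordered_map is implementation-specific, and
the 16 bytes assumed below is a guess for a typical node-based
implementation, not a measured value.)

#include <cstddef>
#include <iostream>
#include <unordered_map>

int main()
{
    std::unordered_map<std::size_t, std::size_t> map;
    for( std::size_t n = 1; n <= 0x1000000; n *= 2 )
    {
        while( map.size() < n )
            map.emplace( map.size(), map.size() );
        // bucket array holds one pointer per bucket; each node holds the
        // key/value pair plus an assumed ~16 bytes of link/allocator overhead
        std::size_t bytes = map.bucket_count() * sizeof( void * )
                          + map.size() * ( 2 * sizeof( std::size_t ) + 16 );
        std::cout << map.size() << " elements: ~" << ( bytes >> 10 ) << " KiB\n";
    }
}

Once the estimated footprint grows past each cache level, the random
lookups increasingly miss, which is consistent with the steps above.
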
David Brown <david.brown@hesbynett.no>: Jul 07 10:03AM +0200

On 06/07/2023 20:50, Bonita Montero wrote:

>> That is well within the budget for a high-end server.

> And which server application on a server of any size can saturate
> a 100GbE-link ?

A file server with a couple of fast SSDs could do it. Remember, these
are 100 Gbps links, not 100 GBps. If you have a 32-core server, that's
an average of about 3 Gbps per core.

Usually, however, saturation of the link is not the issue - just like
cpu clock speeds, it is often the peaks that matter. You don't
(typically) buy a 100 Gb link because you want to send 36 TB over the
next hour; you buy it so that you can send 1 GB in a tenth of a second.

AMD's latest server chips have 128 cores, and Ampere One has 192 cores
- per socket. Do you think people with a 192-core cpu are going to be
happy with a 10 Gb link? And do you think people would be making and
selling such processors, and servers containing them, if they were not
useful?

Note that I said /high-end/ server. Most servers won't need anything
like that for their application sides. But if you have a cluster, and
storage external to the server, you'll easily find such speeds to be
useful. And plenty of systems in data centres, HPC systems, cloud
hosting, etc., will have links like that - or faster.

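(The arithmetic behind those figures, as a quick sketch; it ignores
protocol overhead, so the numbers are upper bounds:)

#include <iostream>

int main()
{
    constexpr double link_bits_per_s  = 100e9;                // 100 Gbps link
    constexpr double link_bytes_per_s = link_bits_per_s / 8;  // = 12.5 GB/s
    // average share per core on a 32-core server
    std::cout << link_bits_per_s / 32 / 1e9 << " Gbps per core\n"; // ~3.1
    // time to push a 1 GB burst through the link
    std::cout << 1e9 / link_bytes_per_s << " s for 1 GB\n";        // 0.08 s
}
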
Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 11:01AM +0200

On 07.07.2023 at 10:03, David Brown wrote:

> A file server with a couple of fast SSDs could do it. ...

No one uses a fileserver with 12.5 GB/s.

> If you have a 32-core server, that's an average of about 3 Gbps per core.

LOL. You have been bathed too hot by your mother.

Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 12:53PM +0200

On 07.07.2023 at 10:03, David Brown wrote:

> A file server with a couple of fast SSDs could do it. Remember, these
> are 100 Gbps links, not 100 GBps.

And one more thing that just came to my mind: file servers don't have
any additional load beyond I/O, so separating the flows wouldn't hurt.

scott@slp53.sl.home (Scott Lurndal): Jul 07 02:05PM

>On 07.07.2023 at 10:03, David Brown wrote:
>> A file server with a couple of fast SSDs could do it. ...

>No one uses a fileserver with 12.5 GB/s.

Your computing experiences seem very limited. Our lab fileservers
(using 25gb, 40gb and 100gb network adapters) serve hundreds of
high-end multicore servers performing RTL simulations 24x7 using NFS.

<The childish insult removed>

Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 05:30PM +0200

On 07.07.2023 at 16:05, Scott Lurndal wrote:

> Our lab fileservers (using 25gb, 40gb and 100gb network adapters)
> serve hundreds of high-end multicore servers performing RTL simulations
> 24x7 using NFS.

I don't believe you at all; that's pure fantasy. But fileservers don't
have high CPU load anyway, so manual segmentation wouldn't hurt, all
the more so because in a LAN segment you have jumbo frames.

kalevi@kolttonen.fi (Kalevi Kolttonen): Jul 07 03:53PM

>> serve hundreds of high-end multicore servers performing RTL simulations
>> 24x7 using NFS.

> I don't believe you at all; that's pure fantasy.

Do you happen to have any good reasons why he would lie about their lab?

br, KK

Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 06:00PM +0200

On 07.07.2023 at 17:53, Kalevi Kolttonen wrote:

> Do you happen to have any good reasons why he would lie
> about their lab?

100GbE is a backbone technology, maybe for linking switches with
further lower-speed links. For which application would you need a
fileserver that can supply 12.5 GB/s?

David Brown <david.brown@hesbynett.no>: Jul 07 06:02PM +0200

On 07/07/2023 17:30, Bonita Montero wrote:

>> serve hundreds of high-end multicore servers performing RTL simulations
>> 24x7 using NFS.

> I don't believe you at all; that's pure fantasy.

Do you think Scott is deliberately lying here? That's quite the
accusation.

It would appear you know practically nothing about servers or
networking, especially for more demanding uses. That's fine, of course
- no one knows about everything, and most people have little interest
in such things unless they actually need to know about them. However,
it is absurd to suggest that just because /you/ can't imagine how such
systems might be used, they don't exist. And it is arrogant and
obnoxious to accuse those who /do/ know, and /do/ use such systems, of
lying about it.

kalevi@kolttonen.fi (Kalevi Kolttonen): Jul 07 04:14PM

> 100GbE is a backbone technology, maybe for linking switches with
> further lower-speed links. For which application would you need a
> fileserver that can supply 12.5 GB/s?

He did describe his lab setup by saying that there are "hundreds of
high-end multicore servers performing RTL simulations 24x7 using NFS".

I have no idea what RTL is; it could perhaps be something to do with
CPUs, I don't know. But it is obvious that what they are doing is no
joke.

br, KK

James Kuyper <jameskuyper@alumni.caltech.edu>: Jul 07 12:41PM -0400

On 7/7/23 12:14, Kalevi Kolttonen wrote:
...
> "hundreds of high-end multicore servers performing RTL
> simulations 24x7 using NFS".

> I have no idea what RTL is, ...

Scott can tell us, of course, but it might be one of these:
<https://en.wikipedia.org/wiki/RTL#Electronics>

scott@slp53.sl.home (Scott Lurndal): Jul 07 05:00PM

>He did describe his lab setup by saying that there are
>"hundreds of high-end multicore servers performing RTL
>simulations 24x7 using NFS".

RTL (Register Transfer Language), aka HDL (Hardware Description
Language), aka Verilog. A hardware description language used to design
advanced (3nm process in our case) processor chips. With lots of cores
(36 or more) and the aforementioned hardware accelerator blocks.

The simulations actually simulate every gate, clock by clock, under
various directed and randomized test cases to ensure that the very
expensive 3nm process masks create a working chip. Each block is
verified (simulated) individually, groups of blocks are simulated
together, and various full-chip simulations (the entire design) are
also run. Then there are the back-end tasks - floorplanning and pnr
(Place and Route) to locate each RTL block totaling many billion
transistors. Then there is timing closure[*], often with margins in
the 10s of picosecond range.

[*] Ensuring that signal propagation in any particular combinatorial
path doesn't exceed the clock interval.

I'd also point out that the major customers for server-grade
processors are AWS, Azure, Google and Facebook; all of which use
25/40Gbe (past) or 100Gbe (present) rack-to-rack and
rack-to-egress/ingress pipes and at least 10g rack to each server. As
David pointed out, with 192-core processors now available, 192 guest
VMs can easily saturate a 100Gbe link on a server. Meanwhile, 400Gbe
optical PHY modules became available a couple of years ago, signaling
that 200Gbe and 400Gbe are soon to follow.

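(To put the picosecond margins in perspective, a minimal sketch of the
timing-closure check - the 3 GHz clock and the path delay below are
hypothetical numbers chosen for illustration, not figures from Scott's
design:)

#include <iostream>

int main()
{
    constexpr double freq_hz       = 3e9;            // hypothetical 3 GHz clock
    constexpr double period_ps     = 1e12 / freq_hz; // ~333 ps per cycle
    constexpr double worst_path_ps = 300;            // hypothetical worst path delay
    // timing closure: every combinatorial path must fit in one clock period
    std::cout << "slack: " << period_ps - worst_path_ps << " ps\n"; // ~33 ps
}
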
Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 07:01PM +0200

On 07.07.2023 at 19:00, Scott Lurndal wrote:

> the back-end tasks - floorplanning and pnr (Place and Route) to locate each
> RTL block totaling many billion transistors. Then there is timing closure[*],
> often with margins in the 10s of picosecond range.

And why should that need such large and fast transfers?

Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 07:01PM +0200

On 07.07.2023 at 18:02, David Brown wrote:

>>> 24x7 using NFS.
>> I don't believe you at all; that's pure fantasy.
> Do you think Scott is deliberately lying here? ...

Yes.

Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 07:02PM +0200

On 07.07.2023 at 18:14, Kalevi Kolttonen wrote:

> He did describe his lab setup by saying that there are
> "hundreds of high-end multicore servers performing RTL
> simulations 24x7 using NFS".

Simulations needing constant I/O at such a speed???

scott@slp53.sl.home (Scott Lurndal): Jul 07 05:16PM

>>> 24x7 using NFS.
>> I don't believe you at all; that's pure fantasy.
>Do you think Scott is deliberately lying here? That's quite the accusation.

Indeed.

>obnoxious to accuse those who /do/ know, and /do/ use such systems, of
>lying about it.

>> But fileservers don't have high CPU load anyway.

I would argue that this isn't actually the case. Consider, for example,
file servers that support deduplication.

https://www.netapp.com/data-management/what-is-data-deduplication/

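(Why deduplication costs CPU: every incoming block has to be hashed
and looked up before it is written. A minimal sketch of the idea -
real filers use strong content hashes such as SHA-256 rather than
std::hash, and must compare block contents on hash collisions; this
toy version skips both:)

#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct DedupStore
{
    std::unordered_map<std::size_t, std::size_t> index; // content hash -> block id
    std::vector<std::string> blocks;                    // one copy per unique block

    std::size_t store( const std::string &block )
    {
        std::size_t h = std::hash<std::string>{}( block ); // CPU cost per block
        auto [it, inserted] = index.emplace( h, blocks.size() );
        if( inserted )
            blocks.push_back( block ); // new content: store it
        return it->second;             // duplicate: reuse the existing block
    }
};
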
kalevi@kolttonen.fi (Kalevi Kolttonen): Jul 07 05:31PM

> RTL (Register Transfer Language)

When reading GCC documentation many years ago, I came across this RTL:

https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gccint/RTL.html

Does it have anything to do with the RTL you guys use?

br, KK

David Brown <david.brown@hesbynett.no>: Jul 07 07:32PM +0200

On 07/07/2023 19:01, Bonita Montero wrote:

>> closure[*],
>> often with margins in the 10s of picosecond range.

> And why should that need such large and fast transfers?

Since you haven't a clue what RTL simulation involves, while Scott
clearly does, why do you feel qualified to doubt him?

RTL simulations for these kinds of chips are /massive/ tasks. (Note
that the "36 or more cores" refers to the chips designed at Scott's
company - not the servers used for simulation, which will have as many
cores as economically feasible.) These chips will have billions of
gates, which need to be simulated. My guess is that they will be using
many thousands of processor cores spread across hundreds of machines
to do the work, each machine having perhaps half a terabyte of ram.
Since you are simulating a single chip, each server can only simulate
part of it (in time and space), and you need a great deal of
communication between the machines to accurately model the
interactions between parts. So there will be very fast networking
between the servers.

Then there are the files and databases involved, and the output files.
The design files will be measured in gigabytes, as will the output
files, and there are hundreds of simulation servers accessing these
files from central repositories. Of course they will need massive
bandwidths on the file servers. And you will want the lowest practical
latencies for the file serving - even though file serving takes
relatively little processing power, you want to spread the different
clients around the cores of the server (with RSS) to minimise
latencies. You do not want everything bottlenecked through one core.

All this is, of course, very expensive. But if it saves a single
failed tape-out and prototype run of the chips, it will pay for itself.

Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 07:58PM +0200

On 07.07.2023 at 19:32, David Brown wrote:

> part of it (in time and space), and you need a great deal of
> communication between the machines to accurately model the interactions
> between parts. So there will be very fast networking between the servers.

If you did that with constant I/O on the data instead of in RAM, you
would simulate almost nothing. Scott is lying.

kalevi@kolttonen.fi (Kalevi Kolttonen): Jul 07 06:09PM

> Scott is lying.

I am beginning to understand why some people have killfiled you. I
have never killfiled anybody and never will, but please stop those
ridiculous accusations immediately. It is utter foolishness.

Based on several wise comments, there is absolutely no reason to doubt
Scott Lurndal's sincerity. I am sure you are the only one who thinks
he is lying.

br, KK

Bonita Montero <Bonita.Montero@gmail.com>: Jul 07 08:17PM +0200

On 07.07.2023 at 20:09, Kalevi Kolttonen wrote:

> killfiled you. I have never killfiled anybody and
> never will, but please stop those ridiculous
> accusations immediately. It is utter foolishness.

Think about the gap between I/O latency and cache/memory latency.
Imagine what the speed of a calculation on a data set constantly
fetched over I/O would be.

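(For scale, a rough sketch of that gap; the latency figures below are
order-of-magnitude ballpark assumptions, not measurements:)

#include <iostream>

int main()
{
    constexpr double dram_ns = 100;      // local DRAM access, ~100 ns
    constexpr double nvme_ns = 100'000;  // NVMe read, ~100 us
    constexpr double net_ns  = 500'000;  // networked round trip, ~0.5 ms
    std::cout << "NVMe    vs DRAM: " << nvme_ns / dram_ns << "x\n"; // ~1000x
    std::cout << "network vs DRAM: " << net_ns  / dram_ns << "x\n"; // ~5000x
}
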
kalevi@kolttonen.fi (Kalevi Kolttonen): Jul 07 06:34PM

> Think about the gap between I/O latency and cache/memory latency.
> Imagine what the speed of a calculation on a data set constantly
> fetched over I/O would be.

No, I will not. Instead I will freely admit that those demanding CPU
simulations involving high-end hardware devices are totally outside my
very small set of knowledge. It is specialized expert knowledge that
only a handful of people have. Ordinary folks never get access to labs
like that.

It is obvious that Scott is one of those experts. Calling him a liar
is stone cold crazy... Oh no! Again, why would he lie? To show off his
skills, or to brag about the workings of his advanced lab?

I seem to remember that Scott has been working in the CS field ever
since the big Burroughs machines were hot. It means he has *decades of
experience*.

This is my last comment concerning this matter.

br, KK

wij <wyniijj5@gmail.com>: Jul 07 07:23AM -0700

I just skimmed the "Error handling" section of
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines

It is really hand-waving talk, all the time till now. It reads to me
as (just one point): because errors are rare, throw-and-forget is
justified. I say practical code has to be written for all possible
branches; rare or not is not the issue.

Take a basic example of reading an integer from the standard input:

int num;
std::cin >> num;

Of course, this piece is mostly for demo. But what if we want
practical code to read an integer from the standard input? I think the
answer is 'no way' for application developers. The stdc++ library
developers keep playing blind and cheating. One possible reason:
stdc++ is for checking and demoing the features of C++ itself, not
really for application developers.

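(For contrast, here is a sketch of what "written for all possible
branches" can look like when reading an integer: EOF, stream errors,
non-numeric input, out-of-range values and trailing garbage each get
an explicit branch. The helper name read_int is made up for
illustration; this is one possible approach, not a recommendation from
the guidelines themselves:)

#include <charconv>
#include <iostream>
#include <string>

// read one integer from a line of input; returns false on EOF, stream
// error, non-numeric input, out-of-range value or trailing garbage
bool read_int( std::istream &in, int &out )
{
    std::string line;
    if( !std::getline( in, line ) )
        return false;                           // EOF or stream error
    const char *first = line.data();
    const char *last  = first + line.size();
    auto [ptr, ec] = std::from_chars( first, last, out );
    if( ec != std::errc() )
        return false;                           // no digits, or out of range
    return ptr == last;                         // reject trailing garbage
}

int main()
{
    int num;
    if( read_int( std::cin, num ) )
        std::cout << "got " << num << '\n';
    else
        std::cout << "invalid input\n";
}
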
"Öö Tiib" <ootiib@hot.ee>: Jul 07 09:37AM -0700 On Friday, 7 July 2023 at 17:23:36 UTC+3, wij wrote: > int num; > std::cin >> num; > Of course, this piece is mostly for demo. It is unclear what you mean by that example. Yes, majority of software does not communicate reading text from standard input using C++ streams. > But what if we want practical codes > to read an integer from the standard input? I think the answer is 'no way' for > application developers. Why you think so? The C++ streams can be and are useful. Just not always. > stdc++ library developers keep playing blind and cheat. > One possible reason, stdc++ is for checking and demo. the feature of C++ > itself, not really for application developer. You mean GNU C++ library implementers? On the contrary ... most contributors are rather good and well paid C++ developers. They manage to implement what standard requires and quite well IMHO. |
Muttley@dastardlyhq.com: Jul 07 07:15AM

On Thu, 6 Jul 2023 16:53:13 -0000 (UTC)

>>linux and unix are 99% interchangeable. Yes the kernels differ, but then
>>the solaris kernel is very different to AIX which is different to HP-UX.

>This (preceding) paragraph is totally insane and was totally unnecessary.

So far all of your posts fit that criterion.