- question about memory-bandwidth and logical cores - 8 Updates
- Faster than Boost, Cereal and Protobuf - 13 Updates
- reading random bytes from memory - 3 Updates
- quaternion graphics in C or C-style C++? - 1 Update
David Brown <david.brown@hesbynett.no>: Aug 21 08:58AM +0200 On 20/08/17 17:38, Paavo Helde wrote: >> im not sure) if someone have such info tell me know > I think the L1 cache is typically shared by the logical cores, so the > memory bandwidth does not really double for logical cpus. Of course L1 cache (content and bandwidth) is shared by the logical cores - they are /logical/ cores, not physical cores. They share almost everything - instruction decoders, pipelines, execution units, buffers, etc. They have separate logical sets of ISA registers (the registers visible to the programmer), but on devices like x86 chips (where the ISA has few registers) there are many more physical hardware registers that are mapped at different times - and the logical cores share them too. Cores and caches are organised as a hierarchy in multi-core devices. The highest bandwidths are in the closest steps - physical cores to their L1 caches, cores to cores within a core cluster (if the chip has this level), L1 caches to their L2 caches, L2 caches to the L3 cache (usually shared amongst all cores on the chip), bandwidth off-chip. Usually the off-chip bandwidth is shared amongst all cores, but for multi-module chips like AMD's new devices, each chip in the module has its own buses off the module. In other words - it is complicated, depends totally on the level of cache you are talking to, and details are specific to the device architecture. And as has been pointed out, it has /nothing/ to do with C++ - it is general architecture issue, independent of language. Unless you are targeting for a specific chip (such as fine-tuning for a particular supercomputer model), you use the same general rules for all languages, and all chips: Aim for locality of reference in your critical data structures. Keep the structures small. Avoid sharing and false sharing between threads. Use an OS that is aware of the memory architecture of your processor, and the geometry of its logical and physical cores. 
(I am replying to you here, for your interest. I have long ago seen it as pointless trying to talk to Fir.) |
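The advice above about avoiding false sharing between threads can be sketched as follows. This is a minimal illustration and not something posted in the thread: the 64-byte cache-line size is an assumption (common on current x86 parts, but it should be checked for the actual target), and the names `kCacheLine`, `PaddedCounter` and `count_in_parallel` are made up for the example. Each thread increments only its own counter, and the `alignas` padding keeps those counters on separate cache lines so the cores do not keep invalidating each other's copy of the line.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size; 64 bytes is common on x86 but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// One counter per cache line: alignas pads sizeof(PaddedCounter) up to 64.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per line");

// Each thread bumps only its own counter; returns the sum of all counters.
long count_in_parallel(unsigned threads, long iterations) {
    assert(threads > 0);
    std::vector<PaddedCounter> counters(threads);  // relies on C++17 aligned new
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&counters, t, iterations] {
            for (long i = 0; i < iterations; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    long total = 0;
    for (const auto& c : counters) total += c.value.load();
    return total;
}
```

Removing the `alignas` packs several counters into one line; timing both variants on the machine in question is the only reliable way to see how much the padding matters there.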
Paavo Helde <myfirstname@osa.pri.ee>: Aug 21 01:05PM +0300 On 21.08.2017 9:58, David Brown wrote: > your processor, and the geometry of its logical and physical cores. > (I am replying to you here, for your interest. I have long ago seen it > as pointless trying to talk to Fir.) Thanks for clarifying this, this is more or less consistent with my understanding. I had an impression that there are still separate cpu instruction pipelines for logical processors - they are executing different code after all - is this not so? I agree it is pointless to discuss with Fir, but there is no rule one should do meaningful things all the time ;-) Some of his absurd ideas contain some interesting moments... Cheers Paavo |
scott@slp53.sl.home (Scott Lurndal): Aug 21 01:00PM >I had an impression that there are still separate cpu instruction >pipelines for logical processors - they are executing different code >after all - is this not so? The whole point of SMT (e.g. hyperthreading) is to have higher utilization of the core resources. The hyperthreads/logical processors share all the resources of the core (except each logical processor keeps separate state - e.g. registers, page table base address, etc). The caches, store buffers, pipelines are shared. |
David Brown <david.brown@hesbynett.no>: Aug 21 04:16PM +0200 On 21/08/17 12:05, Paavo Helde wrote: > I had an impression that there are still separate cpu instruction > pipelines for logical processors - they are executing different code > after all - is this not so? They will have to keep some parts separated, so that they can track independent instruction streams. How much is duplicated, and how much is shared, is going to vary a bit between implementations. |
fir <profesor.fir@gmail.com>: Aug 21 08:41AM -0700 On Monday, 21 August 2017 12:05:43 UTC+2, Paavo Helde wrote: > I agree it is pointless to discuss with Fir, but there is no rule one > should do meaningful things all the time ;-) Some of his absurd ideas > contain some interesting moments... brown is total lama i wouldnt listen to that fella (unles someone wants to gets stupider) As for the bandwidth, imo it is probably clearest to just run a test with memset if you have logical cores at home. It is a binary thing imo, as with physical cores and SSE/AVX: with 2 physical cores, a memset runs twice as fast when you use both cores [tried it myself, believe me]. With AVX, even though it has instructions to store 8 integers at once, you get a 0% speed bonus (compared to using 8 sequential 32-bit mov stores) [tried it myself, believe me]. Logical cores behave either like AVX or like physical cores (I guess, from what is said here and from other things I vaguely remember hearing, that it unfortunately goes like AVX - no additional memory bandwidth) [not tried myself yet, I have no logical cores on board]. |
Paavo Helde <myfirstname@osa.pri.ee>: Aug 21 10:28PM +0300 On 21.08.2017 18:41, fir wrote: > brown is total lama i wouldnt listen to that fella (unles someone wants to gets stupider) Calling somebody a Tibetan Lama is a compliment in my book! |
fir <profesor.fir@gmail.com>: Aug 21 01:26PM -0700 On Monday, 21 August 2017 21:29:09 UTC+2, Paavo Helde wrote: > On 21.08.2017 18:41, fir wrote: > > brown is total lama i wouldnt listen to that fella (unles someone wants to gets stupider) > Calling somebody a Tibetan Lama is a compliment in my book! well, im not sure if this brown is Tibetian lama, but for sure he is lama |
David Brown <david.brown@hesbynett.no>: Aug 21 10:44PM +0200 On 21/08/17 21:28, Paavo Helde wrote: >> brown is total lama i wouldnt listen to that fella (unles someone >> wants to gets stupider) > Calling somebody a Tibetan Lama is a compliment in my book! Don't forget that Fir does not believe in correct spelling, or using the conventional meanings for words. You can try to guess what he is trying to say, or just ignore him. Certainly don't try to offer help, advice or answers to his questions - that just results in insults. I suspect it is because he can't cope with the idea that someone knows more than he does - he asks more in the hope that other people will confirm that they don't know either. Then he can make more posts replying to himself with less and less intelligible content, and he can imagine that he is the only person smart enough to talk to. Sometimes his posts inspire interesting questions or other posts, however. If Fir listens in and learns something, that's okay. |
woodbrian77@gmail.com: Aug 20 09:33PM -0700 I'm happy to report that the C++ Middleware Writer (CMW) is faster than the serialization library in Boost, Cereal and Protobuf in this benchmark: https://github.com/thekvs/cpp-serializers . The CMW produced a smaller serialized size than Capnproto or Cereal: Capnproto 17,768 Cereal 17,416 CMW 16,712 I'm happy to give demos of the software. If you have a 2017 C++ compiler, it normally takes about ten minutes. The first step is to download/clone this: https://github.com/Ebenezer-group/onwards Brian Ebenezer Enterprises - In G-d we trust. http://webEbenezer.net |
Daniel <danielaparker@gmail.com>: Aug 20 09:43PM -0700 > is faster than the serialization library in Boost, Cereal > and Protobuf in this benchmark: > https://github.com/thekvs/cpp-serializers Am I missing something? I don't see CMW in that benchmark. Daniel https://github.com/danielaparker/jsoncons |
woodbrian77@gmail.com: Aug 20 10:14PM -0700 On Sunday, August 20, 2017 at 11:43:27 PM UTC-5, Daniel wrote: > Am I missing something? I don't see CMW in that benchmark. No, it's not listed there. I ran the benchmark locally. |
David Brown <david.brown@hesbynett.no>: Aug 21 09:27AM +0200 > On Sunday, August 20, 2017 at 11:43:27 PM UTC-5, Daniel wrote: >> Am I missing something? I don't see CMW in that benchmark. > No, it's not listed there. I ran the benchmark locally. Then how about giving a list of the results here? Otherwise it looks like you are just cherry-picking - saying you are faster than Boost, Cereal and Protobuf but "forgetting" to mention yas, thrift, msgpack and the others listed on that site. Benchmarks can give an idea of relative speeds and sizes, but you have to provide the numbers - not your conclusions, which will be highly biased (or at least assumed to be highly biased) since you are the producer of one of the competing libraries. Of course, there are all sorts of feature and requirements differences between these libraries which are usually far more important than speed or size. It would be helpful to have a comparison there too (the github project is missing this information, and is basically useless for anyone trying to consider choosing a serialisation library). |
Daniel <danielaparker@gmail.com>: Aug 21 04:24AM -0700 > On Sunday, August 20, 2017 at 11:43:27 PM UTC-5, Daniel wrote: > > Am I missing something? I don't see CMW in that benchmark. > No, it's not listed there. I ran the benchmark locally. I would suggest cloning the cpp-serializers master branch, adding CMW, and submitting a pull request. If your project is not accepted, you can still send a link to the cloned github project here, so people can see if they can reproduce your results, should they wish to do so. Daniel https://github.com/danielaparker/jsoncons |
"Öö Tiib" <ootiib@hot.ee>: Aug 21 05:42AM -0700 > On Sunday, August 20, 2017 at 11:43:27 PM UTC-5, Daniel wrote: > > Am I missing something? I don't see CMW in that benchmark. > No, it's not listed there. I ran the benchmark locally. It looks like half of a test anyway. What I would expect is a serialization speed comparison, a size comparison on the wire (need for bandwidth) and a deserialization speed comparison. A thing has to be tested from end to end; otherwise the results are likely meaningless, with gaps for cheating. |
woodbrian77@gmail.com: Aug 21 08:08AM -0700 On Monday, August 21, 2017 at 2:27:42 AM UTC-5, David Brown wrote: > Then how about giving a list of the results here? Otherwise it looks > like you are just cherry-picking - saying you are faster than Boost, > Cereal and Protobuf but "forgetting" to mention yas, thrift, msgpack and If it's faster than Cereal, it's faster than thrift and msgpack. > speeds and sizes, but you have to provide the numbers - not your > conclusions, which will be highly biased (or at least assumed to be > highly biased) since you are the producer of one of the competing libraries. The size I provided is a number. > or size. It would be helpful to have a comparison there too (the github > project is missing this information, and is basically useless for anyone > trying to consider choosing a serialisation library). The CMW automates the creation of serialization functions. Here's another serialization library: https://github.com/eliasdaler/MetaStuff His approach requires you to maintain functions like this:

    template <>
    inline auto registerMembers<Person>()
    {
        return members(
            member("age", &Person::getAge, &Person::setAge),
            member("name", &Person::getName, &Person::setName),
            member("salary", &Person::salary),
            member("favouriteMovies", &Person::favouriteMovies)
        );
    }

With the CMW you don't have to write code like that. Other than the CMW, I'm not aware of other libraries that have support for plf::colony or std::string_view. Brian Ebenezer Enterprises http://webEbenezer.net |
woodbrian77@gmail.com: Aug 21 08:23AM -0700 On Monday, August 21, 2017 at 6:24:41 AM UTC-5, Daniel wrote: > > No, it's not listed there. I ran the benchmark locally. > I would suggest cloning the cpp-serializers master branch, adding CMW, and submitting a pull request. > If your project is not accepted, you can still send a link to the cloned github project here, so people can see if they can reproduce your results, should they wish to do so. I did send an email to the author of the benchmark telling him my serialized size and how it did on the timing. He hasn't replied. What I could do is publish the code I used in my repo. |
woodbrian77@gmail.com: Aug 21 08:27AM -0700 On Monday, August 21, 2017 at 7:43:05 AM UTC-5, Öö Tiib wrote: > comparison, size comparison otw (need for bandwidth) and deserialization > speed comparison. A thing has to be tested from end to end otherwise > the results are likely meaningless with gaps for cheating. I don't see a big difference between his benchmark and what you wrote. He provides the serialized sizes and the combined (serialization and deserialization) times. You want to see it broken down more? In my opinion it's an OK benchmark. |
Daniel <danielaparker@gmail.com>: Aug 21 08:32AM -0700 > I did send an email to the author of the benchmark telling him > my serialized size and how it did on the timing. He hasn't > replied. Why would he? Either you send him a pull request, which is what people do if they want to be included in somebody else's benchmarks, or there's nothing to reply to. Daniel |
woodbrian77@gmail.com: Aug 21 08:35AM -0700 > > conclusions, which will be highly biased (or at least assumed to be > > highly biased) since you are the producer of one of the competing libraries. > The size I provided is a number. Cereal 17,416, CMW 16,712. One difference is probably due to my using a variable-length integer for the string lengths. There's a vector of 100 strings in the test. I'm not sure if Cereal is using 4 byte or 8 byte string lengths. If it's using 8 bytes and I only need 1 byte for each string, that's 700 bytes, which is close to the difference in sizes between my approach and Cereal. I use 4 byte integers for the lengths of the vectors. Some of the others may use 8. In that sense they are more general than my approach, but I am not sure how often it matters. In this test there are two vectors, so it would be an 8 byte difference. |
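The size argument above can be made concrete with a small sketch. The encoding below is a generic LEB128-style variable-length integer (7 payload bits per byte, high bit set when more bytes follow). It is an assumption for illustration only, since the thread does not say which variable-length scheme the CMW actually uses, and the function names are hypothetical. With such an encoding any length below 128 takes one byte, so 100 short strings save 100 x (8 - 1) = 700 bytes relative to fixed 8-byte lengths, which is close to the 17,416 - 16,712 = 704 byte gap quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append value as a LEB128-style varint: low 7 bits first, high bit = "more".
void put_varint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(value | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Decode a varint starting at pos; advances pos past the bytes consumed.
std::uint64_t get_varint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        assert(pos < in.size());  // truncated input is a caller error here
        const std::uint8_t byte = in[pos++];
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;
    }
    return value;
}

// Bytes needed for a length prefix, to compare with fixed 4- or 8-byte fields.
std::size_t varint_size(std::uint64_t value) {
    std::vector<std::uint8_t> tmp;
    put_varint(tmp, value);
    return tmp.size();
}
```

The trade-off is the usual one: short lengths get cheaper, while lengths of 2^56 or more cost up to ten bytes instead of eight, and decoding needs a loop instead of a single fixed-width load.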
Daniel <danielaparker@gmail.com>: Aug 21 08:53AM -0700 > it's using 8 bytes and I only need 1 byte for each string, that's 700 > bytes which is close to the difference in sizes between my approach > and Cereal. Don't know about Cereal, but most binary representations use variable length encodings for the lengths of strings, arrays or objects, see for example MessagePack or cbor. For short strings or small integers, they typically combine the data type code and the length into one byte. A big obstacle you'll have to getting a user for CMW is the fact that you're using a proprietary data format that is known only to you. For example, if you were to use cbor instead, somebody could create binary data encodings with your software and read them in a python application with no additional work. Daniel |
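As a sketch of the "type code and length in one byte" idea mentioned above, this is how a CBOR text-string header is laid out according to RFC 8949: the top three bits carry the major type (3 for text strings) and the low five bits carry either the length itself (0 to 23) or a code announcing a 1, 2, 4 or 8 byte big-endian length. The helper names `put_be` and `cbor_text_header` are made up for this example; it only builds the header and is not a CBOR library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append the low nbytes of value in big-endian (network) order.
void put_be(std::vector<std::uint8_t>& out, std::uint64_t value, int nbytes) {
    assert(nbytes == 1 || nbytes == 2 || nbytes == 4 || nbytes == 8);
    for (int i = nbytes - 1; i >= 0; --i)
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// Header bytes for a CBOR text string whose payload is `length` bytes long.
std::vector<std::uint8_t> cbor_text_header(std::uint64_t length) {
    const unsigned major = 3u << 5;  // major type 3 (text string) = 0x60
    std::vector<std::uint8_t> out;
    if (length < 24) {
        // Short string: type and length share a single byte.
        out.push_back(static_cast<std::uint8_t>(major | length));
    } else if (length <= 0xFFu) {
        out.push_back(static_cast<std::uint8_t>(major | 24));
        put_be(out, length, 1);
    } else if (length <= 0xFFFFu) {
        out.push_back(static_cast<std::uint8_t>(major | 25));
        put_be(out, length, 2);
    } else if (length <= 0xFFFFFFFFu) {
        out.push_back(static_cast<std::uint8_t>(major | 26));
        put_be(out, length, 4);
    } else {
        out.push_back(static_cast<std::uint8_t>(major | 27));
        put_be(out, length, 8);
    }
    return out;
}
```

For the 100 short strings in the benchmark this costs one header byte per string, the same as a one-byte variable-length prefix, while still telling a reader what kind of item follows.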
woodbrian77@gmail.com: Aug 21 12:06PM -0700 On Monday, August 21, 2017 at 10:54:07 AM UTC-5, Daniel wrote: > encodings for the lengths of strings, arrays or objects, see for example > MessagePack or cbor. For short strings or small integers, they typically > combine the data type code and the length into one byte. I don't need data type codes. At least not in general. > A big obstacle you'll have to getting a user for CMW is the fact that you're > using a proprietary data format that is known only to you. The format is not a secret. Others like the serialization library in Boost or Cereal don't use cbor. > you were to use cbor instead, somebody could create binary data encodings > with your software and read them in a python application with no additional > work. I hope for something like that in the future, but will let things shake out a little before working on that. Brian |
Marcel Mueller <news.5.maazl@spamgourmet.org>: Aug 21 03:51AM +0200 On 20.08.17 23.38, fir wrote: > sum+= *x; > } > then depending on readed ram area it will work or crash ("there is a trouble with that aplication, aplication will be closed") Of course, this is undefined behavior. > i think you get crash when you will read a ram area where ram is just not pinned > (im not sure if on windows pages are just guarded from read, i understand write, execute, but read? is this the case?) Read about virtual memory and 486 protected mode (yes 486, the model is still valid) to get an idea about memory protection and process separation. > that was one question second is > how to catch and recover from this crash (i just want to write a tiny ram scanner who will try read all 32 bit ram area and will give me info back which areas i can read and which i cant) Any program running at user space will never be able to read any memory that it has not written itself before (directly or indirectly) or is initialized by the kernel with zeros. Again read about process separation. > (same possibly with write though im not sure if i will try read any byte vale then write say 0x55 to any byte then write back oryginal value it will calmly stand such crash test ;c will it? If the page is not writable, it will crash. Marcel |
Paavo Helde <myfirstname@osa.pri.ee>: Aug 21 08:51AM +0300 On 21.08.2017 0:38, fir wrote: > then depending on readed ram area it will work or crash ("there is a trouble with that aplication, aplication will be closed") > i think you get crash when you will read a ram area where ram is just not pinned > (im not sure if on windows pages are just guarded from read, i understand write, execute, but read? is this the case?) The virtual memory space the process sees consists of memory pages which are addressed indirectly through the special page tables. The problem is not that the pages are protected against reading, but that they are simply missing (page table entries not pointing to a normal memory page). > that was one question second is > how to catch and recover from this crash (i just want to write a tiny ram scanner who will try read all 32 bit ram area and will give me info back which areas i can read and which i cant) In Windows and with MSVC you need to compile with the /EHa option and use catch(...) with an ellipsis in a C++ try block. Such a block will catch page access violations and other stuff as an MSVC extension. In the catch(...) block you can study the exception (by rethrowing it in the special __try block and using GetExceptionCode() in the __except block) to figure out if it is a normal C++ exception or a so-called "structured exception" which covers page access errors. Not recommended for production code, this is highly unportable and can easily be misused for hiding program errors instead of fixing them. Also, /EHa will make the whole program a bit slower. If you just need to test the memory for readability or writability then in Windows there are the IsBadReadPtr(), IsBadWritePtr() SDK functions. Not recommended either, except for the kind of experimentation you seem to be doing. HTH Paavo |
fir <profesor.fir@gmail.com>: Aug 21 08:31AM -0700 On Monday, 21 August 2017 07:52:00 UTC+2, Paavo Helde wrote: > do be doing. > HTH > Paavo Do you maybe know how it is in pure C? (C has the signal.h header and can register some kind of handler - I am too tired now to check it myself, but would like to know if someone has an easy way to find that out.) Ideally I would like to get full info like "MOV instruction at IP 0x0040_110c tried to read from address 0x0000_0012, which is a page guarded from read" and then silently continue execution through all those reads until my app has gone through all bytes and successfully finished execution. |
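For the "which areas can I read" part of the question, one way to probe without crashing at all is to let the kernel do the read. The sketch below is POSIX-only (it was written with Linux in mind) and is not one of the Windows mechanisms discussed in this thread: it hands the address to `write()` on a pipe, and a bad address is then expected to come back as an `EFAULT` error instead of a fault in the program. The function names are hypothetical. It cannot report which instruction would have faulted; that kind of detail would need a signal handler instead.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <unistd.h>

// True if one byte at addr can be read. The kernel performs the read on our
// behalf, so an unmapped address yields an error return rather than a crash.
bool is_readable(const void* addr) {
    int fds[2];
    if (pipe(fds) != 0) return false;  // no pipe available: report "no"
    ssize_t n;
    do {
        n = write(fds[1], addr, 1);    // kernel copies one byte from addr
    } while (n == -1 && errno == EINTR);
    close(fds[0]);
    close(fds[1]);
    return n == 1;
}

// Count how many of `pages` consecutive pages starting at `start` are readable.
std::size_t count_readable_pages(std::uintptr_t start, std::size_t pages) {
    const long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    std::size_t readable = 0;
    for (std::size_t i = 0; i < pages; ++i) {
        const std::uintptr_t addr = start + i * static_cast<std::uintptr_t>(page);
        if (is_readable(reinterpret_cast<const void*>(addr))) ++readable;
    }
    return readable;
}
```

Protection is per page, so probing one byte per page is enough for a scan. Creating a pipe for every probe keeps the sketch short but is slow; a real scanner would reuse one pipe and drain it after each successful write.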
SG <s.gesemann@gmail.com>: Aug 21 05:12AM -0700 On Sunday, August 20, 2017 at 6:58:21 AM UTC+2, David Melik wrote: > So, I'd like to know, how can quaternions simplify this process? I > recall they're something like a scalar on some (x,y,z) but forgot how > that would seem to simplify any multiplication or iteration. Quaternions are useful if you need a compact representation of a 3D rotation matrix. Given a normalized quaternion q, it's rather easy to determine a corresponding rotation matrix R R = q2rot(q) (I won't bother defining q2rot) And for every rotation matrix R there are two such quaternions: q and -q. This q2rot functions has the following properties: q2rot(q) = q2rot(-q) and q2rot(a * b) = q2rot(a) * q2rot(b). Given the last equality and the fact that multiplying quaternions is cheaper than multiplying 3x3 rotation matrices, quaternions allow you to efficiently multiply lots of 3D rotations together. So, if you need to multiply lots of rotations, quaternions are going to be more efficient for that. However, if you want to apply the resulting rotation to a collection of points (like the vertices of your cube), you should probably convert the quaternion back to a 3x3 rotation matrix because this matrix representation is more efficient for such things in terms of number of necessary floating point operations. I think that answers your question? Cheers! SG |
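Since q2rot is left undefined above, here is one possible definition as a sketch, together with the quaternion product, so the two stated properties can be checked numerically. The `Quat` and `Mat3` types and the helper names are made up for this example; the conversion is the standard formula for a unit quaternion (w, x, y, z) under the convention v' = q v q*. A quaternion product costs 16 multiplications against 27 for a 3x3 matrix product, which is the saving referred to above.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

struct Quat { double w, x, y, z; };
using Mat3 = std::array<std::array<double, 3>, 3>;

// Hamilton product a * b (16 multiplications).
Quat mul(const Quat& a, const Quat& b) {
    return {a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
            a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
            a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
            a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w};
}

Quat neg(const Quat& q) { return {-q.w, -q.x, -q.y, -q.z}; }

// Rotation matrix of a normalized quaternion.
Mat3 q2rot(const Quat& q) {
    const double n2 = q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z;
    assert(std::abs(n2 - 1.0) < 1e-9);  // q must be normalized
    Mat3 r{};
    r[0] = {1 - 2 * (q.y * q.y + q.z * q.z), 2 * (q.x * q.y - q.w * q.z), 2 * (q.x * q.z + q.w * q.y)};
    r[1] = {2 * (q.x * q.y + q.w * q.z), 1 - 2 * (q.x * q.x + q.z * q.z), 2 * (q.y * q.z - q.w * q.x)};
    r[2] = {2 * (q.x * q.z - q.w * q.y), 2 * (q.y * q.z + q.w * q.x), 1 - 2 * (q.x * q.x + q.y * q.y)};
    return r;
}

// Plain 3x3 matrix product (27 multiplications).
Mat3 mat_mul(const Mat3& a, const Mat3& b) {
    Mat3 c{};
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            for (std::size_t k = 0; k < 3; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Largest absolute element-wise difference, for approximate comparison.
double max_abs_diff(const Mat3& a, const Mat3& b) {
    double d = 0.0;
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            d = std::fmax(d, std::abs(a[i][j] - b[i][j]));
    return d;
}
```

With these definitions, `q2rot(mul(a, b))` and `mat_mul(q2rot(a), q2rot(b))` should agree up to rounding, and `q2rot(q)` equals `q2rot(neg(q))`. In practice one would accumulate rotations as quaternions, renormalize occasionally, and call `q2rot` once before transforming the vertices.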