soft and program: Digest for comp.lang.c++@googlegroups.com

question about memory-bandwidth and logical cores - 8 Updates
Faster than Boost, Cereal and Protobuf - 13 Updates
reading random bytes from memory - 3 Updates
quaternion graphics in C or C-style C++? - 1 Update

question about memory-bandwidth and logical cores

David Brown <david.brown@hesbynett.no>: Aug 21 08:58AM +0200

On 20/08/17 17:38, Paavo Helde wrote:
>> im not sure) if someone have such info tell me know

> I think the L1 cache is typically shared by the logical cores, so the
> memory bandwidth does not really double for logical cpus.

Of course L1 cache (content and bandwidth) is shared by the logical
cores - they are /logical/ cores, not physical cores. They share almost
everything - instruction decoders, pipelines, execution units, buffers,
etc. They have separate logical sets of ISA registers (the registers
visible to the programmer), but on devices like x86 chips (where the ISA
has few registers) there are many more physical hardware registers that
are mapped at different times - and the logical cores share them too.

Cores and caches are organised as a hierarchy in multi-core devices.
The highest bandwidths are in the closest steps - physical cores to
their L1 caches, cores to cores within a core cluster (if the chip has
this level), L1 caches to their L2 caches, L2 caches to the L3 cache
(usually shared amongst all cores on the chip), bandwidth off-chip.
Usually the off-chip bandwidth is shared amongst all cores, but for
multi-module chips like AMD's new devices, each chip in the module has
its own buses off the module.

In other words - it is complicated, depends totally on the level of
cache you are talking to, and details are specific to the device
architecture.

And as has been pointed out, it has /nothing/ to do with C++ - it is a
general architecture issue, independent of language. Unless you are
targeting a specific chip (such as fine-tuning for a particular
supercomputer model), you use the same general rules for all languages,
and all chips: Aim for locality of reference in your critical data
structures. Keep the structures small. Avoid sharing and false sharing
between threads. Use an OS that is aware of the memory architecture of
your processor, and the geometry of its logical and physical cores.
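
As an illustration of the "avoid false sharing" rule, here is a minimal sketch, not taken from the thread. It gives every thread its own counter and pads each counter to a cache line so that two threads never write to the same line. The 64-byte line size is an assumption (common on x86-64, not guaranteed), and `PaddedCounter` and `count_in_parallel` are hypothetical names. It assumes a C++17 compiler, so that `std::vector` honours the over-alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache-line size; 64 bytes is common on x86-64 but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// Each counter is aligned (and therefore padded) to its own cache line,
// so writes from different threads never land on the same line.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};

// Hypothetical helper: every thread increments only its own counter,
// and the counters are summed after all threads have joined.
std::uint64_t count_in_parallel(unsigned threads, std::uint64_t per_thread) {
    assert(threads > 0);
    std::vector<PaddedCounter> counters(threads);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&counters, t, per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                ++counters[t].value;
            }
        });
    }
    for (auto& th : pool) th.join();

    std::uint64_t total = 0;
    for (const auto& c : counters) total += c.value;
    return total;
}
```

Removing the `alignas` packs the counters next to each other again; whether that is measurably slower depends on the chip and on what the optimiser does with the loop, so it is worth timing rather than assuming.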

(I am replying to you here, for your interest. I have long ago seen it
as pointless trying to talk to Fir.)

Paavo Helde <myfirstname@osa.pri.ee>: Aug 21 01:05PM +0300

On 21.08.2017 9:58, David Brown wrote:
> your processor, and the geometry of its logical and physical cores.

> (I am replying to you here, for your interest. I have long ago seen it
> as pointless trying to talk to Fir.)

Thanks for clarifying this, this is more or less consistent with my
understanding.

I had an impression that there are still separate cpu instruction
pipelines for logical processors - they are executing different code
after all - is this not so?

I agree it is pointless to discuss with Fir, but there is no rule one
should do meaningful things all the time ;-) Some of his absurd ideas
contain some interesting moments...

Cheers
Paavo

scott@slp53.sl.home (Scott Lurndal): Aug 21 01:00PM

>I had an impression that there are still separate cpu instruction
>pipelines for logical processors - they are executing different code
>after all - is this not so?

The whole point of SMT (e.g. hyperthreading) is to have higher
utilization of the core resources. The hyperthreads/logical processors
share all the resources of the core (except each logical processor
keeps separate state - e.g. registers, page table base address,
etc). The caches, store buffers, pipelines are shared.

David Brown <david.brown@hesbynett.no>: Aug 21 04:16PM +0200

On 21/08/17 12:05, Paavo Helde wrote:

> I had an impression that there are still separate cpu instruction
> pipelines for logical processors - they are executing different code
> after all - is this not so?

They will have to keep some parts separated, so that they can track
independent instruction streams. How much is duplicated, and how much
is shared, is going to vary a bit between implementations.

fir <profesor.fir@gmail.com>: Aug 21 08:41AM -0700

On Monday, 21 August 2017 at 12:05:43 UTC+2, Paavo Helde wrote:

> I agree it is pointless to discuss with Fir, but there is no rule one
> should do meaningful things all the time ;-) Some of his absurd ideas
> contain some interesting moments...

brown is total lama i wouldnt listen to that fella (unles someone wants to gets stupider)

as for the bandwidth, imo it is probably clear
most preferably do some test with memset if you have logical cores at home

it is a binary thing imo, like with physical cores and sse/avx

with 2 physical cores, when you do memset you will get it twice as fast when you use 2 cores [tried it myself, believe me]

when using avx, even if it has commands to store 8 integers at once, you will get 0% speed bonus (compared to using 8 sequential 32-bit mov stores) [tried it myself, believe me]

logical cores are either like AVX or like physical cores (i guess, from what is said here and from other things i maybe heard and vaguely remember, that it unfortunately goes like AVX - no additional memory bandwidth)
[haven't tried it myself yet, got no logical cores on board]
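
The memset test suggested above could be sketched roughly as follows. This is an illustrative harness, not code from the thread: `timed_parallel_memset` is a hypothetical name, and the function only measures wall-clock time for filling one buffer with a given number of threads. It makes no claim about what the numbers will look like on any particular machine.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Fills `buffer` with `value` using `threads` workers, each on its own
// contiguous slice, and returns the elapsed wall-clock time in seconds.
double timed_parallel_memset(std::vector<unsigned char>& buffer,
                             unsigned threads, int value) {
    assert(threads > 0);
    const std::size_t chunk = buffer.size() / threads;
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        unsigned char* begin = buffer.data() + t * chunk;
        // The last worker also takes the remainder of the buffer.
        const std::size_t len =
            (t + 1 == threads) ? buffer.size() - t * chunk : chunk;
        pool.emplace_back([begin, len, value] { std::memset(begin, value, len); });
    }
    for (auto& th : pool) th.join();

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

For a meaningful comparison the buffer should be much larger than the last-level cache, and a warm-up pass should run first so that page faults on first touch are not counted. Calling it with 1 thread, with the number of physical cores, and with the number of logical cores then gives the three figures the thread is arguing about.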

Paavo Helde <myfirstname@osa.pri.ee>: Aug 21 10:28PM +0300

On 21.08.2017 18:41, fir wrote:

> brown is total lama i wouldnt listen to that fella (unles someone wants to gets stupider)

Calling somebody a Tibetan Lama is a compliment in my book!

fir <profesor.fir@gmail.com>: Aug 21 01:26PM -0700

On Monday, 21 August 2017 at 21:29:09 UTC+2, Paavo Helde wrote:
> On 21.08.2017 18:41, fir wrote:

> > brown is total lama i wouldnt listen to that fella (unles someone wants to gets stupider)

> Calling somebody a Tibetan Lama is a compliment in my book!

well, im not sure if this brown is Tibetian lama, but for sure he is lama

David Brown <david.brown@hesbynett.no>: Aug 21 10:44PM +0200

On 21/08/17 21:28, Paavo Helde wrote:

>> brown is total lama i wouldnt listen to that fella (unles someone
>> wants to gets stupider)

> Calling somebody a Tibetan Lama is a compliment in my book!

Don't forget that Fir does not believe in correct spelling, or using the
conventional meanings for words. You can try to guess what he is trying
to say, or just ignore him. Certainly don't try to offer help, advice
or answers to his questions - that just results in insults. I suspect
it is because he can't cope with the idea that someone knows more than
he does - he asks more in the hope that other people will confirm that
they don't know either. Then he can make more posts replying to himself
with less and less intelligible content, and he can imagine that he is
the only person smart enough to talk to.

Sometimes his posts inspire interesting questions or other posts,
however. If Fir listens in and learns something, that's okay.

Faster than Boost, Cereal and Protobuf

woodbrian77@gmail.com: Aug 20 09:33PM -0700

I'm happy to report that the C++ Middleware Writer (CMW)
is faster than the serialization library in Boost, Cereal
and Protobuf in this benchmark:
https://github.com/thekvs/cpp-serializers

The CMW produced a smaller serialized size than
Capnproto or Cereal:

Capnproto 17,768
Cereal 17,416
CMW 16,712

I'm happy to give demos of the software. If you have
a 2017 C++ compiler, it normally takes about ten minutes.
The first step is to download/clone this:
https://github.com/Ebenezer-group/onwards

Brian
Ebenezer Enterprises - In G-d we trust.
http://webEbenezer.net

Daniel <danielaparker@gmail.com>: Aug 20 09:43PM -0700

> is faster than the serialization library in Boost, Cereal
> and Protobuf in this benchmark:
> /https://github.com/thekvs/cpp-serializers

Am I missing something? I don't see CMW in that benchmark.

Daniel
https://github.com/danielaparker/jsoncons

woodbrian77@gmail.com: Aug 20 10:14PM -0700

On Sunday, August 20, 2017 at 11:43:27 PM UTC-5, Daniel wrote:
> Am I missing something? I don't see CMW in that benchmark.

No, it's not listed there. I ran the benchmark locally.

David Brown <david.brown@hesbynett.no>: Aug 21 09:27AM +0200

> On Sunday, August 20, 2017 at 11:43:27 PM UTC-5, Daniel wrote:
>> Am I missing something? I don't see CMW in that benchmark.

> No, it's not listed there. I ran the benchmark locally.

Then how about giving a list of the results here? Otherwise it looks
like you are just cherry-picking - saying you are faster than Boost,
Cereal and Protobuf but "forgetting" to mention yas, thrift, msgpack and
the others listed on that site. Benchmarks can give an idea of relative
speeds and sizes, but you have to provide the numbers - not your
conclusions, which will be highly biased (or at least assumed to be
highly biased) since you are the producer of one of the competing libraries.

Of course, there are all sorts of feature and requirements differences
between these libraries which are usually far more important than speed
or size. It would be helpful to have a comparison there too (the github
project is missing this information, and is basically useless for anyone
trying to consider choosing a serialisation library).

Daniel <danielaparker@gmail.com>: Aug 21 04:24AM -0700

> On Sunday, August 20, 2017 at 11:43:27 PM UTC-5, Daniel wrote:
> > Am I missing something? I don't see CMW in that benchmark.

> No, it's not listed there. I ran the benchmark locally.

I would suggest cloning the cpp-serializers master branch, adding CMW, and submitting a pull request.

If your project is not accepted, you can still send a link to the cloned github project here, so people can see if they can reproduce your results, should they wish to do so.

Daniel
https://github.com/danielaparker/jsoncons

"Öö Tiib" <ootiib@hot.ee>: Aug 21 05:42AM -0700

> On Sunday, August 20, 2017 at 11:43:27 PM UTC-5, Daniel wrote:
> > Am I missing something? I don't see CMW in that benchmark.

> No, it's not listed there. I ran the benchmark locally.

It looks like half of a test anyway. What I would expect is serialization speed
comparison, size comparison otw (need for bandwidth) and deserialization
speed comparison. A thing has to be tested from end to end otherwise
the results are likely meaningless with gaps for cheating.

woodbrian77@gmail.com: Aug 21 08:08AM -0700

On Monday, August 21, 2017 at 2:27:42 AM UTC-5, David Brown wrote:

> Then how about giving a list of the results here? Otherwise it looks
> like you are just cherry-picking - saying you are faster than Boost,
> Cereal and Protobuf but "forgetting" to mention yas, thrift, msgpack and

If it's faster than Cereal, it's faster than thrift and msgpack.

> speeds and sizes, but you have to provide the numbers - not your
> conclusions, which will be highly biased (or at least assumed to be
> highly biased) since you are the producer of one of the competing libraries.

The size I provided is a number.

> or size. It would be helpful to have a comparison there too (the github
> project is missing this information, and is basically useless for anyone
> trying to consider choosing a serialisation library).

The CMW automates the creation of serialization functions.
Here's another serialization library:
https://github.com/eliasdaler/MetaStuff

His approach requires you to maintain functions like this:

template <>
inline auto registerMembers<Person>()
{
return members(
member("age", &Person::getAge, &Person::setAge),
member("name", &Person::getName, &Person::setName),
member("salary", &Person::salary),
member("favouriteMovies", &Person::favouriteMovies)
);
}

With the CMW you don't have to write code like that.

Other than the CMW, I'm not aware of other libraries that have
support for plf::colony or std::string_view.

Brian
Ebenezer Enterprises
http://webEbenezer.net

woodbrian77@gmail.com: Aug 21 08:23AM -0700

On Monday, August 21, 2017 at 6:24:41 AM UTC-5, Daniel wrote:

> > No, it's not listed there. I ran the benchmark locally.

> I would suggest cloning the cpp-serializers master branch, adding CMW, and submitting a pull request.

> If your project is not accepted, you can still send a link to the cloned github project here, so people can see if they can reproduce your results, should they wish to do so.

I did send an email to the author of the benchmark telling him
my serialized size and how it did on the timing. He hasn't
replied.

What I could do is publish the code I used in my repo.

woodbrian77@gmail.com: Aug 21 08:27AM -0700

On Monday, August 21, 2017 at 7:43:05 AM UTC-5, Öö Tiib wrote:
> comparison, size comparison otw (need for bandwidth) and deserialization
> speed comparison. A thing has to be tested from end to end otherwise
> the results are likely meaningless with gaps for cheating.

I don't see a big difference between his benchmark and what
you wrote. He provides the serialized sizes and the combined
(serialization and deserialization) times. You want to see
it broken down more? In my opinion it's an OK benchmark.

Daniel <danielaparker@gmail.com>: Aug 21 08:32AM -0700

> I did send an email to the author of the benchmark telling him
> my serialized size and how it did on the timing. He hasn't
> replied.

Why would he? Either you send him a pull request, which is what people do if they want to be included in somebody else's benchmarks, or there's nothing
to reply to.

Daniel

woodbrian77@gmail.com: Aug 21 08:35AM -0700

> > conclusions, which will be highly biased (or at least assumed to be
> > highly biased) since you are the producer of one of the competing libraries.

> The size I provided is a number.

Cereal 17,416
CMW 16,712

One difference is probably due to my using a variable-length integer
for the string lengths. There's a vector of 100 strings in the test.
I'm not sure if Cereal is using 4 byte or 8 byte string lengths. If
it's using 8 bytes and I only need 1 byte for each string, that's 700
bytes which is close to the difference in sizes between my approach
and Cereal.

I use 4 byte integers for the lengths of the vectors. Some of the others may use 8. In that sense they are more general than my approach, but I am not sure how often it matters. In this test there are two vectors, so it would be an 8 byte difference.
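
The post does not say which variable-length integer encoding the CMW uses. As an assumption for illustration only, the sketch below uses the common base-128 scheme (7 payload bits per byte, as in Protobuf varints), with hypothetical function names. It also reproduces the arithmetic from the post: 100 strings times 7 saved bytes is 700, against a measured difference of 17,416 - 16,712 = 704.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends `value` as a base-128 varint: 7 payload bits per byte,
// with the high bit set on every byte except the last.
void put_varint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(value | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Reads a varint starting at `pos` and advances `pos` past it.
// Only a debug-time bounds check; no overflow handling in this sketch.
std::uint64_t get_varint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t value = 0;
    int shift = 0;
    while (true) {
        assert(pos < in.size());
        const std::uint8_t byte = in[pos++];
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return value;
        shift += 7;
    }
}

// Bytes saved over `count` strings when a fixed-width length prefix
// is replaced by a one-byte varint (valid for lengths below 128).
constexpr std::size_t bytes_saved(std::size_t count, std::size_t fixed_width) {
    return count * (fixed_width - 1);
}
```

Lengths below 128 take one byte and lengths below 16,384 take two, which is why short strings benefit most from this kind of prefix.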

Daniel <danielaparker@gmail.com>: Aug 21 08:53AM -0700

> it's using 8 bytes and I only need 1 byte for each string, that's 700
> bytes which is close to the difference in sizes between my approach
> and Cereal.

Don't know about Cereal, but most binary representations use variable length
encodings for the lengths of strings, arrays or objects, see for example
MessagePack or cbor. For short strings or small integers, they typically
combine the data type code and the length into one byte.
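
To make the "type code and length in one byte" point concrete, here is a small sketch of the header byte for short strings in the two formats named above. The bit layouts follow the published MessagePack (fixstr) and CBOR (major type 3) formats; the function names are hypothetical and only the single-byte case is covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// MessagePack "fixstr": bits 101 in the top three bits,
// string length (0..31) in the low five bits.
std::optional<std::uint8_t> msgpack_fixstr_header(std::size_t len) {
    if (len > 31) return std::nullopt;  // longer strings use str 8/16/32
    return static_cast<std::uint8_t>(0xA0 | len);
}

// CBOR text string: major type 3 in the top three bits,
// string length (0..23) in the low five bits.
std::optional<std::uint8_t> cbor_short_text_header(std::size_t len) {
    if (len > 23) return std::nullopt;  // longer strings put the length in following bytes
    return static_cast<std::uint8_t>(0x60 | len);
}
```

A five-character string therefore costs one byte of overhead in either format: 0xA5 in MessagePack and 0x65 in CBOR.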

A big obstacle you'll have to getting a user for CMW is the fact that you're
using a proprietary data format that is known only to you. For example, if
you were to use cbor instead, somebody could create binary data encodings
with your software and read them in a python application with no additional
work.

Daniel

woodbrian77@gmail.com: Aug 21 12:06PM -0700

On Monday, August 21, 2017 at 10:54:07 AM UTC-5, Daniel wrote:
> encodings for the lengths of strings, arrays or objects, see for example
> MessagePack or cbor. For short strings or small integers, they typically
> combine the data type code and the length into one byte.

I don't need data type codes. At least not in general.

> A big obstacle you'll have to getting a user for CMW is the fact that you're
> using a proprietary data format that is known only to you.

The format is not a secret. Others like the serialization
library in Boost or Cereal don't use cbor.

> you were to use cbor instead, somebody could create binary data encodings
> with your software and read them in a python application with no additional
> work.

I hope for something like that in the future, but will let
things shake out a little before working on that.

Brian

reading random bytes from memory

Marcel Mueller <news.5.maazl@spamgourmet.org>: Aug 21 03:51AM +0200

On 20.08.17 23.38, fir wrote:
> sum+= *x;
> }
> then depending on readed ram area it will work or crash ("there is a trouble with that aplication, aplication will be closed")

Of course, this is undefined behavior.

> i think you get crash when you will read a ram area where ram is just not pinned
> (im not sure if on windows pages are just guarded from read, i understand write, execute, but read? is this the case?)

Read about virtual memory and 486 protected mode (yes 486, the model is
still valid) to get an idea about memory protection and process separation.

> that was one question second is

> how to catch and recover from this crash (i just want to write a tiny ram scanner who will try read all 32 bit ram area and will give me info back which areas i can read and which i cant)

Any program running in user space will never be able to read any memory
that it has not written itself before (directly or indirectly) or that
was initialized by the kernel with zeros. Again, read about process separation.

> (same possibly with write though im not sure if i will try read any byte vale then write say 0x55 to any byte then write back oryginal value it will calmly stand such crash test ;c will it?

If the page is not writable, it will crash.

Marcel

Paavo Helde <myfirstname@osa.pri.ee>: Aug 21 08:51AM +0300

On 21.08.2017 0:38, fir wrote:
> then depending on readed ram area it will work or crash ("there is a trouble with that aplication, aplication will be closed")

> i think you get crash when you will read a ram area where ram is just not pinned
> (im not sure if on windows pages are just guarded from read, i understand write, execute, but read? is this the case?)

The virtual memory space the process sees consists of memory pages which
are addressed indirectly through the special page tables. The problem is
not that the pages are read-protected, but that they are simply missing
(page table entries not pointing to a normal memory page).

> that was one question second is

> how to catch and recover from this crash (i just want to write a tiny ram scanner who will try read all 32 bit ram area and will give me info back which areas i can read and which i cant)

On Windows with MSVC you need to compile with the /EHa option and use
catch(...) with an ellipsis in a C++ try block. Such a block will catch
page access violations and other stuff as an MSVC extension. In the
catch(...) block you can study the exception (by rethrowing it in the
special __try block and using GetExceptionCode() in the __except block)
to figure out if it is a normal C++ exception or a so-called "structured
exception" which covers page access errors.

Not recommended for production code, this is highly unportable and can
easily be misused for hiding program errors instead of fixing them.
Also, /EHa will make the whole program a bit slower.

If you just need to test the memory for readability or writability then
in Windows there are the IsBadReadPtr(), IsBadWritePtr() SDK functions.

Not recommended either, except for the kind of experimentation you seem
to be doing.

HTH
Paavo

fir <profesor.fir@gmail.com>: Aug 21 08:31AM -0700

On Monday, 21 August 2017 at 07:52:00 UTC+2, Paavo Helde wrote:
> to be doing.

> HTH
> Paavo

do you know maybe how it is done in pure c? (c has the signal.h header and can register some kind of handler -
im too tired now to check it myself, but would like to know if someone has an easy way to do that)

most ideally i would like to get full info like

"MOV instruction at IP 0x0040_110c tried to read from address 0x0000_0012 which is page guarded from read"

and then silently continue execution
through all those reads until my app has gone through all bytes and successfully finished execution
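
Nobody in the digest answers the signal.h question, so the following is only a hedged sketch of one common approach on POSIX systems (not Windows, and not portable standard C or C++). It installs a handler with `sigaction` and jumps out of it with `siglongjmp` when a read faults. Leaving a SIGSEGV handler this way is widespread practice but is not guaranteed by the language standards. The names `can_read` and `map_unreadable_page` are hypothetical, and the probe is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

namespace {
sigjmp_buf g_recover;  // jump target used by the fault handler

// Leaves the handler and resumes at the sigsetjmp call in can_read().
void on_fault(int) { siglongjmp(g_recover, 1); }
}  // namespace

// Returns true if one byte at `addr` can be read, false if the read faults.
// Temporarily replaces the SIGSEGV and SIGBUS handlers, then restores them.
bool can_read(const void* addr) {
    struct sigaction sa = {};
    struct sigaction old_segv = {};
    struct sigaction old_bus = {};
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    volatile bool readable = false;
    if (sigsetjmp(g_recover, 1) == 0) {  // 1: save and later restore the signal mask
        // The volatile pointer forces the compiler to perform the load.
        const volatile unsigned char* p =
            static_cast<const volatile unsigned char*>(addr);
        const unsigned char byte = *p;
        (void)byte;
        readable = true;
    }

    sigaction(SIGSEGV, &old_segv, nullptr);
    sigaction(SIGBUS, &old_bus, nullptr);
    return readable;
}

// Maps one page with no access rights, giving a deterministic faulting address.
void* map_unreadable_page() {
    const long page_size = sysconf(_SC_PAGESIZE);
    void* page = mmap(nullptr, static_cast<std::size_t>(page_size), PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(page != MAP_FAILED);
    return page;
}
```

A scanner would call `can_read` once per page rather than once per byte, since protection is per page. To report the faulting address as in the wished-for message, the handler can be installed with the `SA_SIGINFO` flag, in which case it receives a `siginfo_t` whose `si_addr` field holds that address.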

quaternion graphics in C or C-style C++?

SG <s.gesemann@gmail.com>: Aug 21 05:12AM -0700

On Sunday, August 20, 2017 at 6:58:21 AM UTC+2, David Melik wrote:

> So, I'd like to know, how can quaternions simplify this process? I
> recall they're something like a scalar on some (x,y,z) but forgot how
> that would seem to simplify any multiplication or iteration.

Quaternions are useful if you need a compact representation of a 3D
rotation matrix. Given a normalized quaternion q, it's rather easy to
determine a corresponding rotation matrix R

R = q2rot(q) (I won't bother defining q2rot)

And for every rotation matrix R there are two such quaternions:
q and -q. This q2rot function has the following properties:

q2rot(q) = q2rot(-q) and

q2rot(a * b) = q2rot(a) * q2rot(b).

Given the last equality and the fact that multiplying quaternions is
cheaper than multiplying 3x3 rotation matrices, quaternions allow you
to efficiently multiply lots of 3D rotations together. So, if you
need to multiply lots of rotations, quaternions are going to be more
efficient for that.

However, if you want to apply the resulting rotation to a collection
of points (like the vertices of your cube), you should probably
convert the quaternion back to a 3x3 rotation matrix because this
matrix representation is more efficient for such things in terms of
number of necessary floating point operations.
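
Since q2rot is left undefined above, here is one possible definition as a sketch, using the usual formula for a unit quaternion (w, x, y, z) and the Hamilton product. The type and function names other than `q2rot` are hypothetical. The Hamilton product needs 16 multiplications, against 27 for a general 3x3 matrix product.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Quat { double w, x, y, z; };
using Mat3 = std::array<std::array<double, 3>, 3>;

// Unit quaternion for a rotation by `angle` radians about a unit axis.
Quat from_axis_angle(double ax, double ay, double az, double angle) {
    assert(std::fabs(ax * ax + ay * ay + az * az - 1.0) < 1e-9);
    const double s = std::sin(angle / 2);
    return {std::cos(angle / 2), ax * s, ay * s, az * s};
}

// Hamilton product a * b (16 multiplications).
Quat mul(const Quat& a, const Quat& b) {
    return {
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
    };
}

// Rotation matrix of a unit quaternion. Every entry is a product of two
// components of q, so q and -q give the same matrix.
Mat3 q2rot(const Quat& q) {
    const double w = q.w, x = q.x, y = q.y, z = q.z;
    return {{
        {1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)},
        {2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)},
        {2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)},
    }};
}

// Plain 3x3 matrix product (27 multiplications).
Mat3 mul(const Mat3& a, const Mat3& b) {
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Largest absolute difference between corresponding entries.
double max_abs_diff(const Mat3& a, const Mat3& b) {
    double d = 0;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            d = std::fmax(d, std::fabs(a[i][j] - b[i][j]));
    return d;
}
```

In line with the advice above, a renderer would accumulate rotations with the quaternion `mul`, call `q2rot` once per frame, and then apply the resulting matrix to each vertex.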

I think that answers your question?

Cheers!
SG

Monday, August 21, 2017

Digest for comp.lang.c++@googlegroups.com - 25 updates in 4 topics
