Thursday, October 24, 2019

Digest for comp.lang.c++@googlegroups.com - 25 updates in 4 topics

Juha Nieminen <nospam@thanks.invalid>: Oct 24 08:10AM

> claus. Auto-vectorization only works for some predefined code-patterns
> the compiler knows. In most cases you would have to do the vectorization
> yourself through SIMD-intrinsics.
 
Have you even looked at what modern compilers like recent versions of
gcc and clang produce using automatic vectorization? Because I have.
 
It's not always perfect, but it's miles better than no vectorization
of any kind. Often they are able to do surprisingly complex automatic
vectorization of relatively complex structures.
 
Consider, for example, this kind of code:
 
//-------------------------------------------------------------------
#include <array>
 
struct Point4D
{
    float x, y, z, w;
};
 
using Point4DVec = std::array<Point4D, 4>;
 
Point4DVec foo(const Point4DVec& v1, const Point4DVec& v2, float f)
{
    Point4DVec result;
    for(std::size_t i = 0; i < result.size(); ++i)
    {
        result[i].x = (v1[i].x - v2[i].x) * f;
        result[i].y = (v1[i].y - v2[i].y) * f;
        result[i].z = (v1[i].z - v2[i].z) * f;
        result[i].w = (v1[i].w - v2[i].w) * f;
    }
    return result;
}
//-------------------------------------------------------------------
 
gcc compiles that to:
 
vbroadcastss ymm2, xmm0
vmovups ymm3, YMMWORD PTR [rsi]
vmovups ymm0, YMMWORD PTR [rsi+32]
vsubps ymm1, ymm3, YMMWORD PTR [rdx]
vsubps ymm0, ymm0, YMMWORD PTR [rdx+32]
mov rax, rdi
vmulps ymm1, ymm1, ymm2
vmulps ymm0, ymm0, ymm2
vmovups YMMWORD PTR [rdi], ymm1
vmovups YMMWORD PTR [rdi+32], ymm0
vzeroupper
ret
Bonita Montero <Bonita.Montero@gmail.com>: Oct 24 11:01AM +0200

> Have you even looked at what modern compilers like recent versions of
> gcc and clang produce using automatic vectorization? Because I have.
 
Have you checked the compiler-docs? They document which code-patterns
they can detect and vectorize. Those are only very special patterns.
David Brown <david.brown@hesbynett.no>: Oct 24 12:51PM +0200

On 24/10/2019 11:01, Bonita Montero wrote:
>> gcc and clang produce using automatic vectorization? Because I have.
 
> Have you checked the compiler-docs? They document which code-patterns
> they can detect and vectorize. Those are only very special patterns.
 
Can you give links to these documentation pages for gcc and clang?
Bonita Montero <Bonita.Montero@gmail.com>: Oct 24 01:37PM +0200

> Can you give links to these documentation pages for gcc and clang?
 
No, but I'll bet they're not cleverer than Intel-C++:
https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-automatic-vectorization-overview
David Brown <david.brown@hesbynett.no>: Oct 24 02:14PM +0200

On 24/10/2019 13:37, Bonita Montero wrote:
>> Can you give links to these documentation pages for gcc and clang?
 
> No, but I'll bet they're not cleverer than Intel-C++:
> https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-automatic-vectorization-overview
 
So when you wrote:
 
> Have you checked the compiler-docs? They document which code-patterns
> they can detect and vectorize. Those are only very special patterns.
 
in response to a comment about gcc and clang auto-vectorisation, you
really didn't know what you were talking about?
 
Both clang and gcc have vectorised code like Juha's for many years - and
both produce a great deal better code for it than ICC in my brief test.
ICC does not vectorise the code at all.
 
(That may, of course, vary for different code samples - and it may be
dependent on compiler flags.) Give it a shot on <https://godbolt.org>
yourself.
 
 
And while it is certainly the case that compilers can only
auto-vectorise certain types of code, it is /not/ the case that it is so
limited that these tools list the code patterns in their documentation.
All they do is provide some tips and suggestions about how to increase
the likelihood of automatic vectorisation.
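 
For example (my own illustration of the general pattern, not something
taken from the docs): a loop with independent iterations will usually
vectorise, while a loop-carried dependency usually will not:
 
//-------------------------------------------------------------------
#include <cstddef>

// Typically vectorises: the iterations are independent and, thanks to
// __restrict (a common compiler extension), there is no aliasing the
// compiler cannot see through.
void scale(float* __restrict dst, const float* __restrict src,
           std::size_t n, float f)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * f;
}

// Typically does not vectorise as-is: each iteration depends on the
// value computed by the previous one (a prefix sum).
void prefix_sum(float* data, std::size_t n)
{
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
    {
        acc += data[i];
        data[i] = acc;
    }
}
//-------------------------------------------------------------------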
 
 
Why not try investigating what people write in posts, rather than
mocking them from a position of ignorance?
"Öö Tiib" <ootiib@hot.ee>: Oct 24 05:20AM -0700

On Thursday, 24 October 2019 15:14:35 UTC+3, David Brown wrote:
 
> Why not try investigating what people write in posts, rather than
> mocking them from a position of ignorance?
 
As described by social psychologists David Dunning and Justin
Kruger, the cognitive bias of illusory superiority results from an
internal illusion in people of low ability and from an external
misperception in people of high ability; that is, "the miscalibration
of the incompetent stems from an error about the self, whereas
the miscalibration of the highly competent stems from an error
about others."
https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
Bonita Montero <Bonita.Montero@gmail.com>: Oct 24 04:16PM +0200

> ICC does not vectorise the code at all.
 
Check the ICC documentation ...
Bonita Montero <Bonita.Montero@gmail.com>: Oct 24 04:23PM +0200

https://gcc.gnu.org/projects/tree-ssa/vectorization.html
Frederick Gotham <cauldwell.thomas@gmail.com>: Oct 24 08:27AM -0700

On Wednesday, October 23, 2019 at 6:18:17 PM UTC+1, James Kuyper wrote:
 
> reaction that you've provoked corresponding to a fish taking the bait.
> It really doesn't matter whether the bait is "wrong" or "right" in any
> absolute sense, only that people will react to it.
 
 
When practising etymology, it's quite easy to come to wrong conclusions.
 
I think it's likely that the modern use of "troll" on internet forums actually comes from the doll: https://en.wikipedia.org/wiki/Troll_doll
 
This makes the most sense to me out of all the possibilities (including your angling one).
scott@slp53.sl.home (Scott Lurndal): Oct 24 03:50PM

>On Wednesday, October 23, 2019 at 6:18:17 PM UTC+1, James Kuyper wrote:
 
>I think it's likely that the modern use of "troll" on internet forums actually comes from the doll: https://en.wikipedia.org/wiki/Troll_doll
 
>This makes the most sense to me out of all the possibilities (including your angling one).
 
I've been on Usenet for almost forty years; it has always been well understood that the
term derives from the fishing usage.
Juha Nieminen <nospam@thanks.invalid>: Oct 24 07:49AM


> The small_vector has also preallocated capacity in it but can
> grow over that using an allocator:
> <https://www.boost.org/doc/libs/1_71_0/doc/html/boost/container/small_vector.html>
 
I am talking about an allocator (usable with any standard library container),
not a std::vector replacement.
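 
C++17's std::pmr facilities are one standard route to that kind of
allocator; a rough sketch of the general idea (not necessarily the exact
allocator in question):
 
//-------------------------------------------------------------------
#include <memory_resource>
#include <string>
#include <vector>

void example()
{
    // A fixed buffer on the stack; the monotonic resource hands out
    // chunks of it and only falls back to the upstream (heap) resource
    // if the buffer runs out.
    char buffer[1024];
    std::pmr::monotonic_buffer_resource pool(buffer, sizeof(buffer));

    // Any std::pmr container can use it - no custom container needed.
    std::pmr::vector<int> v(&pool);
    std::pmr::string s("small strings land in the buffer too", &pool);

    for (int i = 0; i < 100; ++i)
        v.push_back(i);   // no heap allocation until the buffer is exhausted
}
//-------------------------------------------------------------------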
"Öö Tiib" <ootiib@hot.ee>: Oct 24 02:41AM -0700

On Thursday, 24 October 2019 10:50:10 UTC+3, Juha Nieminen wrote:
> > <https://www.boost.org/doc/libs/1_71_0/doc/html/boost/container/small_vector.html>
 
> I am talking about an allocator (usable with any standard library container),
> not a std::vector replacement.
 
Yes, you are talking about a concrete solution to a problem that (I assume)
is the poor performance of dynamic memory usage. The typical solutions
for reducing dynamic allocations (like using std::vector::reserve or
std::unordered_set::reserve) are not good enough. So special containers,
optimized to use the dynamic storage management provided by the allocator
rarely or not at all, are likely as good as it gets. I don't think there
can be a special allocator that gives the same effect with stock containers.
Soviet_Mario <SovietMario@CCCP.MIR>: Oct 24 01:33PM +0200

On 24/10/19 00:15, Keith Thompson wrote:
> means that the elements will have the same storage duration as the
> std::array object itself. (That's why the number of elements in
> a std::array has to be known at compile time.)
 
Interesting. Actually, I must admit I confused VECTOR with ARRAY :\
 
 
> A std::array object is essentially a structure with an array as
> its only member.
 
Understood. Tnx
 
 
--
1) Resist, resist, resist.
2) If everybody pays taxes, taxes are paid by everybody
Soviet_Mario - (aka Gatto_Vizzato)
Soviet_Mario <SovietMario@CCCP.MIR>: Oct 24 01:36PM +0200

On 24/10/19 01:19, Mr Flibble wrote:
> optimization" whereby small strings are allocated within the
> object itself.  I have also created a "vecarray" container
> that does a similar thing for vectors.
 
Uhm, strange ... it seems to me that such a solution may be
worse than the problem.
 
I mean: 1) there is no precise definition of SMALLness, no
known mandatory threshold to know in advance;
2) is the optimization mandatory? If not, it is even worse with
regard to the address (and storage type) of the actual data.
 
We could end up in a scenario in which a small STATIC
std::string has both its members and its data static, while
another instance does not, with just the members there and the
data elsewhere, without any possibility of knowing in advance.
 
 
 
--
1) Resist, resist, resist.
2) If everybody pays taxes, taxes are paid by everybody
Soviet_Mario - (aka Gatto_Vizzato)
David Brown <david.brown@hesbynett.no>: Oct 24 02:25PM +0200

On 24/10/2019 13:36, Soviet_Mario wrote:
> the problem
 
> I mean: 1) there is no precise definition of SMALLness, no known
> mandatory threshold to know in advance
 
Yes. The ideal size of "smallness" will depend on many factors. But
some ground rules can be established. You will always need "metadata"
in the immediate part of the type - holding things like the current
count of items (vector elements, string bytes, etc.), a pointer to the
data part (typically on the heap), and the size of the allocated part.
On a 64-bit target, that would be 3 * 8 = 24 bytes. If the size of the
string, or the vector, is smaller than 24 bytes, then it makes sense to
store it locally within the object itself.
 
Whether you then say "24 bytes" is your "smallness", or pick something
else will depend on how you are using these types. A bigger "smallness"
will mean more wasted RAM if your vectors/strings are not typically
"small" (or if they are typically very small). But heap-allocated
memory usually has a minimum size anyway, and allocation on the stack is
very much faster than allocation on the heap. So you might pick a
"smallness" that gives a total object size of 64 bytes, for example - if
you keep the objects aligned they will match cache lines on many CPUs,
and it might even be possible to pass them around in large SIMD registers.
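 
A rough sketch of the layout being described (just an illustration of the
idea, not any particular library's implementation):
 
//-------------------------------------------------------------------
#include <cstddef>

// "Small buffer" layout for a byte container: the metadata is always
// present; the inline buffer reuses the space that the heap pointer and
// capacity would otherwise occupy (padded up to the chosen "smallness").
struct SmallBuffer
{
    std::size_t size;                 // current number of elements

    struct HeapData
    {
        char*       data;             // heap storage when large
        std::size_t capacity;
    };
    union
    {
        HeapData heap;                // active when !is_small
        char     inline_buf[48];      // active while the contents fit
    };
    bool is_small;                    // which union member is active
};

static_assert(sizeof(SmallBuffer) <= 64,
              "fits within one cache line on many CPUs");
//-------------------------------------------------------------------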
 
> 2) is the optimization mandatory? If not, it is even worse with regard
> to the address (and storage type) of the actual data.
 
That is up to the implementer of the classes. But "small string" or
"small data" optimisations are usually a win.
 
 
> we could end up in a scenario in which a small STATIC std::string has
> both its members and its data static, while another instance does not,
> with just the members there and the data elsewhere, without any
> possibility of knowing in advance.
 
Yes. So what?
Paavo Helde <myfirstname@osa.pri.ee>: Oct 24 03:28PM +0300

On 24.10.2019 14:36, Soviet_Mario wrote:
>> vectors.
 
> uhm, strange ... it seems to me that such a solution may be worse than
> the problem
 
What's the problem you are talking about?
 
Small string optimization is pretty much a perfect optimization as it
avoids dynamic allocation calls and provides better memory locality;
both are a big deal nowadays. And having an optimization which
provides better performance in some situations is better than not having
an optimization at all.
 
> I mean: 1) there is no precise definition of SMALLness, no known
> mandatory threshold to know in advance
 
What would you do with that knowledge? Shorten your strings? Smaller
strings are always faster to process, so if you could use smaller
strings in your code you would have already done that (assuming the
speed is so critical).
 
 
> we could end up in a scenario in which a small STATIC std::string has
> both its members and its data static, while another instance does not,
> with just the members there and the data elsewhere, without any
> possibility of knowing in advance.
 
So what? If this happens, then presumably it comes from shortening a
long string, meaning that the initial dynamic allocation call had
already happened and could not have been avoided anyway.
Paavo Helde <myfirstname@osa.pri.ee>: Oct 24 04:46PM +0300

On 24.10.2019 14:36, Soviet_Mario wrote:
>> whereby small strings are allocated within the object itself.
 
> I mean: 1) there is no precise definition of SMALLness, no known
> mandatory threshold to know in advance
 
The upper limit for "small string" length can be easily found out as
sizeof(std::string)-1 (-1 for the zero terminator byte, which can in
principle also serve as a marker that SSO is in use).
"Öö Tiib" <ootiib@hot.ee>: Oct 24 06:58AM -0700

On Thursday, 24 October 2019 16:46:57 UTC+3, Paavo Helde wrote:
 
> The upper limit for "small string" length can be easily found out as
> sizeof(std::string)-1 (-1 for the zero terminator byte, which can in
> principle also serve as a marker that SSO is in use).
 
I think it is -2. Note that the std::string may contain zero bytes.
IOW ...:
 
std::string s("\0\0test", 6);
 
... has to work and after that ...
 
assert(s.length() == strlen(s.c_str()));
 
... does not hold. So one byte is needed for the length of the
short string and another is needed for the zero terminator.
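 
Put together as a complete, compilable check of that point:
 
//-------------------------------------------------------------------
#include <cassert>
#include <cstring>
#include <string>

int main()
{
    // A std::string may legally contain embedded zero bytes ...
    std::string s("\0\0test", 6);
    assert(s.length() == 6);

    // ... so strlen() on c_str() only sees the part before the first zero.
    assert(std::strlen(s.c_str()) == 0);
    assert(s.length() != std::strlen(s.c_str()));
}
//-------------------------------------------------------------------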
Paavo Helde <myfirstname@osa.pri.ee>: Oct 24 05:21PM +0300

On 24.10.2019 16:58, Öö Tiib wrote:
 
> assert(s.length() == strlen(s.c_str()));
 
> ... does not hold. So one byte is needed for length of
> short string and other is needed for zero terminator.
 
Right, thanks for the correction!
 
Storing the length separately also means that it does not need to be
calculated as if by strlen() each time it is needed (although strlen()
should also be very fast on such small strings).
Bo Persson <bo@bo-persson.se>: Oct 24 05:27PM +0200

On 2019-10-24 at 16:21, Paavo Helde wrote:
 
> Storing the length separately would also mean that it's not needed to
> calculate it as if by strlen() each time when needed (although strlen()
> should also by very fast on such small strings).
 
If you want to force it, you can store "unused space" instead of "size"
in the last byte. Then that byte happens to be zero when the space is
full. :-)
 
https://github.com/elliotgoodrich/SSO-23
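 
Roughly the idea, as a sketch (not the actual SSO-23 code):
 
//-------------------------------------------------------------------
// Store "remaining capacity" in the last byte. With a 24-byte buffer,
// up to 23 characters fit; the last byte holds (capacity - size), which
// becomes 0 exactly when the string is full, so it doubles as the
// terminating '\0'.
struct SsoString23
{
    static constexpr unsigned char kCapacity = 23;
    char buf[24];                       // buf[23] = kCapacity - size

    void set_size(unsigned char size)   // requires size <= 23
    {
        buf[size] = '\0';               // keep the data zero-terminated
        buf[23]   = static_cast<char>(kCapacity - size);
    }

    unsigned char size() const
    {
        return static_cast<unsigned char>(kCapacity - buf[23]);
    }
};
//-------------------------------------------------------------------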
 
For various reasons, like preferring that an all-zero init represent the
empty string, this is not used by the major implementations.
 
 
Bo Persson
Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 24 05:38AM

On Wed, 2019-10-23, Chris M. Thomasson wrote:
> non-recursive mutex twice in the same thread is an error. This can be
> solved multiple ways. The easy way is to create the Supervisor as a
> service/ daemon. Robust mutexs can be used to detect when a process dies.
 
The software (as I understand it) is Unix-specific; standard practice
there is to have parent--child process relationships, and to use the
wait() family of functions to detect when a child dies.
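 
E.g., a minimal sketch of that pattern:
 
//-------------------------------------------------------------------
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    pid_t child = fork();
    if (child == 0)
    {
        // Child: this is where the monitored program would be exec()ed.
        _exit(0);
    }

    // Parent (supervisor): block until the child changes state.
    int status = 0;
    if (waitpid(child, &status, 0) == child)
    {
        if (WIFEXITED(status))
            std::printf("child exited with status %d\n", WEXITSTATUS(status));
        else if (WIFSIGNALED(status))
            std::printf("child killed by signal %d\n", WTERMSIG(status));
    }
}
//-------------------------------------------------------------------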
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 24 08:46AM

On Wed, 2019-10-23, Frederick Gotham wrote:
>> the job.
 
> I don't want to complicate the Supervisor, and so I'm happy to put
> the gas_monitor process to sleep.
 
In a way you /are/ complicating it already, by stretching the meaning
of "supervise". Especially if there is some kind of heartbeat
mechanism too, but maybe there isn't.
 
If you /really/ want to sleep forever and consume minimal resources on
Unix, the optimal way would be to close all file descriptors, release
anything else which would survive an exec(), and exec() a tiny
'sleep_forever' binary.
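 
The 'sleep_forever' binary itself can be next to nothing; a sketch:
 
//-------------------------------------------------------------------
#include <unistd.h>

int main()
{
    // Sleep until a signal arrives; loop in case a handled signal wakes us.
    for (;;)
        pause();
}
//-------------------------------------------------------------------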
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
scott@slp53.sl.home (Scott Lurndal): Oct 24 01:40PM


>The software (as I understand it) is Unix-specific; standard practice
>there is to have parent--child process relationships, and to use the
>wait() family of functions to detect when a child dies.
 
Or poll periodically using kill(pid, 0); kill() will fail with ESRCH if
the process doesn't exist.
 
Using SIGSTOP to suspend a process (either in-context using raise(3)
or externally from a monitor process using kill(2)) is the most efficient
way of pausing a process.
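 
E.g., a minimal sketch of that polling check:
 
//-------------------------------------------------------------------
#include <cerrno>
#include <signal.h>
#include <sys/types.h>

// Returns true while the process still exists. EPERM (no permission to
// signal it) still means it exists; only ESRCH means it is gone.
bool process_alive(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return true;
    return errno != ESRCH;
}
//-------------------------------------------------------------------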
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Oct 23 07:42PM -0700

On 9/26/2019 2:13 AM, Bonita Montero wrote:
> CMPXCHGs on three 64 bit values (in a 64-bit-system, in a 32-bit
> system you would have two 64- and one 32-bit exchange) if there's
> no collision.
 
Are you using an embedded version count within the data the CAS operations work
on? Like an ABA counter? If a CAS fails, how far back do you have to
restart, or unroll if you will?
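 
(By an embedded version count I mean something along these lines - a rough
sketch, not your code:)
 
//-------------------------------------------------------------------
#include <atomic>
#include <cstdint>

struct Node { Node* next; };

// Head pointer paired with a version counter; the counter is bumped on
// every successful update so a reused pointer value no longer compares
// equal (the classic ABA guard). Lock-free only where a 16-byte CAS is
// available (e.g. with -mcx16 on x86-64).
struct VersionedHead
{
    Node*         ptr;
    std::uint64_t ver;
};

std::atomic<VersionedHead> head{ VersionedHead{nullptr, 0} };

void push(Node* n)
{
    VersionedHead old_head = head.load(std::memory_order_relaxed);
    VersionedHead new_head;
    do {
        n->next  = old_head.ptr;
        new_head = VersionedHead{ n, old_head.ver + 1 };
        // On failure old_head is reloaded, and the loop retries from here.
    } while (!head.compare_exchange_weak(old_head, new_head,
                                         std::memory_order_release,
                                         std::memory_order_relaxed));
}
//-------------------------------------------------------------------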
 
 
 
 
I'm asking myself if this would be faster with trans-
Bonita Montero <Bonita.Montero@gmail.com>: Oct 24 11:00AM +0200


> Are you using an embedded version count within the data the CAS's work
> on? Like an ABA counter? If a CAS fails, how far back do you have to
> restart, or unroll if you will?
 
No, I'm not using ABA-counters.
