Sunday, May 30, 2021

Digest for comp.lang.c++@googlegroups.com - 16 updates in 5 topics

"daniel...@gmail.com" <danielaparker@gmail.com>: May 30 03:15PM -0700

On Saturday, May 29, 2021 at 5:47:50 AM UTC-4, David Brown wrote:
> The definition of intmax_t is a problem - it is a limitation for integer
> types in C and C++. Hopefully intmax_t will eventually be deprecated.
 
One proposal is to make intmax_t mean int64_t and leave it at that, with
no requirement that other integer types can't be larger. No more ABI
problem.
 
> type-generic macros in C and template functions in C++. From C90 there
> was "abs" and "labs" - C99 could have skipped "llabs" and "imaxabs", and
> similar functions.
 
Yes, of course, and to_integer<T> and from_integer<T>, among others. Many
libraries have to reinvent their own versions of these things.
 
> The gcc solution of __int128 covers most purposes without affecting
> backwards compatibility.
 
Hardly "most purposes", far from it. Without compiling with "-std=gnu++11",
you don't even have std::numeric_limits<__int128>. The absence of
standard support for int128_t makes genericity much harder. While other
languages such as rust with better type support see rapid growth
of open source libraries that cover all manner of data interchange
standards, C++ is comparatively stagnant.
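
For instance (with g++ and libstdc++; other compilers and library
implementations may differ), this prints 0 under -std=c++11, because
strict mode leaves numeric_limits unspecialized for __int128, and 1
under -std=gnu++11:

#include <iostream>
#include <limits>

int main()
{
    // 0 in strict -std=c++11 mode, 1 in -std=gnu++11 mode (libstdc++).
    std::cout << std::numeric_limits<__int128>::is_specialized << '\n';
}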
 
Daniel
Bonita Montero <Bonita.Montero@gmail.com>: May 30 01:18PM +0200

I've just checked:
 
#include <iostream>
 
using namespace std;
 
int main()
{
    int i, j, k;
    auto f = [&]() -> int
    {
        return i + j + k;
    };
    cout << sizeof f << endl;
}
 
Can anyone tell me whether the lambda holds three pointers (24 bytes
on 64-bit systems) instead of just one pointer to the stack frame,
which would be an easy optimization?
"Öö Tiib" <ootiib@hot.ee>: May 30 10:00AM -0700

On Sunday, 30 May 2021 at 14:18:20 UTC+3, Bonita Montero wrote:
 
> Can anyone tell me whether the lambda holds three pointers (24 bytes
> on 64-bit systems) instead of just one pointer to the stack frame,
> which would be an easy optimization?
 
The C++ standard does not require any optimizations there, so the
question is one of quality of implementation (QOI). QOI questions do
not make sense without mentioning the implementation's name and
version.
 
I suspect most compilers are turning your program into the equivalent of

int main() { std::cout << 42 << std::endl; }

The 42 there is unlikely but valid per the standard.
Even if you actually called the lambda, most compilers would probably
inline the call, and so the number would again be meaningless.
Bonita Montero <Bonita.Montero@gmail.com>: May 30 07:20PM +0200

> version.
> I suspect most compilers are turning your program into the equivalent of
> int main() { std::cout << 42 << std::endl; }
 
I think you don't understand what I'm asking for.
 
> The 42 there is unlikely but valid per the standard.
 
No, 24.
And my question is why the compiler doesn't do the simple
optimization of storing just a single pointer to the stack
frame inside the lambda object. That would result in fewer
memory accesses and would save registers.
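
To illustrate the two layouts in question (just a sketch of what a
compiler might generate, not what any compiler documents):

struct Frame { int i, j, k; };  // the enclosing stack frame

// What compilers typically emit for [&] capturing i, j and k:
struct LambdaThreeRefs { int *i, *j, *k; };  // 24 bytes on 64-bit targets

// The suggested optimization - one pointer to the whole frame, since
// the offsets of i, j and k within it are known at compile time:
struct LambdaFramePtr { Frame *frame; };     // 8 bytes

int call(const LambdaFramePtr &l)
{
    // One base register, three loads at fixed offsets.
    return l.frame->i + l.frame->j + l.frame->k;
}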
Real Troll <real.troll@trolls.com>: May 30 06:23PM +0100

On 30/05/2021 12:18, Bonita Montero wrote:
 
> Can anyone tell me whether the lambda holds three pointers (24 bytes
> on 64-bit systems) instead of just one pointer to the stack frame,
> which would be an easy optimization?
 
Yes, it is 24 on a 64-bit machine; compiled with VS2019 using
 
> C:\Users\*******\Documents\cmdLine\cpp>cl /EHsc lambda01.cpp
> Microsoft (R) C/C++ Optimizing Compiler Version 19.28.29915 for x64
> Copyright (C) Microsoft Corporation. All rights reserved.
 
A German manager, helped by a German player, won the Champions League for Chelsea (a UK team) in Portugal!!! How wonderful it is.
Bonita Montero <Bonita.Montero@gmail.com>: May 30 07:34PM +0200

> Yes, it is 24 on a 64-bit machine; compiled with VS2019 using
>> C:\Users\*******\Documents\cmdLine\cpp>cl /EHsc lambda01.cpp
 
It isn't different with -Ox, but it could have been.
David Brown <david.brown@hesbynett.no>: May 30 08:31PM +0200

On 30/05/2021 13:18, Bonita Montero wrote:
 
> Can anyone tell me whether the lambda holds three pointers (24 bytes
> on 64-bit systems) instead of just one pointer to the stack frame,
> which would be an easy optimization?
 
The question doesn't really make sense.
 
When optimising, a compiler will not generate anything for any of the
variables or the lambda.
 
When looking at this kind of thing, I like to use the online compiler at
<https://godbolt.org> and look at the assembly. This is easier if you
don't try to generate printed output:
 
 
int foo() {
    int i, j, k;

    auto f = [&]() {
        return i + j + k;
    };
    return sizeof(f);
}
 
gcc generates:
 
foo():
        movl    $24, %eax
        ret
 
So the compiler is trying to give you the size of storage it would need
in general for a lambda that took three references. But since the
optimised lambda is entirely removed, it is not the actual size of "f"
in the optimised code.
 
AFAIK the standard doesn't say anything about what size lambdas should
be, or anything else about their types. But I would guess that
compilers try to give consistent results for the sizeof of a lambda
regardless of the optimisation or details of the implementation.
Bonita Montero <Bonita.Montero@gmail.com>: May 30 08:46PM +0200

> in general for a lambda that took three references. But since the
> optimised lambda is entirely removed, it is not the actual size of "f"
> in the optimised code.
 
The compiler could also give the size of an optimized lambda.
 
Consider this:
 
#include <iostream>
#include <functional>

using namespace std;

int main()
{
    int i = 123, j = 456, k = 789;
    auto f = [&]() -> int
    {
        return i + j + k;
    };
    function<int()> ff = f;
    function<int()> *volatile pFf = &ff;
    cout << sizeof f << " " << (*pFf)() << endl;
}
 
I pack f into a function<> and then assign its address to a volatile
pointer to prevent any optimizations on the call of the function object.
So according to what you suggest the compiler would have the chance
to optimize away the three references - but it doesn't.
"Öö Tiib" <ootiib@hot.ee>: May 30 12:26PM -0700

On Sunday, 30 May 2021 at 21:46:22 UTC+3, Bonita Montero wrote:
> function<int()> *volatile pFf = &ff;
> cout << sizeof f << " " << (*pFf)() << endl;
> }
 
Looks like a horrible pile of garbage that still does nothing.
 
> pointer to prevent any optimizations on the call of the function object.
> So according to what you suggest the compiler would have the chance
> to optimize away the three references - but it doesn't.
 
If some optimization of some feature is missing in some compiler, and
your resulting program is therefore slow, then you should write less
garbage code that the compiler can't optimize. Or you can take the
source code of the compiler, implement the optimization you need and
put up a pull request.
olcott <NoOne@NoWhere.com>: May 29 06:32PM -0500

I am cross-posting this to comp.lang.c and comp.lang.c++ because any
C/C++ professional can correctly answer it and the code is written in C.
 
#include <stdint.h>

#define u32 uint32_t

int H(u32 P, u32 I);  /* the simulating partial halt decider, defined elsewhere */

int Simulate(u32 P, u32 I)
{
    ((void(*)(u32))P)(I);
    return 1;
}

int D(u32 P)
{
    if ( H(P, P) )
        return 0;
    return 1;
}

int main()
{
    H((u32)D, (u32)D);
}
 
H is a simulating partial halt decider based on an x86 emulator. Its
input is the machine address of a C function that has been cast to a
32-bit unsigned integer. H simulates its first parameter on the input of
its second parameter. In the above case H would simulate D(D).
 
--
Copyright 2021 Pete Olcott
 
"Great spirits have always encountered violent opposition from mediocre
minds." Einstein
Bonita Montero <Bonita.Montero@gmail.com>: May 30 05:15AM +0200

STOP POSTING in comp.lang.c/c++.
red floyd <no.spam.here@its.invalid>: May 30 12:03PM -0700

On 5/29/2021 8:15 PM, Bonita Montero wrote:
> STOP POSTING in comp.lang.c/c++.
 
He's obviously not going to stop, so just killfile the idiot.
"Öö Tiib" <ootiib@hot.ee>: May 29 07:03PM -0700

On Saturday, 29 May 2021 at 20:15:56 UTC+3, Chris Vine wrote:
> a cancellation by including a catch-all in your checking function which
> logs that cancellation has begun and such other state as is available
> to it to record, and then rethrows.
 
Sorry for my bad English. I meant having the thread complete its work
early, or cancelling it. The flags are for that purpose. I check them in
known places, so when repeating the same test the total count of such
checks is the same from run to run.
 
> One happy outcome of using POSIX
> functions is that the only exception-like thing they can emit is a
> cancellation pseudo-exception. But perhaps you meant something else.
 
So you suggest I can mock the POSIX functions to work as usual but
then sometimes throw something unusual for testing? It is a plan, but it
feels like quite a lot of work compared to mocking the flag checking.
 
> select() will signal it as ready for reading), and polled a flag on a
> select timeout, but just cancelling it proved much easier and more
> obvious.
 
OK, but how is it easier and more obvious? Indicating with flags and
letting the running work decide for itself where and how to complete
early feels like the most obvious split of responsibilities. Otherwise the
cancelling thread has to know and monitor the work-progress details
of the threads that it can potentially cancel. That feels fragile and risky.
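
A minimal sketch of the flag approach I mean (the stop_requested name
and the fake work are just for illustration):

#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

std::atomic<bool> stop_requested{false};  // the cancellation flag

void worker()
{
    for (int step = 0; step < 1000; ++step)
    {
        // Known place where the work checks the flag and decides
        // itself how to complete early.
        if (stop_requested.load(std::memory_order_relaxed))
        {
            std::cout << "stopping early at step " << step << "\n";
            return;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));  // a unit of work
    }
}

int main()
{
    std::thread t(worker);
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    stop_requested = true;  // the cancelling thread only sets the flag
    t.join();
}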
 
> I can recall using it to kill a thread waiting on
> pthread_join() but I cannot now remember the reasons why.
 
It can be that the thread it was joining had gone insane and hung.
I prefer to abort the whole process or to power-cycle the device in
such cases if possible. That guarantees the programming errors get
fixed quickest, and so insane processes and trashy devices do the
least damage to customers. But I've met it more on Windows, where
some closed-source garbage I'm forced to use does hang.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: May 30 11:48AM +0100

On Sat, 29 May 2021 19:03:08 -0700 (PDT)
> On Saturday, 29 May 2021 at 20:15:56 UTC+3, Chris Vine wrote:
> > On Sat, 29 May 2021 07:59:02 -0700 (PDT)
> > Öö Tiib <oot...@hot.ee> wrote:
[snip]
 
> So you suggest I can mock the POSIX functions to work as usual but
> then sometimes throw something unusual for testing? It is a plan, but it
> feels like quite a lot of work compared to mocking the flag checking.
 
You want to know when and how many times a cancellation "request" (ie
flag change) has been made in respect of a mocked version of a blocking
function (that is, a function blocking on some event and/or a quit flag
request), possibly without carrying out any cancellation/quitting? The
"possibly without carrying out any cancellation/quitting" makes such
mocking unfeasible with thread cancellation, since once cancellation
has started you can catch and rethrow it (and instrument and count that)
but you cannot stop it. For that you would have to mock the function
which applies pthread_cancel instead.
 
> early feels like the most obvious split of responsibilities. Otherwise the
> cancelling thread has to know and monitor the work-progress details
> of the threads that it can potentially cancel. That feels fragile and risky.
 
The cancelling thread doesn't need to know any of the work details of
the thread which is accepting connections. With deferred cancellation
the accepting thread is master of its own cancellation and (in the case
I have in mind) allows cancellation only when applying accept(), so
disallowing cancellation whenever accept() returns with a new
connection. Once a new connection occurs it completes the
establishment of the connection and hands off the new connection socket
for another thread to deal with in the normal way, during which time it
is uncancellable. When it has completed the hand off unhindered it
loops back to accept and makes itself cancellable again, and so on. It
would work the same way as having the accepting thread checking a flag
via a timeout and killing itself with an exception or in some other
way, but with much less faff.
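
In outline it looks like this (listen_fd and hand_off are placeholders
and error handling is omitted):

#include <pthread.h>
#include <sys/socket.h>

extern int listen_fd;        // placeholder: the listening socket
void hand_off(int conn_fd);  // placeholder: passes the socket to a worker

void *acceptor(void *)
{
    int old;
    for (;;)
    {
        // Cancellable only while blocked in accept(), which is a
        // POSIX cancellation point.
        pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);
        int conn_fd = accept(listen_fd, nullptr, nullptr);
        // Uncancellable while completing the connection and handing
        // it off to another thread.
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
        if (conn_fd >= 0)
            hand_off(conn_fd);
    }
    return nullptr;
}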
 
The simple scheme I have described works on the basis that you may want
to terminate the accepting thread but not any thread(s) still dealing
with previously established connections. That may not always be what
you want but since every thread is master of its own cancellation, you
can arrange for the termination of other threads to occur in any way you
want, and (at the points where cancellation is allowed) you can save
work, clean-up, log etc. in an appropriate catch-all block which
rethrows when it has done the saving and clean-up. If the thread(s)
handling previously established connections are doing so asynchronously
via an event loop, you probably wouldn't use cancellation at all: you
would bring the relevant event loop(s) maintained by those threads to
an end. (That wasn't the position in the case I dealt with, but I can
easily imagine it could be.)
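
The catch-all block I have in mind looks roughly like this (run_work is
a placeholder; note that with glibc's NPTL a caught cancellation must
be rethrown, otherwise the process is terminated):

#include <iostream>

void run_work();  // placeholder: the cancellable work

void thread_func()
{
    try {
        run_work();
    }
    catch (...) {
        // Save state, clean up and log here; this catches ordinary
        // exceptions and the cancellation pseudo-exception alike.
        std::cerr << "thread ending: exception or cancellation\n";
        throw;  // rethrow - mandatory for cancellation with glibc
    }
}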
"Öö Tiib" <ootiib@hot.ee>: May 30 09:35AM -0700

On Sunday, 30 May 2021 at 13:48:55 UTC+3, Chris Vine wrote:
> flag change) has been made in respect of a mocked version of a blocking
> function (that is, a function blocking on some event and/or a quit flag
> request), possibly without carrying out any cancellation/quitting?
 
Maybe. The idiomatic case is that we want a piece of work to end early
because it has become obsolete. The other idiomatic case is that we want
the thread to stop early because whatever work it could possibly do has
become obsolete. With flags we can indicate that one or the other case
now holds, and then end the work or stop the thread early.
Also we want to test everything. By mocking the flag-checking
functions we can test it with the granularity of a single check. Whether
only a single piece of work or the whole thread is stopped early does not
matter to the question, as we are testing the outcome of such an abrupt
stop. If it didn't stop, we have a defect.
 
> has started you can catch and rethrow it (and instrument and count that)
> but you cannot stop it. For that you would have to mock the function
> which applies pthread_cancel instead.
 
So we have to have both flags and cancellation in place when the same
thread is reused for different pieces of work? Or maybe flags first, and
then, if the flags do not work, cancellation as a wheelchair for the
defective program?
 
> would work the same way as having the accepting thread checking a flag
> via a timeout and killing itself with an exception or in some other
> way, but with much less faff.
 
Sounds like some kind of master-of-its-own-suicide pattern. It is very
interesting, as I've never needed that. What happens when it kills
itself? Does it signal someone to join it?
 
> want, and (at the points where cancellation is allowed) you can save
> work, clean-up, log etc. in an appropriate catch-all block which
> rethrows when it has done the saving and clean-up.
 
OK, so I can have the dead corpse of the responsible one removed. How
do I then figure out what happened?
 
> would bring the relevant event loop(s) maintained by those threads to
> an end. (That wasn't the position in the case I dealt with, but I can
> easily imagine it could be.)
 
Threads do anything that can take time, and anything that can take
time may become obsolete before it is completed. It can be left to run
to the end and the obsolete product then discarded, or it may be stopped
early as a performance optimization. Network communication is usually
behind the bottleneck of the available network cards, and as our
processors are typically tremendously quicker, just one thread can easily
handle all communication going through one network card.
Juha Nieminen <nospam@thanks.invalid>: May 30 06:40AM

>> each other) traversing set is awfully slow.
 
> A vector isn't cache-friendly either when you do a binary search.
> Random-access memory-accesses are always slow.
 
He was talking about traversing the set, not searching it.
In other words, for(auto& element: theSet).
 
(This, of course, assuming that the amount of data is so large
that it won't fit entirely even in the L3 cache. Or, if we are
just traversing the set for the first time since all of its contents
have been flushed from the caches.)
 
Of course even with std::vector it depends on the size of the
element. Traversing a (very large) std::vector linearly from
beginning to end isn't magically going to be very fast either,
if each element is large enough. And "large enough" is actually
quite small. If I remember correctly, cache line sizes are typically
64 bytes or so. This means that if the vector element type is an
object of size 64 bytes or more, and you are accessing just one member
variable of each object, then you'll get no benefit from linear
traversal compared to random access (in the case that the contents of
the vector are not already in the caches).
 
You only get a speed advantage for (very large) vectors whose element
size is very small, like 4 or 8 bytes. For example, if the vector
represents a bitmap image, with each "pixel" element taking e.g. 4 bytes,
then a linear traversal will be quite efficient (assuming none of the
vector contents were in any cache to begin with, you'll get an
extremely heavy cache miss only every 16 pixels).
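
A minimal sketch of the kind of measurement I mean (the Big struct and
the sizes are made up for illustration; compile with optimization, and
the actual numbers depend entirely on the machine):

#include <chrono>
#include <cstdint>
#include <iostream>
#include <vector>

struct Big { std::uint32_t x; char pad[60]; };  // fills a 64-byte cache line

int main()
{
    const std::size_t n = 1 << 22;              // ~4M elements
    std::vector<std::uint32_t> small(n, 1);     // 16 MB: 16 elements per line
    std::vector<Big> big(n, Big{1, {}});        // 256 MB: 1 element per line

    std::uint64_t s1 = 0, s2 = 0;

    auto t0 = std::chrono::steady_clock::now();
    for (std::uint32_t v : small) s1 += v;      // ~1 miss per 16 elements
    auto t1 = std::chrono::steady_clock::now();
    for (const Big &b : big) s2 += b.x;         // ~1 miss per element
    auto t2 = std::chrono::steady_clock::now();

    std::cout << "4-byte elements:  "
              << std::chrono::duration<double>(t1 - t0).count() << " s\n"
              << "64-byte elements: "
              << std::chrono::duration<double>(t2 - t1).count() << " s\n"
              << (s1 + s2) << "\n";  // use the sums so they aren't optimized away
}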
 
Of course almost none of this applies if the vector or set is
small enough to fit in the L1 cache and has already been loaded
there in its entirety. Then none of this matters much. It starts
mattering a bit more if the vector is too large for L1 but small
enough for the L2 cache, and more again if it's too large for L2
but small enough for the L3 cache.
 
Modern CPUs tend to have a quite large L3 cache, which mitigates
the problems of cache misses in many instances. For example my CPU
has an L3 cache of 12 MB. Thus if I need, for example, to perform
some operations repeatedly on an image that fits comfortably
within those 12 MB, it will be very fast.
 
It's only when the dataset is much larger than L3 that cache locality
really starts having a very pronounced effect (when performing
operations repeatedly on the entire dataset).
