Saturday, August 10, 2019

Digest for comp.lang.c++@googlegroups.com - 25 updates in 6 topics

Tim Rentsch <tr.17687@z991.linuxsc.com>: Aug 10 09:29AM -0700


> My point is, I'm not sure the students will appreciate recursion
> properly based on that code, if that's the intention. (Perhaps not
> based on any C++ code.)
 
These comments strike me as somewhat odd. Certainly it is true
that recursive functions often offer a good fit to recursive data
types. That doesn't mean recursion is a bad fit in other cases.
On the contrary, using recursion often gives a definition that is
simpler and easier to understand than a non-recursive version. A
well-known example is finding the greatest common divisor of two
numbers, which might be written as follows (with gcd(0,0) to
return the value 1):
 
unsigned
gcd( unsigned a, unsigned b ){
    return a == 0 ? (b != 0 ? b : 1)
                  : gcd( b%a, a );
}
 
Writing a nicely tail-recursive function often entails passing
arguments besides just the primary data item. This happens with
arrays in C and C++ because arrays are not recursive data types in
C or C++ (nor are they in many functional languages). Furthermore
we may want not one additional "cursor" but more than one. Here
is a function to determine if a sorted array contains a given
value:
 
bool
contains( int *v, size_t base, size_t limit, int item ){
    if( base == limit )  return false;

    size_t mid = base + (limit-base)/2;

    if( v[mid] < item )  return contains( v, mid+1, limit, item );
    if( v[mid] > item )  return contains( v, base, mid, item );

    return true;
}
 
In later versions of C++ we might choose to write this function
differently, using 'span' (disclaimer: I am not an expert on
span!). Doing that would very likely give a nicer function. But
I wouldn't say the non-span version is "inelegant". It's just a
reflection of how the language-provided data types work. Do I
remember right that you are one of the people who is still
working (or forced to work) with C++11? Anyone in that boat
might choose a function definition like the one above, until such
time as better language facilities are available so it could be
re-written more attractively.
 
I guess my main point is that recursion is more broadly applicable
than just those situations where recursive data types are
involved. In fact, in languages like C and C++, one is much more
likely to encounter a good fit for recursion that does not make
use of recursive data types, simply because C and C++ typically do
not define data types recursively. Recursion is more fundamental
than just the cases where recursive data types are used.
Tim Rentsch <tr.17687@z991.linuxsc.com>: Aug 10 11:18AM -0700


> For portable code it doesn't depend: you just have no guarantee and
> hence risk UB.
 
> And a risk of UB = UB.
 
Consider the following two function definitions:
 
#include <array>
 
using SomeType = double;
const unsigned long SomeFixedSize = 137;
using Array = ::std::array< SomeType, SomeFixedSize >;
 
bool
iterative_contains( Array & a, unsigned n, SomeType & v ){
    while( n > 0 ){
        if( a[n-1] == v )  return true;
        n--;
    }
    return false;
}

bool
recursive_contains( Array & a, unsigned n, SomeType & v ){
    return ! (n > 0)    ?  false
         : a[n-1] == v  ?  true
         : recursive_contains( a, n-1, v );
}
 
As far as the Standard is concerned these two functions are
semantically equivalent. They are equally well-defined. They
have exactly the same implications for observable behavior. If
one has undefined behavior then so does the other. If one has a
risk of undefined behavior then so does the other. A conforming
implementation may choose to compile an iterative definition into
recursive object code, as well as vice versa. So if you are going
to say the C++ standard gives you no guarantee about whether the
recursive function might result in undefined behavior, you should
also say the C++ standard gives you no guarantee about whether the
iterative function might result in undefined behavior.
 
 
 
> Modulo a number of caveats, yes. [...]
 
> * That for a debug build the compiler can't optimize away calls.
> Because you can't trace into a call if it's optimized away.
 
That depends on what you mean by "debug build". In fact g++ will
compile recursive definitions into iterative object code even
when a -g or -ggdb option is given.
"Chris M. Thomasson " <ahh_f_it@crap.nothing>: Aug 10 02:09AM -0700

On 8/8/2019 11:59 PM, Bonita Montero wrote:
> look like? LRU-caches are only possible with doubly-linked lists.
> So I thought this wouldn't be possible lock-free. But maybe I'm
> wrong here and someone can give me an idea.
 
Perhaps a lock-free deque?
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Aug 10 02:12AM -0700

On 8/10/2019 2:09 AM, Chris M. Thomasson wrote:
>> So I thought this wouldn't be possible lock-free. But maybe I'm
>> wrong here and someone can give me an idea.
 
> Perhaps a lock-free deque?
 
Maybe:
 
https://pdfs.semanticscholar.org/8a68/f45bd32ed050a96faa24139ab71178258f13.pdf
 
It's pretty gnarly.
Bonita Montero <Bonita.Montero@gmail.com>: Aug 10 11:42AM +0200

>> So I thought this wouldn't be possible lock-free. But maybe I'm
>> wrong here and someone can give me an idea.
 
> Perhaps a lock-free deque?
 
No, I want to move an arbitrary element of a doubly linked list
to the top.
"Öö Tiib" <ootiib@hot.ee>: Aug 10 03:48AM -0700

On Saturday, 10 August 2019 12:42:22 UTC+3, Bonita Montero wrote:
 
> > Perhaps a lock-free deque?
 
> No, I want to move an arbitrary element of a doubly linked list
> to the top.
 
You anyway have some rainbow tree or hash table to find the
element that represents the cached resource. The element also being
in the linked list (having those prev/next pointers) is irrelevant
to consumers of cached resources. So you may leave the prev/next
pointers to be managed by a single thread. When there is only a single
thread that accesses those, the task is "embarrassingly parallel",
IOW locks are not needed, IOW the best "lock-free" there can be.
Bonita Montero <Bonita.Montero@gmail.com>: Aug 10 01:21PM +0200

> pointers to be managed by a single thread. When there is only a single
> thread that accesses those, the task is "embarrassingly parallel",
> IOW locks are not needed, IOW the best "lock-free" there can be.
 
No, the element is found by multiple threads accessing the hashtable.
I think the buckets could be partitioned, with each partition having
a single mutex; and the number of mutexes should be a fair multiple
of the number of HW threads so there is a low likelihood of a collision.
But each thread will push the found cache entry individually to the top, so
there's no central thread doing this.
Intel's transactional memory incarnation RTM would be perfectly
suitable for that, as would IBM's equivalent incarnation since the
POWER8 CPUs. But AMD CPUs lack such a feature since they have this
brain-damaged exclusive cache architecture between L1/L2 (both
inclusive) and L3 which doesn't allow the versioning that gets
the rollback from the L3 cache.
"Öö Tiib" <ootiib@hot.ee>: Aug 10 05:30AM -0700

On Saturday, 10 August 2019 14:21:39 UTC+3, Bonita Montero wrote:
> of the number of HW threads so there is a low likelihood of a collision.
> But each thread will push the found cache entry individually to the top, so
> there's no central thread doing this.
 
But the threads could forward that task (of pushing the found element
to the top) to a single thread. It is technically management of those
prev/next pointers that the cache users don't care about anyway.
Forwarding can be done for example by using a lock-free deque, and
so the whole thing is made entirely lock-free.
If that is not good enough for you then perhaps read the paper
that Chris M. Thomasson posted, or search for similar.
 
> brain-damaged exclusive cache architecture between L1/L2 (both
> inclusive) and L3 which doesn't allow the versioning that gets
> the rollback from the L3 cache.
 
Processor cache architecture is mostly abstracted out of reach on
level of C++. We can adjust the behavior of our software a bit
and rest of it ... is topical in comp.arch.
Bonita Montero <Bonita.Montero@gmail.com>: Aug 10 03:52PM +0200

> element to top) to single thread. It is technically management
> of those prev/next pointers that the cache users don't care
> about anyway. ...
 
That would be totally brain-damaged, because locking the linked
list and pushing the list entry to the top by the thread that
accesses a certain block, thereby marking it as the most
recently used block, would be more efficient. That's just because the
blocks aren't managed by a single thread; they're touched
by all threads.
 
> Processor cache architecture is mostly abstracted out of reach
> on level of C++. We can adjust the behavior of our software a
> bit and rest of it ... is topical in comp.arch.
 
Hey, transactional memory was one of the candidate topics for C++20!
And the HLE flavor of transactional memory which Intel implements (and
I recently discovered that IBM has implemented the same since the POWER8)
fits perfectly with the usual mutex locking of C++. It's just that
HLE is more appropriate for mutex locking with a small scope - like
in this example.
"Öö Tiib" <ootiib@hot.ee>: Aug 10 09:40AM -0700

On Saturday, 10 August 2019 16:52:12 UTC+3, Bonita Montero wrote:
> list and pushing the list-entry to the top by the thread that
> accesses a certain block and thereby attributing it at the most
> recent block would be more efficient.
 
A brain has to be rather calcified to receive total damage from a
simple idea of how to get rid of the constant race over the top of
the list.
Bonita Montero <Bonita.Montero@gmail.com>: Aug 10 08:55AM +0200

>    struct {char a,b,c,d;}
> where char is 8 bits. What happens when you run this on a machine
> which only has 32-bit word-addressable memory?
 
On these machines sizeof(char) is also 1. And all larger types are
a multiple of this. So there can also be a misalignment in this way
if the compiler supports this. I.e. a 32-bit value might be directly
followed by a 64-bit value.
 
>   #pragma pack(1)
>   struct {char a; int b;}
> on a machine that doesn't support unaligned accesses in hardware.
 
I already said in another posting here that if a compiler supports
#pragma pack it supports generating correct code; but this doesn't
mean that the CPU / OS supports this at runtime. E.g. Linux on SPARC
emulates unaligned accesses through trapping, whereas on Solaris on
SPARC this depends on how you compile your code.
 
 
 
>    (void*)0x12345;      // ie. non-aligned pointer
> The compiler might assume that p points to an aligned address. If the
> hardware doesn't support unaligned, what would happen when *p is evaluated?
 
No, the compiler might not assume this. It simply generates a correct
load suitable for this. But whether it works at runtime depends on the
CPU (on Intel CPUs this always works), and on other systems on the
combination of CPU and OS.
David Brown <david.brown@hesbynett.no>: Aug 10 05:10PM +0200

On 09/08/2019 17:43, Bart wrote:
>> is to be MSVC-compatible here.
 
> Being able to define such a layout doesn't necessarily mean that
> unaligned accesses are supported by hardware.
 
More importantly here, it also does not mean you can take pointers to
the unaligned members and use them as normal pointers, nor does it mean
that the compiler supports unaligned access in other cases - /even/ if
the hardware supports it.
 
On platforms that don't support unaligned access in the hardware,
accessing members of a "packed" struct will give you byte-by-byte
instructions. No guarantees are usually given if you take a pointer to
the member and work from there.
David Brown <david.brown@hesbynett.no>: Aug 10 05:21PM +0200

On 10/08/2019 00:28, Bart wrote:
>>> unaligned accesses are supported by hardware.
 
>> Ok, you're soooo right.
>> #pragma pack is supported without making sense.
 
Bart /is/ right. Lots of processors don't support unaligned access, and
compilers for them will usually still have some type of "packed" struct
solution.
 
 
>    struct {char a,b,c,d;}
 
> where char is 8 bits. What happens when you run this on a machine which
> only has 32-bit word-addressable memory?
 
If the system only has 32-bit word-addressable memory, then usually char
will be 32 bits (you get this in some DSPs). Alternatively, it can
emulate access to smaller objects - this may even mean that char*
pointers are bigger than int* pointers. Either way, "pack" will not
affect the layout of a struct containing only char members.
 
 
>     int *p = f();
 
> f (in code not visible from the call-site) returns this:
 
>    (void*)0x12345;      // ie. non-aligned pointer
 
No, C does not allow that - converting a value into a pointer type that
is not properly aligned is undefined behaviour. You can use other lower
level methods, such as memcpy, to end up with an unaligned int* pointer
- but dereferencing it will be undefined behaviour.
 
That is true regardless of whether or not the hardware supports
unaligned pointers.
 
(Of course any C implementation could choose to define such behaviour in
the obvious way - but I have not seen any compiler document such behaviour.)
 
Note that it is very dangerous to talk about "code not visible from the
call site". The trend in compilation is that more and more is visible,
with cross-unit compilation and optimisation. Techniques that used to
"work" by hiding information from the compiler no longer work when you
have link-time optimisation and similar systems.
 
 
 
> The compiler might assume that p points to an aligned address. If the
> hardware doesn't support unaligned, what would happen when *p is evaluated?
 
The compiler can always assume "p" is aligned properly, and use that
fact for optimisation or code re-arrangement, regardless of unaligned
access support in the hardware.
Bonita Montero <Bonita.Montero@gmail.com>: Aug 10 05:21PM +0200

> More importantly here, it also does not mean you can take pointers
> to the unaligned members and use them as normal pointers, ...
 
Quote the part of the standard that says that.
 
> ..., nor does it mean that the compiler supports unaligned access
> in other cases - /even/ if the hardware supports it.
 
Which compiler does specify this? Show me a link to the documentation.
Bonita Montero <Bonita.Montero@gmail.com>: Aug 10 05:25PM +0200

> Bart /is/ right.  Lots of processors don't support unaligned access,
> and compilers for them will usually still have some type of "packed"
> struct solution.
 
Yes, they generate the correct code and let the rest depend on the CPU
or the OS (trapping).
 
> No, C does not allow that - converting a value into a pointer type that
> is not properly aligned is undefined behaviour.
 
This is done only on embedded platforms with fixed addresses of
hardware registers. So you're in the context of a certain compiler and
hardware. There you don't have to worry about what the standard says.
Bonita Montero <Bonita.Montero@gmail.com>: Aug 10 05:31PM +0200

>> More importantly here, it also does not mean you can take pointers
>> to  the unaligned members and use them as normal pointers, ...
 
> Quote the part of the standard that says that.
 
Oh, I wasn't correct here: since #pragma pack is a compiler-specific
feature, it should be specified by the compiler documentation that it
isn't legal to take the address of an unaligned member.
So show me some compiler documentation that says it isn't
legal to take the address of an unaligned member!
David Brown <david.brown@hesbynett.no>: Aug 10 05:06PM +0200

On 09/08/2019 16:36, Scott Lurndal wrote:
 
> That paper has never been convincing to me, nor anyone that I've
> ever worked with, nor any unix or linux os or other large sourcebase
> that I've worked on professionally.
 
I have had enough trouble dealing with projects that contain separate
Makefiles for different parts, yet depend on each other in odd ways, to
be convinced by the paper. Far too often I've had to do "make clean"
before a "make" because of badly coordinated sub-makes.
 
I am sure it is /possible/ to have large systems with recursive makes
that work well. Equally, I am sure that often such systems are not done
well.
 
Tim Rentsch <tr.17687@z991.linuxsc.com>: Aug 09 06:11PM -0700


> Even if the restriction is a special case of coroutines, I think
> it can be argued that "coroutine" is a misnomer for the C++20
> thing. :-) [...]
 
Yes, and it's an easy argument to make. The C++ construct I have
seen described bears almost no resemblance to coroutines as they
were originally described (I first read about them in Knuth v. 1).
A C++20 "coroutine" is more like a detachable, resumable function,
not at all like the symmetric arrangement of mutually calling
functions described in Knuth.
 
To be fair, it's probably worth mentioning that in those days (and
especially in Knuth's TAOCP), call linkage was a lot more static,
and what we now take for granted in the way of automatic storage
was often nowhere to be seen in many program environments. Call
stack? Local variables? Reentrant functions? These elements are
pretty much not found in environments where coroutines were
applied, and vice versa. It's hard to unify these two kinds of
control structures. So it isn't surprising that "coroutine" takes
on a rather different form in C++, compared to what coroutines
were originally.
Tim Rentsch <tr.17687@z991.linuxsc.com>: Aug 09 07:49PM -0700

> if( p->left ) { nodes.push( p->left ); }
> }
> }
 
The function body can be simplified thus:
 
stack<Node*> nodes;
nodes.push( root );
while( not nodes.empty() ) {
    const auto p = nodes.top();
    nodes.pop();
    if( !p )  continue;
    co_yield p->data;
    nodes.push( p->right );
    nodes.push( p->left );
}
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Aug 10 05:13AM +0200

On 10.08.2019 04:49, Tim Rentsch wrote:
> nodes.push( p->right );
> nodes.push( p->left );
> }
 
Thanks.
 
Don't know why I wrote that so complicated, with no null pointers pushed
except possibly the root. But I guess I was just transforming the
original code, which had those conditional pushes.
 
 
Cheers!,
 
- Alf
Christian Gollwitzer <auriocus@gmx.de>: Aug 10 08:23AM +0200

On 05.08.19 at 23:09, Alf P. Steinbach wrote:
 
>> voilà.
 
> As I understand it this generates a new coroutine instance, with
> dynamically allocated state, for each call.
 
yes, of course.
 
>         if( p->left )   { nodes.push( p->left ); }
>     }
> }
 
This code now manages the stack manually and is half-way to the
transformation into the iterative version which you need to write
*without* coroutines.
 
I'm not sure my message was received, I try again. So consider, you
decide to go from breadth-first traversal to in-order traversal.
 
Coroutine:
 
void bftraverse(Tree* root) {
    if (root->left)  [<-] bftraverse(root->left);
    co_yield(root->content);
    if (root->right) [<-] bftraverse(root->right);
}
 
The only thing that has changed, is the order of the yield instructions.
(plus I added the forgotten [<-] thing as shown in the TR paper, which
means the same as "yield from" in Python)
 
 
How to transform your unfolded code? Let's replace:
 
> co_yield p->data;
> if( p->right ) { nodes.push( p->right ); }
> if( p->left ) { nodes.push( p->left ); }
 
by
 
> if( p->right ) { nodes.push( p->right ); }
> co_yield p->data;
> if( p->left ) { nodes.push( p->left ); }
 
and surprisingly, it is still doing breadth-first search. Transforming
this correctly means either a different stack management, or a special
algorithm invented by geniuses which can walk along the tree leaves.
 
The point of coroutines is not to maximize efficiency, but to write code
*as if* it were the original simple, clear algorithm, while resulting in a
state machine. For some cases this can actually improve efficiency, e.g.
for asynchronous network code, because the manual async thing is so
complex that no one would do it.
 
The price to pay is actually not that high. Your stack uses one Node*
for the coroutine state. The compiler-generated version uses all local
variables (just Node*) plus an instruction pointer, so yes, it is more,
but not that much. Gor Nishanov actually claimed in his talk that VC++
coroutines are efficient enough for asynchronous access to *main memory*:
 
https://www.youtube.com/watch?v=j9tlJAqMV7U
 
Christian
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Aug 10 12:19AM -0700

On 8/5/2019 1:26 PM, Christian Gollwitzer wrote:
> On 05.08.19 at 10:42, Juha Nieminen wrote:
[...]
>    if (root->left) bftraverse(root->left);
>    if (root->right) bftraverse(root->right);
> }
 
Looks nice to me.
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Aug 10 10:43AM +0200

On 10.08.2019 08:23, Christian Gollwitzer wrote:
> *without* coroutines.
 
> I'm not sure my message was received, I try again. So consider, you
> decide to go from breadth-first traversal to in-order traversal.
 
I think you mean go from pre-order to in-order.
 
 
 
> The only thing that has changed, is the order of the yield instructions.
> (plus I added the forgotten [<-] thing as shown in the TR paper, which
> means the same as "yield from" in Python)
 
I haven't yet stumbled upon that notation, so I think it must be
something that once was proposed but has been rejected?
 
Anyway, I'm now aware that for C++20 coroutines one needs a return type
that implements a lot of stuff. With Visual C++ that's apparently
provided by `std::future`. I don't know about the standard.
 
 
 
> and surprisingly, it is still doing breadth-first search. Transforming
> this correctly means either a different stack management, or a special
> algorithm invented by geniuses which can walk along the tree leaves.
 
Iterative in-order isn't that hard.
 
After having output (or whatever) the left subtree of node N, one
outputs N.data and registers a sufficient subset of nodes in the right
subtree in the data structure, which can be a stack, so that the same
general process of popping out a node, displaying it, and registering
some more, can work.
 
What's necessary then is just that the first node popped out, and thus
next output, should be the leftmost leaf of the right subtree. To do
that it's necessary to register all the nodes down to that leaf. I.e., a
loop, to register (push) the nodes down the chain to that leaf node.
 
void process_tree_node( const Node& node )
{
    cout << node.data;
}

void process_tree_nodes_inorder( const Type_<const Node*> p_root )
{
    stack<const Node*> nodes;
    const Node* p_current = p_root;
    for( ;; ) {
        while( p_current ) {
            nodes.push( p_current );
            p_current = p_current->left;
        }
        if( is_empty( nodes ) ) {
            break;
        }
        p_current = popped_top( nodes );
        process_tree_node( *p_current );
        p_current = p_current->right;
    }
}
 
 
> but not that much. Gor Nishanov actually claimed in his talk, that VC++
> coroutines are efficient enough for asynchronous access to *main memory*
 
> https://www.youtube.com/watch?v=j9tlJAqMV7U
 
 
Cheers!,
 
- Alf
Bonita Montero <Bonita.Montero@gmail.com>: Aug 10 10:50AM +0200

> functors), with the difference that the coroutine can be "called" again
> in such a manner that it continues from where it "returned" last time,
> rather than always from the beginning (which would happen with lambdas).
 
The difference is that the coroutine can resume its work after
returning, and can maintain its state between returns.
Daniel <danielaparker@gmail.com>: Aug 10 01:20AM -0700

https://github.com/danielaparker/jsoncons
 
A C++ header-only library for constructing JSON and JSON-like data formats, with JSON Pointer, JSON Patch, JSONPath, CSV, MessagePack, CBOR, BSON and UBJSON.
