- Why does this work on Xcode ??? - 2 Updates
- Lock-free LRU-cache-algorithm - 8 Updates
- Performance of unaligned memory-accesses - 6 Updates
- Adding compilation arguments to 4000 make files. . . - 1 Update
- Why can't I understand what coroutines are? - 7 Updates
- [announcement] jsoncons v0.132.0 released - 1 Update
Tim Rentsch <tr.17687@z991.linuxsc.com>: Aug 10 09:29AM -0700

> My point is, I'm not sure the students will appreciate recursion
> properly based on that code, if that's the intention. (Perhaps not
> based on any C++ code.)

These comments strike me as somewhat odd. Certainly it is true that recursive functions often offer a good fit to recursive data types. That doesn't mean recursion is a bad fit in other cases. On the contrary, using recursion often gives a definition that is simpler and easier to understand than a non-recursive version. A well-known example is finding the greatest common divisor of two numbers, which might be written as follows (with gcd(0,0) to return the value 1):

unsigned gcd( unsigned a, unsigned b ){
    return a == 0 ? b != 0 ? b : 1 : gcd( b%a, a );
}

Writing a nicely tail-recursive function often entails passing arguments besides just the primary data item. This happens with arrays in C and C++ because arrays are not recursive data types in C or C++ (nor are they in many functional languages). Furthermore we may want not one additional "cursor" but more than one. Here is a function to determine if a sorted array contains a given value:

bool contains( int *v, size_t base, size_t limit, int item ){
    if( base == limit ) return false;
    size_t mid = base + (limit-base)/2;
    if( v[mid] < item ) return contains( v, mid+1, limit, item );
    if( v[mid] > item ) return contains( v, base, mid, item );
    return true;
}

In later versions of C++ we might choose to write this function differently, using 'span' (disclaimer: I am not an expert on span!). Doing that would very likely give a nicer function. But I wouldn't say the non-span version is "inelegant". It's just a reflection of how the language-provided data types work. Do I remember right that you are one of the people who is still working (or forced to work) with C++11?
Anyone in that boat might choose a function definition like the one above, until such time as better language facilities are available so it could be re-written more attractively. I guess my main point is that recursion is more broadly applicable than just those situations where recursive data types are involved. In fact, in languages like C and C++, one is much more likely to encounter a good fit for recursion that does not make use of recursive data types, simply because C and C++ typically do not define data types recursively. Recursion is more fundamental than just the cases where recursive data types are used. |
Tim Rentsch <tr.17687@z991.linuxsc.com>: Aug 10 11:18AM -0700

> For portable code it doesn't depend: you just have no guarantee and
> hence risk UB.
> And a risk of UB = UB.

Consider the following two function definitions:

#include <array>

using SomeType = double;
const unsigned long SomeFixedSize = 137;
using Array = ::std::array< SomeType, SomeFixedSize >;

bool iterative_contains( Array & a, unsigned n, SomeType & v ){
    while( n > 0 ){
        if( a[n-1] == v ) return true;
        n--;
    }
    return false;
}

bool recursive_contains( Array & a, unsigned n, SomeType & v ){
    return ! (n > 0) ? false
        : a[n-1] == v ? true
        : recursive_contains( a, n-1, v );
}

As far as the Standard is concerned these two functions are semantically equivalent. They are equally well-defined. They have exactly the same implications for observable behavior. If one has undefined behavior then so does the other. If one has a risk of undefined behavior then so does the other. A conforming implementation may choose to compile an iterative definition into recursive object code, as well as vice versa. So if you are going to say the C++ standard gives you no guarantee about whether the recursive function might result in undefined behavior, you should also say the C++ standard gives you no guarantee about whether the iterative function might result in undefined behavior.

> Modulo a number of caveats, yes. [...]
> * That for a debug build the compiler can't optimize away calls.
> Because you can't trace into a call if it's optimized away.

That depends what you mean by "debug build". In fact g++ will compile recursive definitions into iterative object code even when a -g or -ggdb option is given. |
"Chris M. Thomasson " <ahh_f_it@crap.nothing>: Aug 10 02:09AM -0700 On 8/8/2019 11:59 PM, Bonita Montero wrote: > look like? LRU-caches are only possible with doubly-linked lists. > So I thought this woudn't be possible lock-free. But maybe I'm > wrong here and someone can give me an idea. Perhaps a lock-free deque? |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Aug 10 02:12AM -0700 On 8/10/2019 2:09 AM, Chris M. Thomasson wrote: >> So I thought this woudn't be possible lock-free. But maybe I'm >> wrong here and someone can give me an idea. > Perhaps a lock-free deque? Maybe: https://pdfs.semanticscholar.org/8a68/f45bd32ed050a96faa24139ab71178258f13.pdf Its pretty glaringly. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 10 11:42AM +0200

>> So I thought this woudn't be possible lock-free. But maybe I'm
>> wrong here and someone can give me an idea.
> Perhaps a lock-free deque?

No, I want to move an arbitrary element of a doubly linked list to the top. |
"Öö Tiib" <ootiib@hot.ee>: Aug 10 03:48AM -0700 On Saturday, 10 August 2019 12:42:22 UTC+3, Bonita Montero wrote: > > Perhaps a lock-free deque? > No, I want to move an arbitrary element of a doubly linked list > to the top. You anyway have some rainbow tree or hash table to find that element that represents cached resource. The element being also in linked list (having those prev/next pointers) is irrelevant for consumers of cached resources. So you may leave the prev/next pointers to be managed by single thread. When there is only single thread that accesses those then task is "embarrassingly parallel" IOW locks are not needed, IOW best "lock-free" there can be. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 10 01:21PM +0200

> pointers to be managed by single thread. When there is only single
> thread that accesses those then task is "embarrassingly parallel"
> IOW locks are not needed, IOW best "lock-free" there can be.

No, the element is found by multiple threads accessing the hashtable. I think the buckets could be partitioned, with each partition having a single mutex; and the number of mutexes should be a fair multiple of the number of HW threads so there is a low likelihood of a collision. But each thread will push the found cache entry individually to the top, so there's no central thread doing this. Intel's transactional-memory incarnation RTM would be perfectly suitable for that, as would IBM's equivalent incarnation since the POWER8 CPUs. But AMD CPUs lack such a feature since they have this brain-damaged exclusive cache architecture between L1/L2 (both inclusive) and L3, which doesn't allow the versioning that gets the rollback from the L3 cache. |
"Öö Tiib" <ootiib@hot.ee>: Aug 10 05:30AM -0700 On Saturday, 10 August 2019 14:21:39 UTC+3, Bonita Montero wrote: > of the number of HW-threads so there is a low likehood of a collision. > But each thread will push the found cache indivually to the top, so > there's no central thread doing this. But the threads could forward that task (of pushing the found element to top) to single thread. It is technically management of those prev/next pointers that the cache users don't care about anyway. Forwarding can be done for example by using lock-free deque and so it is achieved that whole thing is entirely lock-free. If it is not good enough for you then perhaps read the paper that Chris M. Thomasson posted or search for similar. > bain-damaged exclusive cache-architecture betwen L1/L2 (both > inclusive) and L3 which doesn't allow this versioning which gets > the rollback from the L3-cache. Processor cache architecture is mostly abstracted out of reach on level of C++. We can adjust the behavior of our software a bit and rest of it ... is topical in comp.arch. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 10 03:52PM +0200

> element to top) to single thread. It is technically management
> of those prev/next pointers that the cache users don't care
> about anyway. ...

That would be totally brain-damaged, because locking the linked list and pushing the list entry to the top by the thread that accesses a certain block, thereby marking it as the most recent block, would be more efficient. That's just because the blocks aren't managed by a single thread but are touched shared.

> Processor cache architecture is mostly abstracted out of reach
> on level of C++. We can adjust the behavior of our software a
> bit and rest of it ... is topical in comp.arch.

Hey, transactional memory was one of the candidate topics for C++20! And the HLE flavor of transactional memory which Intel implements (and I was recently surprised to learn that IBM has implemented the same since the POWER8) fits perfectly with the usual mutex locking of C++. It's just that HLE is more appropriate for mutex locking with a small scope - like in this example. |
"Öö Tiib" <ootiib@hot.ee>: Aug 10 09:40AM -0700 On Saturday, 10 August 2019 16:52:12 UTC+3, Bonita Montero wrote: > list and pushing the list-entry to the top by the thread that > accesses a certain block and thereby attributing it at the most > recent block would be more efficient. A brain has to be rather calcified to receive total damage from simple idea how to get rid of constant race over top of list. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 10 08:55AM +0200

> struct {char a,b,c,d;}
> where char is 8 bits. What happens when you run this on a machine
> which only has 32-bit word-addressable memory?

On these machines sizeof(char) is also 1. And all larger types are a multiple of this. So there can also be a misalignment in this way if the compiler supports this. I.e. a 32-bit value might be directly followed by a 64-bit value.

> #pragma pack(1)
> struct {char a; int b;}
> on a machine that doesn't support unaligned accesses in hardware.

I already said in another posting here that if a compiler supports #pragma pack it supports generating correct code; but this doesn't mean that the CPU / OS supports this at runtime. E.g. Linux on SPARC emulates unaligned accesses through trapping, whereas on Solaris on SPARC this depends on how you compile your code.

> (void*)0x12345; // ie. non-aligned pointer
> The compiler might assume that p points to an aligned address. If the
> hardware doesn't support unaligned, what would happen when *p is evaluated?

No, the compiler might not assume this. It simply generates a correct load suitable for this. But whether it works at runtime depends on the CPU (on Intel CPUs this always works), and on other systems on the combination of CPU and OS. |
David Brown <david.brown@hesbynett.no>: Aug 10 05:10PM +0200

On 09/08/2019 17:43, Bart wrote:

>> is to be MSVC-compatible here.
> Being able to define such a layout doesn't necessarily mean that
> unaligned accesses are supported by hardware.

More importantly here, it also does not mean you can take pointers to the unaligned members and use them as normal pointers, nor does it mean that the compiler supports unaligned access in other cases - /even/ if the hardware supports it. On platforms that don't support unaligned access in the hardware, accessing members of a "packed" struct will give you byte-by-byte instructions. No guarantees are usually given if you take a pointer to the member and work from there. |
David Brown <david.brown@hesbynett.no>: Aug 10 05:21PM +0200

On 10/08/2019 00:28, Bart wrote:

>>> unaligned accesses are supported by hardware.
>> Ok, you're soooo right.
>> #pragma pack is supported without making sense.

Bart /is/ right. Lots of processors don't support unaligned access, and compilers for them will usually still have some type of "packed" struct solution.

> struct {char a,b,c,d;}
> where char is 8 bits. What happens when you run this on a machine which
> only has 32-bit word-addressable memory?

If the system only has 32-bit word addressable memory, then usually char will be 32-bit (you get this in some DSPs). Alternatively, it can emulate access to smaller objects - this may even mean that char* pointers are bigger than int* pointers. Either way, "pack" will not affect the layout of a struct containing only char members.

> int *p = f();
> f (in code not visible from the call-site) returns this:
> (void*)0x12345; // ie. non-aligned pointer

No, C does not allow that - converting a value into a pointer type that is not properly aligned is undefined behaviour. You can use other lower level methods, such as memcpy, to end up with an unaligned int* pointer - but dereferencing it will be undefined behaviour. That is true regardless of whether or not the hardware supports unaligned pointers. (Of course any C implementation could choose to define such behaviour in the obvious way - but I have not seen any compiler document such behaviour.) Note that it is very dangerous to talk about "code not visible from the call site". The trend in compilation is that more and more is visible, with cross-unit compilation and optimisation. Techniques that used to "work" by hiding information from the compiler no longer work when you have link-time optimisation and similar systems.

> The compiler might assume that p points to an aligned address. If the
> hardware doesn't support unaligned, what would happen when *p is evaluated?
The compiler can always assume "p" is aligned properly, and use that fact for optimisation or code re-arrangement, regardless of unaligned access support in the hardware. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 10 05:21PM +0200

> More importantly here, it also does not mean you can take pointers
> to the unaligned members and use them as normal pointers, ...

Quote the part of the standard that says that.

> ..., nor does it mean that the compiler supports unaligned access
> in other cases - /even/ if the hardware supports it.

Which compiler does specify this? Show me a link to the documentation. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 10 05:25PM +0200

> Bart /is/ right. Lots of processors don't support unaligned access,
> and compilers for them will usually still have some type of "packed"
> struct solution.

Yes, they generate the correct code and let the rest depend on the CPU or the OS (trapping).

> No, C does not allow that - converting a value into a pointer type that
> is not properly aligned is undefined behaviour.

This is done only on embedded platforms with fixed addresses of hardware registers. So you're in the context of a certain compiler and hardware. There you don't have to worry about what the standard says. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 10 05:31PM +0200

>> More importantly here, it also does not mean you can take pointers
>> to the unaligned members and use them as normal pointers, ...
> Quote the part of the standard that says that.

Oh, I wasn't correct here: since #pragma pack is a compiler-specific feature, it would have to be the compiler documentation that specifies that it isn't legal to take the address of an unaligned member. So show me some compiler documentation that says that it isn't legal to take the address of an unaligned member! |
David Brown <david.brown@hesbynett.no>: Aug 10 05:06PM +0200

On 09/08/2019 16:36, Scott Lurndal wrote:

> That paper has never been convincing to me, nor anyone that I've
> ever worked with, nor any unix or linux os or other large sourcebase
> that I've worked on professionally.

I have had enough trouble dealing with projects that contain separate Makefiles for different parts, yet depend in odd ways across parts, to be convinced by the paper. Far too often I've had to do a "make clean" before a "make" because of badly coordinated sub-makes. I am sure it is /possible/ to have large systems with recursive makes that work well. Equally, I am sure that often such systems are not done well. |
Tim Rentsch <tr.17687@z991.linuxsc.com>: Aug 09 06:11PM -0700

> Even if the restriction is a special case of coroutines, I think
> it can be argued that "coroutine" is a misnomer for the C++20
> thing. :-) [...]

Yes, and it's an easy argument to make. The C++ construct I have seen described bears almost no resemblance to coroutines as they were originally described (I first read about them in Knuth v. 1). A C++20 "coroutine" is more like a detachable, resumable function, not at all like the symmetric arrangement of mutually calling functions described in Knuth. To be fair, it's probably worth mentioning that in those days (and especially in Knuth's TAOCP), call linkage was a lot more static, and what we now take for granted in the way of automatic storage was often nowhere to be seen in many program environments. Call stack? Local variables? Reentrant functions? These elements are pretty much not found in environments where coroutines were applied, and vice versa. It's hard to unify these two kinds of control structures. So it isn't surprising that "coroutine" takes on a rather different form in C++, compared to what coroutines were originally. |
Tim Rentsch <tr.17687@z991.linuxsc.com>: Aug 09 07:49PM -0700

> if( p>left ) { nodes.push( p->left ); }
> }
> }

The function body can be simplified thus:

stack<Node*> nodes;
nodes.push( root );
while( not nodes.empty() ) {
    const auto p = nodes.top();
    nodes.pop();
    if( !p ) continue;
    co_yield p->data;
    nodes.push( p->right );
    nodes.push( p->left );
} |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Aug 10 05:13AM +0200 On 10.08.2019 04:49, Tim Rentsch wrote: > nodes.push( p->right ); > nodes.push( p->left ); > } Thanks. Don't know why I wrote that so complicated, with no nullpointers pushed except possibly the root. But I guess I was just transforming the original code, which had those conditional pushes. Cheers!, - Alf |
Christian Gollwitzer <auriocus@gmx.de>: Aug 10 08:23AM +0200

On 05.08.19 at 23:09, Alf P. Steinbach wrote:

>> voilá.
> As I understand it this generates a new coroutine instance, with
> dynamically allocated state, for each call.

yes, of course.

> if( p>left ) { nodes.push( p->left ); }
> }
> }

This code now manages the stack manually and is half-way to the transformation into the iterative version which you need to write *without* coroutines. I'm not sure my message was received, I try again. So consider: you decide to go from breadth-first traversal to in-order traversal. Coroutine:

void bftraverse(Tree* root) {
    if (root->left) [<-] bftraverse(root->left);
    co_yield(root->content);
    if (root->right) [<-] bftraverse(root->right);
}

The only thing that has changed is the order of the yield instructions. (Plus I added the forgotten [<-] thing as shown in the TR paper, which means the same as "yield from" in Python.) How to transform your unfolded code? Let's replace:

> co_yield p->data;
> if( p->right ) { nodes.push( p->right ); }
> if( p>left ) { nodes.push( p->left ); }

by

> if( p->right ) { nodes.push( p->right ); }
> co_yield p->data;
> if( p>left ) { nodes.push( p->left ); }

and surprisingly, it is still doing breadth-first search. Transforming this correctly means either different stack management, or a special algorithm invented by geniuses which can walk along the tree leaves. The point of coroutines is not to maximize efficiency, but to write code *as if* it were the original simple, clear algorithm, while getting a state machine as the result. For some cases this can actually improve efficiency, e.g. for asynchronous network code, because the manual async thing is so complex that no one would do it. The price to pay is actually not that high. Your stack uses one Node* for the coroutine state. The compiler-generated version uses all local variables (just Node*) plus an instruction pointer, so yes it is more, but not that much.
Gor Nishanov actually claimed in his talk, that VC++ coroutines are efficient enough for asynchronous access to *main memory* https://www.youtube.com/watch?v=j9tlJAqMV7U Christian |
"Chris M. Thomasson" <invalid_chris_thomasson_invalid@invalid.com>: Aug 10 12:19AM -0700 On 8/5/2019 1:26 PM, Christian Gollwitzer wrote: > Am 05.08.19 um 10:42 schrieb Juha Nieminen: [...] > if (root->left) bftraverse(root->left); > if (root->right) bftraverse(root->right); > } Looks nice to me. |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Aug 10 10:43AM +0200 On 10.08.2019 08:23, Christian Gollwitzer wrote: > *without* coroutines. > I'm not sure my message was received, I try again. So consider, you > decide to go from breadth-first traversal to in-order traversal. I think you mean go from pre-order to in-order. > The only thing that has changed, is the order of the yield instructions. > (plus I added the forgotten [<-] thing as shown in the TR paper, which > means the same as "yield from" in Python) I haven't yet stumbled upon that notation, so I think it must be something that once was proposed but has been rejected? Anyway, I'm now aware that for C++20 coroutines one needs a return type that implements a lot of stuff. With Visual C++ that's apparently provided by `std::future`. I don't know about the standard. > and surprisingly, it is still doing breadth-first search. Transforming > this correctly means either a different stack management, or a special > algorithm invented by geniuses which can walks along the tree leafs- Iterative in-order isn't that hard. After having output (or whatever) the left subtree of node N, one outputs N.data and registers a sufficient subset of nodes in the right subtree in the data structure, which can be a stack, so that the same general process of popping out a node, displaying it, and registering some more, can work. What's necessary then is just that the first node popped out and thus next output, should be the leftmost leaf of the right subtree. To do that it's necessary to register all the nodes down to that left. I.e., a loop, to register (push) the nodes down the chain to that leaf node. 
void process_tree_node( const Node& node ) { cout << node.data; }

void process_tree_nodes_inorder( const Type_<const Node*> p_root )
{
    stack<const Node*> nodes;
    const Node* p_current = p_root;
    for( ;; )
    {
        while( p_current )
        {
            nodes.push( p_current );
            p_current = p_current->left;
        }
        if( is_empty( nodes ) ) { break; }
        p_current = popped_top( nodes );
        process_tree_node( *p_current );
        p_current = p_current->right;
    }
}

> but not that much. Gor Nishanov actually claimed in his talk, that VC++
> coroutines are efficient enough for asynchronous access to *main memory*
> https://www.youtube.com/watch?v=j9tlJAqMV7U

Cheers!, - Alf |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 10 10:50AM +0200

> functors), with the difference that the coroutine can be "called" again
> in such a manner that it continues from where it "returned" last time,
> rather than always from the beginning (which would happen with lambdas).

The difference is that the coroutine can resume its work after returning, and can maintain its state between returns. |
Daniel <danielaparker@gmail.com>: Aug 10 01:20AM -0700 https://github.com/danielaparker/jsoncons A C++, header-only library for constructing JSON and JSON-like data formats, with JSON Pointer, JSON Patch, JSONPath, CSV, MessagePack, CBOR, BSON, UBJSON |
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |