- How all the cool kids are getting array lengths from C++11 onwards - 10 Updates
- Vector with memory from global storage (static duration) - 10 Updates
- Thread must sleep forever (Try to double-lock mutex?) - 3 Updates
- Lock-free LRU-cache-algorithm - 2 Updates
| Juha Nieminen <nospam@thanks.invalid>: Oct 24 08:10AM

> claus. Auto-vectorization only works for some predefined code-patterns
> the compiler knows. In most cases you would have to do the vectorization
> yourself through SIMD-intrinsics.

Have you even looked at what modern compilers like recent versions of gcc
and clang produce using automatic vectorization? Because I have. It's not
always perfect, but it's miles better than no vectorization of any kind.
Often they are able to do surprisingly complex automatic vectorization of
relatively complex structures. Consider, for example, this kind of code:

//-------------------------------------------------------------------
#include <array>

struct Point4D { float x, y, z, w; };
using Point4DVec = std::array<Point4D, 4>;

Point4DVec foo(const Point4DVec& v1, const Point4DVec& v2, float f)
{
    Point4DVec result;
    for(std::size_t i = 0; i < result.size(); ++i)
    {
        result[i].x = (v1[i].x - v2[i].x) * f;
        result[i].y = (v1[i].y - v2[i].y) * f;
        result[i].z = (v1[i].z - v2[i].z) * f;
        result[i].w = (v1[i].w - v2[i].w) * f;
    }
    return result;
}
//-------------------------------------------------------------------

gcc compiles that to:

        vbroadcastss ymm2, xmm0
        vmovups      ymm3, YMMWORD PTR [rsi]
        vmovups      ymm0, YMMWORD PTR [rsi+32]
        vsubps       ymm1, ymm3, YMMWORD PTR [rdx]
        vsubps       ymm0, ymm0, YMMWORD PTR [rdx+32]
        mov          rax, rdi
        vmulps       ymm1, ymm1, ymm2
        vmulps       ymm0, ymm0, ymm2
        vmovups      YMMWORD PTR [rdi], ymm1
        vmovups      YMMWORD PTR [rdi+32], ymm0
        vzeroupper
        ret
| Bonita Montero <Bonita.Montero@gmail.com>: Oct 24 11:01AM +0200

> Have you even looked at what modern compilers like recent versions of
> gcc and clang produce using automatic vectorization? Because I have.

Have you checked the compiler docs? They document which code patterns
they can detect and vectorize. Those are only very special patterns.
| David Brown <david.brown@hesbynett.no>: Oct 24 12:51PM +0200

On 24/10/2019 11:01, Bonita Montero wrote:
>> gcc and clang produce using automatic vectorization? Because I have.
> Have you checked the compiler-docs? They document which code-patterns
> they can detect and vectorize. That are only very special patterns.

Can you give links to these documentation pages for gcc and clang?
| Bonita Montero <Bonita.Montero@gmail.com>: Oct 24 01:37PM +0200

> Can you give links to these documentation pages for gcc and clang?

No, but I'll bet they're not cleverer than Intel C++:
https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-automatic-vectorization-overview
| David Brown <david.brown@hesbynett.no>: Oct 24 02:14PM +0200

On 24/10/2019 13:37, Bonita Montero wrote:
>> Can you give links to these documentation pages for gcc and clang?
> No, but I'll bet they're not cleverer as Intel-C++:
> https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-automatic-vectorization-overview

So when you wrote:

> Have you checked the compiler-docs? They document which code-patterns
> they can detect and vectorize. That are only very special patterns.

in response to a comment about gcc and clang auto-vectorisation, you
really didn't know what you were talking about?

Both clang and gcc have vectorised code like Juha's for many years - and
both produce a great deal better code for it than ICC in my brief test.
ICC does not vectorise the code at all. (That may, of course, vary for
different code samples - and it may be dependent on compiler flags.)
Give it a shot on <https://godbolt.org> yourself.

And while it is certainly the case that compilers can only
auto-vectorise certain types of code, it is /not/ the case that it is so
limited that these tools list the code patterns in their documentation.
All they do is provide some tips and suggestions about how to increase
the likelihood of automatic vectorisation.

Why not try investigating what people write in posts, rather than
mocking them from a position of ignorance?
| "Öö Tiib" <ootiib@hot.ee>: Oct 24 05:20AM -0700 On Thursday, 24 October 2019 15:14:35 UTC+3, David Brown wrote: > Why not try investigating what people write in posts, rather than > mocking them from a position of ignorance? As described by social psychologists David Dunning and Justin Kruger, the cognitive bias of illusory superiority results from an internal illusion in people of low ability and from an external misperception in people of high ability; that is, "the miscalibration of the incompetent stems from an error about the self, whereas the miscalibration of the highly competent stems from an error about others." https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect |
| Bonita Montero <Bonita.Montero@gmail.com>: Oct 24 04:16PM +0200

> ICC does not vectorise the code at all.

Check the ICC documentation ...
| Bonita Montero <Bonita.Montero@gmail.com>: Oct 24 04:23PM +0200

https://gcc.gnu.org/projects/tree-ssa/vectorization.html
| Frederick Gotham <cauldwell.thomas@gmail.com>: Oct 24 08:27AM -0700

On Wednesday, October 23, 2019 at 6:18:17 PM UTC+1, James Kuyper wrote:
> reaction that you've provoked corresponding to a fish taking the bait.
> It really doesn't matter whether the bait is "wrong" or "right" in any
> absolute sense, only that people will react to it.

When practising etymology, it's quite easy to come to wrong conclusions.
I think it's likely that the modern use of "troll" on internet forums
actually comes from the doll:

https://en.wikipedia.org/wiki/Troll_doll

This makes the most sense to me out of all the possibilities (including
your angling one).
| scott@slp53.sl.home (Scott Lurndal): Oct 24 03:50PM

>On Wednesday, October 23, 2019 at 6:18:17 PM UTC+1, James Kuyper wrote:
>I think it's likely that the modern use of "troll" on internet forums
>actually comes from the doll: https://en.wikipedia.org/wiki/Troll_doll
>This makes the most sense to me out of all the possibilities (including
>your angling one).

I've been on usenet for almost forty years; it has always been well
understood that the term derives from the fishing usage.
| Juha Nieminen <nospam@thanks.invalid>: Oct 24 07:49AM

> The small_vector has also preallocated capacity in it but can
> grow over that using an allocator:
> <https://www.boost.org/doc/libs/1_71_0/doc/html/boost/container/small_vector.html>

I am talking about an allocator (usable with any standard library
container), not a std::vector replacement.
| "Öö Tiib" <ootiib@hot.ee>: Oct 24 02:41AM -0700 On Thursday, 24 October 2019 10:50:10 UTC+3, Juha Nieminen wrote: > > <https://www.boost.org/doc/libs/1_71_0/doc/html/boost/container/small_vector.html> > I am talking about an allocator (usable with any standard library container), > not a std::vector replacement. Yes you talk about concrete solution to problem that (I assume) is low performance of using dynamic memory. The typical solutions for reducing dynamic allocations (like using std::vector::reserve or std::unordered_set::reserve) are not good enough. So special containers optimized to use dynamic storage management provided by allocator not at all or very rarely are likely as good as it can get. I don't think there can be special allocator that gives same effect with stock containers. |
| Soviet_Mario <SovietMario@CCCP.MIR>: Oct 24 01:33PM +0200

On 24/10/19 00:15, Keith Thompson wrote:
> means that the elements will have the same storage duration as the
> std::array object itself. (That's why the number of elements in
> a std::array has to be known at compile time.)

Interesting. Actually I must admit I confused VECTOR with ARRAY :\

> A std::array object is essentially a structure with an array as
> its only member.

Understood. Thanks.

--
1) Resist, resist, resist.
2) If everyone pays taxes, taxes are paid by everyone
Soviet_Mario - (aka Gatto_Vizzato)
| Soviet_Mario <SovietMario@CCCP.MIR>: Oct 24 01:36PM +0200

On 24/10/19 01:19, Mr Flibble wrote:
> optimization" whereby small strings are allocated within the
> object itself. I have also created a "vecarray" container
> that does a similar thing for vectors.

Uhm, strange ... it seems to me that such a solution may be worse than
the problem. I mean:

1) There is no precise definition of SMALLness, no known mandatory
threshold to know in advance.

2) Is the optimization mandatory? If not, it's even worse with regard
to the address (and storage type) of the actual data. We could end up
in a scenario in which one small STATIC std::string has both its
members and its data static, while another instance does not, keeping
just the members static and the data elsewhere, with no way to know in
advance.

--
1) Resist, resist, resist.
2) If everyone pays taxes, taxes are paid by everyone
Soviet_Mario - (aka Gatto_Vizzato)
| David Brown <david.brown@hesbynett.no>: Oct 24 02:25PM +0200

On 24/10/2019 13:36, Soviet_Mario wrote:
> the problem
> I mean : 1) there is no a precise definition of SMALLness, a known
> mandatory threshold to know in advance

Yes. The ideal size of "smallness" will depend on many factors. But
some ground rules can be established. You will always need "metadata"
in the immediate part of the type - holding things like the current
count of items (vector elements, string bytes, etc.), a pointer to the
data part (typically on the heap), and the size of the allocated part.
On a 64-bit target, that would be 3 * 8 = 24 bytes. If the size of the
string, or the vector, is smaller than 24 bytes, then it makes sense to
store it locally within the object itself.

Whether you then say "24 bytes" is your "smallness", or pick something
else, will depend on how you are using these types. A bigger
"smallness" will mean more wasted RAM if your vectors/strings are not
typically "small". But heap-allocated memory usually has a minimum size
anyway, and allocation on the stack is very much faster than allocation
on the heap. So you might pick a "smallness" that gives a total object
size of 64 bytes, for example - if you keep the objects aligned they
will match cache lines on many CPUs, and it might even be possible to
pass them around in large SIMD registers.

> 2) is the optimization mandatory ? If not, even worser with regard to
> the address (and storage type) of actual data.

That is up to the implementer of the classes. But "small string" or
"small data" optimisations are usually a win.

> we could fall in a scenario in which a small STATIC std::string could
> have both the members and data static and another instance not, just the
> members and data elsewhere, without possibility to know in advance.

Yes. So what?
| Paavo Helde <myfirstname@osa.pri.ee>: Oct 24 03:28PM +0300

On 24.10.2019 14:36, Soviet_Mario wrote:
>> vectors.
> uhm, strange ... it seems to me that such a solution may be worse than
> the problem

What's the problem you are talking about? Small string optimization is
pretty much a perfect optimization as it avoids dynamic allocation
calls and provides better memory locality; these are both a big deal
nowadays. And having an optimization which provides better performance
in some situations is better than not having an optimization at all.

> I mean : 1) there is no a precise definition of SMALLness, a known
> mandatory threshold to know in advance

What would you do with that knowledge? Shorten your strings? Smaller
strings are always faster to process, so if you could use smaller
strings in your code you would have already done that (assuming the
speed is so critical).

> we could fall in a scenario in which a small STATIC std::string could
> have both the members and data static and another instance not, just the
> members and data elsewhere, without possibility to know in advance.

So what? If this happens, then presumably it comes from shortening a
long string, meaning that the initial dynamic allocation call had
already happened and could not be avoided any more.
| Paavo Helde <myfirstname@osa.pri.ee>: Oct 24 04:46PM +0300

On 24.10.2019 14:36, Soviet_Mario wrote:
>> whereby small strings are allocated within the object itself.
> I mean : 1) there is no a precise definition of SMALLness, a known
> mandatory threshold to know in advance

The upper limit for "small string" length can be easily found out as
sizeof(std::string)-1 (-1 for the zero terminator byte, which can in
principle also serve as a marker that SSO is in use).
| "Öö Tiib" <ootiib@hot.ee>: Oct 24 06:58AM -0700 On Thursday, 24 October 2019 16:46:57 UTC+3, Paavo Helde wrote: > The upper limit for "small string" length can be easily found out as > sizeof(std::string)-1 (-1 for the zero terminator byte, which can in > principle also serve as a marker that SSO is in use). I think it is -2. Note that the std::string may contain zero bytes. IOW ...: std::string s("\0\0test", 6); ... has to work and after that ... assert(s.length() == strlen(s.c_str())); ... does not hold. So one byte is needed for length of short string and other is needed for zero terminator. |
| Paavo Helde <myfirstname@osa.pri.ee>: Oct 24 05:21PM +0300

On 24.10.2019 16:58, Öö Tiib wrote:
> assert(s.length() == strlen(s.c_str()));
> ... does not hold. So one byte is needed for length of
> short string and other is needed for zero terminator.

Right, thanks for the correction! Storing the length separately would
also mean that it's not needed to calculate it as if by strlen() each
time it's needed (although strlen() should also be very fast on such
small strings).
| Bo Persson <bo@bo-persson.se>: Oct 24 05:27PM +0200

On 2019-10-24 at 16:21, Paavo Helde wrote:
> Storing the length separately would also mean that it's not needed to
> calculate it as if by strlen() each time when needed (although strlen()
> should also by very fast on such small strings).

If you want to force it, you can store "unused space" instead of "size"
in the last byte. Then that byte happens to be zero when the space is
full. :-)

https://github.com/elliotgoodrich/SSO-23

For various reasons, like preferring an all-zero init being the empty
string, this is not used by the major implementations.

Bo Persson
| Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 24 05:38AM

On Wed, 2019-10-23, Chris M. Thomasson wrote:
> non-recursive mutex twice in the same thread is an error. This can be
> solved multiple ways. The easy way is to create the Supervisor as a
> service/ daemon. Robust mutexs can be used to detect when a process dies.

The software (as I understand it) is Unix-specific; standard practice
there is to have parent--child process relationships, and to use the
wait() family of functions to detect when a child dies.

/Jorgen

--
// Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .
| Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 24 08:46AM

On Wed, 2019-10-23, Frederick Gotham wrote:
>> the job.
> I don't want to complicate the Supervisor, and so I'm happy to put
> the gas_monitor process to sleep.

In a way you /are/ complicating it already, by stretching the meaning of
"supervise". Especially if there is some kind of heartbeat mechanism
too, but maybe there isn't.

If you /really/ want to sleep forever and consume minimal resources on
Unix, the optimal way would be to close all file descriptors, release
anything else which would survive an exec(), and exec() a tiny
'sleep_forever' binary.

/Jorgen

--
// Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .
| scott@slp53.sl.home (Scott Lurndal): Oct 24 01:40PM

>The software (as I understand it) is Unix-specific; standard practice
>there is to have parent--child process relationships, and to use the
>wait() family of functions to detect when a child dies.

Or poll periodically using kill(pid, 0); kill will fail with ESRCH if
the process doesn't exist.

Using SIGSTOP to suspend a process (either in-context using raise(3) or
externally from a monitor process using kill(2)) is the most efficient
way of pausing a process.
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Oct 23 07:42PM -0700 On 9/26/2019 2:13 AM, Bonita Montero wrote: > CMPXCHGs on three 64 bit values (in a 64-bit-system, in a 32-bit > system you would have two 64- and one 32-bit exchange) if there's > no collision. Are you using an embedded version count within the data the CAS's work on? Like an ABA counter? If a CAS fails, how far back do you have to restart, or unroll if you will? I'm asking myself if this would be faster with trans- |
| Bonita Montero <Bonita.Montero@gmail.com>: Oct 24 11:00AM +0200

> Are you using an embedded version count within the data the CAS's work
> on? Like an ABA counter? If a CAS fails, how far back do you have to
> restart, or unroll if you will?

No, I'm not using ABA-counters.