- initial size of a std::vector of std::strings - 19 Updates
- An argument *against* (the liberal use of) references - 6 Updates
Lynn McGuire <lynnmcguire5@gmail.com>: Dec 06 11:42PM -0600

Is there a way to set the size of std::vector <std::string> formulas to
max_ncp size? Something like

    std::vector <std::string> formulas [max_ncp];

Thanks,
Lynn |
James Kuyper <jameskuyper@alumni.caltech.edu>: Dec 07 12:52AM -0500

On 12/7/22 00:42, Lynn McGuire wrote:
> Something like "std::vector <std::string> formulas [max_ncp];"
> Thanks,
> Lynn

23.2.6.1:

    explicit vector(size_type n);

5 Effects: Constructs a vector with n default constructed elements.
6 Requires: T shall be DefaultConstructible.
7 Complexity: Linear in n. |
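Applied to the original question, that constructor gives the following (a
minimal sketch; the value of max_ncp is assumed for illustration):

    #include <cstddef>
    #include <string>
    #include <vector>

    int main() {
        const std::size_t max_ncp = 100;             // assumed value
        std::vector<std::string> formulas(max_ncp); // max_ncp default-constructed
                                                     // (empty) strings
        // formulas.size() == max_ncp; each element is ""
    }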
Bonita Montero <Bonita.Montero@gmail.com>: Dec 07 08:17AM +0100

Am 07.12.2022 um 06:42 schrieb Lynn McGuire:
> Something like "std::vector <std::string> formulas [max_ncp];"
> Thanks,
> Lynn

If you want to sequentially fill the vector with your data, up to a size
you allocated beforehand: declare an empty vector<string>, do a
vec.reserve( max_ncp ), and follow it with a matching number of
emplace_back calls:

    vector<string> vStr;
    vStr.reserve( max_ncp );
    for( size_t i = 0; i != max_ncp; ++i )
        vStr.emplace_back( ... );

|
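A self-contained version of that pattern (a sketch; the emplace_back
argument is elided in the post, so a placeholder string stands in for it):

    #include <cstddef>
    #include <string>
    #include <vector>
    using namespace std;

    int main() {
        const size_t max_ncp = 100;          // assumed value
        vector<string> vStr;
        vStr.reserve( max_ncp );             // a single allocation up front
        for( size_t i = 0; i != max_ncp; ++i )
            vStr.emplace_back( "formula" );  // placeholder argument
    }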
"Öö Tiib" <ootiib@hot.ee>: Dec 07 12:23AM -0800 On Wednesday, 7 December 2022 at 07:42:49 UTC+2, Lynn McGuire wrote: > Is there a way to set the size of std::vector <std::string> formulas to > max_ncp size ? > Something like "std::vector <std::string> formulas [max_ncp];" When that max_ncp is compile-time known immutable value then std::array<std::string, max_ncp> can be what you want. Maybe worth to note that I've seen reserve(n) of std::vector overused in practice as bad pessimization. Vector is typically designed so that we get log(max) allocations and max times element copies because of vector just growing freely during its life-time. That can be turned into way worse by someone micromanaging it. |
Juha Nieminen <nospam@thanks.invalid>: Dec 07 09:13AM

> we get about log(max) allocations and at most O(max) element copies,
> because the vector simply grows freely during its lifetime. That can be
> made far worse by someone micromanaging it.

The typical std::vector implementation doubles its capacity every time it
needs to grow. Now consider what happens if you are adding, say, at least
a thousand elements to a vector (using push_back() or emplace_back()): it
will do ten reallocations in quick succession, every time it has to
double its capacity. While the frequency of these reallocations decreases
exponentially, at the beginning they are very frequent, and there's
literally no advantage in doing them.

If you know that you will be adding at least a thousand or so
elements, an initial reserve(1024) removes *all* of those useless
initial ten reallocations.

Avoiding the useless reallocations is not just a question of speed,
but also of avoiding memory fragmentation. The fewer dynamic memory
allocations you do, the better (in terms of both). |
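A small experiment along those lines, counting how many times the buffer
is reallocated during 1000 push_back() calls with and without an up-front
reserve() (a sketch; exact counts depend on the implementation's growth
factor):

    #include <cstddef>
    #include <iostream>
    #include <string>
    #include <vector>

    static std::size_t count_reallocations(bool reserve_first) {
        std::vector<std::string> v;
        if (reserve_first)
            v.reserve(1024);  // the reserve itself does one allocation up front
        std::size_t reallocations = 0;
        std::size_t cap = v.capacity();
        for (int i = 0; i < 1000; ++i) {
            v.push_back("x");
            if (v.capacity() != cap) {  // capacity change implies a reallocation
                ++reallocations;
                cap = v.capacity();
            }
        }
        return reallocations;
    }

    int main() {
        std::cout << "without reserve: " << count_reallocations(false) << '\n'
                  << "with reserve:    " << count_reallocations(true) << '\n';
    }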
"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Dec 07 12:28PM +0100 On 7 Dec 2022 06:42, Lynn McGuire wrote: > Is there a way to set the size of std::vector <std::string> formulas to > max_ncp size ? > Something like "std::vector <std::string> formulas [max_ncp];" The most reasonable interpretation to me is that you're asking how to set the initial /capacity/ of the vector in the declaration, to avoid an explicit .reserve() invocation as a separate following statement. Presumably because you declare such vectors in many places. To do that you can to wrap the vector, either via a factory function or via a simple class, like struct Capacity{ int value; }; struct Fast_stringvec: public vector<string> { template< class... Args > Fast_stringvec( const Capacity capacity, Args&&... args ): vector<string>( forward<Args>( args )... ) { reserve( capacity.value ); } }; Then do e.g. auto formulas = Fast_stringvec( Capacity( max_ncp ) ); ... where I believe the use of `auto` helps avoid the Most Vexing Parse. ;-) If `max_ncp` is a constant you can add a constructor that uses that by default. --- A not so reasonable interpretation is that you're asking how to make the vector contain `max_ncp` strings initially, for which you can just do vector<string> formulas( max_ncp, ""s ); --- Crossing the line to perhaps unreasonable interpretation is that you're asking how you can make it so that when you add a new to string to the vector, the new string item will guaranteed be of size `max_ncp`. For that you need to make the item type one that holds a `string` of exactly that size. It would probably be necessary to take control of copying to such items, to avoid inadvertent size changes. - Alf |
"Öö Tiib" <ootiib@hot.ee>: Dec 07 06:03AM -0800 On Wednesday, 7 December 2022 at 11:13:29 UTC+2, Juha Nieminen wrote: > If you know that you will be adding at least a thousand or so > elements, doing an initial reserve(1024) will remove *all* of those > useless initial ten reallocations. That is is all trivially true. One reserve() call is not "micromanaging" so you missed my point. What I described can be achieved by calling reserve more than once for example in 1024 element steps for vector that grows to million elements. Result is that you get like 50 times more allocations and element copies. It does replace first 10 allocations with one but next 10 with 1000. > Avoiding the useless reallocations is not just a question of speed, > but also of avoiding memory fragmentation. The less dynamic memory > allocations you do, the better (in terms of both). Yes, the misused reserve calls can also fragment the memory far worse as the less dynamic memory allocations you do, the better. |
Paavo Helde <eesnimi@osa.pri.ee>: Dec 07 04:34PM +0200

07.12.2022 11:13 Juha Nieminen kirjutas:
>> made far worse by someone micromanaging it.
> The typical std::vector implementation doubles its capacity every time it
> needs to grow.

Not any more. Nowadays a factor like 1.5 is more popular, meaning
something like 18 allocations for 1000 push_back()-s. This is from
MSVC++19:

    std::vector<int, my_allocator<int>> v;
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
    }

Allocating 1 elements
Allocating 1 elements
Allocating 2 elements
Allocating 3 elements
Allocating 4 elements
Allocating 6 elements
Allocating 9 elements
Allocating 13 elements
Allocating 19 elements
Allocating 28 elements
Allocating 42 elements
Allocating 63 elements
Allocating 94 elements
Allocating 141 elements
Allocating 211 elements
Allocating 316 elements
Allocating 474 elements
Allocating 711 elements
Allocating 1066 elements |
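my_allocator isn't shown in the post; a minimal tracing allocator that
produces that kind of output might look like this (a sketch, not Paavo's
actual code):

    #include <cstddef>
    #include <iostream>
    #include <new>
    #include <vector>

    template <class T>
    struct tracing_allocator {
        using value_type = T;
        tracing_allocator() = default;
        template <class U> tracing_allocator(const tracing_allocator<U>&) {}
        T* allocate(std::size_t n) {
            std::cout << "Allocating " << n << " elements\n";
            return static_cast<T*>(::operator new(n * sizeof(T)));
        }
        void deallocate(T* p, std::size_t) { ::operator delete(p); }
    };
    template <class T, class U>
    bool operator==(const tracing_allocator<T>&, const tracing_allocator<U>&) { return true; }
    template <class T, class U>
    bool operator!=(const tracing_allocator<T>&, const tracing_allocator<U>&) { return false; }

    int main() {
        std::vector<int, tracing_allocator<int>> v;
        for (int i = 0; i < 1000; ++i) {
            v.push_back(i);
        }
    }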
Bonita Montero <Bonita.Montero@gmail.com>: Dec 07 03:37PM +0100

Am 07.12.2022 um 15:03 schrieb Öö Tiib:
> that grows to a million elements. The result is something like 50 times
> more allocations and element copies: it replaces the first 10 allocations
> with one, but the next 10 with about 1000.

For each append where the capacity grows, the vector's capacity is
doubled with libstdc++ and libc++. With MSVC the capacity grows in
50% increments. So the number of resizes actually isn't that large.
Java does it the same way: an ArrayList also stores its data in a
plain array that grows by 100% when the capacity is exhausted.

> Yes, and misused reserve() calls can also fragment memory ...

With modern allocators like mimalloc, jemalloc and tcmalloc, external
fragmentation isn't actually an issue, since there is no external
fragmentation up to a size of two pages (but an average internal
fragmentation of 25%), and beyond that the memory blocks grow in page
increments and each page can easily be deallocated, so external
fragmentation doesn't hurt much. |
Bonita Montero <Bonita.Montero@gmail.com>: Dec 07 03:38PM +0100

Am 07.12.2022 um 15:34 schrieb Paavo Helde:
> Not any more. Nowadays a factor like 1.5 is more popular, ...

Only MSVC uses 50% increments to satisfy the amortized-constant overhead
constraint on insertion. libstdc++ and libc++ use a 100% increment. |
"Öö Tiib" <ootiib@hot.ee>: Dec 07 09:33AM -0800 On Wednesday, 7 December 2022 at 16:37:25 UTC+2, Bonita Montero wrote: > For each append where the capacity grows the vector's capacity is > doubled with libstdc++ and libc++. With MSVC the capacity grows in > 50%-increments. So the number of resizes actually isn't so huge. Yes that was my point that the logarithm of base 2 of million is about 20 and logarithm of base 1.5 of million is about 28. Neither is large number and so default behaviour of mainstream standard library vector implementations is reasonable. Application programmer is in my experience rather capable of being unreasonable and micromanaging and reserving linearly. So when suggesting manual reserve() then it is always worth to warn that it is double edged sword not some kind of magic tool. One example of common naive usage of reserve() is instead of plain insert() of range (that grows exponentially if needed) is checking how lot more storage is needed, reserving for it and then adding the elements. Other tool that is on about 9/10 cases pessimally used is shrink_to_fit(). |
Bonita Montero <Bonita.Montero@gmail.com>: Dec 07 06:45PM +0100

Am 07.12.2022 um 18:33 schrieb Öö Tiib:
> being unreasonable, micromanaging, and reserving linearly.
> So when suggesting a manual reserve() it is always worth warning
> that it is a double-edged sword, not some kind of magic tool.

reserve() isn't hard to use, and it is used when you know the number of
items to be inserted. So I don't believe that misuse of reserve() is
common.

> plain insert() of a range (which grows exponentially if needed),
> checking how much more storage is needed, reserving exactly that,
> and then adding the elements. ...

That may make sense if you have a really large vector that isn't backed
by a memory pool but is allocated directly from the kernel, and given
back to the kernel when freed because of the size of the allocation. In
that case the allocated pages aren't actually committed until you touch
them, and on some systems they aren't even subtracted from swap on
allocation (overcommit). |
Lynn McGuire <lynnmcguire5@gmail.com>: Dec 07 02:54PM -0600

On 12/7/2022 5:28 AM, Alf P. Steinbach wrote:
> It would probably be necessary to take control of copying to such items,
> to avoid inadvertent size changes.
> - Alf

I am trying to declare a vector with max_ncp strings in a struct.

    struct component_data {
        std::vector <std::string> component_names [max_ncp];
        doublereal heatoffusionatmeltingpoint [max_ncp];
        doublereal meltingpointtemperature [max_ncp];
        doublereal sublimationtemperature [max_ncp];
        doublereal triplepointtemperature [max_ncp];
        doublereal triplepointpressure [max_ncp];
    };

Visual Studio 2015 does not like

    std::vector <std::string> component_names (max_ncp);

or

    std::vector <std::string> component_names (max_ncp, "");

But it does like

    std::vector <std::string> component_names [max_ncp];

But that does not work.

Thanks,
Lynn |
"daniel...@gmail.com" <danielaparker@gmail.com>: Dec 07 01:16PM -0800 On Wednesday, December 7, 2022 at 3:54:23 PM UTC-5, Lynn McGuire wrote: > But it does like > std::vector <std::string> component_names [max_ncp]; > But it does not work. Is this what you want? struct component_data { const std::size_t max_ncp = 10; std::vector <std::string> component_names; component_data() : component_names(max_ncp) {} }; VS 2015 is fine with that. Daniel |
"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Dec 07 10:33PM +0100 On 7 Dec 2022 21:54, Lynn McGuire wrote: > But it does like > std::vector <std::string> component_names [max_ncp]; > But it does not work. It seems to me that what you're trying to do is simply struct Components_data { int n_components; std::string name [max_ncp]; double heat_of_fusion_at_melting_point [max_ncp]; double melting_point_temperature [max_ncp]; double sublimation_temperature [max_ncp]; double triple_point_temperature [max_ncp]; double triple_point_pressure [max_ncp]; }; But with `std::string` involved this is not a memory layout of a Fortran structure. So, do consider whether this can serve your requirements instead: struct Component { std::string name; double heat_of_fusion_at_melting_point; double melting_point_temperature; double sublimation_temperature; double triple_point_temperature; double triple_point_pressure; }; using Components_data = std::vector<Component>; Depending on the processing you do this might be slower or faster, needs measuring if that's important. It /is/ however IMO very likely much more convenient and easy to use correctly. - Alf |
Michael S <already5chosen@yahoo.com>: Dec 07 01:34PM -0800

On Wednesday, December 7, 2022 at 10:54:23 PM UTC+2, Lynn McGuire wrote:
> But that does not work.
> Thanks,
> Lynn

Works fine on godbolt's VS2015: https://godbolt.org/z/8WPo33Yn1 |
Bonita Montero <Bonita.Montero@gmail.com>: Dec 07 10:37PM +0100

Sorry, this lady is writing code with several hundred thousand lines
of code and has issues with such things ... ??? |
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Dec 07 01:49PM -0800

Lynn McGuire <lynnmcguire5@gmail.com> writes:
[...]
> std::vector <std::string> component_names (max_ncp);
> or
> std::vector <std::string> component_names (max_ncp, "");

How does it express its dislike? Both work for me:

    #include <vector>
    #include <string>
    #include <iostream>

    int main() {
        int max_ncp = 42;
        {
            std::vector<std::string> component_names(max_ncp);
            std::cout << component_names.size() << ' ';
        }
        {
            std::vector<std::string> component_names(max_ncp, "");
            std::cout << component_names.size() << '\n';
        }
    }

The output is "42 42".

> But it does like
> std::vector <std::string> component_names [max_ncp];
> But that does not work.

That defines an array (not a std::array, just a C-style array object) of
max_ncp vectors. I think you want a single vector.

Do you want to set the initial size or the initial capacity?

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */ |
Lynn McGuire <lynnmcguire5@gmail.com>: Dec 07 04:01PM -0600

On 12/7/2022 3:49 PM, Keith Thompson wrote:
> That defines an array (not a std::array, just a C-style array object) of
> max_ncp vectors. I think you want a single vector.
> Do you want to set the initial size or the initial capacity?

One vector with max_ncp strings in it.

Thanks,
Lynn |
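A likely explanation for the compiler errors above: in-class default
member initializers may use braces or `=`, but not parentheses, so
`component_names (max_ncp)` at class scope parses as an ill-formed member
declaration rather than an initializer. A sketch of the brace-or-equal
form, assuming max_ncp is a constant:

    #include <cstddef>
    #include <string>
    #include <vector>

    constexpr std::size_t max_ncp = 10;  // assumed constant

    struct component_data {
        // one vector holding max_ncp default-constructed (empty) strings
        std::vector<std::string> component_names =
            std::vector<std::string>(max_ncp);
    };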
"Öö Tiib" <ootiib@hot.ee>: Dec 06 09:38PM -0800 On Wednesday, 7 December 2022 at 00:04:34 UTC+2, Scott Lurndal wrote: > 1) most C++ programmers don't bother or don't know how > 2) Even then there is unnecessary overhead unless the allocator is pool based. > KISS applies, always. No one argues with that. Just that keeping it simple is far from simple. For example it is tricky to keep dynamic allocations minimal. That is not fault of smart pointers. The std::unique_ptr helps at places where dynamic allocations are needed greatly, especially when there can be exceptions. It has next to no overhead. Yes, the std::shared_ptr is loaded. Usage of std::make_shared helps a bit but thinking about how to make it simpler and to get rid of shared ownership or even dynamic allocations is hard and not always fruitful. |
Juha Nieminen <nospam@thanks.invalid>: Dec 07 09:05AM

> A thread-safe reference counted pointer can heavily damage performance
> in certain usage scenarios. Blasting the system with memory barriers and
> atomic RMW ops all over the place.

If you really need to copy/assign smart pointers in tight
number-crunching inner loops, then perhaps *don't* copy/assign smart
pointers in such loops (or use any smart pointers there at all)?

In scenarios that don't require the last clock cycles squeezed out of
them, do a "memory barrier" or other optimization hindrances really
matter all that much? If code that takes 0.01% of the total runtime
gets 1% slower... how much will it slow down the overall program?
Do the math. |
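The practical rule implied here, sketched: traverse by const reference so
no reference count is touched (assuming C++11):

    #include <memory>
    #include <vector>

    double sum(const std::vector<std::shared_ptr<double>>& v) {
        double total = 0.0;
        for (const auto& p : v)  // const&: no atomic ref-count increment per element
            total += *p;
        return total;
        // `for (auto p : v)` would instead do one atomic increment/decrement
        // pair per element -- the RMW traffic referred to above.
    }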
scott@slp53.sl.home (Scott Lurndal): Dec 07 03:22PM

> No one argues with that. It's just that keeping it simple is far from
> simple. For example, it is tricky to keep dynamic allocations minimal,
> and that is not the fault of smart pointers.

In my experience, it has been generally sufficient to pre-allocate the
data structures and store them in a table or look-aside list, since the
maximum number is bounded. For example, an application handling network
packets on a processor with 64 cores may only need 128 jumbo packet
buffers if the packet-processing thread count matches the core count.
These can be preallocated and then passed as regular pointers throughout
the flow.

(Specialized DPUs have a custom hardware block (a network pool allocator)
that allocates hardware buffers to packets on ingress; those buffers are
passed by hardware to the other blocks in the flow, such as blocks to
identify the flow, fragment/defragment a packet, or apply
encryption/decryption algorithms, all controlled by a hardware scheduler
block, etc.)

Likewise for a simulation of an internal processor interconnect, such as
a ring or mesh structure: there is a fixed maximum number of flits that
can be active at any point in time. Preallocating them into a lookaside
list eliminates allocation and deallocation overhead on every flit. When
simulating a full SoC, the maximum number of in-flight objects is
likewise bounded and, for the most part, they can be preallocated. |
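A minimal sketch of that preallocate-and-reuse scheme (a free list over a
fixed pool; the names and sizes are illustrative, and this single-threaded
sketch omits the per-core or locked variants a real packet path would
need):

    #include <cstddef>
    #include <vector>

    struct PacketBuffer { unsigned char data[9216]; };  // jumbo-frame sized, illustrative

    class LookasideList {
        std::vector<PacketBuffer> storage_;  // preallocated once, never resized
        std::vector<PacketBuffer*> free_;    // stack of available buffers
    public:
        explicit LookasideList(std::size_t n) : storage_(n) {
            free_.reserve(n);
            for (auto& b : storage_) free_.push_back(&b);
        }
        PacketBuffer* acquire() {            // hands out a plain pointer, as in the post
            if (free_.empty()) return nullptr;
            PacketBuffer* p = free_.back();
            free_.pop_back();
            return p;
        }
        void release(PacketBuffer* p) { free_.push_back(p); }
    };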
Paavo Helde <eesnimi@osa.pri.ee>: Dec 07 06:37PM +0200

07.12.2022 17:22 Scott Lurndal kirjutas:
> In my experience, it has been generally sufficient to pre-allocate the
> data structures and store them in a table or look-aside list,
> since the maximum number is bounded.

My experience is more that the user wants to read in an unknown number of
TIFF files containing an unknown number of image frames of unknown sizes,
then start to process them with script-driven, flexible algorithms,
producing an unknown number of intermediate and final results of unknown
size. And this processing ought to be as fast as possible, as nobody
wants to wait for hours (although with large data sets and complex
processing it inevitably gets into hours). And this processing ought to
make use of all the CPU cores, and should not run out of computer memory
while doing that.

So it seems preallocating a fixed number of data structures of fixed size
would not really work in my case. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 07 12:48PM -0800 On 12/7/2022 1:05 AM, Juha Nieminen wrote: > If you really need to copy/assign smart pointers in tight number-crunching > inner loops, then perhaps *don't* copy/assign such smart pointers in such > loops (or use any smart pointers for that matter)? It really rears its ugly head when iterating large linked lists of nodes... Read mostly, write rather rarely. > In scenarios that don't require the last clock cycles squeezed out of > them, does a "memory barrier" or other optimization hindrances really > matter all that much? Big time. Have you ever studied up on RCU? That is one of the reasons it was created in the first place: to get rid of memory barriers on the read side of the algorithm. > If code that takes 0.01% of the total runtime > gets 1% slower... how much will it slow down the overall program? > Do the math. RCU beats them all, it is memory barrier free, well except for systems that _need_ a membar for data-dependent loads, ala dec alpha. Iirc, SPARC in RMO mode does not even need membars for such loads. Proxy collection does pretty damn good, but not as good as RCU... The membars take a big toll, especially the god damn #StoreLoad barrier in SMR (aka, hazard pointers). |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 07 12:50PM -0800 On 12/7/2022 12:48 PM, Chris M. Thomasson wrote: > collection does pretty damn good, but not as good as RCU... > The membars take a big toll, especially the god damn #StoreLoad barrier > in SMR (aka, hazard pointers). Fwiw, here is a paper on SMR (Safe Memory Reclamation): https://www.liblfds.org/downloads/white%20papers/%5BSMR%5D%20-%20%5BMichael%5D%20-%20Hazard%20Pointers;%20Safe%20Memory%20Reclaimation%20for%20Lock-Free%20Objects.pdf Joe Seigh cleverly combined SMR with RCU to get rid of the NASTY #StoreLoad membar in SMR. |