- DataItem cache - 6 Updates
- Commenting code considered harmful - 19 Updates
| Paavo Helde <myfirstname@osa.pri.ee>: Feb 04 11:29PM +0200 On 4.02.2016 23:19, Lynn McGuire wrote: > for our software. In the latest release, there can be tens of millions > of these objects which is using up all our memory in Windows (we run out > at 1.9 GB of memory usage). AFAIK there is a simple way to increase that to 3 GB. > I am wondering if I can make this more efficient (less memory usage). > Any thoughts here? One of my staff is totally for this and another is > not. Totally for what? What is the alternative? |
| Lynn McGuire <lmc@winsim.com>: Feb 04 03:19PM -0600

I have an object called DataItem that is the basic variant storage unit for our software. In the latest release, there can be tens of millions of these objects which is using up all our memory in Windows (we run out at 1.9 GB of memory usage). We cannot change to x64 at the moment so I have decided to build a DataItem cache and use one DataItem for many of the same objects wherever possible. I will use copy on write mechanism to create new DataItems that are being modified.

I have structured the DataItem cache using a vector inside a vector inside a map:

    static std::map <int, std::vector <std::vector <DataItem *>>> g_DataItem_Cache;

In other words, a sparse cube. The outside map is for the identity of the major object type, i.e. SYM_AirCoolerGroup. The middle vector is for the index of the DataItems in that group, i.e. AIR_DUT. The inside vector is for the various different copies of DataItems that are referenced by that Data Group type and index. Each Data Group will need to know which version of the Data Item that it is using.

I am wondering if I can make this more efficient (less memory usage). Any thoughts here? One of my staff is totally for this and another is not. I am about 25% complete on making the code changes for testing.

Thanks, Lynn |
| Lynn McGuire <lmc@winsim.com>: Feb 04 04:03PM -0600 On 2/4/2016 3:29 PM, Paavo Helde wrote: >> Any thoughts here? One of my staff is totally for this and another is >> not. > Totally for what? What is the alternative? We will hit the 3 GB barrier also using the current storage methodology. My other programmer wants to move to x64. At least 1/4 of our customers are still running x86 Windows, not gonna happen yet. Thanks, Lynn |
| Marcel Mueller <news.5.maazl@spamgourmet.org>: Feb 05 08:34AM +0100

On 04.02.16 22.19, Lynn McGuire wrote:
> have decided to build a DataItem cache and use one DataItem for many of
> the same objects wherever possible. I will use copy on write mechanism
> to create new DataItems that are being modified.

Whether COW is efficient or not depends on the recombination rate, i.e. how often a modified item turns out to be identical to another instance. If this is likely, COW is bad.

> I have structured the DataItem cache using a vector inside a vector
> inside a map:

If you use COW you do not need a cache at all. You just need to deal with references that are aware of COW. As soon as you intend to modify the item, a copy is made.

> static std::map <int, std::vector <std::vector <DataItem *>>>
> g_DataItem_Cache;

If you need an index then you probably want to do deduplication rather than COW, i.e. you look for identical or matching instances before (or after) you create new ones.

Because you will easily run into serious race conditions here, I strongly recommend using smart pointers. An intrusive reference count is the first choice. Of course, if your application never uses a second thread you are safe.

> In other words, a sparse cube. The outside map is for the identity of
> the major object type, i.e. SYM_AirCoolerGroup. The middle vector is
> for the index of the DataItems in that group, i.e. AIR_DUT.

Note that vector is quite inefficient at dealing with sparse content. Direct lookup is only efficient for types with a small domain, like enum types, where most of the values are actually used.

Furthermore, std::map is not efficient with respect to memory or cache locality if the number of nodes becomes large. Typical implementations use a red-black tree. You should prefer a B-tree if memory counts. Almost every database does so. Unfortunately this is not part of the standard, but there are good public implementations available. E.g. Google published a quite good Java implementation that can be ported to C++ with reasonable effort (take care of license issues).

> vector is for the various different copies of DataItems that are
> referenced by that Data Group type and index. Each Data Group will need
> to know which version of the Data Item that it is using.

Version?

> I am wondering if I can make this more efficient (less memory usage).
> Any thoughts here? One of my staff is totally for this and another is
> not. I am about 25% complete on making the code changes for testing.

Deduplication can significantly reduce memory usage. For business databases, factors on the order of 10 are typical. This is basically one of the concepts behind in-memory databases. I have also achieved factors up to 100 in some applications.

But there are challenges, too. First of all, you strictly need to distinguish between read and write access. For efficiency reasons, writable instances should never make it to the main index. You should share only immutable instances. Otherwise you will need to synchronize until death.

I recommend putting this in the type system, i.e. make DataItem immutable and use LocalDataItem as a local, writable copy without deduplication. LocalDataItem should inherit from DataItem so that reading code can deal with a mix of both, i.e. many immutable instances and a few mutable ones.

At the least you should ensure that the shared instances use a compact memory representation, i.e. no half-full vectors and so on. Even std::string is not the best choice as it is optimized for mutability.

To give further hints, more knowledge about the structure of the DataItems and your application is required.

What are their properties?

Why do many instances have the same data? (Otherwise your concept would not work.)

What kind of data do they contain? Strings? Maybe it is easier to deduplicate them.

What is the typical access pattern to look up the DataItems? Do they have something like a primary key?

How do changes apply to the data structures? Transactions? Revisions? Snapshot isolation?

Do you have a database backend?

What about concurrency? Is it likely that the same items are accessed concurrently? For writing or only for reading?

What about the object lifetime? Might you have a memory leak? Since you deal with raw pointers (I strongly advise against doing so) this is not that unlikely.

And last but not least: is it really the space for the DataItems that clobbers your memory? Or is it management overhead? Or maybe even fragmentation of the virtual address space?

Marcel |
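Editor's sketch of the deduplication idea discussed above: share only immutable instances through smart pointers, and make a fresh copy whenever something has to change. This is a minimal illustration under stated assumptions, not the poster's code. `Item`, `ItemPool`, `intern` and `withText` are hypothetical names, `Item` is a two-field stand-in for the real DataItem, and the pool is single-threaded.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in for an immutable shared item; NOT the real DataItem.
struct Item {
    int datatype;
    std::string text;
};

// Shared instances are only reachable through pointers to const.
using ItemPtr = std::shared_ptr<const Item>;

// Hash and compare by the pointed-to value, so equal items meet in the pool.
struct ItemPtrHash {
    std::size_t operator()(const ItemPtr& p) const {
        return std::hash<std::string>()(p->text) * 31u + std::hash<int>()(p->datatype);
    }
};
struct ItemPtrEqual {
    bool operator()(const ItemPtr& a, const ItemPtr& b) const {
        return a->datatype == b->datatype && a->text == b->text;
    }
};

class ItemPool {
public:
    // Returns the shared instance equal to `item`, adding it if it is new.
    ItemPtr intern(Item item) {
        ItemPtr candidate = std::make_shared<Item>(std::move(item));
        // insert() leaves the set unchanged if an equal element already exists
        // and returns an iterator to that existing element.
        return *pool_.insert(candidate).first;
    }
    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<ItemPtr, ItemPtrHash, ItemPtrEqual> pool_;
};

// "Copy on write": copy the shared item, change the copy, intern the result.
inline ItemPtr withText(ItemPool& pool, const ItemPtr& original, std::string text) {
    assert(original);  // precondition: caller passes a live item
    Item copy = *original;
    copy.text = std::move(text);
    return pool.intern(std::move(copy));
}
```

Note that this pool holds strong references, so nothing is ever freed while the pool lives. A real version would need some eviction scheme (for example weak references) and, if threads are ever introduced, synchronization around the pool.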
| Ian Collins <ian-news@hotmail.com>: Feb 06 12:01PM +1300 Lynn McGuire wrote: > 4. DataItems are stored in a hierarchical object system using a primary key in DataGroup objects > 5. not sure what you are asking > 6. no database backend If you have a large number of objects that are suitable for deduplication, you might be better off using a proper in memory database. Then you wouldn't have to worry about such things your self. Being a JSON/BSON fan, I tend towards MongoDB for this type of data. If the data is relational, I would look to MySQL in memory tables. > int unitsClass; // nil or the symbol of the class > std::string unitsArgs; // a coded string of disallowed units > std::map <int, std::vector <int> > dependentsListMap; Could these (and the vector above) be fixed size? If not, you might benefit in both space and performance if you use a custom allocator for them. Maps within vectors within maps will probably lead to a very fragmented heap, wasting both memory and possible cache hits. > DataDescriptor * myDataDescriptor; > BOOL scratchChangedComVector; // if the scratch value was changed in the changeComVector() method BOOL? > virtual void discardInput (DataGroup * ownerDG); > public: > // constructor Time for Flibble to start a "Gratuitous considered harmful" thread :) <snip> > std::vector <int> * intArrayValue; > std::vector <double> * doubleArrayValue; > std::vector <std::string> * stringArrayValue; Are these local to the class? If so, the allocator comment above might be relevant. -- Ian Collins |
| Lynn McGuire <lmc@winsim.com>: Feb 05 04:26PM -0600

On 2/5/2016 1:34 AM, Marcel Mueller wrote:
> And last but not least: is it really the space for the DataItems that clobbers your memory? Or is it management overhead? Or maybe
> even fragmentation of the virtual address space?
> Marcel

Answers to your questions:
1. part of the DataItem declaration is below
2. many of the DataItem instances are exactly alike since they are snapshots of a user's workspace
3. all kinds of data: strings, integers, doubles, string arrays, double arrays, integer array, strings larger than 300 characters are compressed using zlib
4. DataItems are stored in a hierarchical object system using a primary key in DataGroup objects
5. not sure what you are asking
6. no database backend
7. no concurrency (yet)
8. when the storage used is 1.5 GB, the memory leakage is 10 MB (observed)
8a. the lifetime of the objects is controlled by the user by opening a file or closing a file
9. I think that it is DataItems but will not know for sure until completion of the current deduplication project

Here is part of the declaration for the DataItem and DesValue classes. There are no member variables in the ObjPtr class.

class DataItem : public ObjPtr
{
private:
    int datatype;          // Either #Int, #Real, #String or #Enumerated
    int vectorFlag;        // Flag indicating value contains an Array.
    int descriptorName;    // name of Corresponding DataDescriptor
    // DataGroup * owner;  // The DataGroup instance to which this item belongs
    std::vector <DataGroup *> owners;  // The DataGroup instance(s) to which this item belongs
    DesValue * inputValue;    // DesValue containing permanent input value
    DesValue * scratchValue;  // DesValue containing scratch input value
    int writeTag;             // a Long representing the object for purposes of reading/writing
    int unitsClass;           // nil or the symbol of the class
    std::string unitsArgs;    // a coded string of disallowed units
    std::map <int, std::vector <int> > dependentsListMap;
    DataDescriptor * myDataDescriptor;
    BOOL scratchChangedComVector;  // if the scratch value was changed in the changeComVector() method
protected:
    virtual void discardInput (DataGroup * ownerDG);
public:
    // constructor
    DataItem ();
    DataItem (const DataItem & rhs);
    DataItem & operator = (const DataItem & rhs);
    // destructor
    virtual ~DataItem ();
    // comparison of equality
    virtual bool operator == (DataItem const & right) const;
    virtual bool operator != (DataItem const & right) const;
    virtual int isDataItem () { return true; };

class DesValue : public ObjPtr
{
public:
    int datatype;        // Either #Int, #Real, or #String.
    int vectorFlag;      // Flag indicating value contains an Array.
    int optionListName;  // name of the option list item
    int * intValue;      // Either nil, an Int, a Real, a String, or an Array thereof.
    double * doubleValue;
    std::string * stringValue;
    std::vector <int> * intArrayValue;
    std::vector <double> * doubleArrayValue;
    std::vector <std::string> * stringArrayValue;
    unsigned char * compressedData;
    unsigned long compressedDataLength;
    std::vector <unsigned long> uncompressedStringLengths;
    int isTouched;    // Flag indicating if value, stringValue, or units have been
                      // modified since this DesValue was created. Set to true by
                      // setValue, setString, setUnits, and convertUnits.
    int isSetFlag;    // Flag indicating whether the contents of the DesValue is
                      // defined or undefined. If isSet is false, getValue returns
                      // nil despite the contents of value, while getString and
                      // getUnits return the empty string despite the contents of
                      // stringValue and units.
    int unitsValue;   // current string value index in $UnitsList (single or top)
    int unitsValue2;  // current string value index in $UnitsList (bottom)
    std::string errorMessage;  // message about last conversion of string to value
    std::string unitsArgs;     // a coded string of disallowed units
protected:
    virtual void deleteValues ();
public:
    // constructor
    DesValue ();
    DesValue (const DesValue & rhs);
    DesValue & operator = (const DesValue & rhs);
    // destructor
    virtual ~DesValue ();
    // comparison of equality
    virtual bool operator == (DesValue const & right) const;
    virtual bool operator != (DesValue const & right) const;

Lynn |
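Editor's sketch: DesValue as posted carries six nullable value pointers plus a datatype and a vector flag, although one value is presumably live at a time. One way to express "exactly one of these" compactly is a tagged union. The version below uses `std::variant`, which needs C++17 and was therefore not an option in the poster's 2016 toolchain; Boost.Variant would have been the contemporary equivalent. `CompactValue`, `isSet`, `isVector` and `asInt` are hypothetical names, and whether this actually saves memory for the real application is an open question that only measurement can answer.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical compact holder for "one of int, double, string, or an array
// thereof". The active alternative doubles as the datatype/vectorFlag tag.
using CompactValue = std::variant<
    std::monostate,             // index 0: "nil", no value set
    int,                        // index 1
    double,                     // index 2
    std::string,                // index 3
    std::vector<int>,           // index 4
    std::vector<double>,        // index 5
    std::vector<std::string>>;  // index 6

// True when any value is stored (plays the role of isSetFlag).
inline bool isSet(const CompactValue& v) {
    return !std::holds_alternative<std::monostate>(v);
}

// True when the stored value is one of the array alternatives
// (indices 4..6 in the alternative list above).
inline bool isVector(const CompactValue& v) {
    return v.index() >= 4;
}

// Checked scalar access; the caller must know an int is stored.
inline int asInt(const CompactValue& v) {
    assert(std::holds_alternative<int>(v));
    return std::get<int>(v);
}
```

Scalars are stored inline here rather than behind a separately allocated pointer, which avoids one heap allocation per scalar value. The object size itself depends on the platform's `std::string` and `std::vector` layout.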
| Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 03 05:35PM Verbose commenting of code (especially of implementation details) can be dangerous as quite often the code evolves and old comments are not updated in tandem with the code changes and become out of date no longer reflecting what the code actually does. These erroneous comments can have disastrous consequences if incorrect assumptions are made based on them. The best form of documentation is the code itself! /Flibble |
| Christian Gollwitzer <auriocus@gmx.de>: Feb 03 07:08PM +0100 Am 03.02.16 um 18:44 schrieb Gareth Owen: >> be dangerous ... The best form of documentation is the code itself! > Bravo! You have excelled yourself. > Expect a million bites, and not just on sausages. Where it hurts most :P |
| Gareth Owen <gwowen@gmail.com>: Feb 03 05:44PM > Verbose commenting of code (especially of implementation details) can > be dangerous ... The best form of documentation is the code itself! Bravo! You have excelled yourself. Expect a million bites, and not just on sausages. |
| Ian Collins <ian-news@hotmail.com>: Feb 04 08:01AM +1300 Mr Flibble wrote: > have disastrous consequences if incorrect assumptions are made based on > them. > The best form of documentation is the code itself! Aren't you going to offer up your critique of Uncle Bob's TDD sausages? -- Ian Collins |
| Vir Campestris <vir.campestris@invalid.invalid>: Feb 03 09:46PM On 03/02/2016 17:35, Mr Flibble wrote: > them. > The best form of documentation is the code itself! > /Flibble I'll bite. It can't make things wurst. A good comment tells you what the code is supposed to do, and tells you why it doesn't do something else that seems obvious. The code tells you what it does. Nothing more. Andy |
| Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 03 11:04PM On 03/02/2016 21:46, Vir Campestris wrote: > A good comment tells you what the code is supposed to do, and tells you > why it doesn't do something else that seems obvious. > The code tells you what it does. Nothing more. Well written and designed code with sensible, descriptive variable, function and class names is virtually self-documenting. /Flibble |
| Ian Collins <ian-news@hotmail.com>: Feb 04 12:09PM +1300 Mr Flibble wrote: > Well written and designed code with sensible, descriptive variable, > function and class names is virtually self-documenting. Especially if it was written with TDD where you have a lovely set of tests that tell you exactly what the code does :) Aren't you going to offer up your critique of Uncle Bob's TDD sausages? -- Ian Collins |
| Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 03 11:19PM On 03/02/2016 23:09, Ian Collins wrote: > Especially if it was written with TDD where you have a lovely set of > tests that tell you exactly what the code does :) > Aren't you going to offer up your critique of Uncle Bob's TDD sausages? Perhaps your problem is that you are confusing TDD with unit testing? Unit tests are great, TDD isn't. /Flibble |
| Ian Collins <ian-news@hotmail.com>: Feb 04 12:24PM +1300 Mr Flibble wrote: >> Aren't you going to offer up your critique of Uncle Bob's TDD sausages? > Perhaps your problem is that you are confusing TDD with unit testing? > Unit tests are great, TDD isn't. Nope. Aren't you going to offer up your critique of Uncle Bob's TDD sausages? -- Ian Collins |
| Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 04 10:35AM On 04/02/2016 07:11, Öö Tiib wrote: > break some unit test naively and waste her time. If there are also no > unit tests that demonstrate the reason why then that typically results > with regression. Nonsense. Why is not important, what is. If you were implementing std::copy would you comment why? Of course not, the what is what is important and the code itself tells you what. /Flibble |
| "Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 04 12:17PM +0100 On 2/3/2016 6:35 PM, Mr Flibble wrote: > have disastrous consequences if incorrect assumptions are made based on > them. > The best form of documentation is the code itself! I agree with all that. Of course there are exceptions. But in favor of your view, I once had to help a colleague with a little Java class dealing with timestamps. I first sent her a simple non-commented class she could use as starting point, and she was well satisfied with that. However, our project coding guidelines required comments on everything, to serve as automatically generated documentation, and I had a little free time so I added what I thought was reasonable commenting and sent that. This would be very helpful, I thought, and the code was exactly the same. But now the clear understanding evaporated, "I don't understand any of this!". I guess what happened was not that the comments misled intellectually, but that with comments added the code LOOKED MORE COMPLICATED. In a similar vein, my late father once thought he couldn't use my calculator, because it looked so complex, lots of "math" keys. It didn't matter that the keys he'd use were the same as on other calculators he'd used. There was the uncertainty about the thing. Francis Glassborow once remarked that the nice thing about the introduction of syntax colouring was that one could now configure the editor to show comments as white on white. ;-) Which, I think, goes to show that your sentiment is not new, and is shared by many who have suffered other's "well-commented" code. Looks, not content. Cheers, - Alf |
| JiiPee <no@notvalid.com>: Feb 04 12:05PM On 04/02/2016 11:17, Alf P. Steinbach wrote: >> them. >> The best form of documentation is the code itself! > I agree with all that. Does this mean no comments at all, not even outside the code? For example, I wrote code to handle 3-base numbers (I need 3 values per slot, so not binary values like 101100, but values like 201200). Explaining the theory (with a couple of examples) near that code helps me when I come back a year later. It speeds things up. In a comment I describe the mathematical logic behind it and give a couple of short examples. Then it's easy to understand the code after that. |
| Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 04 12:08PM On 04/02/2016 12:05, JiiPee wrote: > helps me when I come back year after. It speeds up things. > In a comment I tell what is the mathematical logic behind it and coupld > of short examples. Then its easy to understand the code after that. I guess we can summarize both those points as never document HOW you are doing something as the code itself does that. /Flibble |
| David Brown <david.brown@hesbynett.no>: Feb 04 12:33PM +0100 On 04/02/16 00:09, Ian Collins wrote: > Especially if it was written with TDD where you have a lovely set of > tests that tell you exactly what the code does :) > Aren't you going to offer up your critique of Uncle Bob's TDD sausages? Not long ago, I had the pleasure of bug-fixing code from a different company that combined incomprehensible code, badly named variables and functions, minimal commenting (some of which was in other languages), and no possibility of any sort of testing. However, the authors clearly understood the importance of testing, since the one appropriate comment was "// Test this shit!". |
| JiiPee <no@notvalid.com>: Feb 04 12:32PM On 04/02/2016 12:08, Mr Flibble wrote: >> of short examples. Then its easy to understand the code after that. > I guess we can summarize both those points as never document HOW you > are doing something as the code itself does that. If I also explain in the code why I use that 3-base system, then it helps to understand the code around it. The first question when seeing 3-base calculations there is: "why are we doing it like this? why use 3-base numbers here?". I did have that question when I came back to the code months later, and the comments above it helped me understand the motive behind it. The code does not answer questions like "why are we doing it like this? what is the motive for doing this? why not do it another way? why is this the best way to do this?" |
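Editor's sketch of the kind of "why" comment described above, applied to a made-up base-3 packing routine. The poster's actual code is not shown in the thread, so `packBase3` and `unpackBase3` are hypothetical and only illustrate a comment that records the motive and a worked example rather than restating the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// WHY base 3: every slot has exactly three possible states (0, 1, 2), so a
// row of slots is naturally a number written in base 3. Packing it that way
// fits up to 20 slots in 32 bits (3^20 < 2^32), where two bits per slot
// would only fit 16.
//
// Example: slots 2,0,1,2,0,0 read as the base-3 number 201200, which is
//   2*243 + 0*81 + 1*27 + 2*9 + 0*3 + 0 = 531.
inline std::uint32_t packBase3(const std::vector<int>& slots) {
    assert(slots.size() <= 20);  // more digits would overflow 32 bits
    std::uint32_t packed = 0;
    for (int s : slots) {
        assert(s >= 0 && s <= 2);
        packed = packed * 3u + static_cast<std::uint32_t>(s);
    }
    return packed;
}

// Inverse of packBase3: recovers `count` slots, most significant first.
inline std::vector<int> unpackBase3(std::uint32_t packed, std::size_t count) {
    std::vector<int> slots(count);
    for (std::size_t i = count; i-- > 0;) {
        slots[i] = static_cast<int>(packed % 3u);
        packed /= 3u;
    }
    return slots;
}
```

The comment answers the questions the code cannot (why base 3, why not plain bits) and leaves the mechanics of the loop to the code itself.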
| "Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 04 01:43PM +0100 On 2/4/2016 1:32 PM, JiiPee wrote: > helps to understand the code around it. The first question when seeing > 3-base calculations there is: "why are we doing it like this? why use > 3-base numbers here?". That's because the NIM game with 3 heaps has a simple solution in base 3. > The code does not answer questions like "why are we doing like this? > what is the motive doing this? why not doing another way? why is this > the best way to do this?" Could be useful. IMHO it all depends on whether the comments really add something that is useful and can't be easily expressed in the code itself. Cheers!, - Alf |
| JiiPee <no@notvalid.com>: Feb 04 04:07PM

On 04/02/2016 12:43, Alf P. Steinbach wrote:
> On 2/4/2016 1:32 PM, JiiPee wrote:
> IMHO it all depends on whether the comments really add something that
> is useful and can't be easily expressed in the code itself.

You mean not like this:

    // here we are looping through all the humans in the vector and printing their information!
    for(const auto& a : humans)
        a.print();

hehe |
| red floyd <no.spam@its.invalid>: Feb 04 11:11AM -0800

On 2/4/2016 10:38 AM, Andrea Venturoli wrote:
>> BOOST_CHECK_THROW(z(3,4))
> Think 20-40 lines of those comments, followed by the 20-40 lines of
> code, which soon will get out of sync.

I once had the dubious "pleasure" of examining code where the standards said that each function would have a block comment describing functionality. So far, so good. However, this code was written in Ada by an idiot who figured that since well written Ada code was "self-documenting", the block comment was literally a commented out copy of the function. Of course, the code was neither well written nor self documenting.

FWIW, my block comments look like this:

    //
    // func_name() -- one line description
    //
    // INPUTS: [input 1] -- description or NONE
    //         ...
    //
    // OUTPUTS: [output 1] -- description
    //          ...
    //
    // RETURNS: description, or NONE
    //
    // Short description of what function is intended to do and why
    //
|
| red floyd <no.spam@its.invalid>: Feb 04 11:16AM -0800 On 2/4/2016 6:03 AM, Jerry Stuckle wrote: > Well written comments indicate WHY the code does what it does. It also > defines input and output conditions to a function, and other information > not part of the code. One time, I wrote a comment that was about five times the length of the actual code. I was working on a Z80, and using 14-bit scaled fixed point trig. To avoid losing precision when I multiplied the sines and cosines, I had worked out a whole bunch of transformations that involved adding angles instead of multiplying sin/cos. The comment was the derivation of the transformations, since the code was non-obvious. However, upon reading the comment, anyone familiar with trig would understand what I had done, and why. |
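Editor's note: the post does not reproduce the derivation it mentions. As an assumption about the general family of transformations involved, the standard product-to-sum identities are one way to replace a product of sines and cosines with lookups at added and subtracted angles, which avoids multiplying two scaled fixed-point values:

```latex
\begin{align}
\sin a \cos b &= \tfrac{1}{2}\bigl[\sin(a+b) + \sin(a-b)\bigr] \\
\cos a \cos b &= \tfrac{1}{2}\bigl[\cos(a-b) + \cos(a+b)\bigr] \\
\sin a \sin b &= \tfrac{1}{2}\bigl[\cos(a-b) - \cos(a+b)\bigr]
\end{align}
```

Whether these are the exact identities the poster used is not stated in the thread.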