- What is "TP_NUM_C_BUFS"? - 9 Updates
- The unobtrusive initialization in C++17 - 4 Updates
- How to split a char array? - 1 Update
- Stringify a list? - 2 Updates
Robbie Hatley <see.my.sig@for.my.address>: Apr 10 08:59PM -0700 I spent most of today struggling to fix a bug in a function, a regular-expression substitution function based on the GNU C library's regex facilities. (Though, that library is pissing me off, so I may end up going to the new C++ regex functionality instead.) The point where the bug occurred looked something like this: std::string rhregex::Substitute(std::string const & Text...) { ...hundreds of lines of code... cerr << "In rhregex::Substitute(), about to set Text2 equal to Text." << endl; std::string Text2 = Text; // CATASTROPHIC PROGRAM CRASH!!! Now, there's nothing wrong with that code itself. It just copies the contents of one string (via a const ref function parameter) to a second string. Legal, harmless, safe, impossible to crash. AND YET, the program was crashing at that point consistently! The error messages sometimes read: segmentation fault, core dumped And on other occasions read: 0 [main] renamefiles 6040 C:\cygwin64\rhe\bin\util\renamefiles.exe: *** fatal error - Internal error: TP_NUM_C_BUFS too small: 50 Hangup (And on one occasion the "error message" consisted of several hundred pages of absolute gibberish: graphics characters, smudges, planet symbols, file listings from my current directory, the contents of a couple of C++ source files, all jumbled up. Talk about "undefined behavior"!!! That was the most "undefined" result I've ever got from running *any* computer program!) So... what are "TP_NUM_C_BUFS"? And why is it a problem that there's only 50 of them? And why were they making my program crash? Now, for information purposes, I *did* finally make the program stop crashing. It was a process of elimination, commenting-out larger and larger portions of the code, starting from the crash point and working UP towards the top of the function. The CAUSE turned out to be the very first line of the function: regex_t RegEx; // ACTUAL CAUSE OF PROGRAM CRASH To make the program stop crashing, I replaced that line of code with the following: regex_t * RegEx = NULL; RegEx = reinterpret_cast<regex_t *>(malloc(sizeof(regex_t))); assert(RegEx); So obviously the regex_t type has the buggy undocumented "feature" that you can't safely define it as an auto variable on the stack; it *must* be malloc'ed on the heap instead. I wish this factoid had made it into the GNU C-lib manual. (And this explains all the other crashes, messages, and gibberish: stack corruption.) But the nagging question remains: What is "TP_NUM_C_BUFS"??? -- Cheers, Robbie Hatley Midway City, CA, USA perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"' http://www.well.com/user/lonewolf/ https://www.facebook.com/robbie.hatley |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 11 06:47AM +0200 On 11.04.2016 05:59, Robbie Hatley wrote: > << "In rhregex::Substitute(), about to set Text2 equal to Text." > << endl; > std::string Text2 = Text; // CATASTROPHIC PROGRAM CRASH!!! Probably `Text` has been initialized with 0 or `nullptr`. That arms an UB bomb. Other causes include a buffer overflow somewhere (ditto). > regex_t * RegEx = NULL; > RegEx = reinterpret_cast<regex_t *>(malloc(sizeof(regex_t))); > assert(RegEx); Unless `RegEx` is a POD type this introduces more UB. And if it doesn't, then its only effects are to (1) add a time consuming operation, (2) hide the real problem, and (3) possibly introducing a memory leak or other such issue. The idea of searching systematically for the problem is good. But most likely the problem is in the calling code, not in the function itself. > But the nagging question remains: What is "TP_NUM_C_BUFS"??? It's just the g++ way of saying, "got you... muwhahaha!" Cheers & hth., - Alf |
Daniel <danielaparker@gmail.com>: Apr 10 11:05PM -0700 On Sunday, April 10, 2016 at 11:59:54 PM UTC-4, Robbie Hatley wrote: > But the nagging question remains: What is "TP_NUM_C_BUFS"??? It indicates stack corruption. It may be a cygwin issue (similar issues have been reported by others.) Daniel |
Paavo Helde <myfirstname@osa.pri.ee>: Apr 11 12:58PM +0300 On 11.04.2016 6:59, Robbie Hatley wrote: > 0 [main] renamefiles 6040 C:\cygwin64\rhe\bin\util\renamefiles.exe: > *** fatal error - Internal error: TP_NUM_C_BUFS too small: 50 > Hangup Looks like an Undefined Behavior bug expressed. Probably some memory corruption somewhere. > that you can't safely define it as an auto variable on the stack; > it *must* be malloc'ed on the heap instead. I wish this factoid > had made it into the GNU C-lib manual. No, this is not the reason and you did not fix the issue. All you did was to move the memory corruption to some other place, hiding the bug. An hidden bug is much worse than a clear bug. Suggesting to review your code to find out the actual bug. Most probably you are passing invalid arguments to the library. It is not excluded that the bug is in the library itself, the CygWin ports are not very widely used or tested, so there is some chance (say 1%) that the bug is not in your code. HTH Paavo |
Robbie Hatley <see.my.sig@for.my.address>: Apr 11 11:24AM -0700 On 4/11/2016 2:58 AM, Paavo Helde wrote: > An hidden bug is much worse than a clear bug. Suggesting to review your code > to find out the actual bug. Most probably you are passing invalid arguments > to the library. I want to say it can't be so, but that is a possibility, because the library I'm using (Cygwin C Library) is not written by any one person or company, but is a hodge-podge of code stolen from Red Hat, BSD, and FSF/GNU, along with some written by the Cygwin folks themselves. And while Cygwin's C Lib comes with a "Red Hat C Standard Library" html documentention file, it doesn't cover their Regex stuff. GNU's C Library Manual covers it, but it quite vague in many places (such as, doesn't tell you if a function allocates memory for an input variable, or if you should allocate it yourself, and if so, whether it should be stack, static, dynamic, or whatever). And the Regex stuff in Cygwin's C Lib was actually taken from BSD, so I suppose I should hunt down a BSD C Lib manual. Sigh. What a mess. But I just learned that there *is* a better way (see below).... > It is not excluded that the bug is in the library itself, the CygWin > ports are not very widely used or tested, so there is some chance (say 1%) > that the bug is not in your code. Sounds like a good reason to avoid the C Lib in C++ code. Especially since I just learned this morning (from reading Stroustrup's "The C++ Programming Language 4th Ed.", which I just got my hands on; the version I WAS using was the 2000 edition) that I can replace the entire function I was creating -- a regex substitution function -- with the following, using the new C++ 2011 std lib: // #include <regex.h> // No, DON'T include this piece of $#%^! #include <regex> // Let's try THIS header instead. regex_replace(.......); // Already written for me. Cool! :D Replaces 343 lines of code with... well... 0. I can erase my entire homebrew regex module, seeing as how C++ 2011 std lib already has easy-to-use regex match, search, replace tools. Cool. Well, it was any interesting exercise in tracing an elusive memory corruption bug (which is clearly what it was/is). Perhaps I'll pursue debugging my buggy C regex module later as an exercise. But for any serious usage I think I'll use C++ regex instead. I *still* wonder what the heck "TP_NUM_C_BUFS" is, though. :-) -- Cheers, Robbie Hatley Midway City, CA, USA perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"' http://www.well.com/user/lonewolf/ https://www.facebook.com/robbie.hatley |
Paavo Helde <myfirstname@osa.pri.ee>: Apr 11 09:45PM +0300 On 11.04.2016 21:24, Robbie Hatley wrote: > Replaces 343 lines of code with... well... 0. I can erase my entire > homebrew regex module, seeing as how C++ 2011 std lib already > has easy-to-use regex match, search, replace tools. Cool. Good to hear! Yes of course it is better to use "builtin" C++ regex whenever possible. And I am still pretty sure the actual bug was in some of those deleted 343 lines (and deleting them was a good way to get rid of the bug!) > debugging my buggy C regex module later as an exercise. But for > any serious usage I think I'll use C++ regex instead. > I *still* wonder what the heck "TP_NUM_C_BUFS" is, though. :-) According to Google, this is a constant defined in some cygtls.h header file (e.g. https://sourceforge.net/u/earnie/msys2/msys2/ci/01e8fa7372b33b9766bb5dd3e059cfab695bbf90/tree/winsup/cygwin/cygtls.h). "tls" means thread local storage, a known source of problems and confusion on Windows. But I believe your issue had nothing to do with the thread local storage, it's just that this code bothered to do some consistency checks and noticed the memory corruption. Cheers Paavo |
Robbie Hatley <see.my.sig@for.my.address>: Apr 11 12:18PM -0700 On 4/10/2016 9:47 PM, Alf P. Steinbach wrote: > Probably `Text` has been initialized with 0 or `nullptr`. > That arms an UB bomb. At the point where the program always crashed, Text is a std::string const & parameter to my "substitute" function, and it refers to a std::string containing "randip.c". Drastically simplified call chain: int main () { ... CallingFunc(); ... return 0; } int CallingFunc () { ... std::string source = "randip.c"; std::string pat = "and"; std::string repl = "or"; std::string result = substitute(source, pat, repl, 'g'); ... return 0; } std::string substitute ( std::string const & Text, std::string const & Pat, std::string const & Repl, char Flag // 'g' for global ) { regex_t Regex; // line which appears to have corrupted stack ... hundreds of lines of code... std::string Text2 = Text; // accesses previously corrupted stack??? ... hundreds more lines of code... return Text2; } > Other causes include a buffer overflow somewhere (ditto). I don't make much use of fixed-length buffers in C++ code, preferring std::string over char[], vector<MyClass> over MyClass[], etc. So it's not something as simple as stick 73 objects in a 50-object array. Or if it is, it's in library code. Perhaps that's what the error msg, "TP_NUM_C_BUFS too small: 50" was all about. :::shrugs::: > Unless `RegEx` is a POD type this introduces more UB. And if it doesn't, > then its only effects are to (1) add a time consuming operation, > (2) hide the real problem, The "regex_t" type from the BSD and GNU C libs's "regex.h" headers is poorly documented, so it's hard to know what's in it. The documentation says "This struct contains several more fields than are documented here, but you don't need to know about those, as only library internal functions should access them". So GNU's attitude is, "You don't want to know; and if you do, bugger off." But I'm curious: why would dynamically allocating memory for an object which later dynamically allocates still more memory for its own use be a problem? Seems straightforward to me. Since this is a C library, I doubt regex_t has "member functions"; the struct itself is likely POD, but it definitely has associated functions in the library for dynamic memory allocation. > and (3) possibly introducing a memory leak or other such issue. A function "regfree" is provided for freeing memory allocated by objects of type regex_t (or more accurately, by their associated functions), but the documentation says "regfree doesn't free the object itself", implicitly implying that you malloced the regex_t object. And indeed, when I did that, the bug seemed to go away. But creating and freeing a regex_t object is a hassle: // Make the object: regex_t * RegEx = NULL; RegEx = (regex_t *)malloc(sizeof(regex_t)); assert(RegEx); ... use the object.... // Free the object: regfree(RegEx); assert(RegEx); free(RegEx); > The idea of searching systematically for the problem is good. > But most likely the problem is in the calling code, not in the > function itself. I suppose it's possible. I'll also look there, and see if I'm doing anything funky. But I don't think it's as simple as sending wrong stuff to substitute(), as those arguments are very straightforward. > > But the nagging question remains: What is "TP_NUM_C_BUFS"??? > It's just the g++ way of saying, "got you... muwhahaha!" :-) Well, the Cygwin C regex module appears to be the work of Henry Spencer of the University of Toronto, who contributed it to UC Berkeley, who used it in BSD, from whence Cygwin borrowed and modified it, stirring in stuff from Red Hat and FSF/GNU. So maybe it was Henry Spencer, or Richard Stallman, or someone else, who thought, "HEY, I know, I'll name the error code for when xxxxxxx happens 'TP_NUM_C_BUFS too low'; that'll confuse the hell out of our users! MUAH-HA-HA-HA-HA!!!" Anyway, long-term fix is to go over to using header <regex> (C++ 2011) instead of header <regex.h> (inherited from C). It already has a regex_replace() function, making my module obsolete except as an exercise in troubleshooting obscure memory corruption bugs. (I just learned of the existence of the 2011 C++ std lib regex functionality this very morning.) -- Cheers, Robbie Hatley Midway City, CA, USA perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"' http://www.well.com/user/lonewolf/ https://www.facebook.com/robbie.hatley |
scott@slp53.sl.home (Scott Lurndal): Apr 11 08:08PM >and if so, whether it should be stack, static, dynamic, or whatever). >And the Regex stuff in Cygwin's C Lib was actually taken from BSD, so >I suppose I should hunt down a BSD C Lib manual. Sigh. What a mess. http://pubs.opengroup.org/onlinepubs/9699919799/functions/regexec.html http://pubs.opengroup.org/onlinepubs/9699919799/ |
legalize+jeeves@mail.xmission.com (Richard): Apr 11 08:39PM [Please do not mail me a copy of your followup] Robbie Hatley <see.my.sig@for.my.address> spake the secret code > << endl; > std::string Text2 = Text; // CATASTROPHIC PROGRAM CRASH!!! > [...] I'm going to take a step back here to add a viewpoint that hasn't yet been expressed in this thread. I agree that it is unlikely that the GNU C Library Regex facility requires that it's values be allocated on the heap in order to function properly. That would be unusual for the any Standard Library facility. They're designed intentionally so that you "only pay for what you use". TL;DR: Monster methods are a source of bugs. Refactor them into smaller units that your brain can reason about with confidence. If this code is easy to test in isolation, I should first write a unit test for it that reproduces the problem. What you've written suggests that this is a "monster method". Writing an isolated test for it is likely to be difficult because it almost certainly doesn't have "hundreds of lines of code" that execute in a linear fashion. That means lots of branching and conditionally executed code. So writing a test is likely to be difficult. If not, certainly write one that reproduces the problem. That "hundreds of lines of code" is really bugging me. Chances are that there is another "hundreds of lines of code" after the part you're showing us. Compose Method is your friend here.[1] (See below for an alternative: Replace Method with Method Object.) Most likely your "hundreds of lines of code" is introducing lots and lots of local variables throughout. Almost certainly it was decided that reusing local variables was preferable to each chunk of code using only it's own variables. Code like this makes it difficult to apply Compose Method because there is lots of shared state (the variables) between various chunks of code. They aren't coupled through the values of the variables because each chunk overwrites the value with it's own local initialization, but they are coupled through the names of the variables that are reused. In order to break this up, you'll want to refactor each isolated use of a reused variable into a use of a distinct variable. Unfortunately there isn't any good tool for C++ that can do this automatically and reliably, so you'll need to do it by hand. Here's where you should lean on your version control system really heavily. Each time you decouple a reused variable you'll want to decouple a single reuse instance and then make a commit into your version control system for it. Yes, this may result in 50+ commits for these several hundred lines of code, but so what? Commits are free. From the way you describe it, I assume this code has 0% unit test coverage. Small, slow steps are what you need in order to bring some sanity to this monster method. When you transform code in small steps anyone can follow the version control history and see what was done and how it was done. When a huge method like this is refactored in one gigantic step, then looking at the diffs becomes pointless. The diff viewer will say "several hundred lines of code were removed and replaced with several hundred other lines of code", making for an unintelligible change. When looking at the diff, the average person will be hard pressed to determine if the change was for the better or not in terms of bugs and preserved functionality. Once we've reduced the coupling of various chunks of code, we can start to Extract Method on the various chunks in order to achieve Compose Method. Here, there are good tools to help: CLion for Linux/Mac and VS2015/ReSharper for C++ for Windows. They all have pretty good automated Extract Method functionality that handles the dataflow needed to properly deduce input arguments, return types and their proper signatures. (Sorry, VisualAssist fairs pretty poorly here and I can't recomment it as it causes too much manual fixup after the fact.) If they don't deduce the types to best practices (passing std::string input args by value instead of const reference), then applying Change Signature on the extracted function gets you where you want to be. Why do Compose Method? We do it because large methods with lots of local variables are nearly impossible to reason about with confidence. Breaking a monster method up into a series of calls to smaller methods that we named with intention revealing names[2] let's us see the forest instead of the trees. Once things are broken down into small chunks we can reason about easily, we often spot the small error that caused the bug in the first place. It was "hiding in plain sight" inside that monster method. If you end up with very large argument lists because all these extracted chunks are communicating large amounts of state between them, then Replace Method with Method Object[3] may be a better alternative to Compose Method. With a Method Object, almost everything that was local state in the monster method becomes instance variables in the new object. This instance state doesn't need to be passed around between methods. We apply the same process as in Compose Method to break the monster method into smaller chunks. There's something nice that can happen when we use a Method Object instead of Compose Method on an existing object. The new Method Object can make all of it's extracted methods public, increasing the visible surface area that we can unit test from the outside. The Method Object is visible to us internally for testing purposes, but isn't exposed to clients as part of our public API anymore than the local variables in the monster method were exposed publicly. [1] <https://www.industriallogic.com/xp/refactoring/composeMethod.html> [2] <http://c2.com/cgi/wiki?IntentionRevealingNames> [3] <http://refactoring.com/catalog/replaceMethodWithMethodObject.html> -- "The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline> The Computer Graphics Museum <http://computergraphicsmuseum.org> The Terminals Wiki <http://terminals.classiccmp.org> Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com> |
woodbrian77@gmail.com: Apr 10 08:30PM -0700 On Sunday, April 10, 2016 at 4:15:41 PM UTC-5, Öö Tiib wrote: > Oh. AFAIK also woodbrian is devoted follower of :: before std. > He claims that it protects his middlewriter's code from some sort of > evil and stupid users. I use ::std to prevent surprises for users. We discussed this several times now. Brian Ebenezer Enterprises - In G-d we trust. http://webEbenezer.net |
Daniel <danielaparker@gmail.com>: Apr 10 09:22PM -0700 > I use ::std to prevent surprises for users. Interesting point, you might also want to consider using ::god, to avoid other surprises Daniel |
Juha Nieminen <nospam@thanks.invalid>: Apr 11 07:14AM >> auto p { ::std::make_pair( 0, 0 ) }; >> ? > There is nothing "wrong" with it. Wasn't it so that the = initialization requires for the type to have a callable copy constructor (even if the compiler never produces the call)? Thus the two types of initialization are not completely identical. --- news://freenews.netfront.net/ - complaints: news@netfront.net --- |
Victor Bazarov <v.bazarov@comcast.invalid>: Apr 11 08:28AM -0400 On 4/11/2016 3:14 AM, Juha Nieminen wrote: > Wasn't it so that the = initialization requires for the type to have a > callable copy constructor (even if the compiler never produces the call)? > Thus the two types of initialization are not completely identical. Yes. Does it make it "wrong"? Or is it so for 'std::pair'? V -- I do not respond to top-posted replies, please don't ask |
Juha Nieminen <nospam@thanks.invalid>: Apr 11 07:09AM > It remains painfully obvious that it is YOU who doesn't know when to use > std::list as you are only aware of one (non-ideal) use-case (range variant > of splice). You seem to be like a broken record. Did you have an actual argument to make? --- news://freenews.netfront.net/ - complaints: news@netfront.net --- |
Robbie Hatley <see.my.sig@for.my.address>: Apr 10 10:17PM -0700 On 4/8/2016 4:24 AM, Christian Gollwitzer wrote: > generate this code > const char * text[] = { "add", "subtract", "multiply", NULL }; > void * fun[] = { add, subtract, multiply }; Easy as pie. Save the following to file called "table-macro-test.cpp": #define TABLE(X,Y,Z) \ const char * text[] = { #X, #Y, #Z, NULL }; \ void * fun[] = { add, subtract, multiply }; TABLE ( add , subtract , multiply ) Then run that through the preprocessor: %cpp table-macro-test.cpp # 1 "table-macro-test.cpp" # 1 "<built-in>" # 1 "<command-line>" # 1 "table-macro-test.cpp" const char * text[] = { "add", "subtract", "multiply", NULL }; void * fun[] = { add, subtract, multiply }; Voila. -- Cheers, Robbie Hatley Midway City, CA, USA perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"' http://www.well.com/user/lonewolf/ https://www.facebook.com/robbie.hatley |
Christian Gollwitzer <auriocus@gmx.de>: Apr 11 08:28AM +0200 Am 11.04.16 um 07:17 schrieb Robbie Hatley: > const char * text[] = { #X, #Y, #Z, NULL }; \ > void * fun[] = { add, subtract, multiply }; > TABLE ( add , subtract , multiply ) Thanks for your answer. This works for exactly three arguments. I want to use the macro several times with a varying number of arguments. The rest of the macro is also larger and generates a function. Fortunately, I found a solution: This macro: https://github.com/swansontec/map-macro can perform a maplike operation over a list of up to 360 arguments. I needed to enhance it to pass an extra argument to the "lambda"-macro, which turned out to be easy: https://github.com/auriocus/AsynCA/blob/master/generic/map.h Best regards, Christian |
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |
No comments:
Post a Comment