Monday, April 11, 2016

Digest for comp.lang.c++@googlegroups.com - 16 updates in 4 topics

Robbie Hatley <see.my.sig@for.my.address>: Apr 10 08:59PM -0700

I spent most of today struggling to fix a bug in a function,
a regular-expression substitution function based on the GNU C
library's regex facilities. (Though, that library is pissing me off,
so I may end up going to the new C++ regex functionality instead.)
 
The point where the bug occurred looked something like this:
 
std::string rhregex::Substitute(std::string const & Text...)
{
...hundreds of lines of code...
cerr
<< "In rhregex::Substitute(), about to set Text2 equal to Text."
<< endl;
std::string Text2 = Text; // CATASTROPHIC PROGRAM CRASH!!!
 
Now, there's nothing wrong with that code itself. It just copies
the contents of one string (via a const ref function parameter)
to a second string. Legal, harmless, safe, impossible to crash.
 
AND YET, the program was crashing at that point consistently!
The error messages sometimes read:
 
segmentation fault, core dumped
 
And on other occasions read:
 
0 [main] renamefiles 6040 C:\cygwin64\rhe\bin\util\renamefiles.exe:
*** fatal error - Internal error: TP_NUM_C_BUFS too small: 50
Hangup
 
(And on one occasion the "error message" consisted of several hundred
pages of absolute gibberish: graphics characters, smudges, planet
symbols, file listings from my current directory, the contents of
a couple of C++ source files, all jumbled up. Talk about "undefined
behavior"!!! That was the most "undefined" result I've ever got from
running *any* computer program!)
 
So... what are "TP_NUM_C_BUFS"?
And why is it a problem that there's only 50 of them?
And why were they making my program crash?
 
Now, for information purposes, I *did* finally make the program
stop crashing. It was a process of elimination, commenting-out
larger and larger portions of the code, starting from the crash
point and working UP towards the top of the function. The CAUSE
turned out to be the very first line of the function:
 
regex_t RegEx; // ACTUAL CAUSE OF PROGRAM CRASH
 
To make the program stop crashing, I replaced that line of code
with the following:
 
regex_t * RegEx = NULL;
RegEx = reinterpret_cast<regex_t *>(malloc(sizeof(regex_t)));
assert(RegEx);
 
So obviously the regex_t type has the buggy undocumented "feature"
that you can't safely define it as an auto variable on the stack;
it *must* be malloc'ed on the heap instead. I wish this factoid
had made it into the GNU C-lib manual.
 
(And this explains all the other crashes, messages, and gibberish:
stack corruption.)
 
 
 
But the nagging question remains: What is "TP_NUM_C_BUFS"???
 
 
 
--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 11 06:47AM +0200

On 11.04.2016 05:59, Robbie Hatley wrote:
> << "In rhregex::Substitute(), about to set Text2 equal to Text."
> << endl;
> std::string Text2 = Text; // CATASTROPHIC PROGRAM CRASH!!!
 
Probably `Text` has been initialized with 0 or `nullptr`.
 
That arms an UB bomb.
 
Other causes include a buffer overflow somewhere (ditto).
 
 
 
> regex_t * RegEx = NULL;
> RegEx = reinterpret_cast<regex_t *>(malloc(sizeof(regex_t)));
> assert(RegEx);
 
Unless `RegEx` is a POD type this introduces more UB. And if it doesn't,
then its only effects are to (1) add a time consuming operation, (2)
hide the real problem, and (3) possibly introducing a memory leak or
other such issue.
 
The idea of searching systematically for the problem is good.
 
But most likely the problem is in the calling code, not in the function
itself.
 
 
 
> But the nagging question remains: What is "TP_NUM_C_BUFS"???
 
It's just the g++ way of saying, "got you... muwhahaha!"
 
 
Cheers & hth.,
 
- Alf
Daniel <danielaparker@gmail.com>: Apr 10 11:05PM -0700

On Sunday, April 10, 2016 at 11:59:54 PM UTC-4, Robbie Hatley wrote:
 
> But the nagging question remains: What is "TP_NUM_C_BUFS"???
 
It indicates stack corruption. It may be a cygwin issue (similar issues have
been reported by others.)
 
Daniel
Paavo Helde <myfirstname@osa.pri.ee>: Apr 11 12:58PM +0300

On 11.04.2016 6:59, Robbie Hatley wrote:
 
> 0 [main] renamefiles 6040 C:\cygwin64\rhe\bin\util\renamefiles.exe:
> *** fatal error - Internal error: TP_NUM_C_BUFS too small: 50
> Hangup
 
Looks like an Undefined Behavior bug expressed. Probably some memory
corruption somewhere.
 
> that you can't safely define it as an auto variable on the stack;
> it *must* be malloc'ed on the heap instead. I wish this factoid
> had made it into the GNU C-lib manual.
 
No, this is not the reason and you did not fix the issue. All you did
was to move the memory corruption to some other place, hiding the bug.
An hidden bug is much worse than a clear bug.
 
Suggesting to review your code to find out the actual bug. Most probably
you are passing invalid arguments to the library.
 
It is not excluded that the bug is in the library itself, the CygWin
ports are not very widely used or tested, so there is some chance (say
1%) that the bug is not in your code.
 
HTH
Paavo
Robbie Hatley <see.my.sig@for.my.address>: Apr 11 11:24AM -0700

On 4/11/2016 2:58 AM, Paavo Helde wrote:
 
 
> An hidden bug is much worse than a clear bug. Suggesting to review your code
> to find out the actual bug. Most probably you are passing invalid arguments
> to the library.
 
 
I want to say it can't be so, but that is a possibility, because the library
I'm using (Cygwin C Library) is not written by any one person or company,
but is a hodge-podge of code stolen from Red Hat, BSD, and FSF/GNU, along
with some written by the Cygwin folks themselves. And while Cygwin's
C Lib comes with a "Red Hat C Standard Library" html documentention file,
it doesn't cover their Regex stuff. GNU's C Library Manual covers it, but
it quite vague in many places (such as, doesn't tell you if a function
allocates memory for an input variable, or if you should allocate it yourself,
and if so, whether it should be stack, static, dynamic, or whatever).
And the Regex stuff in Cygwin's C Lib was actually taken from BSD, so
I suppose I should hunt down a BSD C Lib manual. Sigh. What a mess.
 
But I just learned that there *is* a better way (see below)....
 
> It is not excluded that the bug is in the library itself, the CygWin
> ports are not very widely used or tested, so there is some chance (say 1%)
> that the bug is not in your code.
 
Sounds like a good reason to avoid the C Lib in C++ code.
 
Especially since I just learned this morning (from reading Stroustrup's
"The C++ Programming Language 4th Ed.", which I just got my hands on;
the version I WAS using was the 2000 edition) that I can replace
the entire function I was creating -- a regex substitution function --
with the following, using the new C++ 2011 std lib:
 
// #include <regex.h> // No, DON'T include this piece of $#%^!
#include <regex> // Let's try THIS header instead.
regex_replace(.......); // Already written for me. Cool! :D
 
Replaces 343 lines of code with... well... 0. I can erase my entire
homebrew regex module, seeing as how C++ 2011 std lib already
has easy-to-use regex match, search, replace tools. Cool.
 
Well, it was any interesting exercise in tracing an elusive memory
corruption bug (which is clearly what it was/is). Perhaps I'll pursue
debugging my buggy C regex module later as an exercise. But for
any serious usage I think I'll use C++ regex instead.
 
 
I *still* wonder what the heck "TP_NUM_C_BUFS" is, though. :-)
 
 
--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley
Paavo Helde <myfirstname@osa.pri.ee>: Apr 11 09:45PM +0300

On 11.04.2016 21:24, Robbie Hatley wrote:
 
> Replaces 343 lines of code with... well... 0. I can erase my entire
> homebrew regex module, seeing as how C++ 2011 std lib already
> has easy-to-use regex match, search, replace tools. Cool.
 
Good to hear! Yes of course it is better to use "builtin" C++ regex
whenever possible. And I am still pretty sure the actual bug was in some
of those deleted 343 lines (and deleting them was a good way to get rid
of the bug!)
 
> debugging my buggy C regex module later as an exercise. But for
> any serious usage I think I'll use C++ regex instead.
 
> I *still* wonder what the heck "TP_NUM_C_BUFS" is, though. :-)
 
According to Google, this is a constant defined in some cygtls.h header
file (e.g.
https://sourceforge.net/u/earnie/msys2/msys2/ci/01e8fa7372b33b9766bb5dd3e059cfab695bbf90/tree/winsup/cygwin/cygtls.h).
"tls" means thread local storage, a known source of problems and
confusion on Windows. But I believe your issue had nothing to do with
the thread local storage, it's just that this code bothered to do some
consistency checks and noticed the memory corruption.
 
Cheers
Paavo
Robbie Hatley <see.my.sig@for.my.address>: Apr 11 12:18PM -0700

On 4/10/2016 9:47 PM, Alf P. Steinbach wrote:
 
> Probably `Text` has been initialized with 0 or `nullptr`.
> That arms an UB bomb.
 
At the point where the program always crashed, Text is a
std::string const & parameter to my "substitute" function,
and it refers to a std::string containing "randip.c".
 
Drastically simplified call chain:
 
int main ()
{
...
CallingFunc();
...
return 0;
}
 
int CallingFunc ()
{
...
std::string source = "randip.c";
std::string pat = "and";
std::string repl = "or";
std::string result = substitute(source, pat, repl, 'g');
...
return 0;
}
 
std::string substitute
(
std::string const & Text,
std::string const & Pat,
std::string const & Repl,
char Flag // 'g' for global
)
{
regex_t Regex; // line which appears to have corrupted stack
... hundreds of lines of code...
std::string Text2 = Text; // accesses previously corrupted stack???
... hundreds more lines of code...
return Text2;
}
 
> Other causes include a buffer overflow somewhere (ditto).
 
I don't make much use of fixed-length buffers in C++ code,
preferring std::string over char[], vector<MyClass> over MyClass[], etc.
So it's not something as simple as stick 73 objects in a 50-object array.
 
Or if it is, it's in library code. Perhaps that's what the error msg,
"TP_NUM_C_BUFS too small: 50" was all about. :::shrugs:::
 
> Unless `RegEx` is a POD type this introduces more UB. And if it doesn't,
> then its only effects are to (1) add a time consuming operation,
> (2) hide the real problem,
 
The "regex_t" type from the BSD and GNU C libs's "regex.h" headers
is poorly documented, so it's hard to know what's in it. The documentation
says "This struct contains several more fields than are documented here,
but you don't need to know about those, as only library internal functions
should access them". So GNU's attitude is, "You don't want to know; and if
you do, bugger off."
 
But I'm curious: why would dynamically allocating memory for an object
which later dynamically allocates still more memory for its own use
be a problem? Seems straightforward to me. Since this is a C library,
I doubt regex_t has "member functions"; the struct itself is likely POD,
but it definitely has associated functions in the library for dynamic memory
allocation.
 
> and (3) possibly introducing a memory leak or other such issue.
 
A function "regfree" is provided for freeing memory allocated by
objects of type regex_t (or more accurately, by their associated functions),
but the documentation says "regfree doesn't free the object itself",
implicitly implying that you malloced the regex_t object. And indeed,
when I did that, the bug seemed to go away. But creating and freeing
a regex_t object is a hassle:
 
// Make the object:
regex_t * RegEx = NULL;
RegEx = (regex_t *)malloc(sizeof(regex_t));
assert(RegEx);
... use the object....
// Free the object:
regfree(RegEx);
assert(RegEx);
free(RegEx);
 
> The idea of searching systematically for the problem is good.
> But most likely the problem is in the calling code, not in the
> function itself.
 
I suppose it's possible. I'll also look there, and see if I'm doing
anything funky. But I don't think it's as simple as sending wrong stuff
to substitute(), as those arguments are very straightforward.
 
> > But the nagging question remains: What is "TP_NUM_C_BUFS"???
 
> It's just the g++ way of saying, "got you... muwhahaha!"
 
:-)
 
Well, the Cygwin C regex module appears to be the work of Henry Spencer
of the University of Toronto, who contributed it to UC Berkeley,
who used it in BSD, from whence Cygwin borrowed and modified it,
stirring in stuff from Red Hat and FSF/GNU.
 
So maybe it was Henry Spencer, or Richard Stallman, or someone else,
who thought, "HEY, I know, I'll name the error code for when xxxxxxx
happens 'TP_NUM_C_BUFS too low'; that'll confuse the hell out of
our users! MUAH-HA-HA-HA-HA!!!"
 
Anyway, long-term fix is to go over to using header <regex> (C++ 2011)
instead of header <regex.h> (inherited from C). It already has a
regex_replace() function, making my module obsolete except as an
exercise in troubleshooting obscure memory corruption bugs. (I just
learned of the existence of the 2011 C++ std lib regex functionality
this very morning.)
 
 
--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley
scott@slp53.sl.home (Scott Lurndal): Apr 11 08:08PM

>and if so, whether it should be stack, static, dynamic, or whatever).
>And the Regex stuff in Cygwin's C Lib was actually taken from BSD, so
>I suppose I should hunt down a BSD C Lib manual. Sigh. What a mess.
 
http://pubs.opengroup.org/onlinepubs/9699919799/functions/regexec.html
http://pubs.opengroup.org/onlinepubs/9699919799/
legalize+jeeves@mail.xmission.com (Richard): Apr 11 08:39PM

[Please do not mail me a copy of your followup]
 
Robbie Hatley <see.my.sig@for.my.address> spake the secret code
> << endl;
> std::string Text2 = Text; // CATASTROPHIC PROGRAM CRASH!!!
> [...]
 
I'm going to take a step back here to add a viewpoint that hasn't yet
been expressed in this thread. I agree that it is unlikely that the
GNU C Library Regex facility requires that it's values be allocated on
the heap in order to function properly. That would be unusual for the
any Standard Library facility. They're designed intentionally so that
you "only pay for what you use".
 
TL;DR: Monster methods are a source of bugs. Refactor them into
smaller units that your brain can reason about with confidence.
 
If this code is easy to test in isolation, I should first write a
unit test for it that reproduces the problem. What you've written
suggests that this is a "monster method". Writing an isolated test
for it is likely to be difficult because it almost certainly doesn't
have "hundreds of lines of code" that execute in a linear fashion.
That means lots of branching and conditionally executed code. So
writing a test is likely to be difficult. If not, certainly write one
that reproduces the problem.
 
That "hundreds of lines of code" is really bugging me. Chances are
that there is another "hundreds of lines of code" after the part
you're showing us. Compose Method is your friend here.[1]
(See below for an alternative: Replace Method with Method Object.)
Most likely your "hundreds of lines of code" is introducing lots and
lots of local variables throughout. Almost certainly it was decided
that reusing local variables was preferable to each chunk of code using
only it's own variables.
 
Code like this makes it difficult to apply Compose Method because there
is lots of shared state (the variables) between various chunks of code.
They aren't coupled through the values of the variables because each
chunk overwrites the value with it's own local initialization, but
they are coupled through the names of the variables that are reused.
In order to break this up, you'll want to refactor each isolated use of
a reused variable into a use of a distinct variable. Unfortunately
there isn't any good tool for C++ that can do this automatically and
reliably, so you'll need to do it by hand.
 
Here's where you should lean on your version control system really
heavily. Each time you decouple a reused variable you'll want to
decouple a single reuse instance and then make a commit into your
version control system for it. Yes, this may result in 50+ commits
for these several hundred lines of code, but so what? Commits are
free. From the way you describe it, I assume this code has 0% unit
test coverage. Small, slow steps are what you need in order to bring
some sanity to this monster method. When you transform code in small
steps anyone can follow the version control history and see what was
done and how it was done. When a huge method like this is refactored
in one gigantic step, then looking at the diffs becomes pointless.
The diff viewer will say "several hundred lines of code were removed
and replaced with several hundred other lines of code", making for an
unintelligible change. When looking at the diff, the average person
will be hard pressed to determine if the change was for the better or
not in terms of bugs and preserved functionality.
 
Once we've reduced the coupling of various chunks of code, we can
start to Extract Method on the various chunks in order to achieve
Compose Method. Here, there are good tools to help: CLion for
Linux/Mac and VS2015/ReSharper for C++ for Windows. They all have
pretty good automated Extract Method functionality that handles the
dataflow needed to properly deduce input arguments, return types and
their proper signatures. (Sorry, VisualAssist fairs pretty poorly
here and I can't recomment it as it causes too much manual fixup after
the fact.) If they don't deduce the types to best practices (passing
std::string input args by value instead of const reference), then
applying Change Signature on the extracted function gets you where you
want to be.
 
Why do Compose Method? We do it because large methods with lots of
local variables are nearly impossible to reason about with confidence.
Breaking a monster method up into a series of calls to smaller methods
that we named with intention revealing names[2] let's us see the
forest instead of the trees.
 
Once things are broken down into small chunks we can reason about
easily, we often spot the small error that caused the bug in the first
place. It was "hiding in plain sight" inside that monster method.
 
If you end up with very large argument lists because all these
extracted chunks are communicating large amounts of state between
them, then Replace Method with Method Object[3] may be a better
alternative to Compose Method. With a Method Object, almost everything
that was local state in the monster method becomes instance variables in
the new object. This instance state doesn't need to be passed around
between methods. We apply the same process as in Compose Method to
break the monster method into smaller chunks.
 
There's something nice that can happen when we use a Method Object
instead of Compose Method on an existing object. The new Method
Object can make all of it's extracted methods public, increasing the
visible surface area that we can unit test from the outside. The
Method Object is visible to us internally for testing purposes, but
isn't exposed to clients as part of our public API anymore than the
local variables in the monster method were exposed publicly.
 
[1] <https://www.industriallogic.com/xp/refactoring/composeMethod.html>
[2] <http://c2.com/cgi/wiki?IntentionRevealingNames>
[3] <http://refactoring.com/catalog/replaceMethodWithMethodObject.html>
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
The Terminals Wiki <http://terminals.classiccmp.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>
woodbrian77@gmail.com: Apr 10 08:30PM -0700

On Sunday, April 10, 2016 at 4:15:41 PM UTC-5, Öö Tiib wrote:
 
> Oh. AFAIK also woodbrian is devoted follower of :: before std.
> He claims that it protects his middlewriter's code from some sort of
> evil and stupid users.
 
I use ::std to prevent surprises for users. We discussed this
several times now.
 
Brian
Ebenezer Enterprises - In G-d we trust.
http://webEbenezer.net
Daniel <danielaparker@gmail.com>: Apr 10 09:22PM -0700


> I use ::std to prevent surprises for users.
 
Interesting point, you might also want to consider using ::god, to avoid other surprises
 
Daniel
Juha Nieminen <nospam@thanks.invalid>: Apr 11 07:14AM


>> auto p { ::std::make_pair( 0, 0 ) };
 
>> ?
 
> There is nothing "wrong" with it.
 
Wasn't it so that the = initialization requires for the type to have a
callable copy constructor (even if the compiler never produces the call)?
 
Thus the two types of initialization are not completely identical.
 
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
Victor Bazarov <v.bazarov@comcast.invalid>: Apr 11 08:28AM -0400

On 4/11/2016 3:14 AM, Juha Nieminen wrote:
 
> Wasn't it so that the = initialization requires for the type to have a
> callable copy constructor (even if the compiler never produces the call)?
 
> Thus the two types of initialization are not completely identical.
 
Yes. Does it make it "wrong"? Or is it so for 'std::pair'?
 
V
--
I do not respond to top-posted replies, please don't ask
Juha Nieminen <nospam@thanks.invalid>: Apr 11 07:09AM

> It remains painfully obvious that it is YOU who doesn't know when to use
> std::list as you are only aware of one (non-ideal) use-case (range variant
> of splice).
 
You seem to be like a broken record.
 
Did you have an actual argument to make?
 
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
Robbie Hatley <see.my.sig@for.my.address>: Apr 10 10:17PM -0700

On 4/8/2016 4:24 AM, Christian Gollwitzer wrote:
 
 
> generate this code
 
> const char * text[] = { "add", "subtract", "multiply", NULL };
> void * fun[] = { add, subtract, multiply };
 
Easy as pie.
 
Save the following to file called "table-macro-test.cpp":
 
#define TABLE(X,Y,Z) \
const char * text[] = { #X, #Y, #Z, NULL }; \
void * fun[] = { add, subtract, multiply };
 
TABLE ( add , subtract , multiply )
 
Then run that through the preprocessor:
 
 
%cpp table-macro-test.cpp
# 1 "table-macro-test.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "table-macro-test.cpp"
const char * text[] = { "add", "subtract", "multiply", NULL };
void * fun[] = { add, subtract, multiply };
 
 
Voila.
 
 
--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley
Christian Gollwitzer <auriocus@gmx.de>: Apr 11 08:28AM +0200

Am 11.04.16 um 07:17 schrieb Robbie Hatley:
> const char * text[] = { #X, #Y, #Z, NULL }; \
> void * fun[] = { add, subtract, multiply };
 
> TABLE ( add , subtract , multiply )
 
Thanks for your answer. This works for exactly three arguments. I want
to use the macro several times with a varying number of arguments. The
rest of the macro is also larger and generates a function. Fortunately,
I found a solution:
 
This macro:
 
https://github.com/swansontec/map-macro
 
can perform a maplike operation over a list of up to 360 arguments.
 
I needed to enhance it to pass an extra argument to the "lambda"-macro,
which turned out to be easy:
 
https://github.com/auriocus/AsynCA/blob/master/generic/map.h
 
Best regards,
 
Christian
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.

No comments: