Tuesday, April 12, 2016

Digest for comp.lang.c++@googlegroups.com - 10 updates in 2 topics

Paavo Helde <myfirstname@osa.pri.ee>: Apr 11 09:45PM +0300

On 11.04.2016 21:24, Robbie Hatley wrote:
 
> Replaces 343 lines of code with... well... 0. I can erase my entire
> homebrew regex module, seeing as how C++ 2011 std lib already
> has easy-to-use regex match, search, replace tools. Cool.
 
Good to hear! Yes of course it is better to use "builtin" C++ regex
whenever possible. And I am still pretty sure the actual bug was in some
of those deleted 343 lines (and deleting them was a good way to get rid
of the bug!)
 
> debugging my buggy C regex module later as an exercise. But for
> any serious usage I think I'll use C++ regex instead.
 
> I *still* wonder what the heck "TP_NUM_C_BUFS" is, though. :-)
 
According to Google, this is a constant defined in some cygtls.h header
file (e.g.
https://sourceforge.net/u/earnie/msys2/msys2/ci/01e8fa7372b33b9766bb5dd3e059cfab695bbf90/tree/winsup/cygwin/cygtls.h).
"tls" means thread local storage, a known source of problems and
confusion on Windows. But I believe your issue had nothing to do with
the thread local storage, it's just that this code bothered to do some
consistency checks and noticed the memory corruption.
 
Cheers
Paavo
legalize+jeeves@mail.xmission.com (Richard): Apr 11 08:39PM

[Please do not mail me a copy of your followup]
 
Robbie Hatley <see.my.sig@for.my.address> spake the secret code
> << endl;
> std::string Text2 = Text; // CATASTROPHIC PROGRAM CRASH!!!
> [...]
 
I'm going to take a step back here to add a viewpoint that hasn't yet
been expressed in this thread. I agree that it is unlikely that the
GNU C Library Regex facility requires that it's values be allocated on
the heap in order to function properly. That would be unusual for the
any Standard Library facility. They're designed intentionally so that
you "only pay for what you use".
 
TL;DR: Monster methods are a source of bugs. Refactor them into
smaller units that your brain can reason about with confidence.
 
If this code is easy to test in isolation, I should first write a
unit test for it that reproduces the problem. What you've written
suggests that this is a "monster method". Writing an isolated test
for it is likely to be difficult because it almost certainly doesn't
have "hundreds of lines of code" that execute in a linear fashion.
That means lots of branching and conditionally executed code. So
writing a test is likely to be difficult. If not, certainly write one
that reproduces the problem.
 
That "hundreds of lines of code" is really bugging me. Chances are
that there is another "hundreds of lines of code" after the part
you're showing us. Compose Method is your friend here.[1]
(See below for an alternative: Replace Method with Method Object.)
Most likely your "hundreds of lines of code" is introducing lots and
lots of local variables throughout. Almost certainly it was decided
that reusing local variables was preferable to each chunk of code using
only it's own variables.
 
Code like this makes it difficult to apply Compose Method because there
is lots of shared state (the variables) between various chunks of code.
They aren't coupled through the values of the variables because each
chunk overwrites the value with it's own local initialization, but
they are coupled through the names of the variables that are reused.
In order to break this up, you'll want to refactor each isolated use of
a reused variable into a use of a distinct variable. Unfortunately
there isn't any good tool for C++ that can do this automatically and
reliably, so you'll need to do it by hand.
 
Here's where you should lean on your version control system really
heavily. Each time you decouple a reused variable you'll want to
decouple a single reuse instance and then make a commit into your
version control system for it. Yes, this may result in 50+ commits
for these several hundred lines of code, but so what? Commits are
free. From the way you describe it, I assume this code has 0% unit
test coverage. Small, slow steps are what you need in order to bring
some sanity to this monster method. When you transform code in small
steps anyone can follow the version control history and see what was
done and how it was done. When a huge method like this is refactored
in one gigantic step, then looking at the diffs becomes pointless.
The diff viewer will say "several hundred lines of code were removed
and replaced with several hundred other lines of code", making for an
unintelligible change. When looking at the diff, the average person
will be hard pressed to determine if the change was for the better or
not in terms of bugs and preserved functionality.
 
Once we've reduced the coupling of various chunks of code, we can
start to Extract Method on the various chunks in order to achieve
Compose Method. Here, there are good tools to help: CLion for
Linux/Mac and VS2015/ReSharper for C++ for Windows. They all have
pretty good automated Extract Method functionality that handles the
dataflow needed to properly deduce input arguments, return types and
their proper signatures. (Sorry, VisualAssist fairs pretty poorly
here and I can't recomment it as it causes too much manual fixup after
the fact.) If they don't deduce the types to best practices (passing
std::string input args by value instead of const reference), then
applying Change Signature on the extracted function gets you where you
want to be.
 
Why do Compose Method? We do it because large methods with lots of
local variables are nearly impossible to reason about with confidence.
Breaking a monster method up into a series of calls to smaller methods
that we named with intention revealing names[2] let's us see the
forest instead of the trees.
 
Once things are broken down into small chunks we can reason about
easily, we often spot the small error that caused the bug in the first
place. It was "hiding in plain sight" inside that monster method.
 
If you end up with very large argument lists because all these
extracted chunks are communicating large amounts of state between
them, then Replace Method with Method Object[3] may be a better
alternative to Compose Method. With a Method Object, almost everything
that was local state in the monster method becomes instance variables in
the new object. This instance state doesn't need to be passed around
between methods. We apply the same process as in Compose Method to
break the monster method into smaller chunks.
 
There's something nice that can happen when we use a Method Object
instead of Compose Method on an existing object. The new Method
Object can make all of it's extracted methods public, increasing the
visible surface area that we can unit test from the outside. The
Method Object is visible to us internally for testing purposes, but
isn't exposed to clients as part of our public API anymore than the
local variables in the monster method were exposed publicly.
 
[1] <https://www.industriallogic.com/xp/refactoring/composeMethod.html>
[2] <http://c2.com/cgi/wiki?IntentionRevealingNames>
[3] <http://refactoring.com/catalog/replaceMethodWithMethodObject.html>
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
The Terminals Wiki <http://terminals.classiccmp.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>
Robbie Hatley <see.my.sig@for.my.address>: Apr 12 09:16AM -0700

For comparison purposes, here are the C++ and C versions of my
"Substitute" function, as stand-alone C++ programs, dependent
only on std libraries (and the GNU C regex library in the case
of the C-style version).
 
 
 
First, the C++ version.
Total code lines: 38.
Total time spent writing & debugging: 60 minutes.
 
#include <iostream>
#include <regex>
using std::cout;
using std::cerr;
using std::endl;
std::string Substitute
(
std::string const & Pattern,
std::string const & Replacement,
std::string const & Text,
std::string const & Flags // If Flags contains 'g', do global replace;
) // otherwise, replace first occurrence only.
{
std::regex_constants::match_flag_type RegexFlags {};
if (std::string::npos == Flags.find_first_of("g"))
{
RegexFlags = std::regex_constants::format_first_only;
}
std::regex RegEx {Pattern};
std::string Result = std::regex_replace(Text, RegEx, Replacement, RegexFlags);
return Result;
} // end function rhregex::Substitute()
int main (int Beren, char ** Luthien)
{
std::string Pattern {Luthien[1]};
std::string Replacement {Luthien[2]};
std::string Text {Luthien[3]};
std::string Flags {Luthien[4]};
if (5 != Beren)
{
cerr
<< "Error in regex-c-test.cpp: this program takes 4 arguments," << endl
<< "but you only typed " << Beren << ". Aborting program." << endl;
exit(666);
}
std::string Result = Substitute(Pattern, Replacement, Text, Flags);
std::cout << Result << std::endl;
}
 
 
 
Next, the C-style version.
Total code lines: 322.
Total time spent writing & debugging: many hours
 
#include <iostream>
#include <iomanip>
#include <string>
#include <sstream>
#include <regex.h>
#undef NDEBUG
#include <assert.h>
#include <errno.h>
#undef BLAT_ENABLE
#ifdef BLAT_ENABLE
#define BLAT(X) std::cerr << X << std::endl;
#else
#define BLAT(X)

No comments: