- In the end, reason will come - 4 Updates
- An example: code to list exports of a Windows DLL. - 4 Updates
- [niubbo] convert a string representing a valid date to some "binary" date - 3 Updates
- [niubbo] convert a string representing a valid date to some "binary" date - 1 Update
- Geodetic Development Tool - 1 Update
"Öö Tiib" <ootiib@hot.ee>: Aug 01 07:53PM -0700

On Friday, 2 August 2019 00:26:16 UTC+3, David Brown wrote:
> > tricky to indicate to compiler that it may do such optimizations
> > in conforming mode.
> Such optimisations won't be tricky - they will be completely impossible.

Why? Making the compiler warn that "(x + z) > (y + z)" is maybe sub-optimal code, and having some attribute or pragma to suppress that warning when wrap *was* the reason to write it like that, is not impossible, just tricky.

The other likely outcome is that iterator- or range-based loops will simply be a tiny bit more efficient than loops with an int i as index. That can even be good. In Rust it is so, and all are happy.

> > off wrapping signed integers will be different in any way.
> It will be a completely different thing. And I think that is so obvious
> it doesn't need more explanation.

Ok. :/

> wrapping behaviour, it is no longer a mistake. How is the compiler
> supposed to guess what was intentional behaviour and what was likely to
> be a programmer error?

I don't know of anyone who disagrees that signed integer overflow is likely a bug and worth warning about. If someone deliberately writes wrapping code (for example, to test retroactively whether overflow happened), then they can suppress that warning locally with some pragma or attribute.

> >> correct.
> > That option I would also love like I said above.
> That cannot be an option if wrapping is the defined behaviour.

Note that both g++ and clang have -fwrapv and -ftrapv; it is just that neither of those works reliably. Requiring at least a working -fwrapv would be a bit better than nothing. I could more cheaply (or at least more readably) write my own throwing or saturating integers on top of wrapping integers. I like trapping best, but I understand that it is the most expensive option on the majority of platforms.

> enabling or disabling particular features, such as exceptions or RTTI
> (features which are typically only disabled for niche uses, such as
> resource limited embedded systems).

Yes, you have a good point here that switching between different well-defined behaviours is evil. It is maybe a niche ... but it is a niche that we *own* right now. I have spent about a third of my career (plus sometimes as a hobby) participating in such projects. There are literally tons of electronics made all around, and languages other than C and C++ are only sometimes experimentally tried there. Also, the optimizations often matter mostly on such limited systems anyway.

> that there is /some/ way to get it. If unsigned arithmetic were not
> defined with wrapping in C and C++, there would have to be another way
> to do it for these occasional uses.

Yes, some of the code that avoids that undefined behaviour with signed integers uses unsigned right now (and assumes two's complement). Well-defined behaviour cannot make wrong code give correct answers. The wrong answers will just come more consistently and portably with wrap, so I can trust unit tests for my embedded system's math, run on a PC, a bit more.

> It was not ordered by importance (or what I feel is most important).
> Correctness always trumps efficiency, and aids to correctness are
> therefore high on my list of important features.

I have the same views, especially about arithmetic being fast. For massive amounts of algebra it is better to use libraries that utilize the GPU or the like anyway. I still do not understand how undefined behaviour is supposed to be safer and more reliable than defined behaviour ... but perhaps it is just me. |
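The approach described above — doing the arithmetic in unsigned, where wrap is defined, and testing retroactively whether the signed operation would have overflowed — can be sketched as follows. This is a minimal illustration, not code from the thread; the helper name checked_add is made up, and converting an out-of-range unsigned value back to int is only guaranteed to be modular since C++20 (implementation-defined before, though gcc and clang have long behaved this way):

```cpp
#include <cassert>
#include <climits>

// Adds two ints via unsigned arithmetic (defined to wrap modulo 2^N, never
// UB), then checks retroactively whether the signed addition would have
// overflowed. Stores the sum in *out and returns true only when it did not.
bool checked_add(int a, int b, int* out)
{
    const unsigned ua = static_cast<unsigned>(a);
    const unsigned ub = static_cast<unsigned>(b);
    const unsigned usum = ua + ub;           // wraps, well defined
    const int sum = static_cast<int>(usum);  // modular conversion (C++20)

    // Overflow happened iff both operands have the same sign and the
    // result's sign differs from theirs.
    const bool overflow = ((a >= 0) == (b >= 0)) && ((sum >= 0) != (a >= 0));
    if (!overflow) { *out = sum; }
    return !overflow;
}
```

Note that the check runs *after* the addition — exactly the "retroactively test if overflow did happen" pattern, which is only legal because the wrap occurred in unsigned arithmetic.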
Christian Gollwitzer <auriocus@gmx.de>: Aug 02 07:41AM +0200

On 01.08.19 at 23:25, David Brown wrote:
> It was not ordered by importance (or what I feel is most important).
> Correctness always trumps efficiency, and aids to correctness are
> therefore high on my list of important features.

Then the right thing to do would be mathematical signed integers, which cannot overflow at all (like those in Python, short of memory exhaustion), for the type "int"; wrapping integers for int8_t, uint8_t etc.; and no general unsigned int, unless it throws an exception on negative numbers. The only reason no one wants to suggest this is performance.

Christian |
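For the fixed-width unsigned types mentioned here, wrapping is already the defined behaviour in C and C++ — a tiny illustration (the helper name wrapping_inc is made up for this example):

```cpp
#include <cassert>
#include <cstdint>

// Unsigned arithmetic is defined to wrap modulo 2^N, so uint8_t behaves
// exactly as described: 255 + 1 yields 0, with no undefined behaviour.
std::uint8_t wrapping_inc(std::uint8_t x)
{
    // x + 1 is computed in int due to integer promotion; the wrap happens
    // on the conversion back to uint8_t, which is well defined.
    return static_cast<std::uint8_t>(x + 1);
}
```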
Bonita Montero <Bonita.Montero@gmail.com>: Aug 02 08:10AM +0200

> Specifying wrapping behavior for signed integers could be a *big*
> problem. In particular, this:
> int n = INT_MAX + 1;

That's not the least problem, because no one writes such code. |
David Brown <david.brown@hesbynett.no>: Aug 02 08:46PM +0200

On 02/08/2019 04:53, Öö Tiib wrote:
> Why? Making compiler to warn that "(x + z) > (y + z)" is maybe sub-optimal
> code and to have some attribute or pragma for to suppress it when wrap
> *was* reason to write it like that is not impossible just tricky.

It is impossible, because the code would no longer be correct (unless the compiler has other information about the ranges). If one of (x + z) or (y + z) wraps and the other does not, optimising to "x > y" would not be valid. Try this with gcc:

bool foo(int x, int y, int z) {
    return (x + z) > (y + z);
}

#pragma GCC optimize "-fwrapv"

bool foo2(int x, int y, int z) {
    return (x + z) > (y + z);
}

(from the usual <https://godbolt.org>):

foo:
        cmp     edi, esi
        setg    al
        ret
foo2:
        add     edi, edx
        add     edx, esi
        cmp     edi, edx
        setg    al
        ret

When signed integer overflow has two's complement wrapping, you can't simplify code using as many normal mathematical integer identities. You can still do some re-arrangements and simplifications - more than if overflow is defined as trapping - but you lose some cases, especially those involving relations and inequalities.

> Other likely outcome is that iterator or range based loops will simply
> be tiny bit more efficient than that int i as index.
> That can be even good. In Rust it is so and all are happy.

When the compiler has more information about the possible ranges of the numbers involved, it can do more optimisation. That is the case for iterators. But it is not a good thing that some types of code become less efficient - the fact that other types of code might not be affected does not suddenly make it good. A change that makes current, correct, working code slower is never a good thing by itself. It is only worth having if there are significant advantages. Turning broken code with arbitrary bad behaviour into broken code with predictable bad behaviour is not particularly useful.

Let people who think that wrapping integers are somehow good or safe use other languages. There is no need to weaken C++ with the same mistake.

> deliberately wrapping (for example retroactively to test if overflow
> did happen) then they can suppress that warning locally with some
> pragma or attribute.

So you think this feature - wrapping overflows - is so useful and important that it should be added to the language and forced upon all compilers, and yet it also is so unlikely to be correct code that compilers should warn about it whenever possible and require specific settings to disable the warning? Isn't that a little inconsistent?

I'd be much happier to see some standardised pragmas like:

#pragma STDC_OVERFLOW_WRAP
#pragma STDC_OVERFLOW_TRAP
#pragma STDC_OVERFLOW_UNDEFINED

(or whatever variant is preferred) where undefined behaviour is the standard, but people can choose specific defined behaviour if they want, using a standardised method. Basically, allow #pragma GCC optimize "-fwrapv" in a common form. Surely that would be enough for those that want wrapping behaviour, without bothering anyone else?

>> That cannot be an option if wrapping is the defined behaviour.
> Note that both g++ and clang have -fwrapv and -ftrapv just neither
> of those work reliably.

"-fwrapv" does exactly what it says on the tin, as far as I know. Do you know of any problems with it, or any way in which it does not work reliably? Neither gcc nor clang is bug-free, of course, and given that this is an option that is used rarely, it will not receive as much heavy testing as other aspects of the tools. But the intention is that this is a working and maintained option that gives you specific new semantics in the compiler.

"-ftrapv" has always been a bit limited, and somewhat poorly defined. It is not clear how much rearrangement is done before the trapping operations are used, and support varies from target to target. AFAIUI, you are recommended to use -fsanitize=signed-integer-overflow instead.

> Requiring at least working -fwrapv would be bit better
> than nothing.

Again, what do you think does not work with -fwrapv?

> I could bit cheaper (or at least bit more readably) write
> my own throwing or saturating integers using wrapping integers.

For gcc and clang, you are better off using the overflow arithmetic builtin functions.

> I like trapping best but I understand that it is most expensive on
> majority of platforms.

It is expensive on /all/ platforms, because it disables a good many re-arrangements and simplifications of expressions. Beyond that, it usually boils down to adding "trap if overflow" or "branch if overflow" instructions after arithmetic operations - and the cost of that does vary between targets. But it is probably cheaper than saturating arithmetic in many cases, which also disables many re-arrangements and requires extra code for targets that don't have saturating arithmetic instructions.

> around and other languages but C and C++ are only sometimes
> experimentally tried there. Also the optimizations often matter
> only on such limited systems somewhat.

My career has been dominated by programming on platforms which are small enough that you typically disable exceptions and RTTI when using C++. (Things might change in the future with the newer ideas for cheaper C++ exceptions.) And yes, it is a very important niche, especially for C.

>> to do it for these occasional uses.
> Yes, some of the code to avoid that undefined behavior with signed
> integers uses unsigned right now (and assumes twos complement).

Indeed - and that is often fine, though perhaps of limited portability. (Making two's complement representation a requirement will fix this.)

> correct answers. The wrong answers will just come more consistently
> and portably with wrap and so I can trust unit tests for my
> math of embedded system ran on PC bit more.

If your unit tests rely on wrapping for overflow, then those unit tests are broken. "More consistent wrong answers" is /not/ a phrase you want to hear regarding test code! You want your embedded code to run quickly and efficiently - but there is no reason not to have -fsanitize=signed-integer-overflow for your PC-based unit tests and simulations.

> utilize GPU or the like anyway. I still do not understand how
> undefined behavior is supposed to be more safe and reliable than
> defined behavior ... but perhaps it is just me.

Undefined behaviour is something that your tools know is wrong. That means that there is at least a chance that the tools can spot mistakes. They can't do it all the time, and sometimes there are significant costs in run-time tests to find mistakes, but it is possible. Look at the sanitize options at <https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html> (or the clang page if you prefer) to see some of the mistakes that can be caught at compile time. They can only be caught because they are mistakes - they have no defined behaviour. If something is given defined behaviour by the language, then the compiler must assume that when it occurs, it is intentional. At best, any compile-time or run-time checking for this must be an optional feature with a high risk of false positives.

Additionally, adding this new semantics to the language would likely confuse people and make them think it was always the case. There are far too many programmers already who think signed arithmetic wraps in C and C++ - it would be worse if they see it documented for some standards and not others.

If there were enough benefit from the additional behaviour, that would be fair enough. But there isn't any benefit of significance - correct code remains correct after this change, and broken code remains broken. |
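The overflow builtins recommended above (available in both gcc and clang) report whether the mathematically correct result fitted, without ever executing undefined behaviour. As a sketch, saturating addition can be built directly on top of them — the helper name sat_add is made up for this example, and the code requires gcc or clang:

```cpp
#include <cassert>
#include <climits>

// Saturating signed addition built on __builtin_add_overflow, which
// computes the sum with wrapping semantics and returns true when the
// mathematically correct result did not fit in an int.
int sat_add(int a, int b)
{
    int result;
    if (!__builtin_add_overflow(a, b, &result)) {
        return result;                      // no overflow: exact sum
    }
    // Overflow implies a and b have the same sign, so the sign of a
    // tells us which way to clamp.
    return (a > 0) ? INT_MAX : INT_MIN;
}
```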
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Aug 02 03:36AM +0200

This just-now-it-compiled-and-appeared-to-work code is for whoever needs or desires a tool to list DLL exports (e.g. because of the large amount of irrelevant output from MS `dumpbin`, or because one doesn't want to install Visual Studio but instead just use MinGW g++), or for those interested in the internal structure of a Windows PE format executable. For the latter, if it's still available on the web, one could do worse than reading Matt Pietrek's two- or three-part article series.

I once posted an exports listing program like this on Stack Overflow, back then only for 32-bit DLLs, and as I recall, by loading the DLL and using the pointers in the image directly as pointers. This code instead reads the file.

As I recall from the SO posting, the function names are stored internally with UTF-8 encoding, but the documentation says ASCII; yet, on the third hand, the documentation of encodings used in Windows is known to be wildly inaccurate & misleading. I've yet to test that. But apparently the documentation of `GetProcAddress` (the Windows way of getting a pointer to a specified function in a loaded DLL) now says ASCII, instead of previously Windows ANSI.

The $ things in the following code are macros from the specified header-only library on GitHub. E.g. `$use_std` expands to a `using` declaration that prepends `std::` to each specified name. If use of the library is undesired then just manually expand the macro invocations. The "winapi-header-wrappers" micro-library is not on GitHub or anywhere, but they're just simple wrappers, mostly just ensuring that `<windows.h>` is included, because MS headers are not self-sufficient.

------------------------------------------------------------------------
#include <cppx-core/all.hpp>        // <url: https://github.com/alf-p-steinbach/cppx-core>
#include <winapi-header-wrappers/windows-h.hpp>     // Just a <windows.h> wrapper with UNICODE defined.
#include <winapi-header-wrappers/shellapi-h.hpp>    // CommandLineToArgvW

namespace win_util {
    $use_std( exchange );
    $use_cppx( Wide_c_str, Mutable_wide_c_str, hopefully, P_ );

    class Command_line_args
    {
        P_<Mutable_wide_c_str>  m_parts;
        int                     m_n_parts;

        Command_line_args( const Command_line_args& ) = delete;
        auto operator=( const Command_line_args& ) -> Command_line_args& = delete;

    public:
        auto count() const -> int { return m_n_parts - 1; }
        auto operator[]( const int i ) const -> Wide_c_str { return m_parts[i + 1]; }
        auto invocation() const -> Wide_c_str { return m_parts[0]; }

        Command_line_args():
            m_parts( CommandLineToArgvW( GetCommandLine(), &m_n_parts ) )
        {
            hopefully( m_parts != nullptr ) or $fail( "CommandLineToArgvW failed" );
        }

        Command_line_args( Command_line_args&& other ):
            m_parts( exchange( other.m_parts, nullptr ) ),
            m_n_parts( exchange( other.m_n_parts, 0 ) )
        {}

        ~Command_line_args()
        {
            if( m_parts != nullptr ) { LocalFree( m_parts ); }
        }
    };
}  // namespace win_util

namespace app {
    $use_std( cout, clog, endl, invoke, runtime_error, string, vector );
    $use_cppx(
        hopefully, fail_, Is_zero, Byte, fs_util::C_file, Size, Index, C_str,
        fs_util::read, fs_util::read_, fs_util::read_sequence, fs_util::read_sequence_,
        fs_util::peek_, is_in, P_, to_hex, up_to
        );
    namespace fs = std::filesystem;
    using namespace cppx::basic_string_building;    // operator<<, operator""s

    // A class to serve simple failure messages to the user, via exceptions. These
    // exceptions are thrown without origin info, and are presented as just strings.
    // Don't do this in any commercial code.
    class Ui_exception:
        public runtime_error
    {
        using runtime_error::runtime_error;
    };

    using Uix = Ui_exception;

    struct Pe32_types
    {
        using Optional_header = IMAGE_OPTIONAL_HEADER32;
        static constexpr int address_width = 32;
    };

    struct Pe64_types
    {
        using Optional_header = IMAGE_OPTIONAL_HEADER64;
        static constexpr int address_width = 64;
    };

    template< class Type >
    auto from_bytes_( const P_<const Byte> p_first )
        -> Type
    {
        Type result;
        memcpy( &result, p_first, sizeof( Type ) );
        return result;
    }

    template< class Type >
    auto sequence_from_bytes_( const P_<const Byte> p_first, const Size n )
        -> vector<Type>
    {
        vector<Type> result;
        if( n <= 0 ) { return result; }
        result.reserve( n );
        for( const Index i: up_to( n ) ) {
            result.push_back( from_bytes_<Type>( p_first + i*sizeof( Type ) ) );
        }
        return result;
    }

    // When this function is called the file position is at start of the optional header.
    template< class Pe_types >
    void list_exports(
        const string&               u8_path,
        const C_file&               f,
        const IMAGE_FILE_HEADER&    pe_header
        )
    {
        cout << Pe_types::address_width << "-bit DLL." << endl;

        using Optional_header = typename Pe_types::Optional_header;
        const auto pe_header_opt = read_<Optional_header>( f );
        hopefully( IMAGE_DIRECTORY_ENTRY_EXPORT < pe_header_opt.NumberOfRvaAndSizes )
            or fail_<Uix>( ""s << "No exports found in '" << u8_path << "'." );

        const auto section_headers = invoke( [&]() -> vector<IMAGE_SECTION_HEADER>
        {
            vector<IMAGE_SECTION_HEADER> headers;
            for( int _: up_to( pe_header.NumberOfSections ) ) {
                (void) _;
                headers.push_back( read_<IMAGE_SECTION_HEADER>( f ) );
            }
            return headers;
        } );

        const auto& dir_info = pe_header_opt.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
        hopefully( dir_info.Size >= sizeof( IMAGE_EXPORT_DIRECTORY ) )
            or fail_<Uix>(
                "Ungood file: claimed size of export dir header is too small."
                );

        const IMAGE_SECTION_HEADER& section = invoke( [&]()
        {
            const auto dir_addr = dir_info.VirtualAddress;
            const auto beyond_dir_addr = dir_addr + dir_info.Size;
            for( const auto& s: section_headers ) {
                const auto s_addr = s.VirtualAddress;
                const auto beyond_s_addr = s_addr + s.SizeOfRawData;
                if( s_addr <= dir_addr and beyond_dir_addr <= beyond_s_addr ) {
                    return s;
                }
            }
            fail_<Uix>( "Ungood file: no section (fully) contains the export table." );
        } );

        hopefully( section.SizeOfRawData > 0 )
            or fail_<Uix>( "Ungood file: section with export table, is of length zero." );

        const auto addr_to_pos = section.PointerToRawData - section.VirtualAddress;
        const auto dir_position = dir_info.VirtualAddress + addr_to_pos;
        fseek( f, dir_position, SEEK_SET ) >> Is_zero()
            or fail_<Uix>( "Ungood file: a seek to the exports table section failed." );
        const auto dir = read_<IMAGE_EXPORT_DIRECTORY>( f );

        if( dir.NumberOfFunctions == 0 ) {
            cout << "No functions are exported";
        } else if( dir.NumberOfFunctions == 1 ) {
            cout << "1 function is exported, at ordinal 0";
        } else if( dir.NumberOfFunctions > 1 ) {
            cout << dir.NumberOfFunctions << " functions are exported"
                << ", at ordinals 0 ... " << dir.NumberOfFunctions - 1;
        }
        cout << "." << endl;
        if( dir.NumberOfFunctions == 0 ) { return; }

        fseek( f, dir.AddressOfNames + addr_to_pos, SEEK_SET ) >> Is_zero()
            or fail_<Uix>( "Ungood file: a seek to the name addresses table failed." );
        const vector<DWORD> name_positions = read_sequence_<DWORD>( f, dir.NumberOfNames );

        vector<string> names;
        names.reserve( name_positions.size() );
        for( const DWORD name_addr: name_positions ) {
            string name;
            int ch;
            fseek( f, name_addr + addr_to_pos, SEEK_SET ) >> Is_zero()
                or fail_<Uix>( "Ungood file: a seek to an export name failed." );
            while( (ch = fgetc( f )) != EOF and ch != 0 ) {
                name += char( ch );
            }
            names.push_back( name );
        }

        fseek( f, dir.AddressOfNameOrdinals + addr_to_pos, SEEK_SET ) >> Is_zero()
            or fail_<Uix>(
                "Ungood file: a seek to the ordinals table failed."
                );
        const vector<WORD> ordinals = read_sequence_<WORD>( f, dir.NumberOfNames );

        cout << string( 72, '-' ) << endl;
        for( const int i: up_to( dir.NumberOfNames ) ) {
            cout << names[i] << " @" << ordinals[i] << endl;
        }
    }

    void run()
    {
        const auto args = win_util::Command_line_args();
        hopefully( args.count() == 1 )
            or fail_<Uix>( "Specify one argument: the DLL filename or path." );

        const fs::path dll_path = args[0];
        const string u8_path = cppx::fs_util::utf8_from( dll_path );
        const auto f = C_file( tag::Read(), dll_path );

        const auto dos_header = read_<IMAGE_DOS_HEADER>( f );
        hopefully( dos_header.e_magic == IMAGE_DOS_SIGNATURE )      // 0x5A4D, 'MZ' multichar.
            or fail_<Uix>( ""s << "No MZ magic number at start of '" << u8_path << "'." );

        fseek( f, dos_header.e_lfanew, SEEK_SET ) >> Is_zero()
            or fail_<Uix>( "fseek to PE header failed" );
        const auto pe_signature = read_<DWORD>( f );
        hopefully( pe_signature == IMAGE_NT_SIGNATURE )             // 0x4550, 'PE' multichar.
            or fail_<Uix>( ""s << "No PE magic number in PE header of '" << u8_path << "'." );

        const auto pe_header = read_<IMAGE_FILE_HEADER>( f );
        const auto image_kind_spec = peek_<WORD>( f );
        switch( image_kind_spec ) {
            case IMAGE_NT_OPTIONAL_HDR32_MAGIC: {       // 0x10B
                list_exports<Pe32_types>( u8_path, f, pe_header );
                break;
            }
            case IMAGE_NT_OPTIONAL_HDR64_MAGIC: {       // 0x20B
                list_exports<Pe64_types>( u8_path, f, pe_header );
                break;
            }
            default: {      // E.g. 0x107 a.k.a. IMAGE_ROM_OPTIONAL_HDR_MAGIC
                fail_<Uix>( "Not a PE32 (32-bit) or PE32+ (64-bit) file." );
            }
        };
    }
}  // namespace app

auto main() -> int
{
    $use_std( exception, cerr, endl, clog, ios_base );
    $use_cppx( monospaced_bullet_block, description_lines_from );

    #ifdef NDEBUG
    clog.setstate( ios_base::failbit );     // Suppress trace output.
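The up-front validation the program performs — the 'MZ' magic at offset 0, the e_lfanew field at offset 0x3C, then the 'PE\0\0' signature at that position — can also be sketched without the cppx-core or winapi wrapper libraries, parsing from an in-memory byte buffer instead of a FILE*. The helper looks_like_pe below is a hypothetical illustration, not part of the program above; the memcpy reads assume a little-endian host, matching the PE format's own little-endian layout:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Returns true if the buffer starts with a DOS 'MZ' header whose e_lfanew
// field points at a valid 'PE\0\0' signature inside the buffer.
// Offsets are fixed by the PE format: e_magic at 0, e_lfanew at 0x3C.
bool looks_like_pe(const std::vector<std::uint8_t>& image)
{
    if (image.size() < 0x40) { return false; }      // DOS header is 64 bytes

    std::uint16_t mz = 0;
    std::memcpy(&mz, image.data(), sizeof mz);      // little-endian read
    if (mz != 0x5A4D) { return false; }             // 'MZ'

    std::uint32_t e_lfanew = 0;
    std::memcpy(&e_lfanew, image.data() + 0x3C, sizeof e_lfanew);
    // Widen before adding so the bounds check cannot itself wrap.
    if (std::uint64_t{e_lfanew} + 4 > image.size()) { return false; }

    std::uint32_t pe_sig = 0;
    std::memcpy(&pe_sig, image.data() + e_lfanew, sizeof pe_sig);
    return pe_sig == 0x00004550;                    // 'PE\0\0'
}
```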