soft and program: Digest for comp.lang.c++@googlegroups.com

Module libraries - 4 Updates
sizeof(bitfield struct) - 8 Updates
What does operating on raw bytes mean in a C++ context? - 9 Updates

Paavo Helde <myfirstname@osa.pri.ee>: Nov 04 11:04AM +0200

On 3.11.2018 4:54, Thiago Adams wrote:
>> your sources.

>> Voila, done. Works fine for some Boost libraries, for example.

> The real code will have several files.

Of course there are several files. Did you overlook the '*' and 'Select
All'?

> This solution does not scale
> and don't compose well compared with this pragma source.

The pragma source proposal has its charm, but there is a slight problem
in that it does not exist. I also suspect that it would overlap or
conflict with the upcoming C++ modules feature. It also doesn't scale
in the sense that pretty often the source code needs some special
compiler flags, prebuild and postbuild steps, etc, which might depend on
the platform, configuration, etc. You cannot replace the myriad of the
existing build systems with a single pragma.

In simpler cases it would work though.

David Brown <david.brown@hesbynett.no>: Nov 04 06:30PM +0100

On 02/11/2018 21:45, Thiago Adams wrote:
> to something inside the code , many times surrounding by #ifdef.
> The suggested feature #pragma source, #pragma includedir etc also
> would give similar power but in a standard way (if possible)

Libraries come in three varieties.

There are header-only libraries - as you say, these are usually fairly
easy to handle.

There are stand-alone libraries. They usually come with their own build
procedures - ./configure, makefiles, CMake, etc. You handle these by
following the build instructions for the library.

Then there are libraries that come as source code to be compiled and
included along with your own project. These are common in the embedded
world, where you don't have shared or dynamic libraries. Sometimes
these could do with better documentation or information, especially if
they have odd restrictions (such as requiring non-standard C behaviour
like wrapping signed integers or gcc's -fno-strict-aliasing flag). But
generally it is easy to see what source files are needed.

Sometimes libraries have particular needs to make them easy to use -
such as lists of include file directories or lists of static libraries
for linking. The "pkg-config" solution is to have a ".pc" file with the
library that gives this information. I think that if you feel these
files don't have all the details you need, it would make more sense to
extend this existing system instead of trying to invent something new.

Thiago Adams <thiago.adams@gmail.com>: Nov 04 03:03PM -0800

On Sunday, November 4, 2018 at 7:04:47 AM UTC-2, Paavo Helde wrote:

> > The real code will have several files.

> Of course there are several files. Did you overlook the '*' and 'Select
> All'?

The problem with * is that you don't know which files
are necessary, and adding many #ifdef SAMPLE_UNITTEST guards
is a lot of work. It is also intrusive. The pragma source
can be intrusive or not. In other words, if you have
an existing library, someone can generate the file that
represents the source for that library in such a way that the original
library doesn't need to be changed. But if it were something standard,
it would be nice to have it for all libraries. This is also something
that will not break old compilers.

> The pragma source proposal has its charm, but there is a slight problem
> in that it does not exist. I also suspect that it would overlap or
> conflict with the upcoming C++ modules feature.

The C++ modules feature is similar to binary libs: you need to compile
for each platform. Macros are also not available in the same form,
and it will not be compatible with C.

> in the sense that pretty often the source code needs some special
> compiler flags, prebuild and postbuild steps, etc, which might depend on
> the platform, configuration, etc.

Please note that this feature is not a build system and it doesn't
replace build systems. This has been a source of confusion. The
name of the feature could be "new pragmas" to avoid confusion.
We can have specific platform settings. I would also like to have
compiler settings inside the source, something similar to what VC++ has
for warnings:

#pragma warning( push )
#pragma warning( disable : 4705 )
#pragma warning( disable : 4706 )
#pragma warning( disable : 4707 )

// Some code

#pragma warning( pop )

Some of the compiler flags could be universal, but this
is not necessary. If the compiler had this setting in the source,
the proposed feature could be used together with #ifdef for
a specific compiler.
The way it is today, when you specify compiler settings they are
applied to all sources.

>You cannot replace the myriad of the
> existing build systems with a single pragma.

> In simpler cases it would work though.

Usually my build is for Windows, and I have to sign the executable,
generate documentation, run tests, run the installer program, sign the installer
and publish. I also have to do some parts on different computers (Mac).
This feature will not replace this build.

Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Nov 04 06:24PM -0500

Thiago Adams wrote:

> I want copy-paste libraries from their repository
> (with source) and use in my projects as easily as if
> they were header only libraries or amalgamated.
This problem is quite old. One issue that will arise with a naive solution is
that the subset of source files needed to build a program or library changes as
new versions appear. Also, the "universal subset" (i.e. the total set of
sources) is usually a set of files by some version control system (VCS) so it is
not surprising the first (known to me) working solution appeared as part of (one
of the very first) VCS, specifically SCCS.

You can declare the source file in the file itself (you will usually do it
automatically as you create a new source, probably using the same procedure that
adds a copyright header); then use the "sccs what" command on a binary and get a set
of all sources. You do not have to use SCCS for actual version control, although
using some keyword-expanding VCS like CVS, Subversion or RCS provides additional
benefits in that you can grab not only the complete file set but also the exact
versions with which a binary was built.

The syntax for this or similar feature is unfortunately different on different
platforms but it often does involve pragmas. E.g. sometimes as #sccs directive
(works on most Unices), #ident (works on gcc), #pragma comment (MSVC), #pragma
ident etc.

See e.g.
https://stackoverflow.com/questions/15773282/access-ident-information-in-an-executable
for an example.
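
A minimal sketch of the embedded-identifier idea, assuming the classic "@(#)"
marker that the what utility scans binaries for. The array name kSourceIdent
and the version text "1.0" are made up for illustration; the file name is the
one used later in this post.

```cpp
#include <cstring>

// Declared extern so the constant has external linkage and is less likely
// to be discarded as an unused internal object.
extern const char kSourceIdent[];

// "@(#)" is the marker searched for by `what`; the text after it is a
// placeholder (hypothetical version number).
const char kSourceIdent[] = "@(#) ConsoleWin.c 1.0";

// Returns true if the string carries the what-marker at its start.
inline bool has_what_marker(const char* s) {
    return std::strncmp(s, "@(#)", 4) == 0;
}
```

Whether the string survives into the final binary still depends on the
toolchain (for example, link-time section garbage collection), so it is worth
checking with what or strings on the actual output.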

> The proposal is here

> https://github.com/thradams/Modules/blob/master/README.md

> Basically, we can add #pragma source in your source code
According to what rule will you decide which source file to add the pragmas to? (I
mean, you don't want to add
#pragma source "..\Scr\ConsoleWin.c"
to every other C++ file, do you? Are you going to designate "a single source
file per program or library to contain them all"? If yes, why does it even have
to be a C/C++ file -- its role is clearly providing a list of files rather than
C/C++ code).

> --------

> to compile this program: (if it was part of the compiler)

> ccompiler MyProgram.c

-Pavel

sizeof(bitfield struct)

"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Nov 04 03:59AM -0800

I had something happen yesterday that surprised me. I was using a
bitfield struct:

#define u32 uint32_t
#define u16 uint16_t

struct STime
{
    u32 seconds2 : 5; // double-seconds 0-29
    u32 minute   : 6; // 0-59
    u32 hour     : 5; // 0-23
    // Total = 16 bits
};

And in another block of code I had this structure used in a union:

union
{
    u16   raw_time; // Time in bit-encoded form
    STime time;     // Time structure for member access
};

However, it was expanding the sizeof(STime) to 4 bytes, and was
making the union be 4 bytes instead of 2 bytes. I was not expecting
that. I expected the sizeof(STime) to be the actual
size of the bits, and not what they expand to.

Is this normal in C++? It seems unnatural to expand the union
size to the derived type sizes, rather than the actual bit size.

--
Rick C. Hodgin

Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Nov 04 02:43PM -0500

Rick C. Hodgin wrote:
> size of the bits, and not what they expand to.

> Is this normal in C++? It seems unnatural to expand the union
> size to the derived type sizes, rather than the actual bit size.

C++ object representation may include the implementation-specific number of
padding bits (see e.g. 6.7 in the latest standard draft, also 8.5.2.3
specifically about sizeof).

On my system, even your STime struct has sizeof of 4:
-------------------------
#include <cstdint>
#include <iostream>
using namespace std;

typedef uint32_t u32;

struct STime {
    u32 seconds2 : 5; // double-seconds 0-29
    u32 minute   : 6; // 0-59
    u32 hour     : 5; // 0-23
    // Total = 16 bits
};

int
main(int, char*[])
{
    cout << "sizeof(STime)=" << sizeof(STime) << endl;
    return 0;
}

------ result:
sizeof(STime)=4
--------

Compilers may provide pragmas or attributes to change padding; see e.g. the docs for
the gcc "packed" attribute.

HTH
-Pavel

bitrex <user@example.net>: Nov 04 02:50PM -0500

On 11/04/2018 06:59 AM, Rick C. Hodgin wrote:
> size of the bits, and not what they expand to.

> Is this normal in C++? It seems unnatural to expand the union
> size to the derived type sizes, rather than the actual bit size.

why would it make sense for "sizeof" to return bitfield struct sizes
which aren't word-aligned when AFAIK it's at best
implementation-dependent whether bitfield structs less than the native
word size can be packed and/or allocated across word-boundaries. Given
that "sizeof" is usually used for the purposes of calculating allocation
sizes and not general-informational purposes

"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Nov 04 12:19PM -0800

On Sunday, November 4, 2018 at 2:43:41 PM UTC-5, Pavel wrote:
> padding bits (see e.g. 6.7 in the latest standard draft, also 8.5.2.3
> specifically about sizeof).

> On my system, even your STime struct has sizeof of 4:

Correct. I had to change the values from u32 to u16 to get it to
have a 2-byte representation within the union. That's the part I
wasn't expecting.

> Compilers may provide pragmas or attributes to change padding; see e.g. the docs for
> the gcc "packed" attribute.

This was compiled in MSVC++, and I double-checked to make sure I
had the struct alignment set to bytes.

I'm really surprised that bit structs are expanded to the largest
member of their types, and not represented by the size of their
bit encoding. I actually consider it to be a flaw in C/C++ to do
it that way.

--
Rick C. Hodgin

"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Nov 04 12:26PM -0800

On Sunday, November 4, 2018 at 2:50:36 PM UTC-5, bitrex wrote:
> word size can be packed and/or allocated across word-boundaries. Given
> that "sizeof" is usually used for the purposes of calculating allocation
> sizes and not general-informational purposes

I could see asking for sizeof(STime.hour) and having it return 4
because its type is a 32-bit quantity.

But for the sizeof(STime) it's not 4. Each STime structure is
only two bytes. And changing the member values from u32 to u16
now made the structure be 2 bytes again.

Additionally, if you did it thusly:

STime* t = get_a_valid_t_array_of_at_least_10_elements();

for (int i = 0; i < 10; ++i)
    ++t;

It's going to increase by 2 bytes per iteration. You couldn't
change that code to something like this and have it work properly
if your STime members were u32. But, if you change them to u16,
then this code would work properly:

for (int i = 0; i < 10; ++i)
    t = (STime*)((char*)t + sizeof(STime));

The value for t would be skewed if STime's members were u32 instead
of u16.
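
For reference, the language defines pointer arithmetic on a T* in units of
sizeof(T), so ++t and the cast-through-char* form advance by the same number
of bytes whichever member type is used. A minimal sketch that measures the
array stride directly (stride_in_bytes is a hypothetical helper name, not
something from the thread):

```cpp
#include <cstddef>
#include <cstdint>

// The u32 variant of the struct discussed above.
struct STime {
    std::uint32_t seconds2 : 5;  // double-seconds 0-29
    std::uint32_t minute   : 6;  // 0-59
    std::uint32_t hour     : 5;  // 0-23
};

// Distance in bytes between two consecutive array elements.
inline std::ptrdiff_t stride_in_bytes() {
    STime times[2] = {};
    const char* first  = reinterpret_cast<const char*>(&times[0]);
    const char* second = reinterpret_cast<const char*>(&times[1]);
    return second - first;
}
```

The stride always equals sizeof(STime), whatever value the implementation
picks for that size.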

--
Rick C. Hodgin

Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Nov 04 05:37PM -0500

Rick C. Hodgin wrote:

> Correct. I had to change the values from u32 to u16 to get it to
> have a 2-byte representation within the union. That's the part I
> wasn't expecting.
This is also not so for gcc. When I add the "packed" attribute (for gcc), the
sizeof of your struct with u32 bit fields becomes 2:

$ cat sbf.cpp
#include <cstdint>
#include <iostream>
using namespace std;

typedef uint32_t u32;

struct STime {
    u32 seconds2 : 5 __attribute__ ((packed)); // double-seconds 0-29
    u32 minute   : 6 __attribute__ ((packed)); // 0-59
    u32 hour     : 5 __attribute__ ((packed)); // 0-23
    // Total = 16 bits
};

int
main(int, char*[])
{
    cout << "sizeof(STime)=" << sizeof(STime) << endl;
    return 0;
}
$ g++ -std=c++11 ./sbf.cpp
$ ./a.out
sizeof(STime)=2
$

> had the struct alignment set to bytes.

> I'm really surprised that bit structs are expanded to the largest
> member of their types,
Note that this seems to be just an implementation choice of MSVC, as per the results of my
test above.
> and not represented by the size of their
> bit encoding. I actually consider it to be a flaw in C/C++ to do
> it that way.
C/C++ as such does not define whether, or how many, padding bits are added; it is
implementation-specific. The newer (2011+) standard, however, does provide some more
tools to control alignment, e.g. the "alignas" specifier, but it cannot make alignment
weaker.

To regain confidence in how your structs are aligned, you might consider using
static asserts. E.g. (adapted from http://www.catb.org/esr/structure-packing/)

static_assert(sizeof(struct STime) == 2, "Check your assumptions");

HTH
-Pavel

"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Nov 04 02:50PM -0800

On Sunday, November 4, 2018 at 5:38:03 PM UTC-5, Pavel wrote:

> static_assert(sizeof(struct STime) == 2, "Check your assumptions");

> HTH
> -Pavel

Definitely helps. Thank you, Pavel.

--
Rick C. Hodgin

David Brown <david.brown@hesbynett.no>: Nov 05 12:05AM +0100

On 04/11/2018 12:59, Rick C. Hodgin wrote:
> u32 hour : 5; // 0-23
> // Total = 16 bits
> };

When you write "u32 seconds2 : 5;", what you are saying is "make
seconds2 5 bits of a u32". So the struct STime has a u32, gives 5 bits
to seconds2, 6 bits to minute, and 5 bits to hour. The remaining 16
bits are unused - but they still take up space in the struct, and still
affect alignment.

If you want this all to be within a 16-bit struct, use u16 (or uint16_t).

(Compilers may let you reduce this with extra features like "packed"
attributes or pragmas.)

Remember that there is quite a lot about bit-fields that is
implementation-specific. That might be fine for you, but you will have
to check that they work as expected on the compilers you use. The rules
I am giving here are for C, rather than C++ - I expect them to be
roughly the same for C++, but I am not familiar enough to be sure.
(Hopefully someone will correct me if I'm wrong.)

First, the type should be _Bool (bool for C++), signed int, unsigned
int, or an implementation-defined type. This means that u16, which is
"unsigned short" on most platforms, may not be supported. In practice
most modern compilers /will/ support it, but as I say - check.

The order of bit-field packing is up to the implementation. On most
little-endian systems, ordering is from least-significant-bit onwards
but that is not guaranteed. AFAIK on very old versions of MS C
compilers, the order was most-significant-bit first, and then they
changed it for the next version of the compiler. There was a lot of
wailing at the time, but it has been consistent since then.

If you have something like:

uint16_t field1 : 10;
uint16_t field2 : 10;

it is implementation-defined if part of field2 is included in the first
uint16_t, or if it is all put in the second uint16_t.

Bit-fields can be very useful, but you have to take care, especially if
you want them to work across compilers or targets.

> size of the bits, and not what they expand to.

> Is this normal in C++? It seems unnatural to expand the union
> size to the derived type sizes, rather than the actual bit size.

The size (and alignment) comes from the size of the underlying type,
which is 32-bit in this case.

What does operating on raw bytes mean in a C++ context?

"Öö Tiib" <ootiib@hot.ee>: Nov 03 04:45PM -0700

On Saturday, 3 November 2018 22:53:14 UTC+2, Paul wrote:
> SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

> I'm confused by the instruction: "Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing."

> What does "raw bytes" mean in terms of the input/output parameters.

Perhaps "raw bytes" is meant as "unencoded bytes".
It is advice to operate on unencoded bytes instead of hex
encoded or base64 encoded textual representations.

> Presumably it means I shouldn't have a const std::string& as input
> and a std::string as output?

I don't think that it was meant.

> Does anyone know what it means to translate hex to base 64 "by operating
> on raw bytes" in a C++ context?

Perhaps it just means that you need to translate hex to bytes and
those bytes to base64 ... not to attempt to translate hex text
directly to base64 text.
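
A minimal sketch of that two-step reading, assuming the standard base64
alphabet with '=' padding. The names Bytes, hex_value, hex_decode and
base64_encode are hypothetical, chosen for illustration only.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Value of one hex digit; throws on anything else.
inline unsigned hex_value(char c) {
    if (c >= '0' && c <= '9') return static_cast<unsigned>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<unsigned>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<unsigned>(c - 'A' + 10);
    throw std::invalid_argument("not a hex digit");
}

// Step 1: hex text -> raw bytes.
inline Bytes hex_decode(const std::string& hex) {
    if (hex.size() % 2 != 0)
        throw std::invalid_argument("odd number of hex digits");
    Bytes out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2)
        out.push_back(static_cast<unsigned char>(
            hex_value(hex[i]) * 16 + hex_value(hex[i + 1])));
    return out;
}

// Step 2: raw bytes -> base64 text.
inline std::string base64_encode(const Bytes& in) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);
    for (std::size_t i = 0; i < in.size(); i += 3) {
        // Number of input bytes in this group (1, 2 or 3).
        const std::size_t n = in.size() - i < 3 ? in.size() - i : 3;
        unsigned long group = static_cast<unsigned long>(in[i]) << 16;
        if (n > 1) group |= static_cast<unsigned long>(in[i + 1]) << 8;
        if (n > 2) group |= in[i + 2];
        out += kAlphabet[(group >> 18) & 63];
        out += kAlphabet[(group >> 12) & 63];
        out += n > 1 ? kAlphabet[(group >> 6) & 63] : '=';
        out += n > 2 ? kAlphabet[group & 63] : '=';
    }
    return out;
}
```

For example, base64_encode(hex_decode("4d616e")) yields "TWFu": the hex text is
only an input notation, and the encoder never sees it.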

Ben Bacarisse <ben.usenet@bsb.me.uk>: Nov 04 12:21AM

> SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

> I'm confused by the instruction: "Always operate on raw bytes, never
> on encoded strings. Only use hex and base64 for pretty-printing."

It's probably because the site is language agnostic. You really would
care what this means if you were using, say, Python.

> What does "raw bytes" mean in terms of the input/output parameters.
> Presumably it means I shouldn't have a const std::string& as input
> and a std::string as output?

I disagree with the advice you've had that std::string is OK for this
sort of work. You might get away with it for this first task, but zero
bytes can be a problem in std::string objects.

I'd use std::vector<unsigned char>. The unsigned is to smooth the way
for arithmetic and bit operations.

--
Ben.

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Nov 04 02:22AM +0100

On 04.11.2018 01:21, Ben Bacarisse wrote:

> I disagree with the advice you've had that std::string is OK for this
> sort of work. You might get away with it for this first task, but zero
> bytes can be a problem in std::string objects.

`std::string` has no problem with zero-bytes.

Perhaps you're thinking of using `.c_str()` to convert to C-string.

That's a different string representation, that does have such a problem.

> I'd use std::vector<unsigned char>. The unsigned is to smooth the way
> for arithmetic and bit operations.

I think I'd also use a vector of traditional byte type, `unsigned char`.
But there's no /technical/ problem with using `std::string`.

After all, if it's good enough for this for Google, it's good enough,
even though other considerations IMO make it a less than perfect choice.
Those other considerations include that the default item type, `char`,
is typically signed, which needs more conversion operations sprinkled in
the code, which is an invitation to bugs to enter please, free
admission. And judging by what I've seen of questions about this, the
non-technical considerations include that it's easy for novices to get
confused about whether a string represents binary data or text.

Cheers!,

- Alf

Ben Bacarisse <ben.usenet@bsb.me.uk>: Nov 04 02:57AM

> Perhaps you're thinking of using `.c_str()` to convert to C-string.

> That's a different string representation, that does have such a
> problem.

That's a part of it, yes, though I was thinking in more general terms
about the interaction between std::string and null-terminated character
arrays. The std::string API uses a lot of CharT * parameters that are
taken to be null-terminated. Even trying to initialise a std::string
with a null-containing array can trip up the unwary.

It's all manageable with a few simple rules, but I don't see the point
for cryptographic manipulation. You won't be using the specifically
string-oriented parts of the std::string interface.
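
A minimal sketch of the initialisation pitfall, using a five-byte array with
a zero byte in the middle. The names kData, kLen, from_c_string and
from_pointer_and_length are made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Five data bytes 'a','b',0,'c','d' followed by the literal's terminator.
constexpr char kData[] = "ab\0cd";
constexpr std::size_t kLen = sizeof(kData) - 1;  // 5, excluding the terminator

// The const char* constructor treats its argument as null-terminated,
// so it stops at the first zero byte.
inline std::string from_c_string() { return std::string(kData); }

// The (pointer, length) constructor copies exactly kLen bytes,
// embedded zero included.
inline std::string from_pointer_and_length() { return std::string(kData, kLen); }
```

The first function returns a string of size 2, the second a string of size 5.
The std::string object itself stores the zero byte without trouble; it is the
null-terminated entry points that lose it.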

<snip>
--
Ben.

Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Nov 04 12:49AM -0400

Alf P. Steinbach wrote:
>> for arithmetic and bit operations.

> I think I'd also use a vector of traditional byte type, `unsigned char`. But
> there's no /technical/ problem with using `std::string`.
In practice, organizations still use the COW (copy-on-write) version of "std::string", which is less
efficient than vector in both memory and time, especially for short strings --
unless its COW feature is needed. Regardless, if I got to choose the API for
this facility, I would probably select a traditional algorithm approach, i.e.
something like:

template <typename InIter, typename OutIter>
OutIter ReEncodeHexToBase64(InIter beg, InIter end, OutIter out);

This way, you can use it to produce the result in whichever container or stream you
need it, with no intermediate copying.

Just my 2c
-Pavel

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Nov 04 09:42AM +0100

On 04.11.2018 05:49, Pavel wrote:
> In practice, organizations still use cow version of "std::string" which is less
> efficient than vector in both memory and time especially for short strings --
> unless its cow feature is needed.

You mean, less efficient unless at some point it's copied.

Have you timed this, for an optimized build?

Cheers!,

- Alf

Jorgen Grahn <grahn+nntp@snipabacken.se>: Nov 04 05:08PM

On Sat, 2018-11-03, Paul wrote:
> 6c696b65206120706f69736f6e6f7573206d757368726f6f6d

> should produce
> SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

It's a confusion in terminology, and in levels of abstraction.

You (or that site) say the source is "hex", but hexadecimal notation
is just a sometimes convenient way of visualizing numbers as text.

It's common to think of memory as a sequence of bytes ("raw bytes"),
and to visualize them as hex, but that doesn't mean memory /is/ hex.

> I'm confused by the instruction: "Always operate on raw bytes, never
> on encoded strings. Only use hex and base64 for pretty-printing."

Me too. I can only guess what that means (unless it's Python-specific
like someone implied).

Let's formulate a better exercise:

Base64 encodes a sequence of 8-bit bytes[0] as ASCII[1] text in a
fairly compact manner, according to RFC <something>. Implement it,
as one of the functions:

void encode(std::ostream& os, const void* data, std::size_t len);

// *it must be something that can be cast to unsigned char
template<class FwdIterator>
void encode(std::ostream& os, FwdIterator begin, FwdIterator end);

Also write unit tests.

/Jorgen

[0] IIRC you can encode a sequence of bits too, e.g. four or 81 bits,
but I think you can ignore that possibility.

[1] Perhaps one shouldn't assume ASCII ...

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Nov 04 01:43PM -0500

Alf P. Steinbach wrote:
>> efficient than vector in both memory and time especially for short strings --
>> unless its cow feature is needed.

> You mean, less efficient unless at some point it's copied.
Almost; to be precise, "unless it's copied and not changed thereafter". For
example, some people pass string by value instead of const reference in function
parameters to save an indirection (which is a big part of why they got stuck
with cow string).

> Have you timed this, for an optimized build?
Not recently, no. I did some 9-10 years ago while building some symbol store. I
only recall that the results were largely consistent with my expectations; but
the winner was neither vector nor string but a custom-built fixed-size string,
copied by value (again, I was mainly concerned with short strings at the time).

> Cheers!,

> - Alf

-Pavel

Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Nov 04 01:52PM -0500

Paul wrote:

> should produce
> SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

> I'm confused by the instruction: "Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing."
My guess is that they meant to emphasize that the solution should base64-encode
the bytes decoded from the hex encoding rather than the hex-encoded bytes themselves.
This seems like stating the obvious; but is it possible some takers just jumped
into base64 encoding and forgot to hex-decode?
> on raw bytes" in a C++ context?

> Thank you,

> Paul

HTH
-Pavel

Sunday, November 4, 2018

Digest for comp.lang.c++@googlegroups.com - 21 updates in 3 topics
