- Module libraries - 4 Updates
- sizeof(bitfield struct) - 8 Updates
- What does operating on raw bytes mean in a C++ context? - 9 Updates
Paavo Helde <myfirstname@osa.pri.ee>: Nov 04 11:04AM +0200 On 3.11.2018 4:54, Thiago Adams wrote: >> your sources. >> Voila, done. Works fine for some Boost libraries, for example. > The real code will have several files. Of course there are several files. Did you overlook the '*' and 'Select All'? > This solution does not scale > and don't compose well compared with this pragma source. The pragma source proposal has its charm, but there is a slight problem in that it does not exist. I also suspect that it would overlap or conflict with the upcoming C++ modules feature. And it also doesn't scale, in the sense that pretty often the source code needs some special compiler flags, prebuild and postbuild steps, etc, which might depend on the platform, configuration, etc. You cannot replace the myriad of existing build systems with a single pragma. In simpler cases it would work though. |
David Brown <david.brown@hesbynett.no>: Nov 04 06:30PM +0100 On 02/11/2018 21:45, Thiago Adams wrote: > to something inside the code , many times surrounding by #ifdef. > The suggested feature #pragma source, #pragma includedir etc also > would give similar power but in a standard way (if possible) Libraries come in three varieties. There are header-only libraries - as you say, these are usually fairly easy to handle. There are stand-alone libraries. They usually come with their own build procedures - ./configure, makefiles, CMake, etc. You handle these by following the build instructions for the library. Then there are libraries that come as source code to be compiled and included along with your own project. These are common in the embedded world, where you don't have shared or dynamic libraries. Sometimes these could do with better documentation or information, especially if they have odd restrictions (such as requiring non-standard C behaviour like wrapping signed integers or gcc's -fno-strict-aliasing flag). But generally it is easy to see what source files are needed. Sometimes libraries have particular needs to make them easy to use - such as lists of include file directories or lists of static libraries for linking. The "pkg-config" solution is to have a ".pc" file with the library that gives this information. I think that if you feel these files don't have all the details you need, it would make more sense to extend this existing system instead of trying to invent something new. |
Thiago Adams <thiago.adams@gmail.com>: Nov 04 03:03PM -0800 On Sunday, November 4, 2018 at 7:04:47 AM UTC-2, Paavo Helde wrote: > > The real code will have several files. > Of course there are several files. Did you overlook the '*' and 'Select > All'? The problem with * is that you don't know which files are necessary, and adding many #ifdef SAMPLE_UNITTEST guards is a lot of work. It is also intrusive. The pragma source can be intrusive or not. In other words, if you have an existing library, someone can generate the file that represents the sources for that library in a way that the original library doesn't need to be changed. But if it were something standard, it would be nice to have it for all libraries. This is also something that will not break old compilers. > The pragma source proposal has its charm, but there is a slight problem > in that it does not exist. I also suspect that it would overlap or > conflict with the upcoming C++ modules feature. The C++ modules feature is similar to binary libs. You need to compile for each platform. Macros are also not available in the same form, and it will not be compatible with C. > in the sense that pretty often the source code needs some special > compiler flags, prebuild and postbuild steps, etc, which might depend on > the platform, configuration, etc. Please note that this feature is not a build system and it doesn't replace build systems. This has been a source of confusion. The name of the feature could be "new pragmas" to avoid confusion. We can have platform-specific settings. I also would like to have compiler settings inside the source, something similar to what VC++ has for warnings:

#pragma warning( push )
#pragma warning( disable : 4705 )
#pragma warning( disable : 4706 )
#pragma warning( disable : 4707 )
// Some code
#pragma warning( pop )

Some of the compiler flags could be universal, but this is not necessary. If the compiler had these settings in the source, the proposed feature could be used together with #ifdef for a specific compiler. The way it is today, when you specify compiler settings they are applied to all sources. >You cannot replace the myriad of the > existing build systems with a single pragma. > In simpler cases it would work though. Usually my build is for Windows, and I have to sign the executable, generate documentation, run tests, run the installer program, sign the installer and publish. I also have to do some parts on different computers (Mac). This feature will not replace this build. |
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Nov 04 06:24PM -0500 Thiago Adams wrote: > I want copy-paste libraries from their repository > (with source) and use in my projects as easily as if > they were header only libraries or amalgamated. This problem is quite old. One issue that will arise with a naive solution is that the subset of source files needed to build a program or library changes as new versions appear. Also, the "universal subset" (i.e. the total set of sources) is usually a set of files managed by some version control system (VCS), so it is not surprising that the first (known to me) working solution appeared as part of (one of the very first) VCSs, specifically SCCS. You can declare the source file in the file itself (you will usually do it automatically as you create a new source, probably using the same procedure that adds a copyright header); then use the "sccs what" command on a binary and get the set of all sources. You do not have to use SCCS for actual version control, although using some keyword-expanding VCS like CVS, Subversion or RCS provides additional benefits in that you can grab not only the complete file set but also the exact versions with which a binary was built. The syntax for this or a similar feature is unfortunately different on different platforms, but it often does involve pragmas. E.g. sometimes as the #sccs directive (works on most Unices), #ident (works on gcc), #pragma comment (MSVC), #pragma ident etc. See e.g. https://stackoverflow.com/questions/15773282/access-ident-information-in-an-executable for an example. > The proposal is here > https://github.com/thradams/Modules/blob/master/README.md > Basically, we can add #pragma source in your source code According to what rule will you decide to which source files to add the pragmas? (I mean, you don't want to add #pragma source "..\Scr\ConsoleWin.c" to every other C++ file, do you? Are you going to designate "a single source file per program or library to contain them all"? If yes, why does it even have to be a C/C++ file -- its role is clearly to provide a list of files rather than C/C++ code.) > -------- > to compile this program: (if it was part of the compiler) > ccompiler MyProgram.c -Pavel |
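For illustration, a minimal sketch of what a translation unit using the proposed pragmas might look like, based solely on the directives named in this thread (#pragma source, #pragma includedir) and the single-command build quoted above; no existing compiler implements these pragmas, and the header, file and function names below are made up:

-------------------------
/* MyProgram.c -- hypothetical; the whole program builds with one command:
   ccompiler MyProgram.c */

#pragma includedir "..\Src"              /* where the library headers would live */
#pragma source "..\Src\ConsoleWin.c"     /* extra sources the compiler should    */
#pragma source "..\Src\StringUtil.c"     /* compile and link along with this one */

#include "console.h"                     /* hypothetical library header */

int main(void)
{
    console_print("hello");              /* hypothetical library function */
    return 0;
}
-------------------------

Under such a scheme the list of sources travels with the code itself, which is the composability the proposal is after; everything beyond the pragma syntax quoted in the thread is guesswork and would be defined by the actual proposal.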
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Nov 04 03:59AM -0800 I had something happen yesterday that surprised me. I was using a bitfield struct: #define u32 uint32_t #define u16 uint16_t struct STime { u32 seconds2 : 5; // double-seconds 0-29 u32 minute : 6; // 0-59 u32 hour : 5; // 0-23 // Total = 16 bits }; And in another block of code I had this structure used in a union: union { u16 raw_time; // Time in bit-encoded form STime time; // Time structure for member access }; However, it was expanding the sizeof(STime) to 4-bytes, and was making the union be 4-bytes instead of 2-bytes. I was not ex- pecting that. I expected the sizeof(STime) to be the actual size of the bits, and not what they expand to. Is this normal in C++? It seems unnatural to expand the union size to the derived type sizes, rather than the actual bit size. -- Rick C. Hodgin |
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Nov 04 02:43PM -0500 Rick C. Hodgin wrote: > size of the bits, and not what they expand to. > Is this normal in C++? It seems unnatural to expand the union > size to the derived type sizes, rather than the actual bit size. C++ object representation may include an implementation-specific number of padding bits (see e.g. 6.7 in the latest standard draft, also 8.5.2.3 specifically about sizeof). On my system, even your STime struct has a sizeof of 4:

-------------------------
#include <cstdint>
#include <iostream>
using namespace std;

typedef uint32_t u32;

struct STime {
    u32 seconds2 : 5;   // double-seconds 0-29
    u32 minute   : 6;   // 0-59
    u32 hour     : 5;   // 0-23
    // Total = 16 bits
};

int main(int, char*[]) {
    cout << "sizeof(STime)=" << sizeof(STime) << endl;
    return 0;
}
------ result:
sizeof(STime)=4
--------

Compilers may provide pragmas or attributes to change padding; see e.g. the docs for gcc's "packed" attribute. HTH -Pavel |
bitrex <user@example.net>: Nov 04 02:50PM -0500 On 11/04/2018 06:59 AM, Rick C. Hodgin wrote: > size of the bits, and not what they expand to. > Is this normal in C++? It seems unnatural to expand the union > size to the derived type sizes, rather than the actual bit size. Why would it make sense for "sizeof" to return bitfield struct sizes which aren't word-aligned, when AFAIK it's at best implementation-dependent whether bitfield structs less than the native word size can be packed and/or allocated across word boundaries? Particularly given that "sizeof" is usually used for the purpose of calculating allocation sizes and not for general-informational purposes. |
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Nov 04 12:19PM -0800 On Sunday, November 4, 2018 at 2:43:41 PM UTC-5, Pavel wrote: > padding bits (see e.g. 6.7 in the latest standard draft, also 8.5.2.3 > specifically about sizeof). > On my system, even your STime struct has sizeof of 4: Correct. I had to change the values from u32 to u16 to get it to have a 2-byte representation within the union. That's the part I wasn't expecting. > Compilers may provide pragrmas attributes to change padding see e.g. docs for > gcc "packed" attribute. This was compiled in MSVC++, and I double-checked to make sure I had the struct alignment set to bytes. I'm really surprised that bit structs are expanded to the largest member of their types, and not represented by the size of their bit encoding. I actually consider it to be a flaw in C/C++ to do it that way. -- Rick C. Hodgin |
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Nov 04 12:26PM -0800 On Sunday, November 4, 2018 at 2:50:36 PM UTC-5, bitrex wrote: > word size can be packed and/or allocated across word-boundaries. Given > that "sizeof" is usually used for the purposes of calculating allocation > sizes and not general-informational purposes I could see asking for sizeof(STime.hour) and having it return 4 because its type is a 32-bit quantity. But for the sizeof(STime) it's not 4. Each STime structure is only two bytes. And changing the member values from u32 to u16 now made the structure be 2 bytes again. Additionally, if you did it thusly: STime* t = get_a_valid_t_array_of_at_least_10_elements(); for (int i = 0; i < 10; ++i) ++t; It's going to increase by 2 bytes per iteration. You couldn't change that code to something like this and have it work properly if your STime members were u32. But, if you change them to u16, then this code would work properly: for (int i = 0; i < 10; ++i) t = (STime*)((char*)t + sizeof(STime)); The value for t would be skewed if STime's members were u32 in- stead of u16. -- Rick C. Hodgin |
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Nov 04 05:37PM -0500 Rick C. Hodgin wrote: > Correct. I had to change the values from u32 to u16 to get it to > have a 2-byte representation within the union. That's the part I > wasn't expecting. This is also not so for gcc. When I add the "packed" attribute (for gcc), the sizeof of your struct with u32 bit fields becomes 2:

$ cat sbf.cpp
#include <cstdint>
#include <iostream>
using namespace std;

typedef uint32_t u32;

struct STime {
    u32 seconds2 : 5 __attribute__ ((packed));  // double-seconds 0-29
    u32 minute   : 6 __attribute__ ((packed));  // 0-59
    u32 hour     : 5 __attribute__ ((packed));  // 0-23
    // Total = 16 bits
};

int main(int, char*[]) {
    cout << "sizeof(STime)=" << sizeof(STime) << endl;
    return 0;
}
$ g++ -std=c++11 ./sbf.cpp
$ ./a.out
sizeof(STime)=2
$

> had the struct alignment set to bytes. > I'm really surprised that bit structs are expanded to the largest > member of their types, Note this seems to be just an implementation choice of MSVC, as per the results of my test above. > and not represented by the size of their > bit encoding. I actually consider it to be a flaw in C/C++ to do > it that way. C/C++ as such does not define whether or how many padding bits are added; it is implementation-specific. The new (2011+) standard, however, does provide some more tools to control alignment, e.g. the "alignas" specifier, but it cannot make alignment weaker. To regain confidence in how your structs are aligned, you might consider using static asserts. E.g. (adapted from http://www.catb.org/esr/structure-packing/)

static_assert(sizeof(struct STime) == 2, "Check your assumptions");

HTH -Pavel |
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Nov 04 02:50PM -0800 On Sunday, November 4, 2018 at 5:38:03 PM UTC-5, Pavel wrote: > static_assert(sizeof(struct STime) == 2, "Check your assumptions"); > HTH > -Pavel Definitely helps. Thank you, Pavel. -- Rick C. Hodgin |
David Brown <david.brown@hesbynett.no>: Nov 05 12:05AM +0100 On 04/11/2018 12:59, Rick C. Hodgin wrote: > u32 hour : 5; // 0-23 > // Total = 16 bits > }; When you write "u32 seconds2 : 5;", what you are saying is "make seconds2 5 bits of a u32". So the struct STime has a u32, gives 5 bits to seconds2, 6 bits to minute, and 5 bits to hour. The remaining 16 bits are unused - but they still take up space in the struct, and still affect alignment. If you want this all to be within a 16-bit struct, use u16 (or uint16_t). (Compilers may let you reduce this with extra features like "packed" attributes or pragmas.) Remember that there is quite a lot about bit-fields that is implementation-specific. That might be fine for you, but you will have to check that they work as expected on the compilers you use. The rules I am giving here are for C, rather than C++ - I expect them to be roughly the same for C++, but I am not familiar enough to be sure. (Hopefully someone will correct me if I'm wrong.) First, the type should be _Bool (bool for C++), signed int, unsigned int, or an implementation-defined type. This means that u16, which is "unsigned short" on most platforms, may not be supported. In practice most modern compilers /will/ support it, but as I say - check. The order of bit-field packing is up to the implementation. On most little-endian systems, ordering is from least-significant-bit onwards but that is not guaranteed. AFAIK on very old versions of MS C compilers, the order was most-significant-bit first, and then they changed it for the next version of the compiler. There was a lot of wailing at the time, but it has been consistent since then. If you have something like:

uint16_t field1 : 10;
uint16_t field2 : 10;

it is implementation-defined whether part of field2 is included in the first uint16_t, or whether it is all put in the second uint16_t. Bit-fields can be very useful, but you have to take care, especially if you want them to work across compilers or targets. > size of the bits, and not what they expand to. > Is this normal in C++? It seems unnatural to expand the union > size to the derived type sizes, rather than the actual bit size. The size (and alignment) comes from the size of the underlying type, which is 32-bit in this case. |
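A small probe along these lines makes the implementation's choices visible (a sketch; which output you get is exactly the implementation-defined part): set one field to all-ones, then dump the object representation to see which storage unit and which bits it landed in.

-------------------------
#include <cstdint>
#include <cstdio>
#include <cstring>

struct Probe {
    uint16_t field1 : 10;   // does field2 share (straddle) the first uint16_t?
    uint16_t field2 : 10;   // implementation-defined, as noted above
};

int main()
{
    Probe p;
    std::memset(&p, 0, sizeof p);       // zero everything, padding included
    p.field1 = 0x3FF;                   // all ten bits of field1 set

    unsigned char bytes[sizeof p];
    std::memcpy(bytes, &p, sizeof p);   // inspect the object representation

    std::printf("sizeof(Probe) = %u, bytes:", (unsigned)sizeof p);
    for (unsigned char b : bytes)
        std::printf(" %02X", b);
    std::printf("\n");
    return 0;
}
-------------------------

On a typical little-endian gcc target this prints the bytes as FF 03 00 00 (field1 in the low bits of the first 16-bit unit, field2 pushed entirely into the second); a compiler that ordered the bit-fields the other way, or packed them differently, would print something else, which is exactly the portability caveat being made.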
"Öö Tiib" <ootiib@hot.ee>: Nov 03 04:45PM -0700 On Saturday, 3 November 2018 22:53:14 UTC+2, Paul wrote: > SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t > I'm confused by the instruction: "Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing." > What does "raw bytes" mean in terms of the input/output parameters. Perhaps "raw bytes" is meant as "unencoded bytes". It is advice to operate on unencoded bytes instead of hex encoded or base64 encoded textual representations. > Presumably it means I shouldn't have a const std::string& as input > and a std::string as output? I don't think that it was meant. > Does anyone know what it means to translate hex to base 64 "by operating > on raw bytes" in a C++ context? Perhaps it just means that you need to translate hex to bytes and those bytes to base64 ... not to attempt to translate hex text directly to base64 text. |
Ben Bacarisse <ben.usenet@bsb.me.uk>: Nov 04 12:21AM > SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t > I'm confused by the instruction: "Always operate on raw bytes, never > on encoded strings. Only use hex and base64 for pretty-printing." It's probably because the site is language agnostic. You really would care what this means if you were using, say, Python. > What does "raw bytes" mean in terms of the input/output parameters. > Presumably it means I shouldn't have a const std::string& as input > and a std::string as output? I disagree with the advice you've had that std::string is OK for this sort of work. You might get away with it for this first task, but zero bytes can be a problem in std::string objects. I'd use std::vector<unsigned char>. The unsigned is to smooth the way for arithmetic and bit operations. -- Ben. |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Nov 04 02:22AM +0100 On 04.11.2018 01:21, Ben Bacarisse wrote: > I disagree with the advice you've had that std::string is OK for this > sort of work. You might get away with it for this first task, but zero > bytes can be a problem in std::string objects. `std::string` has no problem with zero-bytes. Perhaps you're thinking of using `.c_str()` to convert to C-string. That's a different string representation, that does have such a problem. > I'd use std::vector<unsigned char>. The unsigned is to smooth the way > for arithmetic and bit operations. I think I'd also use a vector of traditional byte type, `unsigned char`. But there's no /technical/ problem with using `std::string`. After all, if it's good enough for this for Google, it's good enough, even though other considerations IMO make it a less than perfect choice. Those other considerations include that the default item type, `char`, is typically signed, which needs more conversion operations sprinkled in the code, which is an invitation to bugs to enter please, free admission. And judging by what I've seen of questions about this, the non-technical considerations include that it's easy for novices to get confused about whether a string represents binary data or text. Cheers!, - Alf |
Ben Bacarisse <ben.usenet@bsb.me.uk>: Nov 04 02:57AM > Perhaps you're thinking of using `.c_str()` to convert to C-string. > That's a different string representation, that does have such a > problem. That's a part of it, yes, though I was thinking in more general terms about the interaction between std::string and null-terminated character arrays. The std::string API uses a lot of CharT * parameters that are taken to be null-terminated. Even trying to initialise a std::string with a null-containing array can trip up the unwary. It's all manageable with a few simple rules, but I don't see the point for cryptographic manipulation. You won't be using the specifically string-oriented parts of the std::string interface. <snip> -- Ben. |
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Nov 04 12:49AM -0400 Alf P. Steinbach wrote: >> for arithmetic and bit operations. > I think I'd also use a vector of traditional byte type, `unsigned char`. But > there's no /technical/ problem with using `std::string`. In practice, organizations still use a cow version of "std::string", which is less efficient than vector in both memory and time, especially for short strings -- unless its cow feature is needed. Regardless, if I got to choose the API for this facility, I would probably select a traditional algorithm approach, i.e. something like:

template <typename InIter, typename OutIter>
OutIter ReEncodeHexToBase64(InIter beg, InIter end, OutIter out);

This way, you can use it to produce the result in whichever container or stream you need it, with no intermediate copying. Just my 2c -Pavel |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Nov 04 09:42AM +0100 On 04.11.2018 05:49, Pavel wrote: > In practice, organizations still use cow version of "std::string" which is less > efficient than vector in both memory and time especially for short strings -- > unless its cow feature is needed. You mean, less efficient unless at some point it's copied. Have you timed this, for an optimized build? Cheers!, - Alf |
Jorgen Grahn <grahn+nntp@snipabacken.se>: Nov 04 05:08PM On Sat, 2018-11-03, Paul wrote: > 6c696b65206120706f69736f6e6f7573206d757368726f6f6d > should produce > SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t It's a confusion in terminology, and in levels of abstraction. You (or that site) says the source is "hex", but hexadecimal notation is just a sometimes convenient way of visualizing numbers as text. It's common to think of memory as a sequence of bytes ("raw bytes"), and to visualize them as hex, but that doesn't mean memory /is/ hex. > I'm confused by the instruction: "Always operate on raw bytes, never > on encoded strings. Only use hex and base64 for pretty-printing." Me too. I can only guess what that means (unless it's Python-specific like someone implied). Let's formulate a better exercise: Base64 encodes a sequence of 8-bit bytes[0] as ASCII[1] text in a fairly compact manner, according to RFC <something>. Implement it, as one of the functions:

void encode(std::ostream& os, const void* data, std::size_t len);

// *it must be something that can be cast to unsigned char
template<class FwdIterator>
void encode(std::ostream& os, FwdIterator begin, FwdIterator end);

Also write unit tests. /Jorgen [0] IIRC you can encode a sequence of bits too, e.g. four or 81 bits, but I think you can ignore that possibility. [1] Perhaps one shouldn't assume ASCII ... -- // Jorgen Grahn <grahn@ Oo o. . . \X/ snipabacken.se> O o . |
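A sketch of the first of those signatures, assuming the standard RFC 4648 alphabet with '=' padding and no line wrapping (error handling and the bit-sequence generalisation from [0] are deliberately ignored):

-------------------------
#include <cstddef>
#include <iostream>
#include <ostream>

// Encode len bytes starting at data as base64 (RFC 4648 alphabet, '=' padding).
void encode(std::ostream& os, const void* data, std::size_t len)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const unsigned char* p = static_cast<const unsigned char*>(data);

    for (std::size_t i = 0; i < len; i += 3) {
        unsigned n = p[i] << 16;                    // pack up to three bytes
        if (i + 1 < len) n |= p[i + 1] << 8;        // into one 24-bit group
        if (i + 2 < len) n |= p[i + 2];

        os << tbl[(n >> 18) & 0x3F] << tbl[(n >> 12) & 0x3F];   // four 6-bit
        os << (i + 1 < len ? tbl[(n >> 6) & 0x3F] : '=');       // output
        os << (i + 2 < len ? tbl[n & 0x3F] : '=');              // characters
    }
}

int main()
{
    const char msg[] = "Man";               // the classic example; prints TWFu
    encode(std::cout, msg, sizeof msg - 1);
    std::cout << '\n';
    return 0;
}
-------------------------

The unit tests would pin down the padding cases (input lengths of 3n+1 and 3n+2) and the example string from the exercise above.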
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Nov 04 01:43PM -0500 Alf P. Steinbach wrote: >> efficient than vector in both memory and time especially for short strings -- >> unless its cow feature is needed. > You mean, less efficient unless at some point it's copied. Almost; to be precise, "unless it's copied and not changed thereafter". For example, some people pass string by value instead of const reference in function parameters to save an indirection (which is a big part of why they got stuck with cow string). > Have you timed this, for an optimized build? Not recently, no. I did, some 9-10 years ago, while building a symbol store. I only recall that the results were largely consistent with my expectations; but the winner was neither vector nor string but a custom-built fixed-size string, copied by value (again, I was mainly concerned with short strings at the time). > Cheers!, > - Alf -Pavel |
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Nov 04 01:52PM -0500 Paul wrote: > should produce > SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t > I'm confused by the instruction: "Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing." My guess is that they meant to emphasize that the solution should base64-encode the bytes decoded from hex encoding rather than the hex-encoded bytes themselves. This seems like stating the obvious; but is it possible some takers just jumped into base64 encoding and forgot to hex-decode? > on raw bytes" in a C++ context? > Thank you, > Paul HTH -Pavel |