soft and program: Digest for comp.lang.c++@googlegroups.com


Structured binding considered harmful - 13 Updates
My own encryption program - 1 Update
Bind const L-value reference to function return value - 11 Updates

Sam <sam@email-scan.com>: Mar 18 06:30AM -0400

Bonita Montero writes:

>> tuples will also work, if it becomes necessary. You'll be surprised to learn
>> that the above line of code will remain unchanged.

> This ease doesn't justify the lesser readability.

I find it quite the opposite. Verbose code that spells out the data type of
each variable results in too much visual clutter that's taken up by minor
details that are largely irrelevant to the task at hand. When trying to
understand existing code it's more readable to me when the bulk of it
describes what it does, instead of how it does it.
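The following is a minimal C++17 sketch of the property discussed above: a structured binding line can stay unchanged when the returned type changes between a `std::pair`, a `std::tuple` and a plain struct. The functions `lookup_pair`, `lookup_tuple` and `lookup_struct` and the struct `Entry` are hypothetical stand-ins, not code from the thread.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical data sources returning the same two fields in three shapes.
std::pair<std::string, int> lookup_pair() { return {"alpha", 1}; }
std::tuple<std::string, int> lookup_tuple() { return std::make_tuple(std::string("beta"), 2); }

struct Entry {  // plain aggregate: bindings map to its public members in order
    std::string name;
    int id;
};
Entry lookup_struct() { return {"gamma", 3}; }

// The binding line is identical for all three return types (C++17).
template <typename Source>
std::string describe(Source source) {
    auto [name, id] = source();
    return name + "#" + std::to_string(id);
}
```

The same property cuts both ways: nothing at the call site shows which of the three types is being decomposed.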

Bonita Montero <Bonita.Montero@gmail.com>: Mar 18 11:38AM +0100

> I find it quite the opposite. Verbose code that spells out the data type
> of each variable results in too much visual clutter that's taken up by
> minor details that are largely irrelevant to the task at hand. ...

The details aren't minor if you aren't the writer of the code or you haven't
looked at the code for a long time. I can understand the guy who said in
the auto thread that he suffered reading someone else's code that
overused auto.

Sam <sam@email-scan.com>: Mar 18 08:17AM -0400

Bonita Montero writes:

>> details that are largely irrelevant to the task at hand. ...

> The details aren't minor if you aren't the writer of the code or you haven't
> looked at the code for a long time.

Even if I did not write the code,

    for (auto &rectangle:container)
        draw(rectangle);

Tells me everything I need to know, without the visual noise of whether
"container" is a set or a vector. That is completely and utterly
irrelevant. And it would still be completely and utterly irrelevant if I
hadn't written it.
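Here is a small sketch to check the claim that the loop reads the same whatever the container is. `Rect`, `draw()` and `draw_all()` are hypothetical names introduced here, and `draw()` only sums areas so the effect is observable.

```cpp
#include <cassert>
#include <list>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical rectangle type; operator< exists only so std::set can store it.
struct Rect {
    int w, h;
    bool operator<(const Rect& o) const { return std::tie(w, h) < std::tie(o.w, o.h); }
};

// Hypothetical stand-in for real drawing: accumulates the area "drawn".
int total_area = 0;
void draw(const Rect& r) { total_area += r.w * r.h; }

// The loop never names the container type, so this one template works for
// std::vector, std::set, std::list or a built-in array.
template <typename Container>
void draw_all(const Container& container) {
    for (auto& rectangle : container)  // deduces const Rect& here
        draw(rectangle);
}
```

One caveat: `std::set` visits elements in sorted order and drops duplicates. When drawing order or repeated rectangles matter, the container type therefore does affect the outcome of the loop.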

> I can understand the guy who said in
> the auto thread that he suffered reading someone else's code that
> overused auto.

I can also understand that. After being set in your ways, and coding in
pre-C++11 for your entire working career, learning something new and
different, and adapting to it, can be tough.

But it can be done. I was in the same boat about eight years ago. But I
plowed through it, and did it. It took a little bit of time, but it was
doable. Not knowing the type of some object was, at first, a
fish-out-of-water experience, but at some point I figured out that, in
many cases, it's really irrelevant and I don't need to know. Now, having
been exposed to both coding styles, I like the auto style better. But I
can still work with the existing code base that spells everything out. I
can happily work with existing code in either style.

And if someone can't, I think someone else will, eventually, just take
their job, if it comes down to that. It might even be me.

Bonita Montero <Bonita.Montero@gmail.com>: Mar 18 01:19PM +0100

> draw(rectangle);
> Tells me everything I need to know, without the visual noise of whether
> "container" is a set or a vector.

You need to know it.

> "container" is a set or a vector. That is completely and utterly
> irrelevant. And it would still be completely and utterly irrelevant if I
> hadn't written it.

No, that's relevant if you want to understand the code.

[rest of stupid stuff unread]

Sam <sam@email-scan.com>: Mar 18 08:31AM -0400

Bonita Montero writes:

>> Tells me everything I need to know, without the visual noise of whether
>> "container" is a set or a vector.

> You need to know it.

I can decide for myself what I need to know.

>> irrelevant. And it would still be completely and utterly irrelevant if I
>> hadn't written it.

> No, that's relevant if you want to understand the code.

I will concede that some people do need to know that, in order to understand
the code.

But others won't.

Sucks to be the former.

Evolution.

Survival of the fittest.

I learned that in grade school.

> [rest of stupid stuff unread]

Because you have no response to industrial strength, 100%, truth.

That one hit too close to home?

Bonita Montero <Bonita.Montero@gmail.com>: Mar 18 02:15PM +0100

>>> whether "container" is a set or a vector.

>> You need to know it.

> I can decide for myself what I need to know.

You can't if you really want to understand the code.

> I will concede that some people do need to know that, in order to
> understand the code.
> But others won't.

If you're involved in the development of the code you need to know it.

> Because you have no response to industrial strength, 100%, truth.

Industrial strength tends to be more readable.
What you suggest is less readable.

Sam <sam@email-scan.com>: Mar 18 10:13AM -0400

Bonita Montero writes:

>>> You need to know it.

>> I can decide for myself what I need to know.

> You can't if you really want to understand the code.

That's just your own limitation, but you are assuming that everyone else
shares the same limited capacity to understand code. That is not the case.

>> Because you have no response to industrial strength, 100%, truth.

> Industrial strength tends to be more readable.
> What you suggest is less readable.

It's less readable to you, and you won't understand it. I'm sorry to hear
that. Someone else, who can read the code, will be happy to help you, if
needed.

Bonita Montero <Bonita.Montero@gmail.com>: Mar 18 03:45PM +0100

>> You can't if you really want to understand the code.

> That's just your own limitation, but you are assuming that everyone else
> shares the same limited capacity to understand code. That is not the case.

There's no limitation on my side. It's simply impossible to understand
the code without knowing the underlying container. So you have to find the
definition or declaration of the container, which would be unnecessary
work if the concrete type were spelled out.

> It's less readable to you, ...

No, to everyone.
No one has a crystal ball to guess the type of the container.

Sam <sam@email-scan.com>: Mar 18 11:36AM -0400

Bonita Montero writes:

>> That's just your own limitation, but you are assuming that everyone else
>> shares the same limited capacity to understand code. That is not the case.

> There's no limitation on my side.

But you just admitted to one. You freely admit that you can't understand any
code that uses a C++ container until you figure out what the container is.
That's a major limitation of one's ability to understand C++ code.

> It's simply impossible to understand
> the code without the underlying container.

For you, it is. Others don't share that limitation. I can understand it. So
can pretty much everyone else.

> So you have to find the
> definition or declaration of the container, which would be unnecessary
> work if the concrete type were spelled out.

No, you don't. You don't need to know what the container is when doing

    for (const auto &v:container)
    {
        if (v == 0)
        {
            ++number_of_zeroes;
        }
    }

If you don't understand what this does, without looking up what container
is, that's only because you don't know C++ very well.

This is a typical question given to a candidate for a C++ developer job, and
if you can't explain what this does, it is unlikely that you will get the
job. You admit that you have absolutely no clue what this does.

I'm sorry to hear that.

>> It's less readable to you, ...

> No, to everyone.

No, just you.

> No one has a crystal ball to guess the type of the container.

No one really needs to know the type of the container, in this instance.

Can I suggest some good C++ books, for you?

Bonita Montero <Bonita.Montero@gmail.com>: Mar 18 05:59PM +0100

>> There's no limitation on my side.

> ... That's a major limitation of one's ability to understand C++ code.

Everyone without your crystal ball has this limitation.

> ++number_of_zeroes;
> }
> }

v can be a type castable to an integer.

> No one really needs to know the type of the container, in this instance.

But in most other instances.

Sam <sam@email-scan.com>: Mar 18 01:18PM -0400

Bonita Montero writes:

>>> There's no limitation on my side.

>> ... That's a major limitation of one's ability to understand C++ code.

> Everyone without your crystal ball has this limitation.

Too bad, so sad. I guess you'll have to figure it out without having one,
then. Life's so unfair.

>> }
>> }

> v can be a type castable to an integer.

Or a suitable operator== overload. Please learn a little bit of C++.
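This sketch illustrates the `operator==` point: the zero-counting loop quoted earlier compiles for a type that is neither an integer nor convertible to one, provided a matching comparison exists. `Sample` and `count_zeroes` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical measurement type with no conversion to int.
struct Sample {
    double value;
};

// This overload alone is what makes `v == 0` valid for Sample.
bool operator==(const Sample& s, int n) { return s.value == n; }

// The loop from the thread, wrapped in a template over the container type.
template <typename Container>
int count_zeroes(const Container& container) {
    int number_of_zeroes = 0;
    for (const auto& v : container) {
        if (v == 0) {
            ++number_of_zeroes;
        }
    }
    return number_of_zeroes;
}
```

`std::count(container.begin(), container.end(), 0)` expresses the same loop and also works for `Sample`, because it compares each element with `==`.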

>> No one really needs to know the type of the container, in this instance.

> But in most other instances.

In most other instances almost everyone can figure this out too. Just not
you.

Bonita Montero <Bonita.Montero@gmail.com>: Mar 18 06:27PM +0100

>> v can be a type castable to an integer.

> Or a suitable operator== overload. Please learn a little bit of C++.

If you're arguing against yourself, there's nothing more to say.

> In most other instances almost everyone can figure this out too.
> Just not you.

I do it like everyone else, but this is an unnecessary effort.

Sam <sam@email-scan.com>: Mar 18 02:41PM -0400

Bonita Montero writes:

>>> v can be a type castable to an integer.

>> Or a suitable operator== overload. Please learn a little bit of C++.

> If you're arguing against yourself, there's nothing more to say.

Here are your toys. You can go home now.

>> In most other instances almost everyone can figure this out too.
>> Just not you.

> I do it like everyone else, but this is an unnecessary effort.

Not really much of an effort. You just have to know C++.

My own encryption program

Frederick Gotham <cauldwell.thomas@gmail.com>: Mar 18 10:59AM -0700

Yesterday I started writing my own encryption program. With all this talk of so-called 'quantum' computing and the possibility that our best algorithms might become brute-forceable, I've decided to start writing something new.

Having worked on a Linux embedded device with a 32-bit ARM processor, very limited disk space and very limited RAM, I have seen the real convenience and power of programs that are fully functional when reading from stdin. 'tar' and 'openssl' are good examples, as they don't seek on their input even when doing complex tasks.

On one of the devices I'm developing, tar is used to extract a file to stdout, openssl then decrypts that stream to stdout, and tar then extracts another, inner file from it. Because of the very limited disk space and RAM, this wouldn't be possible without the piping capabilities of tar and openssl.

My encryption program will encrypt blocks of 16 bytes at a time, and the scheme will be similar to CBC; however, I'll be doing something new which I haven't seen done before (at least not publicly).

One thing I'm taking into account is that I want my encryption and decryption to be /at least/ a third as fast as the mainstream algorithms (Twofish, AES, 3DES). So if AES can encrypt something in 5 minutes, I want mine to be able to do it within 15 minutes.

If I'm going to be encrypting big files, then I will want to compute a hash of the input while I'm processing it, so that after encryption you can check that the file wasn't corrupted at some point.

Where possible I will use multiple threads on multiple CPU cores. This actually isn't possible with CBC encryption, but if my idea works as I think it will, then parallel encryption will be possible while getting around the problem of discernible patterns (i.e. the 'Bitmap' problem in encryption).

In 'main', I immediately start a second thread for hashing the input. This 'hasher' thread will repeatedly call the 'consume_all' method on an object of type 'boost::lockfree::spsc_queue'.

The main thread will keep reading from stdin and push 16-byte data chunks into the aforementioned spsc_queue.

So, in the main thread, data will be repeatedly read in 16-byte chunks from stdin, and each chunk will be passed immediately to the hasher thread for asynchronous processing.
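The following is a dependency-free sketch of the reader/hasher split described above. It does not use `boost::lockfree::spsc_queue`. Instead, `SpscRing` is a hypothetical, minimal single-producer/single-consumer ring buffer whose `push` and `consume_all` loosely mirror that class's members. FNV-1a is only a placeholder hash: it is not cryptographic, and a real design would substitute something like SHA-256. The sketch also reads from a `std::istream&` rather than `std::cin` directly, so that it can be fed from a string.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <thread>

// One unit of work: up to 16 bytes of input (the last chunk may be short).
struct Chunk {
    std::array<unsigned char, 16> bytes{};
    std::size_t len = 0;
};

// Hypothetical minimal SPSC ring buffer holding at most N - 1 items.
// Exactly one thread may call push() and exactly one may call consume_all().
template <typename T, std::size_t N>
class SpscRing {
public:
    bool push(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[head] = item;
        head_.store(next, std::memory_order_release);  // publish the item
        return true;
    }

    // Hands every currently queued item to f; returns how many were consumed.
    template <typename F>
    std::size_t consume_all(F&& f) {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        std::size_t count = 0;
        for (; tail != head; tail = (tail + 1) % N, ++count) f(buf_[tail]);
        tail_.store(tail, std::memory_order_release);  // free the slots
        return count;
    }

private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

// Placeholder hash (64-bit FNV-1a); NOT suitable against deliberate tampering.
constexpr std::uint64_t kFnvOffset = 14695981039346656037ULL;
std::uint64_t fnv1a_update(std::uint64_t h, const unsigned char* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

// Main thread reads 16-byte chunks and queues them; a second thread hashes them.
std::uint64_t hash_stream_in_parallel(std::istream& in) {
    SpscRing<Chunk, 64> queue;
    std::atomic<bool> done{false};
    std::uint64_t hash = kFnvOffset;

    std::thread hasher([&] {
        auto eat = [&](const Chunk& c) { hash = fnv1a_update(hash, c.bytes.data(), c.len); };
        for (;;) {
            if (done.load(std::memory_order_acquire)) {
                queue.consume_all(eat);  // drain what is left, then stop
                break;
            }
            if (queue.consume_all(eat) == 0) std::this_thread::yield();
        }
    });

    Chunk c;
    const auto want = static_cast<std::streamsize>(c.bytes.size());
    while (in.read(reinterpret_cast<char*>(c.bytes.data()), want) || in.gcount() > 0) {
        c.len = static_cast<std::size_t>(in.gcount());
        while (!queue.push(c)) std::this_thread::yield();  // full: let the hasher catch up
        // ...here the main thread would encrypt `c` and write it to stdout.
    }
    done.store(true, std::memory_order_release);
    hasher.join();
    return hash;
}
```

Because exactly one thread writes `head_` and exactly one writes `tail_`, acquire/release ordering on those two indices is enough, and no lock is needed. The busy-wait with `std::this_thread::yield()` keeps the sketch short at the cost of CPU time.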

The main thread, working in parallel with the hasher thread, will then encrypt the data and write it to stdout. To avoid discernible patterns in the output, this is where the CBC scheme is normally used; the problem, however, is that CBC encryption can only be done by one thread and cannot be parallelised.
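To make the serial dependency concrete, here is a sketch of CBC chaining around a hypothetical toy block function. `toy_encrypt` is a plain XOR with the key and is completely insecure; it stands in for AES, Twofish or 3DES only so that the chaining structure is visible. The sketch also reflects a well-known property of CBC: decryption has no such dependency, because every ciphertext block it needs is already available. Only encryption is inherently serial.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Block = std::array<unsigned char, 16>;

Block xor_blocks(const Block& a, const Block& b) {
    Block out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<unsigned char>(a[i] ^ b[i]);
    return out;
}

// HYPOTHETICAL toy block "cipher": XOR with the key. Completely insecure;
// it only stands in for a real 16-byte block cipher.
Block toy_encrypt(const Block& in, const Block& key) { return xor_blocks(in, key); }
Block toy_decrypt(const Block& in, const Block& key) { return xor_blocks(in, key); }

// CBC encryption: C[i] = E(P[i] ^ C[i-1]), with C[-1] = IV.
// Every step needs the ciphertext of the previous step, so it is serial.
std::vector<Block> cbc_encrypt(const std::vector<Block>& plain, const Block& key, const Block& iv) {
    std::vector<Block> cipher;
    cipher.reserve(plain.size());
    Block prev = iv;
    for (const Block& p : plain) {
        prev = toy_encrypt(xor_blocks(p, prev), key);
        cipher.push_back(prev);
    }
    return cipher;
}

// CBC decryption: P[i] = D(C[i]) ^ C[i-1]. All ciphertext blocks are known up
// front, so the iterations are independent and could be split across threads.
std::vector<Block> cbc_decrypt(const std::vector<Block>& cipher, const Block& key, const Block& iv) {
    std::vector<Block> plain(cipher.size());
    for (std::size_t i = 0; i < cipher.size(); ++i) {
        const Block& prev = (i == 0) ? iv : cipher[i - 1];
        plain[i] = xor_blocks(toy_decrypt(cipher[i], key), prev);
    }
    return plain;
}
```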

I have two new ideas: one that will allow something similar to CBC that can be done in parallel without creating a pattern, and another that will make it difficult (if not impossible) to perform a brute-force attack even if a technique is devised for brute-forcing the likes of AES, Twofish and 3DES.

Does anyone have any thoughts on this?

Is spsc_queue suitable for doing the hashing in parallel with the encryption?

Bind const L-value reference to function return value

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Mar 18 07:13AM +0100

On 17.03.2020 19:07, Felix Palmen wrote:
> seems to be something handling filenames. The thing is: A filename on a
> Unix system is just an array of octets, with no encoding information
> attached or implied.

The "implied" is incorrect.

Today the implied encoding of a Linux filename is UTF-8.

I guess a lot of tools will have difficulties with filenames that are
invalid as UTF-8, even though technically, with respect to the OS API,
they're valid filenames. Twenty years ago such names would have been fine;
that has changed since.
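Because the OS accepts any byte sequence, a tool that wants to treat a filename as UTF-8 has to validate it first. The following is a minimal C++17 sketch of such a check, following the UTF-8 definition in RFC 3629. `is_valid_utf8` is a hypothetical helper name. A Latin-1 encoded name such as `caf\xE9` fails the check.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns true if `bytes` is well-formed UTF-8 as defined by RFC 3629:
// no overlong encodings, no UTF-16 surrogates (U+D800..U+DFFF) and no
// code points above U+10FFFF.
bool is_valid_utf8(std::string_view bytes) {
    static constexpr char32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) { ++i; continue; }  // plain ASCII byte
        std::size_t len = 0;
        char32_t cp = 0;
        if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return false;                         // stray continuation or 0xF8..0xFF
        if (bytes.size() - i < len) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80) return false;  // must be 10xxxxxx
            cp = (cp << 6) | (cont & 0x3F);
        }
        if (cp < min_for_len[len]) return false;         // overlong form
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        i += len;
    }
    return true;
}
```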

> So it's UTF-8 IFF the file in question was named
> while a UTF-8 locale was in effect.

That's true as far as it goes, which is a micro-meter or so.

> encoded in UTF-16. That seems to be the reason why the Windows
> implementation returns a newly created object: It must convert the
> native Windows filename to an 8-bit string.

Right, but only because the library chooses to use UTF-16 internally in
Windows.
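This thread's title refers to the following pattern. In C++17's `std::filesystem`, `path::native()` returns a const reference to the stored native string, while `path::string()` returns `std::string` by value; on Windows, that value has to be produced by converting from UTF-16. The minimal sketch below shows that binding a const lvalue reference to the by-value result is still safe, because the temporary's lifetime is extended to that of the reference. `name_length` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper. path::string() returns std::string by value, so `s`
// binds to a temporary; the language extends that temporary's lifetime to
// match the reference, so using `s` afterwards is well defined.
std::size_t name_length(const fs::path& p) {
    const std::string& s = p.string();
    return s.size();
}

// By contrast, a pointer into the temporary is NOT extended:
//   const char* bad = p.string().c_str();  // dangles after this statement
// path::native() avoids the copy, but its type is std::wstring on Windows.
```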

IIRC the original filesystem v3 spec had that as a requirement.

With the standardization, instead the iostreams were outfitted with
constructors taking `std::filesystem::path`.
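The following sketch shows those C++17 stream constructors in use: `std::ofstream` and `std::ifstream` accept a `std::filesystem::path` directly, so the caller never has to choose between a narrow and a wide string form. `write_and_read_back` is a hypothetical helper, and its error handling is reduced to a boolean.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: writes `text` to `file`, then reads it back into `out`.
// Both streams are constructed straight from a std::filesystem::path, so no
// narrow/wide conversion is written by hand on either platform.
bool write_and_read_back(const fs::path& file, const std::string& text, std::string& out) {
    {
        std::ofstream os(file, std::ios::binary);
        if (!os) return false;
        os << text;
    }  // os is flushed and closed here
    std::ifstream is(file, std::ios::binary);
    if (!is) return false;
    out.assign(std::istreambuf_iterator<char>(is), std::istreambuf_iterator<char>());
    return true;
}
```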

> recurring PITA, as Windows API calls based on `char` (and also e.g. the
> Standard C library functions like `fopen()`) indeed use a Windows ANSI
> encoding, so information is lost.

No, information is only lost when the process Windows ANSI codepage is
not UTF-8.

Support for UTF-8 Windows ANSI process codepages was added in May last year.

> IMHO the only sane thing to do on
> Windows when you need 8bit strings is to enforce using UTF-8,

That has, in the last few years, become a good idea.

Because now also the system compiler, Visual C++, supports UTF-8 as
execution character set, as well as default source encoding.

It's still worth noting that without setting the UTF-8 process codepage

* `main` arguments will be ANSI encoded,
* ditto for `char` based environment variables,
* narrow output to wide streams will be treated as ANSI,
* narrow filenames will be assumed to be ANSI,
* all the locale dependent functionality will be ungood.

> and convert it to/from UTF-16 whenever needed to use exclusively the
> wide-string APIs.

That used to be my advice for Windows.

However, if one doesn't have to support Windows versions earlier than
May 2019, then there is no need to do all this conversion.

- Alf

felix@palmen-it.de (Felix Palmen): Mar 18 08:16AM +0100

> The "implied" is incorrect.

> Today the implied encoding of a Linux filename is UTF-8.

Wrong. The filename is an opaque octet sequence. Programs on Linux (as on
other POSIX systems) start up in the C locale, where characters with bit 7
set aren't even defined, and will happily use any other encoding specified
in the locale. So there's nothing implied. "Implied" doesn't mean "what the
vast majority of users configure". For a sane program, it's
reasonable to assume that all filenames follow the encoding of the
current locale (cause that's the best thing it can do). It's not
reasonable to just assume UTF-8.
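The idea of following the current locale can be sketched on POSIX systems as follows. `std::setlocale(LC_CTYPE, "")` adopts the locale named by the environment (`LC_ALL`, `LC_CTYPE` or `LANG`), and `nl_langinfo(CODESET)` from `<langinfo.h>` reports its character encoding, for example "UTF-8" or "ISO-8859-1". The exact spelling depends on the C library. `locale_codeset` is a hypothetical helper, and `<langinfo.h>` does not exist on Windows.

```cpp
#include <cassert>
#include <clocale>
#include <langinfo.h>  // POSIX only
#include <string>

// Returns the codeset of the locale configured in the environment.
// Every C/C++ program starts in the "C" locale; setlocale(LC_CTYPE, "")
// switches LC_CTYPE to whatever the environment specifies. If that locale
// is not installed, setlocale returns nullptr and the "C" locale stays.
std::string locale_codeset() {
    std::setlocale(LC_CTYPE, "");
    const char* cs = nl_langinfo(CODESET);
    return cs ? std::string(cs) : std::string();
}
```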

>> encoded in UTF-16.

> Right, but only because the library chooses to use UTF-16 internally in
> Windows.

No, because Windows uses UTF-16 internally and in all API calls for
anything Unicode. The library just decided to follow this design
decision.

>> encoding, so information is lost.

> No, information is only lost when the process Windows ANSI codepage is
> not UTF-8.

Which is almost always the case. Windows always uses its 8bit single
character codepages like eg CP-1252.

> Support for UTF-8 Windows ANSI process codepages was added in May last year.

Sure, but AFAIK you can't configure it system-wide, and, more
importantly, it can't be made the default for the all-sacred backwards
compatibility (and you can't use it anyways if you want your program to
work with slightly older versions of Windows).

> * narrow output to wide streams will be treated as ANSI,
> * narrow filenames will be assumed to be ANSI,
> * all the locale dependent functionality will be ungood.

For all these issues, there are functions in either the Windows API
(like e.g. GetCommandLineW()) or the C runtime (like e.g. _wfopen()). Of
course, this means a bit of conditional compilation is needed for
portability to Windows.
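The following is a minimal sketch of that conditional compilation. It assumes the program keeps all its strings in UTF-8 internally. On Windows, the name is converted with `MultiByteToWideChar(CP_UTF8, ...)` and passed to the C runtime's `_wfopen()`; elsewhere, it goes straight to `std::fopen()`. `open_utf8` is a hypothetical helper, and its error handling is minimal.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: opens a file whose name and mode are UTF-8 strings.
std::FILE* open_utf8(const std::string& utf8_name, const char* mode) {
#ifdef _WIN32
    // Convert UTF-8 to UTF-16 and use the wide CRT function, so the name
    // never passes through the process ANSI codepage.
    auto widen = [](const std::string& s) {
        std::wstring w;
        if (s.empty()) return w;
        const int n = MultiByteToWideChar(CP_UTF8, 0, s.data(), static_cast<int>(s.size()), nullptr, 0);
        if (n <= 0) return w;  // conversion failed: give up (sketch only)
        w.resize(static_cast<std::size_t>(n));
        MultiByteToWideChar(CP_UTF8, 0, s.data(), static_cast<int>(s.size()), &w[0], n);
        return w;
    };
    return _wfopen(widen(utf8_name).c_str(), widen(mode).c_str());
#else
    // POSIX: filenames are byte strings, so the UTF-8 bytes are passed as-is.
    return std::fopen(utf8_name.c_str(), mode);
#endif
}
```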

> However, if one doesn't have to support Windows versions earlier than
> May 2019, then there is no need to do all this conversion.

Sure. And that's a good thing. Let's start talking about this in another
5 years. People still use Windows 7 right now, ignoring all sanity...

--
Dipl.-Inform. Felix Palmen <felix@palmen-it.de> ,.//..........
{web} http://palmen-it.de {jabber} [see email] ,//palmen-it.de
{pgp public key} http://palmen-it.de/pub.txt // """""""""""
{pgp fingerprint} A891 3D55 5F2E 3A74 3965 B997 3EF2 8B0A BC02 DA2A

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Mar 18 08:30AM +0100

On 18.03.2020 08:16, Felix Palmen wrote:
>> The "implied" is incorrect.

>> Today the implied encoding of a Linux filename is UTF-8.

> Wrong.

That's sort of braindead. We've already discussed this. I hope I will
never use an app made by you.

> The filename is an opaque octet sequence.

Yes, at the OS API level it is.

> reasonable to assume that all filenames follow the encoding of the
> current locale (cause that's the best thing it can do). It's not
> reasonable to just assume UTF-8.

Again, braindead.

> No, because Windows uses UTF-16 internally and in all API calls for
> anything Unicode. The library just decided to follow this design
> decision.

The "No" is meaningless, the rest is correct but doesn't support the "no".

Again, braindead.

>> not UTF-8.

> Which is almost always the case. Windows always uses its 8bit single
> character codepages like eg CP-1252.

You have just been informed otherwise, in the posting you're responding to.

Sounds like irrational denial to me.

>> Support for UTF-8 Windows ANSI process codepages was added in May last year.

> Sure, but AFAIK you can't configure it system-wide,

You can.

You're just making this up as you go, aren't you?

> importantly, it can't be made the default for the all-sacred backwards
> compatibility (and you can't use it anyways if you want your program to
> work with slightly older versions of Windows).

I failed to parse that, sorry.

> (like e.g. GetCommandLineW()) or the C runtime (like e.g. _wfopen()). Of
> course, this means a bit of conditional compilation is needed for
> portability to Windows.

Yes, there are workarounds, and they're worth noting.

>> May 2019, then there is no need to do all this conversion.

> Sure. And that's a good thing. Let's start talking about this in another
> 5 years. People still use Windows 7 right now, ignoring all sanity...

You shouldn't talk too loudly about the last word you used there.

- Alf

Jorgen Grahn <grahn+nntp@snipabacken.se>: Mar 18 07:35AM

On Wed, 2020-03-18, Felix Palmen wrote:
> reasonable to assume that all filenames follow the encoding of the
> current locale (cause that's the best thing it can do). It's not
> reasonable to just assume UTF-8.

I agree, but (IIRC, and e.g.) Debian no longer handles bug reports for
tools which hardcode UTF-8. At the same time they ship non-utf8
locales, so there are some contradictions.

My filenames are in Latin-1 and so is my LC_CTYPE. There are a few
tools which have problems with that. This may originate in some GUI
toolkit misfeature.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

felix@palmen-it.de (Felix Palmen): Mar 18 09:39AM +0100

* Alf P. Steinbach <alf.p.steinbach+usenet@gmail.com>:
[bullshit]

*plonk*


felix@palmen-it.de (Felix Palmen): Mar 18 09:45AM +0100

> I agree, but (IIRC, and e.g.) Debian no longer handles bug reports for
> tools which hardcode UTF-8. At the same time they ship non-utf8
> locales, so there are some contradictions.

I fail to see the contradiction here. Tools hardcoding UTF-8 will fail
in non-UTF-8 locales, so isn't it just consistent not to support them?

> My filenames are in Latin-1 and so is my LC_CTYPE. There are a few
> tools which have problems with that. This may originate in some GUI
> toolkit misfeature.

I don't see any practical reason any more *not* to use UTF-8 nowadays,
apart from maybe having to deal with some (commercial?) Unix systems
that still don't deal well with it. But that's a personal decision, and
any software blindly assuming UTF-8 while other encodings are still
supported by the OS is just broken.


Jorgen Grahn <grahn+nntp@snipabacken.se>: Mar 18 09:36AM

On Wed, 2020-03-18, Felix Palmen wrote:
>> locales, so there are some contradictions.

> I fail to see the contradiction here. Tools hardcoding UTF-8 will fail
> in non-UTF-8 locales, so isn't it just consistent not to support them?

The contradiction would be shipping non-utf-8 locales, but not
handling bugs activated by those locales. But note that I haven't
investigated this in detail. I don't have a reference at hand for "no
longer handles bug reports", and I see the Debian Reference only
/recommends/ UTF-8.

> I don't see any practical reason any more *not* to use UTF-8 nowadays,
> apart from maybe having to deal with some (commercial?) Unix systems
> that still don't deal well with it.

My reasons are offtopic here, but basically I don't want to go through
and modify my archived data.

> But that's a personal decision, and
> any software blindly assuming UTF-8 while other encodings are still
> supported by the OS is just broken.

/Jorgen


James Kuyper <jameskuyper@alumni.caltech.edu>: Mar 18 10:02AM -0400

On 3/18/20 3:30 AM, Alf P. Steinbach wrote:
> On 18.03.2020 08:16, Felix Palmen wrote:
>> * Alf P. Steinbach <alf.p.steinbach+usenet@gmail.com>:
...
>>> Support for UTF-8 Windows ANSI process codepages was added in May last year.

>> Sure, but AFAIK you can't configure it system-wide,

> You can.

I realize that the two of you are enjoying insulting each other, so I'm
not sure you'd appreciate this interruption. However, this particular
issue is a simple matter of fact: it's either true or false, and can be
verified by anyone with a sufficiently up-to-date version of Windows.
That being the case, a more useful response would have explained
precisely what you need to do to configure it system-wide. Then he would
either have to concede that it is possible to do so, or report that it
doesn't work.
He could also change his mind and, instead of saying it can't be done,
say that there's some disadvantage to doing so. Since the importance of
such a disadvantage is entirely a matter of judgement rather than fact,
that would allow the two of you to go on arguing about it.

James Kuyper <jameskuyper@alumni.caltech.edu>: Mar 18 10:11AM -0400

On 3/18/20 3:16 AM, Felix Palmen wrote:
> * Alf P. Steinbach <alf.p.steinbach+usenet@gmail.com>:
...
>> not UTF-8.

> Which is almost always the case. Windows always uses its 8bit single
> character codepages like eg CP-1252.

The only things I use Windows for don't require use of anything other
than 7-bit ASCII, so I'm not at all familiar with such issues. But I'm
having trouble figuring out those two sentences. It seems to me that
they address different aspects of the same thing. Therefore, I would
expect either both sentences to use "almost always", or both sentences
to use "always". How can the codepage "almost always" be non-UTF8, but
"always" be 8bit single character codepages?

felix@palmen-it.de (Felix Palmen): Mar 18 03:28PM +0100

> expect either both sentences to use "almost always", or both sentences
> to use "always". How can the codepage "almost always" be non-UTF8, but
> "always" be 8bit single character codepages?

Sloppy phrasing from my side, sorry. Setting a language in Windows, the
8bit character set is *always* set to something like e.g. CP-1252 (which
is a single-byte character set similar to ISO-8859-1). AFAIK, in some
special cases, double-byte encodings are used as well (Chinese etc.), but
not UTF-8.

The "almost always" slipped in because there actually *is* a UTF-8
codepage available in newer versions of Windows, which the application
might use. As this is only available in newer versions and isn't default
either, it's useless in practice (which might change in the future of
course). Even if they really added a way to set it system-wide recently,
it will take years until you can safely assume you can use it when
targeting Windows.

Until then, in a portable program using 8bit characters, you'll probably
continue to do a lot of conversions when building for Windows.


cdalten@gmail.com: Mar 18 08:40AM -0700

FIGHT FIGHT


Wednesday, March 18, 2020

Digest for comp.lang.c++@googlegroups.com - 25 updates in 3 topics
