Friday, February 8, 2019

Digest for comp.lang.c++@googlegroups.com - 25 updates in 5 topics

"Öö Tiib" <ootiib@hot.ee>: Feb 08 12:51AM -0800

On Thursday, 7 February 2019 18:46:53 UTC+2, Marcel Mueller wrote:
 
> OK, I can write a template helper function to do a safe conversion from
> std::array<T,N>& to T(&)[N]. But this can no longer be constexpr because
> of the cast.
 
Write a helper function template that takes a reference to the
implementation-specific member (like "__elems_" in libc++)
directly. If you target multiple platforms then choose what
its name is with the preprocessor. That member is required by
the standard to exist and to be public.
 
There are only a handful of viable standard library implementations
and they don't rename the member very often, so it will take half
an hour to get the majority covered. The only corner case is
zero-sized std::arrays (those are allowed to lack that member),
but otherwise it is pretty much a piece of cake.
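
Roughly like this (a sketch only; the member names __elems_ for libc++
and _M_elems for newer libstdc++ are what their current headers use, so
verify them against the library versions you actually target):

#include <array>
#include <cstddef>

template<class T, std::size_t N>
constexpr auto as_c_array(std::array<T, N>& a) noexcept -> T (&)[N]
{
#if defined(_LIBCPP_VERSION)
    return a.__elems_;      // libc++
#elif defined(__GLIBCXX__)
    return a._M_elems;      // libstdc++
#else
#   error "std::array member name not known for this standard library"
#endif
}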

> Why does std::array<T,N>::data not return T(&)[N] which implicitly
> converts to T* if the length information is not needed? The internal
> storage is of type T[N] anyway.
 
It is issue 930 with status "not a defect".
https://cplusplus.github.io/LWG/issue930
So it won't change anytime soon and therefore I would go with what
I suggested above.
James Kuyper <jameskuyper@alumni.caltech.edu>: Feb 08 07:59AM -0500

On 2/8/19 03:51, Öö Tiib wrote:
> directly. If you target multiple platforms then choose what
> it's name is with preprocessor. That member is required to
> exist and to be public by standard.
 
Citation, please? 26.3.7.1p3 doesn't list any public data members for
std::array<>, as far as I can see, and neither do any of the general
container requirements (tables 83, 84, 85, 86, and 87).
 
> https://cplusplus.github.io/LWG/issue930
> So it won't change anytime soon and therefore I would go with what
> I suggested above.
 
That issue mentions std::array::elems, which was marked "for exposition
only". "for exposition only" means precisely that - it does NOT mandate
that std::array be implemented in that manner, only that it have the
same observable behavior as if it did so. The identifier "elems" is not
present anywhere in the n4659.pdf, the latest draft version of the C++
standard that I have on my desktop machine, essentially equivalent to
C++2017. Perhaps the committee decided that it was misleading to explain
the behavior of std::array in terms of elems?
"Öö Tiib" <ootiib@hot.ee>: Feb 08 06:16AM -0800

On Friday, 8 February 2019 15:00:02 UTC+2, James Kuyper wrote:
 
> Citation, please? 26.3.7.1p3 doesn't list any public data members for
> std::array<>, as far as I can see, and neither do any of the general
> container requirements (tables 83, 84, 85, 86, and 87).
 
std::array is required to be an aggregate, and the requirements for
aggregates forbid private or protected non-static data members.
Sure, in theory it may be a purely magical thing turned on by the
incantation "#include <array>", but such implementations
don't exist.
 
> standard that I have on my desktop machine, essentially equivalent to
> C++2017. Perhaps the committee decided that it was misleading to explain
> the behavior of std::array in terms of elems?
 
Yes, the name of that member is nowhere required, but we can
aggregate-initialize it, so the only thing left to do is to look its
name up in the library code.
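
To illustrate the aggregate argument, the following has to compile on
every conforming implementation, and that is only possible if the
element-storage data member (whatever it is named) is public:

#include <array>

constexpr std::array<int, 3> a = {1, 2, 3};   // aggregate initialization
static_assert(a[1] == 2, "");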
Marcel Mueller <news.5.maazl@spamgourmet.org>: Feb 08 05:25PM +0100

Am 08.02.19 um 09:51 schrieb Öö Tiib:
> an hour to get majority covered. The only corner case is
> when you have zero-sized std::arrays (those are allowed to lack
> that member) but otherwise it is kind of piece of cake.
 
Hmm, not that pretty. Especially for open source software which may be
compiled on quite different platforms, including ones that I cannot test.
 
For now I don't rely on constexpr, so I can live with the cast.
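
(For reference, such a cast-based helper might look roughly like this;
it assumes that std::array stores its elements as a plain T[N], which
holds on every known implementation but is not guaranteed by the
standard, and the reinterpret_cast is what rules out constexpr:)

#include <array>
#include <cstddef>

template<class T, std::size_t N>
T (&as_plain_array(std::array<T, N>& a))[N]
{
    static_assert(sizeof(std::array<T, N>) == sizeof(T[N]),
                  "layout assumption does not hold");
    return reinterpret_cast<T (&)[N]>(*a.data());
}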
 
 
Marcel
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 08 07:51AM +0100

On 07.02.2019 22:08, Daniel wrote:
>> // Source encoding: UTF-8 with BOM
 
> UTF-8 with BOM? In [rfc8259](https://tools.ietf.org/html/rfc8259), there are only two occurrences of the key word "MUST NOT", one to define the term, and one to state "Implementations MUST NOT add a byte order mark (U+FEFF) to the
> beginning of a networked-transmitted JSON text."
 
When in Rome, do as the Romans.
 
Still, the RFC that allows JSON implementations to be
non-standard-conforming in their treatment of UTF-8, allowing them to
treat a BOM as an error¹, is necessarily a statement of politics, the
aspect of belonging to a social group that at least partially is bound
together by the adoption of a collection of zealot ideas, and not sound
engineering.
 
I'd let Donald Trump whip those idiot 1990s-Linux fanbois, if I could.
 
 
Cheers!,
 
- Alf
 
¹ Quote: "implementations that parse JSON texts MAY ignore the presence
of a byte order mark rather than treating it as an error."
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 08 08:01AM +0100

On 07.02.2019 22:56, David Brown wrote:
>> `CPPX_ASCII_PLEASE` above. One would not want 65 536 library variants.
 
>> So, what's the new way to do things?
 
> Can it be done with "if constexpr (CPPX_ASCII_PLEASE) " ?
 
I don't think so.
 
 
> Can you have template constexpr variables?
 
Yes, but the point is that the same client code may need to be compiled
for an environment where the UTF-8 encoded symbols won't work.
 
 
> Put them in a templated struct?
 
That's a good idea, thanks. One might conceivably provide a compile-time
choice constant as a custom configuration module. It would be a hack-like
thing, much like leaving the definition of a function to client code in
the current code (or providing a weakly linked default), but it could
indicate a direction for a language-supported standard solution, maybe?
 
Anyway it's a Better Way™. :)
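
A minimal sketch of the idea (the names here are invented for
illustration, they are not the actual library's):

template<bool ascii_only>
struct symbols
{
    static constexpr auto bullet      = "\xE2\x80\xA2";   // UTF-8 bytes of the bullet
    static constexpr auto right_arrow = "\xE2\x86\x92";   // UTF-8 bytes of the arrow
};

template<>
struct symbols<true>               // Fallback for environments without UTF-8.
{
    static constexpr auto bullet      = "*";
    static constexpr auto right_arrow = "->";
};

// Client code picks the variant once, e.g. in a small config header/module:
constexpr bool ascii_please = false;           // or true, per environment
using sym = symbols<ascii_please>;
// ... std::cout << sym::bullet << " item\n";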
 
 
> Have them both as two namespaces, and let the module user pick the
> namespace they want?
 
Nope.
 
 
> Scrap all encodings except proper UTF-8 (which has no BOM)?
 
Proper UTF-8 supports a BOM, and it's still necessary for portable code.
When or if Microsoft sees fit to equip Visual Studio with a GUI way to
select the `/utf-8` option for the source code, one may start thinking
about using BOM-less UTF-8 also for portable code. Until then such source
code, if it contains text literals, is likely to be misinterpreted in a
Visual Studio project.
 
 
 
> I don't know if any of these solutions are possible - I haven't looked
> at modules in detail.  But perhaps they will put you onto a solution
> that would work.
 
The templated struct thing sounds good. Or, well, not ideal, but much
gooder. :)
 
 
Cheers!,
 
- Alf
Daniel <danielaparker@gmail.com>: Feb 08 01:25AM -0800

On Friday, February 8, 2019 at 1:51:26 AM UTC-5, Alf P. Steinbach wrote:
> aspect of belonging to a social group that at least partially is bound
> together by the adoption of a collection of zealot ideas, and not sound
> engineering.
 
From an engineering point of view, BOMs serve no purpose. But as a
practical matter, they exist, so it behooves parsers to ignore them.
 
Daniel
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 08 11:00AM +0100

On 08.02.2019 10:25, Daniel wrote:
>> together by the adoption of a collection of zealot ideas, and not sound
>> engineering.
 
> From an engineering point of view, BOM's serve no purpose.
 
You state, in the form of an incorrect assertion that presumably I
should rush to correct (hey, someone's wrong on the internet!), that you
don't know any purpose of a BOM.
 
OK then.
 
A BOM serves two main purposes:
 
* It identifies the general encoding scheme (UTF-8, UTF-16 or UTF-32),
with high probability.
* It identifies the byte order for the multibyte unit encodings.
 
Since its original definition was a zero-width space it can be treated
as removable whitespace, and AFAIK that was the original intent.
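
For reference, the signature byte sequences involved (values from the
Unicode standard):

  UTF-8:     EF BB BF
  UTF-16 BE: FE FF          UTF-16 LE: FF FE
  UTF-32 BE: 00 00 FE FF    UTF-32 LE: FF FE 00 00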
 
As a practical matter, for JSON data it's probably best omitted in data
one produces and accepted in data one receives. Be strict in what one
produces, lenient in what one accepts, where the strictness is relative
to established standards or conventions for the relevant usage.
 
For C++ source code, if one wants GUI-oriented Visual Studio users to be
able to use the source code with a correct interpretation of the source
code bytes, e.g. of a • bullet symbol in a literal, then a BOM is
required. Otherwise the Visual C++ compiler will assume Windows
ANSI-encoding. Here the usage has the opposite convention of that for
JSON data.
 
Users like me can of course configure the compiler via textual command
line options, but most users, in my experience, don't delve into that.
 
And the idea of creating something that doesn't work by default, when it
could easily work by default, is IMO brain-dead, from incompetents. It
gets worse when those folks /insist/ that /others/ should create things
that don't work by default. That's why IMO they should be subjected to
Donald Trump's whips, if practically possible, to teach them a little.
 
 
> But as a
> practical matter, they exist, so it behooves parsers to ignore them.
 
Yes, agreed. :)
 
 
Cheers!,
 
- Alf
"Öö Tiib" <ootiib@hot.ee>: Feb 08 02:19AM -0800

On Friday, 8 February 2019 11:25:22 UTC+2, Daniel wrote:
> > engineering.
 
> From an engineering point of view, BOM's serve no purpose. But as a
> practical matter, they exist, so it behooves parsers to ignore them.
 
A DirtyUTF8-to-UTF8 converter/validator should erase those. One
needs a validator in external interfaces anyway, since new naïve
programmers keep sending wrong stuff in a supposedly UTF-8 stream
(like that double-dotted "ï" of their "naïvety" in Windows-1250) forever.
Such messages have to be rejected and the defective products blamed.
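
The BOM-erasing step of such a cleaner is trivial (a sketch; the real
work is validating and rejecting malformed sequences, not shown here):

#include <string_view>

inline std::string_view without_utf8_bom(std::string_view s)
{
    constexpr std::string_view bom = "\xEF\xBB\xBF";
    if (s.substr(0, bom.size()) == bom)
        s.remove_prefix(bom.size());
    return s;
}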
Ralf Goertz <me@myprovider.invalid>: Feb 08 12:27PM +0100

Am Fri, 8 Feb 2019 11:00:59 +0100
 
> A BOM serves two main purposes:
 
> * It identifies the general encoding scheme (UTF-8, UTF-16 or UTF-32),
> with high probability.
 
BOM for UTF-N with N>8 is fine IMHO. But as I understand it UTF-8 is
meant to be as compatible as possible with ASCII. So if you happen to
have a »UTF-8« file that doesn't contain any non-ASCII characters then
why should it have a BOM? This can easily happen, e.g. when you decide
to erase those fancy quotation marks I just used and replace them with
ordinary ones like in "UTF-8". Suddenly, the file is pure ASCII but has
an unnecessary BOM. If the file contains non-ASCII characters you'll
notice that soon enough. My favourite editor (vim) is very good at
detecting that without the aid of BOMs and I guess others are, too.
 
And BOMs can be a burden, for instance when you want to quickly
concatenate two files with "cat file1 file2 >outfile". Then you end up
with a BOM in the middle of a file, which doesn't conform to the
standard AFAIK.
 
> * It identifies the byte order for the multibyte unit encodings.
 
As I said, for those BOMs are fine.
 
> Since its original definition was a zero-width space it can be
> treated as removable whitespace, and AFAIK that was the original
> intent.
 
But they increase the file size, which can cause problems (in the
above-mentioned case of an ASCII-only UTF-8 file). I really don't
understand why UTF-8 has not become standard on Windows even after so
many years of its existence.
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Feb 08 01:45PM +0100

On 08.02.2019 12:27, Ralf Goertz wrote:
> to erase those fancy quotation marks I just used and replace them with
> ordinary ones like in "UTF-8". Suddenly, the file is pure ASCII but has
> an unnecessary BOM.
 
It's not unnecessary if the intent is to further edit the file, because
then it says which encoding should be used with this file.
 
Otherwise I'd just save as pure ASCII.
 
Done.
 
 
> If the file contains non-ASCII characters you'll
> notice that soon enough. My favourite editor (vim) is very good at
> detecting that without the aid of BOMs and I guess others are, too.
 
Evidently vim doesn't have to relate to many Windows ANSI encoded files,
where all byte sequences are valid.
 
It's possible to apply statistical measures over large stretches of
text, but these are necessarily grossly inefficient compared to just
checking three bytes, and that efficiency versus inefficiency counts for
tools such as compilers.
 
For an editor that loads the whole file anyway, and also has an
interactive user in front that can guide it, maybe it doesn't matter.
 
 
> concatenate two files with "cat file1 file2 >ouftile". Then you end up
> with a BOM in the middle of a file which doesn't conform to the
> standard AFAIK.
 
Binary `cat` is a nice tool when it's not misapplied.
 
I guess the argument, that you've picked up from somebody else, is that
it's plain impossible to make a corresponding text concatenation tool.
 
 
>> intent.
 
> But they increase the file size which can cause problems (in the above
> mentioned case of an ASCII only UTF-8 file).
 
Not having the BOMs for files intended to be used with Windows tools
causes problems of correctness.
 
In the above mentioned case the "problem" of /not forgetting the
encoding/ sounds to me like turning black to white and vice versa.
 
I'd rather /not/ throw away the encoding information, and would see the
throwing-away, if that were enforced, as a serious problem.
 
 
> I really don't understand
> why UTF-8 has not become standard on Windows even after so many years of
> it's existence.
 
As I see it, a war between Microsoft and other platforms, where they try
their best to subtly and not-so-subtly sabotage each other.
 
Microsoft does things like not supporting UTF-8 in Windows consoles
(input doesn't work at all for non-ASCII characters), and not supporting
UTF-8 locales in Windows, hiding the UTF-8 sans BOM encoding far down in
a very long list of useless encodings in the VS editor's GUI for
encoding choice, letting it save with system-dependent Windows ANSI
encoding by default, and even (Odin save us!) using that as the default
basic execution character set in Visual C++ -- a /system dependent/
encoding as basic execution character set.
 
*nix-world folks do things such as restricting the JSON format, in a
newer version of its RFC, to UTF-8 without a BOM, permitting a BOM to be
treated as an error.
 
Very political, as I see it.
 
Not engineering.
 
 
Cheers!,
 
- Alf
Manfred <noname@add.invalid>: Feb 08 02:18PM +0100

On 2/8/2019 11:00 AM, Alf P. Steinbach wrote:
>>> aspect of belonging to a social group that at least partially is bound
>>> together by the adoption of a collection of zealot ideas, and not sound
>>> engineering.
 
Not really - from RFC 3629 (the one that directly matters to UTF-8,
unlike 8259):
> A protocol SHOULD forbid use of U+FEFF as a signature for those
> textual protocol elements that the protocol mandates to be always
> UTF-8, the signature function being totally useless in those
> cases.

> line options, but most users, in my experience, don't delve into that.
 
> And the idea of creating something that doesn't work by default, when it
> could easily work by default, is IMO brain-dead, from incompetents.
 
You have a good point, but it depends on what you mean by "work by
default", which in turn depends on the environment.
If you are in a Microsoft context, since they use a BOM everywhere (even
where they shouldn't), then you are right.
 
I ran into this with XML. For this, RFC 3629 is very clear:
> A protocol SHOULD also forbid use of U+FEFF as a signature for
> those textual protocol elements for which the protocol provides
> character encoding identification mechanisms, ...
 
Still, if you want to embed an XML resource with Visual Studio, it
forces the BOM into it.
In this context, since the standard is so clear, I would say that "work
by default" would mean without inserting a BOM.
 
That said, I find your choice, to state the source code encoding with a
comment in the very first line, to be clear and appropriate.
 
If I were to consider a theoretical alternative, it would be pure ASCII
source code with UTF-8 characters explicitly escaped in text strings:
since I don't like Unicode characters in identifiers (so no need for
UTF-8 for pure code), this would make it clear which text 'resources'
require UTF-8, and avoid the confusion of similar-looking characters in
favor of numeric codes.
But, since this is quite impractical with the tooling currently
available, it's just a theory.
 
Manfred <noname@add.invalid>: Feb 08 02:26PM +0100

On 2/8/2019 12:27 PM, Ralf Goertz wrote:
> I really don't understand
> why UTF-8 has not become standard on Windows even after so many years of
> it's existence.
 
From one side, recalling what Alf said, Microsoft has a history of
doing its best to be compatible only with itself.
From what I read around, there is even a culture traditionally
established within Microsoft of avoiding anything that is not "invented
by Microsoft".
 
More practically, they also have to keep backwards compatibility with
the huge legacy they have, both with the existing code base all around
the world and with existing executables that are tied to their existing
standard, so they're stuck with UTF-16 (or UCS-2, whatever it is).
jameskuyper@alumni.caltech.edu: Feb 08 06:26AM -0800

On Friday, February 8, 2019 at 2:01:48 AM UTC-5, Alf P. Steinbach wrote:
...
> Proper UTF-8 supports BOM, and it's still necessary for portable code.
 
Could you explain that, in light of Manfred's citation of
 
> > textual protocol elements that the protocol mandates to be always
> > UTF-8, the signature function being totally useless in those
> > cases.
 
If a protocol mandates that an element must always be UTF-8, what
benefit would be gained by using U+FEFF as a signature?
I can understand the use of U+FEFF in UTF-16, because that determines
how subsequent bytes of the encoded text are to be interpreted, but
that's not the case for UTF-8.
"Öö Tiib" <ootiib@hot.ee>: Feb 08 06:50AM -0800

> I can understand the use of U+FEFF in UTF-16, because that determines
> how subsequent bytes of the encoded text are to be interpreted, but
> that's not the case for UTF-8.
 
It can be that you are talking about different things. Alf seems to talk
about "text files", where a BOM indeed might help to figure out what kind
of "text file" it is. You seem to talk about a UTF-8 field in a
transmission protocol. There a BOM is a useless not-an-error.
Daniel <danielaparker@gmail.com>: Feb 08 07:06AM -0800

On Friday, February 8, 2019 at 5:01:11 AM UTC-5, Alf P. Steinbach wrote:
 
> * It identifies the general encoding scheme (UTF-8, UTF-16 or UTF-32),
> with high probability.
 
> * It identifies the byte order for the multibyte unit encodings.
 
On the contrary, it's utterly redundant. Given the first four bytes of
UTF-8, UTF-16(LE), UTF-16(BE), UTF-32(LE), or UTF-32(BE) encoded text, the
encoding can be detected with equal reliability.
 
There was a very long discussion on the mailing list leading up to RFC
8259 about whether the JSON data interchange specification should provide
a statement of that algorithm (there are some subtleties), but it was
dropped when it was decided to restrict data interchange to UTF-8 only.
 
Daniel
jameskuyper@alumni.caltech.edu: Feb 08 07:31AM -0800

On Friday, February 8, 2019 at 9:51:07 AM UTC-5, Öö Tiib wrote:
> "text files" where BOM indeed might help to figure out what kind of "text file"
> it is. You seem to talk about UTF8 field in transmission protocol.
> There BOM is useless not-an-error.
 
I deliberately worded my message to match the wording of RFC 3629. It
seems to me that it's perfectly feasible for a protocol, in the most
general sense of the term (I don't know whether RFC 3629 uses it in that
sense), to include a textual element stored in a file. If that protocol
mandates that the file always be encoded in UTF-8, then the clause from
RFC 3629 quoted above is applicable.
Manfred <noname@add.invalid>: Feb 08 04:44PM +0100

> sense), to include a textual element stored in a file. If that protocol
> mandates that the file always be encoded in UTF-8, then the clause from
> RFC 3629 quoted above is applicable.
 
There has been some mix-up between generic files and specific protocols
in this thread:
The piece of mine you quoted was in reply to a comment by Alf about
JSON. JSON is a protocol (either as a network transfer or a file format),
and since the JSON spec dictates UTF-8, a BOM should not be there.
 
Back to the original post from Alf, this is about source code files, so
there is no specific protocol defined.
 
Finally, in the context of how Visual Studio handles project items, I
mentioned XML files, for which an established protocol/format does
exist and which should have no BOM either, but VS keeps adding one.
Daniel <danielaparker@gmail.com>: Feb 08 07:54AM -0800

On Friday, February 8, 2019 at 9:51:07 AM UTC-5, Öö Tiib wrote:
 
> It can be that you are talking about different things. Alf seems to talk about
> "text files" where BOM indeed might help to figure out what kind of "text file"
> it is.
 
A BOM adds nothing over a detection mechanism; it's redundant. The
Unicode encodings UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32LE,
and UTF-32BE can be detected by inspecting the first octets for zeros.
Just two bytes will do in some cases; at most four are needed.
 
Daniel
Daniel <danielaparker@gmail.com>: Feb 08 08:04AM -0800

On Friday, February 8, 2019 at 10:55:02 AM UTC-5, Daniel wrote:
> encodings UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32LE, and UTF-32BE
> can be detected by inspecting the first octets for zeros. Just two bytes
> will do in some cases; at most four are needed.
 
Apologies, my mind is on JSON files, where the first character is always US-
ASCII. My comments above aren't applicable to other kinds of text files.
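
A sketch of that detection, following the zero-byte pattern described in
RFC 4627 section 3 (valid only because the first two characters of a
JSON text are known to be ASCII; not a general-purpose sniffer):

#include <cstddef>

const char* detect_json_encoding(const unsigned char* p, std::size_t n)
{
    if (n < 4)                                return "UTF-8";  // too short to tell
    if (p[0] == 0 && p[1] == 0 && p[2] == 0)  return "UTF-32BE";
    if (p[1] == 0 && p[2] == 0 && p[3] == 0)  return "UTF-32LE";
    if (p[0] == 0 && p[2] == 0)               return "UTF-16BE";
    if (p[1] == 0 && p[3] == 0)               return "UTF-16LE";
    return "UTF-8";
}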
 
Daniel
Daniel <danielaparker@gmail.com>: Feb 08 08:07AM -0800

On Friday, February 8, 2019 at 10:07:04 AM UTC-5, Daniel wrote:
 
> On the contrary, it's utterly redundant. Given the first four bytes of
> UTF-8, UTF-16(LE), UTF-16(BE), UTF-32(LE), or UTF-32(BE) encoded text, the
> encoding can be detected with equal reliability.
 
Apologies, my mind is on JSON files, where the first character is always US-
ASCII, and hence a detection mechanism is possible. That observation doesn't
apply to other kinds of text files.
 
Daniel
mvorbrodt@gmail.com: Feb 08 06:24AM -0800

Thanks to the many suggestions from people on here and on reddit, here's the event object implementation I came up with: http://blog.vorbrodt.me/?p=556
woodbrian77@gmail.com: Feb 07 05:33PM -0800

I've heard how ninja is awesome, so recently I decided
to check it out through the lens of my project:
https://github.com/Ebenezer-group/onwards
. I have 4 executables that I build there. With just
one core, ninja runs a lot faster than make: approximately
4 seconds with ninja vs 7 seconds with make. But if I use
-j4 with both builds, now ninja is over 10% slower than make.
I'm using meson to create the ninja files so maybe there's
something that could be done to improve things for ninja,
but it looks to me like make still has some life in it.
Rather than replacing my makefiles with ninja files, I'm
going to keep both.
 

 
Brian
Ebenezer Enterprises - Enjoying programming again.
http://webEbenezer.net
"Öö Tiib" <ootiib@hot.ee>: Feb 08 01:49AM -0800

> but it looks to me like make still has some life in it.
> Rather than replacing my makefiles with ninja files, I'm
> going to keep both.
 
Viable products that compile in less than half an hour on a single
core are getting rare. Only simpler embedded stuff is like that, and
one won't use ninja for those. The unit tests typically take somewhat
longer to run than compiling. Do you have unit tests?
 
If you target multiple platforms then CMake almost works; the only
platforms that will suck belong to Apple, and that is as expected ...
whoever targets those should have twice the budget too.
"Öö Tiib" <ootiib@hot.ee>: Feb 08 01:19AM -0800

On Friday, 8 February 2019 00:33:56 UTC+2, bitrex wrote:
 
> where the Logo-language-stuff is evaluated by the compiler and at
> run-time the actual image is generated by a sequence of primitive
> drawLine calls to a low-level API.
 
Stateful metaprogramming has been forbidden, with the verdict "arcane".
https://wg21.cmeerw.net/cwg/msg5556
With gcc and clang it doesn't work anymore, and with msvc it never did.
Research something else. ;)
