soft and program: Digest for comp.lang.c++@googlegroups.com

How to write wide char string literals? - 4 Updates
[ OT ] C - Open Standards - 8 Updates
Trying to understand pointers. Why does this give unexpected results? - 1 Update

How to write wide char string literals?

Juha Nieminen <nospam@thanks.invalid>: Jul 08 08:11AM

>> from the one you wanted. You essentially get garbage.

> You got precisely what you specified - if it's not what you wanted, you
> need to change your specification.

No, I didn't. I wanted a way to specify wide string literals, and that
solution was incorrect.

Juha Nieminen <nospam@thanks.invalid>: Jul 08 08:13AM

> And the solution for readability is to just write the code as native
> literals, but NOT as the actual C++ file, and have a filter stage that
> translates this file into the actual C++ code with the escapes.

Clearly you have never written unit tests.

James Kuyper <jameskuyper@alumni.caltech.edu>: Jul 08 05:52AM -0400

On 7/8/21 4:11 AM, Juha Nieminen wrote:
>> need to change your specification.

> No, I didn't. I wanted a way to specify wide string literals, and that
> solution was incorrect.

Paavo Helde's solution of using "\xC2\xA9" was correct for narrow string
literals (on systems with CHAR_BIT==8, a requirement that he didn't
bother mentioning). He was relying upon a UTF-8 => UTF-16 conversion
routine of his own creation to get the corresponding wide string.

You asked whether L"\xC2\xA9" would work, and the answer is "No",
because it specifies two wide characters when only one is desired. You
were aware that it wouldn't work, but seemed to be suggesting that
there's a potentially faulty UTF-8=>UTF-16 conversion involved in its
failure to be correct. There is no such conversion. L"\xC2\xA9"
specifies directly a wchar_t array of length 3 initialized with {0xC2,
0xA9, 0}, which is not what you wanted.

I initially didn't address that point properly because I hadn't realized
that only one character was desired.
However, u"\xA9" or U"\xA9" would work fine; L"\xA9" should produce the
desired result on systems where wchar_t uses UCS2 or UCS4 (==UTF-32)
encoding.
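
The point about L"\xC2\xA9" can be checked at compile time. What follows is only a minimal sketch, assuming a C++11 or later compiler; the helper name `units` is made up for illustration and is not part of any library.

```cpp
#include <cstddef>

// Number of code units in a string literal, not counting the terminating null.
// (`units` is a hypothetical helper introduced for this sketch.)
template <typename CharT, std::size_t N>
constexpr std::size_t units(const CharT (&)[N]) {
    return N - 1;
}

// Each \x escape produces exactly one code unit of the literal's own type;
// nothing decodes "\xC2\xA9" as UTF-8 first.
static_assert(units("\xC2\xA9") == 2, "narrow: the two bytes of UTF-8 for U+00A9");
static_assert(units(L"\xC2\xA9") == 2, "wide: two wchar_t values, not one");
static_assert(L"\xC2\xA9"[0] == 0xC2 && L"\xC2\xA9"[1] == 0xA9,
              "the wide array holds {0xC2, 0xA9, 0}");

// A single escape gives the single code unit that was wanted.
static_assert(units(u"\xA9") == 1 && u"\xA9"[0] == 0xA9, "char16_t literal");
static_assert(units(U"\xA9") == 1 && U"\xA9"[0] == 0xA9, "char32_t literal");
```

If the file compiles, every claim above holds on that implementation. Whether L"\xA9" then denotes the copyright sign still depends on the wchar_t encoding, as noted above.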

James Kuyper <jameskuyper@alumni.caltech.edu>: Jul 08 03:56PM -0400

On 7/3/21 10:28 AM, Alf P. Steinbach wrote:
>> source character set would prevent those escapes from working is not.

> As far as I know nobody's argued that the source encoding assumption
> would prevent any escapes from working.

You said "It gets the wrong characters in the wide string literal,
period.", and other parts of the discussion implicated source encoding
assumptions as the reason why. The use of "period" implies no
exceptions, and there's a very large set of exceptions: at least two, and
as many as four, fully portable working escape sequences for every
single Unicode code point.

> [<<]

> Which it decidedly does.

> It's trivial to just try it out and see; QED.

I did try it: as he said, it can get the wrong character if the string
type isn't unicode encoded, and as I pointed out, it can also get the
wrong character if the wrong escape sequence is used (which seems
trivially obvious). But it's perfectly capable of giving the right
characters when the right escape sequence is used with a prefix that
mandates a unicode encoding.

By saying "... it gets the wrong characters ... period.", you were
denying that it's ever possible for it to get the right characters,
which is demonstrably false. I've tried out the sequences I specified in
the message you quoted above. They all work on my systems, and according
to my understanding of the standard, they're required to work on all
fully conforming implementations, regardless of source encoding
assumptions - if that's not the case, I want to know how the exceptions
can be justified.

...
> sequences (including universal character designators) are affected by
> the source encoding, but to me it has been about whether Juha's example
> yields the desired string, as he correctly surmised that it didn't.

Yes, but that's because it was the wrong escape sequence, not because
there's any inherent problem with using correct escape sequences for
that purpose.
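
As an illustration of "correct escape sequences", here is a sketch of several spellings of U+00A9 under the u and U prefixes, plus one code point outside the BMP. It assumes C++11 or later; the helper `units` is hypothetical, and the four spellings shown are only a guess at which ones were meant by "as many as four".

```cpp
#include <cstddef>

// Code units in a literal, excluding the terminating null (hypothetical helper).
template <typename CharT, std::size_t N>
constexpr std::size_t units(const CharT (&)[N]) {
    return N - 1;
}

// Four spellings of U+00A9 (COPYRIGHT SIGN) in a UTF-16 literal.
static_assert(u"\xA9"[0] == 0x00A9, "hexadecimal escape");
static_assert(u"\251"[0] == 0x00A9, "octal escape");
static_assert(u"\u00A9"[0] == 0x00A9, "4-digit universal character name");
static_assert(u"\U000000A9"[0] == 0x00A9, "8-digit universal character name");

// The universal character name is encoded according to the prefix.
static_assert(units(U"\u00A9") == 1, "UTF-32: one code unit");
static_assert(units(u8"\u00A9") == 2, "UTF-8: two code units");

// U+1F600 lies outside the BMP: one UTF-32 unit, a surrogate pair in UTF-16.
static_assert(units(U"\U0001F600") == 1, "UTF-32: one code unit");
static_assert(units(u"\U0001F600") == 2, "UTF-16: two code units");
static_assert(u"\U0001F600"[0] == 0xD83D && u"\U0001F600"[1] == 0xDE00,
              "high and low surrogate");
```

None of these depend on the encoding of the source file, since the literals contain only basic ASCII characters.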

[ OT ] C - Open Standards

Real Troll <real.troll@trolls.com>: Jul 08 01:00AM +0100

I have managed to find direct links to the official standard and they
are here:

<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1336.pdf>
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf>
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf>

I am not sure if there are any official standards after n1336.pdf.
Perhaps there are or perhaps there aren't unless you pay for them. Let
me know if there are any for free use.

Microsoft has defined what Open Standard Means:

Real Troll <real.troll@trolls.com>: Jul 08 01:20AM +0100

On 08/07/2021 01:00, Real Troll wrote:

I have now found the official download link to "ISO/IEC 9899:2018". The
link is here:

<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf>

"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Jul 08 03:53AM +0200

On 8 Jul 2021 02:00, Real Troll wrote:
>> parties and operate on a consensus basis. _*An open standard is
>> publicly available*_, and developed, approved and maintained via a
>> collaborative and consensus driven process.

N1256 (in your list) is the amalgamated C99 + TC1 + TC2 + TC3 document,
very nice.

I believe N1570 (not in your list) was the last draft of C11.

- Alf

Real Troll <real.troll@trolls.com>: Jul 08 02:30AM

On 08/07/2021 02:53, Alf P. Steinbach wrote:

> N1256 (in your list) is the amalgamated C99 + TC1 + TC2 + TC3
> document, very nice.

> I believe N1570 (not in your list) was the last draft of C11.

OK, thanks for pointing out N1570. I have found the official download
link, so the complete list is as follows:

<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf>
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf>
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1336.pdf>
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf>
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf>

Please let us know if anything else is missing from the list. The next
standard is 23xx and it won't be approved until 2023 at the latest
unless something drastic happens in the interim.

David Brown <david.brown@hesbynett.no>: Jul 08 08:46AM +0200

On 08/07/2021 02:00, Real Troll wrote:

> I am not sure if there are any official standards after n1336.pdf.
> Perhaps there are or perhaps there aren't unless you pay for them. Let
> me know if there are any for free use.

That's a useful list - thanks.

>> parties and operate on a consensus basis. _*An open standard is
>> publicly available*_, and developed, approved and maintained via a
>> collaborative and consensus driven process.

As usual, Microsoft has a somewhat different definition from other
people...

"Open standard" usually means that the standard is /available/ to anyone
who wants it - but not necessarily for free. There are a great many
open standards that are only available for a fee, or if you join the
relevant group. "Open" in this context means that anyone can get the
standards - there are no restrictions by country, company, contract,
etc. This also applies to the C and C++ standards, which are published
by ISO - anyone can get the standards, but you have to pay for them.
What is unusual (but /very/ nice) is that the ISO working groups here
publish their drafts at zero cost.

Juha Nieminen <nospam@thanks.invalid>: Jul 08 08:21AM

> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1336.pdf>
> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf>
> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf>

Note that something being directly available for download, even if it's
hosted at the IP owners' own servers, doesn't make it somehow automatically
legal to download if the documents are under a commercial license.
Making something available without technical barriers is not in itself
any sort of implicit free license.

(I don't know if those documents are commercial. Merely pointing out
that fact.)

Philipp Klaus Krause <pkk@spth.de>: Jul 08 10:31AM +0200

Am 08.07.21 um 10:21 schrieb Juha Nieminen:
> any sort of implicit free license.

> (I don't know if those documents are commercial. Merely pointing out
> that fact.)

I understand that the WG14 / ISO copyright situation can be somewhat
complicated (and in the past ISO expressed some dislike about the
existence of that WG14 website). On the other hand, what you write would
hold for any text, website, etc., which is kind of impractical (how do I
know that I am allowed to read your message that I'm replying to here?).

Anyway, those N documents are not meant to be hidden by WG14. There is a
list of them
(http://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log.htm),
which is linked from the WG14 website
(http://www.open-std.org/jtc1/sc22/wg14/).
AFAIK, it is disputed who owns the copyright to the individual N
documents there, and it might even differ by jurisdiction (in particular
there might be US vs. EU law differences).

Bo Persson <bo@bo-persson.se>: Jul 08 12:19PM +0200

On 2021-07-08 at 10:21, Juha Nieminen wrote:
> any sort of implicit free license.

> (I don't know if those documents are commercial. Merely pointing out
> that fact.)

I am not a lawyer :-), but these papers are not official ISO documents,
so no commercial license.

Especially humorous is n1256. The official ISO documents are the C99
standard plus three separate corrigenda - TC1, TC2, and TC3.
ISO never published a "corrected" standard, just these four separate
documents.

In preparation for the C11 work, the committee then produced a "working
draft" with the TCs applied to the C99 standard. You need to have a base
document, right? And arguably a lot better than the official one, as the
bugs have been removed.

However, ISO never published this intermediate version, only the
completed C11 standard.

Trying to understand pointers. Why does this give unexpected results?

Juha Nieminen <nospam@thanks.invalid>: Jul 08 08:17AM

> ptrA = &a;
> ptrB = &b;
> ptrC = &c;

I'm genuinely wondering why you are writing it like that, instead of the
simpler:

int *ptrA = &a;
float *ptrB = &b;
char *ptrC = &c;

> cout << "value of c: " << c << "; address of c: " << ptrC << endl;

operator<< is overloaded for char* to print the null-terminated string it
points to, and ptrC points to a single char, so that will severely malfunction.
You can cast it to void* instead:

std::cout << static_cast<void*>(ptrC) << std::endl;
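
To see the two overloads side by side, here is a small sketch that formats into a std::ostringstream instead of std::cout. The names `as_text`, `as_address` and `demo` are made up for this example, and the buffer is deliberately a null-terminated array so that the char* overload is safe to call.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: formats p through the char* overload of operator<<,
// which reads characters until it finds a terminating '\0'.
std::string as_text(const char* p) {
    std::ostringstream out;
    out << p;
    return out.str();
}

// Hypothetical helper: formats p through the void* overload of operator<<,
// which prints the pointer value in an implementation-defined format.
std::string as_address(const char* p) {
    std::ostringstream out;
    out << static_cast<const void*>(p);
    return out.str();
}

void demo() {
    const char word[] = "abc";          // null-terminated, so as_text is safe
    assert(as_text(word) == "abc");     // the characters, not an address
    assert(as_address(word) != "abc");  // some textual form of the address
}
```

Calling as_text(&c) for a lone `char c` would read past the object looking for a terminator, which is undefined behaviour; that is the malfunction described above.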

Thursday, July 8, 2021

Digest for comp.lang.c++@googlegroups.com - 13 updates in 3 topics
