soft and program: Digest for comp.lang.c++@googlegroups.com

comp.lang.c++@googlegroups.com


Unicode test - 22 Updates
n-ary roots from complex numbers... - 3 Updates

Jorgen Grahn <grahn+nntp@snipabacken.se>: Apr 12 06:03AM

On Fri, 2017-04-07, David Brown wrote:
> On 07/04/17 14:19, Alf P. Steinbach wrote:
...
>> since 1981 old Unix unification of files, pipes and interactive i/o as
>> streams of single bytes. Even with UTF-8 and no support for interactive
>> features it's ungood, because UTF-8 error states are usually persistent.

That surprises me, because UTF-8 was designed so that recovering would
be easy. E.g. when you encounter an octet with MSB unset, you know
you've found an undamaged ASCII character.
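Jorgen's point can be made concrete. In UTF-8 every continuation byte has the bit pattern 10xxxxxx, so a decoder that lands in the middle of a damaged sequence can skip forward to the next byte that is not a continuation byte and carry on from there. Below is a minimal sketch. The helper names `is_continuation` and `resync` are hypothetical and not from any library.

```cpp
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes, which always have the form 10xxxxxx.
inline bool is_continuation( unsigned char b ) { return (b & 0xC0) == 0x80; }

// Hypothetical helper: starting at `pos`, skip continuation bytes and return
// the index of the next byte that can start a character (an ASCII byte or a
// lead byte), or s.size() if there is none.
std::size_t resync( std::string const& s, std::size_t pos )
{
    while( pos < s.size() && is_continuation( static_cast<unsigned char>( s[pos] ) ) )
    {
        ++pos;
    }
    return pos;
}
```

For example, in the bytes of "aπb" (61 CF 80 62), `resync( s, 2 )` skips the stray 0x80 and returns 3, the index of 'b'.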

> Linux terminals can certainly be screwed up if you try and cat a binary
> file. I don't know if it is only UTF-8 errors, or other problems.

It's very easy to screw up a terminal without involving UTF-8. I doubt
if UTF-8 makes that worse.

> No system is perfect, it seems.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Ben Bacarisse <ben.usenet@bsb.me.uk>: Apr 07 07:52PM +0100

> I don't understand how 0xC2, 0xAC gives the cent symbol.

I don't think it ever will! 0xC2 0xAC is the hex encoding of the UTF-8
encoding of the not sign. A cent symbol is UTF-8 encoded as 0xC2 0xA2.

You need to anchor the distinction between the character set (loosely
the numbering of some collection of symbols) and the way in which those
numbers are encoded as bytes for transmission and/or printing. Unicode
is, to a first approximation, a numbering scheme for many thousands of
characters. The numbers specified by Unicode can then be transmitted in
a variety of ways with names like UCS-4, UTF-16 and UTF-8.
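Ben's distinction can be made concrete. The code point is the number, and UTF-8 is one way of turning that number into bytes. Below is a minimal encoder sketch following the standard UTF-8 bit layout. The function name `to_utf8` is illustrative. It reproduces the byte sequences discussed in this thread:

- U+00AC (not sign) gives C2 AC.
- U+00A2 (cent sign) gives C2 A2.
- U+20AC (euro sign) gives E2 82 AC.
- U+10348 gives F0 90 8D 88.

```cpp
#include <stdexcept>
#include <string>

// Encode one Unicode code point as UTF-8:
//   U+0000..U+007F     0xxxxxxx
//   U+0080..U+07FF     110xxxxx 10xxxxxx
//   U+0800..U+FFFF     1110xxxx 10xxxxxx 10xxxxxx
//   U+10000..U+10FFFF  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
std::string to_utf8( char32_t cp )
{
    std::string out;
    if( cp < 0x80 )
    {
        out += static_cast<char>( cp );
    }
    else if( cp < 0x800 )
    {
        out += static_cast<char>( 0xC0 | (cp >> 6) );
        out += static_cast<char>( 0x80 | (cp & 0x3F) );
    }
    else if( cp < 0x10000 )
    {
        if( cp >= 0xD800 && cp <= 0xDFFF ) { throw std::invalid_argument( "surrogate code point" ); }
        out += static_cast<char>( 0xE0 | (cp >> 12) );
        out += static_cast<char>( 0x80 | ((cp >> 6) & 0x3F) );
        out += static_cast<char>( 0x80 | (cp & 0x3F) );
    }
    else if( cp <= 0x10FFFF )
    {
        out += static_cast<char>( 0xF0 | (cp >> 18) );
        out += static_cast<char>( 0x80 | ((cp >> 12) & 0x3F) );
        out += static_cast<char>( 0x80 | ((cp >> 6) & 0x3F) );
        out += static_cast<char>( 0x80 | (cp & 0x3F) );
    }
    else
    {
        throw std::invalid_argument( "code point out of range" );
    }
    return out;
}
```

The same number could just as well be sent as UTF-16 or UCS-4. Only the byte layout changes, not the character.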

An excellent resource for learning about UTF-8 is this page:

http://www.cl.cam.ac.uk/~mgk25/unicode.html

<snip>
--
Ben.

alexo <alessandro.volturno@libero.it>: Apr 08 01:01PM +0200

On 06/04/2017 18:20, Alf P. Steinbach wrote:
> getline( wcin, name );
> wcout << "Pleased to meet you, " << name << "!" << endl;
> }

This is the version output of my g++ compiler (MinGW):

g++ (GCC) 5.3.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

And this is the message I get trying to compile your code:

main.cpp: In function 'void init_streams()':
main.cpp:15:32: error: '_O_WTEXT' was not declared in this scope
_setmode( _fileno( stdin), _O_WTEXT );
^
main.cpp: In function 'int main()':
main.cpp:24:21: error: converting to execution character set: Illegal
byte sequence
auto const& s = L"Every ??? ????? likes Norwegian blåbærsyltetøy!";
^
main.cpp:28:14: error: converting to execution character set: Illegal
byte sequence
wcout << L"What's your name? ";
^

I've removed all C++11 flavours and I've added

#include <cstdio>

to get rid of a couple of errors about finding stdin and stdout.
The problem is that my compiler cannot find _O_WTEXT
and that it doesn't accept the L"..." string format.

alexo <alessandro.volturno@libero.it>: Apr 09 05:34PM +0200

On 08/04/2017 18:39, Alf P. Steinbach wrote:
>> [snip]
>> Never heard of mingw-w64

> That wasn't quite what I wrote.

www.mingw-w64.org/doku.php

I've found this project, a 64-bit flavour of MinGW.
Is that what you referred to?

David Brown <david.brown@hesbynett.no>: Apr 07 03:37PM +0200

On 07/04/17 14:06, Alvin wrote:
> File "unicode.py", line 5, in <module>
> UnicodeEncodeError: 'latin-1' codec can't encode character '\u03c0' in
> position 19: ordinal not in range(256)

That is certainly true. But the difference is that the standard shells
and terminals on Linux have all been fine with utf-8 for a good many
years, and most systems will have a utf-8 locale even if they are only
used for plain ASCII characters normally. On Windows, however, you need
to go out of your way to get extra terminal software and have extra
settings (unless things have changed in later Windows).

Still, I will remember the possibility of something like ConEmu if I
find I need console utf-8 on Windows.

alexo <alessandro.volturno@libero.it>: Apr 08 04:28PM +0200

>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
>> PURPOSE.

> This is an old compiler.

I'm sorry to have to agree with you. But I only knew the main MinGW
project site.

> Essentially the maintenance of MinGW g++ has been passed from the
> original MinGW project (where I believe you downloaded that compiler) to
> the MinGW-64 project.

Never heard of mingw-w64

> [H:\forums\clc++\unicode in windows console]
>> g++ _setmode.cpp -std=c++11 -D __MSVCRT_VERSION__=0x0800
> -U__STRICT_ANSI__

This is what my current (MinGW) g++ spits out:

g++: error: __MSVCRT_VERSION__=0x0800: No such file or directory
Probably this is due to the compiler being an old release.

> To avoid having to write all that every time, you can define the parts
> that you'd otherwise have to repeat, as an environment variable.

> Or, make a script or alias for the g++ invocation.

I'm not sure how to do that.
Anyway, the shell's command history helps a lot.

> By default that's UTF-8.

> And better make that UTF-8 with BOM, so that Visual C++ will understand
> that it's UTF-8 by default.

OK, encoded and saved using notepad++ text editor.
I didn't know about this encoding requirement.

Since UTF-8 is backward compatible with ASCII, I'll use it as my
default encoding.

An off-topic question: could you briefly tell me what the arrow means
in the following main declaration?

thank you

auto main() -> int
{
...
}

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 07 02:19PM +0200

On 07-Apr-17 9:17 AM, David Brown wrote:
>>> Hello, world - ÅØÆ πr² ïç a̳b

>>> Works fine with Python 2 or 3. It certainly works on my Linux machine -
>>> I expect it to work on Windows too.

[snip]

> I took a quick test on my Windows 7 system. It struggled with some of
> the characters - it seems the modification character \u0333 is beyond
> Window's abilities.

Yes, it is.

> And π is missing.

Whether you have π (pi) available in Python output depends on the
active codepage (the narrow character encoding) in the console. It
depends on that because Python converts from Unicode to the active
codepage instead of using console i/o. I.e., to be less imprecise, the
CPython implementation still does not support Windows consoles but uses
the standard narrow streams for console i/o.

Checking by simple

chcp CODEPAGENUMBER
echo π | more

… π appears to be there in codepages 437 (IBM PC, default in English
language installations of Windows) and 865 (Nordic), but appears to be
missing in codepages 850 (all-European) and 1252 (Windows ANSI Western,
an extension of ISO Latin-1), which some programmers use in the console.

But again, it would be impractical to keep changing the codepage to suit
the effective character set of the Python script. Instead CPython should
be fixed. I think that to get it fixed someone should argue that CPython
could be made even better (no mention of this being a fault).

> Perhaps Windows console could print the Unicode characters, as long as
> those characters happened to be in the normal Windows code page
> (Latin-1, or something similar).

Windows consoles handle the Basic Multilingual Plane just fine; Windows
console windows are Unicode beasts through and through.

For example, you can use any BMP character in a typed in command, and
that command line is passed perfectly to the process.

Programs, such as the CPython implementation, and designs, such as the
design of the C++ i/o, fail to handle Unicode correctly in Windows.
That's mainly because they are based on the old Unix unification of
files, pipes and interactive i/o as streams of single bytes, which has
been antiquated and counter-productive since 1981. Even with UTF-8 and
no support for interactive features it's ungood, because UTF-8 error
states are usually persistent.

Cheers & hth., :)

- Alf

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 07 03:46PM +0200

On 07-Apr-17 3:03 PM, alexo wrote:
> I don't understand how 0xC2, 0xAC gives the cent symbol.
> It is not a single character code, so how can this sequence interpreted
> as a single character?

You need to provide the full code and example for that.

Cheers!,

- Alf

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 08 06:39PM +0200

On 08-Apr-17 4:28 PM, alexo wrote:
> [snip]
> Never heard of mingw-w64

That wasn't quite what I wrote.

> this is the output my actual (MinGW) g++ spits out:

> g++: error: __MSVCRT_VERSION__=0x0800: No such file or directory
> probably this is due to the compiler's old release number.

You'd better copy and paste the command, and maybe edit the file name,
rather than typing it in.

Here it is again:

g++ your_file_name.cpp -std=c++11 -D __MSVCRT_VERSION__=0x0800 -U__STRICT_ANSI__

> An off-topic question: could you brefly tell me what does the arrow mean
> in the following main declaration?

> auto main() -> int

It means that `main` returns a function result of type `int`. It's the
C++11 trailing return type syntax, equivalent to `int main()`.
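The trailing return type earns its keep mainly when the return type depends on the parameters, because the parameter names are in scope after the `->`. For `main` it makes no difference. Below is a small sketch with hypothetical functions `twice_a`, `twice_b` and `add`.

```cpp
// The two spellings declare functions of the same type, int(int).
int twice_a( int x ) { return 2*x; }
auto twice_b( int x ) -> int { return 2*x; }

// Where the trailing form helps: the return type is computed from the
// parameters `a` and `b`, which are only in scope after the parameter list.
template< class A, class B >
auto add( A a, B b ) -> decltype( a + b )
{
    return a + b;
}
```

Here `add( 2, 3.5 )` returns a `double`. Since C++14 a plain `auto` return type with deduction covers many such cases as well.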

Cheers, & hth.,

- Alf

Bonita Montero <Bonita.Montero@gmail.com>: Apr 09 11:00AM +0200

Your formatting-style is disgusting.

Alvin <Alvin@invalid.invalid>: Apr 08 02:01PM +0200

On 2017-04-08 13:01, alexo wrote:
> main.cpp:15:32: error: '_O_WTEXT' was not declared in this scope
> _setmode( _fileno( stdin), _O_WTEXT );
> ^

...\x86_64-w64-mingw32\include\fcntl.h:
/**
* This file has no copyright assigned and is placed in the Public Domain.
* This file is part of the mingw-w64 runtime package.
* No warranty is given; refer to the file DISCLAIMER.PD within this package.
*/
...
#define _O_WTEXT 0x10000
...

> byte sequence
> wcout << L"What's your name? ";
> ^

That's the kind of error you get if you didn't save the .cpp file as
UTF-8.

> to turn off a couple of errors about finding stdin and stout.
> The problem is that my compiler cannot find _O_WTEXT
> and that it doesn't recognize the format L"..." string

The original code works with the MinGW 5.x and 6.x versions I have lying
around.

David Brown <david.brown@hesbynett.no>: Apr 07 09:17AM +0200

On 07/04/17 01:25, Alf P. Steinbach wrote:
> UnicodeEncodeError: 'charmap' codec can't encode character '\xd8' in
> position 16: character maps to <undefined>

> [H:\forums\clc++\unicode in windows console]

I did a quick test on my Windows 7 system. It struggled with some of
the characters - it seems the combining character \u0333 is beyond
Windows' abilities. And π is missing. With those removed, it worked.

Perhaps Windows console could print the Unicode characters, as long as
those characters happened to be in the normal Windows code page
(Latin-1, or something similar).

Mr Flibble <flibble@i42.co.uk>: Apr 08 11:44PM +0100

On 08/04/2017 15:28, alexo wrote:

> {
> ...
> }

It means that the person who wrote it has OCD.

/Flibble

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 09 10:52PM +0200

On 09-Apr-17 5:34 PM, alexo wrote:

> www.mingw-w64.org/doku.php

> I've found this project for MinGW in 64 bit flavour.
> Is that what you referred to?

Yep. You just have to be aware that there are lots of different builds,
by different people. As I recall, the Nuwen distro (which is very simple
and small) is built on MinGW-w64 but lacks support for specifying the
execution character set, which currently makes it awkward for Windows
programming. Still, it's nice, and it's maintained by STL over at
Microsoft (who is the guy who maintains the STL over at Microsoft, a
sort of extreme name coincidence), but on the Nuwen site he just calls
himself after the hacker prince in Vernor Vinge's novels.

Cheers!,

- Alf

David Brown <david.brown@hesbynett.no>: Apr 07 03:44PM +0200

On 07/04/17 14:19, Alf P. Steinbach wrote:
> the active codepage, instead of using console i/o. I.e., to be less
> imprecise, that the CPython implementation still does not support
> Windows consoles but uses the standard narrow streams for console i/o.

That makes sense.

> language installations of Windows) and 865 (Nordic), but appears to be
> missing in codepages 865 (all-European) and 1252 (Windows ANSI Western,
> an extension of ISO Latin-1), which some programmers use in the console.

My code page at the moment is 850 (latin-1) - I have Windows set up for
UK English, but with a Norwegian keyboard. I can show π in a terminal,
with a suitable font like Lucida Console, but the python script still
does not print it.

> the effective character set of the Python script. Instead CPython should
> be fixed. I think that to get it fixed someone should argue that CPython
> could be made even better (no mention of this being a fault).

Yes - it would be nice if this simply worked cross-platform out of the
box on Windows. I suppose font support would be required, but it should
not be asking /too/ much of Python to make Unicode output work here on
Windows in the same way as on Linux.

> since 1981 old Unix unification of files, pipes and interactive i/o as
> streams of single bytes. Even with UTF-8 and no support for interactive
> features it's ungood, because UTF-8 error states are usually persistent.

Linux terminals can certainly be screwed up if you try and cat a binary
file. I don't know if it is only UTF-8 errors, or other problems. No
system is perfect, it seems.

> Cheers & hth., :)

Perhaps that is enough of this here - the Python stuff is off-topic for
c.l.c++ and I doubt if it is helping the OP. But thank you for your
explanations - I have learned a few new things here.

Alvin <Alvin@invalid.invalid>: Apr 07 04:40PM +0200

On 2017-04-07 15:37, David Brown wrote:
> settings (unless things have changed in later Windows).

> Still, I will remember the possibility of something like ConEmu if I
> find I need console utf-8 on Windows.

I just tried Python 3.6.1. It works without chcp. There is PEP 528:
https://www.python.org/dev/peps/pep-0528/

Alvin <Alvin@invalid.invalid>: Apr 07 02:06PM +0200

On 2017-04-07 09:17, David Brown wrote:

> Perhaps Windows console could print the Unicode characters, as long as
> those characters happened to be in the normal Windows code page
> (Latin-1, or something similar).

Windows works fine, if you set the codepage to UTF-8 (at least with a
terminal with good UTF support like ConEmu):
chcp 65001

It wouldn't work on Linux either, if you have a non-UTF-8 configuration:

> LC_ALL=en_US python3 unicode.py

Traceback (most recent call last):
File "unicode.py", line 5, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character '\u03c0' in
position 19: ordinal not in range(256)

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 08 03:05PM +0200

On 08-Apr-17 1:01 PM, alexo wrote:
> Copyright (C) 2015 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

This is an old compiler.

Well, 2 years old, and that can be a long time when we get a new
standard every three years.

Essentially the maintenance of MinGW g++ has been passed from the
original MinGW project (where I believe you downloaded that compiler) to
the MinGW-w64 project.

> main.cpp: In function 'void init_streams()':
> main.cpp:15:32: error: '_O_WTEXT' was not declared in this scope
> _setmode( _fileno( stdin), _O_WTEXT );

By inspection of the headers of that compiler's standard library, in
order to get a definition of `_O_WTEXT` with this compiler you need to
define `__MSVCRT_VERSION__` as equal or greater than `0x0800`.

Also, with the `-std=c++11` option you need to explicitly tell it not
to define `__STRICT_ANSI__`, in order to get a declaration of `_fileno`.

With this compiler's library, `_fileno` is declared by the header that
I forgot to include, namely `<stdio.h>`.

It's weird that a compiler whose one and only purpose was to work in
Windows doesn't. Anyway, the good news is that the newer g++ compilers
don't have these quirks. At least not the ones from MinGW-w64.

Be that as it may, the following build command works for me, with g++
5.3.0-3 from the old MinGW project:

---------------------------------------------------------------------
[H:\forums\clc++\unicode in windows console]
> g++ --version
g++ (GCC) 5.3.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[H:\forums\clc++\unicode in windows console]
> g++ _setmode.cpp -std=c++11 -D __MSVCRT_VERSION__=0x0800 -U__STRICT_ANSI__

[H:\forums\clc++\unicode in windows console]
> a
Every 日本国 кошка likes Norwegian blåbærsyltetøy!

What's your name? Særskrevne Påske Nøtter
Pleased to meet you, Særskrevne Påske Nøtter!

[H:\forums\clc++\unicode in windows console]
> _
---------------------------------------------------------------------

To avoid having to write all that every time, you can define the parts
that you'd otherwise have to repeat, as an environment variable.

Or, make a script or alias for the g++ invocation.

> main.cpp:28:14: error: converting to execution character set: Illegal
> byte sequence
> wcout << L"What's your name? ";

You just need to save your .cpp file with the encoding that g++ expects.

By default that's UTF-8.

Better still, make that UTF-8 with BOM, so that Visual C++ will also
recognize it as UTF-8 without being told.

> I've removed all C++11 flavours and I've added

> #include <cstdio>

> to turn off a couple of errors about finding stdin and stout.

Sorry about that, I plain forgot to include that header. :(

By the way it should be `<stdio.h>`.

The `<cstdio>` header is not guaranteed to provide unqualified names,
e.g. with that header one may have to write `std::printf` instead of
just `printf`. (`stdin` is a macro, so it is available either way.)

> The problem is that my compiler cannot find _O_WTEXT
> and that it doesn't recognize the format L"..." string

Cheers & hth., :)

- Alf

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 07 03:37PM +0200

On 07-Apr-17 3:03 PM, alexo wrote:
> }

> I know I seem stupid, but I tought there were a way to write something like

> std::wcout << "H\u0333" << "ello" << endl;

Well, you could read my reply to your original posting.

That's a hint.

Cheers!,

- Alf

alexo <alessandro.volturno@libero.it>: Apr 07 03:03PM +0200

I don't understand how 0xC2, 0xAC gives the cent symbol.
It is not a single character code, so how can this sequence be
interpreted as a single character?

Same doubts in

0xE2, 0x82, 0xAC
or
0xF0, 0x90, 0x8D, 0x88.

> '\n',
> 0x00
> }

Does this array contain 19 or 11 characters (counting 0x00 as '\0')?
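The question has a mechanical answer. In UTF-8 the first byte of a sequence announces its length:

- 110xxxxx starts a two-byte sequence.
- 1110xxxx starts a three-byte sequence.
- 11110xxx starts a four-byte sequence.

The following bytes all look like 10xxxxxx. A decoder therefore knows that 0xC2 0xAC is one character. The same rule explains why a byte count and a character count differ.

Below is a minimal decoder sketch. The name `decode_utf8` is illustrative, and validation is deliberately incomplete.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Decode UTF-8 bytes into Unicode code points. The lead byte of each
// sequence says how many continuation bytes (10xxxxxx) follow; each of
// them contributes 6 more bits. Validation is minimal: overlong forms and
// surrogate code points are not rejected in this sketch.
std::vector<char32_t> decode_utf8( std::string const& bytes )
{
    std::vector<char32_t> result;
    std::size_t i = 0;
    while( i < bytes.size() )
    {
        unsigned char const lead = static_cast<unsigned char>( bytes[i] );
        std::size_t extra;      // number of continuation bytes that follow
        char32_t cp;            // bits collected so far
        if( lead < 0x80 )                { extra = 0; cp = lead; }
        else if( (lead & 0xE0) == 0xC0 ) { extra = 1; cp = lead & 0x1F; }
        else if( (lead & 0xF0) == 0xE0 ) { extra = 2; cp = lead & 0x0F; }
        else if( (lead & 0xF8) == 0xF0 ) { extra = 3; cp = lead & 0x07; }
        else { throw std::runtime_error( "invalid UTF-8 lead byte" ); }

        if( bytes.size() - i <= extra ) { throw std::runtime_error( "truncated UTF-8 sequence" ); }
        for( std::size_t k = 1; k <= extra; ++k )
        {
            unsigned char const b = static_cast<unsigned char>( bytes[i + k] );
            if( (b & 0xC0) != 0x80 ) { throw std::runtime_error( "missing continuation byte" ); }
            cp = (cp << 6) | (b & 0x3F);
        }
        result.push_back( cp );
        i += extra + 1;
    }
    return result;
}
```

For example, the 4 bytes "a\xC2\xA2" "b" decode to 3 code points: 'a', U+00A2 (cent sign) and 'b'.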

Moreover, having read the posts in this thread, I still don't understand
how to use wcout with Unicode code points without having to write console
settings like

UINT oldcp = GetConsoleOutputCP();

if (!SetConsoleOutputCP(CP_UTF8))
{
fprintf(stderr, "chcp failed\n");
return EXIT_FAILURE;
}

I know I seem stupid, but I thought there was a way to write something like

std::wcout << "H\u0333" << "ello" << endl;

thank you

alexo <alessandro.volturno@libero.it>: Apr 09 12:07PM +0200

On 08/04/2017 18:39, Alf P. Steinbach wrote:

> Here it is again:

> g++ your_file_name.cpp -std=c++11 -D __MSVCRT_VERSION__=0x0800
> -U__STRICT_ANSI__

Now it worked!
Copy and paste did the trick. I suppose I mistyped
something...

>> in the following main declaration?

>> auto main() -> int

> It means that `main` returns a function result of type `int`.

ok

scott@slp53.sl.home (Scott Lurndal): Apr 07 01:49PM

>Linux terminals can certainly be screwed up if you try and cat a binary
>file. I don't know if it is only UTF-8 errors, or other problems. No
>system is perfect, it seems.

Random terminal control escape sequences within the binary will screw up
xterm and gnome-terminal.

$ stty sane
$ tput reset

will restore normal operation. In some cases (e.g. after ^C out of
a poorly written curses app) it may be necessary to use ^J to get a
newline when typing stty sane.

http://invisible-island.net/xterm/ctlseqs/ctlseqs.html
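Besides repairing the terminal afterwards, one can avoid the problem by neutralizing control bytes before they reach the terminal. This is similar in spirit to `cat -v`.

Below is a minimal sketch. The name `make_printable` is hypothetical. It replaces C0 control bytes other than tab and newline, plus DEL, with '.'. Bytes 0x80 and above pass through unchanged so that valid UTF-8 text survives. Some terminals can also act on 8-bit C1 control bytes, which this sketch does not handle.

```cpp
#include <string>

// Replace control bytes that a terminal might interpret (ESC, BEL, CR, ...)
// with '.', keeping '\n' and '\t' and all bytes >= 0x80.
std::string make_printable( std::string const& bytes )
{
    std::string out;
    out.reserve( bytes.size() );
    for( char c : bytes )
    {
        unsigned char const b = static_cast<unsigned char>( c );
        bool const is_control = b < 0x20 || b == 0x7F;
        out += ( is_control && c != '\n' && c != '\t' ) ? '.' : c;
    }
    return out;
}
```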

n-ary roots from complex numbers...

ram@zedat.fu-berlin.de (Stefan Ram): Apr 11 06:54PM

>I still don't understand how rotation, and thereby the structure
>(metric) of our physical environment, emerges in complex arithmetic.

The numbers 1 and i also can be represented as 2×2 matrices

1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}

and

i = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} .

See Wikipedia »7.2 Matrix representation of complex
numbers«.

Now, compare this with the matrix M_z on page 2 (numbered 259) of

www.astro.caltech.edu/~golwala/ph125ab/ph106ab_notes_sec5.1.pdf

. We can thus see that this matrix is the infinitesimal
generator of rotations around the z axis. (The following
pages then explain how finite rotations can be obtained
from such "infinitesimal rotations".)
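The connection between i and rotation can be checked numerically. Below is a minimal sketch that uses only the 2×2 representation of 1 and i given above:

- `mul` multiplies two matrices, which lets us check that the matrix for i squares to -1.
- `exp_generator` sums the power series of exp(θ·i), with i as a matrix.

The result should be the rotation matrix by angle θ. This is the sense in which i generates rotations in the plane. All names are illustrative. Truncating the series after 30 terms is an assumption that is ample for moderate angles.

```cpp
#include <array>
#include <cmath>

// A 2x2 real matrix; the complex number a + bi corresponds to {{a, -b}, {b, a}}.
using Mat2 = std::array<std::array<double, 2>, 2>;

Mat2 const mat_one = {{ {{ 1, 0 }}, {{ 0, 1 }} }};   // the matrix for 1
Mat2 const mat_i   = {{ {{ 0, -1 }}, {{ 1, 0 }} }};  // the matrix for i

Mat2 mul( Mat2 const& x, Mat2 const& y )
{
    Mat2 r{};
    for( int i = 0; i < 2; ++i )
        for( int j = 0; j < 2; ++j )
            for( int k = 0; k < 2; ++k )
                r[i][j] += x[i][k] * y[k][j];
    return r;
}

// exp(theta * mat_i) as the truncated power series sum_n (theta*mat_i)^n / n!.
// Analytically this equals cos(theta)*mat_one + sin(theta)*mat_i, i.e. the
// rotation matrix {{cos theta, -sin theta}, {sin theta, cos theta}}.
Mat2 exp_generator( double theta, int terms = 30 )
{
    Mat2 sum = mat_one;
    Mat2 term = mat_one;                 // holds (theta*mat_i)^n / n!
    for( int n = 1; n < terms; ++n )
    {
        term = mul( term, mat_i );
        for( auto& row : term )
            for( auto& v : row )
                v *= theta / n;
        for( int i = 0; i < 2; ++i )
            for( int j = 0; j < 2; ++j )
                sum[i][j] += term[i][j];
    }
    return sum;
}
```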

ram@zedat.fu-berlin.de (Stefan Ram): Apr 14 08:38PM

Some other languages (like Pascal or COBOL) provide types
for ranges and enumerations, and I wonder to what extent
one can create such types in C++.

For example,

struct example { range<2'000'000'000,2'000'000'010> i; };

. The implementation should emit an error message (a
compile-time error message if possible) when one tries to

instance.i = 0;

, and, if possible,

sizeof instance.i

should be just 1, because one byte is enough to store
one out of 10 values.
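C++ has no built-in subrange types, but a class template gets fairly close. Below is a minimal sketch, assuming a half-open interval [Lo, Hi) so that the example above holds 10 values. The name `range`, the half-open convention and the `make<V>()` helper are assumptions for illustration, not an existing library.

- The storage type is chosen at compile time, so `sizeof` is 1 for a span of 10.
- Assigning an out-of-range value such as `instance.i = 0;` is caught only at run time, by a thrown `std::out_of_range`. A true compile-time error for plain assignment is harder to get this way.
- For constants, `make<V>()` does give a compile-time error.

```cpp
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Subrange type for the half-open interval [Lo, Hi). Only the offset from Lo
// is stored, in the smallest unsigned type that can hold Hi - Lo - 1.
template< long long Lo, long long Hi >
class range
{
    static_assert( Lo < Hi, "empty range" );
    static constexpr unsigned long long span = static_cast<unsigned long long>( Hi - Lo );

    using storage =
        typename std::conditional< (span <= 0x100ull), std::uint8_t,
        typename std::conditional< (span <= 0x10000ull), std::uint16_t,
        typename std::conditional< (span <= 0x100000000ull), std::uint32_t,
        std::uint64_t >::type >::type >::type;

    storage offset_;

    static storage check( long long v )
    {
        if( v < Lo || v >= Hi ) { throw std::out_of_range( "value outside range" ); }
        return static_cast<storage>( v - Lo );
    }

public:
    // Run-time checked construction; also used by `instance.i = value;`.
    range( long long v ): offset_( check( v ) ) {}

    // Compile-time checked alternative for constants: make<0>() does not compile.
    template< long long V >
    static range make()
    {
        static_assert( Lo <= V && V < Hi, "value outside range" );
        return range( V );
    }

    long long value() const { return Lo + static_cast<long long>( offset_ ); }
    operator long long() const { return value(); }
};
```

With `range<2'000'000'000, 2'000'000'010>`, `sizeof` is 1, and assigning 0 throws.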

ram@zedat.fu-berlin.de (Stefan Ram): Apr 14 10:09PM

Newsgroups: comp.lang.c,comp.lang.c++

> 55
> > (loop for i from 1 below 10 sum i)
> 45

C++ (after appropriate definitions):

int main()
{
  auto sum { 0 };

  for( auto const i : from{ 1, below( 3 )} )sum += i;
  ::std::cout << sum << '\n'; }

. And, BTW, the variable declaration and the whole
loop are being compiled here into just

movl $3, %edx

, that is, all the looping is done at compile time!

However, my current, simplistic definitions do not
support increments other than »+1«. (See full
C++ code at the end of this post.)

JavaScript is closer to Lisp, so you can get the value
of the sum /as the value of the loop/ in JavaScript,
just as in Common Lisp.

function * from_below( from, top )
{ let i = from; while ( i < top )yield i++; }

console.log
( eval( "sum = 0; for( let i of from_below( 1, 3 ))sum += i;" ));

(prints »3«).

Full C++ source code:

#include <initializer_list>
#include <iostream>
#include <ostream>

struct intref
{ int i = 0;
  constexpr explicit intref( int const i ): i{ i } {}
  constexpr int operator * () const { return i; }
  constexpr intref & operator ++ () { ++i; return *this; }
  constexpr bool operator != ( intref const other ) const
  { return this->i != other.i; }};

struct from
{ intref const first;
  intref const top;
  constexpr from( int const first, int const top ):
  first{ first }, top{ top } {}
  constexpr intref const begin() const { return first; }
  constexpr intref const end() const { return top; }};

constexpr int below( int const i )
{ return i; }

constexpr int including( int const i )
{ return i + 1; }

int main()
{ auto sum { 0 };
  for( auto const i : from{ 1, below( 3 )} )sum += i;
  ::std::cout << sum << '\n'; }

. I am not an experienced writer of C++ classes, so
I appreciate all comments on my C++ source code with a
Followup-To header for the newsgroup comp.lang.c++.

My most frequent error when summing up is forgetting to
initialize the sum variable. Using »auto« above means that
I can't forget this!

Of course, one could define the abstractions so as to
force the user to use one of »below« or »including«.
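Both points, forcing »below« or »including« and allowing increments other than »+1«, can be handled with a few changes to the definitions above. Below is a minimal sketch of one possible design. It is an illustration, not the poster's code.

- `below` and `including` return a distinct `bound` type with an explicit constructor, so `from{ 1, 3 }` or `from{ 1, {3} }` no longer compiles.
- An optional third argument gives a positive step.
- `operator!=` really tests "still below the bound", so a step that jumps past the bound still terminates. Negative steps are not handled.

```cpp
// Exclusive upper limit. The explicit constructor means a plain int (or {3})
// is not accepted where a bound is expected.
struct bound
{
    constexpr explicit bound( int v ): value{ v } {}
    int value;
};

constexpr bound below( int n )     { return bound{ n }; }
constexpr bound including( int n ) { return bound{ n + 1 }; }

class from
{
public:
    class iterator
    {
    public:
        constexpr iterator( int i, int step ): i_{ i }, step_{ step } {}
        constexpr int operator*() const { return i_; }
        constexpr iterator& operator++() { i_ += step_; return *this; }
        // "Not yet at or past the end", so overshooting steps still stop.
        constexpr bool operator!=( iterator const& end ) const { return i_ < end.i_; }
    private:
        int i_;
        int step_;
    };

    constexpr from( int first, bound top, int step = 1 )
        : first_{ first }, top_{ top.value }, step_{ step } {}

    constexpr iterator begin() const { return iterator{ first_, step_ }; }
    constexpr iterator end() const { return iterator{ top_, step_ }; }

private:
    int first_;
    int top_;
    int step_;
};
```

For example, `from{ 1, including( 10 ), 3 }` yields 1, 4, 7, 10.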

Newsgroups: comp.lang.c,comp.lang.c++


Sunday, April 16, 2017

Digest for comp.lang.c++@googlegroups.com - 25 updates in 2 topics
