- Unicode test - 22 Updates
- n-ary roots from complex numbers... - 3 Updates
Jorgen Grahn <grahn+nntp@snipabacken.se>: Apr 12 06:03AM On Fri, 2017-04-07, David Brown wrote: > On 07/04/17 14:19, Alf P. Steinbach wrote: ... >> since 1981 old Unix unification of files, pipes and interactive i/o as >> streams of single bytes. Even with UTF-8 and no support for interactive >> features it's ungood, because UTF-8 error states are usually persistent. That surprises me, because UTF-8 was designed so that recovering would be easy. E.g. when you encounter an octet with MSB unset, you know you've found an undamaged ASCII character. > Linux terminals can certainly be screwed up if you try and cat a binary > file. I don't know if it is only UTF-8 errors, or other problems. It's very easy to screw up a terminal without involving UTF-8. I doubt if UTF-8 makes that worse. > No system is perfect, it seems. /Jorgen -- // Jorgen Grahn <grahn@ Oo o. . . \X/ snipabacken.se> O o . |
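The recovery property mentioned in this post can be sketched in a few lines. In UTF-8 every continuation byte has the bit pattern 10xxxxxx, so a decoder that lands in the middle of a damaged sequence only has to skip such bytes to reach the next ASCII or lead byte. The names `is_continuation` and `resync` are made up for this illustration; the sketch assumes the text is held in a `std::string` and makes no attempt to validate the sequence it resumes at.

```cpp
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx.
inline bool is_continuation(unsigned char byte) {
    return (byte & 0xC0) == 0x80;
}

// Starting at `pos`, return the index of the next byte that can begin a
// character (an ASCII byte or a lead byte), or text.size() if none is left.
// Hypothetical helper: it only finds a boundary, it does not validate.
std::size_t resync(const std::string& text, std::size_t pos) {
    while (pos < text.size() &&
           is_continuation(static_cast<unsigned char>(text[pos]))) {
        ++pos;
    }
    return pos;
}
```

A decoder that hits an invalid byte could call `resync` and carry on from the returned index, which is why a single damaged octet need not poison the rest of the stream.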
Ben Bacarisse <ben.usenet@bsb.me.uk>: Apr 07 07:52PM +0100 > I don't understand how 0xC2, 0xAC gives the cent symbol. I don't think it ever will! 0xC2 0xAC is the hex encoding of the UTF-8 encoding of the not sign. A cent symbol is UTF-8 encoded as 0xC2 0xA2. You need to anchor the distinction between the character set (loosely the numbering of some collection of symbols) and the way in which those numbers are encoded as bytes for transmission and/or printing. Unicode is, to a first approximation, a numbering scheme for many hundreds of characters. The numbers specified by Unicode can then be transmitted in a variety of ways with names like UCS4, UTF-16 and UTF-8. An excellent resource for learning about UTF-8 is this page: http://www.cl.cam.ac.uk/~mgk25/unicode.html <snip> -- Ben. |
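To make the distinction between the character number and its byte encoding concrete, here is a minimal sketch of a UTF-8 encoder. The function name `encode_utf8` is an assumption made for this illustration, and the sketch does not reject surrogates or values above U+10FFFF.

```cpp
#include <string>

// Encode one Unicode code point as UTF-8 bytes.
// Hypothetical helper for illustration; no validation is performed.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, `encode_utf8(0xA2)` (the cent sign, U+00A2) gives the bytes 0xC2 0xA2, while `encode_utf8(0xAC)` (the not sign, U+00AC) gives 0xC2 0xAC: one Unicode number each, two bytes each on the wire.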
alexo <alessandro.volturno@libero.it>: Apr 08 01:01PM +0200 On 06/04/2017 18:20, Alf P. Steinbach wrote: > getline( wcin, name ); > wcout << "Pleased to meet you, " << name << "!" << endl; > } this is the output of my g++ compiler (MinGW) g++ (GCC) 5.3.0 Copyright (C) 2015 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. And this is the message I get trying to compile your code: main.cpp: In function 'void init_streams()': main.cpp:15:32: error: '_O_WTEXT' was not declared in this scope _setmode( _fileno( stdin), _O_WTEXT ); ^ main.cpp: In function 'int main()': main.cpp:24:21: error: converting to execution character set: Illegal byte sequence auto const& s = L"Every ??? ????? likes Norwegian blåbærsyltetøy!"; ^ main.cpp:28:14: error: converting to execution character set: Illegal byte sequence wcout << L"What's your name? "; ^ I've removed all C++11 flavours and I've added #include <cstdio> to turn off a couple of errors about finding stdin and stdout. The problem is that my compiler cannot find _O_WTEXT and that it doesn't recognize the L"..." string format |
alexo <alessandro.volturno@libero.it>: Apr 09 05:34PM +0200 On 08/04/2017 18:39, Alf P. Steinbach wrote: >> [snip] >> Never heard of mingw-w64 > That wasn't quite what I wrote. www.mingw-w64.org/doku.php I've found this project for MinGW in a 64-bit flavour. Is that what you referred to? |
David Brown <david.brown@hesbynett.no>: Apr 07 03:37PM +0200 On 07/04/17 14:06, Alvin wrote: > File "unicode.py", line 5, in <module> > UnicodeEncodeError: 'latin-1' codec can't encode character '\u03c0' in > position 19: ordinal not in range(256) That is certainly true. But the difference is that the standard shells and terminals on Linux have all been fine with utf-8 for a good many years, and most systems will have a utf-8 locale even if they are only used for plain ASCII characters normally. On Windows, however, you need to go out of your way to get extra terminal software and have extra settings (unless things have changed in later Windows). Still, I will remember the possibility of something like ConEmu if I find I need console utf-8 on Windows. |
alexo <alessandro.volturno@libero.it>: Apr 08 04:28PM +0200 >> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR >> PURPOSE. > This is an old compiler. I'm sorry to have to agree with you. But I knew only the MinGW main project site. > Essentially the maintenance of MinGW g++ has been passed from the > original MinGW project (where I believe you downloaded that compiler) to > the MinGW-64 project. Never heard of mingw-w64 > [H:\forums\clc++\unicode in windows console] >> g++ _setmode.cpp -std=c++11 -D __MSVCRT_VERSION__=0x0800 > -U__STRICT_ANSI__ this is the output my current (MinGW) g++ spits out: g++: error: __MSVCRT_VERSION__=0x0800: No such file or directory probably this is due to the compiler's old release number. > To avoid having to write all that every time, you can define the parts > that you'd otherwise have to repeat, as an environment variable. > Or, make a script or alias for the g++ invocation. I'm not sure how that must be done. Anyway the command history of the prompt shell helps a lot. > By default that's UTF-8. > And better make that UTF-8 with BOM, so that Visual C++ will understand > that it's UTF-8 by default. OK, encoded and saved using the notepad++ text editor. I didn't know of this encoding necessity. Since UTF-8 is backward compatible with ASCII I'll use the former as the default encoding format. An off-topic question: could you briefly tell me what the arrow means in the following main declaration? auto main() -> int { ... } Thank you. |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 07 02:19PM +0200 On 07-Apr-17 9:17 AM, David Brown wrote: >>> Hello, world - ÅØÆ πr² ïç a̳b >>> Works fine with Python 2 or 3. It certainly works on my Linux machine - >>> I expect it to work on Windows too. [snip] > I took a quick test on my Windows 7 system. It struggled with some of > the characters - it seems the modification character \u0333 is beyond > Window's abilities. Yes, it is. > And π is missing. Whether you have π (pi) available with Python output depends on the active codepage (the narrow character encoding) in the console, and the reason that it depends on that, is Python's conversion from Unicode to the active codepage, instead of using console i/o. I.e., to be less imprecise, that the CPython implementation still does not support Windows consoles but uses the standard narrow streams for console i/o. Checking by simple chcp CODEPAGENUMBER echo π | more … π appears to be there in codepages 437 (IBM PC, default in English language installations of Windows) and 865 (Nordic), but appears to be missing in codepages 865 (all-European) and 1252 (Windows ANSI Western, an extension of ISO Latin-1), which some programmers use in the console. But again, it would be impractical to keep changing the codepage to suit the effective character set of the Python script. Instead CPython should be fixed. I think that to get it fixed someone should argue that CPython could be made even better (no mention of this being a fault). > Perhaps Windows console could print the Unicode characters, as long as > those characters happened to be in the normal Windows code page > (Latin-1, or something similar). Windows consoles handle the Basic Multilingual Plane just fine; Windows console Windows are Unicode beasts through and through. For example, you can use any BMP character in a typed in command, and that command line is passed perfectly to the process. 
Programs, such as the CPython implementation, and designs, such as the design of the C++ i/o, fail to handle Unicode correctly in Windows. Mainly because they are based on the antiquated and counter-productive since 1981 old Unix unification of files, pipes and interactive i/o as streams of single bytes. Even with UTF-8 and no support for interactive features it's ungood, because UTF-8 error states are usually persistent. Cheers & hth., :) - Alf |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 07 03:46PM +0200 On 07-Apr-17 3:03 PM, alexo wrote: > I don't understand how 0xC2, 0xAC gives the cent symbol. > It is not a single character code, so how can this sequence interpreted > as a single character? You need to provide the full code and example for that. Cheers!, - Alf |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 08 06:39PM +0200 On 08-Apr-17 4:28 PM, alexo wrote: > [snip] > Never heard of mingw-w64 That wasn't quite what I wrote. > this is the output my actual (MinGW) g++ spits out: > g++: error: __MSVCRT_VERSION__=0x0800: No such file or directory > probably this is due to the compiler's old release number. You'd better copy and paste the command, and maybe edit the file name, rather than typing it in. Here it is again: g++ your_file_name.cpp -std=c++11 -D __MSVCRT_VERSION__=0x0800 -U__STRICT_ANSI__ > An off-topic question: could you brefly tell me what does the arrow mean > in the following main declaration? > auto main() -> int It means that `main` returns a function result of type `int`. Cheers, & hth., - Alf |
Bonita Montero <Bonita.Montero@gmail.com>: Apr 09 11:00AM +0200 Your formatting-style is disgusting. |
Alvin <Alvin@invalid.invalid>: Apr 08 02:01PM +0200 On 2017-04-08 13:01, alexo wrote: > main.cpp:15:32: error: '_O_WTEXT' was not declared in this scope > _setmode( _fileno( stdin), _O_WTEXT ); > ^ ...\x86_64-w64-mingw32\include\fcntl.h: /** * This file has no copyright assigned and is placed in the Public Domain. * This file is part of the mingw-w64 runtime package. * No warranty is given; refer to the file DISCLAIMER.PD within this package. */ ... #define _O_WTEXT 0x10000 ... > byte sequence > wcout << L"What's your name? "; > ^ That's the kind of error you get, if you didn't properly create the .cpp as UTF-8. > to turn off a couple of errors about finding stdin and stout. > The problem is that my compiler cannot find _O_WTEXT > and that it doesn't recognize the format L"..." string The original code works with the MinGW 5.x and 6.x versions I have lying around. |
David Brown <david.brown@hesbynett.no>: Apr 07 09:17AM +0200 On 07/04/17 01:25, Alf P. Steinbach wrote: > UnicodeEncodeError: 'charmap' codec can't encode character '\xd8' in > position 16: character maps to <undefined> > [H:\forums\clc++\unicode in windows console] I took a quick test on my Windows 7 system. It struggled with some of the characters - it seems the modification character \u0333 is beyond Window's abilities. And π is missing. With those removed, it worked. Perhaps Windows console could print the Unicode characters, as long as those characters happened to be in the normal Windows code page (Latin-1, or something similar). |
Mr Flibble <flibble@i42.co.uk>: Apr 08 11:44PM +0100 On 08/04/2017 15:28, alexo wrote: > { > ... > } It means that the person who wrote it has OCD. /Flibble |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 09 10:52PM +0200 On 09-Apr-17 5:34 PM, alexo wrote: > www.mingw-w64.org/doku.php > I've found this project for MinGW in 64 bit flavour. > Is that what you referred to? Yep. You just have to be aware that there are lots of different builds, by different persons. As I recall the Nuwen distro (which is very simple and small) is built on MinGW-w64 but lacks support for specifying execution character set, which currently is awkward for Windows programming. Still it's nice, and it's maintained by STL over at Microsoft (who is the guy who maintains the STL over at Microsoft, a sort of extreme name coincidence), but on the Nuwen site he just calls himself after the hacker prince in Vernor Vinge's novels. Cheers!, - Alf |
David Brown <david.brown@hesbynett.no>: Apr 07 03:44PM +0200 On 07/04/17 14:19, Alf P. Steinbach wrote: > the active codepage, instead of using console i/o. I.e., to be less > imprecise, that the CPython implementation still does not support > Windows consoles but uses the standard narrow streams for console i/o. That makes sense. > language installations of Windows) and 865 (Nordic), but appears to be > missing in codepages 850 (all-European) and 1252 (Windows ANSI Western, > an extension of ISO Latin-1), which some programmers use in the console. My code page at the moment is 850 (latin-1) - I have Windows set up for UK English, but with a Norwegian keyboard. I can show π in a terminal, with a suitable font like Lucida Console, but the python script still does not print it. > the effective character set of the Python script. Instead CPython should > be fixed. I think that to get it fixed someone should argue that CPython > could be made even better (no mention of this being a fault). Yes - it would be nice if this simply worked cross-platform out of the box on Windows. I suppose font support would be required, but it should not be asking /too/ much of Python to make Unicode output work here on Windows in the same way as on Linux. > since 1981 old Unix unification of files, pipes and interactive i/o as > streams of single bytes. Even with UTF-8 and no support for interactive > features it's ungood, because UTF-8 error states are usually persistent. Linux terminals can certainly be screwed up if you try and cat a binary file. I don't know if it is only UTF-8 errors, or other problems. No system is perfect, it seems. > Cheers & hth., :) Perhaps that is enough of this here - the Python stuff is off-topic for c.l.c++ and I doubt if it is helping the OP. But thank you for your explanations - I have learned a few new things here. |
Alvin <Alvin@invalid.invalid>: Apr 07 04:40PM +0200 On 2017-04-07 15:37, David Brown wrote: > settings (unless things have changed in later Windows). > Still, I will remember the possibility of something like ConEmu if I > find I need console utf-8 on Windows. I just tried Python 3.6.1. It works without chcp. There is PEP 528: https://www.python.org/dev/peps/pep-0528/ |
Alvin <Alvin@invalid.invalid>: Apr 07 02:06PM +0200 On 2017-04-07 09:17, David Brown wrote: > Perhaps Windows console could print the Unicode characters, as long as > those characters happened to be in the normal Windows code page > (Latin-1, or something similar). Windows works fine, if you set the codepage to UTF-8 (at least with a terminal with good UTF support like ConEmu): chcp 65001 It's not like it would work on Linux, if you have a non-UTF configuration: > LC_ALL=en_US python3 unicode.py Traceback (most recent call last): File "unicode.py", line 5, in <module> UnicodeEncodeError: 'latin-1' codec can't encode character '\u03c0' in position 19: ordinal not in range(256) |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 08 03:05PM +0200 On 08-Apr-17 1:01 PM, alexo wrote: > Copyright (C) 2015 Free Software Foundation, Inc. > This is free software; see the source for copying conditions. There is NO > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. This is an old compiler. Well, 2 years old, and that can be a long time when we get a new standard every second year. Essentially the maintenance of MinGW g++ has been passed from the original MinGW project (where I believe you downloaded that compiler) to the MinGW-64 project. > main.cpp: In function 'void init_streams()': > main.cpp:15:32: error: '_O_WTEXT' was not declared in this scope > _setmode( _fileno( stdin), _O_WTEXT ); By inspection of the headers of that compiler's standard library, in order to get a definition of `_O_WTEXT` with this compiler you need to define `__MSVCRT_VERSION__` as equal or greater than `0x0800`. Also, with `-std=c++11` option you need to explicitly tell it to not define `__STRICT_ANSI__`, in order to get a definition of `_fileno`. Which with this compiler's library is defined by the header that I forgot to include, namely `<stdio.h>`. It's weird that a compiler whose one and only purpose was to work in Windows, doesn't. Anyway, the good news is that the newer g++ compilers don't have these quirks. At least not the ones from MinGW-64. Be that as it may, the following build command works for me, with g++ 5.3.0-3 from the old MinGW project: --------------------------------------------------------------------- [H:\forums\clc++\unicode in windows console] > g++ --version g++ (GCC) 5.3.0 Copyright (C) 2015 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
[H:\forums\clc++\unicode in windows console] > g++ _setmode.cpp -std=c++11 -D __MSVCRT_VERSION__=0x0800 -U__STRICT_ANSI__ [H:\forums\clc++\unicode in windows console] > a Every 日本国 кошка likes Norwegian blåbærsyltetøy! What's your name? Særskrevne Påske Nøtter Pleased to meet you, Særskrevne Påske Nøtter! [H:\forums\clc++\unicode in windows console] > _ --------------------------------------------------------------------- To avoid having to write all that every time, you can define the parts that you'd otherwise have to repeat, as an environment variable. Or, make a script or alias for the g++ invocation. > main.cpp:28:14: error: converting to execution character set: Illegal > byte sequence > wcout << L"What's your name? "; You just need to save your .cpp file with the encoding that g++ expects. By default that's UTF-8. And better make that UTF-8 with BOM, so that Visual C++ will understand that it's UTF-8 by default. > I've removed all C++11 flavours and I've added > #include <cstdio> > to turn off a couple of errors about finding stdin and stout. Sorry about that, I plain forgot to include that header. :( By the way it should be `<stdio.h>`. The `<cstdio>` header may not necessarily provide unqualified names, e.g. with that header one may have to write `std::stdin` instead of just `stdin`. > The problem is that my compiler cannot find _O_WTEXT > and that it doesn't recognize the format L"..." string Cheers & hth., :) - Alf |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 07 03:37PM +0200 On 07-Apr-17 3:03 PM, alexo wrote: > } > I know I seem stupid, but I tought there were a way to write something like > std::wcout << "H\u0333" << "ello" << endl; Well, you could read my reply to your original posting. That's a hint. Cheers!, - Alf |
alexo <alessandro.volturno@libero.it>: Apr 07 03:03PM +0200 I don't understand how 0xC2, 0xAC gives the cent symbol. It is not a single character code, so how can this sequence be interpreted as a single character? I have the same doubts about 0xE2, 0x82, 0xAC or 0xF0, 0x90, 0x8D, 0x88. > '\n', > 0x00 > } Does this array contain 19 or 11 characters (counting 0x00 as '\0')? Moreover, reading the posts in this thread, I still have not understood how to use wcout and Unicode codes without having to write console settings like UINT oldcp = GetConsoleOutputCP(); if (!SetConsoleOutputCP(CP_UTF8)) { fprintf(stderr, "chcp failed\n"); return EXIT_FAILURE; } I know I seem stupid, but I thought there was a way to write something like std::wcout << "H\u0333" << "ello" << endl; Thank you. |
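The byte-versus-character question in this post can be explored with a small counting sketch. It relies on one property of UTF-8: only the first byte of each encoded character lacks the 10xxxxxx pattern, so counting those bytes counts characters (code points). The function name `count_code_points` is hypothetical, and the sketch assumes the input is valid, NUL-terminated UTF-8.

```cpp
#include <cstddef>

// Count code points in a NUL-terminated UTF-8 byte array by counting every
// byte that is NOT a continuation byte (10xxxxxx). Assumes valid UTF-8.
std::size_t count_code_points(const unsigned char* bytes) {
    std::size_t count = 0;
    for (; *bytes != 0x00; ++bytes) {
        if ((*bytes & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

As a usage example with the three sequences quoted above: an array holding 0xC2 0xAC, 0xE2 0x82 0xAC, 0xF0 0x90 0x8D 0x88, '\n' and a terminating 0x00 occupies 11 bytes, but `count_code_points` reports 4 characters for it. This sample array is constructed for the illustration and is not necessarily the array from the original post, which is only partly quoted.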
alexo <alessandro.volturno@libero.it>: Apr 09 12:07PM +0200 On 08/04/2017 18:39, Alf P. Steinbach wrote: > Here it is again: > g++ your_file_name.cpp -std=c++11 -D __MSVCRT_VERSION__=0x0800 > -U__STRICT_ANSI__ Now it worked! Copy and paste just worked fine. I suppose now that I mistyped something... >> in the following main declaration? >> auto main() -> int > It means that `main` returns a function result of type `int`. ok |
scott@slp53.sl.home (Scott Lurndal): Apr 07 01:49PM >Linux terminals can certainly be screwed up if you try and cat a binary >file. I don't know if it is only UTF-8 errors, or other problems. No >system is perfect, it seems. Random terminal control escape sequences within the binary will screw up xterm and gnome-terminal. $ stty sane $ tput reset will restore normal operations. It may be necessary, in some cases (^c of poorly written curses app, e.g.) to use ^j to get a newline when typing stty sane. http://invisible-island.net/xterm/ctlseqs/ctlseqs.html |
ram@zedat.fu-berlin.de (Stefan Ram): Apr 11 06:54PM
>I still don't understand how rotation, and thereby the structure
>(metric) of our physical environment, emerges in complex arithmetic.

The numbers 1 and i also can be represented as 2×2 matrices

    /      \
    | 1  0 |
1 = |      |
    | 0  1 |
    \      /

and

    /       \
    | 0  -1 |
i = |       | .
    | 1   0 |
    \       /

See Wikipedia »7.2 Matrix representation of complex numbers«. Now, compare this with the matrix M_z on page 2 "259" in www.astro.caltech.edu/~golwala/ph125ab/ph106ab_notes_sec5.1.pdf . We can thus see that this matrix is the infinitesimal generator of rotations around the z axis. (The following pages then explain how finite rotations can be obtained from such "infinitesimal rotations".) |
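The matrix picture can be checked mechanically. The sketch below maps a complex number re + im·i to the matrix re·1 + im·i built from the two matrices in the post and verifies that the matrix for i squares to the matrix for -1. The type `Mat2` and the functions `multiply` and `from_complex` are names invented for this illustration, and integer entries are used only to keep the comparison exact.

```cpp
// 2x2 integer matrix with rows (a b) and (c d).
struct Mat2 {
    int a, b, c, d;
};

constexpr bool operator==(Mat2 x, Mat2 y) {
    return x.a == y.a && x.b == y.b && x.c == y.c && x.d == y.d;
}

// Ordinary matrix product.
constexpr Mat2 multiply(Mat2 x, Mat2 y) {
    return Mat2{x.a * y.a + x.b * y.c, x.a * y.b + x.b * y.d,
                x.c * y.a + x.d * y.c, x.c * y.b + x.d * y.d};
}

// Map re + im*i to the matrix re*(1 0; 0 1) + im*(0 -1; 1 0).
constexpr Mat2 from_complex(int re, int im) {
    return Mat2{re, -im, im, re};
}

// The matrix standing for i squares to the matrix standing for -1.
static_assert(multiply(from_complex(0, 1), from_complex(0, 1)) ==
                  from_complex(-1, 0),
              "i * i == -1 in the matrix representation");
```

Multiplying a vector by the matrix for i turns (x, y) into (-y, x), a quarter turn, which is one way to see the connection to rotation that the post describes.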
ram@zedat.fu-berlin.de (Stefan Ram): Apr 14 08:38PM Some other languages (like Pascal or COBOL) provide types for ranges and enumerations, and I wonder to which extent one can create such types in C++. For example, struct example { range<2'000'000'000,2'000'000'010> i; }; . The implementation should emit an error message (a compile-time error message if possible) when one tries to execute instance.i = 0; , and, if possible, sizeof instance.i should be just 1, because one byte is enough to store one out of 10 values. |
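One possible direction for such a type, offered only as a sketch under stated assumptions: store the offset from the lower bound instead of the value itself, so a narrow span fits in a single byte. The class below is hypothetical, treats both bounds as inclusive, supports only spans of at most 256 values, and checks assignments at run time rather than at compile time.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical bounded-integer type: holds a value in [Lo, Hi] (inclusive)
// and stores only the offset from Lo. This sketch assumes the span fits in
// one byte; a fuller version would select the storage type from Hi - Lo.
template <long long Lo, long long Hi>
class range {
    static_assert(Lo <= Hi, "empty range");
    static_assert(Hi - Lo <= 255, "span does not fit in one byte");

    std::uint8_t offset_;

    // Run-time check; throws instead of storing an out-of-range value.
    static std::uint8_t checked(long long value) {
        if (value < Lo || value > Hi) {
            throw std::out_of_range("value outside range");
        }
        return static_cast<std::uint8_t>(value - Lo);
    }

public:
    range(long long value = Lo) : offset_(checked(value)) {}

    range& operator=(long long value) {
        offset_ = checked(value);
        return *this;
    }

    operator long long() const { return Lo + offset_; }
};
```

With this sketch, a `range<2000000000, 2000000010>` object has size 1, and assigning 0 to it throws `std::out_of_range`. Turning that into a compile-time diagnostic for constant arguments would need additional machinery (for example `constexpr` construction), which is left out here.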
ram@zedat.fu-berlin.de (Stefan Ram): Apr 14 10:09PM
Newsgroups: comp.lang.c,comp.lang.c++

> 55
>
> (loop for i from 1 below 10 sum i)
> 45

C++ (after appropriate definitions):

int main()
{ auto sum { 0 };
  for( auto const i : from{ 1, below( 3 )} )sum += i;
  ::std::cout << sum << '\n'; }

. And, BTW, the variable declaration and the whole loop are being compiled here into just

movl $3, %edx

, that is, all the looping is done at compile time! However, my current, simplistic, definitions are not prepared for other increments than »+1«. (See full C++ code at the end of this post.)

JavaScript is closer to Lisp, so you can get the value of the sum /as the value of the loop/ in JavaScript, just as in Common Lisp.

function * from_below( from, top )
{ let i = from; while ( i < top )yield i++; }

console.log
( eval( "sum = 0; for( let i of from_below( 1, 3 ))sum += i;" ));

(prints »3«).

Full C++ source code:

#include <initializer_list>
#include <iostream>
#include <ostream>

struct intref
{ int i = 0;
  constexpr explicit intref( int const i ): i{ i } {}
  constexpr int operator * () const { return i; }
  constexpr intref & operator ++ () { ++i; return *this; }
  constexpr bool operator != ( intref const other ) const
  { return this->i != other.i; }};

struct from
{ intref const first;
  intref const top;
  constexpr from( int const first, int const top ):
    first{ first }, top{ top } {}
  constexpr intref const begin() const { return first; }
  constexpr intref const end() const { return top; }};

constexpr int below( int const i ) { return i; }
constexpr int including( int const i ) { return i + 1; }

int main()
{ auto sum { 0 };
  for( auto const i : from{ 1, below( 3 )} )sum += i;
  ::std::cout << sum << '\n'; }

. I am not an experienced writer of C++ classes, so I appreciate all comments on my C++ source code with a Followup-To header for the newsgroup comp.lang.c++.

My most frequent error when summing up, is forgetting to initialize the sum variable. Using »auto« above means that I can't forget this!

Of course, one could define the abstractions so as to force the user to use one of »below« or »including«. |