soft and program: Digest for comp.lang.c++@googlegroups.com


Unicode test - 8 Updates
My following projects were updated - 4 Updates
How to find 56 potential vulnerabilities in FreeBSD code in one evening - 1 Update
Unicode test - 1 Update

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 06 05:03AM +0200

On 06-Apr-17 3:08 AM, Stefan Ram wrote:
>> doesn't work for console i/o, down at the API level.

> If »generally doesn't« means, »not under every combination
> of circumstance«, you might be right.

Generally means mostly, and not at all for input.

> H ä l ¢ Euro [] [] l ö !

> (U+10348 lies outside of the BMP, and was not rendered correctly
> here [it was rendered looking similar to »[][]«]).

That might be a UTF-16 surrogate pair, treated as two UCS-2 characters
by the console.
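
As a rough illustration of the surrogate-pair point, here is a minimal sketch of how a code point above U+FFFF splits into two UTF-16 code units. The names `SurrogatePair` and `to_surrogates` are made up for this example, and the function assumes its argument really is in the range U+10000 to U+10FFFF (no validation).

```cpp
#include <cstdint>

// Hypothetical helper type: the two UTF-16 code units of one code point.
struct SurrogatePair {
    std::uint16_t high;  // lead surrogate, 0xD800..0xDBFF
    std::uint16_t low;   // trail surrogate, 0xDC00..0xDFFF
};

// Split a code point in U+10000..U+10FFFF into its surrogate pair.
// Assumption: the caller has already checked the range.
constexpr SurrogatePair to_surrogates(std::uint32_t cp)
{
    return SurrogatePair{
        static_cast<std::uint16_t>(0xD800 + ((cp - 0x10000) >> 10)),
        static_cast<std::uint16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF))
    };
}
```

For U+10348 this gives the units 0xD800 and 0xDF48. A console that treats each unit as a separate UCS-2 character would plausibly show two placeholder boxes, which fits the »[][]« rendering reported above.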

Sorry I don't have the energy or time right now (or earlier) to give a
good discussion. It's messy. I'll try to find time & energy tomorrow.

Cheers!,

- Alf

alexo <alessandro.volturno@libero.it>: Apr 06 02:43PM +0200

On 06/04/2017 00:16, Stefan Ram wrote:

> And you need to set a console font in the console,
> no, wait: a /Unicode/ font!, such as: Lucida Console
> or Consolas.

I have tried my code on Windows 10 in the cmd prompt shell with the
Consolas font. My code produces no output.

If I use cout instead of wcout, what I get is the decimal numeric value
of the character I would like to print.

alexo <alessandro.volturno@libero.it>: Apr 06 03:15PM +0200

My machine displays only certain characters, namely the cent symbol, the
euro sign, the ä and the ö. Even with the Consolas font, the bytes 0xF0,
0x90, 0x8D, 0x88 (character U+10348), which in my case are shown as 2
distinct characters, are not displayed at all.

Häl¢€𐍈lö! should be the output.

I gather that I have to insert the hexadecimal code of every single byte
of the UTF-8 character, am I right?
Could you explain to me (I'm not very familiar with hex and key codes)
how I should use, for example, the U+0333 character?

Is it really so tricky to print a Unicode character?
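
To make the byte question concrete, here is a minimal sketch of how a code point maps to its UTF-8 bytes. `Utf8Bytes`, `to_byte` and `encode_utf8` are names invented for this example; the function does not reject surrogates or values above U+10FFFF, so treat it as an illustration rather than a validating encoder.

```cpp
#include <cstdint>

// Hypothetical result type: up to four UTF-8 bytes plus the count used.
struct Utf8Bytes {
    unsigned char bytes[4];
    int length;
};

constexpr unsigned char to_byte(std::uint32_t v)
{
    return static_cast<unsigned char>(v);
}

// Encode one code point as UTF-8. Assumption: cp is a valid scalar value.
constexpr Utf8Bytes encode_utf8(std::uint32_t cp)
{
    return cp < 0x80
        ? Utf8Bytes{{to_byte(cp), 0, 0, 0}, 1}
        : cp < 0x800
        ? Utf8Bytes{{to_byte(0xC0 | (cp >> 6)),
                     to_byte(0x80 | (cp & 0x3F)), 0, 0}, 2}
        : cp < 0x10000
        ? Utf8Bytes{{to_byte(0xE0 | (cp >> 12)),
                     to_byte(0x80 | ((cp >> 6) & 0x3F)),
                     to_byte(0x80 | (cp & 0x3F)), 0}, 3}
        : Utf8Bytes{{to_byte(0xF0 | (cp >> 18)),
                     to_byte(0x80 | ((cp >> 12) & 0x3F)),
                     to_byte(0x80 | ((cp >> 6) & 0x3F)),
                     to_byte(0x80 | (cp & 0x3F))}, 4};
}
```

With this, U+0333 comes out as the two bytes 0xCC 0xB3, and U+10348 as 0xF0 0x90 0x8D 0x88. In a narrow literal the bytes can be spelled as "a\xCC\xB3" "b"; the literal is split because a hex escape would otherwise swallow the following "b" as another hex digit. Writing "a\u0333b" and letting the compiler do the encoding is usually less error-prone.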

thank you

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 06 06:20PM +0200

On 05-Apr-17 2:36 PM, alexo wrote:

> return 0;
> }

> It outputs nothing

U+0333 is a modifier character that applies a double underscore to the
preceding character. When there is no preceding character, such as a
space, you should not expect any effect.
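
A small sketch of what "modifier character" means in practice: U+0333 sits in the Combining Diacritical Marks block (U+0300 to U+036F), so it adds to the code point count but not to the number of visible base characters. The function names are hypothetical, and the check covers only that one block, not every combining character in Unicode.

```cpp
// True for code points in the Combining Diacritical Marks block only.
// Assumption: other combining ranges are deliberately ignored here.
constexpr bool is_combining_diacritic(char32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

// Count code points in a zero-terminated UTF-32 string that are not
// combining marks, i.e. a rough count of base characters.
constexpr int count_base_characters(const char32_t* s)
{
    return *s == 0
        ? 0
        : (is_combining_diacritic(*s) ? 0 : 1) + count_base_characters(s + 1);
}
```

For the string U"a\u0333b" this counts 2 base characters out of 3 code points: the mark attaches to the "a" before it.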

However, if you instead used e.g. the Euro sign, which Windows Write
informs me is U+20AC, then you should only expect that character to
display if it is in the active codepage (character set) for the console
window. That's because the function of wcout is to convert from wide
characters (effectively Unicode, in Windows encoded as UTF-16) to the
narrow character encoding expected by the external environment.

In Unix-land this is not a big problem, because the external environment
in Unix-land typically expects, and produces, UTF-8 encoded text. So all
you have to do there is the `setlocale(LC_ALL, "")` code that you have.
Then the wide streams work, in Unix-land.

There are two main ways to cajole wcout & friends to, instead of their
standard conversion behavior, do direct console i/o in Windows:

• use a Microsoft extension called `_setmode`. It's supported also by
MinGW g++, or
• replace the wide streams' text buffers with custom buffers that use
Windows' direct console i/o functions.

Here's an example of the first, easier solution:

// Source encoding: UTF-8 w/BOM.
#include <iostream>
#include <string>
#include <locale.h>

#include <stdio.h>      // _fileno
#include <io.h>         // _setmode
#include <fcntl.h>      // _O_WTEXT

using namespace std;

void init_streams()
{
    // The bare minimum for this program. More generally one should
    // check whether a stream is connected to the console, and if not,
    // set UTF-8 mode instead of wide text mode.
    _setmode( _fileno( stdin ), _O_WTEXT );
    _setmode( _fileno( stdout ), _O_WTEXT );
}

auto main()
    -> int
{
    //setlocale( LC_ALL, "" );
    init_streams();

    auto const& s = L"Every 日本国 кошка likes Norwegian blåbærsyltetøy!";
    wcout << s << endl;

    wcout << endl;
    wcout << L"What's your name? ";
    wstring name;
    getline( wcin, name );
    wcout << "Pleased to meet you, " << name << "!" << endl;
}

I put the stream configuration in a function so you can more easily see
that it can be moved all the way to its own translation unit, which then
would leave the main program as 100% portable.
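
For what it's worth, such a separate translation unit might look roughly like the sketch below. It also attempts the check mentioned in the code comment: wide text mode for a console, UTF-8 mode otherwise. `TextMode`, `choose_mode` and `configure` are names invented here, and using `_isatty` as the "is this the console?" test is an approximation, since it reports true for any character device.

```cpp
#include <clocale>
#include <cstdio>

#ifdef _WIN32
#   include <io.h>      // _setmode, _isatty
#   include <fcntl.h>   // _O_WTEXT, _O_U8TEXT
#endif

// Hypothetical names, for illustration only.
enum class TextMode { wide_console, utf8 };

// Wide text mode for a console, UTF-8 for files and pipes.
constexpr TextMode choose_mode(bool is_console)
{
    return is_console ? TextMode::wide_console : TextMode::utf8;
}

#ifdef _WIN32
inline void configure(std::FILE* f)
{
    int const fd = _fileno(f);
    // Approximation: _isatty is true for any character device.
    bool const is_console = _isatty(fd) != 0;
    _setmode(fd, choose_mode(is_console) == TextMode::wide_console
                     ? _O_WTEXT
                     : _O_U8TEXT);
}
#endif

void init_streams()
{
#ifdef _WIN32
    configure(stdin);
    configure(stdout);
#else
    // In Unix-land the user's locale (typically UTF-8) is enough.
    std::setlocale(LC_ALL, "");
#endif
}
```

The main program then only needs a declaration of `init_streams()` and stays free of platform headers.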

Well, except for the limitation of Windows console windows to the UCS-2
character set, the Basic Multilingual Plane of Unicode, which means e.g.
that some archaic Chinese ideographs just can't be handled.

I wrote about this once, some five years ago, here: <url:
https://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/>

Cheers & hth.,

- Alf

David Brown <david.brown@hesbynett.no>: Apr 06 09:53PM +0200

On 06/04/17 18:20, Alf P. Steinbach wrote:

> in Unix-land typically expects, and produces, UTF-8 encoded text. So all
> you have to do there is the `setlocale(LC_ALL, "")` code that you have.
> Then the wide streams work, in Unix-land.

In Unix land (well, my Linux system anyway), I get utf-8 encoded text
like this:

$ cat u.cpp
#include <iostream>

int main(void)
{
    std::cout << "Hello, world - ÅØÆ πr² ïç a\u0333b\n";
}

$ file u.cpp
u.cpp: C source, UTF-8 Unicode text

$ g++ u.cpp -Wall -Wextra

$ ./a.out
Hello, world - ÅØÆ πr² ïç a̳b

I just write the UTF-8 characters I want, mostly using the keyboard
directly (possibly with the compose key) rather than code points or a
character applet, and they come out fine.

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 06 09:59PM +0200

On 06-Apr-17 9:53 PM, David Brown wrote:

> I just write the UTF-8 characters I want, mostly using the keyboard
> directly (possibly with the compose key) rather than code points or a
> character applet, and they come out fine.

Yes, you can do things more easily in non-portable code. ;-)

Cheers!

- Alf

David Brown <david.brown@hesbynett.no>: Apr 06 11:17PM +0200

On 06/04/17 21:59, Alf P. Steinbach wrote:
>> directly (possibly with the compose key) rather than code points or a
>> character applet, and they come out fine.

> Yes, you can do things more easily in non-portable code. ;-)

In this case, that's certainly true.

Or you can do it more easily in portable code that is not C++:

$ cat u.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
print(u"Hello, world - ÅØÆ πr² ïç a\u0333b")

$ ./u.py
Hello, world - ÅØÆ πr² ïç a̳b

Works fine with Python 2 or 3. It certainly works on my Linux machine -
I expect it to work on Windows too.

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 07 01:25AM +0200

On 06-Apr-17 11:17 PM, David Brown wrote:
> Hello, world - ÅØÆ πr² ïç a̳b

> Works fine with Python 2 or 3. It certainly works on my Linux machine -
> I expect it to work on Windows too.

It seems I'm consigned to the rôle of breaking reasonable expectations
about Windows.

To be fair: it's not a problem with Windows, really, it's a problem that
I call (just in my discussions with myself) "pretend software". It's
where most everybody pretends that some software is OK for its general
purpose, because it works for a number of special cases. And we've never
heard about it not working. At least, we don't remember it.

[H:\forums\clc++\unicode in windows console]
> python --version
Python 3.4.3

[H:\forums\clc++\unicode in windows console]
> python unicode.py
Traceback (most recent call last):
File "unicode.py", line 2, in <module>
print(u"Hello, world - Å\xd8Æ pr² ïç a\u0333b")
File "c:\Python34\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xd8' in
position 16: character maps to <undefined>

[H:\forums\clc++\unicode in windows console]
> _

The silly beast defaults to translating that Unicode text to the narrow
encoding that's chosen (e.g. via the `chcp` command) in the console
window, the console window's "active codepage".

Which in the example above is 437, the original IBM PC encoding, with
lots of fancy characters but not the one noted in the error message.

So what to do?

Well, in earlier days CPython (which I suspect this is, it's just been
installed by something else) could be tweaked to do The Right Thing™,
but that possibility was removed, probably because of the mass psychosis
effect I described above: everybody thinks it works, because there are
cases (e.g. pure ASCII output) where it works. Alternatively it could be
that someone didn't like that it was so easy to prove that this was a
Python problem and not a Windows problem, and so made that much harder
to do. Anyway, with modern Python it's a PITA to fix it.

I wrote about it in 2015, here: <url:
https://alfps.wordpress.com/2015/05/12/non-crashing-python-3-x-output-in-windows/>.

Cheers!,

- Alf

My following projects were updated

aminer68@gmail.com: Apr 06 08:17AM -0700

Hello,

My following projects were updated:

My Efficient C++ Bounded Thread-Safe FIFO Queue and LIFO Stack for real-time systems was updated to version 1.05

You can download it from:

https://sites.google.com/site/aminer68/efficient-c-bounded-thread-safe-fifo-queue-and-lifo-stack-for-real-time-systems

And my C++ MemPool for real-time systems was updated to version 1.08

You can download it from:

https://sites.google.com/site/aminer68/c-mempool-for-real-time-systems

I have tested them thoroughly and they are more stable and fast now.

Thank you,
Amine Moulay Ramdane.

Real Troll <real.troll@trolls.com>: Apr 06 12:50PM -0400

> I have tested them thoroughly and they are more stable and fast now.

I think we'll just wait for your revised code as you seem to release
them within hours of first release. Frankly, what made you think that
people are interested in your project? People have given up on you
because of your attitude towards female members of this newsgroup.

Bonita Montero <Bonita.Montero@gmail.com>: Apr 06 09:47PM +0200

> You can download it from:
> https://sites.google.com/site/aminer68/c-mempool-for-real-time-systems
> I have tested them thoroughly and they are more stable and fast now.

Don't trust someone who claims to know how to write synchronization
facilities but doesn't even understand simple condition variables.

fir <profesor.fir@gmail.com>: Apr 06 03:13PM -0700

On Thursday, 6 April 2017 21:47:35 UTC+2, Bonita Montero wrote:
> > I have tested them thoroughly and they are more stable and fast now.

> Don't trust someone who claims to know how to write synchronization
> facilities but doesn't even understand simple condition variables.

besides, if someone is really interested in multicore
computing, he should turn to general-purpose GPU coding
1) you get many more cores there (even up to 10k parallel float channels on the best cards)
2) heavy computational tasks concentrate in hot loops (thousands of parallel 'fibers' of the same computation, not a few very different ones), and the gpu model is more suitable for this

if our idiot ramine begins to write and optimise opencl kernels we could start talking

How to find 56 potential vulnerabilities in FreeBSD code in one evening

Andrey Karpov <karpov2007@gmail.com>: Apr 06 01:45AM -0700

It's high time to recheck the FreeBSD project and show that PVS-Studio easily finds errors even in such serious, high-quality projects. This time I decided to look at the analysis in terms of detecting potential vulnerabilities. PVS-Studio has always been able to identify defects that could potentially be used in an attack. However, we haven't focused on this aspect of the analyzer: we described the errors as typos, consequences of sloppy copy-paste and so on, but never classified them according to CWE, for example. Security and vulnerabilities are a popular topic nowadays, so I will try to broaden the perception of our analyzer. PVS-Studio not only helps to find bugs; it is also a tool that improves code security.

Full article: https://www.viva64.com/en/b/0496/

Unicode test

ram@zedat.fu-berlin.de (Stefan Ram): Apr 06 01:08AM

>This is unfortunately ungood advice. The UTF-8 codepage generally
>doesn't work for console i/o, down at the API level.

If »generally doesn't« means, »not under every combination
of circumstance«, you might be right.

If »generally doesn't« means, »never«, I'd like to report
that I saw the expected characters. For example, the second
character that appeared in the console was an »ä«.

,= ./ .: / ,++#, : .///. :///, :///, ,++#, --.: -=
=% ,# =;:/ #, /$M+- /M+/== :...= :...= #, =;;; ;%
=@++%# =//H+ #, -@ H .XH//- = = = = #, ;$,,/X =%
=% ,# -H==%% #, .H;@,. ;#;=.. = = = = #, %/ -# -/
=+ ,M .X++/H: M, :H:, =X++: ++++= ++++= M, ,%++X= -;

H ä l ¢ Euro [] [] l ö !

(U+10348 lies outside of the BMP, and was not rendered correctly
here [it was rendered looking similar to »[][]«]).


Thursday, April 6, 2017

Digest for comp.lang.c++@googlegroups.com - 14 updates in 4 topics
