Sunday, January 30, 2022

Digest for comp.lang.c++@googlegroups.com - 14 updates in 2 topics

"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Jan 30 09:18AM +0100

On 29 Jan 2022 16:39, Paavo Helde wrote:
>> quality library in the world.
 
> Nowadays even Notepad has learned to cope with LF linebreaks, so there
> is essentially no reason any more to use CRLF anywhere at all.
 
Oh, just all the internet protocols, like NNTP. ;-)
 
 
> point of having the file content in memory differ from the content on
> disk? Especially considering memory mapping, HTTP packets, cloud storage
> etc.
 
My opinion is that C (and hence also C++) text mode is an abomination
that should never have been introduced, and that in addition, given that
it was introduced, it's designed in a stupid way with the data
conversion applied underneath the buffer level so that one can't get a
clear view of the raw data.
 
As an example, the design means that Unix `cat` can't be faithfully
implemented in Windows using only standard C or C++, which IMO is extreme.
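
The conversion in question can be observed with a minimal sketch like
the one below. The helper name `bytes_on_disk` and the file name used
with it are hypothetical, made up for illustration. It writes the same
text once in text mode and once in binary mode and reads the raw bytes
back. On Unix-like systems both modes normally give identical bytes,
while Windows implementations typically store each "\n" as CR LF in
text mode.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: writes `text` to `path` using the given fopen mode
// ("w" = text mode, "wb" = binary mode), then reads the file back in
// binary mode and returns the raw bytes that ended up on disk.
std::string bytes_on_disk(const char* path, const char* write_mode,
                          const std::string& text)
{
    assert(path != nullptr && write_mode != nullptr);

    if (std::FILE* out = std::fopen(path, write_mode)) {
        std::fwrite(text.data(), 1, text.size(), out);
        std::fclose(out);
    }

    std::string raw;
    if (std::FILE* in = std::fopen(path, "rb")) {
        char buffer[256];
        std::size_t n;
        while ((n = std::fread(buffer, 1, sizeof buffer, in)) > 0) {
            raw.append(buffer, n);
        }
        std::fclose(in);
    }
    return raw;
}
```

Comparing `bytes_on_disk("demo.tmp", "w", "a\nb\n")` with the "wb"
variant shows whether the implementation translates line endings. The
program itself never sees that translation through the text stream.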
 
That said, from a C++ programming perspective data in memory is usually
statically typed, while data on disk is untyped or effectively
dynamically typed. In memory one knows that that thing is UTF-8 encoded
text. On disk one doesn't know and must assume, where in Windows such
assumptions can be partially checked (a good thing) via a UTF-8 BOM.
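
As a rough illustration of such a partial check, here is a sketch that
tests whether a buffer starts with the three bytes EF BB BF, the UTF-8
encoding of the byte order mark. The function names are hypothetical.
A missing BOM proves nothing either way, so this only confirms the
assumption when the mark is present.

```cpp
#include <cassert>
#include <string>

// True if `data` begins with the UTF-8 encoded byte order mark EF BB BF.
bool has_utf8_bom(const std::string& data)
{
    return data.size() >= 3
        && static_cast<unsigned char>(data[0]) == 0xEF
        && static_cast<unsigned char>(data[1]) == 0xBB
        && static_cast<unsigned char>(data[2]) == 0xBF;
}

// Example checks; call from main() to try them.
inline void check_has_utf8_bom()
{
    assert(has_utf8_bom(std::string("\xEF\xBB\xBF" "hello")));
    assert(!has_utf8_bom(std::string("hello")));
    assert(!has_utf8_bom(std::string("\xEF\xBB")));  // too short
}
```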
 
 
> take another 20 years. All the backslash madness in Windows is about to
> avoid typing a single space after the command name, it's about time to
> get rid of this.
 
The Windows API level supports forward slashes, as did DOS before, and
especially for `#include` directives they should be used, not backslash.
 
Why the Windows shells and applications generally don't support them is
a mystery.
 
In some cases the fiction about what's allowed is imperfectly
implemented. I remember in the 1990's (when I still worked) I had some
fun demonstrating to colleagues how to completely and utterly hide some
data on disk, using commands like
 
 
<<
[C:\root\temp]
> type nul >poem.txt
 
[C:\root\temp]
> echo "Very important secret!" > poem.txt:secret
 
[C:\root\temp]
> dir | find "poem"
30 Jan 2022 09:14 0 poem.txt
 
[C:\root\temp]
> find /v "" < poem.txt:secret
"Very important secret!"
 
The file doesn't need to be empty, it can e.g. contain an actual poem if
one feels like security by obscurity is a great thing.
 
This is just a bug in cmd.exe where it fails to check that the file name
is "allowed" for ordinary users, so one is able to specify an internal
NTFS stream. :-)
 
 
- Alf
Marcel Mueller <news.5.maazl@spamgourmet.org>: Jan 30 12:09PM +0100

Am 29.01.22 um 16:39 schrieb Paavo Helde:
> is essentially no reason any more to use CRLF anywhere at all. As it has
> happened with utf-8, Microsoft will finally be forced to give up its
> stubbornness and join the sane world.
 
I wonder whether this will ever happen.
But many implementations are tolerant of different line-ending encodings.
 
> point of having the file content in memory differ from the content on
> disk? Especially considering memory mapping, HTTP packets, cloud storage
> etc.
 
In HTTP LF w/o CR is not officially supported. ;-)
 
> take another 20 years. All the backslash madness in Windows is about to
> avoid typing a single space after the command name, it's about time to
> get rid of this.
 
ntoskrnl, as well as its predecessor os2krnl, has been able to deal with
'/' as a path separator for a long time too. Basically the same as with
line endings: they are tolerant.
But the command line parsers of many programs cannot handle this, since
they use '/' as an escape character to denote an option. This makes
forward slashes awkward to use.
 
And although UTF-8 is quite common nowadays it raises several problems
in certain situations. E.g. database fields with restricted length
accept different string lengths depending on the number of characters
with longer UTF-8 encoding used. No user will ever understand this.
In Chinese and several other "non-ASCII" languages the UTF-8 encoding is
furthermore less compact than UCS2. So the encoding issues will persist too.
And the fact that on Unix-like OSes file names are just binary blobs
rather than strings with a known encoding raises further problems.
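
The length issue with restricted fields can be sketched as follows,
assuming valid UTF-8 input. The function names are hypothetical. The
byte count is what a length-limited field typically constrains, while
the user thinks in characters, and the two diverge as soon as
non-ASCII text appears.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in (assumed valid) UTF-8 text by counting every
// byte that is not a continuation byte of the form 10xxxxxx.
std::size_t utf8_code_points(const std::string& text)
{
    std::size_t count = 0;
    for (unsigned char byte : text) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Example checks; call from main() to try them.
inline void check_utf8_code_points()
{
    const std::string ascii = "abc";                       // 3 bytes
    const std::string han = "\xE4\xB8\xAD\xE6\x96\x87";    // U+4E2D U+6587
    assert(ascii.size() == 3 && utf8_code_points(ascii) == 3);
    assert(han.size() == 6 && utf8_code_points(han) == 2);
}
```

So a field limited to 6 bytes holds six ASCII letters but only two of
these Chinese characters.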
 
 
Marcel
Ben Bacarisse <ben.usenet@bsb.me.uk>: Jan 30 12:12PM


> As an example, the design means that Unix `cat` can't be faithfully
> implemented in Windows using only standard C or C++, which IMO is
> extreme.
 
I assume you are talking about the cases where cat defaults to reading
stdin and/or writing stdout. If so, it could be argued that it's not the
fault of the C and C++ standards, but more the fault of the
implementations not providing a useful freopen function.
 
But then maybe freopen simply can't be implemented in Windows for some
mysterious reason I don't get.
 
--
Ben.
Paavo Helde <eesnimi@osa.pri.ee>: Jan 30 02:26PM +0200

30.01.2022 13:09 Marcel Mueller kirjutas:
 
> In HTTP LF w/o CR is not officially supported. ;-)
 
Fixed encoding is better than random encoding. In HTTP headers one must
indeed use CR LF. But the (text) files themselves are sent as HTTP
bodies and there is zero reason to convert them to some other
representation than they have on the server disk.
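
A minimal sketch of that split, with a hypothetical function name and
a fixed set of example headers: the header lines end in CR LF as HTTP
requires, and the body bytes are appended exactly as given, with no
line-ending conversion.

```cpp
#include <cassert>
#include <string>

// Builds a simple HTTP/1.1 response. Header lines are terminated with
// CR LF; the body is appended byte for byte, whatever line endings it
// happens to contain.
std::string make_http_response(const std::string& body)
{
    assert(body.find('\0') == std::string::npos);  // text bodies only here

    std::string response;
    response += "HTTP/1.1 200 OK\r\n";
    response += "Content-Type: text/plain; charset=utf-8\r\n";
    response += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    response += "\r\n";  // empty line separates headers from body
    response += body;
    return response;
}
```

For a body of "line one\nline two\n" the Content-Length is 18 and the
LF-only line breaks arrive at the client unchanged.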
 
> But the command line parsers of many programs cannot handle this, since
> they use '/' as an escape character to denote an option. This makes
> forward slashes awkward to use.
 
The slash can still be used to denote an option; there is no need to
change that. One just needs to start requiring that options be separated
by spaces, something a sane person has always done anyway, so that
 
dir C:/B
 
would list the file B in the root folder and
 
dir C: /B
 
would list all files in cwd of C: in bare format.
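
The proposed rule could be sketched like this. It is a hypothetical
parser for illustration, not how cmd.exe or `dir` actually behave:
arguments are split on whitespace only, and a token counts as an
option only if it starts with '/'.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct ParsedArgs {
    std::vector<std::string> options;  // tokens starting with '/'
    std::vector<std::string> paths;    // everything else
};

// Splits `args` on whitespace and classifies each token. A '/' inside
// a token is just part of a path.
ParsedArgs parse_args(const std::string& args)
{
    ParsedArgs result;
    std::istringstream input(args);
    std::string token;
    while (input >> token) {
        assert(!token.empty());  // operator>> never yields an empty token
        if (token[0] == '/') {
            result.options.push_back(token);
        } else {
            result.paths.push_back(token);
        }
    }
    return result;
}
```

With this rule "C:/B" is one path, while "C: /B" is the path "C:" plus
the option "/B". A path that itself starts with a slash, such as
"/temp", would still be taken for an option, so the sketch does not
settle every case.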
 
> in certain situations. E.g. database fields with restricted length
> accept different string lengths depending on the number of characters
> with longer UTF-8 encoding used. No user will ever understand this.
 
In the real world UCS-2 is rarely used. The Windows SDK, Java et al.
use UTF-16, which has the same string length problem. And UCS-4 is
definitely wasting space.
 
> In Chinese and several other "non-ASCII" languages the UTF-8 encoding is
> furthermore less compact than UCS2.
 
I have heard this claim is not supported by actual data. In real usage
Chinese is most often heavily interspersed with punctuation, numbers and
English keywords so that there is little or no benefit in using UCS2
over UTF-8.
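
The arithmetic behind both positions can be sketched as below, with
hypothetical names and made-up sample strings (not real-world data).
The function computes how many bytes a sequence of code points needs
in UTF-8 and in UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct EncodedSizes {
    std::size_t utf8_bytes;
    std::size_t utf16_bytes;
};

// Sums the encoded size of each code point in both encodings.
EncodedSizes encoded_sizes(const std::u32string& text)
{
    EncodedSizes sizes = {0, 0};
    for (char32_t cp : text) {
        assert(cp <= 0x10FFFF);  // valid Unicode range
        sizes.utf8_bytes  += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        sizes.utf16_bytes += cp < 0x10000 ? 2 : 4;  // surrogate pair above the BMP
    }
    return sizes;
}
```

For the two characters U"\u4E2D\u6587" alone this gives 6 bytes in
UTF-8 against 4 in UTF-16. For the mixed string
U"C++ \u4E2D\u6587 2022" it gives 15 bytes in UTF-8 against 22 in
UTF-16, so which encoding wins depends on how much ASCII is mixed in.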
 
> And well the fact that on Unix-like OSes file names are just binary
> blobs rather than a string with a known encoding raises further problems.
 
This has actually helped to standardize them all to UTF-8 in practice.
"Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Jan 30 02:02PM +0100

On 30 Jan 2022 13:12, Ben Bacarisse wrote:
> implementations not providing a useful freopen function.
 
> But then maybe freopen simply can't be implemented in Windows for some
> mysterious reason I don't get.
 
I guess you're talking about using standard C++ code with some system
specific knowledge such as how to specify the standard input stream as a
file name.
 
To reopen the standard input so that it connects to the original source,
which might be a pipe or a file or the console, one needs to (1) identify
that source, and (2) open that source, possibly after closing the
original connection. Neither is feasible in general, even (AFAIK) in
a Unix environment. But one might attempt to sidestep (1) by using a
filename that in the relevant OS denotes standard input.
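
One further way to sidestep (1), sketched here under stated
assumptions, is to pass a null filename to `freopen`, which asks the
implementation to change the mode of the existing stream. The standard
leaves it implementation-defined which mode changes are permitted, so
the call may simply fail, and a failed call can leave the stream
unusable. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Asks the implementation to switch stdin to binary mode. Whether this
// is permitted is implementation-defined; returns false on failure.
bool try_binary_stdin()
{
    return std::freopen(nullptr, "rb", stdin) != nullptr;
}

// Copies all bytes from `in` to `out`. Any newline translation is
// whatever the modes of the two streams impose.
bool copy_bytes(std::FILE* in, std::FILE* out)
{
    assert(in != nullptr && out != nullptr);

    char buffer[4096];
    std::size_t n;
    while ((n = std::fread(buffer, 1, sizeof buffer, in)) > 0) {
        if (std::fwrite(buffer, 1, n, out) != n) {
            return false;  // write error
        }
    }
    return !std::ferror(in);
}
```

A `cat` built on this would call `try_binary_stdin()` first and only
then `copy_bytes(stdin, stdout)`. Standard output would need the same
treatment, and nothing guarantees that either call succeeds.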
 
A demonstration of an approach that fails is not a proof that no
solution exists, e.g. failure to open a door doesn't prove that the door
is stuck. Maybe the person just failed to consider using the doorknob,
dragged the door instead of pushing, failed to note that it opens
sideways, didn't swipe the id card, deliberately made it look as if the
door didn't open, or something. But anyway I cooked up some code:
 
 
#include <stdlib.h> // EXIT_...
#include <stdio.h>
 
#include <stdexcept> // std::runtime_error, std::exception
#include <string> // std::string
using namespace std;
 
#ifdef PORTABLE
constexpr bool portable = true;
#else
constexpr bool portable = false;
#endif
