| "Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Jan 30 09:18AM +0100

On 29 Jan 2022 16:39, Paavo Helde wrote:
>> quality library in the world.

> Nowadays even Notepad has learned to cope with LF linebreaks, so there
> is essentially no reason any more to use CRLF anywhere at all.

Oh, just all the internet protocols, like NNTP. ;-)

> point of having the file content in memory differ from the content on
> disk? Especially considering memory mapping, HTTP packets, cloud storage
> etc.

My opinion is that C (and hence also C++) text mode is an abomination that should never have been introduced, and that in addition, given that it was introduced, it's designed in a stupid way, with the data conversion applied underneath the buffer level so that one can't get a clear view of the raw data.

As an example, the design means that Unix `cat` can't be faithfully implemented in Windows using only standard C or C++, which IMO is extreme.

That said, from a C++ programming perspective data in memory is usually statically typed, while data on disk is untyped or effectively dynamically typed. In memory one knows that that thing is UTF-8 encoded text. On disk one doesn't know and must assume, where in Windows such assumptions can be partially checked (a good thing) via UTF-8 BOM.

> take another 20 years. All the backslash madness in Windows is about to
> avoid typing a single space after the command name, it's about time to
> get rid of this.

The Windows API level supports forward slashes, as did DOS before, and especially for `#include` directives they should be used, not backslash. Why the Windows shells and applications generally don't support them is a mystery.

In some cases the fiction about what's allowed is imperfectly implemented. I remember in the 1990s (when I still worked) I had some fun demonstrating to colleagues how to completely and utterly hide some data on disk, using commands like

    [C:\root\temp]
    > type nul >poem.txt

    [C:\root\temp]
    > echo "Very important secret!" > poem.txt:secret

    [C:\root\temp]
    > dir | find "poem"
    30 Jan 2022 09:14 0 poem.txt

    [C:\root\temp]
    > find /v "" < poem.txt:secret
    "Very important secret!"

The file doesn't need to be empty, it can e.g. contain an actual poem if one feels like security by obscurity is a great thing.

This is just a bug in cmd.exe where it fails to check that the file name is "allowed" for ordinary users, so one is able to specify an internal NTFS stream. :-)

- Alf |
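The BOM check mentioned in the message above can be sketched in a few lines of C++17. This is only an illustration, not code from the thread: the names `starts_with_utf8_bom`, `without_utf8_bom` and `bom_demo` are made up here, and the file content is assumed to have been read in binary mode into memory already. A BOM is treated as a hint only, since its absence says nothing about the encoding.

```cpp
#include <cassert>
#include <string_view>

// Returns true if `bytes` begins with the UTF-8 byte order mark EF BB BF.
// Hypothetical helper: a BOM suggests UTF-8, its absence proves nothing.
constexpr bool starts_with_utf8_bom(std::string_view bytes)
{
    return bytes.size() >= 3
        && static_cast<unsigned char>(bytes[0]) == 0xEF
        && static_cast<unsigned char>(bytes[1]) == 0xBB
        && static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Returns a view of `bytes` with a leading BOM, if any, skipped.
constexpr std::string_view without_utf8_bom(std::string_view bytes)
{
    return starts_with_utf8_bom(bytes) ? bytes.substr(3) : bytes;
}

// Usage example with made-up data.
inline void bom_demo()
{
    assert(starts_with_utf8_bom("\xEF\xBB\xBFhello"));
    assert(!starts_with_utf8_bom("hello"));
    assert(without_utf8_bom("\xEF\xBB\xBFhello") == "hello");
}
```

The helpers take a `std::string_view` so that they work on any in-memory buffer without copying it.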
| Marcel Mueller <news.5.maazl@spamgourmet.org>: Jan 30 12:09PM +0100

On 29.01.22 at 16:39, Paavo Helde wrote:
> is essentially no reason any more to use CRLF anywhere at all. As it has
> happened with utf-8, Microsoft will be finally enforced to give up its
> stubbornness and join the sane world.

I would be surprised if this ever happened. But many implementations are tolerant of different line ending encodings.

> point of having the file content in memory differ from the content on
> disk? Especially considering memory mapping, HTTP packets, cloud storage
> etc.

In HTTP LF w/o CR is not officially supported. ;-)

> take another 20 years. All the backslash madness in Windows is about to
> avoid typing a single space after the command name, it's about time to
> get rid of this.

ntoskrnl, as well as its predecessor os2krnl, has been able to deal with '/' as a path separator for a long time too. Basically the same as with line endings: they are tolerant.

But the command line parsers of many programs cannot handle this since they use '/' as an escape character to denote an option. This makes the use of forward slash unhandy.

And although UTF-8 is quite common nowadays, it raises several problems in certain situations. E.g. database fields with restricted length accept different string lengths depending on the number of characters with longer UTF-8 encoding used. No user will ever understand this.

In Chinese and several other "non-ASCII" languages the UTF-8 encoding is furthermore less compact than UCS2. So the encoding issues will persist too.

And well, the fact that on Unix-like OSes file names are just binary blobs rather than a string with a known encoding raises further problems.

Marcel |
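The database field problem described above comes from limits being counted in bytes while users count characters. A minimal C++17 sketch of the difference follows. The function names and the 6-byte field limit are invented for illustration, and the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Counts code points by counting every byte that is not a UTF-8
// continuation byte (bit pattern 10xxxxxx). Assumes valid UTF-8 input.
constexpr std::size_t count_code_points(std::string_view utf8)
{
    std::size_t n = 0;
    for (const char c : utf8) {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) { ++n; }
    }
    return n;
}

// Hypothetical rule for a field that stores at most `max_bytes` bytes.
constexpr bool fits_in_field(std::string_view utf8, std::size_t max_bytes)
{
    return utf8.size() <= max_bytes;
}

// Usage example: both names have six characters, but only one fits.
inline void field_demo()
{
    constexpr std::string_view ascii_name = "Muller";            // 6 bytes
    constexpr std::string_view umlaut_name = "M\xC3\xBCller";    // 7 bytes
    assert(count_code_points(ascii_name) == 6);
    assert(count_code_points(umlaut_name) == 6);
    assert(fits_in_field(ascii_name, 6));
    assert(!fits_in_field(umlaut_name, 6));
}
```

From the user's point of view both strings are six characters long, which is exactly why the rejected one is hard to explain.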
| Ben Bacarisse <ben.usenet@bsb.me.uk>: Jan 30 12:12PM

> As an example, the design means that Unix `cat` can't be faithfully
> implemented in Windows using only standard C or C++, which IMO is
> extreme.

I assume you are talking about the cases where cat defaults to reading stdin and/or writing stdout. If so, it could be argued that it's not the fault of the C and C++ standards, but more the fault of the implementations not providing a useful freopen function.

But then maybe freopen simply can't be implemented in Windows for some mysterious reason I don't get.

--
Ben. |
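For readers unfamiliar with the freopen idea referred to above, here is a rough sketch, assuming C++17. The helper names `try_make_binary` and `copy_bytes` are invented for this illustration. The C standard lets `freopen` be called with a null file name to request a mode change, but leaves it implementation-defined which changes are permitted, so the attempt may simply fail on a given platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Attempts to switch an already open stream to the given mode, e.g. "rb".
// Which mode changes succeed is implementation-defined, so the result must
// be checked. After a failure the stream should be treated as unusable,
// because freopen closes the old association before reopening.
inline bool try_make_binary(std::FILE* stream, const char* mode)
{
    assert(stream != nullptr && mode != nullptr);
    return std::freopen(nullptr, mode, stream) != nullptr;
}

// Copies all bytes from `in` to `out` without interpreting them.
// This is only byte-exact if both streams really are in binary mode.
inline bool copy_bytes(std::FILE* in, std::FILE* out)
{
    assert(in != nullptr && out != nullptr);
    char buffer[4096];
    std::size_t n = 0;
    while ((n = std::fread(buffer, 1, sizeof buffer, in)) > 0) {
        if (std::fwrite(buffer, 1, n, out) != n) { return false; }
    }
    return std::ferror(in) == 0 && std::fflush(out) == 0;
}
```

A `cat` along these lines would call `try_make_binary(stdin, "rb")` and `try_make_binary(stdout, "wb")` first and fall back or report an error if either fails, then call `copy_bytes(stdin, stdout)`. Whether the mode change works is exactly the point under dispute in the thread.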
| Paavo Helde <eesnimi@osa.pri.ee>: Jan 30 02:26PM +0200

On 30.01.2022 13:09, Marcel Mueller wrote:
> In HTTP LF w/o CR is not officially supported. ;-)

Fixed encoding is better than random encoding. In HTTP headers one must indeed use CR LF. But the (text) files themselves are sent as HTTP bodies and there is zero reason to convert them to some other representation than they have on the server disk.

> But the command line parsers of many programs cannot handle this since
> they use '/' as an escape character to denote an option. This makes the use
> of forward slash unhandy.

The slash can still be used as an option, there is no need to change that. One just needs to start requiring that options be separated by spaces, something a sane person has always done anyway, so that

    dir C:/B

would list the file B in the root folder and

    dir C: /B

would list all files in cwd of C: in bare format.

> in certain situations. E.g. database fields with restricted length
> accept different string lengths depending on the number of characters
> with longer UTF-8 encoding used. No user will ever understand this.

In the real world UCS-2 is rarely used. Windows SDK, Java et al. are using UTF-16, which has the same string length problem. And UCS-4 is definitely wasting space.

> In Chinese and several other "non-ASCII" languages the UTF-8 encoding is
> furthermore less compact than UCS2.

I have heard this claim is not supported by actual data. In real usage Chinese is most often heavily interspersed with punctuation, numbers and English keywords, so that there is little or no benefit in using UCS2 over UTF-8.

> And well the fact that on Unix-like OSes file names are just binary
> blobs rather than a string with a known encoding raises further problems.

This has actually helped to standardize them all to UTF-8 in practice. |
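The size argument above can be made concrete with a small C++17 sketch that computes how many bytes a UTF-8 string would occupy in UTF-16. The function name and the sample strings are made up for illustration, the input is assumed to be valid UTF-8, and two hand-picked strings of course say nothing about real-world text statistics.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Size in bytes that the given valid UTF-8 text would need in UTF-16.
// Code points encoded in 1 to 3 UTF-8 bytes take one 16-bit unit,
// those encoded in 4 UTF-8 bytes take a surrogate pair.
constexpr std::size_t utf16_size_in_bytes(std::string_view utf8)
{
    std::size_t bytes = 0;
    for (const char c : utf8) {
        const auto b = static_cast<unsigned char>(c);
        if ((b & 0xC0) == 0x80) { continue; }   // continuation byte
        bytes += (b >= 0xF0) ? 4 : 2;           // 4-byte lead: surrogate pair
    }
    return bytes;
}

// Usage example with made-up sample text.
inline void size_demo()
{
    // Two CJK characters (U+4E2D U+6587): UTF-16 is smaller.
    constexpr std::string_view pure = "\xE4\xB8\xAD\xE6\x96\x87";
    assert(pure.size() == 6);
    assert(utf16_size_in_bytes(pure) == 4);

    // The same two characters mixed with ASCII: UTF-8 is smaller.
    constexpr std::string_view mixed = "C++ \xE4\xB8\xAD\xE6\x96\x87 2022";
    assert(mixed.size() == 15);
    assert(utf16_size_in_bytes(mixed) == 22);
}
```

Pure CJK text costs 3 bytes per character in UTF-8 against 2 in UTF-16, while every ASCII character costs 1 byte against 2, so the balance depends entirely on the mix.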
| "Alf P. Steinbach" <alf.p.steinbach@gmail.com>: Jan 30 02:02PM +0100

On 30 Jan 2022 13:12, Ben Bacarisse wrote:
> implementations not providing a useful freopen function.

> But then maybe freopen simply can't be implemented in Windows for some
> mysterious reason I don't get.

I guess you're talking about using standard C++ code with some system specific knowledge such as how to specify the standard input stream as a file name.

To reopen the standard input so that it connects to the original source, which might be a pipe or a file or the console, one needs to (1) identify that source, and (2) open that source, possibly after closing the original connection. Neither is feasible in general, even (AFAIK) in a Unix environment. But one might attempt to sidestep (1) by using a filename that in the relevant OS denotes standard input.

A demonstration of an approach that fails is not a proof that no solution exists, e.g. failure to open a door doesn't prove that the door is stuck. Maybe the person just failed to consider using the doorknob, dragged the door instead of pushing, failed to note that it opens sideways, didn't swipe the id card, deliberately made it look as if the door didn't open, or something.

But anyway I cooked up some code:

    #include <stdlib.h>         // EXIT_...
    #include <stdio.h>
    #include <stdexcept>        // std::runtime_error, std::exception
    #include <string>           // std::string
    using namespace std;

    #ifdef PORTABLE
        constexpr bool portable = true;
    #else
        constexpr bool portable = false;
|