Wednesday, February 21, 2018

Digest for comp.lang.c++@googlegroups.com - 17 updates in 3 topics

"James R. Kuyper" <jameskuyper@verizon.net>: Feb 21 12:05PM -0500

I'm writing some code that reads an unspecified number of fixed-length
records from a binary file; it will know how many there are when it
reaches the end of the file. It's normal for the last read to reach EOF,
but a format error if EOF is reached in the middle of a record, and I
need to detect and report if that situation occurs.
In C, I'd use fread() and check the return value for a short read.
Looking over the C++ standard, I came to the conclusion that readsome()
is what I should use for comparable purposes. However, I got results
that I don't understand when using readsome(). I created the following
programs to investigate those results (error handling suppressed for the
sake of clarity):
 
C version:
 
#include <stdio.h>
int main(int argc, char *argv[])
{
    FILE *infile = fopen(argv[1], "rb");
    char buffer[256];
    size_t bytes;
    long records;
    for(records=0; bytes = fread(buffer, 1, sizeof(buffer), infile);
        records++)
    {
        printf("%ld: %zu\n", records, bytes);
    }
    if(ferror(infile))
        perror(argv[1]);
    if(feof(infile))
        printf("EOF\n");
    return 0;
}
 
C++ version:
#include <iostream>
#include <fstream>

int main(int argc, char *argv[])
{
    std::ifstream infile(argv[1], std::ios_base::binary);
    char buffer[256];
    int records=0;
    std::streamsize bytes;
    for(records=0; bytes = infile.readsome(buffer, sizeof buffer);
        records++)
    {
        std::cout << records << ": " << bytes << std::endl;
    }
    if(infile.eof())
        std::cout << "EOF ";
    if(infile.bad())
        std::cout << "bad ";
    if(infile.rdstate() & std::ios_base::failbit)
        std::cout << "fail ";
    std::cout << std::endl;

    return 0;
}
 
When I run these programs using the same input file, which is 14648
bytes long, I get the following results:
 
~/testprog(99) gcc -std=c11 -pedantic -Wall -Wpointer-arith -Wcast-align
-Wstrict-prototypes -Wmissing-prototypes read_test_c.c -o read_test_c
~/testprog(100) ./read_test_c read_test
0: 256
1: 256
2: 256
3: 256
4: 256
5: 256
6: 256
7: 256
8: 256
9: 256
10: 256
11: 256
12: 256
13: 256
14: 256
15: 256
16: 256
17: 256
18: 256
19: 256
20: 256
21: 256
22: 256
23: 256
24: 256
25: 256
26: 256
27: 256
28: 256
29: 256
30: 256
31: 256
32: 256
33: 256
34: 256
35: 256
36: 256
37: 256
38: 256
39: 256
40: 256
41: 256
42: 256
43: 256
44: 256
45: 256
46: 256
47: 256
48: 256
49: 256
50: 256
51: 256
52: 256
53: 256
54: 256
55: 256
56: 256
57: 56
EOF
 
~/testprog(102) g++ -std=c++1y -pedantic -Wall -Wpointer-arith
-Wcast-align -ffor-scope -fno-gnu-keywords -fno-nonansi-builtins
-Wctor-dtor-privacy -Wnon-virtual-dtor -Wold-style-cast
-Woverloaded-virtual -Wsign-promo read_test_c++.cpp -o read_test_c++
~/testprog(103) ./read_test_c++ read_test
0: 256
1: 256
2: 256
3: 256
4: 256
5: 256
6: 256
7: 256
8: 256
9: 256
10: 256
11: 256
12: 256
13: 256
14: 256
15: 256
16: 256
17: 256
18: 256
19: 256
20: 256
21: 256
22: 256
23: 256
24: 256
25: 256
26: 256
27: 256
28: 256
29: 256
30: 256
31: 255
32: 256
33: 256
34: 256
35: 256
36: 256
37: 256
38: 256
39: 256
40: 256
41: 256
42: 256
43: 256
44: 256
45: 256
46: 256
47: 256
48: 256
49: 256
50: 256
51: 256
52: 256
53: 256
54: 256
55: 256
56: 256
57: 57
 
Could someone explain to me why the C++ version apparently read one more
byte than the C version, which is also one more byte than the file size?
Also, why infile.eof() was false at the end?
Barry Schwarz <schwarzb@dqel.com>: Feb 21 09:44AM -0800

On Wed, 21 Feb 2018 12:05:57 -0500, "James R. Kuyper"
>that I don't understand when using readsome(). I created the following
>programs to investigate those results (error handling suppressed for the
>sake of clarity):
<snip code>
 
>-Wstrict-prototypes -Wmissing-prototypes read_test_c.c -o read_test_c
>~/testprog(100) ./read_test_c read_test
>0: 256
<snip>
>-Woverloaded-virtual -Wsign-promo read_test_c++.cpp -o read_test_c++
>~/testprog(103) ./read_test_c++ read_test
>0: 256
<snip>
 
>Could someone explain to me why the C++ version apparently read one more
>byte than the C version, which is also one more byte than the file size?
>Also, why infile.eof() was false at the end?
 
The description for readsome at cplusplus.com says that it will stop
reading when there is no data in the stream buffer, even if end of
file has not been reached. That may answer your second question.
 
What are the last few characters in the file? What are the last few
characters placed in your array when the short record is read?
 
--
Remove del for email
"James R. Kuyper" <jameskuyper@verizon.net>: Feb 21 01:50PM -0500

On 02/21/2018 12:44 PM, Barry Schwarz wrote:
 
> The description for readsome at cplusplus.com says that it will stop
> reading when there is no data in the stream buffer, even if end of
> file has not been reached. That may answer your second question.
 
I read something like that too, but it made readsome() seem rather
useless, unless you're getting excessively intimate with the details of
how the stream buffer gets filled, so I assumed that it was either a
mistake on their part, or a misunderstanding of what they were saying on
my part.
 
Still, that's probably the explanation for what I found (see below).
Reviewing the description of readsome() with that in mind, it means that
in_avail() has a different meaning than I thought it did.
 
> What are the last few characters in the file? What are the last few
> characters placed in your array when the short record is read?
 
The file I was using for the test got erased. I'm using a different
file, of length 14704. I modified the code to print out the last 5
characters read in whenever the count read was less than the buffer size. As a
result, I discovered the reason for the results I saw:
 
...
31: 255
buffer[250]=0
buffer[251]=0
buffer[252]=0
buffer[253]=0
buffer[254]=0
...
57: 113
buffer[108]=0
buffer[109]=0
buffer[110]=0
buffer[111]=0
buffer[112]=0
 
I'd been assuming that all of the calls to readsome() returned 256
characters except the last one, and used that "fact" to calculate the
total number of bytes read, rather than having my program calculate the
actual total. A review of the actual outputs seemed to confirm that
assumption, which implies that I must have misread 255 as 256.
 
Here are the last 2 lines of the output from od on the input file:
0034540 000001 000000 000000 000000 000000 000000 000000 000000
0034560
 
The problem is resolved, and readsome() is NOT a suitable replacement
for fread() for this purpose. I replaced
 
infile.readsome(buffer, sizeof buffer)
 
with
 
infile.read(buffer, sizeof buffer).gcount()
 
and that seemed to work exactly as I need it to work. Is there any danger
of read() stopping at the end of the stream buffer, the same way as
readsome()?
Jorgen Grahn <grahn+nntp@snipabacken.se>: Feb 21 07:08PM

On Wed, 2018-02-21, James R. Kuyper wrote:
> In C, I'd use fread() and check the return value for a short read.
> Looking over the C++ standard, I came to the conclusion that readsome()
> is what I should use for comparable purposes.
 
Why not istream::read(buf, count)? That's the closest equivalent to
fread() for iostreams.
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
"James R. Kuyper" <jameskuyper@verizon.net>: Feb 21 02:37PM -0500

On 02/21/2018 02:08 PM, Jorgen Grahn wrote:
>> is what I should use for comparable purposes.
 
> Why not istream::read(buf, count)? That's the closest equivalent to
> fread() for iostreams.
 
1. It doesn't return the count of characters read, though adding
".gcount()" at the end resolves that problem.
 
2. 30.7.4.3p30 says
"Characters are extracted and stored until either of the following occurs:
(30.1) — n characters are stored;
(30.2) — end-of-file occurs on the input sequence (in which case the
function calls setstate(failbit | eofbit), which may throw
ios_base::failure)."
 
I'm relying on reaching end-of-file in order to know when I've read all
the records, so having read() throw an exception when that happens would
be inconvenient. However, my testing shows that no exception is thrown,
so I may be misunderstanding something.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 21 07:46PM

On Wed, 21 Feb 2018 13:50:54 -0500
 
> and that seemed to work exactly as I need it to work. Is there any
> danger of read() stopping at the end of the stream buffer, the same
> way as readsome()?
 
No. In the absence of an error, it is guaranteed to provide the number
of characters requested (in your case, the buffer size) unless eof is
encountered, in which case it sets failbit and eofbit, and gcount() indicates
the number of bytes (on a narrow stream) actually received.
 
If you are reading binary records you might consider doing without
ifstream entirely and reading directly from a filebuf using
std::basic_streambuf::sgetn(), and carrying on until traits_type::eof()
(-1 on narrow stream) is returned.
 
Chris
scott@slp53.sl.home (Scott Lurndal): Feb 21 07:49PM


>and that seemed to work exactly as I need it to work. Is there any danger
>of read() stopping at the end of the stream buffer, the same way as
>readsome()?
 
I must ask - if the fread code worked, why change it? Valid C is generally
also valid C++.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 21 07:51PM

On Wed, 21 Feb 2018 14:37:14 -0500
> all the records, so having read() throw an exception when that
> happens would be inconvenient. However, my testing shows that no
> exception is thrown, so I may be misunderstanding something.
 
It will not throw on failbit or eofbit unless you explicitly set the
default exception mask to do so using the std::basic_ios::exceptions()
function.
 
As I have mentioned in another post, in your use I would consider
avoiding all this by using filebuf directly and not instantiating an
ifstream object.
 
Chris
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 21 08:12PM

On Wed, 21 Feb 2018 19:46:50 +0000
Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
[snip]
> ifstream entirely and reading directly from a filebuf using
> std::basic_streambuf::sgetn(), and carrying on until
> traits_type::eof() (-1 on narrow stream) is returned.
 
To correct that, it is uflow() which returns traits_type::eof() on end
of file. sgetn() just returns less than the number of characters
requested.
"James R. Kuyper" <jameskuyper@verizon.net>: Feb 21 03:29PM -0500

On 02/21/2018 02:49 PM, Scott Lurndal wrote:
>> readsome()?
 
> I must ask - if the fread code worked, why change it? Valid C is generally
> also valid C++.
 
I mentioned fread() code because I'm a lot more familiar with C than
C++, and that's how I would have written this code if I were using C.
The actual code was written by someone else in C++, and interacts with a
variety of libraries written in C++. It has a bug, and I need to fix the
bug, and the minimum fix is sufficiently complicated to justify a
significant re-write. That's good, because the existing code is also
very clumsy - it looks like C code that has been translated into C++
code by someone significantly less familiar with C++ than I am.
"Öö Tiib" <ootiib@hot.ee>: Feb 21 01:15PM -0800

On Wednesday, 21 February 2018 21:49:38 UTC+2, Scott Lurndal wrote:
> >readsome()?
 
> I must ask - if the fread code worked, why change it? Valid C is generally
> also valid C++.
 
I remember that, about 10 years ago, MSVC's default buffers were set
unfavorably for fread, so fread was about twice as slow as ifstream when
reading 64KB chunks from a large file. Consumer file I/O was then about
15 times slower than it is now, so it sometimes mattered. Setting an
optimal buffer (with setvbuf() or streambuf::pubsetbuf()) solved it, but
surprisingly few programmers were aware of those features.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 21 10:06PM

On Wed, 21 Feb 2018 13:15:50 -0800 (PST)
> mattered. Setting an optimal buffer (with setvbuf() or
> streambuf::pubsetbuf()) solved it, but surprisingly few
> programmers were aware of those features.
 
std::streambuf::xsgetn(), and so std::ifstream::read() and
std::filebuf::sgetn(), are allowed by the C++ standard on a large block
read (in effect, when the buffer size passed in is larger than the
streambuffer's own buffer size) to bypass the streambuffer's buffer
entirely. std::ifstream::read() and std::filebuf::sgetn() are then
passed on directly to unix read() or the windows equivalent.
 
I wonder if that was the reason for the difference with fread(). I am
not sure that fread() is entitled to do the same; or if it is, whether
that is ever done in fact.
scott@slp53.sl.home (Scott Lurndal): Feb 21 10:13PM


>I wonder if that was the reason for the difference with fread(). I am
>not sure that fread() is entitled to do the same; or if it is, whether
>that is ever done in fact.
 
fread will return the number of bytes requested, unless EOF occurs, regardless
of the size of the stdio buffer.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 21 10:38PM

On Wed, 21 Feb 2018 22:13:19 GMT
> >whether that is ever done in fact.
 
> fread will return the number of bytes requested, unless EOF occurs,
> regardless of the size of the stdio buffer.
 
As will std::filebuf::sgetn(). The issue is whether the internal
buffers are short-circuited or not. On an optimized block read by
std::filebuf::sgetn(), anything in the buffers will first be extracted
and then a call to unix read() will be made directly into the buffer
passed in to std::filebuf::sgetn() (rather than into the
streambuffer's internal buffer).
 
I was hypothesizing that the poorer performance of fread() on large
block transfers which was reported may be caused by the fact that it
either cannot, or does not, short-circuit in this way.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 21 10:59PM

On Wed, 21 Feb 2018 22:38:33 +0000
 
> I was hypothesizing that the poorer performance of fread() on large
> block transfers which was reported may be caused by the fact that it
> either cannot, or does not, short-circuit in this way.
 
So far as I understand the C standard, it looks as if fread() cannot
make this optimization. The C standard says about fread(): "For each
object, 'size' calls are made to the fgetc function and the results
stored, in the order read, in an array of unsigned char exactly
overlaying the object."
 
fgetc will always go via the internal stream buffer, for efficiency
reasons.
mcheung63@gmail.com: Feb 21 02:26PM -0800

On Monday, 19 February 2018 07:53:32 UTC+8, Rick C. Hodgin wrote:
 
> --
 
> Thank you,
> Rick C. Hodgin
 
fuck off asshole, go fuck yourself
mcheung63@gmail.com: Feb 21 02:26PM -0800

On Monday, 19 February 2018 04:38:46 UTC+8, Rick C. Hodgin wrote:
 
> Whose are you?
 
> --
> Rick C. Hodgin
 
fuck off asshole, go fuck yourself