- readsome() vs. fread() - 15 Updates
- Moons - 1 Update
- [Jesus Loves You] A new age is dawning - 1 Update
"James R. Kuyper" <jameskuyper@verizon.net>: Feb 21 12:05PM -0500

I'm writing some code that reads an unspecified number of fixed-length records from a binary file; it will know how many there are when it reaches the end of the file. It's normal for the last read to reach EOF, but a format error if EOF is reached in the middle of a record, and I need to detect and report if that situation occurs.

In C, I'd use fread() and check the return value for a short read. Looking over the C++ standard, I came to the conclusion that readsome() is what I should use for comparable purposes. However, I got results that I don't understand when using readsome(). I created the following programs to investigate those results (error handling suppressed for the sake of clarity):

C version:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        FILE *infile = fopen(argv[1], "rb");
        char buffer[256];
        size_t bytes;
        long records;

        for(records=0; bytes = fread(buffer, 1, sizeof(buffer), infile); records++)
        {
            printf("%ld: %zu\n", records, bytes);
        }
        if(ferror(infile))
            perror(argv[1]);
        if(feof(infile))
            printf("EOF\n");
        return 0;
    }

C++ version:

    #include <iostream>
    #include <fstream>

    int main(int argc, char *argv[])
    {
        std::ifstream infile(argv[1], std::ios_base::binary);
        char buffer[256];
        int records=0;
        std::streamsize bytes;

        for(records=0; bytes = infile.readsome(buffer, sizeof buffer); records++)
        {
            std::cout << records << ": " << bytes << std::endl;
        }
        if(infile.eof())
            std::cout << "EOF ";
        if(infile.bad())
            std::cout << "bad ";
        if(infile.rdstate() & std::ios_base::failbit)
            std::cout << "fail ";
        std::cout << std::endl;
        return 0;
    }

When I run these programs using the same input file, which is 14648 bytes long, I get the following results:

    ~/testprog(99) gcc -std=c11 -pedantic -Wall -Wpointer-arith -Wcast-align -Wstrict-prototypes -Wmissing-prototypes read_test_c.c -o read_test_c
    ~/testprog(100) ./read_test_c read_test
    0: 256
    1: 256
    2: 256
    3: 256
    4: 256
    5: 256
    6: 256
    7: 256
    8: 256
    9: 256
    10: 256
    11: 256
    12: 256
    13: 256
    14: 256
    15: 256
    16: 256
    17: 256
    18: 256
    19: 256
    20: 256
    21: 256
    22: 256
    23: 256
    24: 256
    25: 256
    26: 256
    27: 256
    28: 256
    29: 256
    30: 256
    31: 256
    32: 256
    33: 256
    34: 256
    35: 256
    36: 256
    37: 256
    38: 256
    39: 256
    40: 256
    41: 256
    42: 256
    43: 256
    44: 256
    45: 256
    46: 256
    47: 256
    48: 256
    49: 256
    50: 256
    51: 256
    52: 256
    53: 256
    54: 256
    55: 256
    56: 256
    57: 56
    EOF

    ~/testprog(102) g++ -std=c++1y -pedantic -Wall -Wpointer-arith -Wcast-align -ffor-scope -fno-gnu-keywords -fno-nonansi-builtins -Wctor-dtor-privacy -Wnon-virtual-dtor -Wold-style-cast -Woverloaded-virtual -Wsign-promo read_test_c++.cpp -o read_test_c++
    ~/testprog(103) ./read_test_c++ read_test
    0: 256
    1: 256
    2: 256
    3: 256
    4: 256
    5: 256
    6: 256
    7: 256
    8: 256
    9: 256
    10: 256
    11: 256
    12: 256
    13: 256
    14: 256
    15: 256
    16: 256
    17: 256
    18: 256
    19: 256
    20: 256
    21: 256
    22: 256
    23: 256
    24: 256
    25: 256
    26: 256
    27: 256
    28: 256
    29: 256
    30: 256
    31: 255
    32: 256
    33: 256
    34: 256
    35: 256
    36: 256
    37: 256
    38: 256
    39: 256
    40: 256
    41: 256
    42: 256
    43: 256
    44: 256
    45: 256
    46: 256
    47: 256
    48: 256
    49: 256
    50: 256
    51: 256
    52: 256
    53: 256
    54: 256
    55: 256
    56: 256
    57: 57

Could someone explain to me why the C++ version apparently read one more byte than the C version, which is also one more byte than the file size? Also, why infile.eof() was false at the end?
Barry Schwarz <schwarzb@dqel.com>: Feb 21 09:44AM -0800

On Wed, 21 Feb 2018 12:05:57 -0500, "James R. Kuyper"

>that I don't understand when using readsome(). I created the following
>programs to investigate those results (error handling suppressed for the
>sake of clarity):

<snip code>

>-Wstrict-prototypes -Wmissing-prototypes read_test_c.c -o read_test_c
>~/testprog(100) ./read_test_c read_test
>0: 256

<snip>

>-Woverloaded-virtual -Wsign-promo read_test_c++.cpp -o read_test_c++
>~/testprog(103) ./read_test_c++ read_test
>0: 256

<snip>

>Could someone explain to me why the C++ version apparently read one more
>byte than the C version, which is also one more byte than the file size?
>Also, why infile.eof() was false at the end?

The description for readsome at cplusplus.com says that it will stop reading when there is no data in the stream buffer, even if end of file has not been reached. That may answer your second question.

What are the last few characters in the file? What are the last few characters placed in your array when the short record is read?

--
Remove del for email
"James R. Kuyper" <jameskuyper@verizon.net>: Feb 21 01:50PM -0500

On 02/21/2018 12:44 PM, Barry Schwarz wrote:
> The description for readsome at cplusplus.com says that it will stop
> reading when there is no data in the stream buffer, even if end of
> file has not been reached. That may answer your second question.

I read something like that too, but it made readsome() seem rather useless, unless you're getting excessively intimate with the details of how the stream buffer gets filled, so I assumed that it was either a mistake on their part, or a misunderstanding of what they were saying on my part. Still, that's probably the explanation for what I found (see below). Reviewing the description of readsome() with that in mind, it means that in_avail() has a different meaning than I thought it did.

> What are the last few characters in the file? What are the last few
> characters placed in your array when the short record is read?

The file I was using for the test got erased. I'm using a different file, of length 14704. I modified the code to print out the last 5 characters read in whenever it was less than the buffer size. As a result, I discovered the reason for the results I saw:

    ...
    31: 255
    buffer[250]=0
    buffer[251]=0
    buffer[252]=0
    buffer[253]=0
    buffer[254]=0
    ...
    57: 113
    buffer[108]=0
    buffer[109]=0
    buffer[110]=0
    buffer[111]=0
    buffer[112]=0

I'd been assuming that all of the calls to readsome() returned 256 characters except the last one, and used that "fact" to calculate the total number of bytes read, rather than having my program calculate the actual total. A review of the actual outputs seemed to confirm that assumption, which implies that I must have misread 255 as 256.

Here's the last 2 lines of the output from od on the input file:

    0034540 000001 000000 000000 000000 000000 000000 000000 000000
    0034560

The problem is resolved, and readsome() is NOT a suitable replacement for fread() for this purpose.

I replaced

    infile.readsome(buffer, sizeof buffer)

with

    infile.read(buffer, sizeof buffer).gcount()

and that seemed to work exactly as I need it to work. Is there any danger of read() stopping at the end of the stream buffer, the same way as readsome()?
Jorgen Grahn <grahn+nntp@snipabacken.se>: Feb 21 07:08PM On Wed, 2018-02-21, James R. Kuyper wrote: > In C, I'd use fread() and check the return value for a short read. > Looking over the C++ standard, I came to the conclusion that readsome() > is what I should use for comparable purposes. Why not istream::read(buf, count)? That's the closest equivalent to fread() for iostreams. /Jorgen -- // Jorgen Grahn <grahn@ Oo o. . . \X/ snipabacken.se> O o . |
"James R. Kuyper" <jameskuyper@verizon.net>: Feb 21 02:37PM -0500

On 02/21/2018 02:08 PM, Jorgen Grahn wrote:
>> is what I should use for comparable purposes.
> Why not istream::read(buf, count)? That's the closest equivalent to
> fread() for iostreams.

1. It doesn't return the count of characters read, though adding ".gcount()" at the end resolves that problem.

2. 30.7.4.3p30 says "Characters are extracted and stored until either of the following occurs:
   (30.1) n characters are stored;
   (30.2) end-of-file occurs on the input sequence (in which case the function calls setstate(failbit | eofbit), which may throw ios_base::failure)."

I'm relying on reaching end-of-file in order to know when I've read all the records, so having read() throw an exception when that happens would be inconvenient. However, my testing shows that no exception is thrown, so I may be misunderstanding something.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 21 07:46PM

On Wed, 21 Feb 2018 13:50:54 -0500
> and that seemed to work exactly as I need it work. Is there any
> danger of read() stopping at the end of the stream buffer, the same
> way as readsome()?

No. In the absence of an error, it is guaranteed to provide the number of characters requested (in your case, the buffer size) unless eof is encountered, which will set failbit/eofbit and gcount() will indicate the number of bytes (on a narrow stream) actually received.

If you are reading binary records you might consider doing without ifstream entirely and reading directly from a filebuf using std::basic_streambuf::sgetn(), and carrying on until traits_type::eof() (-1 on narrow stream) is returned.

Chris
scott@slp53.sl.home (Scott Lurndal): Feb 21 07:49PM >and that seemed to work exactly as I need it work. Is there any danger >of read() stopping at the end of the stream buffer, the same way as >readsome()? I must ask - if the fread code worked, why change it? Valid C is generally also valid C++. |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 21 07:51PM On Wed, 21 Feb 2018 14:37:14 -0500 > all the records, so having read() throw an exception when that > happens would be inconvenient. However, my testing shows that no > exception is thrown, so I may be misunderstanding something. It will not throw on failbit or eofbit unless you explicitly set the default exception mask to do so using the std::basic_ios::exceptions() function. As I have mentioned in another post, in your use I should consider avoiding all this by using filebuf directly and not instantiate an ifstream object. Chris |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 21 08:12PM On Wed, 21 Feb 2018 19:46:50 +0000 Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote: [snip] > ifstream entirely and reading directly from a filebuf using > std::basic_streambuf::sgetn(), and carrying on until > traits_type::eof() (-1 on narrow stream) is returned. To correct that, it is uflow() which returns traits_type::eof() on end of file. sgetn() just returns less than the number of characters requested. |
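A rough sketch of the filebuf-only approach, using the corrected rule that sgetn() signals end of file by returning fewer characters than requested. `FilebufScan` and `scan_with_filebuf` are hypothetical names for this illustration. Note that a short count from sgetn() alone does not say whether the cause was end of file or a read error, so this is a sketch rather than a robust reader.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <vector>

// Hypothetical result type for this sketch (not from the thread).
struct FilebufScan {
    bool opened = false;                 // the file could be opened
    std::size_t complete_records = 0;    // records read in full
    std::streamsize trailing_bytes = 0;  // bytes returned by the final short sgetn()
};

// Hypothetical helper: reads fixed-length records straight from a filebuf,
// without constructing an ifstream.
FilebufScan scan_with_filebuf(const char* path, std::size_t record_size)
{
    FilebufScan result;
    std::filebuf file;
    if (record_size == 0 ||
        !file.open(path, std::ios_base::in | std::ios_base::binary)) {
        return result;
    }
    result.opened = true;
    std::vector<char> buffer(record_size);
    const std::streamsize want = static_cast<std::streamsize>(buffer.size());
    for (;;) {
        const std::streamsize got = file.sgetn(buffer.data(), want);
        if (got < want) {
            result.trailing_bytes = got;  // 0 means the file ended on a record boundary
            break;
        }
        ++result.complete_records;  // a full record; process buffer here
    }
    return result;  // the filebuf destructor closes the file
}
```

There are no state flags to inspect here: the only signal is the count returned by sgetn(), which is what makes this route simpler than ifstream for plain binary records.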
"James R. Kuyper" <jameskuyper@verizon.net>: Feb 21 03:29PM -0500 On 02/21/2018 02:49 PM, Scott Lurndal wrote: >> readsome()? > I must ask - if the fread code worked, why change it? Valid C is generally > also valid C++. I mentioned fread() code because I'm a lot more familiar with C than C++, and that's how I would have written this code if I were using C. The actual code was written by someone else in C++, and interacts with a variety of libraries written in C++. It has a bug, and I need to fix the bug, and the minimum fix is sufficiently complicated to justify a significant re-write. That's good, because the existing code is also very clumsy - it looks like C code that has been translated into C++ code by someone significantly less familiar with C++ than I am. |
"Öö Tiib" <ootiib@hot.ee>: Feb 21 01:15PM -0800

On Wednesday, 21 February 2018 21:49:38 UTC+2, Scott Lurndal wrote:
> >readsome()?
> I must ask - if the fread code worked, why change it? Valid C is generally
> also valid C++.

As I remember it, about 10 years ago the default buffers in MSVC were set unfavorably for fread, so fread was about half as fast as ifstream when reading 64KB chunks from a large file. Consumer file I/O was about 15 times slower then than it is now, so it sometimes mattered. Setting an optimal buffer (with setvbuf() or streambuf::pubsetbuf()) solved it, but surprisingly few programmers were aware of those features.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 21 10:06PM On Wed, 21 Feb 2018 13:15:50 -0800 (PST) > mattered. Setting optimal buffer (with setvbuf() or > streambuf::pubsetbuf() ) solved it but surprisingly few > programmers were aware of those features. std::streambuf::xsgetn(), and so std::ifstream::read() and std::filebuf::sgetn(), are allowed by the C++ standard on a large block read (in effect, when the buffer size passed in is larger than the streambuffer's own buffer size) to bypass the streambuffer's buffer entirely. std::ifstream::read() and std::filebuf::sgetn() are then passed on directly to unix read() or the windows equivalent. I wonder if that was the reason for the difference with fread(). I am not sure that fread() is entitled to do the same; or if it is, whether that is ever done in fact. |
scott@slp53.sl.home (Scott Lurndal): Feb 21 10:13PM >I wonder if that was the reason for the difference with fread(). I am >not sure that fread() is entitled to do the same; or if it is, whether >that is ever done in fact. fread will return the number of bytes requested, unless EOF occurs. Regardless of the size of the stdio buffer. |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 21 10:38PM

On Wed, 21 Feb 2018 22:13:19 GMT
> >whether that is ever done in fact.
> fread will return the number of bytes requested, unless EOF occurs.
> Regardless of the size of the stdio buffer.

As will std::filebuf::sgetn(). The issue is whether the internal buffers are short-circuited or not. On an optimized block read by std::filebuf::sgetn(), anything in the buffers will first be extracted and then a call to unix read() will be made directly into the buffer passed in to std::filebuf::sgetn() (rather than into the streambuffer's internal buffer).

I was hypothesizing that the poorer performance of fread() on large block transfers which was reported may be caused by the fact that it either cannot, or does not, short-circuit in this way.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Feb 21 10:59PM On Wed, 21 Feb 2018 22:38:33 +0000 > I was hypothesizing that the poorer performance of fread() on large > block transfers which was reported may be caused by the fact that it > either cannot, or does not, short-circuit in this way. So far as I understand the C standard, it looks as if fread() cannot make this optimization. The C standard says about fread(): "For each object, 'size' calls are made to the fgetc function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object." fgetc will always go via the internal stream buffer, for efficiency reasons. |
mcheung63@gmail.com: Feb 21 02:26PM -0800 Rick C. Hodgin於 2018年2月19日星期一 UTC+8上午7時53分32秒寫道: > -- > Thank you, > Rick C. Hodgin fuck off asshole, go fuck yourself |
mcheung63@gmail.com: Feb 21 02:26PM -0800 Rick C. Hodgin於 2018年2月19日星期一 UTC+8上午4時38分46秒寫道: > Whose are you? > -- > Rick C. Hodgin fuck off asshole, go fuck yourself |