Thursday, March 28, 2019

Digest for comp.lang.c++@googlegroups.com - 19 updates in 3 topics

Juha Nieminen <nospam@thanks.invalid>: Mar 28 09:01AM

> If file reading is a performance bottleneck then one should use mmap
> instead.
 
In which version of the C++ standard was mmap introduced?
 
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
Juha Nieminen <nospam@thanks.invalid>: Mar 28 09:03AM

> And this waste is completely insignificant in this case because file
> access takes orders of magnitude more time.
 
Why should I be paying that extra time, no matter how "insignificant"
it may be? I thought the design principle of C++ is that you don't have
to pay for what you don't use. In this case I'm not using, at all, the
fact that std::vector zero-initializes its contents when you resize it,
yet I'm still forced to pay for it.
 
"Öö Tiib" <ootiib@hot.ee>: Mar 28 02:26AM -0700

On Thursday, 28 March 2019 11:01:16 UTC+2, Juha Nieminen wrote:
> > If file reading is a performance bottleneck then one should use mmap
> > instead.
 
> In which version of the C++ standard was mmap introduced?
 
It will never be. mmap is part of another standard (POSIX), and C++
will never limit itself only to systems that comply with POSIX.
Even in POSIX, mmap is present only when the platform kernel has a
virtual memory system. Embedded Linuxes without a virtual memory
system don't have mmap. So mmap will forever be a performance
optimization or inter-process communication measure for larger
POSIX-compliant systems.
"Öö Tiib" <ootiib@hot.ee>: Mar 28 04:00AM -0700

On Thursday, 28 March 2019 11:03:22 UTC+2, Juha Nieminen wrote:
> to pay for what you don't use. In this case I'm not using, at all, the
> fact that std::vector zero-initializes its contents when you resize it,
> yet I'm still forced to pay for it.
 
Because it is designed to provide certain guarantees that are important
for certain usages. One tool cannot be exactly optimal for every usage.
It has already been said that we don't have to use std::vector where the
difference from optimal actually matters. We can use std::unique_ptr<T[]>
or std::array<T,N> for potentially uninitialized buffers if such are needed.
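
A minimal sketch of the std::unique_ptr<T[]> suggestion above, assuming C++11 or later. The helper name read_into_uninitialized and its signature are made up for illustration and do not come from the thread. The point it tries to show: `new unsigned char[n]` default-initializes, so the bytes are not zeroed before fread overwrites them, whereas std::make_unique<unsigned char[]>(n) value-initializes and would bring the zero-fill back.

```cpp
#include <cstddef>
#include <cstdio>
#include <memory>

// Hypothetical helper: reads up to `n` bytes from `path` into a buffer whose
// bytes are NOT zeroed first. Returns the number of bytes actually read.
std::size_t read_into_uninitialized(const char* path, std::size_t n,
                                    std::unique_ptr<unsigned char[]>& out) {
    // new T[n] default-initializes: for unsigned char that means no writes.
    // std::make_unique<unsigned char[]>(n) would zero-fill instead.
    out.reset(new unsigned char[n]);
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return 0;
    std::size_t got = std::fread(out.get(), 1, n, f);
    std::fclose(f);
    return got;
}
```

Only the first `got` bytes of the buffer hold defined values after the call; reading beyond them would be reading indeterminate memory.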
Paavo Helde <myfirstname@osa.pri.ee>: Mar 28 01:02PM +0200

On 28.03.2019 11:01, Juha Nieminen wrote:
>> If file reading is a performance bottleneck then one should use mmap
>> instead.
 
> In which version of the C++ standard was mmap introduced?
 
That's what I said. You care about other things more than about the
performance. The other things appear to be standard conformance and
convenience (reading the whole file in one go).
 
There is nothing wrong about these preferences, that's a perfectly fine
approach, but then it sounds a bit silly to complain about the time
wasted on std::vector initialization.
 
I just made a little performance test, reading a 2.3 GB file and summing
all its bytes. The results are here:
 
large vector: 1.55176 s
large new[] : 1.40286 s, 9.59564 % win
small vector: 0.768879 s, 50.4511 % win
small new[] : 0.759881 s, 51.031 % win
mmap : 0.46249 s, 70.1958 % win
 
Here, large means the whole file read into a single buffer, and small
means a 16k buffer.
 
IIRC your approach was "large vector" (read the whole file into a
std::vector). So, using an uninitialized buffer with new[] would win ca
10% in this task (that's much more than I expected, must be because the
file is already in OS caches). That's the overhead you complained about.
 
However, by using a smaller buffer and thus reducing stress on the memory
allocator you can win 50% instead, fully standard-conformant.
 
And finally, if you care about performance more than having pure
standard-conforming code, then you can use memory mapping and win 70%.
 
 
Code follows (Windows-only, no error checks, sorry):
 
#include <iostream>
#include <numeric>
#include <string>
#include <vector>
#include <cstdio>
#include <functional>
#include <chrono>
#include <algorithm>
#include <io.h>
#include <Windows.h>
 
int main() {

    std::string filename = "D:/test/columbus/Case 00647038.zip";
    unsigned int x1, x2, x3, x4, x5;

    // put mmap first to warm caches up and still win
    auto start3 = std::chrono::steady_clock::now();
    {
        HANDLE h = ::CreateFileA(filename.c_str(), GENERIC_READ,
            FILE_SHARE_READ|FILE_SHARE_WRITE, NULL, OPEN_EXISTING,
            FILE_ATTRIBUTE_NORMAL, NULL);
        LARGE_INTEGER li;
        ::GetFileSizeEx(h, &li);
        size_t n = li.QuadPart;
        HANDLE m = ::CreateFileMapping(h, NULL, PAGE_READONLY, 0, 0, NULL);
        unsigned char* view = static_cast<unsigned char*>(
            ::MapViewOfFile(m, FILE_MAP_READ, 0, 0, n));
        x3 = std::accumulate(view, view+n, 0u);
        ::UnmapViewOfFile(view);
        ::CloseHandle(m);
        ::CloseHandle(h);
    }
    auto finish3 = std::chrono::steady_clock::now();

    // large vector: whole file into one std::vector
    auto start1 = std::chrono::steady_clock::now();
    {
        FILE* f = fopen(filename.c_str(), "rb");
        size_t n = _filelengthi64(fileno(f));
        std::vector<unsigned char> a(n);
        fread(a.data(), 1, n, f);
        fclose(f);
        x1 = std::accumulate(a.begin(), a.end(), 0u);
    }
    auto finish1 = std::chrono::steady_clock::now();

    // large new[]: whole file into one uninitialized buffer
    auto start2 = std::chrono::steady_clock::now();
    {
        FILE* f = fopen(filename.c_str(), "rb");
        size_t n = _filelengthi64(fileno(f));
        unsigned char* b = new unsigned char[n];
        fread(b, 1, n, f);
        x2 = std::accumulate(b, b+n, 0u);
        delete[] b;
        fclose(f);
    }
    auto finish2 = std::chrono::steady_clock::now();

    // small vector: 16k std::vector buffer, read in a loop
    auto start4 = std::chrono::steady_clock::now();
    {
        FILE* f = fopen(filename.c_str(), "rb");
        size_t n = _filelengthi64(fileno(f));
        const size_t bufferSize = 4*4096;
        std::vector<unsigned char> a(bufferSize);
        x4 = 0;
        while (true) {
            size_t k = fread(a.data(), 1, bufferSize, f);
            x4 = std::accumulate(a.data(), a.data()+k, x4);
            if (k<bufferSize) {
                break;
            }
        }
        fclose(f);
    }
    auto finish4 = std::chrono::steady_clock::now();

    // small new[]: 16k uninitialized buffer, read in a loop
    auto start5 = std::chrono::steady_clock::now();
    {
        FILE* f = fopen(filename.c_str(), "rb");
        size_t n = _filelengthi64(fileno(f));
        const size_t bufferSize = 4*4096;
        unsigned char* a = new unsigned char[bufferSize];
        x5 = 0;
        while (true) {
            size_t k = fread(a, 1, bufferSize, f);
            x5 = std::accumulate(a, a+k, x5);
            if (k<bufferSize) {
                break;
            }
        }
        delete[] a;
        fclose(f);
    }
    auto finish5 = std::chrono::steady_clock::now();

    auto dur1 = std::chrono::duration_cast<std::chrono::duration<double>>(finish1-start1);
    auto dur2 = std::chrono::duration_cast<std::chrono::duration<double>>(finish2-start2);
    auto dur3 = std::chrono::duration_cast<std::chrono::duration<double>>(finish3-start3);
    auto dur4 = std::chrono::duration_cast<std::chrono::duration<double>>(finish4-start4);
    auto dur5 = std::chrono::duration_cast<std::chrono::duration<double>>(finish5-start5);

    std::cout << "mmap : " << dur3.count() << " s, " <<
        100.0*(dur1.count()-dur3.count())/dur1.count() << " % win\n";
    std::cout << "large vector: " << dur1.count() << " s\n";
    std::cout << "large new[] : " << dur2.count() << " s, " <<
        100.0*(dur1.count()-dur2.count())/dur1.count() << " % win\n";
    std::cout << "small vector: " << dur4.count() << " s, " <<
        100.0*(dur1.count()-dur4.count())/dur1.count() << " % win\n";
    std::cout << "small new[] : " << dur5.count() << " s, " <<
        100.0*(dur1.count()-dur5.count())/dur1.count() << " % win\n";

    if (x1!=x2 || x1!=x3 || x1!=x4 || x1!=x5) {
        std::cerr << "Something wrong\n";
    }
    return x1-x2;
}
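
For readers not on Windows, the memory-mapped variant above can be sketched with POSIX calls. This is an assumption-laden illustration, not part of the measured benchmark: the helper name sum_bytes_mmap is hypothetical, it needs a POSIX system with virtual memory, and nothing here says it would reproduce the timings quoted above.

```cpp
#include <cstddef>
#include <numeric>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: sums all bytes of `path` through a read-only mapping.
// Returns false if any system call fails.
bool sum_bytes_mmap(const char* path, unsigned int& sum) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (::fstat(fd, &st) != 0) { ::close(fd); return false; }
    std::size_t n = static_cast<std::size_t>(st.st_size);
    sum = 0;
    if (n == 0) { ::close(fd); return true; }  // mmap rejects a zero length
    void* p = ::mmap(nullptr, n, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return false;
    const unsigned char* view = static_cast<const unsigned char*>(p);
    sum = std::accumulate(view, view + n, 0u);
    ::munmap(p, n);
    return true;
}
```

Unlike the Windows code above, this sketch checks each call, because a failed mmap returns MAP_FAILED rather than a null pointer and is easy to miss.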
leigh.v.johnston@googlemail.com: Mar 28 05:03AM -0700

Nonsense. Boost provides a platform agnostic way to do this so there is no reason C++ couldn't in the future.
"Öö Tiib" <ootiib@hot.ee>: Mar 28 05:30AM -0700

> Nonsense. Boost provides a platform agnostic way to do this so there is no reason C++ couldn't in the future.
 
That "magical boost" does not work on embedded
Linux that does not have a virtual memory system.
Vir Campestris <vir.campestris@invalid.invalid>: Mar 28 09:57PM

On 27/03/2019 06:15, Paavo Helde wrote:
> it. The initialization of a 800 million element unsigned int vector
> takes ca 1.3 s with reserve() and emplace_back() and ca 0.9 s with new[]
> and assignment. This makes ca half of a nanosecond per element.
 
I just measured it too...
 
reserve + emplace_back: .532
emplace_back without reserve: 1.14
push_back without reserve: 1.38
 
Visual Studio 2012 for vector<char> size 2^29 in _release_ mode; in
debug it's another story.
 
> just deleted and 100% efficiency gained ;-). If it is used for anything
> substantial then the initialization overhead will likely turn
> insignificant.
 
+1
 
Andy
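
A rough sketch of how such a comparison could be set up, for anyone wanting to repeat it. The function names are made up for illustration, the element values are arbitrary, and the numbers will depend heavily on compiler, optimization level and allocator, so this is not a reconstruction of the test behind the figures above.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Fills a vector<char> with n elements using reserve() + emplace_back().
std::vector<char> fill_reserved(std::size_t n) {
    std::vector<char> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) v.emplace_back(static_cast<char>(i));
    return v;
}

// Same contents, but the vector grows on demand (repeated reallocation).
std::vector<char> fill_growing(std::size_t n) {
    std::vector<char> v;
    for (std::size_t i = 0; i < n; ++i) v.push_back(static_cast<char>(i));
    return v;
}

// Times one call of f(n) and returns the elapsed seconds.
template <typename F>
double time_seconds(F f, std::size_t n) {
    auto start = std::chrono::steady_clock::now();
    auto v = f(n);
    auto finish = std::chrono::steady_clock::now();
    // Touch the result so the work cannot be discarded entirely.
    volatile char sink = v.empty() ? 0 : v.back();
    (void)sink;
    return std::chrono::duration<double>(finish - start).count();
}
```

Calling time_seconds(fill_reserved, n) and time_seconds(fill_growing, n) with a large n, in a release build, gives the two figures to compare.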
fir <profesor.fir@gmail.com>: Mar 28 09:36AM -0700

Exe files have a header, and one of its fields holds a timestamp like this (the value above is hex, AFAIK).
It is most probably just the creation time of that exe (set by the assembler/linker/compiler).
 
Does someone know how to translate it to a date or, more likely, how to translate the current date to it? (I have my own assembler that compiles to exe, and I need to generate a timestamp of that kind and put it there.)
Bonita Montero <Bonita.Montero@gmail.com>: Mar 28 05:49PM +0100

The timestamp is a time_t-value, i.e. the passed seconds since 1.1.1970.
So we are not off-topic here because this isn't Windows-specific. ;-)
"Öö Tiib" <ootiib@hot.ee>: Mar 28 10:10AM -0700

On Thursday, 28 March 2019 18:49:13 UTC+2, Bonita Montero wrote:
> The timestamp is a time_t-value, i.e. the passed seconds since 1.1.1970.
> So we are not off-topic here because this isn't Windows-specific. ;-)
 
:-) Whatever timestamp fir posted, it is unlikely to be a time_t like you describe.
The 69676572 must be Fri, 17 Mar 1972 10:36:12 GMT and
0x69676572 (a.k.a. 1768383858) must be Wed, 14 Jan 2026 09:44:18 GMT
according to that site: http://www.onlineconversion.com/unix_time.htm
Both dates look too unusual to be creation times of some
Windows exe file, for obvious reasons.
Paavo Helde <myfirstname@osa.pri.ee>: Mar 28 07:25PM +0200

On 28.03.2019 18:49, Bonita Montero wrote:
> The timestamp is a time_t-value, i.e. the passed seconds since 1.1.1970.
> So we are not off-topic here because this isn't Windows-specific. ;-)
 
The TimeDateStamp in the COFF header is documented as "The low 32 bits
of the number of seconds since 00:00 January 1, 1970 (a C run-time
time_t value), that indicates when the file was created."
 
I do not see the high 32 bits of the time_t value stored anywhere. Maybe
the TimeDateStamp field is meant just as a convenient label for telling
apart different versions of the file?
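
A small sketch of both directions, going only by the documentation quoted above (low 32 bits of a time_t, seconds since 1970). The function names are hypothetical; std::time, std::gmtime and std::strftime are the standard <ctime> facilities, and the formatted weekday and month names assume the default "C" locale.

```cpp
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <string>

// Low 32 bits of the current time_t, the form a TimeDateStamp field stores.
std::uint32_t current_time_date_stamp() {
    return static_cast<std::uint32_t>(std::time(nullptr));
}

// Formats a 32-bit stamp as UTC text; returns an empty string on failure.
std::string format_stamp_utc(std::uint32_t stamp) {
    std::time_t t = static_cast<std::time_t>(stamp);
    std::tm* utc = std::gmtime(&t);  // not thread-safe; fine for a sketch
    if (!utc) return std::string();
    char buf[64];
    std::size_t len = std::strftime(buf, sizeof buf,
                                    "%a, %d %b %Y %H:%M:%S GMT", utc);
    return std::string(buf, len);
}
```

With the value discussed earlier in this thread, format_stamp_utc(0x69676572u) yields "Wed, 14 Jan 2026 09:44:18 GMT", matching the online converter. A tool that writes its own PE header could store current_time_date_stamp() in the field.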
Bonita Montero <Bonita.Montero@gmail.com>: Mar 28 06:31PM +0100

> :-) Whatever timestamp fir posted, it is unlikely to be a time_t like you describe.
 
I just checked a PE built by myself and converted the hex-value
given by dumpbin ("5C9CBC41 time date stamp Thu Mar 28 13:21:21
2019") to a decimal value with calc.exe and put it into:
http://www.onlineconversion.com/unix_time.htm
And I got exactly the date and time the EXE was built.
fir <profesor.fir@gmail.com>: Mar 28 12:44PM -0700

On Thursday, 28 March 2019 18:25:47 UTC+1, Paavo Helde wrote:
 
> I do not see the high 32 bits of the time_t value stored anywhere. Maybe
> the TimeDateStamp field is meant just as a convenient label for telling
> apart different versions of the file?
 
Yes, this timestamp was in fact created by gcc (mingw-tdm) in a program I compiled in 2018 or thereabouts, so it should be some meaningful date in 2018 or so.
fir <profesor.fir@gmail.com>: Mar 28 12:52PM -0700

On Thursday, 28 March 2019 18:31:25 UTC+1, Bonita Montero wrote:
> 2019") to a decimal value with calc.exe and put it into:
> http://www.onlineconversion.com/unix_time.htm
> And I got exactly the date and time the EXE was built.
 
When applied to an exe built by mingw-tdm, it gives
 
Wed, 14 Jan 2026 09:44:18 GMT
 
I didn't check the binary itself; the value is from OllyDbg, which shows it in a nicer way.
 
It's either a bug in Olly or mingw puts in a fake timestamp, it seems.
fir <profesor.fir@gmail.com>: Mar 28 12:56PM -0700

On Thursday, 28 March 2019 17:49:13 UTC+1, Bonita Montero wrote:
> The timestamp is a time_t-value, i.e. the passed seconds since 1.1.1970.
> So we are not off-topic here because this isn't Windows-specific. ;-)
 
Talking about pieces of real C/C++ code
is more valuable than talking about the pure language, in my opinion (talking about the language is the so-called 'big distraction').
 
This is my opinion, but it's somewhat well grounded, I think ;<
Mark <ma740988@gmail.com>: Mar 28 04:28AM -0700

I'm trying to make sense of the noexcept specifier so given the following
 
// noexceptOperator.cpp
 
#include <iostream>
#include <array>
#include <vector>
 
class NoexceptCopy{
public:
    std::array<int, 5> arr{1, 2, 3, 4, 5}; // (2)
};

class NonNoexceptCopy{
public:
    std::vector<int> v{1, 2, 3, 4 , 5}; // (3)
};

template <typename T>
T copy(T const& src) noexcept(noexcept(T(src))) { // (1)
    std::cout << " called " << std::endl;
    return src;
}

int main(){

    NoexceptCopy noexceptCopy;
    NonNoexceptCopy nonNoexceptCopy;

    std::cout << std::boolalpha << std::endl;

    std::cout << "noexcept(copy(noexceptCopy)): " << // (4)
        noexcept(copy(noexceptCopy)) << std::endl;

    std::cout << "noexcept(copy(nonNoexceptCopy)): " << // (5)
        noexcept(copy(nonNoexceptCopy)) << std::endl;

    std::cout << std::endl;

}
 
I need help understanding where the call to copy fits in, given that copy(noexceptCopy) and copy(nonNoexceptCopy) never get invoked ("called" is not printed) and 'noexcept(T(src))' appears to boil down to
 
noexcept ( NoexceptCopy )
noexcept ( NonNoexceptCopy )
 
thanks in advance
"Öö Tiib" <ootiib@hot.ee>: Mar 28 05:20AM -0700

On Thursday, 28 March 2019 13:28:16 UTC+2, Mark wrote:
> noexcept(copy(nonNoexceptCopy)) << std::endl;
 
> std::cout << std::endl;
 
> }
 
That should say
 
noexcept(copy(noexceptCopy)): true
noexcept(copy(nonNoexceptCopy)): false
 
Is there some reason to doubt it?
Can you elaborate on what, for example, is unclear in an online reference like:
https://en.cppreference.com/w/cpp/language/noexcept
 
 
> noexcept ( NoexceptCopy )
> noexcept ( NonNoexceptCopy )
 
> thanks in advance
 
It does not matter, since expressions given as operands of operators like sizeof
or noexcept do not get "invoked"; only their types are evaluated, at compile
time.
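
A minimal sketch of that point, assuming C++11 or later. The names g_calls, probe, quiet_probe, ArrayHolder, VectorHolder and copies_are_as_expected are invented for this illustration and only loosely mirror the classes in the question. The counter shows that the operand of noexcept is never executed.

```cpp
#include <array>
#include <vector>

int g_calls = 0;  // counts how often one of the probe functions actually runs

// May throw as far as the compiler knows (no noexcept specifier).
int probe() { ++g_calls; return 42; }

// Declared non-throwing.
int quiet_probe() noexcept { ++g_calls; return 42; }

struct ArrayHolder  { std::array<int, 5> arr{{1, 2, 3, 4, 5}}; };
struct VectorHolder { std::vector<int> v{1, 2, 3, 4, 5}; };

// The operands below are never executed; only their declarations are examined.
static_assert(!noexcept(probe()), "probe() has no noexcept specifier");
static_assert(noexcept(quiet_probe()), "quiet_probe() is noexcept");

// Hypothetical helper: true when copying ArrayHolder is reported as
// non-throwing and copying VectorHolder is reported as potentially throwing.
bool copies_are_as_expected() {
    ArrayHolder a;
    VectorHolder v;
    return noexcept(ArrayHolder(a)) && !noexcept(VectorHolder(v));
}
```

After the static_asserts and a call to copies_are_as_expected(), g_calls is still 0: neither probe function ran and no copy was made, which is the same reason " called " never shows up in the program from the question.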
Mark <ma740988@gmail.com>: Mar 28 05:38AM -0700

On Thursday, March 28, 2019 at 8:20:12 AM UTC-4, Öö Tiib wrote:
 
> It does not matter, since expressions given as operands of operators like sizeof
> or noexcept do not get "invoked"; only their types are evaluated, at compile
> time.
 
Ah! I'm following now. I might have missed the 'operator' context in noexcept.