Sunday, January 28, 2024

Digest for comp.lang.c++@googlegroups.com - 5 updates in 2 topics

Bonita Montero <Bonita.Montero@gmail.com>: Jan 28 11:19AM +0100

With my thread pool there's some code that looks up a deque in reverse
order with find_if and reverse iterators to check whether a "this"
pointer is inside the deque. I could also have scanned the deque in
forward order, but it's more likely to find a fitting element when
searching from the back first.
A deque usually consists of a number of linear parts in memory. This
led me to the question whether scanning memory is faster forward or
backward. I tried to test this with the program below:
 
#include <iostream>
#include <vector>
#include <atomic>
#include <chrono>
 
using namespace std;
using namespace chrono;
 
atomic_char aSum;
 
int main()
{
    constexpr size_t GB = 1ull << 30;
    vector<char> vc( GB );
    auto sum = []( auto begin, auto end, ptrdiff_t step )
    {
        auto start = high_resolution_clock::now();
        char sum = 0;
        for( auto p = begin; end - p >= step; sum += *p, p += step );
        ::aSum.store( sum, memory_order_relaxed );
        cout << duration_cast<nanoseconds>( high_resolution_clock::now()
            - start ).count() / 1.0e6 << "ms" << endl;
    };
    constexpr size_t STEP = 100;
    sum( vc.begin(), vc.end(), STEP );
    sum( vc.rbegin(), vc.rend(), STEP );
}
 
On my Windows 7050X Zen4 computer scanning memory in both directions
has the same speed. On my Linux 3990X Zen2 computer scanning forward
is 22% faster. On my small Linux PC, a HP EliteDesk Mini PC with a
Skylake Pentium G4400, scanning memory forward is about 38% faster.
I'd have guessed at first that the prefetchers between the memory
levels are equally effective in both directions. So I'd like to see
some results from you.
Marcel Mueller <news.5.maazl@spamgourmet.org>: Jan 28 11:32AM +0100

Am 28.01.24 um 11:19 schrieb Bonita Montero:
 
> I'd first have guessed that the prefetchers between the memory-levels
> are as effective for both directions. So I'd like to see some results
> from you.
 
Reverse memory access is typically slower simply because the last data
of a cache line (after a cache miss) arrives last. When reading
forward, processing can continue as soon as the first few bytes of the
cache line have arrived; the remaining data is transferred in parallel.
 
But details depend on many other factors, above all the placement of
the memory chunks and the prefetching technique used (if any).
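If the hardware prefetcher really does handle a backward stream poorly, one workaround is issuing explicit prefetch hints a few cache lines ahead of the scan direction. This is only a sketch of that idea, not code from the thread; it uses the GCC/Clang `__builtin_prefetch` builtin, and the cache-line size and prefetch distance are guesses:

```cpp
#include <cstddef>
#include <vector>

// Sketch: backward scan with explicit software prefetching (GCC/Clang).
// CL and DIST are assumptions, not measured values.
long sum_backward_prefetch( const std::vector<char> &v )
{
    constexpr std::ptrdiff_t CL   = 64;     // assumed cache-line size
    constexpr std::ptrdiff_t DIST = 8 * CL; // prefetch distance (a guess)
    long sum = 0;
    const char *base = v.data();
    for( std::ptrdiff_t i = (std::ptrdiff_t)v.size() - 1; i >= 0; --i )
    {
#if defined(__GNUC__)
        // Once per cache line, hint the line DIST bytes behind us,
        // i.e. the one the backward scan will reach next.
        if( i % CL == 0 && i >= DIST )
            __builtin_prefetch( base + i - DIST, 0, 0 );
#endif
        sum += base[i];
    }
    return sum;
}
```

Whether this beats the hardware prefetcher would have to be measured per CPU; on machines whose prefetchers already track descending streams it should make no difference.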
 
 
Marcel
Bonita Montero <Bonita.Montero@gmail.com>: Jan 28 02:07PM +0100

Am 28.01.2024 um 11:32 schrieb Marcel Mueller:
 
> Reverse memory access is typically slower simply because the
> last data of a cache line (after a cache miss) arrives at last.
 
I tested this, and for all offsets within a cache line I get the
same timing on all three of my computers:
 
#include <iostream>
#include <vector>
#include <chrono>
#include <atomic>
 
using namespace std;
using namespace chrono;
 
#if defined(__cpp_lib_hardware_interference_size)
constexpr size_t CL_SIZE = hardware_constructive_interference_size;
#else
constexpr size_t CL_SIZE = 64;
#endif