- Scanning memory forward vs. reverse - 4 Updates
- The ultimate thread pool - 1 Update
Bonita Montero <Bonita.Montero@gmail.com>: Jan 28 11:19AM +0100

With my thread pool there's some code that looks up a deque in reverse
order with find_if and reverse iterators to check if there's a "this"
pointer inside the deque. I could also have scanned the deque in forward
order, but it's more likely to find a fitting element when searching from
the back first. A deque usually consists of a number of linear parts in
memory. This led me to the question whether scanning memory is faster
forward or backward. I tried to test this with the below program:

    #include <iostream>
    #include <vector>
    #include <atomic>
    #include <chrono>

    using namespace std;
    using namespace chrono;

    atomic_char aSum;

    int main()
    {
        constexpr size_t GB = 1ull << 30;
        vector<char> vc( GB );
        auto sum = []( auto begin, auto end, ptrdiff_t step )
        {
            auto start = high_resolution_clock::now();
            char sum = 0;
            for( auto p = begin; end - p >= step; sum += *p, p += step );
            ::aSum.store( sum, memory_order_relaxed );
            cout << duration_cast<nanoseconds>(
                    high_resolution_clock::now() - start ).count() / 1.0e6
                 << "ms" << endl;
        };
        constexpr size_t STEP = 100;
        sum( vc.begin(), vc.end(), STEP );
        sum( vc.rbegin(), vc.rend(), STEP );
    }

On my Windows 7050X Zen4 computer scanning memory in both directions has
the same speed. On my Linux 3990X Zen2 computer scanning forward is 22%
faster. On my small Linux PC, an HP EliteDesk Mini PC with a Skylake
Pentium G4400, scanning memory forward is about 38% faster. I'd first
have guessed that the prefetchers between the memory levels are equally
effective in both directions. So I'd like to see some results from you.
Marcel Mueller <news.5.maazl@spamgourmet.org>: Jan 28 11:32AM +0100

Am 28.01.24 um 11:19 schrieb Bonita Montero:
> I'd first have guessed that the prefetchers between the memory-levels
> are as effective for both directions. So I'd like to see some results
> from you.

Reverse memory access is typically slower simply because the last data of
a cache line (after a cache miss) arrives last. If you read forward, the
process continues as soon as the first few bytes of the cache line have
arrived; the further data is read in parallel. But the details depend on
many other factors, first of all the placement of the memory chunks and
the prefetching technique used (if any).

Marcel
Bonita Montero <Bonita.Montero@gmail.com>: Jan 28 02:07PM +0100

Am 28.01.2024 um 11:32 schrieb Marcel Mueller:
> Reverse memory access is typically slower simply because the
> last data of a cache line (after a cache miss) arrives last.

I tested this, and for all offsets within a cache line I get the same
timing on all three of my computers:

    #include <iostream>
    #include <vector>
    #include <chrono>
    #include <atomic>

    using namespace std;
    using namespace chrono;

    #if defined(__cpp_lib_hardware_interference_size)
    constexpr size_t CL_SIZE = hardware_constructive_interference_size;
    #else
    constexpr size_t CL_SIZE = 64;
    #endif