Thursday, January 14, 2021

Digest for comp.programming.threads@googlegroups.com - 4 updates in 4 topics

Amine Moulay Ramdane <aminer68@gmail.com>: Jan 13 03:27PM -0800

Hello,
 
Read again: I am correcting my previous post with more precision about WaitOnAddress, Windows futexes, and convoying.
 
I think I am smart, and I think that you have to be smart, Bonita Montero, so look at the following article by Joe Duffy, who was a lead software architect at Microsoft:
 
http://joeduffyblog.com/2006/12/14/anticonvoy-locks-in-windows-server-2003-sp1-and-windows-vista/
 
If you are smart, you will notice that the Windows futex has to be unfair in order to avoid convoying efficiently. This is why I think that recent Windows versions come with an unfair futex, and that, to avoid convoying efficiently, it also has to use optimistic "spinning" for a bounded interval before making the system call inside the futex. So I think I was right when I responded with the following in my previous post:
 
I have just read the following from Bonita Montero, and my answer
is below:
 
============================================================
 
I know this isn't a Windows-group.
But as the Win32-groups are dead I ask here.
There's an API called WaitOnAddress / WakeByAddressSingle /
WakeByAddressAll. It is like a binary semaphore (Win32 event),
but it has superior performance.
Look at this code:
 
#include <Windows.h>
#include <iostream>
#include <thread>
#include <chrono>
#include <cstdint>

// WaitOnAddress / WakeByAddressSingle live in Synchronization.lib.
#pragma comment(lib, "Synchronization.lib")
#pragma warning(disable: 6387)

using namespace std;
using namespace chrono;

int main()
{
    using hrc_tp = time_point<high_resolution_clock>;

    // Ping-pong worker: wait until *waitOnThis becomes non-null, reset it,
    // then set *notifyThat and wake the partner thread.
    // Note: the two slots are plain (non-atomic) pointers, as in the original post.
    auto thrAddr = []( uint64_t count, void **waitOnThis, void **notifyThat )
    {
        void *cmp = nullptr;
        for( ; count; --count )
        {
            // WaitOnAddress may return spuriously, so re-check in a loop.
            while( *waitOnThis == cmp )
                WaitOnAddress( waitOnThis, &cmp, sizeof(void *), INFINITE );
            *waitOnThis = cmp;
            *notifyThat = (void *)-1;
            WakeByAddressSingle( notifyThat );
        }
    };

    uint64_t const ROUNDS = 1'000'000;
    void *waitA = (void *)-1,   // thread A starts signalled
         *waitB = nullptr;
    hrc_tp start = high_resolution_clock::now();
    thread thrA( thrAddr, ROUNDS + 1, &waitA, &waitB ),
           thrB( thrAddr, ROUNDS, &waitB, &waitA );
    thrA.join();
    thrB.join();
    int64_t ns = duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count();
    cout << "WaitOnAddress: " << (double)ns / ROUNDS << endl;

    // Same ping-pong, but with auto-reset kernel events.
    auto thrEvent = []( uint64_t count, HANDLE hEvtThis, HANDLE hEvtThat )
    {
        for( ; count; --count )
        {
            WaitForSingleObject( hEvtThis, INFINITE );
            SetEvent( hEvtThat );
        }
    };

    HANDLE hEvtA = CreateEvent( nullptr, FALSE, TRUE, nullptr ),   // A starts signalled
           hEvtB = CreateEvent( nullptr, FALSE, FALSE, nullptr );
    start = high_resolution_clock::now();
    thrA = thread( thrEvent, ROUNDS + 1, hEvtA, hEvtB );
    thrB = thread( thrEvent, ROUNDS, hEvtB, hEvtA );
    thrA.join();
    thrB.join();
    ns = duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count();
    cout << "WaitForSingleObject: " << (double)ns / ROUNDS << endl;

    CloseHandle( hEvtA );
    CloseHandle( hEvtB );
}
 
This outputs the following on my machine:
WaitOnAddress: 248.879
WaitForSingleObject: 10318.6
That is about 40 times the synchronization performance of a kernel event
(binary semaphore), which is really amazing. But does anyone know how this
works internally? The following article is too vague for me:
https://devblogs.microsoft.com/oldnewthing/20160826-00/?p=94185
======================================================
 
Here is my answer:
 
I think that Bonita Montero is measuring the optimistic spinning of the Windows futex: there is a period of optimistic spinning before the system call is made, which is why it gives this performance. But I think this optimistic spinning period is not good for preemption tolerance, so it is not as preemption-tolerant as lock-free or wait-free algorithms.
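
To make the spin-then-block idea concrete, here is a minimal portable C++ sketch. It is only an illustration of the general pattern: the type SpinThenBlockFlag and the spin_limit parameter are hypothetical names, the condition variable is a stand-in for the blocking system call, and nothing here claims to show how WaitOnAddress is actually implemented or whether it spins at all.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot flag: spin briefly, then fall back to blocking.
struct SpinThenBlockFlag {
    std::atomic<bool> ready{false};
    std::mutex m;                 // used only for the sleep/wake handshake
    std::condition_variable cv;   // stand-in for the kernel wait

    // Returns true if the flag was observed during the spin phase,
    // false if the slow (blocking) path had to be taken.
    bool wait(int spin_limit) {
        for (int i = 0; i < spin_limit; ++i) {
            if (ready.load(std::memory_order_acquire)) return true;
            std::this_thread::yield();   // optimistic spin, no blocking
        }
        std::unique_lock<std::mutex> guard(m);
        cv.wait(guard, [this] { return ready.load(std::memory_order_acquire); });
        return false;
    }

    void signal() {
        ready.store(true, std::memory_order_release);
        { std::lock_guard<std::mutex> guard(m); }  // do not race a waiter about to sleep
        cv.notify_all();
    }
};
```

If the signalling thread is preempted while the waiter is still in its spin phase, the waiter burns its spin budget for nothing, which is the preemption-tolerance concern raised above.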
 
Note: Futexes have been implemented in Microsoft Windows since Windows 8 or Windows Server 2012 under the name WaitOnAddress.
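
The unfairness point made at the top of this post can also be sketched in portable C++. This is a hypothetical "barging" lock, not Windows code: BargingLock and run_barging_demo are illustrative names, and the mutex plus condition variable only stand in for a futex-style wait. The point it shows is that unlock() frees the lock for anyone instead of handing it to the woken waiter, so a running thread can take it immediately and no convoy forms behind a sleeping owner-to-be.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical unfair ("barging") lock; names are illustrative only.
class BargingLock {
    std::atomic<bool> locked_{false};
    std::mutex m_;                 // protects only the sleep/wake handshake
    std::condition_variable cv_;   // stand-in for a futex-style wait

public:
    // Succeeds for whichever thread gets here first, queued or not.
    bool try_lock() { return !locked_.exchange(true); }

    void lock() {
        if (try_lock()) return;    // fast path: barge in without waiting
        std::unique_lock<std::mutex> guard(m_);
        // A woken waiter must compete again; it is not handed the lock.
        cv_.wait(guard, [this] { return try_lock(); });
    }

    void unlock() {
        locked_.store(false);      // the lock is now free for anyone
        { std::lock_guard<std::mutex> guard(m_); }  // do not race a waiter about to sleep
        cv_.notify_one();
    }
};

// Small demo: several threads increment a counter under the lock.
long run_barging_demo(int threads, int iters) {
    BargingLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

The price of this design is possible starvation: a waiter can be woken repeatedly and lose the race each time, which is the trade-off an unfair lock accepts in exchange for throughput.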
 
Thank you,
Amine Moulay Ramdane.
Amine Moulay Ramdane <aminer68@gmail.com>: Jan 13 03:24PM -0800

Amine Moulay Ramdane <aminer68@gmail.com>: Jan 13 12:30PM -0800

"Christian Hanné" <the.hanne@gmail.com>: Jan 13 05:07PM +0100

If Delphi were the most popular language, most bugs
would be in Delphi code.