Saturday, April 15, 2017

Digest for comp.lang.c++@googlegroups.com - 14 updates in 6 topics

rami17 <rami17@rami17.net>: Apr 15 12:49PM -0400

Hello....
 
 
Efficient C++ Bounded Thread-Safe FIFO Queue and LIFO Stack for
real-time systems was updated to version 1.06
 
 
Author: Amine Moulay Ramdane
 
Description:
 
Efficient C++ bounded thread-safe FIFO queue and LIFO stack for
real-time systems. They are efficient and eliminate false sharing;
the thread-safe FIFO queue is FIFO-fair and starvation-free, and
the thread-safe LIFO stack is LIFO-fair and starvation-free.
 
Language: GNU C++ and Visual C++ and C++Builder
 
 
You can download them from:
 
https://sites.google.com/site/aminer68/efficient-c-bounded-thread-safe-fifo-queue-and-lifo-stack-for-real-time-systems
 
 
Thank you,
Amine Moulay Ramdane.
Bonita Montero <Bonita.Montero@gmail.com>: Apr 15 07:13PM +0200

Don't trust someone who claims to know how to write synchronization
facilities but doesn't even understand simple condition variables.
aminer68@gmail.com: Apr 15 10:20AM -0700

On Saturday, April 15, 2017 at 12:50:01 PM UTC-4, rami17 wrote:
aminer68@gmail.com: Apr 15 10:20AM -0700

On Saturday, April 15, 2017 at 1:13:19 PM UTC-4, Bonita Montero wrote:
> Don't trust someone who claims to know how to write synchronization
> facilities but doesn't even understand simple condition variables.
 
You don't understand, Bonita. I designed and implemented a condition variable myself; at that time I had not yet worked on how to use one, but afterwards I easily understood how to use a condition variable to implement a waitable thread-safe FIFO queue, etc.
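
For readers following along, a waitable bounded FIFO queue built on std::condition_variable can look roughly like the sketch below. This is only an illustration under my own assumptions: the name bounded_queue and its interface are hypothetical, it is not the code from the announced library, and it makes no attempt at fairness guarantees or at avoiding false sharing.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical illustration; not the code from the announced library.
template< typename T >
class bounded_queue
{
    std::mutex m;
    std::condition_variable not_empty;   // signalled after a push
    std::condition_variable not_full;    // signalled after a pop
    std::deque< T > items;
    const std::size_t capacity;

public:
    explicit bounded_queue( std::size_t cap ) : capacity( cap )
    { assert( cap > 0 ); }

    // Blocks while the queue is full.
    void push( T value )
    {
        std::unique_lock< std::mutex > lock( m );
        // The predicate form re-checks the condition after every wakeup,
        // so spurious wakeups are harmless.
        not_full.wait( lock, [ this ]{ return items.size() < capacity; } );
        items.push_back( std::move( value ) );
        lock.unlock();
        not_empty.notify_one();
    }

    // Blocks while the queue is empty.
    T pop()
    {
        std::unique_lock< std::mutex > lock( m );
        not_empty.wait( lock, [ this ]{ return !items.empty(); } );
        T value = std::move( items.front() );
        items.pop_front();
        lock.unlock();
        not_full.notify_one();
        return value;
    }
};
```

The point of the two condition variables is that producers and consumers wait on different conditions, so a pop only wakes a waiting producer and a push only wakes a waiting consumer.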
 
 
Thank you,
Amine Moulay Ramdane.
Jerry Stuckle <jstucklex@attglobal.net>: Apr 15 01:25PM -0400

On 4/15/2017 1:13 PM, Bonita Montero wrote:
> Don't trust someone who claims to know how to write synchronization
> facilities but doesn't even understand simple condition variables.
 
The sad part of it is, Bonita, Ramine is too stupid to realize what a
fool he is making of himself. He thinks he's showing his expertise,
when in reality he's showing just the opposite.
 
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
Real Troll <real.troll@trolls.com>: Apr 15 01:35PM -0400

On 15/04/2017 18:13, Bonita Montero wrote:
> Don't trust someone who claims to know how to write synchronization
> facilities but doesn't even understand simple condition variables.
 
 
Why are you responding to him? Just leave him alone and he'll get tired
of being ignored and go away. Are you missing him?
"Chris M. Thomasson" <invalid@invalid.invalid>: Apr 15 04:18PM -0700


> You don't understand Bonita, i have designed and implemented a
> condition variable, but i have not worked at that time on how to use
> a condition variable,
 
ummmm. wtf!
 
 
> but after that i have understood easily how to
> use a condition variable to implement a waitable Thread-safe FIFO queue
> etc.
 
Yeah right!
aminer68@gmail.com: Apr 15 04:08PM -0700

Hello...
 
 
My Scalable Parallel C++ Conjugate Gradient Linear System Solver
Library for Windows and Linux was updated to version 1.63
 
I have just enabled floating-point exceptions, and I have thoroughly
tested it; it is now more stable and very fast.
 
 
You can download it from:
 
https://sites.google.com/site/aminer68/scalable-parallel-c-conjugate-gradient-linear-system-solver-library
 
 
Thank you,
Amine Moulay Ramdane.
ram@zedat.fu-berlin.de (Stefan Ram): Apr 15 07:29PM

> > ...
>> operator long() {return v+MN;}
>Why do you subtract MN? That's useless.
 
The difficult part for me would be to adjust the
size of the member variable according to the size
of the range.
 
The code below is a first rough sketch only.
 
I believe it has several bugs and might only be
compiled due to a glitch of my compiler, but one
should get the idea. Maybe someone can improve on
it.
 
main.cpp
 
/* parts are based on a tutorial by Peter Dimov */
 
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <iostream>
#include <ostream>
#include <type_traits>
 
/* section "type lists" */
template< class... T >struct type_list {};
 
/* section "sizeof for type lists" */
template< class L >struct size;
template< class... T >struct size<type_list< T... >>
{ using type =
std::integral_constant< ::std::size_t, sizeof...( T )>; };
template< class L >using sizetype = typename size< L >::type;
 
/* section "member access for type lists" */
template< class L, std::size_t I >struct at_c;
template< class L, std::size_t I >using at_c_type =
typename at_c< L, I >::type;
template< class L, int value >using at =
typename at_c< L, value >::type;
template< template< class... >class L, class T1, class... T >
struct at_c< L< T1, T... >, 0 >
{ using type = T1; };
template
<template< class... >class L, class T1, class... T, ::std::size_t I >
struct at_c< L< T1, T... >, I >
{ using type = at_c_type< L< T... >, I-1 >; };
 
/* section "the type list for integral types" */
using list = type_list< uint_least8_t, uint_least16_t,
uint_least32_t, uint_least64_t >;
constexpr auto listsize = sizetype< list >::value;
 
/* section "logarithms" */
constexpr int log256( unsigned long long x )
{ constexpr long double d = log2( 256.0l );
return static_cast< int >
( log2( static_cast< long double >( x ))/d ); }
 
/* section "the range type" */
template<uint_least64_t l, uint_least64_t r>
struct range
{ at< list, log256( r - l ) >member; };
 
int main()
{ assert( listsize == static_cast< ::std::size_t >( 4 ));
{ using t = at< list, 0 >;
assert( sizeof( t )== static_cast< ::std::size_t >( 1 )); }
{ using t = at< list, 1 >;
assert( sizeof( t )== static_cast< ::std::size_t >( 2 )); }
{ using t = at< list, 2 >;
assert( sizeof( t )== static_cast< ::std::size_t >( 4 )); }
{ using t = at< list, 3 >;
assert( sizeof( t )== static_cast< ::std::size_t >( 8 )); }
{ struct range<0,100> r; ::std::cout << sizeof r.member << '\n'; }
{ struct range<0,1000> r; ::std::cout << sizeof r.member << '\n'; }
{ struct range<0,1'000'000> r;
::std::cout << sizeof r.member << '\n'; }
{ struct range<0,1'000'000'000> r;
::std::cout << sizeof r.member << '\n'; }
}
 
transcript
 
1
2
4
8
 
(I would like the last number to be »4« instead of »8«, because
4 octets should suffice for range<0,1'000'000'000>. But I will
not try to debug this today.)
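
One likely cause of the 8: log256( r - l ) is the number of bytes needed minus one, and it is used directly as the list index. A span that needs 4 bytes gives log256 == 3, which selects the fourth list entry, the 64-bit type. A minimal sketch that avoids the floating-point logarithm altogether is to compare the span against the maxima of the candidate types. The names smallest_uint and compact_range are hypothetical, and the sketch assumes r >= l and C++14.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: the smallest unsigned "least" type whose maximum
// value is at least Span (the largest offset that must be stored).
template< std::uint_least64_t Span >
using smallest_uint =
    std::conditional_t< ( Span <= UINT_LEAST8_MAX ),  std::uint_least8_t,
    std::conditional_t< ( Span <= UINT_LEAST16_MAX ), std::uint_least16_t,
    std::conditional_t< ( Span <= UINT_LEAST32_MAX ), std::uint_least32_t,
                                                      std::uint_least64_t > > >;

// Same shape as the "range" struct above; assumes r >= l.
template< std::uint_least64_t l, std::uint_least64_t r >
struct compact_range
{ smallest_uint< r - l > member; };

// A span of 1'000'000'000 fits below 2^32, so the 32-bit type is chosen.
static_assert( std::is_same< decltype( compact_range< 0, 1000000000 >::member ),
                             std::uint_least32_t >::value,
               "32-bit least type expected" );
```

On platforms where the least types are exactly 8, 16, 32 and 64 bits wide, this should give member sizes of 1, 2, 4 and 4 for the four ranges used in main above.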
Robert Wessel <robertwessel2@yahoo.com>: Apr 15 12:47AM -0500

On 14 Apr 2017 20:38:29 GMT, ram@zedat.fu-berlin.de (Stefan Ram)
wrote:
 
 
>sizeof instance.i
 
> should be just 1, because one byte is enough to store
> one out of 10 values.
 
 
While only a rough outline, a template with conversion operators, like
the following, should provide a start. This does require the actual
storage type to be explicitly specified; I'm not sure how you'd
generate that automatically from the range.
 
 
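#include <stdexcept>
template<typename T, long MN, long MX>
class rangetype
{
    T v;    // stores the offset from MN, not the value itself
    void setv(long a)
    {
        if (a < MN || a > MX)
            throw std::out_of_range("blah");
        v=a-MN;
    }

public:
    rangetype& operator=(long a) {setv(a); return *this;}
    operator long() {return v+MN;}
};


int main()
{
    int a;
    char b;
    rangetype<char, 100, 200> q;
    rangetype<int, 100, 2000> r;

    q = 3;      // compiles, but throws std::out_of_range at run time (3 < 100);
                // uncaught here, so the program terminates at this line
    q = 3.14;   // double converts to long 3, same outcome

    a = q;      // via operator long()
    b = q;      // via operator long(), then narrowed to char

    r = q;      // q converts to long, then r.operator=(long)
}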
Bonita Montero <Bonita.Montero@gmail.com>: Apr 15 10:43AM +0200

> template<typename T, long MN, long MX>
 
template<typename T, T MN, T MX>
 
> void setv(long a)
 
void setv(T a)
...
 
> rangetype& operator=(long a) {setv(a); return *this;}
 
rangetype& operator=(T a) {setv(a); return *this;}
 
> operator long() {return v+MN;}
 
operator T() {return v+MN;}
Winfied Valentin <Winfied.Valentin@gmail.com>: Apr 15 08:03PM +0200

> v=a-MN;
> ...
> operator long() {return v+MN;}
 
Why do you subtract MN? That's useless.
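
One possible answer, sketched under my own assumptions: storing a - MN instead of a lets a narrow member hold a range that does not start near zero. In the quoted rangetype<char, 100, 200>, the value 200 does not fit in a signed 8-bit char, but the offsets 0..100 do. The class offset_range and the function round_trip below are hypothetical names for a C++14 variant of the quoted code, not anyone's posted implementation.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical variant of the quoted class; requires C++14.
template< typename T, long MN, long MX >
class offset_range
{
    T v{};   // holds value - MN, i.e. 0 .. MX - MN

public:
    constexpr offset_range() = default;

    constexpr offset_range& operator=( long a )
    {
        if( a < MN || a > MX )
            throw std::out_of_range( "value outside [MN, MX]" );
        v = static_cast< T >( a - MN );
        return *this;
    }

    constexpr operator long() const { return static_cast< long >( v ) + MN; }
};

// Round trip through a one-byte member for a range that starts at 1000.
constexpr long round_trip( long a )
{
    offset_range< std::uint_least8_t, 1000, 1200 > q{};
    q = a;
    return q;
}

static_assert( round_trip( 1200 ) == 1200, "offset 200 fits in 8 bits" );
```

Without the subtraction, the member would have to be wide enough for 1200 itself, although the range only has 201 distinct values.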
Manfred <noname@invalid.add>: Apr 15 07:39PM +0200

On 04/14/2017 03:39 PM, Ralf Goertz wrote:
>>> Sine being odd and cos even means that their Taylor serieses don't
>>> share any common powers of x. So I am a bit puzzled how that is
>>> accomplished.
About the simple observation of the series: they don't share common
powers, but in a plain implementation some values must be calculated along
the series even if they accumulate into only one of the two: look e.g. at
the 1/(n!) terms (and also x^n, although for these you can use x^2 to step
over the unused powers).
These algorithms have been subject to deep analysis and optimization,
and I can't say whether modern implementations actually use these series;
nevertheless the Intel instruction set reference manual says about FSINCOS:
"This instruction is faster than executing the FSIN and FCOS
instructions in succession"
If the Intel designers decided to implement this in their hardware, I
think there must have been a good reason.
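
To make the sharing concrete, here is a minimal sketch of a plain Taylor evaluation in which every term x^n/n! is derived from the previous one, so both sums reuse the same multiplications and divisions. It is an illustration only: the names sin_cos and taylor_sincos are hypothetical, there is no range reduction (so it is only reasonable for small |x|), and it is not a claim about how glibc or FSINCOS work internally. It requires C++14.

```cpp
// Hypothetical illustration; requires C++14 for the constexpr loop.
struct sin_cos { double s; double c; };

constexpr sin_cos taylor_sincos( double x, int terms = 12 )
{
    double s = 0.0;
    double c = 0.0;
    double term = 1.0;                       // x^0 / 0!
    for( int n = 0; n < 2 * terms; ++n )
    {
        // The sign pattern over n is +, +, -, -, +, +, ...
        const double signed_term = ( n % 4 < 2 ) ? term : -term;
        if( n % 2 == 0 ) c += signed_term;   // even powers feed cosine
        else             s += signed_term;   // odd powers feed sine
        term *= x / ( n + 1 );               // next term: x^(n+1) / (n+1)!
    }
    return { s, c };
}
```

Computing the two functions separately in the same plain way would walk through the same sequence of terms twice and throw half of them away each time.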
 
 
>>> cosine separately takes as long as doing it in one go using sincos
>>> on my linux x86_64 architecture. Is it because the glibc sincos
>>> function is bogus here?
I did the same, with 1 billion iterations (code below **):
Without optimization, sincos is faster:
$ c++ -DUSE_SINCOS sincos.cc -o sincos && ./sincos
evals: 1073741824
cycle result: -1.41666e-14, -1.19209e-07
cycle time: 42.1826 sec
$ c++ sincos.cc -o sincos && ./sincos
evals: 1073741824
cycle result: -1.41666e-14, -1.19209e-07
cycle time: 51.1113 sec
 
 
With -O2 there is practically no difference (see asm below):
$ c++ -O2 -DUSE_SINCOS sincos.cc -o sincos && ./sincos
evals: 1073741824
cycle result: -1.41666e-14, -1.19209e-07
cycle time: 25.9019 sec
$ c++ -O2 sincos.cc -o sincos && ./sincos
evals: 1073741824
cycle result: -1.41666e-14, -1.19209e-07
cycle time: 26.4266 sec
 
 
 
>> 1) sincos only exists in (deprecated) 8087 math, so the instruction
>> is no longer used for SSE math
x87 FPUs do have some age, but I am not sure they are generally
deprecated for 32-bit code.
They cannot be used in some contexts (e.g. they conflict with MMX,
which aliases its registers onto the FPU registers).
SSE math does not implement such instructions in hardware; they have to
be implemented in application (library) code.
 
 
> So I guess g++ is not that clever. The function (mypolar) I used was
> defined according to the std::polar function I posted elsewhere in this
> thread:
I got a similar result with no optimization:
$ c++ -S -DUSE_SINCOS sincos.cc -o sincos1.s && c++ -S sincos.cc -o sincos2.s && diff sincos{1,2}.s | less
< movq %rcx, %rsi
< movq %rdx, %rdi
< movq %rax, -200(%rbp)
< movsd -200(%rbp), %xmm0
< call sincos
< movsd -192(%rbp), %xmm0
< movsd -184(%rbp), %xmm1
< movsd %xmm1, -168(%rbp)
< movsd %xmm0, -176(%rbp)
---
> movq %rax, -168(%rbp)
116c106,108
< movsd -32(%rbp), %xmm1
---
> call sin
> movapd %xmm0, %xmm1
> movsd -32(%rbp), %xmm0
119,120c111,116
< movsd -176(%rbp), %xmm0
< movsd -40(%rbp), %xmm1
---
> call cos
> movapd %xmm0, %xmm1
> movsd -40(%rbp), %xmm0
 
But I got a definitely different result with -O2:
$ c++ -S -O2 -DUSE_SINCOS sincos.cc -o sincos1.s && c++ -S -O2 sincos.cc -o sincos2.s && diff sincos{1,2}.s
55c55
< cmpq %rbx, %rax
---
> cmpq %rax, %rbx
96c96
< cmpq %rbx, %r15
---
> cmpq %r15, %rbx
 
So indeed, with -O2 the result is practically identical.
 
 
 
> }
 
> I had placed [0, M_2_PI)-uniformly distributed angles into the real
> components of the elements of vc beforehand.
Note that M_2_PI is defined as "two times the reciprocal of pi", i.e.
2/pi (about 0.6366), not 2*pi.
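
A small sketch of the difference, using locally defined constants rather than the math.h macros so that it does not depend on a particular library. The names kPi, two_over_pi, two_pi and angle are my own, hypothetical choices.

```cpp
// Hypothetical names; the literal is pi to double precision.
constexpr double kPi         = 3.14159265358979323846;
constexpr double two_over_pi = 2.0 / kPi;   // the value M_2_PI stands for, about 0.6366
constexpr double two_pi      = 2.0 * kPi;   // one full turn, about 6.2832

// Uniform grid of n angles on [0, 2*pi), in the style of
// "a * (2.0 * M_PI / ANGLES)" from the sample code.
constexpr double angle( int a, int n ) { return a * ( two_pi / n ); }
```

Angles drawn from [0, M_2_PI) therefore cover only about a tenth of the circle.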
 
 
**) sample code:
#include <iostream>
 
#include <vector>
 
#include <cmath>
#include <ctime>
 
 
#define ANGLES 1024
#define ITER (1024*1024)
 
 
int main()
{
unsigned long evalCount = 0;
 
double sinAcc = 0.0;
double cosAcc = 0.0;
 
std::vector<double> angles(ANGLES, 0.0);
 
 
for(int a=0; ANGLES>a; ++a)
{
angles[a] = a*(2.0*M_PI/ANGLES);
}
 
clock_t startTime = clock();
 
for(int iter=0; ITER>iter; ++iter)
{
for(auto a : angles)
{
#ifdef USE_SINCOS
double sinVal, cosVal;
sincos(a, &sinVal, &cosVal);
sinAcc += sinVal;
cosAcc += cosVal;
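#else
sinAcc += sin(a);
cosAcc += cos(a);
#endif
// The listing is cut off at this point in the digest. Everything from
// the #endif above to the end is a reconstruction assumed from the
// transcript format ("evals", "cycle result", "cycle time"); the order
// of the two results is a guess.
++evalCount;
}
}

clock_t endTime = clock();

std::cout << "evals: " << evalCount << '\n';
std::cout << "cycle result: " << sinAcc << ", " << cosAcc << '\n';
std::cout << "cycle time: "
<< static_cast<double>(endTime - startTime) / CLOCKS_PER_SEC
<< " sec\n";
}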
