- Efficient C++ Bounded Thread-Safe FIFO Queue and LIFO Stack for real-time systems was updated to version 1.06 - 7 Updates
- My Scalable Parallel C++ Conjugate Gradient Linear System Solver Library for Windows and Linux was updated to version 1.63 - 1 Update
- Range types - 1 Update
- Range types - 3 Updates
- n-ary roots from complex numbers... - 1 Update
- std::byte - 1 Update
rami17 <rami17@rami17.net>: Apr 15 12:49PM -0400 Hello.... Efficient C++ Bounded Thread-Safe FIFO Queue and LIFO Stack for real-time systems was updated to version 1.06 Author: Amine Moulay Ramdane Description: Efficient C++ Bounded Thread-Safe FIFO Queue and LIFO Stack for real-time systems, they are efficient and they eliminate false-sharing and the Thread-Safe FIFO Queue is FIFO fair and starvation-free, and the Thread-Safe LIFO Stack is LIFO fair and starvation-free. Language: GNU C++ and Visual C++ and C++Builder You can download them from: https://sites.google.com/site/aminer68/efficient-c-bounded-thread-safe-fifo-queue-and-lifo-stack-for-real-time-systems Thank you, Amine Moulay Ramdane. |
Bonita Montero <Bonita.Montero@gmail.com>: Apr 15 07:13PM +0200 Don't trust someone who claims to know how to write synchronization facilities but doesn't even understand simple condition variables. |
aminer68@gmail.com: Apr 15 10:20AM -0700 On Saturday, April 15, 2017 at 12:50:01 PM UTC-4, rami17 wrote: |
aminer68@gmail.com: Apr 15 10:20AM -0700 On Saturday, April 15, 2017 at 1:13:19 PM UTC-4, Bonita Montero wrote: > Don't trust in someone who claims to know how to write synchronization > -facilities who doesn't even understand simple condidtion-variables. You don't understand Bonita, i have designed and implemented a condition variable, but i have not worked at that time on how to use a condition variable, but after that i have understood easily how to use a condition variable to implement a waitable Thread-safe FIFO queue etc. Thank you, Amine Moulay Ramdane. |
Jerry Stuckle <jstucklex@attglobal.net>: Apr 15 01:25PM -0400 On 4/15/2017 1:13 PM, Bonita Montero wrote: > Don't trust in someone who claims to know how to write synchronization > -facilities who doesn't even understand simple condidtion-variables. The sad part of it is, Bonita, Ramine is too stupid to realize what a fool he is making of himself. He thinks he's showing his expertise, when in reality he's showing just the opposite. -- ================== Remove the "x" from my email address Jerry Stuckle jstucklex@attglobal.net ================== |
Real Troll <real.troll@trolls.com>: Apr 15 01:35PM -0400 On 15/04/2017 18:13, Bonita Montero wrote: > Don't trust in someone who claims to know how to write synchronization > -facilities who doesn't even understand simple condidtion-variables. why are you responding to him? Just leave him alone and he'll get tired of being ignored and go away. are you missing him? |
"Chris M. Thomasson" <invalid@invalid.invalid>: Apr 15 04:18PM -0700 > You don't understand Bonita, i have designed and implemented a > condition variable, but i have not worked at that time on how to use > a condition variable, ummmm. wtf! > but after that i have understood easily how to > use a condition variable to implement a waitable Thread-safe FIFO queue > etc. Yeah right! |
aminer68@gmail.com: Apr 15 04:08PM -0700 Hello... My Scalable Parallel C++ Conjugate Gradient Linear System Solver Library for Windows and Linux was updated to version 1.63 I have just enabled floating-point exceptions, and i have thoroughly tested it and it is now more stable and very fast. You can download it from: https://sites.google.com/site/aminer68/scalable-parallel-c-conjugate-gradient-linear-system-solver-library Thank you, Amine Moulay Ramdane. |
ram@zedat.fu-berlin.de (Stefan Ram): Apr 15 07:29PM

>> ...
>> operator long() {return v+MN;}
>Why do you subtract MN? That's useless.

The difficult part for me would be to adjust the size of the member variable according to the size of the range. The code below is a first rough sketch only. I believe it has several bugs and might only be compiled due to a glitch of my compiler, but one should get the idea. Maybe someone can improve on it.

main.cpp

/* parts are based on a tutorial by Peter Dimov */

#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <iostream>
#include <ostream>
#include <type_traits>

/* section "type lists" */

template< class... T >struct type_list {};

/* section "sizeof for type lists" */

template< class L >struct size;

template< class... T >struct size<type_list< T... >>
{ using type = std::integral_constant< ::std::size_t, sizeof...( T )>; };

template< class L >using sizetype = typename size< L >::type;

/* section "member access for type lists" */

template< class L, std::size_t I >struct at_c;

template< class L, std::size_t I >using at_c_type =
typename at_c< L, I >::type;

template< class L, int value >using at =
typename at_c< L, value >::type;

template< template< class... >class L, class T1, class... T >
struct at_c< L< T1, T... >, 0 >
{ using type = T1; };

template <template< class... >class L, class T1, class... T, ::std::size_t I >
struct at_c< L< T1, T... >, I >
{ using type = at_c_type< L< T... >, I-1 >; };

/* section "the type list for integral types" */

using list = type_list
< uint_least8_t, uint_least16_t, uint_least32_t, uint_least64_t >;

constexpr auto listsize = sizetype< list >::value;

/* section "logarithms" */

constexpr int log256( unsigned long long x )
{ constexpr long double d = log2( 256.0l );
  return static_cast< int >
  ( log2( static_cast< long double >( x ))/d ); }

/* section "the range type" */

template<uint_least64_t l, uint_least64_t r> struct range
{ at< list, log256( r - l ) >member; };

int main()
{ assert( listsize == static_cast< ::std::size_t >( 4 ));

  { using t = at< list, 0 >;
    assert( sizeof( t )== static_cast< ::std::size_t >( 1 )); }

  { using t = at< list, 1 >;
    assert( sizeof( t )== static_cast< ::std::size_t >( 2 )); }

  { using t = at< list, 2 >;
    assert( sizeof( t )== static_cast< ::std::size_t >( 4 )); }

  { using t = at< list, 3 >;
    assert( sizeof( t )== static_cast< ::std::size_t >( 8 )); }

  { struct range<0,100> r;
    ::std::cout << sizeof r.member << '\n'; }

  { struct range<0,1000> r;
    ::std::cout << sizeof r.member << '\n'; }

  { struct range<0,1'000'000> r;
    ::std::cout << sizeof r.member << '\n'; }

  { struct range<0,1'000'000'000> r;
    ::std::cout << sizeof r.member << '\n'; }}

transcript

1
2
4
8

(I would like the last number to be »4« instead of »8«, because 4 octets should suffice for range<0,1'000'000'000>. But I will not try to debug this today.) |
Robert Wessel <robertwessel2@yahoo.com>: Apr 15 12:47AM -0500

On 14 Apr 2017 20:38:29 GMT, ram@zedat.fu-berlin.de (Stefan Ram) wrote:

>sizeof instance.i
> should be just 1, because one byte is enough to store
> one out of 10 values.

While only a rough outline, a template with conversion operators, like the following, should provide a start. This does require the actual storage type to be explicitly specified, I'm not sure how you'd generate that automatically from the range.

#include <stdexcept>

template<typename T, long MN, long MX> class rangetype
{
  T v;
  void setv(long a)
  {
    if (a < MN || a > MX)
      throw std::out_of_range("blah");
    v=a-MN;
  }
 public:
  rangetype& operator=(long a) {setv(a); return *this;}
  operator long() {return v+MN;}
};

int main()
{
  int a;
  char b;
  rangetype<char, 100, 200> q;
  rangetype<int, 100, 2000> r;

  q = 3;
  q = 3.14;
  a = q;
  b = q;
  r = q;
} |
Bonita Montero <Bonita.Montero@gmail.com>: Apr 15 10:43AM +0200

> template<typename T, long MN, long MX>

template<typename T, T MN, T MX>

> void setv(long a)

void setv(T a)

...

> rangetype& operator=(long a) {setv(a); return *this;}

rangetype& operator=(T a) {setv(a); return *this;}

> operator long() {return v+MN;}

operator T() {return v+MN;} |
Winfied Valentin <Winfied.Valentin@gmail.com>: Apr 15 08:03PM +0200 > v=a-MN; > ... > operator long() {return v+MN;} Why do you subtract MN? That's useless. |
Manfred <noname@invalid.add>: Apr 15 07:39PM +0200

On 04/14/2017 03:39 PM, Ralf Goertz wrote:

>>> Sine being odd and cos even means that their Taylor serieses don't
>>> share any common powers of x. So I am a bit puzzled how that is
>>> accomplished.

About the simple observation of the series: they don't share common powers, but, along the series in a plain implementation, some values must be calculated even if they do not accumulate into one series, while they do into the other series: look e.g. at the 1/(n!) terms (and also x^n, although for these you can use x^2 to step over the non-used powers).

These algorithms have been subject to deep analysis and optimization, and I can't say if modern implementations actually use these series, nevertheless the Intel instruction set reference manual says about FSINCOS:

"This instruction is faster than executing the FSIN and FCOS instructions in succession"

If the Intel designers decided to implement this in their hardware, I think there must have been good reason.

>>> cosine separately takes as long as doing it in one go using sincos
>>> on my linux x86_64 architecture. Is it because the glibc sincos
>>> function is bogus here?

I did the same, with 1 billion iterations (code below **):

No optimization sincos is faster:

$ c++ -DUSE_SINCOS sincos.cc -o sincos && ./sincos
evals: 1073741824
cycle result: -1.41666e-14, -1.19209e-07
cycle time: 42.1826 sec

$ c++ sincos.cc -o sincos && ./sincos
evals: 1073741824
cycle result: -1.41666e-14, -1.19209e-07
cycle time: 51.1113 sec

with -O2 no difference (see asm below):

$ c++ -O2 -DUSE_SINCOS sincos.cc -o sincos && ./sincos
evals: 1073741824
cycle result: -1.41666e-14, -1.19209e-07
cycle time: 25.9019 sec

$ c++ -O2 sincos.cc -o sincos && ./sincos
evals: 1073741824
cycle result: -1.41666e-14, -1.19209e-07
cycle time: 26.4266 sec

>> 1) sincos only exists in (deprecated) 8087 math, so the instruction
>> is no longer used for SSE math

x87 FPUs do have some age, but I am not sure they are generally deprecated for 32-bit code. It is forbidden to use them in some contexts (e.g. conflict with MMX which alias its registers with FPU registers)

SSE math does not implement such instructions in hardware, they have to be implemented in application (library) code.

> So I guess g++ is not that clever. The function (mypolar) I used was
> defined according to the std::polar function I posted elsewhere in this
> thread:

I got a similar result with no optimization:

$ c++ -S -DUSE_SINCOS sincos.cc -o sincos1.s && c++ -S sincos.cc -o sincos2.s && diff sincos{1,2}.s | less
< movq %rcx, %rsi
< movq %rdx, %rdi
< movq %rax, -200(%rbp)
< movsd -200(%rbp), %xmm0
< call sincos
< movsd -192(%rbp), %xmm0
< movsd -184(%rbp), %xmm1
< movsd %xmm1, -168(%rbp)
< movsd %xmm0, -176(%rbp)
---
> movq %rax, -168(%rbp)
116c106,108
< movsd -32(%rbp), %xmm1
---
> call sin
> movapd %xmm0, %xmm1
> movsd -32(%rbp), %xmm0
119,120c111,116
< movsd -176(%rbp), %xmm0
< movsd -40(%rbp), %xmm1
---
> call cos
> movapd %xmm0, %xmm1
> movsd -40(%rbp), %xmm0

But I got a definitely different result with -O2:

$ c++ -S -O2 -DUSE_SINCOS sincos.cc -o sincos1.s && c++ -S -O2 sincos.cc -o sincos2.s && diff sincos{1,2}.s
55c55
< cmpq %rbx, %rax
---
> cmpq %rax, %rbx
96c96
< cmpq %rbx, %r15
---
> cmpq %r15, %rbx

So indeed, with -O2 the result is practically identical.

> }
> I had placed [0, M_2_PI)-uniformly distributed angles into the real
> components of the elements of vc beforehand.

Note that M_2_PI is defined as "two times the reciprocal of pi"

**) sample code:

#include <iostream>
#include <vector>
#include <cmath>
#include <ctime>

#define ANGLES 1024
#define ITER (1024*1024)

int main()
{
  unsigned long evalCount = 0;
  double sinAcc = 0.0;
  double cosAcc = 0.0;
  std::vector<double> angles(ANGLES, 0.0);

  for(int a=0; ANGLES>a; ++a)
  {
    angles[a] = a*(2.0*M_PI/ANGLES);
  }

  clock_t startTime = clock();

  for(int iter=0; ITER>iter; ++iter)
  {
    for(auto a : angles)
    {
#ifdef USE_SINCOS
      double sinVal, cosVal;
      sincos(a, &sinVal, &cosVal);
      sinAcc += sinVal;
      cosAcc += cosVal;
#else
      sinAcc += sin(a);
      cosAcc += cos(a);