Saturday, January 4, 2020

Digest for comp.lang.c++@googlegroups.com - 15 updates in 7 topics

wolfgang bauer <schutz@gmx.de>: Jan 04 03:15PM +0100

Till now I was not aware of some details.
 
Thank you all for shedding some light on it.
James Kuyper <jameskuyper@alumni.caltech.edu>: Jan 04 09:48AM -0500

On 1/4/20 2:56 AM, Keith Thompson wrote:
>> ranges into bit counts to better match your question.
 
> A quibble: the required ranges of values for the standard integer types
> are copied from the C standard, but are not incorporated by reference.
 
In my copy of n4567.pdf, 18.3.3 says:
 
"1 Table 31 describes the header <climits>.
2 The contents are the same as the Standard C library header <limits.h>"
 
18.3.3p2 is precisely the kind of wording that "incorporated by
reference" means to me. What does it mean to you?
 
The "description" in Table 31 is the only place in the C++ standard
where all of the *_MIN and *_MAX macros are referred to. The meanings of
those macros, and the maximum and minimum (respectively) permitted values
for those macros, are given only in the C standard. If you read
only the C++ standard, you might not even realize that it imposes any
limits on the sizes of integer types, however indirectly.
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Jan 04 02:48PM -0800

> for those macros, are given only in the C standard. If you read
> only the C++ standard, you might not even realize that it imposes any
> limits on the sizes of integer types, however indirectly.
 
You're right. N4567 3.9.1 [basic.fundamental] paragraph 3 says:
 
The signed and unsigned integer types shall satisfy the constraints
given in the C standard, section 5.2.4.2.1.
 
My earlier post was based on N4842, which is a working draft for C++20
(and happens to be the document that I had open at the time). In that
draft, 6.8.1 [basic.fundamental] paragraph 3 includes a table showing
the minimum widths of the 5 signed integer types:
 
Type Minimum width N
signed char 8
short 16
int 16
long 32
long long 64
 
The widths are sufficient to specify the ranges, since unlike the
current edition of the C++ standard, N4842 mandates 2's-complement
for signed integers, including the extra negative value:
 
The range of representable values for a signed integer type is
−2**(N−1) to 2**(N−1) − 1 (inclusive), where N is called the *width*
of the type.
...
An unsigned integer type has the same object representation,
value representation, and alignment requirements (6.7.6) as
the corresponding signed integer type. For each value x of a
signed integer type, the value of the corresponding unsigned
integer type congruent to x modulo 2**N has the same value of
corresponding bits in its value representation.
 
(expressions tweaked to avoid superscripts).
 
Of course N4842 is not a standard, and I should have checked the current
edition.
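
The minimum widths in the table above can be checked at compile time. The following is only a sketch: the helper names width_of, min_for_width and max_for_width are made up for illustration, and the range helpers assume the two's-complement formula quoted from N4842.

```cpp
#include <climits>
#include <limits>

// Width N of a signed integer type: its value bits plus the sign bit.
// (Hypothetical helper, not part of any standard header.)
template <typename T>
constexpr int width_of() {
    return std::numeric_limits<T>::digits + 1;
}

// Range implied by a width N under the two's-complement rule quoted above:
// -2**(N-1) to 2**(N-1) - 1. Only valid here for N < 64, because the
// intermediate value must fit in long long.
constexpr long long min_for_width(int n) { return -(1LL << (n - 1)); }
constexpr long long max_for_width(int n) { return (1LL << (n - 1)) - 1; }

// Minimum widths from the table.
static_assert(width_of<signed char>() >= 8, "signed char narrower than 8 bits");
static_assert(width_of<short>() >= 16, "short narrower than 16 bits");
static_assert(width_of<int>() >= 16, "int narrower than 16 bits");
static_assert(width_of<long>() >= 32, "long narrower than 32 bits");
static_assert(width_of<long long>() >= 64, "long long narrower than 64 bits");

// The <climits> macros express the same lower bounds as maximum values.
static_assert(SCHAR_MAX >= 127, "SCHAR_MAX too small");
static_assert(INT_MAX >= 32767, "INT_MAX too small");
static_assert(LONG_MAX >= 2147483647L, "LONG_MAX too small");
```

For example, min_for_width(16) and max_for_width(16) give -32768 and 32767, the range of a 16-bit two's-complement integer.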
 
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
[Note updated email address]
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
Frederick Gotham <cauldwell.thomas@gmail.com>: Jan 04 06:12AM -0800

On Friday, January 3, 2020 at 7:33:08 PM UTC, Paavo Helde wrote:
 
> probably need to define some extra stuff. The allocator requirements
> have been in great flux in the recent standards, I'm not sure what it is
> missing exactly.
 
 
 
I've tried this on three compilers: GNU, Microsoft, Clang
 
The original code which has two template parameters, "typename T, std::size_t capacity", only compiles on the Clang compiler.
 
The second version with only one template parameter, "typename T", compiles on all three compilers.
 
Since the second version works on all three compilers, I don't think that this problem has anything to do with how allocators are implemented in the respective standard libraries for these three compilers. Making the change from two parameters to one parameter shouldn't cause compilation to fail.
 
Here's the code for the second version which works on all three compilers (I've just commented out the 2nd parameter):
 
#include <cstddef>  /* size_t */
#include <new>      /* Only for bad_alloc */

std::size_t constexpr capacity = 4;  /* This is instead of a template parameter */

template<typename T /*, std::size_t capacity*/ >
class StaticAllocator {
public:
    typedef T value_type;

protected:

    static T buf[capacity];

public:

    T *allocate(std::size_t const n)
    {
        if (n > capacity)
            throw std::bad_alloc();

        return buf;
    }

    void deallocate(T *, std::size_t)
    {
        /* Do Nothing */
    }
};

template<typename T /*, std::size_t capacity*/ >
T StaticAllocator<T /*,capacity*/ >::buf[capacity];

using std::size_t;

#include <vector>
using std::vector;

#include <iostream>
using std::cout;
using std::endl;

auto main(void) -> int
{
    vector< char, StaticAllocator<char /*, 4*/ > > v;

    v.push_back('a');
    v.push_back('b');
    v.push_back('c');
    v.push_back('d');

    for (auto const &elem : v)
        cout << elem << endl;

    vector< char, StaticAllocator<char /*, 4 */> > v2;

    v2.push_back('x');
    v2.push_back('y');
    v2.push_back('z');

    for (auto const &elem : v2)
        cout << elem << endl;

    // Now try the first vector again

    for (auto const &elem : v)
        cout << elem << endl;
}
Bo Persson <bo@bo-persson.se>: Jan 04 03:53PM +0100

On 2020-01-04 at 15:12, Frederick Gotham wrote:
 
> for (auto const &elem : v)
> cout << elem << endl;
> }
 
Seems like the culprit is the rebind member template from the allocator
requirements. MSVC uses that to make sure that the allocator used for
vector<T> really allocates T's:
 
using _Rebind_alloc_t =
typename allocator_traits<_Alloc>::template rebind_alloc<_Value_type>;
 
 
The allocator table says:
 
A::template rebind<U>::other (optional)[1]
 
with the very important note:
 
"Notes:
 
rebind is only optional (provided by std::allocator_traits) if this
allocator is a template of the form SomeAllocator<T, Args>, where Args
is zero or more additional template type parameters."
 
https://en.cppreference.com/w/cpp/named_req/Allocator#cite_note-1
 
 
As your second template parameter is a non-type template parameter (the
value 4), it doesn't *fully* comply with these requirements and so a
compiler doesn't have to accept it.
 
 
Apparently, some compilers might work if the allocator is *almost*
correct, but they don't have to.
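
One way to keep the two-parameter form would be to supply rebind explicitly. What follows is only a sketch under assumptions: it reuses the StaticAllocator<T, capacity> shape from the original post, while the rebind member, the converting constructor and the comparison operators are additions that do not appear in the posted code.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <vector>

template <typename T, std::size_t capacity>
class StaticAllocator {
public:
    typedef T value_type;

    // Explicit rebind: std::allocator_traits can only synthesize this when
    // every extra template parameter is a type, and `capacity` is a value.
    template <typename U>
    struct rebind {
        typedef StaticAllocator<U, capacity> other;
    };

    StaticAllocator() = default;

    // Converting constructor expected by the allocator requirements.
    template <typename U>
    StaticAllocator(const StaticAllocator<U, capacity> &) noexcept {}

    T *allocate(std::size_t n) {
        if (n > capacity)
            throw std::bad_alloc();
        return buf;  // every request returns the same shared static buffer
    }

    void deallocate(T *, std::size_t) noexcept { /* nothing to release */ }

protected:
    static T buf[capacity];
};

template <typename T, std::size_t capacity>
T StaticAllocator<T, capacity>::buf[capacity];

// All instances share one static buffer, so they always compare equal.
template <typename T, typename U, std::size_t N>
bool operator==(const StaticAllocator<T, N> &, const StaticAllocator<U, N> &) noexcept {
    return true;
}

template <typename T, typename U, std::size_t N>
bool operator!=(const StaticAllocator<T, N> &, const StaticAllocator<U, N> &) noexcept {
    return false;
}

// With the explicit member in place, allocator_traits can rebind.
static_assert(
    std::is_same<std::allocator_traits<StaticAllocator<char, 4> >::rebind_alloc<int>,
                 StaticAllocator<int, 4> >::value,
    "rebind should preserve the capacity");
```

As in the posted code, every vector using the same specialization shares one buffer, so this is suitable for experiments only.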
 
 
Bo Persson
Frederick Gotham <cauldwell.thomas@gmail.com>: Jan 04 02:34PM -0800

Bo wrote:
 
> rebind is only optional (provided by std::allocator_traits) if this
> allocator is a template of the form SomeAllocator<T, Args>, where Args
> is zero or more additional template type parameters."
 
 
Well spotted. Here's my workaround:
 
 
#include <cstddef>  /* size_t */
#include <new>      /* Only for bad_alloc */

template <std::size_t capacity>
class Outer {
public:  /* the nested allocator must be accessible from outside Outer */

    template<typename T>
    class StaticAllocator {
    public:
        typedef T value_type;

    protected:

        static T buf[capacity];

    public:

        T *allocate(std::size_t const n)
        {
            if (n > capacity)
                throw std::bad_alloc();

            return buf;
        }

        void deallocate(T *, std::size_t)
        {
            /* Do Nothing */
        }
    };

};

template<std::size_t capacity>
template<typename T>
T Outer<capacity>::StaticAllocator<T>::buf[capacity];

using std::size_t;

#include <vector>
using std::vector;

#include <iostream>
using std::cout;
using std::endl;

auto main(void) -> int
{
    vector< char, Outer<4>::StaticAllocator<char> > v;

    v.push_back('a');
    v.push_back('b');
    v.push_back('c');
    v.push_back('d');

    for (auto const &elem : v)
        cout << elem << endl;

    vector< char, Outer<4>::StaticAllocator<char> > v2;

    v2.push_back('x');
    v2.push_back('y');
    v2.push_back('z');

    for (auto const &elem : v2)
        cout << elem << endl;

    // Now try the first vector again

    for (auto const &elem : v)
        cout << elem << endl;
}
 
 
Now I can get back to testing my new allocator.
Ike Naar <ike@sdf.lonestar.org>: Jan 04 09:46PM

> bool operator<(const struct Foo &r) const { //needed for set
> if (i<r.i) return true;
> return j<r.j;
 
This looks suspect.
Do you want (2,1) to be less than (1,2) ?
If you want to define a lexicographical order on (i,j), the comparison should be
 
return i<r.i || (i==r.i && j<r.j);
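
The same ordering can also be written with std::tie, which compares the members lexicographically. This is only a sketch: the members i and j come from the quoted code, their int type is an assumption, and flawed_less and make_set are hypothetical helpers added for comparison.

```cpp
#include <set>
#include <tuple>

struct Foo {
    int i;
    int j;

    // Lexicographical order on (i, j); equivalent to
    // i<r.i || (i==r.i && j<r.j).
    bool operator<(const Foo &r) const {
        return std::tie(i, j) < std::tie(r.i, r.j);
    }
};

// The quoted comparison, kept as a free function to show the problem:
// it can report a < b and b < a at the same time.
inline bool flawed_less(const Foo &l, const Foo &r) {
    if (l.i < r.i) return true;
    return l.j < r.j;
}

// A set built with the corrected operator< keeps one copy of each pair.
inline std::set<Foo> make_set() {
    return std::set<Foo>{{2, 1}, {1, 2}, {1, 2}};
}
```

With the flawed version both (2,1) < (1,2) and (1,2) < (2,1) hold, which is not a strict weak ordering, so std::set would have undefined behaviour.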
"Öö Tiib" <ootiib@hot.ee>: Jan 04 05:21AM -0800

On Saturday, 4 January 2020 12:04:24 UTC+2, Bonita Montero wrote:
> > - the TIFF file format
> > - linker symbols as seen in the output from 'nm -CP'
 
> That has nothing to do with parsing.
 
It is extending all "deserialization" into "parsing" that is
pedantically wrong to do. But your "nothing to do" is an
exaggeration. Parsing is a subset of those activities: the
deserialization of text formats.
Jorgen Grahn <grahn+nntp@snipabacken.se>: Jan 04 03:04PM

On Sat, 2020-01-04, Öö Tiib wrote:
> pedantically wrong to do. But your "nothing to do" is an
> exaggeration. Parsing is a subset of those activities: the
> deserialization of text formats.
 
(Note that the output from nm -CP is text, an address and a C++ name.)
 
I was not aware at all that there is a distinction. If there is one,
it must be hard to draw the line. Recursive definition?
 
Not that it matters much.
 
The real reason I brought it up is that there's often a choice when
designing a data format:
 
- use XML or JSON or similar, and you need help parsing it
- make up your own simpler format (often "key: value" is enough) and
you can make your own parser, can use normal Unix tools on it ...
but don't get any help from XML or JSON tools.
 
This second option may not be popular right now, but it does exist.
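
A "key: value" format of that kind is small enough to parse by hand. Here is a minimal sketch, assuming one pair per line and a colon as the separator; the helper names trim and parse_key_values are made up for illustration.

```cpp
#include <map>
#include <sstream>
#include <string>

// Strip leading and trailing whitespace from a string.
inline std::string trim(const std::string &s) {
    const char *ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Parse "key: value" lines; lines without a colon or with an empty key
// are skipped. A later duplicate key overwrites an earlier one.
inline std::map<std::string, std::string> parse_key_values(const std::string &text) {
    std::map<std::string, std::string> result;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos)
            continue;
        const std::string key = trim(line.substr(0, colon));
        if (key.empty())
            continue;
        result[key] = trim(line.substr(colon + 1));
    }
    return result;
}
```

Because the value is everything after the first colon, values may themselves contain colons (a URL, say), and the files stay friendly to grep and similar tools.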
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
"Öö Tiib" <ootiib@hot.ee>: Jan 04 12:39PM -0800

On Saturday, 4 January 2020 17:04:29 UTC+2, Jorgen Grahn wrote:
 
> I was not aware at all that there is a distinction. If there is one,
> it must be hard to draw the line. Recursive definition?
 
> Not that it matters much.
 
Are you saying the term has widened from linguistics (where parsing is
semantic analysis of text) to computer science, where it now means
any kind of deserialization? I am the last person to ask about the
extent of modern English anyway.
 
> you can make your own parser, can use normal Unix tools on it ...
> but don't get any help from XML or JSON tools.
 
> This second option may not be popular right now, but it does exist.
 
I myself like to use well-established portable formats (like, say, PNG
for raster pictures). Then I use JSON for everything for which I don't
have such a format. That might result in a lot of files, which I prefer
to zip into one file in Open Document Format style.
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Jan 04 08:50PM

>> the top 5% of parsers performance-wise.
 
> Of course you did. Did you solve the 3 body problem and world peace at the
> same time?
 
Yes I did: if you don't believe me then you simply have to look at the source code as it is on github. Now kindly fuck off.
 
/Flibble
 
--
"Snakes didn't evolve, instead talking snakes with legs changed into snakes." - Rick C. Hodgin
 
"You won't burn in hell. But be nice anyway." – Ricky Gervais
 
"I see Atheists are fighting and killing each other again, over who doesn't believe in any God the most. Oh, no..wait.. that never happens." – Ricky Gervais
 
"Suppose it's all true, and you walk up to the pearly gates, and are confronted by God," Byrne asked on his show The Meaning of Life. "What will Stephen Fry say to him, her, or it?"
"I'd say, bone cancer in children? What's that about?" Fry replied.
"How dare you? How dare you create a world to which there is such misery that is not our fault. It's not right, it's utterly, utterly evil."
"Why should I respect a capricious, mean-minded, stupid God who creates a world that is so full of injustice and pain. That's what I would say."
Soviet_Mario <SovietMario@CCCP.MIR>: Jan 04 05:58PM +0100

On 03/01/20 20:12, Öö Tiib wrote:
 
> The <algorithm> also you need to compile in compiler set to C++17.
> That perhaps means adding
 
> CONFIG += c++17
 
tnx to you both
 
I'll have to check whether the installed GCC supports such a recent
standard.
But I guess some suitable version of <algorithm> also exists in
less recent versions, or at least I hope so.
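
One quick way to see which standard a build is actually using is to look at the __cplusplus macro. This is a sketch only: the function names standard_name and current_standard are made up for illustration.

```cpp
#include <string>

// Map a __cplusplus value to a readable name. The thresholds are the
// values published for C++11, C++14 and C++17.
inline std::string standard_name(long value) {
    if (value >= 201703L) return "C++17 or later";
    if (value >= 201402L) return "C++14";
    if (value >= 201103L) return "C++11";
    return "pre-C++11";
}

// Name of the standard the current translation unit is compiled as.
// Note: MSVC reports 199711L unless /Zc:__cplusplus is passed.
inline std::string current_standard() {
    return standard_name(__cplusplus);
}
```

Printing current_standard() from a small test program built with the same project settings shows whether the CONFIG line took effect.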
 
 
 
--
1) Resistere, resistere, resistere.
2) Se tutti pagano le tasse, le tasse le pagano tutti
Soviet_Mario - (aka Gatto_Vizzato)
Bonita Montero <Bonita.Montero@gmail.com>: Jan 04 02:22PM +0100

> I suspect the results will be highly dependent on details, like the
> exact chip you are using, and where you draw the line between "small
> blocks" and "big blocks".
Here's a little benchmark that compares rep movsq with avx-copying
(without loop-unrolling!):
 
C++-Code:
 
#include <Windows.h>
#include <iostream>
#include <cstring>
#include <cstdint>
#include <chrono>
#include <intrin.h>

using namespace std;
using namespace chrono;

extern "C" void fAvx( __m256 *src, __m256 *dst, size_t size, size_t repts );
extern "C" void fMovs( __m256 *src, __m256 *dst, size_t size, size_t repts );

int main()
{
    size_t const PAGE   = 4096,
                 ROUNDS = 100'000;
    char *pPage = (char *)VirtualAlloc( nullptr, 2 * PAGE,
                                        MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE );
    __m256 *src = (__m256 *)pPage,
           *dst = (__m256 *)(pPage + PAGE);
    memset( pPage, 0, 2 * PAGE );
    using timestamp = time_point<high_resolution_clock>;
    for( size_t size = 1; size <= (PAGE / 32); ++size )
    {
        timestamp start = high_resolution_clock::now();
        fAvx( src, dst, size, ROUNDS );
        uint64_t avxNs = (uint64_t)duration_cast<nanoseconds>(
                             high_resolution_clock::now() - start ).count();
        start = high_resolution_clock::now();
        fMovs( src, dst, size, ROUNDS );
        uint64_t movsNs = (uint64_t)duration_cast<nanoseconds>(
                              high_resolution_clock::now() - start ).count();
        cout << "size: " << size << "\tavx:\t" << avxNs / 1.0E6
             << "\tmovs\t" << movsNs / 1.0E6 << endl;
    }
}
 
Asm-Code:
 
_TEXT SEGMENT
 
; void fAvx( __m256 *src, __m256 *dst, size_t count, size_t repts );
; rcx: src
; rdx: dst
; r8: count
; r9: repts
fAvx PROC
test r9, r9
jz zero
test r8, r8
jz zero
mov rax, r8
shl rax, 5
add rax, rcx
sub rdx, rcx
mov r10, rcx
mov r11, rdx
jmp avxLoop
reptLoop:
mov rcx, r10
mov rdx, r11
avxLoop:
vmovups ymm0, [rcx]
vmovups [rcx+rdx], ymm0
add rcx, 32
cmp rcx, rax
jne avxLoop
dec r9
jnz reptLoop
zero:
ret
fAvx ENDP
 
; void fMovs( __m256 *src, __m256 *dst, size_t count, size_t repts );
; rcx: src
; rdx: dst
; r8: count
; r9: repts
fMovs PROC
test r9, r9
jz zero
push rsi
push rdi
mov r10, rcx
mov r11, rdx
lea rdx, [r8 * 4]
reptLoop:
mov rsi, r10
mov rdi, r11
mov rcx, rdx
rep movsq
dec r9
jnz reptLoop
pop rdi
pop rsi
zero:
ret
fMovs ENDP
 
_TEXT ENDS
END
 
That's the relative speedup of AVX over rep movsq:
 
size: 1 1383,79%
size: 2 737,12%
size: 3 433,35%
size: 4 342,41%
size: 5 283,20%
size: 6 431,57%
size: 7 351,47%
size: 8 340,53%
size: 9 314,24%
size: 10 325,57%
size: 11 270,96%
size: 12 327,83%
size: 13 296,13%
size: 14 275,73%
size: 15 284,19%
size: 16 317,27%
size: 17 331,54%
size: 18 266,05%
size: 19 287,00%
size: 20 281,83%
size: 21 276,17%
size: 22 261,85%
size: 23 263,01%
size: 24 251,48%
size: 25 247,98%
size: 26 237,64%
size: 27 239,66%
size: 28 187,04%
size: 29 185,92%
size: 30 189,09%
size: 31 168,90%
size: 32 179,31%
size: 33 220,31%
size: 34 192,71%
size: 35 207,33%
size: 36 214,69%
size: 37 156,90%
size: 38 169,47%
size: 39 184,87%
size: 40 159,98%
size: 41 175,79%
size: 42 156,60%
size: 43 162,29%
size: 44 155,36%
size: 45 158,09%
size: 46 164,42%
size: 47 154,88%
size: 48 164,17%
size: 49 155,84%
size: 50 157,59%
size: 51 148,29%
size: 52 152,67%
size: 53 139,59%
size: 54 149,78%
size: 55 140,99%
size: 56 146,94%
size: 57 142,01%
size: 58 148,15%
size: 59 141,62%
size: 60 152,89%
size: 61 152,00%
size: 62 149,20%
size: 63 150,13%
size: 64 150,45%
size: 65 140,96%
size: 66 132,11%
size: 67 142,80%
size: 68 135,96%
size: 69 146,18%
size: 70 140,17%
size: 71 139,63%
size: 72 139,22%
size: 73 131,02%
size: 74 145,43%
size: 75 138,23%
size: 76 132,02%
size: 77 142,05%
size: 78 135,97%
size: 79 136,52%
size: 80 138,93%
size: 81 136,06%
size: 82 138,59%
size: 83 139,08%
size: 84 134,50%
size: 85 136,64%
size: 86 134,28%
size: 87 133,35%
size: 88 129,82%
size: 89 138,07%
size: 90 132,57%
size: 91 125,16%
size: 92 138,73%
size: 93 135,70%
size: 94 131,55%
size: 95 126,62%
size: 96 134,87%
size: 97 130,83%
size: 98 129,21%
size: 99 126,70%
size: 100 133,07%
size: 101 129,39%
size: 102 129,12%
size: 103 125,27%
size: 104 124,14%
size: 105 131,78%
size: 106 132,87%
size: 107 131,40%
size: 108 128,29%
size: 109 122,95%
size: 110 121,13%
size: 111 121,73%
size: 112 126,26%
size: 113 130,87%
size: 114 131,31%
size: 115 124,70%
size: 116 119,53%
size: 117 121,42%
size: 118 120,34%
size: 119 125,65%
size: 120 124,95%
size: 121 130,36%
size: 122 128,35%
size: 123 128,25%
size: 124 127,47%
size: 125 124,28%
size: 126 124,14%
size: 127 122,69%
size: 128 122,76%
 
So movsq is never faster.
Here's the result graphically: https://app.unsee.cc/#45f34f42
So it's also exactly the opposite of what Melzzz said: movsq becomes
more competitive as the block size rises.
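
The timing step in the driver above can be factored into a small portable helper. This is a sketch under assumptions: the name time_ns is hypothetical, and std::chrono::steady_clock is swapped in for high_resolution_clock because it is guaranteed to be monotonic.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

// Run `work` `rounds` times and return the total elapsed time in
// nanoseconds. An optimizer may remove work whose result is never used,
// so real measurements need to keep that result observable.
template <typename F>
std::uint64_t time_ns(F &&work, std::size_t rounds) {
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < rounds; ++i)
        work();
    const std::chrono::steady_clock::time_point stop =
        std::chrono::steady_clock::now();
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
}
```

A call such as time_ns([&] { std::memcpy(dst, src, bytes); }, 100000), with dst, src and bytes supplied by the caller, would time a plain memcpy as a third data point alongside the two assembly routines.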
boltar@nowhere.org: Jan 04 12:43PM

On Fri, 03 Jan 2020 10:09:04 -0600
>done is not really relevant), but has explicitly decided *not* to
>support C99 and later. That's changed a smidgen, of late, and will
>VLAs no longer being mandatory,
 
They're not? Figures, aside from variadic macros they're the only thing in
C99 that I found useful.
boltar@nowhere.org: Jan 04 12:45PM

On Fri, 3 Jan 2020 17:22:08 +0100
 
>Actually, it is.
 
>Section 6.5.6p5 of the C standard says "The result of the binary +
>operator is the sum of the operands." There will be an equivalent
 
Ok, you got me there. I genuinely didn't expect the bleeding obvious to be
included in the standard but then I have better things to do with my time
than read it.
 
>definition in the C++ standard if you choose to look for it.
 
>At what point will you realise you'll benefit more by trying to learn
>from other people, rather than continually making a fool of yourself?
 
It's such fun winding you all up :)