Tuesday, January 31, 2017

Digest for comp.lang.c++@googlegroups.com - 9 updates in 4 topics

woodbrian77@gmail.com: Jan 30 05:14PM -0800

> needs support for such long strings? Does the standard
> require that? I could live with 4 billion as a limit for the
> length of strings.
 
When string_view is 16 bytes it's probably a good idea
to use (lvalue) references with it. If it were 12 bytes,
taking it by value would be more palatable. And if I
understand correctly, taking it by rvalue reference
would be the same as taking it by value.
 
Are others interested in a 12 byte string_view?
 

Brian
Ebenezer Enterprises - "If I give to a needy soul, but don't
have love then who is poor?" for KING & COUNTRY
 
https://duckduckgo.com/?q=%22proof+of+your+love%22+king+and+country&t=ffsb&ia=videos&iax=1&iai=b-2dKOfbC9c
 
http://webEbenezer.net
"Öö Tiib" <ootiib@hot.ee>: Jan 31 12:52AM -0800

> understand correctly, taking it by rvalue reference
> would be the same as taking it by value.
 
> Are others interested in a 12 byte string_view?
 
We go back to such first grade basics now?
 
Imagine that we would only be content with up to 255 byte
view then 1 byte size would be fine? So sizeof string_view
would be 9? Wrong!
 
#include <cstdint>
#include <iostream>
 
struct nah {void* ptr; std::uint8_t size;};

int main()
{
    std::cout << "Size is still " << sizeof (nah)
              << " bytes, Brian.\n";
}
 
What does it print?
Why is that so? ;-)
scott@slp53.sl.home (Scott Lurndal): Jan 31 02:06PM


>> It seems though that putting the pointer first might help
>> in terms of preventing some padding in the type if the pointer
>> is 8 bytes and the length member is 4.
 
On Linux systems, size_t is the same size as a pointer
(4 bytes on ia32, 8 bytes on x86_64). size_t must be a
type large enough to represent the entire physical address
space.
 
Can't speak to Windows, but I'd find it unusual for size_t to be
4 bytes on any 64-bit system.
 
 
>> length of strings.
 
>When string_view is 16 bytes it's probably a good idea
>to use (lvalue) references with it.
 
Not necessarily; a processor-specific ABI may pass it
in a 128-bit SIMD register when passed by value, or it may
pass a 128-bit value in two 64-bit integer registers, e.g.
as required by the Intel x86_64 psABI:
 
The classification of aggregate (structures and arrays) and union types works
as follows:
1. If the size of an object is larger than two eightbytes, or it contains unaligned
fields, it has class MEMORY.
2. If a C++ object has either a non-trivial copy constructor or a non-trivial
destructor, it is passed by invisible reference (the object is replaced in the
parameter list by a pointer that has class INTEGER).
3. If the size of the aggregate exceeds a single eightbyte, each is classified
separately. Each eightbyte gets initialized to class NO_CLASS.
4. Each field of an object is classified recursively so that always two fields are
considered. The resulting class is calculated according to the classes of the
fields in the eightbyte:
(a) If both classes are equal, this is the resulting class.
(b) If one of the classes is NO_CLASS, the resulting class is the other
class.
(c) If one of the classes is MEMORY, the result is the MEMORY class.
(d) If one of the classes is INTEGER, the result is the INTEGER.
(e) If one of the classes is X87, X87UP, COMPLEX_X87 class, MEMORY
is used as class.
(f) Otherwise class SSE is used.
5. Then a post merger cleanup is done:
(a) If one of the classes is MEMORY, the whole argument is passed in
memory.
(b) If SSEUP is not preceded by SSE, it is converted to SSE.
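
As a rough illustration of rules 1 and 2 above, the following sketch checks only the two preconditions that are visible from C++: the object is no larger than two eightbytes, and its copy constructor and destructor are trivial. The names view16, owning16 and register_candidate are hypothetical, and passing the check does not prove that a given compiler actually uses registers; that still depends on the remaining classification rules and on the target ABI.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a pointer + length view (not std::string_view itself).
struct view16 {
    const char* data;
    std::size_t size;
};

// Same layout, but the user-provided destructor makes it non-trivial (rule 2).
struct owning16 {
    char* data;
    std::size_t size;
    ~owning16() { delete[] data; }
};

// Hypothetical trait: true when T meets the preconditions of rules 1 and 2.
// It does not implement the per-eightbyte classification of rules 3 to 5.
template <class T>
struct register_candidate
    : std::integral_constant<bool,
          sizeof(T) <= 16 &&                               // at most two eightbytes
          std::is_trivially_copy_constructible<T>::value && // trivial copy constructor
          std::is_trivially_destructible<T>::value> {};     // trivial destructor

static_assert(register_candidate<view16>::value,
              "a trivially copyable pointer + length pair is a candidate");
static_assert(!register_candidate<owning16>::value,
              "a non-trivial destructor forces passing by invisible reference");
```

To see what a particular compiler really does, inspecting the generated assembly for a function taking view16 by value is more reliable than any trait.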
 
 
 
 
>understand correctly, taking it by rvalue reference
>would be the same as taking it by value.
 
>Are others interested in a 12 byte string_view?
 
Nyet.
woodbrian77@gmail.com: Jan 31 08:43AM -0800

On Tuesday, January 31, 2017 at 2:52:15 AM UTC-6, Öö Tiib wrote:
> << " bytes, Brian.\n";
> }
 
> What does it print?
 
16.
 
> Why is that so? ;-)
 
I think it's due to alignment and arrays. In some cases, a reordering
of the data members can help reduce the size of a class, but not in
this case. Thank you for your reply.
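
A minimal sketch of a case where reordering does help, assuming the usual natural alignment of pointers. The struct names scattered and grouped are hypothetical and not from the thread.

```cpp
#include <cstdint>

// One small member on each side of the pointer: padding is needed twice.
struct scattered {
    std::uint8_t a;
    void*        p;
    std::uint8_t b;
};

// Same members, small ones grouped after the pointer: padding is needed once.
struct grouped {
    void*        p;
    std::uint8_t a;
    std::uint8_t b;
};

static_assert(sizeof(grouped) < sizeof(scattered),
              "grouping the small members removes one run of padding");
```

With only one pointer and one uint8_t, as in the struct nah above, there is just one run of padding whichever member comes first, so reordering cannot shrink it.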
 
 
Brian
Ebenezer Enterprises - In G-d we trust.
http://webEbenezer.net
scott@slp53.sl.home (Scott Lurndal): Jan 31 05:25PM


>I think it's due to alignment and arrays. In some cases, a reordering
>of the data members can help reduce the size of a class, but not in
>this case. Thank you for your reply.
 
It is so because the compiler is required to align members of structure (MoS)
on natural boundaries. The natural boundary for size_t and any pointer will
be 8-bytes (since both are 8-byte quantities) on modern 64-bit architectures.
 
If the uint8_t precedes the pointer, then the compiler will need to allocate
7 filler bytes before the pointer. If the uint8_t follows the pointer, the
filler bytes will be added because the structure has a minimum alignment
derived from the largest minimum alignment of any MoS, which in this case is
again eight bytes (consider, for example, an array of this structure - for
the pointer to be aligned correctly in all elements of the array, each element
of the array must be a multiple of 8-bytes in size).
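
The two cases described above can be checked at compile time. This is a sketch under the assumption that pointers are naturally aligned (alignof(void*) == sizeof(void*)), which holds on common platforms but is not guaranteed by the standard; ptr_first and size_first are hypothetical names.

```cpp
#include <cstddef>  // offsetof
#include <cstdint>  // std::uint8_t

struct ptr_first  { void* ptr; std::uint8_t size; };  // filler goes after size
struct size_first { std::uint8_t size; void* ptr; };  // filler goes before ptr

// The structure takes the strictest alignment of any of its members.
static_assert(alignof(ptr_first)  == alignof(void*), "alignment follows the pointer");
static_assert(alignof(size_first) == alignof(void*), "alignment follows the pointer");

// uint8_t first: the pointer is pushed out to its own alignment boundary.
static_assert(offsetof(size_first, ptr) == alignof(void*), "filler precedes ptr");

// Pointer first: size sits right after it, and tail filler rounds sizeof up
// so that every element of an array keeps its pointer aligned.
static_assert(offsetof(ptr_first, size) == sizeof(void*), "size follows ptr directly");
static_assert(sizeof(ptr_first) % alignof(ptr_first) == 0, "array stride stays aligned");

// Either way the total size is the same.
static_assert(sizeof(ptr_first) == sizeof(size_first), "member order does not matter here");
```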
 
You can specify that the structure should be "packed" using implementation-
defined mechanisms (__attribute__((packed)) in gcc, #pragma pack in other
compilers) if you really don't want padding between the fields and are
prepared to take the substantial performance hit from accessing unaligned
data: it requires trapping and fixup on some architectures that don't
allow direct access to unaligned data, and causes substantial pipeline
bubbles on architectures that do support access to unaligned data.
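
A minimal sketch of packing, using #pragma pack(push, 1), which GCC, Clang and MSVC all accept. The names packed_nah and read_ptr are hypothetical.

```cpp
#include <cstdint>

#pragma pack(push, 1)   // implementation-defined: no padding between members
struct packed_nah {
    std::uint8_t size;
    void*        ptr;   // may now sit at a misaligned address
};
#pragma pack(pop)       // restore the default packing for later declarations

static_assert(sizeof(packed_nah) == sizeof(void*) + 1, "no filler bytes remain");
static_assert(alignof(packed_nah) == 1, "the whole structure is byte-aligned");

// Reading the member through the struct lets the compiler emit whatever
// unaligned-safe access the target needs.
inline void* read_ptr(const packed_nah& p) {
    return p.ptr;
}
```

Forming a plain void** that points at the packed member and dereferencing it elsewhere would lose the information that the address may be misaligned, so it is safer to always go through the struct.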
 
The Linux tool 'pahole' will extract the structure definition from the
DWARF data in the ELF executable and show where the holes are and how
large they are.
 
e.g.:
 
struct tm {
        int           tm_sec;      /*  0   4 */
        int           tm_min;      /*  4   4 */
        int           tm_hour;     /*  8   4 */
        int           tm_mday;     /* 12   4 */
        int           tm_mon;      /* 16   4 */
        int           tm_year;     /* 20   4 */
        int           tm_wday;     /* 24   4 */
        int           tm_yday;     /* 28   4 */
        int           tm_isdst;    /* 32   4 */
 
        /* XXX 4 bytes hole, try to pack */
 
        long int      tm_gmtoff;   /* 40   8 */
        const char *  tm_zone;     /* 48   8 */
 
        /* size: 56, cachelines: 1, members: 11 */
        /* sum members: 52, holes: 1, sum holes: 4 */
        /* last cacheline: 56 bytes */
};
"Chris M. Thomasson" <invalid@invalid.invalid>: Jan 31 01:53PM -0800

On 1/31/2017 9:25 AM, Scott Lurndal wrote:
 
> It is so because the compiler is required to align members of structure (MoS)
> on natural boundaries. The natural boundary for size_t and any pointer will
> be 8-bytes (since both are 8-byte quantities) on modern 64-bit architectures.
 
Well, just to be safe: alignof(size_t)?
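
One way to make that check explicit is sketched below; the constant names are hypothetical. Whether they come out true is a property of the target platform, not something the standard promises.

```cpp
#include <cstddef>

// True when the type's alignment equals its size ("natural" alignment).
constexpr bool size_t_is_naturally_aligned =
    alignof(std::size_t) == sizeof(std::size_t);
constexpr bool pointer_is_naturally_aligned =
    alignof(void*) == sizeof(void*);

// What the standard does guarantee: size is a multiple of alignment.
static_assert(sizeof(std::size_t) % alignof(std::size_t) == 0,
              "sizeof is always a multiple of alignof");
static_assert(sizeof(void*) % alignof(void*) == 0,
              "sizeof is always a multiple of alignof");
```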
 
 
 
 
Juha Nieminen <nospam@thanks.invalid>: Jan 31 03:30PM

> delete [] answers; // <------- (1)
> delete answers; (2)
> }
 
This demonstrates why you should really avoid doing your own allocations,
if you can, and instead use the standard containers, if they suffice for
the task. std::vector not only makes your class much simpler to
implement, but also safer.
 
If you really, really need to do your own allocation, because the
standard containers just don't do what you need, then you have to
either disable or implement the copy constructor and assignment
operator of that class. Else you have a big problem leading to
multiple deallocations and accessing deallocated memory.
 
Disabling the copy constructor and assignment operator is the
easiest solution. But if your class really, really needs them enabled,
then it becomes quite a laborious task to implement them.
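
A sketch of the two options, since the original class is only quoted in part above. The class names answers_vec and answers_raw and their members are hypothetical, and the second class shows only the "disable copying" route, not a full copy implementation.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Option 1: let std::vector own the storage. The compiler-generated copy
// constructor, assignment operator and destructor are all correct.
class answers_vec {
public:
    explicit answers_vec(std::size_t n) : answers_(n) {}
    std::size_t size() const { return answers_.size(); }
private:
    std::vector<int> answers_;
};

// Option 2: own a raw array, and disable copying so two objects can never
// end up deleting the same pointer.
class answers_raw {
public:
    explicit answers_raw(std::size_t n) : answers_(new int[n]()), size_(n) {}
    ~answers_raw() { delete[] answers_; }  // array form matches new[]
    answers_raw(const answers_raw&) = delete;
    answers_raw& operator=(const answers_raw&) = delete;
    std::size_t size() const { return size_; }
private:
    int*        answers_;
    std::size_t size_;
};

static_assert(std::is_copy_constructible<answers_vec>::value,
              "the vector-based class copies safely by default");
static_assert(!std::is_copy_constructible<answers_raw>::value,
              "copying the raw-pointer class is a compile-time error");
```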
ram@zedat.fu-berlin.de (Stefan Ram): Jan 31 02:46PM

>Note that the original question was about the case where the character
>type is biggish, not `char` but perhaps `wchar_t` or `char32_t`.
 
I changed my code to this:
 
using tchar = char32_t;
using tstring = ::std::basic_string< tchar >;
 
static tchar first_unique_char_in( tstring const & s )
{ ::std::unordered_multiset< tchar >const chars( begin( s ), end( s ));
for( tchar const ch : s )
if( chars.count( ch ) == 1 )return ch;
return tchar{ 0 }; }
 
static tchar first_unique_char_of( tstring const & s )
{ auto const z = s.size();
auto i = z; for( i = 0; i < z; ++i )
{ tchar const ch = s[ i ]; bool found = false;
auto j = z; for( j = 0; j < z; ++j )
if( s[ j ]== ch && i != j ){ found = true; break; }
if( !found )return ch; }
return tchar{ 0 }; }
 
While the program runs, it blocks my computer, so I cannot
afford to let it run for hours. The maximum feasible string
length for my tests is therefore about 200'000. It gives
 
dti = 3317031700
dtf = 1800200
 
That means, in this case, »first_unique_char_of« is more
than 1000 times faster. (Unless I made an error, so feel
free to reproduce this.)
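
For anyone who wants to reproduce this, here is a self-contained sketch. The two functions restate the ones posted above in a more conventional layout. The timing helper elapsed_ns and the check check_agree are hypothetical additions, since the original measurement code and test input were not posted, so the numbers above cannot be expected to match.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_set>

using tchar   = char32_t;
using tstring = std::basic_string<tchar>;

// Hash-based variant: build a multiset, then look up each character's count.
tchar first_unique_char_in(tstring const& s) {
    std::unordered_multiset<tchar> const chars(s.begin(), s.end());
    for (tchar const ch : s)
        if (chars.count(ch) == 1) return ch;
    return tchar{0};
}

// Nested-loop variant: for each character, scan for another occurrence.
tchar first_unique_char_of(tstring const& s) {
    std::size_t const z = s.size();
    for (std::size_t i = 0; i < z; ++i) {
        bool found = false;
        for (std::size_t j = 0; j < z; ++j)
            if (i != j && s[j] == s[i]) { found = true; break; }
        if (!found) return s[i];
    }
    return tchar{0};
}

// Hypothetical helper: both variants must agree before timing means anything.
inline void check_agree(tstring const& s) {
    assert(first_unique_char_in(s) == first_unique_char_of(s));
}

// Hypothetical helper: wall-clock time of one call in nanoseconds. The result
// is written to an out-parameter so the call cannot be optimized away.
template <class F>
long long elapsed_ns(F f, tstring const& s, tchar& result) {
    auto const t0 = std::chrono::steady_clock::now();
    result = f(s);
    auto const t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}
```

The ratio between the two timings depends heavily on the input, for example on how early the first unique character appears and how many distinct characters the string contains, so it is worth trying several kinds of strings.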
Jeff-Relf.Me <@.>: Jan 31 04:04AM -0800
