soft and program: Digest for comp.lang.c++@googlegroups.com

comp.lang.c++@googlegroups.com


"Need for Speed - C++ versus Assembly Language" - 15 Updates
Error message when defining a static data member - 4 Updates
Finding an objects location in an array - 3 Updates
Error message when defining a static data member - 3 Updates

"Need for Speed - C++ versus Assembly Language"

Gareth Owen <gwowen@gmail.com>: May 10 07:08AM +0100

> You and Chris Vine have been bullying me for years. Insult after
> insult; I am used to your stuff.

You take every disagreement as a personal insult.
Get over yourself.

> No one can insult me, unless I give some importance to their words.

Right back at you, fella.

jacobnavia <jacob@jacob.remcomp.fr>: May 10 09:15AM +0200

On 09/05/2017 at 23:06, Ian Collins wrote:
> saying that the hand rolled code will be faster on the processor (model,
> not family) it was written for, but might not be faster on next year's
> model.

That can be the case. Intel is a good example here: in some models,
for instance, shifts became more expensive than multiplies.

But this applies to compiled code also.

What I am speaking about is this:

    int i;

    for (i = 0; i < 32; i++) {
        if (data & (1u << i))
            break;
    }

This searches for the rightmost (least significant) set bit in "data".

The WHOLE loop can be replaced by a single instruction (either bsf or
bsr, I do not remember right now).

The point is, a human UNDERSTANDS what the machine is doing, and can
optimize things that no compiler is now able to recognize.
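For comparison, here is a sketch that puts the loop above next to a compiler builtin. It assumes GCC or Clang. On x86, the instruction that finds the lowest set bit is BSF (bit scan forward), while BSR scans from the most significant end. The function names are illustrative. Because __builtin_ctz(0) is undefined, zero is handled explicitly.

```cpp
#include <cstdint>

// The loop from the post, wrapped in a function: returns the index of the
// lowest (rightmost) set bit, or 32 if no bit is set.
inline int lowest_set_bit_loop(std::uint32_t data) {
    int i;
    for (i = 0; i < 32; i++) {
        if (data & (1u << i))
            break;
    }
    return i;
}

// Same result via a GCC/Clang builtin, which typically compiles to a single
// BSF or TZCNT on x86. __builtin_ctz(0) is undefined, so zero is handled first.
inline int lowest_set_bit_builtin(std::uint32_t data) {
    if (data == 0)
        return 32;
    return __builtin_ctz(data);
}
```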

David Brown <david.brown@hesbynett.no>: May 10 11:26AM +0200

On 09/05/17 17:53, bitrex wrote:
> going to be more expensive to be reading and writing this particular
> variable that you need to be updating every interrupt cycle in and out
> of SRAM than just leaving it in place.

There is a middle ground here, between C/C++ and assembler - the use of
compiler and target extensions. Someone mentioned that a C compiler
might not do as good a job as an assembler programmer on SIMD
vectorisation, because it does not know about the alignments - thus you
have compiler extensions such as gcc's __attribute__((aligned)) to give
the compiler that information.
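Here is a sketch of passing alignment information to the compiler. It assumes gcc or clang, and the buffer names and sizes are illustrative. Whether the loop is actually vectorised depends on the optimisation level and target flags. The sketch only shows how the information is passed; it does not guarantee SIMD code.

```cpp
#include <cstddef>

// gcc's alignment attribute: each array starts on a 32-byte boundary.
// (alignas(32) is the standard C++11 spelling of the same request.)
float a_buf[256] __attribute__((aligned(32)));
float b_buf[256] __attribute__((aligned(32)));
float out_buf[256] __attribute__((aligned(32)));

// Inside a function taking pointers, the alignment is no longer visible,
// so __builtin_assume_aligned restates it. Passing pointers that are not
// 32-byte aligned would be undefined behaviour.
void add_arrays(float* dst, const float* x, const float* y, std::size_t n) {
    float* d = static_cast<float*>(__builtin_assume_aligned(dst, 32));
    const float* p = static_cast<const float*>(__builtin_assume_aligned(x, 32));
    const float* q = static_cast<const float*>(__builtin_assume_aligned(y, 32));
    for (std::size_t i = 0; i < n; ++i)
        d[i] = p[i] + q[i];
}
```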

In this case, you want a global variable to remain in processor
registers - you can do that with a gcc extension:

register uint8_t glob asm("r8");

(I can't remember the syntax for an asm register variable that uses
multiple registers.)

And while the AVR has quite a number of GPRs, they get used up quickly
because they are all 8 bit - reserving 4 for a global variable is likely
to affect code quality somewhat.

David Brown <david.brown@hesbynett.no>: May 10 11:37AM +0200

On 10/05/17 09:15, jacobnavia wrote:
> bsr, I do not remember right now).

> The point is, a human UNDERSTANDS what the machine is doing, and can
> optimize things that no compiler is now able to recognize.

Human understanding of compiler manuals is also useful when you want
optimal code. For gcc, this is just "__builtin_ffs(data)". Many other
compilers will have similar extensions or intrinsics. You might need to
make a macro that is wrapped in conditional compilation depending on the
compiler (with your standard C code above as a fall-back), but it is
still much more portable than assembly - and will give more
opportunities for compiler optimisation.
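Here is a minimal sketch of the conditional-compilation wrapper described above. The name find_first_set is hypothetical. Testing __GNUC__ is an assumption about how to detect a compiler that provides __builtin_ffs (clang defines it too). Note that ffs counts from 1, so its result is one more than the loop index. For zero it returns 0 rather than the loop's 32.

```cpp
#include <cstdint>

// Portable fall-back with __builtin_ffs semantics: 1 + index of the lowest
// set bit, or 0 if data is zero.
inline int find_first_set_portable(std::uint32_t data) {
    for (int i = 0; i < 32; i++) {
        if (data & (1u << i))
            return i + 1;
    }
    return 0;
}

inline int find_first_set(std::uint32_t data) {
#if defined(__GNUC__)
    // gcc and clang: __builtin_ffs takes an int, so convert explicitly
    return __builtin_ffs(static_cast<int>(data));
#else
    return find_first_set_portable(data);
#endif
}
```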

jacobnavia <jacob@jacob.remcomp.fr>: May 10 12:41PM +0200

On 10/05/2017 at 11:37, David Brown wrote:
> compiler (with your standard C code above as a fall-back), but it is
> still much more portable than assembly - and will give more
> opportunities for compiler optimisation.

Yes, gcc is the best compiler in the universe, David, I know that.

Now, that was an EXAMPLE of course.

But this is moot. Do not use assembly; it is better that you stick to C++.

David Brown <david.brown@hesbynett.no>: May 10 02:14PM +0200

On 10/05/17 12:41, jacobnavia wrote:
>> still much more portable than assembly - and will give more
>> opportunities for compiler optimisation.

> Yes, gcc is the best compiler in the universe, David, I know that.

If you think that, that's fine. If you prefer to read what I wrote,
rather than making snide remarks, you will see that I gave that as an
example - because it is an example that is easily tested and verified,
and easy for you to look up the manual. I /could/ have picked
CodeWarrior 10.1 for the PowerPC as an example which has something
similar - but that would involve me looking up the details, and you
would not be able to check them. I am fairly confident that MSVC,
clang, Intel icc, and various other compilers have similar features -
which is why I wrote exactly that.

> Now, that was an EXAMPLE of course.

Of course it was. And I showed an example of how an understanding of
your tools can mean you might not need assembly for that kind of
purpose. There will be many other examples where you might at first
think you'd need to write hand-coded assembly for efficiency, but
compilers can generate as good or better code. There will be a few
examples where the hand-written assembly really is significantly better
than even the best compilers can generate, even with compiler-specific
extensions - but such examples are getting fewer and more obscure as
compilers get better.

> But this is moot. Do not use assembly; it is better that you stick to C++.

It almost always is better to stick to C or C++. It is very rare that
using assembly makes sense. There are few cases where there is a
significant speedup - and in many cases, it may look like the assembly
code is faster when in fact it is not. Making assembly code that is
faster on a wide variety of targets, rather than just one particular
model of cpu, is particularly hard. Making such code in a way that
interacts efficiently with surrounding code is also a problem - the
hand-written assembly may be faster in isolation, but in combination with
other code in C or C++, the total result is slower.

One of the few situations where assembly can be faster is precisely your
example - when the cpu supports an instruction that is difficult to
express in plain C, or difficult for a compiler to identify in plain C
(let's forget about builtins and intrinsics for the moment). In that
case, you might well want to make a static inline function that wraps
the assembly instruction. You want the assembly involved to be minimal.
So for example, I have these definitions for some ARM code:

#include <stdint.h>

static inline uint16_t swapEndian16(uint16_t x) {
    uint16_t y;
    asm ("rev16 %[y], %[x]" : [y] "=r" (y) : [x] "r" (x) : );
    return y;
}

static inline uint32_t swapEndian32(uint32_t x) {
    uint32_t y;
    asm ("rev %[y], %[x]" : [y] "=r" (y) : [x] "r" (x) : );
    return y;
}

(If it makes you feel better, pretend it is for the CodeWarrior ARM
compiler, not gcc - that compiler supports the same syntax for inline
assembly.)

These minimal assembly wrappers let me take advantage of the best
assembly instructions for the job, while allowing the compiler to
generate the rest of the code.
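This variation on the same idea also builds on non-ARM hosts. It keeps the single-instruction inline assembly for 32-bit ARM (ARMv6 or later, which has REV and REV16) and falls back to the gcc/clang builtins __builtin_bswap16 and __builtin_bswap32 elsewhere. The preprocessor test is an assumption about how to select the target, and the asm path is only compiled on ARM.

```cpp
#include <cstdint>

// 32-bit ARM with a gcc-compatible compiler: minimal inline assembly.
// Other targets: gcc/clang builtins, which usually map to the target's
// own byte-swap instruction (e.g. bswap on x86).
static inline std::uint16_t swapEndian16(std::uint16_t x) {
#if defined(__arm__) && defined(__GNUC__)
    std::uint16_t y;
    asm ("rev16 %[y], %[x]" : [y] "=r" (y) : [x] "r" (x));
    return y;
#else
    return __builtin_bswap16(x);
#endif
}

static inline std::uint32_t swapEndian32(std::uint32_t x) {
#if defined(__arm__) && defined(__GNUC__)
    std::uint32_t y;
    asm ("rev %[y], %[x]" : [y] "=r" (y) : [x] "r" (x));
    return y;
#else
    return __builtin_bswap32(x);
#endif
}
```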

scott@slp53.sl.home (Scott Lurndal): May 10 12:42PM

> if (data & (1 << i))
> break;
>}

Or the programmer can use a compiler intrinsic, such as
GCC's __builtin_ffsll (or __builtin_clz for leftmost bit).

e.g.

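#include <stdint.h>

static inline int log2(uint64_t x) {
    // __builtin_clzll(0) is undefined, so handle zero before calling it
    if (x == 0)
        return 0;
    // portable fall-back: int i = 0; while (x >>= 1) { i++; } return i;
    return 63 - __builtin_clzll(x);
}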
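The following sketch checks the clz-based approach against the shift loop in the commented-out line. It assumes gcc or clang, and the floor_log2_* names are hypothetical. The zero test comes first because __builtin_clzll(0) is undefined, so checking the result afterwards cannot make that case safe.

```cpp
#include <cstdint>

// Reference version: count how many times x can be shifted right.
inline int floor_log2_loop(std::uint64_t x) {
    int i = 0;
    while (x >>= 1) { i++; }
    return i;
}

// Builtin version (gcc/clang): index of the highest set bit.
// __builtin_clzll(0) is undefined, so zero is handled before calling it.
inline int floor_log2_clz(std::uint64_t x) {
    if (x == 0) return 0;
    return 63 - __builtin_clzll(x);
}
```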

Bonita Montero <Bonita.Montero@gmail.com>: May 10 02:53PM +0200

> if (data & (1 << i))
> break;
> }

There are intrinsics for this purpose.
And no, this is not assembly.

jacobnavia <jacob@jacob.remcomp.fr>: May 10 03:06PM +0200

On 10/05/2017 at 14:53, Bonita Montero wrote:
>> break;
>> }

> There are intrinsics for this purpose.

Not in all compilers, but anyway, this is an example of a long high
level construct that can be converted to a single instruction.

Byte swapping is also such an example, and there are many others.

> And no, this is not assembly.

In the case of an intrinsic, certainly, it is not assembler. It is a
non-portable construct geared to a single compiler.

jacobnavia <jacob@jacob.remcomp.fr>: May 10 03:10PM +0200

On 10/05/2017 at 14:53, Bonita Montero wrote:
>> }

> There are intrinsics for this purpose.
> And no, this is not assembly.

Yes, there are intrinsics in some compilers for this.

Many other examples are available:

o Carry handling in the four operations.
o Overflow testing
o Interrupts

etc.
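For the carry and overflow items, gcc (since version 5) and clang provide __builtin_add_overflow. It reports whether the exact result fits in the destination type, and with unsigned operands that is the carry out. Below is a minimal sketch with a portable pre-check alongside. The function names are illustrative. How the builtin maps to instructions (for example, a flag test after the add on x86) is up to the compiler.

```cpp
#include <climits>

// Portable check: detect signed overflow before it happens (signed
// overflow itself is undefined behaviour in C and C++).
inline bool add_overflows_portable(int a, int b) {
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}

// gcc/clang builtin: stores the wrapped result in *sum and returns true
// if the exact result did not fit in an int.
inline bool add_overflows_builtin(int a, int b, int* sum) {
    return __builtin_add_overflow(a, b, sum);
}

// With unsigned operands the same builtin reports the carry out.
inline bool add_with_carry(unsigned a, unsigned b, unsigned* sum) {
    return __builtin_add_overflow(a, b, sum);
}
```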

David Brown <david.brown@hesbynett.no>: May 10 03:29PM +0200

On 10/05/17 15:06, jacobnavia wrote:

> Not in all compilers, but anyway, this is an example of a long high
> level construct that can be converted to a single instruction.

> Byte swapping is also such an example, and there are many others.

#include <stdint.h>

uint32_t endianSwap1(uint32_t x) {
    return ((x & 0xff) << 24)
         | ((x & 0xff00) << 8)
         | ((x & 0xff0000) >> 8)
         | ((x & 0xff000000) >> 24);
}

uint32_t endianSwap2(uint32_t x) {
    return __builtin_bswap32(x);
}

gcc turns both of these into a "bswap" instruction. Maybe not all
compilers will do so, but it is certainly possible for a compiler to
recognise such patterns.

>> And no, this is not assembly.

> In the case of an intrinsic, certainly, it is not assembler. It is a
> non-portable construct geared to a single compiler.

Yes, indeed. No one denies that to get optimal code for a target you
will sometimes need target-specific extensions that may not be portable
to other targets, or may not be portable to other compilers. But in
either case, they are still more portable than assembly - it's a
half-way option.

jacobnavia <jacob@jacob.remcomp.fr>: May 10 03:56PM +0200

On 10/05/2017 at 15:29, David Brown wrote:
> they are still more portable than assembly

x86 assembly is fully portable to:

MAC OS X
Windows
Linux

That's almost 100% of the PC market.

scott@slp53.sl.home (Scott Lurndal): May 10 02:30PM

>MAC OS X
>Windows
>Linux

Nonsense - Linux runs on hundreds of processor types.

>That's almost 100% of the PC market.

Which is almost irrelevant now, as the PC market is less
than 10% of the overall computer market.

David Brown <david.brown@hesbynett.no>: May 10 04:30PM +0200

On 10/05/17 15:56, jacobnavia wrote:

> MAC OS X
> Windows
> Linux

No it is not.

x86 assembly code is not directly portable to different assemblers.
Inline assembly in C is a little better - if you use gcc's format, it is
portable to gcc, icc, clang, and perhaps other compilers.

x86 code is either 32-bit or 64-bit, and not directly portable from one
to the other - even though much of it is the same, there are usually
still changes to be made.

x86 assembly code will work on a range of x86 chips if you stick to a
common subset - but if you are trying to write optimal code (and if you
are not, why are you bothering with assembly?) then you need to
fine-tune it for all sorts of different x86 devices. On one cpu, MMX
instructions might be faster - on another, SSE. On one device,
unrolling a loop might be faster but on a different device, instruction
prefetch buffers may mean the loop format is faster.

> That's almost 100% of the PC market.

That is a rapidly declining share of the computing world, and one in
which the small speed optimisations available with assembly are of
declining relevance. x86 assembly is useful for compiler writers,
low-level support libraries (clearly it is useful to /you/), low-level
systems code which C cannot handle (such as working with interrupts),
and occasional libraries where it is worth making an enormous effort for
tiny speed differences. For almost all people programming for x86
systems, if you are using assembly for anything except fun, you are
making a mistake.

On other platforms, especially the embedded world, there is more scope
for useful assembly - but only in a tiny fraction of code.

Bonita Montero <Bonita.Montero@gmail.com>: May 10 05:14PM +0200

>> There are intrinsics for this purpose.

> Not in all compilers, ...

In all relevant compilers, i.e. g++, clang, msvc++ and intel-c++.

> Byte swapping is also such an example, and there are many others.

The above compilers cover with intrinsics everything that the C++
language doesn't supply.

> In the case of an intrinsic, certainly, it is not assembler.
> It is a non-portable construct geared to a single compiler.

And assembly is portable?

Error message when defining a static data member

Ian Collins <ian-news@hotmail.com>: May 10 08:41PM +1200

On 05/10/17 08:32 PM, Stefan Ram wrote:
> main.cpp:40:41: error: 'my_class' in 'class std::vector<my_class::listentry>' does not name a type
> ::std::vector< ::my_class::listentry >::my_class::list;
> ^

All those spurious colons make it hard to read, and it looks like they
have confused you as well. Shouldn't that be

std::vector<my_class::listentry> my_class::list;

--
Ian

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: May 10 10:44AM +0200

On 10-May-17 10:32 AM, Stefan Ram wrote:
> main.cpp:40:41: error: 'my_class' in 'class std::vector<my_class::listentry>' does not name a type
> ::std::vector< ::my_class::listentry >::my_class::list;
> ^

`::` does double duty both as scope resolution operator and as name of
the global scope.

You intend the latter but you get the former.

Just omit that `::`. ;-)

Cheers & hth.,

- Alf
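Here is a minimal program, reduced from the code in this thread, that defines the static data member the way Ian suggests. As Alf explains, a "::" that follows a template-id such as std::vector<...> is read as continuing that qualified name, regardless of whitespace. That is why the leading "::" has to go. The list_size helper is hypothetical and exists only so the definition can be exercised.

```cpp
#include <cstddef>
#include <vector>

class my_class {
public:
    struct listentry {
        const my_class* object;
        int major;
        int minor;
    };
    // Declaration only; the definition must appear at namespace scope.
    static std::vector<listentry> list;
};

// Definition of the static data member: the type, then the qualified name.
std::vector<my_class::listentry> my_class::list;

// Hypothetical helper so the definition can be exercised.
std::size_t list_size() { return my_class::list.size(); }
```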

Ian Collins <ian-news@hotmail.com>: May 10 10:30PM +1200

On 05/10/17 09:01 PM, Stefan Ram wrote:

> Now I see. Thank you!

> It seems that I can also use parentheses:

> ::std::vector< ::my_class::listentry >( ::my_class::list );

It's much easier and clearer to omit the superfluous colons.

--
Ian

scott@slp53.sl.home (Scott Lurndal): May 10 12:47PM

> It seems that I can also use parentheses:

>::std::vector< ::my_class::listentry >( ::my_class::list );

Or do as has been suggested and lose the leading "::". I really
hope you don't teach your students that practice, as it will cause
them problems once they hit the real world.

Finding an objects location in an array

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: May 10 04:17AM +0200

> the x,y to the constructor at "new", but perhaps the 2011 or 2014 standards
> have automated this at all? Or is there some STL trick that can achieve
> this using a container?

For a raw array, all that you have to work with is the default
constructor, and all that it knows about the object it's initializing
is its address.

If, however, it has available the start address of the array, a pointer
to the first item in the array, then it can compute its array index.

This means a class that's not usable as anything but an array item. But
maybe it can be done as a templated wrapper for the "real" array item class.
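Here is a sketch of the first idea. For brevity, it passes the array in explicitly rather than going through the templated wrapper Alf mentions, which is an assumption. A built-in two-dimensional array is stored row by row, so an element's offset from the start is row * Cols + col. For myarray[10][20], the row is therefore offset / 20 and the column is offset % 20. The location and locate names are hypothetical.

```cpp
#include <cstddef>

struct location {
    std::size_t row;
    std::size_t col;
};

// Given a T[Rows][Cols] array and a pointer to one of its elements,
// recover the element's row and column from its byte distance to the
// start of the whole array.
template <typename T, std::size_t Rows, std::size_t Cols>
location locate(const T (&array)[Rows][Cols], const T* element) {
    auto base = reinterpret_cast<const unsigned char*>(&array);
    auto here = reinterpret_cast<const unsigned char*>(element);
    std::size_t offset = static_cast<std::size_t>(here - base) / sizeof(T);
    return location{offset / Cols, offset % Cols};
}
```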

* * *

An alternative is to just define a factory function that first creates
the array, and then loops over all items, passing them the item index.
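Alf's factory-function alternative might look like the sketch below. It builds the grid first, then loops over it and tells each element its coordinates. The cell type, the make_grid name and the use of std::array are illustrative assumptions, not the OP's code.

```cpp
#include <array>
#include <cstddef>

struct cell {
    std::size_t row = 0;
    std::size_t col = 0;
    // the element's real data would go here
};

// Create a Rows x Cols grid, then pass each element its location.
template <std::size_t Rows, std::size_t Cols>
std::array<std::array<cell, Cols>, Rows> make_grid() {
    std::array<std::array<cell, Cols>, Rows> grid{};
    for (std::size_t r = 0; r < Rows; ++r)
        for (std::size_t c = 0; c < Cols; ++c) {
            grid[r][c].row = r;
            grid[r][c].col = c;
        }
    return grid;
}
```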

Cheers & hth.,

- Alf

spud@potato.field: May 10 08:29AM

On Wed, 10 May 2017 04:17:59 +0200
>An alternative is to just define a factory function that first creates
>the array, and then loops over all items, passing them the item index.

Sure, I mean writing 2 loops and manually creating the objects and passing
them their location isn't hard, but initialising arrays of objects is so common
I wondered if buried somewhere in the latest standards there was some special
constructor that would be passed the array location of the object.

Since they've thrown so much other rubbish into 2011 and 2014 that barely
anyone will ever use, I'd have thought this would be a useful addition.

--
Spud

spud@potato.field: May 10 11:14AM

On 10 May 2017 09:32:28 GMT
>>} myarray[10][20];
>>Is there a way for each individual object to find its x,y location?

> Here's my quick take at it:

It's certainly an interesting method, but it's not exactly clean and simple, and
having to hard-code the array dimensions into the code rather defeats the
point :) Also, shouldn't "entry->major = i / 20u" be "i / 10u"?

--
Spud

Error message when defining a static data member

ram@zedat.fu-berlin.de (Stefan Ram): May 10 08:32AM

I have this line of code:

using u = ::my_class; ::std::vector< ::my_class::listentry >u::list;

, and it compiles just fine. However,
if I change it to

::std::vector< ::my_class::listentry >::my_class::list;

, I get this error from GCC (IIRC 5.1):

main.cpp: At global scope:
main.cpp:40:41: error: 'my_class' in 'class std::vector<my_class::listentry>' does not name a type
::std::vector< ::my_class::listentry >::my_class::list;
^

. For your reference, here is the complete program:

#include <initializer_list>
#include <iostream>
#include <ostream>
#include <string>
#include <vector>

using namespace ::std::literals;

static
class my_class /* line from OP */
{ /* line from OP */

public:

struct listentry
{ const my_class * const object;
int major;
int minor; };

static ::std::vector<::my_class::listentry> list;

static void push_into_the_list( my_class const * const object )
{ list.push_back( { object, 0, 0 } ); }

static void assign_coordinates_to_each_object_from_the_list()
{ size_t i = 0; for( listentry & entry : list )
{ entry.major = i / 20;
entry.minor = i++ % 20; }}

static void accept( my_class const * const object )
{ push_into_the_list( object );
assign_coordinates_to_each_object_from_the_list(); }

my_class()
{ accept( this ); }

} myarray[10][20]; /* line from OP */

::std::vector< ::my_class::listentry >::my_class::list;

int main()
{ ::std::cout << static_cast< void * >( &myarray )<< '\n'; }

ram@zedat.fu-berlin.de (Stefan Ram): May 10 09:01AM

>the global scope.
>You intend the latter but you get the former.
>Just omit that `::`. ;-)

Now I see. Thank you!

It seems that I can also use parentheses:

::std::vector< ::my_class::listentry >( ::my_class::list );

.

ram@zedat.fu-berlin.de (Stefan Ram): May 10 09:32AM

> :
>} myarray[10][20];
>Is there a way for each individual object to find its x,y location?

Here's my quick take at it:

main.cpp

#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <iostream>
#include <iterator>
#include <ostream>
#include <string>
#include <vector>

using namespace ::std::literals;

static
class my_class /* line from OP */
{ /* line from OP */

public:

size_t major;
size_t minor;

static ::std::vector< ::my_class* >list;

static void push_into_the_list( my_class * const address )
{ list.push_back( address ); }

static void sort_the_list()
{ sort( begin( ::my_class::list ), end( ::my_class::list )); }

static void assign_coordinates_to_each_object_from_the_list()
{ size_t i = 0; for( ::my_class * entry : list )
{ entry->major = i / 20u;
entry->minor = i++ % 20u; }}

static void statically_register_object_address
( my_class * const address )
{ push_into_the_list( address );
sort_the_list();
assign_coordinates_to_each_object_from_the_list(); }

my_class(): major{ 0u }, minor{ 0u }
{ statically_register_object_address( this ); }

} myarray[10][20]; /* line from OP */

::std::vector< ::my_class* >( ::my_class::list );

int main()
{ for( int i = 0; i < 10; ++i )
for( int j = 0; j < 20; ++j )
{ assert( myarray[ i ][ j ].major == i );
assert( myarray[ i ][ j ].minor == j ); }}



Wednesday, May 10, 2017

Digest for comp.lang.c++@googlegroups.com - 25 updates in 4 topics
