- "Need for Speed - C++ versus Assembly Language" - 15 Updates
- Error message when defining a static data member - 4 Updates
- Finding an objects location in an array - 3 Updates
- Error message when defining a static data member - 3 Updates
Gareth Owen <gwowen@gmail.com>: May 10 07:08AM +0100 > You, and Chris Vine are always bulling me since years. Insults after > insults, I am used to your stuff. You take everyone who disagrees with you as a personal insult. Get over yourself. > No one can insult me, unless I give some importance to their words. Right back at you fella. |
jacobnavia <jacob@jacob.remcomp.fr>: May 10 09:15AM +0200 On 09/05/2017 at 23:06, Ian Collins wrote: > saying that the hand rolled code will be faster on the processor (model, > not family) it was written for, but might not be faster on next year's > model. That can be the case. It suffices to say that Intel is really an example here, with shifts becoming more expensive than multiplies in some models, for instance. But this applies to compiled code also. What I am speaking about is this:
    int i;
    for (i=0; i<32;i++) {
        if (data & (1 << i))
            break;
    }
This searches for the rightmost bit set in "data". The WHOLE loop can be replaced by a single instruction (either bsf or bsr, I do not remember right now). The point is, a human UNDERSTANDS what the machine is doing, and can optimize things that no compiler is now able to recognize. |
David Brown <david.brown@hesbynett.no>: May 10 11:26AM +0200 On 09/05/17 17:53, bitrex wrote: > going to be more expensive to be reading and writing this particular > variable that you need to be updating every interrupt cycle in and out > of SRAM then just leaving it in place. There is a middle ground here, between C/C++ and assembler - the use of compiler and target extensions. Someone mentioned that a C compiler might not do as good a job as an assembler programmer on SIMD vectorisation, because it does not know about the alignments - thus you have compiler extensions such as gcc's __attribute__((aligned)) to give the compiler that information. In this case, you want a global variable to remain in processor registers - you can do that with a gcc extension: register uint8_t glob asm("r8"); (I can't remember the syntax for an asm register variable that uses multiple registers.) And while the AVR has quite a number of GPRs, they get used up quickly because they are all 8 bit - reserving 4 for a global variable is likely to affect code quality somewhat. |
David Brown <david.brown@hesbynett.no>: May 10 11:37AM +0200 On 10/05/17 09:15, jacobnavia wrote: > bsr, I do not remember right now). > The point is, a human UNDERSTANDS what the machine is doing, and can > optimize things that no compiler is now able to recognize. Human understanding of compiler manuals is also useful when you want optimal code. For gcc, this is just "__builtin_ffs(data)". Many other compilers will have similar extensions or intrinsics. You might need to make a macro that is wrapped in conditional compilation depending on the compiler (with your standard C code above as a fall-back), but it is still much more portable than assembly - and will give more opportunities for compiler optimisation. |
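The conditional-compilation wrapper described above could look roughly like the following. This is only a sketch under stated assumptions: the names `lowest_set_bit` and `lowest_set_bit_demo` are made up for illustration, only the gcc/clang builtin is wired in, and every other compiler gets the plain loop as the fall-back.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the thread): returns the 0-based index of the
// lowest set bit in `data`, or -1 when no bit is set.
inline int lowest_set_bit(std::uint32_t data)
{
#if defined(__GNUC__) || defined(__clang__)
    // __builtin_ffs returns one plus the index of the least significant
    // set bit, or 0 when the argument is 0. The cast to int relies on the
    // modulo conversion that gcc and clang document.
    return __builtin_ffs(static_cast<int>(data)) - 1;
#else
    // Portable fall-back: the loop from the thread, using an unsigned shift
    // so that shifting into bit 31 is well defined.
    for (int i = 0; i < 32; ++i) {
        if (data & (std::uint32_t{1} << i))
            return i;
    }
    return -1;
#endif
}

// Usage sketch: these checks hold for both branches.
inline void lowest_set_bit_demo()
{
    assert(lowest_set_bit(0u) == -1);
    assert(lowest_set_bit(1u) == 0);
    assert(lowest_set_bit(12u) == 2);           // binary 1100
    assert(lowest_set_bit(0x80000000u) == 31);
}
```

Callers see one function either way, so the compiler-specific part stays confined to a single place.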
jacobnavia <jacob@jacob.remcomp.fr>: May 10 12:41PM +0200 On 10/05/2017 at 11:37, David Brown wrote: > compiler (with your standard C code above as a fall-back), but it is > still much more portable than assembly - and will give more > opportunities for compiler optimisation. Yes, gcc is the best compiler in the universe, David, I know that. Now, that was an EXAMPLE of course. But this is moot. Do not use assembly, it is better that you stick to c++. |
David Brown <david.brown@hesbynett.no>: May 10 02:14PM +0200 On 10/05/17 12:41, jacobnavia wrote: >> still much more portable than assembly - and will give more >> opportunities for compiler optimisation. > Yes, gcc is the best compiler in the universe, David, I know that. If you think that, that's fine. If you prefer to read what I wrote, rather than making snide remarks, you will see that I gave that as an example - because it is an example that is easily tested and verified, and easy for you to look up in the manual. I /could/ have picked CodeWarrior 10.1 for the PowerPC as an example which has something similar - but that would involve me looking up the details, and you would not be able to check them. I am fairly confident that MSVC, clang, Intel icc, and various other compilers have similar features - which is why I wrote exactly that. > Now, that was an EXAMPLE of course. Of course it was. And I showed an example of how an understanding of your tools can mean you might not need assembly for that kind of purpose. There will be many other examples where you might at first think you'd need to write hand-coded assembly for efficiency, but compilers can generate as good or better code. There will be a few examples where the hand-written assembly really is significantly better than even the best compilers can generate, even with compiler-specific extensions - but such examples are getting fewer and more obscure as compilers get better. > But this is moot. Do not use assembly, it is better that you stick to c++. It almost always is better to stick to C or C++. It is very rare that using assembly makes sense. There are few cases where there is a significant speedup - and in many cases, it may look like the assembly code is faster when in fact it is not. Making assembly code that is faster on a wide variety of targets, rather than just one particular model of cpu, is particularly hard. 
Making such code in a way that interacts efficiently with surrounding code is also a problem - the hand-written assembly may be faster in isolation, but in combination with other code in C or C++, the total result is slower. One of the few situations where assembly can be faster is precisely your example - when the cpu supports an instruction that is difficult to express in plain C, or difficult for a compiler to identify in plain C (let's forget about builtins and intrinsics for the moment). In that case, you might well want to make a static inline function that wraps the assembly instruction. You want the assembly involved to be minimal. So for example, I have these definitions for some ARM code:
    static inline uint16_t swapEndian16(uint16_t x) {
        uint16_t y;
        asm ("rev16 %[y], %[x]" : [y] "=r" (y) : [x] "r" (x) : );
        return y;
    }

    static inline uint32_t swapEndian32(uint32_t x) {
        uint32_t y;
        asm ("rev %[y], %[x]" : [y] "=r" (y) : [x] "r" (x) : );
        return y;
    }
(If it makes you feel better, pretend it is for the CodeWarrior ARM compiler, not gcc - that compiler supports the same syntax for inline assembly.) These minimal assembly wrappers let me take advantage of the best assembly instructions for the job, while allowing the compiler to generate the rest of the code. |
scott@slp53.sl.home (Scott Lurndal): May 10 12:42PM > if (data & (1 << i)) > break; >} Or the programmer can use a compiler intrinsic, such as GCC's __builtin_ffsll (or __builtin_clz for leftmost bit). e.g.
    static inline int log2(uint64_t x) {
        int i = 0;
        //while (x>>=1) { i++; }
        i = 63 - __builtin_clzll(x);
        if (i < 0) i = 0;
        return i;
    }
|
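One caveat worth keeping in mind with this approach: GCC documents the result of `__builtin_clzll` as undefined when the argument is 0, so the clamp has to happen before the call rather than after it. A minimal sketch of that variant follows. The names `floor_log2` and `floor_log2_demo` are hypothetical, and returning 0 for an input of 0 is simply carried over from the posted function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical variant of the posted function: floor(log2(x)) for x > 0,
// and 0 for x == 0 (the same clamp the posted code aims for).
inline int floor_log2(std::uint64_t x)
{
    if (x == 0)
        return 0;                      // guard first: clz of 0 is undefined
#if defined(__GNUC__) || defined(__clang__)
    return 63 - __builtin_clzll(x);    // index of the highest set bit
#else
    int i = 0;                         // fall-back: the commented-out loop
    while (x >>= 1)
        ++i;
    return i;
#endif
}

// Usage sketch.
inline void floor_log2_demo()
{
    assert(floor_log2(0) == 0);
    assert(floor_log2(1) == 0);
    assert(floor_log2(3) == 1);
    assert(floor_log2(1024) == 10);
    assert(floor_log2(~std::uint64_t{0}) == 63);
}
```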
Bonita Montero <Bonita.Montero@gmail.com>: May 10 02:53PM +0200 > if (data & (1 << i)) > break; > } There are intrinsics for this purpose. And no, this is not assembly. |
jacobnavia <jacob@jacob.remcomp.fr>: May 10 03:06PM +0200 On 10/05/2017 at 14:53, Bonita Montero wrote: >> break; >> } > There are intrinsics for this purpose. Not in all compilers, but anyway, this is an example of a long high-level construct that can be converted to a single instruction. Byte swapping is also such an example, and there are many others. > And no, this is not assembly. In the case of an intrinsic, certainly, it is not assembler. It is a non-portable construct geared to a single compiler. |
jacobnavia <jacob@jacob.remcomp.fr>: May 10 03:10PM +0200 On 10/05/2017 at 14:53, Bonita Montero wrote: >> } > There are intrinsics for this purpose. > And no, this is not assembly. Yes, there are intrinsics in some compilers for this. Many other examples are available:
    o Carry handling in the four operations.
    o Overflow testing
    o Interrupts
    etc. |
David Brown <david.brown@hesbynett.no>: May 10 03:29PM +0200 On 10/05/17 15:06, jacobnavia wrote: > Not in all compilers, but anyway, this is an example of a long high > level construct that can be converted to a single instruction. > Byte swapping is also such an example, and there are many others.
    uint32_t endianSwap1(uint32_t x) {
        return ((x & 0xff) << 24) | ((x & 0xff00) << 8)
             | ((x & 0xff0000) >> 8) | ((x & 0xff000000) >> 24);
    }

    uint32_t endianSwap2(uint32_t x) {
        return __builtin_bswap32(x);
    }
gcc turns both of these into a "bswap" instruction. Maybe not all compilers will do so, but it is certainly possible for a compiler to recognise such patterns. >> And no, this is not assembly. > In the case of an intrinsic certainly, it is not assembler. It is a non > portable construct geared to a single compiler. Yes, indeed. No one denies that to get optimal code for a target you will sometimes need target-specific extensions that may not be portable to other targets, or may not be portable to other compilers. But in either case, they are still more portable than assembly - it's a half-way option. |
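To make the "half-way option" concrete, the two functions above can be combined so that the builtin is used where it exists and the shift-and-mask version remains the fall-back. The following is a sketch only: `swap_bytes_portable`, `swap_bytes` and `swap_bytes_demo` are illustrative names, and whether either form compiles to a single instruction depends on the compiler and target.

```cpp
#include <cassert>
#include <cstdint>

// Plain C++ byte swap, equivalent to the shift-and-mask version above.
constexpr std::uint32_t swap_bytes_portable(std::uint32_t x)
{
    return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
           ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
}

// Hypothetical wrapper: prefer the builtin, keep the portable code as fall-back.
inline std::uint32_t swap_bytes(std::uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(x);
#else
    return swap_bytes_portable(x);
#endif
}

// Usage sketch: both forms agree, and swapping twice is the identity.
inline void swap_bytes_demo()
{
    assert(swap_bytes_portable(0x12345678u) == 0x78563412u);
    assert(swap_bytes(0x12345678u) == 0x78563412u);
    assert(swap_bytes(swap_bytes(0xdeadbeefu)) == 0xdeadbeefu);
}
```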
jacobnavia <jacob@jacob.remcomp.fr>: May 10 03:56PM +0200 On 10/05/2017 at 15:29, David Brown wrote: > they are still more portable than assembly x86 assembly is fully portable to:
    Mac OS X
    Windows
    Linux
That's almost 100% of the PC market. |
scott@slp53.sl.home (Scott Lurndal): May 10 02:30PM >MAC OS X >Windows >Linux Nonsense - linux runs on hundreds of processor types. >That's almost 100% of the PC market. Which is almost irrelevant now, as the pc market is less than 10% of the overall computer market. |
David Brown <david.brown@hesbynett.no>: May 10 04:30PM +0200 On 10/05/17 15:56, jacobnavia wrote: > MAC OS X > Windows > Linux No it is not. x86 assembly code is not directly portable to different assemblers. Inline assembly in C is a little better - if you use gcc's format, it is portable to gcc, icc, clang, and perhaps other compilers. x86 code is either 32-bit or 64-bit, and not directly portable from one to the other - even though much of it is the same, there are usually still changes to be made. x86 assembly code will work on a range of x86 chips if you stick to a common subset - but if you are trying to write optimal code (and if you are not, why are you bothering with assembly?) then you need to fine-tune it for all sorts of different x86 devices. On one cpu, MMX instructions might be faster - on another, SSE. On one device, unrolling a loop might be faster but on a different device, instruction prefetch buffers may mean the loop format is faster. > That's almost 100% of the PC market. That is a rapidly declining share of the computing world, and one in which the small speed optimisations available with assembly are of declining relevance. x86 assembly is useful for compiler writers, low-level support libraries (clearly it is useful to /you/), low-level systems code which C cannot handle (such as working with interrupts), and occasional libraries where it is worth making an enormous effort for tiny speed differences. For almost all people programming for x86 systems, if you are using assembly for anything except fun, you are making a mistake. On other platforms, especially the embedded world, there is more scope for useful assembly - but only in a tiny fraction of code. |
Bonita Montero <Bonita.Montero@gmail.com>: May 10 05:14PM +0200 >> There are intrinsics for this purpose. > Not in all compilers, ... In all relevant compilers, i.e. g++, clang, msvc++ and intel-c++. > Byte swapping is also such an example, and there are many others. The above compilers provide intrinsics for everything the C++ language doesn't supply. > In the case of an intrinsic certainly, it is not assembler. > It is a non portable construct geared to a single compiler. And assembly is portable? |
Ian Collins <ian-news@hotmail.com>: May 10 08:41PM +1200 On 05/10/17 08:32 PM, Stefan Ram wrote: > main.cpp:40:41: error: 'my_class' in 'class std::vector<my_class::listentry>' does not name a type > ::std::vector< ::my_class::listentry >::my_class::list; > ^ All those spurious colons make it hard to read, and it looks like they have confused you as well. Shouldn't that be std::vector<my_class::listentry> my_class::list; ? -- Ian |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: May 10 10:44AM +0200 On 10-May-17 10:32 AM, Stefan Ram wrote: > main.cpp:40:41: error: 'my_class' in 'class std::vector<my_class::listentry>' does not name a type > ::std::vector< ::my_class::listentry >::my_class::list; > ^ `::` does double duty both as scope resolution operator and as name of the global scope. You intend the latter but you get the former. Just omit that `::`. ;-) Cheers & hth., - Alf |
Ian Collins <ian-news@hotmail.com>: May 10 10:30PM +1200 On 05/10/17 09:01 PM, Stefan Ram wrote: > Now I see. Thank you! > It seems that I also can use braces: > ::std::vector< ::my_class::listentry >( ::my_class::list ); It's much easier and clearer to omit the superfluous colons. -- Ian |
scott@slp53.sl.home (Scott Lurndal): May 10 12:47PM > It seems that I also can use braces: >::std::vector< ::my_class::listentry >( ::my_class::list ); Or do as has been suggested and lose the leading "::". I really hope you don't teach your students that practice, as it will cause them problems once they hit the real world. |
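For reference, the form suggested in this thread can be sketched as below. With no `::` directly after the closing `>`, the compiler has nothing to attach to the type name, so `my_class::list` is read as the declarator. The class is cut down from the posted program; `static_member_demo` is a hypothetical helper added only to exercise the member.

```cpp
#include <cassert>
#include <vector>

class my_class {
public:
    struct listentry {
        const my_class* object;
        int major;
        int minor;
    };
    static std::vector<listentry> list;   // declaration only
};

// Out-of-class definition. Writing ">::my_class::list" instead would make
// the "::" continue the qualified type name, which is the reported error.
std::vector<my_class::listentry> my_class::list;

// Usage sketch: the static member behaves like any other vector.
inline void static_member_demo()
{
    my_class obj;
    const auto before = my_class::list.size();
    my_class::list.push_back({ &obj, 0, 0 });
    assert(my_class::list.size() == before + 1);
    assert(my_class::list.back().object == &obj);
    my_class::list.pop_back();
}
```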
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: May 10 04:17AM +0200 > the x,y to the constructor at "new", but perhaps the 2011 or 2014 standards > have added automated this at all? Or is there some STL trick that can achieve > this using a container? For a raw array all that you have to work with is the default constructor, and all that it knows about the object it's initializing, is its address. If, however, it has available the start address of the array, a pointer to first item in the array, then it can compute its array index. This means a class that's not usable as anything but array item. But maybe it can be done as templated wrapper for the "real" array item class. * * * An alternative is to just define a factory function that first creates the array, and then loops over all items, passing them the item index. Cheers & hth., - Akf |
spud@potato.field: May 10 08:29AM On Wed, 10 May 2017 04:17:59 +0200 >An alternative is to just define a factory function that first creates >the array, and then loops over all items, passing them the item index. Sure, I mean writing 2 loops and manually creating the objects and passing them their location isn't hard, but initialising arrays of objects is so common I wondered if buried somewhere in the latest standards there was some special constructor that would be passed the array location of the object. Since they've thrown in so much other rubbish into 2011 and 2014 which barely anyone will ever use I imagine this would have been a useful addition. -- Spud |
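The factory-function idea discussed here might look like the sketch below. It is written under assumptions: the original class from the question is not shown in this digest, so `cell`, `grid`, `make_grid` and `make_grid_demo` are stand-in names, and `std::array` is used instead of a raw array so that the result can be returned by value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for the element type from the question (assumed layout).
struct cell {
    std::size_t x = 0;   // first index
    std::size_t y = 0;   // second index
};

template <std::size_t Rows, std::size_t Cols>
using grid = std::array<std::array<cell, Cols>, Rows>;

// Factory: default-construct every element, then tell each one where it is.
template <std::size_t Rows, std::size_t Cols>
grid<Rows, Cols> make_grid()
{
    grid<Rows, Cols> g{};
    for (std::size_t r = 0; r < Rows; ++r) {
        for (std::size_t c = 0; c < Cols; ++c) {
            g[r][c].x = r;
            g[r][c].y = c;
        }
    }
    return g;
}

// Usage sketch with the dimensions from the question.
inline void make_grid_demo()
{
    const auto g = make_grid<10, 20>();
    assert(g[0][0].x == 0 && g[0][0].y == 0);
    assert(g[3][7].x == 3 && g[3][7].y == 7);
    assert(g[9][19].x == 9 && g[9][19].y == 19);
}
```

The dimensions appear once, as template arguments, so the two loops do not have to repeat them.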
spud@potato.field: May 10 11:14AM On 10 May 2017 09:32:28 GMT >>} myarray[10][20]; >>Is there a way for each individual object to find its x,y location? > Here's my quick take at it: It's certainly an interesting method, but it's not exactly clean and simple, and having to hard code the array dimensions into the code rather defeats the point :) Also shouldn't "entry->major = i / 20u" be "i / 10u"? -- Spud |
ram@zedat.fu-berlin.de (Stefan Ram): May 10 08:32AM I have this line of code:
    using u = ::my_class;
    ::std::vector< ::my_class::listentry >u::list;
, and it compiles just fine. However, if I change it to
    ::std::vector< ::my_class::listentry >::my_class::list;
, I get this error from GCC (IIRC 5.1):
    main.cpp: At global scope:
    main.cpp:40:41: error: 'my_class' in 'class std::vector<my_class::listentry>' does not name a type
    ::std::vector< ::my_class::listentry >::my_class::list;
                                            ^
. For your reference, here is the complete program:
    #include <initializer_list>
    #include <iostream>
    #include <ostream>
    #include <string>
    #include <vector>

    using namespace ::std::literals;

    static class my_class /* line from OP */
    { /* line from OP */
    public:
      struct listentry
      { const my_class * const object;
        int major;
        int minor; };

      static ::std::vector<::my_class::listentry> list;

      static void push_into_the_list( my_class const * const object )
      { list.push_back( { object, 0, 0 } ); }

      static void assign_coordinates_to_each_object_from_the_list()
      { size_t i = 0;
        for( listentry & entry : list )
        { entry.major = i / 20;
          entry.minor = i++ % 20; }}

      static void accept( my_class const * const object )
      { push_into_the_list( object );
        assign_coordinates_to_each_object_from_the_list(); }

      my_class() { accept( this ); }

    } myarray[10][20]; /* line from OP */

    ::std::vector< ::my_class::listentry >::my_class::list;

    int main()
    { ::std::cout << static_cast< void * >( &myarray )<< '\n'; }
|
ram@zedat.fu-berlin.de (Stefan Ram): May 10 09:01AM >the global scope. >You intend the latter but you get the former. >Just omit that `::`. ;-) Now I see. Thank you! It seems that I also can use braces: ::std::vector< ::my_class::listentry >( ::my_class::list ); . |
ram@zedat.fu-berlin.de (Stefan Ram): May 10 09:32AM > : >} myarray[10][20]; >Is there a way for each individual object to find its x,y location? Here's my quick take at it:
main.cpp
    #include <algorithm>
    #include <cassert>
    #include <initializer_list>
    #include <iostream>
    #include <iterator>
    #include <ostream>
    #include <string>
    #include <vector>

    using namespace ::std::literals;

    static class my_class /* line from OP */
    { /* line from OP */
    public:
      size_t major;
      size_t minor;

      static ::std::vector< ::my_class* >list;

      static void push_into_the_list( my_class * const address )
      { list.push_back( address ); }

      static void sort_the_list()
      { sort( begin( ::my_class::list ), end( ::my_class::list )); }

      static void assign_coordinates_to_each_object_from_the_list()
      { size_t i = 0;
        for( ::my_class * entry : list )
        { entry->major = i / 20u;
          entry->minor = i++ % 20u; }}

      static void statically_register_object_address
      ( my_class * const address )
      { push_into_the_list( address );
        sort_the_list();
        assign_coordinates_to_each_object_from_the_list(); }

      my_class(): major{ 0u }, minor{ 0u }
      { statically_register_object_address( this ); }

    } myarray[10][20]; /* line from OP */

    ::std::vector< ::my_class* >( ::my_class::list );

    int main()
    { for( int i = 0; i < 10; ++i )
        for( int j = 0; j < 20; ++j )
        { assert( myarray[ i ][ j ].major == i );
          assert( myarray[ i ][ j ].minor == j ); }}
|
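The divisor used in the program above follows from how C++ lays out multidimensional arrays: for an array declared `[10][20]`, elements are stored row by row, so the element with flat position `i` sits at `[i / 20][i % 20]`, where 20 is the inner dimension. A small sketch of that mapping, with the hypothetical names `coords`, `to_coords` and `to_coords_demo`:

```cpp
#include <cassert>
#include <cstddef>

struct coords {
    std::size_t row;   // index into the outer dimension
    std::size_t col;   // index into the inner dimension
};

// Row-major mapping: `cols` is the inner dimension (20 for myarray[10][20]).
inline coords to_coords(std::size_t flat, std::size_t cols)
{
    assert(cols > 0);
    return { flat / cols, flat % cols };
}

// Usage sketch for a 10 x 20 array.
inline void to_coords_demo()
{
    const std::size_t cols = 20;
    assert(to_coords(0, cols).row == 0 && to_coords(0, cols).col == 0);
    assert(to_coords(19, cols).row == 0 && to_coords(19, cols).col == 19);
    assert(to_coords(20, cols).row == 1 && to_coords(20, cols).col == 0);
    assert(to_coords(199, cols).row == 9 && to_coords(199, cols).col == 19);
}
```

Dividing by 10 instead would give row values up to 19, which is out of range for the outer dimension.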