- On endianness, many #ifdefs, and the need thereof - 2 Updates
- alloca()-support - 22 Updates
- Compile a program from all C and C++ files in current folder - 1 Update
Daniel <danielaparker@gmail.com>: Oct 02 04:00AM -0700 Most C++ software that needs to know whether the host is big endian or little endian comes with a header file with a very long sequence of #ifdef's, along the lines of

# if (defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) || \
     (defined(__BYTE_ORDER) && defined(__BIG_ENDIAN) && __BYTE_ORDER == __BIG_ENDIAN) || \
     (defined(BYTE_ORDER) && defined(BIG_ENDIAN) && BYTE_ORDER == BIG_ENDIAN) || \
     (defined(_BIG_ENDIAN) && !defined(_LITTLE_ENDIAN)) || (defined(__BIG_ENDIAN__) && !defined(__LITTLE_ENDIAN__)) || \
     defined(__ARMEB__) || defined(__MIPSEB__) || defined(__s390__) || defined(__sparc__)

On the other hand, at least one library merely has

static constexpr bool little_endianess(int num = 1) noexcept
{
    return *reinterpret_cast<char*>(&num) == 1;
}

referencing a contribution on stackoverflow, http://stackoverflow.com/a/1001328/266378. So why the many #ifdef's, if this is enough? Or is it enough? Thanks, Daniel |
David Brown <david.brown@hesbynett.no>: Oct 02 01:26PM +0200 On 02/10/2019 13:00, Daniel wrote: > } > referencing a contribution on stackoverflow, http://stackoverflow.com/a/1001328/266378. > So why the many #ifdef's, if this is enough? Or is it enough? This function cannot be evaluated as a constant expression, despite the "constexpr" qualifier. So you can't use it in places where you actually need a constant expression (size of an array, template parameter, etc.). It will still be good enough in some contexts, especially as compilers can optimise knowing the result of the function. For constant expression endianness detection, conditional compilation is your main choice (the complexity depends on the portability you need), with pre-build configuration being an alternative, until C++20 standardises endianness. |
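The distinction David draws can be sketched roughly as follows. This is only an illustration, assuming a compiler that predefines `__BYTE_ORDER__` (GCC and Clang do; others may not). The names `HOST_ENDIAN_KNOWN`, `host_is_little_endian`, `runtime_is_little_endian` and `endian_tag` are made up for this example and come from no library.

```cpp
#include <cstdint>
#include <cstring>

// Compile-time answer, available only where the compiler predefines
// __BYTE_ORDER__ and friends (an assumption; not every compiler does).
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    defined(__ORDER_BIG_ENDIAN__)
#  define HOST_ENDIAN_KNOWN 1
constexpr bool host_is_little_endian = (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__);
constexpr bool host_is_big_endian    = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__);
#else
#  define HOST_ENDIAN_KNOWN 0
#endif

// Run-time answer: copy the object representation of 1 into bytes and
// inspect the lowest-addressed byte. Not usable in a constant expression.
inline bool runtime_is_little_endian() noexcept
{
    const std::uint32_t probe = 1;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 1;
}

#if HOST_ENDIAN_KNOWN
// Only the compile-time value can be used where a constant expression
// is required, for example in a static_assert or an array bound.
static_assert(host_is_little_endian != host_is_big_endian,
              "mixed-endian hosts are not handled by this sketch");
using endian_tag = char[host_is_little_endian ? 1 : 2];
#endif
```

The run-time function is fine for branching inside ordinary code; the macro-based constants are what you need for array sizes and template arguments. From C++20 onward, `std::endian` in `<bit>` covers the compile-time case without any of the macros.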
Joe Pfeiffer <pfeiffer@cs.nmsu.edu>: Oct 01 09:08PM -0600 > I've seen the same thing on other processors too. IIRC, on the 68060 > the hardware integer division instruction was slower than a software > division routine. There was a time when it was true. I'd be pretty surprised if it still were. |
Melzzzzz <Melzzzzz@zzzzz.com>: Oct 02 04:15AM >> Which one does not support VLA and supports alloca? > I'm using C++ and C++ hasn't VLAs. > But almost any C++-compiler supports alloca(). Any C++11 compiler that supports C99 also supports VLA, I bet... -- press any key to continue or any other to quit... I enjoy nothing as much as my status as an INVALID -- Zli Zec There wasn't really that much violence in the Wild West, precisely because everyone was armed. -- Mladen Gogala |
Bonita Montero <Bonita.Montero@gmail.com>: Oct 02 06:28AM +0200 > But `alloca()' is not a platform-specific facility, because > it may exist or be absent on the majority of platforms. .. Doesn't matter, almost any compiler supports it. |
Melzzzzz <Melzzzzz@zzzzz.com>: Oct 02 04:28AM > .size f, .-f > .ident "GCC: (Debian 6.3.0-18+deb9u1) 6.3.0 20170516" > .section .note.GNU-stack,"",@progbits Yes, this is against optimisations as you use volatile... |
Bonita Montero <Bonita.Montero@gmail.com>: Oct 02 06:28AM +0200 >> I'm using C++ and C++ hasn't VLAs. >> But almost any C++-compiler supports alloca(). > Any C++11 compiler that supports C99 also supports VLA, I bet... Boy, you're such a mega-idiot. |
Melzzzzz <Melzzzzz@zzzzz.com>: Oct 02 04:29AM >>> But almost any C++-compiler supports alloca(). >> Any C++11 compiler that supports C99 also supports VLA, I bet... > Boy, you're such a mega-idiot. Look in the mirror... |
Bonita Montero <Bonita.Montero@gmail.com>: Oct 02 06:30AM +0200 >> .ident "GCC: (Debian 6.3.0-18+deb9u1) 6.3.0 20170516" >> .section .note.GNU-stack,"",@progbits > Yes, this is against optimisations as you use volatile... I use volatile to prevent the compiler from optimizing away the function body. |
Melzzzzz <Melzzzzz@zzzzz.com>: Oct 02 04:36AM >> Yes, this is against optimisations as you use volatile... > I use volatile to prevent the compiler from optimizing away the > function body. " Linus Torvalds has expressed his displeasure in the past over VLA usage for arrays with predetermined small sizes, with comments like "USING VLA'S IS ACTIVELY STUPID! It generates much more code, and much slower code (and more fragile code), than just using a fixed key size would have done." [6] With the Linux 4.20 kernel, Linux kernel is effectively VLA-free.[7] " |
Bonita Montero <Bonita.Montero@gmail.com>: Oct 02 06:39AM +0200 >> That's wrong. > As a general point, I don't think it is wrong - ... CISC-like instructions that are used often in x86 are usually mapped to the same uOPS, but they use only one slot in the decoder. > A little googling suggests that the "enter" instruction is slower > than the individual instructions, but that "leave" is quite fast. ENTER isn't used because it is used to allocate static stack-frames, i.e. stack-frames without variable parts like VLAs or alloca()-frames. But static stack-frames are used by the compilers without EBP, so ENTER isn't needed. But that doesn't mean that it is not efficient. > There are plenty of other legacy CISC instructions that are slower > than breaking them apart into equivalent simpler instructions, such > as the "LOOP" instructions. ... My Ryzen needs 2 clock-cycles for LOOP if the branch is taken; that's the same number of clock cycles if you would assemble it from simpler instructions. |
Melzzzzz <Melzzzzz@zzzzz.com>: Oct 02 04:42AM >> Yes, this is against optimisations as you use volatile... > I use volatile to prevent the compiler from optimizing away the > function body. Try a more realistic example:

float read_val();
float process(int,float*);

float read_and_process(int n)
{
    float vals[n];
    for (int i = 0; i < n; ++i)
        vals[i] = read_val();
    return process(n, vals);
}

Where is `leave` now? |
Frederick Gotham <cauldwell.thomas@gmail.com>: Oct 02 12:03AM -0700 On Monday, September 30, 2019 at 1:02:16 PM UTC+1, Bonita Montero wrote: > I know alloca() neither is an official part of the C nor C++ > programming-language. But I use it for performance-reasons. Can anyone name > compilers that don't support alloca()? I wonder if this would be a good reason to add a new compile-time operator to the C++ language? We could re-use the keyword 'extern' as follows:

#include <some_header.hpp>

int main()
{
    if extern(alloca)
    {
        /* alloca is declared so we can use it */
    }
    else
    {
        /* Do some other trick */
    }
}
|
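Something close to the proposal above can be approximated today with the standard C++17 `__has_include` preprocessor operator. The sketch below assumes that the presence of `<alloca.h>` is a reasonable proxy for `alloca()` being usable, which is a heuristic and not a guarantee. `SKETCH_HAVE_ALLOCA`, `kMaxStackBytes` and `sum_of_squares` are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Assumption: if <alloca.h> can be included, alloca() is usable.
#if defined(__has_include)
#  if __has_include(<alloca.h>)
#    include <alloca.h>
#    define SKETCH_HAVE_ALLOCA 1
#  endif
#endif
#ifndef SKETCH_HAVE_ALLOCA
#  define SKETCH_HAVE_ALLOCA 0
#endif

// Hypothetical cap: requests larger than this always go to the heap.
constexpr std::size_t kMaxStackBytes = 1024;

// Sums the squares of 0..n-1 through a scratch buffer that lives on the
// stack when alloca() is available and the request is small enough.
inline long sum_of_squares(std::size_t n)
{
    std::vector<long> heap_buf;   // fallback storage, empty unless needed
    long* buf = nullptr;
#if SKETCH_HAVE_ALLOCA
    if (n > 0 && n <= kMaxStackBytes / sizeof(long)) {
        // alloca() memory is released when the function returns,
        // not at the end of this block.
        buf = static_cast<long*>(alloca(n * sizeof(long)));
    }
#endif
    if (buf == nullptr) {
        heap_buf.resize(n);
        buf = heap_buf.data();
    }
    for (std::size_t i = 0; i < n; ++i) buf[i] = static_cast<long>(i * i);
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += buf[i];
    return total;
}
```

The size cap is the important part: it bounds the stack usage regardless of the caller's input, and the heap fallback keeps the function working on compilers where the header is missing.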
David Brown <david.brown@hesbynett.no>: Oct 02 10:01AM +0200 On 02/10/2019 06:36, Melzzzzz wrote: >>>> int volatile a[s]; >>>> a[s - 1] = 123; >>>> } <snip> >>> Yes, this is against optimisations as you use volatile... >> I use volatile to prevent the compiler from optimizing away the >> function body. There are usually better ways to get this, such as giving a return value that depends on the function inputs. If you have to use "volatile" to ensure you are generating code that can be examined, then minimise its use rather than making the whole array volatile. (In a test function that used the VLA for calculations, gcc still generated a "leave" instruction.) > have done." [6] With the Linux 4.20 kernel, Linux kernel is effectively > VLA-free.[7] > " VLA's with pre-determined small sizes result in /identical/ code to arrays with fixed sizes.

void foo1(void) {
    const int arr_size = 12;
    int arr[arr_size];
    ...
}

void foo2(void) {
#define ARR_SIZE 12
    int arr[ARR_SIZE];
    ...
}

"foo1" has a VLA with a predetermined small size, while "foo2" has a normal array with fixed size. These are different as far as the C language is concerned, but identical in the generated code. I /think/ what Torvalds is trying to say is that if you know that the size "n" of your VLA is going to be at most "max_n", and you know "max_n" is small, then you should use a fixed size array of size "max_n" instead of a VLA. I believe that has both advantages and disadvantages. When the array is of fixed size (i.e., known at compile time - even if it is technically a VLA) the code for allocating and deallocating the space is simpler. Code for accessing the contents of the array or other data might be a little simpler. But it is all just one-time calculations. Code that is run repeatedly - and if you have an array, you probably have a loop - is going to be identical.
So you get the trade-off between a few extra instructions for the variable VLA, balanced against the benefits of reduced cache use from a smaller array. I'd not like to be categorical about which is best - it would need measuring. If the VLA is making the code fragile, then I expect it would be equally fragile with a fixed size array. (Fixed size arrays are also easier for analysis of stack usage, if that is important.) |
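The fixed-size alternative discussed above, written for C++ (which has no standard VLAs), might look like the sketch below. It assumes a small known upper bound; `kMaxValues` and `sum_with_fixed_buffer` are hypothetical names, and the bound of 16 is arbitrary.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical upper bound on n, assumed small and known in advance.
constexpr std::size_t kMaxValues = 16;

// Fills the first n slots of a fixed-size buffer with 0..n-1 and sums them.
// The buffer always occupies kMaxValues elements of stack, whatever n is.
inline int sum_with_fixed_buffer(std::size_t n)
{
    if (n > kMaxValues)
        throw std::length_error("n exceeds kMaxValues");

    // Left uninitialised on purpose: only the first n elements are
    // written and then read, so the unused tail is never touched.
    std::array<int, kMaxValues> vals;
    for (std::size_t i = 0; i < n; ++i) vals[i] = static_cast<int>(i);

    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += vals[i];
    return total;
}
```

The explicit bound check replaces the silent stack growth of a VLA with a detectable error, and the stack frame size becomes a compile-time constant. The cost is the one David mentions: the frame is sized for the worst case even when n is small.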
David Brown <david.brown@hesbynett.no>: Oct 02 10:10AM +0200 On 02/10/2019 06:42, Melzzzzz wrote: > return process(n, vals); > } > Where is `leave` now? The compiler will use it if it is convenient, and not use it otherwise. This sample code /does/ generate "leave" with gcc :

int f(int s)
{
    int a[s];
    for (int i = 0; i < s; i++) a[i] = i;
    int t = 0;
    for (int i = 0; i < s; i++) t += a[i];
    return t;
}

(clang does not generate a "leave" - it generates its usual unrolled vector monstrosities that are highly efficient for huge loop counts, and terrible for tiny counts.) Basically, it seems that "leave" is equivalent to "mov %ebp, %esp" then "pop %ebp". If the compiler would otherwise have generated these instructions, then it uses "leave". If not, then it doesn't. "leave" appears to be marginally more efficient than the separate instructions - enough so to use it when there is the option, but not enough to go out of its way to use it. (The matching "enter" instruction is much slower than manual manipulation of the registers, and is thus avoided.) |
David Brown <david.brown@hesbynett.no>: Oct 02 10:19AM +0200 On 02/10/2019 05:08, Joe Pfeiffer wrote: >> division routine. > There was a time when it was true. I'd be pretty surprised if it still > were. I'm not quite sure what you mean by that. You are surely not saying the 68060 hardware division has got faster recently, or that software division code for it has got slower! |
Paavo Helde <myfirstname@osa.pri.ee>: Oct 02 11:30AM +0300 On 2.10.2019 10:03, Frederick Gotham wrote: > if extern(alloca) > { > /* alloca is declared so we can use it */ Please, when proposing new extensions for C++, could we refrain from things which would increase the danger of poorly predictable nasty fatalities like stack overflows? There are already enough ways to shoot someone's legs off in C++. |
Ian Collins <ian-news@hotmail.com>: Oct 02 09:34PM +1300 On 02/10/2019 20:03, Frederick Gotham wrote: > /* Do some other trick */ > } > } Not much use when a "function" is actually inlined code. Best not to use dodgy things like alloca. Even its Linux man page recommends against its use. -- Ian. |
David Brown <david.brown@hesbynett.no>: Oct 02 10:34AM +0200 On 02/10/2019 06:39, Bonita Montero wrote: > i.e. stack-frames without variable parts like VLAs or alloca()-frames. > But static stack-frames are used by the compilers without EBP, so ENTER > isn't needed. But that doesn't mean that it is not efficient. "enter" is only potentially useful when you have a fixed stack frame but want a frame pointer - that is true. And usually you don't bother with a frame pointer (they used to be popular for debuggers, but debuggers are smarter these days). But "enter" is not used even when it could be used, because it is very slow. It is microcoded, and takes 12 micro-ops on modern x86 devices - far more than "push ebp", "mov ebp, esp", "sub esp, xxxx" that replace it. > My Ryzen needs 2 clock-cycles for LOOP if the branch is taken; that's > the same number of clock cycles if you would assemble it from simpler > instructions. Interesting. On an Intel Skylake it is 7 µ-ops for a "loop" and 11 for a "loopne". <https://www.agner.org/optimize/instruction_tables.pdf> <https://www.agner.org/optimize/> This is why I rely on the compiler to generate good code for particular chips. The x86 assembly world is too chaotic for my liking. |
Frederick Gotham <cauldwell.thomas@gmail.com>: Oct 02 01:36AM -0700 On Wednesday, October 2, 2019 at 9:34:20 AM UTC+1, Ian Collins wrote: > Not much use when a "function" is actually inlined code. > Best not to use dodgy things like alloca. Even its Linux man page > recommends against its use. That's as ridiculous as telling people not to use "goto". I'm working on an embedded Linux x86_64 firmware program right now and I've got two "goto" calls in my code. |
Ian Collins <ian-news@hotmail.com>: Oct 02 09:41PM +1300 On 02/10/2019 21:36, Frederick Gotham wrote: >> Best not to use dodgy things like alloca. Even its Linux man page >> recommends against its use. > That's as ridiculous as telling people not to use "goto". Using goto is a tar and feather offense in my team. There is never a justifiable use case in C++. alloca and VLAs are a bomb waiting to go off in your code. > I'm working on an embedded Linux x86_64 firmware program right now and I've got two "goto" calls in my code. And you admit it? -- Ian. |
Frederick Gotham <cauldwell.thomas@gmail.com>: Oct 02 02:28AM -0700 On Wednesday, October 2, 2019 at 9:41:14 AM UTC+1, Ian Collins wrote: > > I'm working on an embedded Linux x86_64 firmware program right now and I've got two "goto" calls in my code. > And you admit it? I could show you if you like. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Oct 02 02:35AM -0700 On 9/30/2019 5:02 AM, Bonita Montero wrote: > I know alloca() neither is an official part of the C nor C++ program- > ming-language. But I use it for performance-reasons. Can anyone name > compilers that don't support alloca()? Fwiw, the last time I used alloca was to offset thread stacks, using a thread id, to get around the 64k aliasing problem on old Intel hyperthreaded processors. It was needed to get around false sharing wrt thread stacks! alloca was a simple convenience hack to get the damn job done. Iirc, it was mentioned in an Intel paper. Well, I have something even more hackish. The region allocator that can be stack based. It can be fed with memory reaped from alloca: http://pastebin.com/raw/f37a23918 https://groups.google.com/forum/#!original/comp.lang.c/7oaJFWKVCTw/sSWYU9BUS_QJ |
Frederick Gotham <cauldwell.thomas@gmail.com>: Oct 02 02:43AM -0700 On Wednesday, October 2, 2019 at 10:35:31 AM UTC+1, Chris M. Thomasson wrote: > even more hackish. The region allocator that can be stack based. It can > be fed with memory reaped from alloca: > http://pastebin.com/raw/f37a23918 I still remember the little earthquake I felt in my head 10 years ago the first time I ever walked into a morgue. I'm not saying that looking at your code was in any way similar to the experience I had 10 years ago, but I definitely did feel a mild unbalancing shake in my head as I reluctantly clicked the link and scrolled downward. |
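For readers who would rather not click the link, the general idea of a region allocator fed from a caller-supplied buffer can be sketched as follows. This is not Chris's code; it is a minimal bump allocator written for illustration, and `Region` with its members are hypothetical names. The buffer could be a plain stack array or, with the caveats raised elsewhere in this thread, memory obtained from `alloca()`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical minimal region (bump) allocator over a caller-supplied
// buffer. It never frees individual blocks; reset() releases everything.
class Region {
public:
    Region(void* buffer, std::size_t size) noexcept
        : base_(static_cast<unsigned char*>(buffer)), size_(size), used_(0) {}

    // Returns nullptr when the region is exhausted.
    // 'align' must be a power of two.
    void* allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t)) noexcept
    {
        const std::uintptr_t start =
            reinterpret_cast<std::uintptr_t>(base_) + used_;
        const std::uintptr_t aligned =
            (start + (align - 1)) & ~static_cast<std::uintptr_t>(align - 1);
        const std::size_t padding = static_cast<std::size_t>(aligned - start);

        const std::size_t remaining = size_ - used_;
        if (padding > remaining || bytes > remaining - padding)
            return nullptr;

        used_ += padding + bytes;
        return base_ + (used_ - bytes);   // start of the aligned block
    }

    void reset() noexcept { used_ = 0; }
    std::size_t used() const noexcept { return used_; }

private:
    unsigned char* base_;   // not owned; must outlive the Region
    std::size_t size_;
    std::size_t used_;
};
```

A typical use would declare `alignas(std::max_align_t) unsigned char storage[256];` in a function, construct `Region r(storage, sizeof storage);` and hand out blocks with `r.allocate(...)`. Everything disappears when the function returns, so no pointer obtained from the region may escape it.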
Frederick Gotham <cauldwell.thomas@gmail.com>: Oct 02 01:16AM -0700 Okay what do you think of this?

PROGRAM = super_cool_program

FLAGS_IN_COMMON = -pedantic -Wall -rdynamic -funwind-tables -fno-omit-frame-pointer -fdump-rtl-expand -Og -g

CXXFLAGS = $(FLAGS_IN_COMMON) -std=c++11
CFLAGS = $(FLAGS_IN_COMMON) -std=c99

CC = gcc
CXX = g++

SRC_FILES_C := $(wildcard *.c)
SRC_FILES_CXX := $(wildcard *.cpp)

OBJECTS = $(SRC_FILES_C:.c=.c.o) $(SRC_FILES_CXX:.cpp=.cpp.o)

DEPENDENCIES = $(OBJECTS:.o=.d)

.PHONY: all
all: $(PROGRAM) $(DEPENDENCIES)

$(PROGRAM): $(OBJECTS)
	$(CXX) $(FLAGS_IN_COMMON) -o $@ $^

#This rule makes dependencies files for the source files
.PRECIOUS: %.c.d
%.c.d: %.c
	$(CC) -MM -MT $(<:.c=.c.o) $< -MF $@

.PRECIOUS: %.cpp.d
%.cpp.d: %.cpp
	$(CXX) -MM -MT $(<:.cpp=.cpp.o) $< -MF $@

#This rule compiles the source files to object files
%.c.o: %.c %.c.d
	$(CC) $(CFLAGS) -o $@ -c $<

%.cpp.o: %.cpp %.cpp.d
	$(CXX) $(CXXFLAGS) -o $@ -c $<

#This next line pulls in all the dependency files (if they exist)
-include $(DEPENDENCIES)

.PHONY: clean
clean:
	rm -f $(DEPENDENCIES) $(OBJECTS) $(PROGRAM) $(OBJECTS:.o=*.expand)

.PHONY: install
install: $(PROGRAM)
	cp $< /usr/bin/

.PHONY: uninstall
uninstall:
	rm -f /usr/bin/$(PROGRAM)
|