soft and program: Digest for comp.lang.c++@googlegroups.com

On endianness, many #ifdefs, and the need thereof - 2 Updates
alloca()-support - 22 Updates
Compile a program from all C and C++ files in current folder - 1 Update

On endianness, many #ifdefs, and the need thereof

Daniel <danielaparker@gmail.com>: Oct 02 04:00AM -0700

Most C++ software that needs to know whether the host is big endian or little
endian comes with a header file with a very long sequence of #ifdef's, along
the lines of

# if (defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) || \
(defined(__BYTE_ORDER) && defined(__BIG_ENDIAN) && __BYTE_ORDER == __BIG_ENDIAN) || \
(defined(BYTE_ORDER) && defined(BIG_ENDIAN) && BYTE_ORDER == BIG_ENDIAN) || \
(defined(_BIG_ENDIAN) && !defined(_LITTLE_ENDIAN)) || (defined(__BIG_ENDIAN__) && !defined(__LITTLE_ENDIAN__)) || \
defined(__ARMEB__) || defined(__MIPSEB__) || defined(__s390__) || defined(__sparc__)

On the other hand, at least one library merely has

static constexpr bool little_endianess(int num = 1) noexcept
{
    return *reinterpret_cast<char*>(&num) == 1;
}

referencing a contribution on stackoverflow, http://stackoverflow.com/a/1001328/266378.

So why the many #ifdef's, if this is enough? Or is it enough?

Thanks,
Daniel

David Brown <david.brown@hesbynett.no>: Oct 02 01:26PM +0200

On 02/10/2019 13:00, Daniel wrote:
> }

> referencing a contribution on stackoverflow, http://stackoverflow.com/a/1001328/266378.

> So why the many #ifdef's, if this is enough? Or is it enough?

This function cannot be evaluated as a constant expression, despite
the "constexpr" qualifier. So you can't use it in places where you
actually need a constant expression (size of an array, template
parameter, etc.).

It will still be good enough in some contexts, especially as compilers
can optimise knowing the result of the function.

For constant expression endianness detection, conditional compilation is
your main choice (the complexity depends on the portability you need),
with pre-build configuration being an alternative, until C++20
standardises endianness.

alloca()-support

Joe Pfeiffer <pfeiffer@cs.nmsu.edu>: Oct 01 09:08PM -0600

> I've seen the same thing on other processors too. IIRC, on the 68060
> the hardware integer division instruction was slower than a software
> division routine.

There was a time when it was true. I'd be pretty surprised if it still
were.

Melzzzzz <Melzzzzz@zzzzz.com>: Oct 02 04:15AM

>> Which one does not support VLAs but supports alloca?

> I'm using C++ and C++ hasn't VLAs.
> But almost any C++-compiler supports alloca().
Any C++11 compiler that supports C99 also supports VLA, I bet...

--
press any key to continue or any other to quit...
I enjoy nothing as much as my status as an INVALID -- Zli Zec
There wasn't really that much violence in the Wild West, precisely
because everyone was armed. -- Mladen Gogala

Bonita Montero <Bonita.Montero@gmail.com>: Oct 02 06:28AM +0200

> But `alloca()' is not a platform-specific facility, because
> it may exist or be absent on the majority of platforms. ..

Doesn't matter, almost any compiler supports it.

Melzzzzz <Melzzzzz@zzzzz.com>: Oct 02 04:28AM

> .size f, .-f
> .ident "GCC: (Debian 6.3.0-18+deb9u1) 6.3.0 20170516"
> .section .note.GNU-stack,"",@progbits
Yes, this is against optimisations as you use volatile...


Bonita Montero <Bonita.Montero@gmail.com>: Oct 02 06:28AM +0200

>> I'm using C++ and C++ hasn't VLAs.
>> But almost any C++-compiler supports alloca().

> Any C++11 compiler that supports C99 also supports VLA, I bet...

Boy, you're such a mega-idiot.

Melzzzzz <Melzzzzz@zzzzz.com>: Oct 02 04:29AM

>>> But almost any C++-compiler supports alloca().

>> Any C++11 compiler that supports C99 also supports VLA, I bet...

> Boy, you're such a mega-idiot.

Look in the mirror...


Bonita Montero <Bonita.Montero@gmail.com>: Oct 02 06:30AM +0200

>> .ident "GCC: (Debian 6.3.0-18+deb9u1) 6.3.0 20170516"
>> .section .note.GNU-stack,"",@progbits

> Yes, this is against optimisations as you use volatile...

I use volatile to prevent the compiler from optimizing away the
function body.

Melzzzzz <Melzzzzz@zzzzz.com>: Oct 02 04:36AM

>> Yes, this is against optimisations as you use volatile...

> I use volatile to prevent the compiler from optimizing away the
> function body.
"
Linus Torvalds has expressed his displeasure in the past over VLA usage
for arrays with predetermined small sizes, with comments like "USING
VLA'S IS ACTIVELY STUPID! It generates much more code, and much slower
code (and more fragile code), than just using a fixed key size would
have done." [6] With the Linux 4.20 kernel, Linux kernel is effectively
VLA-free.[7]
"


Bonita Montero <Bonita.Montero@gmail.com>: Oct 02 06:39AM +0200

>> That's wrong.

> As a general point, I don't think it is wrong - ...

CISC-like instructions that are used often in x86 are usually mapped to
the same uOPS, but they use only one slot in the decoder.

> A little googling suggests that the "enter" instruction is slower
> than the individual instructions, but that "leave" is quite fast.

ENTER isn't used because it is used to allocate static stack-frames,
i.e. stack-frames without variable parts like VLAs or alloca()-frames.
But static stack-frames are used by the compilers without EBP, so ENTER
isn't needed. But that doesn't mean that it is not efficient.

> There are plenty of other legacy CISC instructions that are slower
> than breaking them apart into equivalent simpler instructions, such
> as the "LOOP" instructions. ...

My Ryzen needs 2 clock-cycles for LOOP if the branch is taken; that's
the same number of clock cycles if you would assemble it from simpler
instructions.

Melzzzzz <Melzzzzz@zzzzz.com>: Oct 02 04:42AM

>> Yes, this is against optimisations as you use volatile...

> I use volatile to prevent the compiler from optimizing away the
> function body.
Try a more realistic example:

float read_val();
float process(int,float*);
float read_and_process(int n)
{
    float vals[n];

    for (int i = 0; i < n; ++i)
        vals[i] = read_val();

    return process(n, vals);
}

Where is `leave` now?

Frederick Gotham <cauldwell.thomas@gmail.com>: Oct 02 12:03AM -0700

On Monday, September 30, 2019 at 1:02:16 PM UTC+1, Bonita Montero wrote:
> I know alloca() neither is an official part of the C nor C++ program-
> ming-language. But I use it for performance-reasons. Can anyone name
> compilers that don't support alloca()?

I wonder if this would be a good reason to add a new compile-time operator to the C++ language? We could re-use the keyword 'extern' as follows:

#include <some_header.hpp>

int main()
{
    if extern(alloca)
    {
        /* alloca is declared so we can use it */
    }
    else
    {
        /* Do some other trick */
    }
}
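
For comparison, something close to this can already be approximated with the preprocessor. The sketch below is only an approximation of the idea: it tests for the <alloca.h> header with __has_include (standard since C++17, and offered as an extension by GCC and Clang in earlier modes) rather than testing whether alloca itself is declared. HAVE_ALLOCA_H, small_count and sum_squares_scratch are names invented for the example, and platforms that provide alloca through a different header simply take the fallback path here.

```cpp
#include <cstddef>
#include <vector>

// Detect the header, not the function: this is the closest portable
// stand-in for the proposed "if extern(alloca)".
#if defined(__has_include)
#  if __has_include(<alloca.h>)
#    include <alloca.h>
#    define HAVE_ALLOCA_H 1
#  endif
#endif
#ifndef HAVE_ALLOCA_H
#  define HAVE_ALLOCA_H 0
#endif

// Hypothetical cap on how many ints we are willing to put on the stack.
constexpr std::size_t small_count = 32;

// Sums i*i for i in [0, n), using scratch storage from alloca when the
// header exists and n is small, and from the heap otherwise.
int sum_squares_scratch(std::size_t n)
{
    std::vector<int> heap_storage;  // the "other trick"
    int* buf = nullptr;

#if HAVE_ALLOCA_H
    if (n <= small_count) {
        // alloca memory lives until this function returns, not until the
        // end of this block, so using buf below is fine.
        buf = static_cast<int*>(alloca(small_count * sizeof(int)));
    }
#endif

    if (buf == nullptr) {
        heap_storage.resize(n);
        buf = heap_storage.data();
    }

    for (std::size_t i = 0; i < n; ++i)
        buf[i] = static_cast<int>(i * i);

    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += buf[i];
    return total;
}
```

Calling sum_squares_scratch(4) uses the stack path where <alloca.h> is available, while sum_squares_scratch(100) always goes to the heap; the caller sees the same result either way.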

David Brown <david.brown@hesbynett.no>: Oct 02 10:01AM +0200

On 02/10/2019 06:36, Melzzzzz wrote:
>>>> int volatile a[s];
>>>> a[s - 1] = 123;
>>>> }

<snip>

>>> Yes, this is against optimisations as you use volatile...

>> I use volatile to prevent the compiler from optimizing away the
>> function body.

There are usually better ways to get this, such as giving a return value
that depends on the function inputs. If you have to use "volatile" to
ensure you are generating code that can be examined, then minimise its
use rather than making the whole array volatile.

(In a test function that used the VLA for calculations, gcc still
generated a "leave" instruction.)

> have done." [6] With the Linux 4.20 kernel, Linux kernel is effectively
> VLA-free.[7]
> "

VLA's with pre-determined small sizes result in /identical/ code to
arrays with fixed sizes.

void foo1(void) {
    const int arr_size = 12;
    int arr[arr_size];
    ...
}

void foo2(void) {
#define ARR_SIZE 12
    int arr[ARR_SIZE];
    ...
}

"foo1" has a VLA with a predetermined small size, while "foo2" has a
normal array with fixed size. These are different as far as the C
language is concerned, but identical in the generated code.

I /think/ what Torvalds is trying to say is that if you know that the
size "n" of your VLA is going to be at most "max_n", and you know "max_n"
is small, then you should use a fixed size array of size "max_n" instead
of a VLA. I believe that has both advantages and disadvantages. When
the array is of fixed size (i.e., known at compile time - even if it is
technically a VLA) the code for allocating and deallocating the space is
simpler. Code for accessing the contents of the array or other data
might be a little simpler. But it is all just one-time calculations.
Code that is run repeatedly - and if you have an array, you probably
have a loop - is going to be identical.

So you get the trade-off between a few extra instructions for the
variable VLA, balanced against the benefits of reduced cache use from a
smaller array. I'd not like to be categorical about which is best - it
would need measuring.

If the VLA is making the code fragile, then I expect it would be equally
fragile with a fixed size array.

(Fixed size arrays are also easier for analysis of stack usage, if that
is important.)
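
A minimal C++ sketch of the "fixed array of size max_n" alternative described above, assuming the caller can guarantee a small upper bound. The value 16 for max_n and the function name sum_of_squares are invented for the example; the thread gives no concrete numbers.

```cpp
#include <array>
#include <cstddef>

// Hypothetical upper bound; the thread only calls it "max_n".
constexpr std::size_t max_n = 16;

// Returns the sum of i*i for i in [0, n), or -1 if n exceeds max_n.
int sum_of_squares(std::size_t n)
{
    if (n > max_n)
        return -1;  // refuse rather than overrun the fixed buffer

    // Always max_n elements, so the frame size is known at compile time.
    std::array<int, max_n> vals{};
    for (std::size_t i = 0; i < n; ++i)
        vals[i] = static_cast<int>(i * i);

    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += vals[i];
    return total;
}
```

The loops still run over only the first n elements, so the repeated work is the same as with a VLA; what changes is the one-time frame setup and the worst-case stack footprint, which is the trade-off discussed above.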

David Brown <david.brown@hesbynett.no>: Oct 02 10:10AM +0200

On 02/10/2019 06:42, Melzzzzz wrote:

> return process(n, vals);
> }

> Where is `leave` now?

The compiler will use it if it is convenient, and not use it otherwise.
This sample code /does/ generate "leave" with gcc :

int f(int s)
{
    int a[s];
    for (int i = 0; i < s; i++) a[i] = i;
    int t = 0;
    for (int i = 0; i < s; i++) t += a[i];
    return t;
}

(clang does not generate a "leave" - it generates its usual unrolled
vector monstrosities that are highly efficient for huge loop counts, and
terrible for tiny counts.)

Basically, it seems that "leave" is equivalent to "mov %ebp, %esp" then
"pop %ebp". If the compiler would otherwise have generated these
instructions, then it uses "leave". If not, then it doesn't. "leave"
appears to be marginally more efficient than the separate instructions -
enough so to use it when there is the option, but not enough to go out
of its way to use it.

(The matching "enter" instruction is much slower than manual
manipulation of the registers, and is thus avoided.)

David Brown <david.brown@hesbynett.no>: Oct 02 10:19AM +0200

On 02/10/2019 05:08, Joe Pfeiffer wrote:
>> division routine.

> There was a time when it was true. I'd be pretty surprised if it still
> were.

I'm not quite sure what you mean by that. You are surely not saying the
68060 hardware division has got faster recently, or that software
division code for it has got slower!

Paavo Helde <myfirstname@osa.pri.ee>: Oct 02 11:30AM +0300

On 2.10.2019 10:03, Frederick Gotham wrote:
> if extern(alloca)
> {
> /* alloca is declared so we can use it */

Please, when proposing new extensions for C++, could we refrain from
things which would increase the danger of poorly predictable nasty
fatalities like stack overflows? There are already enough ways to shoot
someone's legs off in C++.

Ian Collins <ian-news@hotmail.com>: Oct 02 09:34PM +1300

On 02/10/2019 20:03, Frederick Gotham wrote:
> /* Do some other trick */
> }
> }

Not much use when a "function" is actually inlined code.

Best not to use dodgy things like alloca. Even its Linux man page
recommends against its use.

--
Ian.

David Brown <david.brown@hesbynett.no>: Oct 02 10:34AM +0200

On 02/10/2019 06:39, Bonita Montero wrote:
> i.e. stack-frames without variable parts like VLAs or alloca()-frames.
> But static stack-frames are used by the compilers without EBP, so ENTER
> isn't needed. But that doesn't mean that it is not efficient.

"enter" is only potentially useful when you have a fixed stack frame but
want a frame pointer - that is true. And usually you don't bother with
a frame pointer (they used to be popular for debuggers, but debuggers
are smarter these days).

But "enter" is not used even when it could be used, because it is very
slow. It is microcoded, and takes 12 micro-ops on modern x86 devices -
far more than "push ebp", "mov ebp, esp", "sub esp, xxxx" that replace it.

> My Ryzen needs 2 clock-cycles for LOOP if the branch is taken; that's
> the same number of clock cycles if you would assemble it from simpler
> instructions.

Interesting. On an Intel Skylake it is 7 µ-ops for a "loop" and 11 for
a "loopne".

<https://www.agner.org/optimize/instruction_tables.pdf>
<https://www.agner.org/optimize/>

This is why I rely on the compiler to generate good code for particular
chips. The x86 assembly world is too chaotic for my liking.

Frederick Gotham <cauldwell.thomas@gmail.com>: Oct 02 01:36AM -0700

On Wednesday, October 2, 2019 at 9:34:20 AM UTC+1, Ian Collins wrote:

> Not much use when a "function" is actually inlined code.

> Best not to use dodgy things like alloca. Even its Linux man page
> recommends against its use.

That's as ridiculous as telling people not to use "goto".

I'm working on an embedded Linux x86_64 firmware program right now and I've got two "goto" calls in my code.

Ian Collins <ian-news@hotmail.com>: Oct 02 09:41PM +1300

On 02/10/2019 21:36, Frederick Gotham wrote:

>> Best not to use dodgy things like alloca. Even its Linux man page
>> recommends against its use.

> That's as ridiculous as telling people not to use "goto".

Using goto is a tar and feather offense in my team. There is never a
justifiable use case in C++.

alloca and VLAs are a bomb waiting to go off in your code.

> I'm working on an embedded Linux x86_64 firmware program right now and I've got two "goto" calls in my code.

And you admit it?

--
Ian.

Frederick Gotham <cauldwell.thomas@gmail.com>: Oct 02 02:28AM -0700

On Wednesday, October 2, 2019 at 9:41:14 AM UTC+1, Ian Collins wrote:

> > I'm working on an embedded Linux x86_64 firmware program right now and I've got two "goto" calls in my code.

> And you admit it?

I could show you if you like.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Oct 02 02:35AM -0700

On 9/30/2019 5:02 AM, Bonita Montero wrote:
> I know alloca() neither is an official part of the C nor C++ program-
> ming-language. But I use it for performance-reasons. Can anyone name
> compilers that don't support alloca()?

Fwiw, the last time I used alloca was to offset thread stacks, using a
thread id, to get around the 64k aliasing problem on old Intel
hyperthreaded processors. It was needed to get around false sharing wrt
thread stacks! alloca was a simple convenience hack to get the damn job
done. Iirc, it was mentioned in an Intel paper. Well, I have something
even more hackish. The region allocator that can be stack based. It can
be fed with memory reaped from alloca:

http://pastebin.com/raw/f37a23918

https://groups.google.com/forum/#!original/comp.lang.c/7oaJFWKVCTw/sSWYU9BUS_QJ

Frederick Gotham <cauldwell.thomas@gmail.com>: Oct 02 02:43AM -0700

On Wednesday, October 2, 2019 at 10:35:31 AM UTC+1, Chris M. Thomasson wrote:
> even more hackish. The region allocator that can be stack based. It can
> be fed with memory reaped from alloca:

> http://pastebin.com/raw/f37a23918

I still remember the little earthquake I felt in my head 10 years ago the first time I ever walked into a morgue. I'm not saying that looking at your code was in any way similar to the experience I had 10 years ago, but I definitely did feel a mild unbalancing shake in my head as I reluctantly clicked the link and scrolled downward.

Compile a program from all C and C++ files in current folder

Frederick Gotham <cauldwell.thomas@gmail.com>: Oct 02 01:16AM -0700

Okay what do you think of this?

PROGRAM = super_cool_program

FLAGS_IN_COMMON = -pedantic -Wall -rdynamic -funwind-tables -fno-omit-frame-pointer -fdump-rtl-expand -Og -g
CXXFLAGS = $(FLAGS_IN_COMMON) -std=c++11
CFLAGS = $(FLAGS_IN_COMMON) -std=c99

CC = gcc
CXX = g++

SRC_FILES_C := $(wildcard *.c)
SRC_FILES_CXX := $(wildcard *.cpp)

OBJECTS = $(SRC_FILES_C:.c=.c.o) $(SRC_FILES_CXX:.cpp=.cpp.o)

DEPENDENCIES = $(OBJECTS:.o=.d)

.PHONY: all
all: $(PROGRAM) $(DEPENDENCIES)

$(PROGRAM): $(OBJECTS)
	$(CXX) $(FLAGS_IN_COMMON) -o $@ $^

#This rule makes dependencies files for the source files
.PRECIOUS: %.c.d
%.c.d: %.c
	$(CC) -MM -MT $(<:.c=.c.o) $< -MF $@

.PRECIOUS: %.cpp.d
%.cpp.d: %.cpp
	$(CXX) -MM -MT $(<:.cpp=.cpp.o) $< -MF $@

#This rule compiles the source files to object files
%.c.o: %.c %.c.d
	$(CC) $(CFLAGS) -o $@ -c $<

%.cpp.o: %.cpp %.cpp.d
	$(CXX) $(CXXFLAGS) -o $@ -c $<

#This next line pulls in all the dependency files (if they exist)
-include $(DEPENDENCIES)

.PHONY: clean
clean:
	rm -f $(DEPENDENCIES) $(OBJECTS) $(PROGRAM) $(OBJECTS:.o=*.expand)

.PHONY: install
install: $(PROGRAM)
	cp $< /usr/bin/

.PHONY: uninstall
uninstall:
	rm -f /usr/bin/$(PROGRAM)

Wednesday, October 2, 2019

Digest for comp.lang.c++@googlegroups.com - 25 updates in 3 topics
