- "Microsoft Azure CTO Mark Russinovich: C/C++ should be deprecated" - 9 Updates
- Order of initialization of static libraries in gcc - 3 Updates
- Never use strncpy! - 5 Updates
- How do C programmers do reliable string handling? - 7 Updates
- manual memory management, vs an automatic gc... - 1 Update
| Kaz Kylheku <864-117-4973@kylheku.com>: Sep 26 04:53AM > Non-GC is definitely not for embedded programming only. > I have seen a pretty large project for digital imaging workstations fail > because of the non deterministic nature of GC. And are you sure that you didn't see a large project fail for various reasons, whereby some people tried to shift the blame to garbage collection? Today someone will pull it off ... in the browser. (How many decades before the web was this?) -- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal |
| David Brown <david.brown@hesbynett.no>: Sep 26 08:42AM +0200 On 26/09/2022 08:07, Blue-Maned_Hawk wrote: >> the main one.) > I think it's possible to use other languages instead of Java, but i > don't have any sort of citation for that. Some easy references: <https://en.wikipedia.org/wiki/Kotlin_(programming_language)> <https://en.wikipedia.org/wiki/Android_software_development> Kotlin is now the preferred choice (I have little experience of Java, and none of Kotlin, so I have no idea of its pros and cons). But you can use all sorts of other languages too. In practice, a great many apps for Android are basically webpages - so HTML5 and JavaScript are all you need. > a terrible game that's somehow one of the most popular in the world, is > based upon Java for it's primary edition. > Minecraft is probably the Java application with most users. But Java has been hugely popular for in-house and dedicated programs for businesses, so there is a vast investment in Java code that most people never see. It may go out of fashion, but like Cobol, it will never die. |
| Juha Nieminen <nospam@thanks.invalid>: Sep 26 08:01AM >> are). Someone seeing just "convert(...)" in the code can't have any >> idea what it's doing. > It sounds like an argument against C++-style static polymorphism. Polymorphism has its uses (especially in templates), but also its abuses. Every good feature can be abused to make it a bad feature. And in situations where the shorter and more generic name is needed because of polymorphism (eg. in templates), my answer is almost always the same: "Implement convert_to_utf8_from_utf16() anyway, and make the equivalent convert() function call it. Use the former when you don't need the polymorphism, restrict the use of the latter only to those situations where it's actually needed." |
| Juha Nieminen <nospam@thanks.invalid>: Sep 26 08:01AM > giving an example of someone who chose a bad name in a particular > program? (I know about std::convert in Rust, but that does not seem to > be what you are talking about.) It's just a hypothetical example (which is actually based on real production code). |
| Juha Nieminen <nospam@thanks.invalid>: Sep 26 08:09AM >> Because why not. > "Rc" stands for "Reference Counted Smart Pointer". It's not just a > random 2-letter sequence. Of course it's not random, but it's needlessly short, for no reason nor advantage. |
| Juha Nieminen <nospam@thanks.invalid>: Sep 26 08:13AM > If that are your arguments against a lanuage - don't program at all. I thought you weren't reading any of my "nonsense". So why are you? Just go away, asshole. |
| Juha Nieminen <nospam@thanks.invalid>: Sep 26 08:19AM > these devices. Despite wanting the newest microcontrollers, embedded > programmers are a conservative bunch - C90 is, I think, the most popular > choice of language. (Yes, C90 - not C99.) In my experience working in the field I think C99 has gained popularity even among many (although not all) old-school "as bare metal as possible" embedded C programmers. Designated initializers are perhaps the best thing that has happened to C during its entire existence (which is why they have been widely adopted in the Linux kernel, and many other major C projects). |
| David Brown <david.brown@hesbynett.no>: Sep 26 01:55PM +0200 On 26/09/2022 10:19, Juha Nieminen wrote: > In my experience working in the field I think C99 has gained popularity > even among many (although not all) old-school "as bare metal as possible" > embedded C programmers. Certainly C99 has gained popularity, but I think many C programmers would be surprised to see how common C90 still is. It's also common to have a kind of mixture of C90 with bits of C99 - people might use single line comments but define all their local variables uninitialised at the top of a function and use "int" when they should use "bool". And the use of compiler extensions is also common - for many microcontrollers, it is unavoidable. > Designated initializers are perhaps the best thing that has happened to C > during its entire existence (which is why they have been widely adopted > in the Linux kernel, and many other major C projects). Designated initialisers are certainly nice, but I would not call them the "best" feature of C99. If I were to pick one favourite C99 feature, it would be mixing declarations and statements - but such choices are highly subjective. However, it's taken until C++20 to get designated initialisers in C++, which suggests that they were not viewed as the most important feature (though there has certainly been plenty of call for them in C++). |
| Juha Nieminen <nospam@thanks.invalid>: Sep 26 02:31PM > the "best" feature of C99. If I were to pick one favourite C99 feature, > it would be mixing declarations and statements - but such choices are > highly subjective. It is indeed highly subjective. Many C programmers are of the opinion that declaring variables within the code implementation is actually a bad thing, and they still prefer declaring all the variables of the function at the beginning. (If I'm not mistaken, this is actually one of the style requirements of the Linux kernel code. Although I'm not sure if it's outright required or just recommended.) When you have seen and programmed with designated initializers, however, you really start to appreciate them. If you are writing eg. a Linux kernel module, you essentially "fill out" some particular structs with the data required for your module to work. The great thing about designated initializers is that not only are these struct initializations much more readable, but moreover the code doesn't need to care what the order of the member variables is, or if there are more member variables before, after, or in-between the ones being initialized. It really makes things a lot easier. (It also allows for those structs to be changed by eg. adding new member variables without breaking tons of existing code.) > initialisers in C++, which suggests that they were not viewed as the > most important feature (though there has certainly been plenty of call > for them in C++). And C++20 ruined one of the best aspects of them: The fact that you don't need to know the order in which the member variables have been defined in the struct. (Not having to care about the order allows for refactoring of the struct by swapping things around. Also, the order of the member variables in the struct might have been chosen for space efficiency, while in the initialization you can use a more logical order of initialization by grouping related values together.) |
| Paavo Helde <eesnimi@osa.pri.ee>: Sep 26 01:36PM +0300 I know the order of TU-s is not determined when initializing the global statics. But what about static libraries? Are the global statics in static libraries initialized in the order I list the libraries on the linker command-line? If so, it seems the needed order would be in general opposite to the order needed for symbol resolving dependencies. I'm specifically interested in an answer for gcc, as this is where a third-party library was failing. For now it looks like I resolved the issue by re-shuffling and duplicating the static lib names in the link command line, but I wonder how permanent this fix is. |
| Bonita Montero <Bonita.Montero@gmail.com>: Sep 26 01:38PM +0200 Am 26.09.2022 um 12:36 schrieb Paavo Helde: > static libraries initialized in the order I list the libraries on the > linker command-line? If so, it seems the needed order would be in > general opposite to the order needed for symbol resolving dependencies. As there are usually no dependencies on the other global objects outside the library that's actually not a problem. |
| Paavo Helde <eesnimi@osa.pri.ee>: Sep 26 03:18PM +0300 26.09.2022 14:38 Bonita Montero kirjutas:
>> dependencies.
> As there are usually no dependencies on the other global
> objects outside the library that's actually not a problem.
Except when there is (a problem). You do not need a dependency on a global object, just calling a function using the non-constructed global object is enough. For fun, here are snippets from the concrete library which suddenly started to fail:

Library aws-crt-cpp:

source file Api.cpp, namespace level:

    Allocator *g_allocator = Aws::Crt::DefaultAllocator();

source file allocator.c:

    void *aws_mem_acquire(struct aws_allocator *allocator, size_t size) {
        AWS_FATAL_PRECONDITION(allocator != NULL); // <----- FAILS -----
        // ...
    }

header file StlAllocator.h:

    extern AWS_CRT_CPP_API Allocator *g_allocator;

    template <typename T> class StlAllocator : public std::allocator<T> {
    public:
        StlAllocator() noexcept : Base() { m_allocator = g_allocator; }
        // ...
        Allocator *m_allocator;
    };

Library aws-cpp-sdk-core:

Header file AWSString.h:

    using String = std::basic_string<char, std::char_traits<char>, Aws::Allocator<char>>;

Source file AWSConfigFileProfileConfigLoader.cpp, namespace level:

    const Aws::String IDENTIFIER_ALLOWED_CHARACTERS =
        R"(%+-./0123456789:@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz)"; |
| Juha Nieminen <nospam@thanks.invalid>: Sep 26 07:53AM > accesses for sizes that are bigger than native C/C++ types (such as > 128-bit accesses). Some cannot handle atomic writes for the bigger > native types. I think there's a bit of confusion here about what the term "atomic" means. You seem to be talking about a concept of "atomic" with some kind of meaning like "mutual exclusion supported by the CPU itself". That's not what "atomic" means in general, when talking about multithreaded programming. In general "atomic" merely means that the resource in question can only be accessed by one thread at a time (in other words, it implements some sort of mutual exclusion). As a concrete example: POSIX requires that fwrite() be atomic (for a particular FILE object). This means that no two threads can write to the same FILE object with a singular fwrite() call at the same time. In other words, fwrite() implements (at some level) some kind of (per FILE object) mutex. "Atomic" is actually a stronger guarantee than merely "thread-safe". If fwrite() were merely guaranteed to be "thread-safe", it would just mean that it won't break (eg. corrupt its internal state, or any other data anywhere else) if two threads call it at the same time, but it wouldn't guarantee that the data written by those two threads won't be interleaved somehow. However, since fwrite() is "atomic", not just "thread-safe" (if it conforms to POSIX), then it implements a mutex for the entire function call (for that particular FILE object). |
| Juha Nieminen <nospam@thanks.invalid>: Sep 26 07:54AM >> in order to copy it. (Why traverse it twice? You can copy it while >> traversing it for the first time!) > How about using memccpy? It is in C23, don't know about C++. That sounds like it would be the exact tool for the job. |
| Bonita Montero <Bonita.Montero@gmail.com>: Sep 26 10:21AM +0200 Am 26.09.2022 um 09:53 schrieb Juha Nieminen: > I think there's a bit of confusion here about what the term "atomic" > means. You seem to be talking about a concept of "atomic" with some > kind of meaning like "mutual exclusion supported by the CPU itself". You're confused, David not. |
| Juha Nieminen <nospam@thanks.invalid>: Sep 26 08:34AM > You're confused, David not. Just go away, asshole. |
| David Brown <david.brown@hesbynett.no>: Sep 26 01:39PM +0200 On 26/09/2022 09:53, Juha Nieminen wrote: > I think there's a bit of confusion here about what the term "atomic" > means. You seem to be talking about a concept of "atomic" with some kind > of meaning like "mutual exclusion supported by the CPU itself". No, that's not what I am saying. > multithreaded programming. In general "atomic" merely means that the > resource in question can only be accessed by one thread at a time > (in other words, it implements some sort of mutual exclusion). And that's not quite right either. "Atomic" means that accesses are indivisible. As many threads as you want can read or write to the data at the same time - the defining feature is that there is no possibility of a partial access succeeding. We've mostly mentioned reads and writes - but more complex transactions can be atomic too, such as increments. The term can also apply to collections of accesses, well-known from the database world. Such atomic transactions need to be built on top of low-level atomic accesses with locks, lock-free algorithms, or more advanced protocols such as software transactional memory. Atomic accesses do not have to be purely hardware implementations, though that is the most efficient - and anything software-based is going to depend on smaller hardware-based atomic accesses. By far the most convenient accesses are when you can read or write the memory with normal memory access instructions, or at most by using things such as a "bus lock prefix" available on some processors. On RISC processors, anything beyond a single read or write of a size handled directly by hardware typically involves load-store-exclusive sequences. When you have to use code sequences for access, then it's common that you end up with mutual exclusion - one thread at a time has access. But it doesn't have to be that way, and different software sequences can be used to optimise different usage patterns. 
All that matters is that if a read sequence exits happily saying "I've read the data", then the data it read matches exactly the data that some thread wrote at some point. > to the same FILE object with a singular fwrite() call at the same > time. In other words, fwrite() implements (at some level) some kind > of (per FILE object) mutex. That's at a much higher level than has been under discussion here - but yes, that is applying the same term and guarantees for different purposes. (The "atomic" requirement does not force a mutex, but fwrite() has other guarantees beyond mere atomicity.) > data anywhere else) if two threads call it at the same time, but it > wouldn't guarantee that the data written by those two threads won't > be interleaved somehow. "Thread safe" is not as well-defined a term as "atomic", as far as I see it. > However, since fwrite() is "atomic", not just "thread-safe" (if it > conforms to POSIX), then it implements a mutex for the entire function > call (for that particular FILE object). "Atomic" is not really enough to describe the behaviour of a function like "fwrite", since the function does not act on a single "state". If you have two threads trying to write A and B to the same object simultaneously, atomicity means that a third thread reading the object will see A or B, and never a mixture. It's fine if this is implemented by a write of A then a write of B, a write of B then a write of A, a write of A alone, a write of B alone, a lock blocking the thread then a mix of A, B, C and D that gets sorted into one of A or B before the lock is released, or any other combination. Clearly that is not the behaviour you want from fwrite() - here there should be either A then B, or B then A. |
| Gawr Gura <gawrgura@mail.hololive.com>: Sep 25 04:34PM -0700 On 9/25/22 15:01, JiiPee wrote: > I mean, it opens much more doors to human mistakes or risks. I prefer the convenience of C++ but if you need to ensure good behavior in C you can use the technique I outlined. I think the amount of risk undertaken by a competent programmer in this case is very low. |
| JiiPee <kerrttuPoistaTama11@gmail.com>: Sep 26 07:27AM +0300 On 26/09/2022 02:02, Richard Damon wrote: >> I mean, it opens much more doors to human mistakes or risks. > But C doesn't have references, so you can't do that. > In C++, it would be private, so you couldh't do it. oh but could use pointer then. so if it was int* l |
| JiiPee <kerrttuPoistaTama11@gmail.com>: Sep 26 07:32AM +0300 On 26/09/2022 02:34, Gawr Gura wrote: > I think the amount of risk > undertaken by a competent programmer in this case is very low. Was it the Python creator, or the creator of some other language, who said that a good programmer does not accidentally change a public member variable? |
| Juha Nieminen <nospam@thanks.invalid>: Sep 26 08:33AM > void destroy_string(struct string *const str); > size_t get_string_length(const struct string *const str); > /* etc. */ Which is a horrendously inefficient thing to do. Basically never do that! (I have even seen suggestions of doing the above with very simple struct types, like ones containing a couple of ints, and which would ostensibly be instantiated millions of times (such as structs representing a point or a pixel). Burn that kind of suggestion with fire!) |
| Malcolm McLean <malcolm.arthur.mclean@gmail.com>: Sep 26 02:03AM -0700 On Saturday, 24 September 2022 at 18:23:10 UTC+1, JiiPee wrote: > programmers represend a string? > How do C programmers make string safe that the above problem does not > occur (that the lenght of the string goes wrong)? If the program's primary purpose is not string processing, then it's best to represent string as character pointers to nul-terminated sequences of bytes. This is because the efficiency improvement you can get from storing the length separately isn't worth the additional complexity and possibility for error. If you are worried about reliability, it's best to make a rule that a string is either a string literal (embedded in double quotes), or allocated with malloc. So there's no confusion between the string and the buffer which holds the string. This also causes a slight efficiency loss, since most strings are short and it means short allocations. (Note that this won't always be feasible in embedded applications, where malloc can be problematic.) |
| Paavo Helde <eesnimi@osa.pri.ee>: Sep 26 12:18PM +0300 26.09.2022 11:33 Juha Nieminen kirjutas:
> types, like ones containing a couple of ints, and which would ostensibly
> be instantiated millions of times (such as structs representing a point
> or a pixel). Burn that kind of suggestion with fire!)
If by that you mean that dynamic allocations are slow, then you can create also larger opaque types in C without any dynamic allocations:

    struct item_tag {
        // use a type guaranteeing proper alignment,
        // choose big enough N to cover the real struct size.
        uint64_t opaque[N];
    };
    typedef struct item_tag item_t;

    void InitItem(item_t* it) {
        // cast it to real struct type and do things.
        // ...
    }

    // Client C code example:
    item_t x;
    InitItem(&x);
    // ...
    DestroyItem(&x);

This way one could also support e.g. SSO strings, avoiding both arbitrary string length limits and excessive dynamic allocation overheads. Been there, done that. |
| Juha Nieminen <nospam@thanks.invalid>: Sep 26 09:47AM > If by that you mean that dynamic allocations are slow, then you can > create also larger opaque types in C without any dynamic allocations: Of course it depends on what exactly the struct is for, and how it's used. For example, if it's a large struct which basically contains nothing the programmer may be directly interested in, and which is ostensibly instantiated only relatively rarely and infrequently, then it's not wrong to use this idiom per se. (A lot of C libraries do this, such as libpng, libz, etc, and that's ok, because those structs are usually not instantiated in the millions nor accessed in tight inner loops requiring maximum speed.) However, when it comes to small structs that are instantiated in the millions and which should be as efficient as possible, this idiom would completely kill the performance. Not only is instantiating them slow, but also handling them is very slow as well (in comparison to the structs being "public" and directly accessed.) This is especially so in number-crunching applications (which things like image manipulation etc. tend to be in practice). Not only would this idiom consume significantly more RAM than necessary, and not only would instantiating the objects be slower, but accessing them would be a lot slower as well. (Modern compilers are relatively good at autovectorizing linear accesses to values in an array. However, if these accesses are done via non-inline functions, ie. resulting in actual function calls, that pretty much kills all these autovectorization optimizations.) > This way one could also support e.g. SSO strings, avoiding both > arbitrary string length limits and excessive dynamic allocation > overheads. Been there, done that. If such a struct is intended to be as efficient as possible, then that idiom might be acceptable, assuming that all the accessor functions are inline. |
| Juha Nieminen <nospam@thanks.invalid>: Sep 26 08:29AM > Being able to create objects all over the place willy-nilly, and never > even have to think about destroying them... Is a "convenience" that a GC > can help one out with... I think that as data-oriented design (as opposed to object-oriented design) is gaining popularity, especially in certain fields of programming that require extreme efficiency (such as game engines), the need for automatic garbage collection is diminishing, at least in those fields. The problem with automatic GC is that it's mostly needed when you allocate dynamically individual objects (which is the case with most GC'd languages). However, using individually allocated objects is a performance killer. (In fact, using "objects" at all, ie. class instances, is a performance killer.) DOD doesn't require individually allocated objects, as everything is put into arrays. (And not as in arrays of objects. Arrays of individual values, which would normally be class member variables.) Since optimally all the dynamically allocated data is in arrays, and no "object" refers to any other "object", the need for automatic GC is significantly lessened. In contrast, low-level control of what the compiler produces is significantly more important. |