Sunday, July 15, 2018

Digest for comp.lang.c++@googlegroups.com - 17 updates in 4 topics

Bart <bc@freeuk.com>: Jul 15 01:16AM +0100

On 15/07/2018 00:12, Ian Collins wrote:
>> level requirements I've worked out how to balance low-level static code
>> and scripting code.
 
> Some of us need performance from higher level code...
 
That's where getting the balance right comes in.
 
I guess you're familiar with getting tasks done by a combination of
shell scripts plus actual compiled programs that do the real work. There
wouldn't be much advantage in writing everything in 100% C or C++.
 
Well I believe in doing rather more using scripting. That still leaves
plenty of work for a fast language somewhat more advanced than C, but it
might not need to go quite as far as C++ does.
 
(My main text editor for the last few years has been written in 100%
interpreted code. You don't notice until you get to 5 or 10-million-line
files, when some operations become slow, but then they use simple
techniques. And I've used other editors (such as gedit on Linux, which I
guess is not interpreted) which can be slow enough to be almost unusable.)
 
--
bart
Melzzzzz <Melzzzzz@zzzzz.com>: Jul 15 01:08AM

> files then some operations are slow, but then they use simple
> techniques. And I've used other editors (such as gedit on Linux, which I
> guess is not interpreted) which can be slow enough to be almost unusable.)
 
Where have you found 5 or 10-million line files ;)?
 
--
press any key to continue or any other to quit...
Ian Collins <ian-news@hotmail.com>: Jul 15 01:24PM +1200

On 15/07/18 12:16, Bart wrote:
 
> I guess you're familiar with getting tasks done by a combination of
> shell scripts plus actual compiled programs that do the real work. There
> wouldn't be much advantage in writing everything in 100% C or C++.
 
For real time control software, performance is crucial. If you don't
meet your performance budget, things fail in horrible ways or you have
to blow your hardware budget on faster processors.
 
> Well I believe in doing rather more using scripting.
 
Belief is irrelevant when you have budgets to meet!
 
--
Ian.
bitrex <user@example.net>: Jul 15 12:17AM -0400

On 07/12/2018 10:39 PM, Tim Rentsch wrote:
>> directly into object code, is really nothing much more than a mild
>> superset of C, [...]
 
> "a mild superset". That's a good one. :)
 
Without metaprogramming (and all the stuff that utilizes template
metaprogramming like the STL) C++ really isn't much to write home about
in the year of our Lord 2018.
Paavo Helde <myfirstname@osa.pri.ee>: Jul 15 10:48AM +0300

On 15.07.2018 1:07, Bart wrote:
 
>> C wasn't.
 
> You tend to adapt code, algorithms and data structures to suit the
> language.
 
Well, that's a point in favor of C++: it provides much more flexible
means for expressing the subject-area concepts directly in the language.
 
> be counted arrays (ie. carrying their length with them), then in C I
> might write them as simply-linked lists, especially with elements more
> involved than simple ints.
 
A linked list with each node allocated dynamically is about the worst
data structure one can come up with, performance-wise. It is slow when
constructed, slow when destructed, and slow when traversed (because of
memory hopping).
 
There are very few usage cases for std::list in C++, and I haven't
encountered any in ~20 years of C++ programming. Each time either
std::deque or std::vector has been a better fit.
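 
The layout difference behind that claim is easy to see in a few lines; a minimal sketch (not a benchmark, and the function names are made up):
 
```cpp
#include <list>
#include <numeric>
#include <vector>

// Same logical sequence, two layouts: std::vector stores its elements in
// one contiguous block, while std::list makes a separate heap allocation
// per node, so traversal hops between those allocations.
long long sum_vector(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0LL);
}

long long sum_list(const std::list<int>& l) {
    return std::accumulate(l.begin(), l.end(), 0LL);
}
```
 
Both compute the same result; only a profiler on realistic data sizes will show the cache effect.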
 
 
> all the stuff in C++ (although string handling is a PITA). For higher
> level requirements I've worked out how to balance low-level static code
> and scripting code.
 
In a scripting language the language designer has taken all the
technical decisions: are objects stored by value or allocated
dynamically; are they garbage-collected, reference-counted or something
else; is the copy deep or shallow, or maybe copy-on-write (shallow copy
emulating deep copy); how automatic are the conversions; are the data
structures multithread-safe or single-threaded; etc.
 
This means that a scripting language is much less flexible than C++
where one can choose all such points as the best fit for each usage
case. Of course greater flexibility means greater responsibility and
more brain cycles to figure out the correct solution.
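 
A sketch of what choosing those points yourself can look like in C++ (`Doc` and `CowDoc` are made-up names; the copy-on-write wrapper is single-threaded):
 
```cpp
#include <memory>
#include <string>

// The same payload under three ownership policies that a scripting
// language would normally fix for you.
struct Doc { std::string text; };

// 1) Doc by value: deep copy on assignment, no heap indirection.
// 2) std::shared_ptr<Doc>: reference-counted, copies are shallow.
// 3) Copy-on-write (shallow copy emulating deep copy), sketched below:
struct CowDoc {
    std::shared_ptr<Doc> p;
    const std::string& read() const { return p->text; }
    void write(std::string s) {
        if (p.use_count() > 1)               // still shared: detach first
            p = std::make_shared<Doc>(*p);
        p->text = std::move(s);
    }
};
```
 
Copies of a `CowDoc` share storage until one of them writes, at which point it quietly takes its own copy.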
 
Typically a scripting language makes some more or less sensible choices
and hopes these are good enough for a large audience. E.g. Python has
everything dynamically allocated and reference-counted, lots of global
mutable state, not multithread-safe (requiring a global lock), shallow
copy (creating multiple references into the data structures which may be
hard to manage). As a result it is pretty easy to use and hack in simple
cases, a bit harder to work out long-term robust solutions, and the
performance is overall not so great. Because of the last point, Python
needs extensions written in C or C++ for performance-critical work, e.g.
NumPy, Keras+Tensorflow etc.
 
I have nothing against the scripting languages, I'm developing and
maintaining one myself. However, I would not like to write code in one,
because of limited expressiveness, lack of strict type checking and less
thorough error checking in general. Ironically, the C language shares
all these points, so to me it appears more similar to scripting
languages than to C++.
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jul 15 12:48PM +0200

On 15.07.2018 09:48, Paavo Helde wrote:
 
> There are very few usage cases for std::list in C++, and I haven't
> encountered any in ~20 years of C++ programming. Each time either
> std::deque or std::vector has been a better fit.
 
I found one use case a few days ago: to create and hold on to a set of
instances of the item type, dynamically.
 
Because nothing's invalidated by inserting into a `std::list`.
 
Well I used a `std::forward_list`, since it would never be traversed.
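 
That stability guarantee is easy to demonstrate; a minimal sketch:
 
```cpp
#include <forward_list>
#include <string>

// Pointers and references into a std::forward_list stay valid no matter
// how many elements are inserted later; a std::vector may reallocate and
// invalidate them.
std::forward_list<std::string> items;

std::string* keep_and_grow() {
    items.push_front("first");
    std::string* stable = &items.front();  // hold on to this element
    for (int i = 0; i < 1000; ++i)
        items.push_front("filler");        // never invalidates *stable
    return stable;
}
```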
 
 
Cheers!,
 
- Alf
Bart <bc@freeuk.com>: Jul 15 12:55PM +0100

On 15/07/2018 02:08, Melzzzzz wrote:
>> techniques. And I've used other editors (such as gedit on Linux, which I
>> guess is not interpreted) which can be slow enough to be almost unusable.)
 
> Where have you found 5 or 10-million line files ;)?
 
Test outputs or sometimes test inputs of various programs. Or sometimes
files containing textual data.
 
Is this really so exceptional?
 
Obviously normal source files and such will be a few thousand lines at most.
 
--
bart
Bart <bc@freeuk.com>: Jul 15 12:56PM +0100

On 15/07/2018 02:24, Ian Collins wrote:
> to blow your hardware budget on faster processors.
 
>> Well I believe in doing rather more using scripting.
 
> Belief is irrelevant when you have budgets to meet!
 
How about deadlines?
 
--
bart
Bart <bc@freeuk.com>: Jul 15 02:09PM +0100

On 15/07/2018 08:48, Paavo Helde wrote:
> data structure one can come up with, performance-wise. It is slow when
> constructed, slow when destructed, and slow when traversed (because of
> memory hopping).
 
(Interesting. I have several compilers and an assembler which all make
heavy use of linked lists (and in intricate ways: see below).
 
The assembler can process input at up to 4 million lines per second, two
of the compilers can approach one million lines per second.
 
It would be great if using C++ container types could make them even faster!)
 
> There are very few usage cases for std::list in C++, and I haven't
> encountered any in ~20 years of C++ programming. Each time either
> std::deque or std::vector has been a better fit.
 
That wouldn't suit my styling of coding. In one compiler, the main
symbol table record includes these fields (expressed as C):
 
struct _strec {
    ...
    struct _strec* owner;     // up
    struct _strec* deflist;   // down
    struct _strec* deflistx;  // down (points to end of list)
    struct _strec* nextdef;   // hoz (singly linked)
    struct _strec* nextdupl;  // hoz (these 2 doubly linked)
    struct _strec* prevdupl;  //
    ...
    struct _strec* nextmacro; // hoz (singly linked)
    ...
};
 
The record can be linked in up to 7 different ways to other records (how
would std::list work here?). Your deque and vector suggestions are akin
to how I used to do this in interpreted code:
 
global record strec =
    ...
    var owner
    var deflist
    var dupllist
    ...
 
Tidier (and more convenient since those lists are random access and
contain their bounds), but the implementation was much slower. I would
imagine the equivalent workings inside C++ would also have overheads
above those of using bare linked lists.
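 
(In C++ the usual answer to that kind of multi-way linking is intrusive links: `std::list` owns its nodes, so one record cannot sit on seven lists that way, but links stored inside the record can. A hypothetical sketch, with a pointer-to-member selecting which chain to use:
 
```cpp
// One record participating in two chains at once, because the link
// fields live inside the record itself (intrusive linking).
struct Sym {
    const char* name;
    Sym* nextdef  = nullptr;   // chain 1: the owner's definition list
    Sym* nextdupl = nullptr;   // chain 2: the duplicate-name list
};

// Prepend n to whichever chain `link` selects.
void push(Sym*& head, Sym* n, Sym* Sym::* link) {
    n->*link = head;
    head = n;
}
```
 
Boost.Intrusive packages the same idea with iterators and safety checks.)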
 
 
> performance is overall not so great. Because of the last point, Python
> needs extensions written in C or C++ for performance-critical work, e.g.
> NumPy, Keras+Tensorflow etc.
 
In the case of NumPy, Python is being used as a front end to a
scientific numeric library written in a faster language. I understand
that it works well (as scientists find it easier to write Python than C++).
 
(For my purposes, I find Python and its ecosystem to be large, sprawling
and cumbersome; NumPy was so difficult to install at one time that the
simplest suggestion was to install Anaconda, a 1GB download that
included NumPy. However C++ guys would feel right at home...)
 
The scripting language I use is lower level, less dynamic, and usually
executes more briskly (than Python) without having to use accelerators.
 
> I have nothing against the scripting languages, I'm developing and
> maintaining one myself. However, I would not like to write code in one,
> because of limited expressiveness,
 
That's not often a criticism of scripting languages. Usually you can
express exactly what you like and leave it to the implementation to
do the work needed.
 
Mine for example can use Pascal-style sets and set constructors,
something I've rarely seen anywhere else:
 
identstarter := ['A'..'Z', 'a'..'z', '_', '$']
identbody := identstarter + ['0'..'9'] - ['Z'] #exclude Z
if c in identstarter then ...
 
I expect both Python and C++ have a Set implementation so powerful that
it can do anything - except construct simple sets as easily as above.
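 
(For comparison, a rough C++ analogue of those set constructors can be sketched with `std::bitset`; `chars` and `in` are made-up helpers, not standard names:
 
```cpp
#include <bitset>

// A 256-bit set of character codes, Pascal-set style.
using CharSet = std::bitset<256>;

CharSet chars(unsigned char lo, unsigned char hi) {   // e.g. ['a'..'z']
    CharSet s;
    for (int c = lo; c <= hi; ++c) s.set(c);
    return s;
}

bool in(char c, const CharSet& s) {
    return s.test(static_cast<unsigned char>(c));
}

// identstarter := ['A'..'Z', 'a'..'z', '_', '$']
const CharSet identstarter =
    chars('A', 'Z') | chars('a', 'z') | CharSet{}.set('_').set('$');

// identbody := identstarter + ['0'..'9'] - ['Z']
const CharSet identbody =
    (identstarter | chars('0', '9')) & ~CharSet{}.set('Z');
```
 
Workable, but clearly more ceremony than the built-in notation.)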
 
--
bart
woodbrian77@gmail.com: Jul 15 10:22AM -0700

On Sunday, July 15, 2018 at 2:48:40 AM UTC-5, Paavo Helde wrote:
> hard to manage). As a result it is pretty easy to use and hack in simple
> cases, a bit harder to work out long-term robust solutions, and the
> performance is overall not so great.
 
I think that's an understatement:
https://blog.famzah.net/2016/09/10/cpp-vs-python-vs-php-vs-java-vs-others-performance-benchmark-2016-q3/
 
Python was 18 times slower than C++.
 
> maintaining one myself. However, I would not like to write code in one,
> because of limited expressiveness, lack of strict type checking and less
> thorough error checking in general.
 
They are known for getting something working fast, but
then you are stuck with either biting the bullet and
converting to another language or digging a deeper hole.
 
 
Brian
Ebenezer Enterprises - Enjoying programming again.
http://webEbenezer.net
Ian Collins <ian-news@hotmail.com>: Jul 16 08:30AM +1200

On 15/07/18 23:56, Bart wrote:
 
>>> Well I believe in doing rather more using scripting.
 
>> Belief is irrelevant when you have budgets to meet!
 
> How about deadlines?
 
What about them? Scripting languages don't scale as well as compiled
languages for big projects with a big team.
 
--
Ian
Bart <bc@freeuk.com>: Jul 15 11:08PM +0100

On 15/07/2018 21:30, Ian Collins wrote:
 
>> How about deadlines?
 
> What about them?  Scripting languages don't scale as well as compiled
> languages for big projects with a big team.
 
I would have thought the opposite. It depends on the design of the
scripting language, but it is possible for one module to be
independently compiled and run (if compilation is even a requirement)
without all modules needing to be combined in a linker-like process into
a monolithic binary before any change can be tested.
 
That also means individuals can develop, run and test their own scripts
without knowing what others are doing.
 
And when I used the technique a lot, the language was an add-on to the
application, and users - even 1000s if I had that many - could write
scripts to interact with their copy of the application, without the
application needing rebuilding each time.
 
(In early days it was also a necessity for script files to reside on
floppy and to only load and run the one that was necessary for a
specific command. In that form, the program could grow indefinitely with
hundreds of modules stored on dozens of floppies. Modules that could be
written by different people. That's why I say it is more scalable.)
 
Anyway isn't something like Javascript used on quite a big scale?
 
--
bart
Andrew Goh <andrewgoh0@gmail.com>: Jul 15 07:47AM -0700

hi all,
 
this is probably as old as new / delete in C++, or even older than C++ itself
 
what's the current state of the art for 'dynamic' memory management, in the sense of *garbage collection*?
e.g.
there are smart pointers today, e.g. shared_ptr
 
there is Boehm garbage collector
http://www.hboehm.info/gc/
is it still very popular, or commonly used today?
 
smart pointers are possibly simply 'good', but I've come across some articles stating that shared_ptr in boost is *slow* (some 10x slower than raw pointers); is that still true today?
 
the other thing would be: has anyone embedded smart pointers in place of (ordinary) pointers in complex linked data structures such as elaborate trees (e.g. AVL trees), complex hash maps / linked lists, in 'objects linked to objects' in complex graph-type scenarios? possibly with circular references
 
notions of things like
 
root - o1 - o2 - o3 - o4 - o5
 
^^ this can be a complex tree / linked list / graph structure
 
would releasing the smart pointer between root-o1 automatically cause o1 to o5 to be 'garbage collected' and recovered?
 
the Boehm garbage collector is possibly good and may help resolve the circular references problem
however, the Boehm garbage collector is conservative, and does only mark and sweep
this could leave a lot of memory uncollected
e.g. in the above example when root-o1 is disconnected gc would need to figure out all the linked nodes in the graph and garbage collect them
and that the use of mark and sweep would leave fragmented memory after collection
 
is there something such as a mark-and-compact garbage collector? if a simple implementation is difficult, perhaps with the combined use of smart pointers?
 
thanks in advance
Paavo Helde <myfirstname@osa.pri.ee>: Jul 15 07:34PM +0300

On 15.07.2018 17:47, Andrew Goh wrote:
 
> there is Boehm garbage collector
> http://www.hboehm.info/gc/
> is this still very popular or is this commonly being used today?
 
In C++ the only sensible way to have GC is to logically think that all
allocated objects are leaked, but that's ok as you have got infinite
memory. Behind the scenes the memory gets reused of course, but this
should not have any visible effect on the program behavior.
 
This means that any kind of non-memory resources will need extra care at
all levels as you need to ensure they get released properly and are not
left to GC, which will run at unpredictable times in a random thread.
 
I believe the current consensus is that with RAII one can control both
memory and non-memory resources in the same way and with less hassle, so
RAII all the way down it is. Thus GC is effectively not needed in C++.
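 
A minimal illustration of that consensus, using only standard components (`open_file` is a made-up helper):
 
```cpp
#include <cstdio>
#include <memory>

// RAII ties the FILE* to a scope: the file is closed on every exit path,
// deterministically, instead of whenever a collector gets around to it.
using FilePtr = std::unique_ptr<std::FILE, decltype(&std::fclose)>;

FilePtr open_file(const char* path, const char* mode) {
    return FilePtr(std::fopen(path, mode), &std::fclose);
}
```
 
The same pattern covers locks, sockets, handles - memory is just the most common case.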
 
 
> smart pointers are possibly simply 'good' but i've come across some articles stating that share_ptr in boost is *slow* (some 10x slower than without), is that still true today?
 
The main slowness in C++ nowadays comes from multithread
synchronization. Both dynamic memory allocations and std::shared_ptr
reference count updates are guaranteed to be multithread-safe, so they
are inherently slow in this regard. Note that this only matters if you
have millions or billions of them.
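 
One practical consequence, sketched (function names are made up): a by-value shared_ptr parameter costs one atomic increment on entry and one atomic decrement on exit, while the other two signatures never touch the reference count.
 
```cpp
#include <memory>

struct Big { int data[256]; };

int by_value(std::shared_ptr<Big> p)            { return p->data[0]; } // atomic +1/-1
int by_const_ref(const std::shared_ptr<Big>& p) { return p->data[0]; } // no count update
int observe(const Big* p)                       { return p->data[0]; } // non-owning view
```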
 
Anyway, if the performance appears to be a problem for you then your
best bet is to reduce the number of multithread synchronizations.
Avoiding std::shared_ptr or using a single-threaded version of it does
not buy you much if you still allocate each node in your data structures
separately. One possible approach is to use memory pools instead so that
e.g. the whole AVL tree is constructed in a pool which will be allocated
and released in one or few steps.
 
Inside the memory pool one can implement links either as raw pointers or
better yet as integer offsets. The latter can save a lot of memory (been
there, done that, loading a 200 MB XML file into a DOM tree which would
not eat up all the CPU and RAM the computer had got).
 
Inside a memory pool cycles are a non-issue as well, the whole pool will
be released in a single step anyway.
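 
A minimal arena sketch along those lines (`Pool` and `Node` are made-up names; production code would handle alignment and object lifetimes more carefully):
 
```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// All nodes live in one buffer, links are 32-bit offsets instead of
// 64-bit pointers, and destroying the pool frees the whole structure in
// one step - cycles inside it need no special handling.
struct Pool {
    std::vector<std::byte> buf;
    Pool() { buf.resize(4); }               // reserve offset 0 as "null"

    template <class T>
    std::uint32_t make(const T& v) {        // returns an offset, not a pointer
        std::uint32_t off = static_cast<std::uint32_t>(buf.size());
        buf.resize(buf.size() + sizeof(T));
        new (buf.data() + off) T(v);
        return off;
    }
    template <class T>
    T* at(std::uint32_t off) {              // re-resolve after every make()
        return reinterpret_cast<T*>(buf.data() + off);
    }
};

struct Node {
    int value;
    std::uint32_t left = 0, right = 0;      // offsets; 0 means "no child"
};
```
 
Pointers returned by `at()` are only valid until the next `make()`, since the buffer may reallocate - another reason to store offsets, not pointers.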
 
 
> root - o1 - o2 - o3 - o4 - o5
 
> ^^ this can be a complex tree / linked list / graph structure
 
> would releasing the smart pointer between root-o1 automatically cause o1 to o5 to be 'garbage collected' and recovered?
 
Yes, if you need separate parts of your data structures to be released
ASAP, then you can indeed use smart pointers. If the number of objects
is in the thousands rather than in the millions, this ought to be OK.
Cycles will need special care.
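 
Sketched with the chain from your example (`O` is a made-up node type):
 
```cpp
#include <memory>

// Ownership runs one way along the chain, so resetting the head releases
// every downstream node; any back link must be a weak_ptr, or the cycle
// keeps the nodes alive forever.
struct O {
    std::shared_ptr<O> next;  // owning: root -> o1 -> o2 -> ...
    std::weak_ptr<O> prev;    // non-owning back edge: breaks the cycle
};
```
 
(For very long chains the implicit recursive destruction of `next` can get deep; an explicit loop that pops nodes off the front avoids that.)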
 
> this could leave a lot of memory uncollected
> e.g. in the above example when root-o1 is disconnected gc would need to figure out all the linked nodes in the graph and garbage collect them
> and that the use of mark and sweep would leave fragmented memory after collection
 
Even if the deallocation of nodes is delayed and performed in a
background thread or whatever, the time to allocate the nodes one by one
would still be significant. So I do not believe that using a Boehm
collector would automatically provide the best possible result.
woodbrian77@gmail.com: Jul 15 11:50AM -0700

On Sunday, July 15, 2018 at 11:34:54 AM UTC-5, Paavo Helde wrote:
> synchronization. Both dynamic memory allocations and std::shared_ptr
> reference count updates are guaranteed to be multithread-safe, so they
> are inherently slow in this regard.
 
To put it a little differently, you may be able to use
multiple instances of single-threaded processes that
use raw pointers and/or unique_ptr. That's what I do
with the C++ Middleware Writer:
https://github.com/Ebenezer-group/onwards
 
 
Brian
Ebenezer Enterprises - In G-d we trust.
http://webEbenezer.net
ram@zedat.fu-berlin.de (Stefan Ram): Jul 15 03:10PM

>what's the current state-of-art for 'dynamic' memory
>management, in the notion of *garbage collection*?
 
internals.rust-lang.org/t/herb-sutter-deferred-heaps-and-pointers/4183
github.com/hsutter/gcpp
www.youtube.com/watch?v=JfmTagWcqoE
Hergen Lehmann <hlehmann.expires.5-11@snafu.de>: Jul 15 10:38AM +0200

Am 14.07.2018 um 21:00 schrieb Rick C. Hodgin:
 
> the user will never see it.
 
> Is it a bottleneck in network processing on your system?  Does it need
> that extra 30% to 50% to handle maximum network throughput?
 
It's around 30% for algorithm-oriented stuff and more towards 50% when
multi-threading comes into place.
 
In regards to network throughput, the difference between Windows and
Linux may even reach several 100%. Of course, that's very likely not the
compiler's fault. The API concepts (epoll vs. IOCP) are fundamentally
different, and my understanding of the rather complex IOCP may not be
good enough to fully utilize its power, although a Windows expert
consulted back then did not find anything I was doing wrong...
 
> Visual Studio 2015 and 2017 do integrate GCC into their toolchain:
 
>     At 1:01:
>     www.youtube.com/watch?v=-3toI8L3Oug&t=1m1s
 
You overlooked the split second in which he clicked on
"cross-platform". It's only supported for cross-compilation, not for
Windows development. :-(
 
> smaller files with only those functions you're working on at the moment,
> and then it will compile much faster, and the change code applied will
> be much faster.
 
My project is already split into lots of rather small, independent cpp
files. But that does shift the compiler workload even more towards the
interface headers and increases the chance that a change does indeed
have to take place in one of these headers...
 
> I'll stick with MSVC++ for nearly everything, and cringe each time I
> have to go to a GCC+GDB or other Linux-based toolchain.
 
For me, it's the other way around.
System APIs are much more straightforward and easier to get an overview
of on Posix-like systems, while the two major compilers (g++, clang)
stick very close to the language standards. That saves a lot of time
figuring out the differences between theory and practice.
When switching over to Windows towards the end of the development phase,
I already have proven code and only need to figure out the MSVC++
quirks...
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.
