soft and program: Digest for comp.lang.c++@googlegroups.com

Looking for the tgaimg exe/code - 9 Updates
Poor Mans RCU... - 9 Updates
"The weirdest compiler bug" by Scott Rasmussen - 6 Updates
negating unsigned.... - 1 Update

Lew Pitcher <lew.pitcher@digitalfreehold.ca>: Feb 08 11:38PM

On Mon, 08 Feb 2021 22:53:26 +0000, Mike Garcia wrote:

> I'm looking for an old unix/win95'ish commandline executable.

> "TGA2IMG takes an uncompressed 24-bit Targa file and converts it to a
> format that was used by the Vivid raytracer."
[snip]
> Anyone got a win32 exe or the source code to tga2img?

Not me, nor anyone that I know. /But/...

Slackware Linux still provides a tgatoppm program that converts Targa
files to Portable Pixmap files. And, it appears that the tgatoppm program
is part of the NetPBM package available at http://netpbm.sourceforge.net/

Since tgatoppm has an Open Source licence, you can get the sourcecode
from the NetPBM site.

Would the tgatoppm source code help you?

--
Lew Pitcher
"In Skills, We Trust"

Eli the Bearded <*@eli.users.panix.com>: Feb 09 12:03AM

> It's an old program which converted tga images to other formats, including
> c code and header files (like bin2c), I can't seem to find the original
> exe or code to it on github or SF or a web search.

There's a lot of old software on ISOs at archive.org. Unfortunately many
of the ISOs don't provide searchable file listings. If I were desperate,
I'd start looking there.

Elijah
------
archive.org has a real "find this item" problem

Stef <stef33d@yahooI-N-V-A-L-I-D.com.invalid>: Feb 09 09:26AM +0100

On 2021-02-08 Mike Garcia wrote in comp.lang.c:

> It's an old program which converted tga images to other formats, including
> c code and header files (like bin2c), I can't seem to find the original
> exe or code to it on github or SF or a web search.

Are you looking for this specific program or a way to open Targa files
and save them as C header files?

If the latter, you can probably use GIMP. It can open targa files and
save as C header files.

--
Stef (remove caps, dashes and .invalid from e-mail address to reply by mail)

Cogito cogito ergo cogito sum --
"I think that I think, therefore I think that I am."
-- Ambrose Bierce, "The Devil's Dictionary"

Anton Shepelev <anton.txt@g{oogle}mail.com>: Feb 09 04:13PM +0300

Mike Garcia:

> "TGA2IMG takes an uncompressed 24-bit Targa file and
> converts it to a format that was used by the Vivid
> raytracer."

Is it a quotation from the manual?

> github or SF or a web search.

> Here's what the usage command looks like:
> tga2img -q -r4 -i320,0 -p0,480 file.tga

Is it an exact sample invocation of the program?

> I've found 2 different programs but it's not it:
> [...]

I got the same results, but I have also found some
alternative software that can convert .tga to the Vivid
raytracer .img, but you seem to need that specific utility.

--
() ascii ribbon campaign - against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]

Eli the Bearded <*@eli.users.panix.com>: Feb 09 05:54PM

> and save them as C header files?

> If the latter, you can probably use GIMP. It can open targa files and
> save as C header files.

There's "tgatoppm foo.targa | ppmtoxpm > foo.c" using netpbm tools,
for at least one flavor of image-as-C-code image. I'd expect the
particular flavor of C encoding is significant.

Elijah
------
for many purposes PPM is easier to use in C than XPM
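
On the image-as-C-code point: when the particular flavor of encoding does not matter, a bin2c-style dump is simple to sketch. This is only an illustration, not tga2img and not any netpbm tool; the function name `bytes_to_c_array` and the output layout are assumptions made up for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: renders raw bytes as a C array definition,
// twelve bytes per line. Assumes `data` is non-empty, since a
// zero-length array is not valid C.
std::string bytes_to_c_array(const std::vector<unsigned char>& data,
                             const std::string& name)
{
    std::string out = "const unsigned char " + name + "[" +
                      std::to_string(data.size()) + "] = {";

    for (std::size_t i = 0; i < data.size(); ++i)
    {
        char buf[8];
        std::snprintf(buf, sizeof buf, "0x%02x",
                      static_cast<unsigned>(data[i]));

        // start a new indented line every twelve bytes.
        out += (i % 12 == 0) ? "\n    " : " ";
        out += buf;

        if (i + 1 != data.size()) out += ",";
    }

    out += "\n};\n";
    return out;
}
```

Feeding it the bytes of a file and writing the result to a header gives roughly what bin2c produces; the pixel layout inside those bytes is left entirely to the caller.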

Mike Garcia <mike@mgarcia.nospam>: Feb 09 10:32PM

On Mon, 08 Feb 2021 23:38:31 +0000, Lew Pitcher wrote:

> Since tgatoppm has an Open Source licence, you can get the sourcecode
> from the NetPBM site.

> Would the tgatoppm source code help you?

Hey, thanks for the reply,
It's used in a makefile; I just wanted the same exe, but I'll have a look.
thanks
Mike.

--

Mike Garcia
http://mgarcia.org

Mike Garcia <mike@mgarcia.nospam>: Feb 09 10:37PM

On Tue, 09 Feb 2021 00:03:49 +0000, Eli the Bearded wrote:

> I'd start looking there.

> Elijah ------
> archive.org has a real "find this item" problem

Hey, thanks for the reply, yeah, I think this program predates ISOs,
so... it's probably on floppy images!

That's why I'm asking on the newsgroup first. I'll ask the DOS groups next, I
guess.

--

Mike Garcia
http://mgarcia.org

Mike Garcia <mike@mgarcia.nospam>: Feb 09 10:39PM

On Tue, 09 Feb 2021 09:26:09 +0100, Stef wrote:

> and save them as C header files?

> If the latter, you can probably use GIMP. It can open targa files and
> save as C header files.

Hey, thanks for the reply, sorry I should have been more clear, it's for
a few makefiles that use it, it's no big deal, bin2c works.

Just would have been nice to use the same exe.

--

Mike Garcia
http://mgarcia.org

Mike Garcia <mike@mgarcia.nospam>: Feb 09 11:01PM

On Tue, 09 Feb 2021 16:13:28 +0300, Anton Shepelev wrote:

>> "TGA2IMG takes an uncompressed 24-bit Targa file and converts it to a
>> format that was used by the Vivid raytracer."

> Is it a quotation from the manual?

It's from the only info I could find on tga2img!

https://groups.google.com/g/comp.graphics.apps.paint-shop-pro/c/XXn09MH2tpM/m/W0f72jKxhCAJ

>> Here's what the usage command looks like:
>> tga2img -q -r4 -i320,0 -p0,480 file.tga

> Is it an exact sample invocation of the program?

Yes, from a makefile:

stars.img: stars.tga
	tga2img -q -r8 -i448,0 -p0,482 $<

from a dos batch file:
tga2img -r8 kf_d0_f0.tga

> I got the same results, but I have also found some alternative software
> that can convert .tga to the Vivid raytracer .img, but you seem to need
> that specific utility.

Thanks for your time, I'll keep looking.
If I find it, I'll upload it to archive.org and drop a link in the thread.

--

Mike Garcia
http://mgarcia.org

Poor Mans RCU...

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 08 07:20PM -0800

On 2/8/2021 4:50 AM, Manfred wrote:

> node_allocations = 29400000
> node_deallocations = 29400000

> Test Completed!

Perfect output. That is the exact number I am looking for. Thank you so
much for taking the time to give it a go Manfred. Now, it's time to move
this over to github. I am writing my response to another kind person,
Paavo, who also took the time and energy to compile _and_ run the sucker.

I am currently writing up some other use cases for my poor mans rcu.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 08 07:35PM -0800

On 2/8/2021 11:14 AM, Paavo Helde wrote:

> node_allocations = 29400000
> node_deallocations = 29400000

> Test Completed!

Wonderful! First off thanks Paavo, and Manfred. It's nice to see my code
being run on a diverse set of architectures and OSes. Okay, your output is
perfect: you got the numbers I am looking for. Time for me to move it to
github, and create other use cases.

Actually, this version 0.0.1 is a pretty hardcore test. Usually RCU does
not like heavy writer activity. Hence the reader to writer thread ratio,
at 42:7. However, those seven writer threads are assaulting the poor
mans proxy collector. I need to create a benchmark test wrt a read/write
mutex vs. my proxy gc. Since rwmutex is the perfect thing to test
against, well, I am currently coding one up. Wrt this type of benchmark,
we are going to measure the number of reads-per-second, per-thread with
my poor RCU vs rwmutex. Since writers do not block readers, and vice
versa, and from past experience, well... The rwmutex does not really
stand a chance.

So, I will post a link to my new entry into GitHub for my code. Using
pastebin is not ideal.

Thanks again for taking the time to run my code Paavo. I really do
appreciate it.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 08 11:05PM -0800

On 2/7/2021 8:20 PM, Chris M. Thomasson wrote:

> https://pastebin.com/raw/nPVYXbWM
> _____________________________________
> // Chris M. Thomassons Poor Mans RCU... Example 123...
[...]
>             std::uint32_t old_refs = c.m_count.fetch_add(refs +
> ct_proxy_quiescent, std::memory_order_release);

>             if (old_refs == refs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There is a rather odd issue with this. It can sometimes miss collector
cycles. Everything works as far as allowing concurrent reads and writes
to the data-structure, but it's still a big issue. Correcting it makes
the collector more efficient. It will be corrected in my next version.

Btw, Manfred and Paavo, here is the fix:

if (old_refs == 0 - refs)

It's an odd bug in a sense, because it works either way wrt allowing for
lockfree reads and writes to a shared data-structure, except one can
hold up some collection cycles. The condition is hard to trip.

>             }
>         }
>     }
[...]
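
To see why the comparison has to be against `0 - refs`, it may help to model the counting with plain integers. The sketch below is a single-threaded simplification and not part of the posted program: the name `model_quiesce_begin` and the `model_result` struct are made up here, and the atomics are replaced by ordinary unsigned arithmetic. It assumes, as in the posted code, that acquires are tallied on the master count while releases are subtracted from the collector count.

```cpp
#include <cstdint>

// Constants with the same values as ct_ref_inc and ct_proxy_quiescent.
constexpr std::uint32_t ref_inc   = 0x20U;
constexpr std::uint32_t quiescent = 0x10U;

// Hypothetical result record for the model.
struct model_result
{
    bool old_check;             // if (old_refs == refs)      ver 0.0.1
    bool new_check;             // if (old_refs == 0 - refs)  the fix
    std::uint32_t count_after;  // collector count after the fetch_add
};

// Models prv_quiesce_begin() for a collector that `acquired` readers
// entered and `released` of them have already left.
inline model_result model_quiesce_begin(std::uint32_t acquired,
                                        std::uint32_t released)
{
    // acquires were tallied on the master count...
    const std::uint32_t refs = acquired * ref_inc;

    // ...while releases were subtracted from the collector count, which
    // started at zero and therefore wraps modulo 2^32.
    const std::uint32_t old_refs = 0U - released * ref_inc;

    // effect of fetch_add(refs + ct_proxy_quiescent).
    const std::uint32_t count_after = old_refs + refs + quiescent;

    return { old_refs == refs, old_refs == 0U - refs, count_after };
}
```

In this model, with three readers that have all left, `old_refs` is `0xFFFFFFA0`, which equals `0 - 0x60`, so only the new check fires and the count lands on exactly `0x10`. For small counts the old check can only fire when no reader touched the collector at all.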

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 08 11:31PM -0800

On 2/7/2021 8:20 PM, Chris M. Thomasson wrote:
> Well, here is a test program. When you get some free time, can you
> please try to run it; give it a go?

I need to ask for one more favor, especially to Manfred and Paavo. Can
you please run this version 0.0.2? It corrected a reference counting
issue in version 0.0.1. Even though the first version works, it can
sometimes miss a collection cycle, and allow memory to grow when it does
not have to. Here is the new version and it contains a lot more debug
stuff to show how your system is reacting to the algorithm. I am sorry
for missing that counting issue in version 0.0.1. If you can please,
when you get free time, run it again? I need to see your output.

https://pastebin.com/raw/CYZ78gVj

Here is my output from a run:

Chris M. Thomassons Proxy Collector Port ver .0.0.2...
_______________________________________

Booting threads...
Threads running...
Threads completed!

node_allocations = 92400000
node_deallocations = 92400000

dtor_collect = 4
release_collect = 120
quiesce_complete = 124
quiesce_begin = 124
quiesce_complete_nodes = 92200000

Test Completed!

----

quiesce_complete should always equal quiesce_begin and node_allocations
should always equal node_deallocations. Also, quiesce_complete_nodes
should always be <= node_allocations

This version performs much better.

Here is the code:
_______________________________
// Chris M. Thomassons Poor Mans RCU... Example 456...

#include <iostream>
#include <atomic>
#include <thread>
#include <cstdlib>
#include <cstdint>
#include <climits>
#include <functional>
#include <cstddef> // std::size_t

// Masks
static constexpr std::uint32_t ct_ref_mask = 0xFFFFFFF0U;
static constexpr std::uint32_t ct_ref_complete = 0x30U;
static constexpr std::uint32_t ct_ref_inc = 0x20U;
static constexpr std::uint32_t ct_proxy_mask = 0xFU;
static constexpr std::uint32_t ct_proxy_quiescent = 0x10U;

// Iteration settings
static constexpr unsigned long ct_reader_iters_n = 2000000;
static constexpr unsigned long ct_writer_iters_n = 200000;

// Thread counts
static constexpr unsigned long ct_reader_threads_n = 53;
static constexpr unsigned long ct_writer_threads_n = 11;

// Some debug/sanity check things...
// Need to make this conditional in compilation with some macros...
static std::atomic<std::uint32_t> g_debug_node_allocations(0);
static std::atomic<std::uint32_t> g_debug_node_deallocations(0);
static std::atomic<std::uint32_t> g_debug_dtor_collect(0);
static std::atomic<std::uint32_t> g_debug_release_collect(0);
static std::atomic<std::uint32_t> g_debug_quiesce_begin(0);
static std::atomic<std::uint32_t> g_debug_quiesce_complete(0);
static std::atomic<std::uint32_t> g_debug_quiesce_complete_nodes(0);

// Need to align and pad data structures! To do...

struct ct_node
{
std::atomic<ct_node*> m_next;
ct_node* m_defer_next;

ct_node() : m_next(nullptr), m_defer_next(nullptr)
{
g_debug_node_allocations.fetch_add(1, std::memory_order_relaxed);
}

~ct_node()
{
g_debug_node_deallocations.fetch_add(1, std::memory_order_relaxed);
}
};

// The proxy collector itself... :^)
template<std::size_t T_defer_limit>
class ct_proxy
{
static std::uint32_t prv_destroy(ct_node* n)
{
std::uint32_t count = 0;

while (n)
{
ct_node* next = n->m_defer_next;
delete n;
count++;
n = next;
}

return count;
}

public:
class collector
{
friend class ct_proxy;

private:
std::atomic<ct_node*> m_defer;
std::atomic<std::uint32_t> m_defer_count;
std::atomic<std::uint32_t> m_count;

public:
collector()
: m_defer(nullptr),
m_defer_count(0),
m_count(0)
{

}

~collector()
{
prv_destroy(m_defer.load(std::memory_order_relaxed));
}
};

private:
std::atomic<std::uint32_t> m_current;
std::atomic<bool> m_quiesce;
ct_node* m_defer;
collector m_collectors[2];

public:
ct_proxy()
: m_current(0),
m_quiesce(false),
m_defer(nullptr)
{

}

~ct_proxy()
{
prv_destroy(m_defer);
}

private:
void prv_quiesce_begin()
{
// Try to begin the quiescence process.
if (! m_quiesce.exchange(true, std::memory_order_acquire))
{
g_debug_quiesce_begin.fetch_add(1, std::memory_order_relaxed);

// advance the current collector and grab the old one.
std::uint32_t old =
m_current.load(std::memory_order_relaxed) & ct_proxy_mask;
old = m_current.exchange((old + 1) & 1,
std::memory_order_acq_rel);
collector& c = m_collectors[old & ct_proxy_mask];

// decode reference count.
std::uint32_t refs = old & ct_ref_mask;

// increment and generate an odd reference count.
std::uint32_t old_refs = c.m_count.fetch_add(refs +
ct_proxy_quiescent, std::memory_order_release);

if (old_refs == 0 - refs)
{
g_debug_dtor_collect.fetch_add(1,
std::memory_order_relaxed);

// odd reference count and drop-to-zero condition detected!
prv_quiesce_complete(c);
}
}
}

void prv_quiesce_complete(collector& c)
{
g_debug_quiesce_complete.fetch_add(1, std::memory_order_relaxed);

// the collector `c' is now in a quiescent state! :^)
std::atomic_thread_fence(std::memory_order_acquire);

// maintain the back link and obtain "fresh" objects from
// this collection.
ct_node* n = m_defer;
m_defer = c.m_defer.load(std::memory_order_relaxed);
c.m_defer.store(nullptr, std::memory_order_relaxed);

// reset the reference count.
c.m_count.store(0, std::memory_order_relaxed);
c.m_defer_count.store(0, std::memory_order_relaxed);

// release the quiesce lock.
m_quiesce.store(false, std::memory_order_release);

// destroy nodes.
std::uint32_t count = prv_destroy(n);

g_debug_quiesce_complete_nodes.fetch_add(count,
std::memory_order_relaxed);
}

public:
collector& acquire()
{
// increment the master count _and_ obtain current collector.
std::uint32_t current =
m_current.fetch_add(ct_ref_inc, std::memory_order_acquire);

// decode the collector index.
return m_collectors[current & ct_proxy_mask];
}

void release(collector& c)
{
// decrement the collector.
std::uint32_t count =
c.m_count.fetch_sub(ct_ref_inc, std::memory_order_release);

// check for the completion of the quiescence process.
if ((count & ct_ref_mask) == ct_ref_complete)
{
// odd reference count and drop-to-zero condition detected!
g_debug_release_collect.fetch_add(1,
std::memory_order_relaxed);

prv_quiesce_complete(c);
}
}

collector& sync(collector& c)
{
// check if the `c' is in the middle of a quiescence process.
if (c.m_count.load(std::memory_order_relaxed) & ct_proxy_quiescent)
{
// drop `c' and get the next collector.
release(c);

return acquire();
}

return c;
}

void collect()
{
prv_quiesce_begin();
}

void collect(collector& c, ct_node* n)
{
if (! n) return;

// link node into the defer list.
ct_node* prev = c.m_defer.exchange(n, std::memory_order_relaxed);
n->m_defer_next = prev;

// bump the defer count and begin quiescence process if over
// the limit.
std::uint32_t count =
c.m_defer_count.fetch_add(1, std::memory_order_relaxed) + 1;

if (count >= (T_defer_limit / 2))
{
prv_quiesce_begin();
}
}
};

typedef ct_proxy<10> ct_proxy_collector;

// your basic lock-free stack...
// well, minus ABA counter and DWCAS of course! ;^)
class ct_stack
{
std::atomic<ct_node*> m_head;

public:
ct_stack() : m_head(nullptr)
{

}

public:
void push(ct_node* n)
{
ct_node* head = m_head.load(std::memory_order_relaxed);

do
{
n->m_next.store(head, std::memory_order_relaxed);
}

while (! m_head.compare_exchange_weak(
head,
n,
std::memory_order_release));
}

ct_node* flush()
{
return m_head.exchange(nullptr, std::memory_order_acquire);
}

ct_node* get_head()
{
return m_head.load(std::memory_order_acquire);
}

ct_node* pop()
{
ct_node* head = m_head.load(std::memory_order_acquire);
ct_node* xchg;

do
{
if (! head) return nullptr;

xchg = head->m_next.load(std::memory_order_relaxed);
}

while (!m_head.compare_exchange_weak(
head,
xchg,
std::memory_order_acquire));

return head;
}
};

// The shared state
struct ct_shared
{
ct_proxy<10> m_proxy_gc;
ct_stack m_stack;
};

// Reader threads
// Iterates through the lock free stack
void ct_thread_reader(ct_shared& shared)
{
// iterate the lockfree stack
for (unsigned long i = 0; i < ct_reader_iters_n; ++i)
{
ct_proxy_collector::collector& c = shared.m_proxy_gc.acquire();

ct_node* n = shared.m_stack.get_head();

while (n)
{
// need to add in some processing...
// std::this_thread::yield();

n = n->m_next.load(std::memory_order_relaxed);
}

shared.m_proxy_gc.release(c);
}
}

// Writer threads
// Mutates the lock free stack
void ct_thread_writer(ct_shared& shared)
{
for (unsigned long wloop = 0; wloop < 42; ++wloop)
{
shared.m_proxy_gc.collect();

for (unsigned long i = 0; i < ct_writer_iters_n; ++i)
{
shared.m_stack.push(new ct_node());
}

//std::this_thread::yield();

ct_proxy_collector::collector& c = shared.m_proxy_gc.acquire();

for (unsigned long i = 0; i < ct_writer_iters_n; ++i)
{
shared.m_proxy_gc.collect(c, shared.m_stack.pop());
}

shared.m_proxy_gc.release(c);

for (unsigned long i = 0; i < ct_writer_iters_n / 2; ++i)
{
shared.m_proxy_gc.collect();
}

{
ct_proxy_collector::collector& c = shared.m_proxy_gc.acquire();

for (unsigned long i = 0; i < ct_writer_iters_n; ++i)
{
ct_node* n = shared.m_stack.pop();
if (! n) break;

shared.m_proxy_gc.collect(c, n);
}

shared.m_proxy_gc.release(c);
}

if ((wloop % 3) == 0)
{
shared.m_proxy_gc.collect();
}
}
}

int main()
{
std::cout << "Chris M. Thomassons Proxy Collector Port ver .0.0.2...\n";
std::cout << "_______________________________________\n\n";

{
ct_shared shared;

std::thread readers[ct_reader_threads_n];
std::thread writers[ct_writer_threads_n];

std::cout << "Booting threads...\n";

for (unsigned long i = 0; i < ct_writer_threads_n; ++i)
{
writers[i] = std::thread(ct_thread_writer, std::ref(shared));
}

for (unsigned long i = 0; i < ct_reader_threads_n; ++i)
{
readers[i] = std::thread(ct_thread_reader, std::ref(shared));
}

std::cout << "Threads running...\n";

for (unsigned long i = 0; i < ct_reader_threads_n; ++i)
{
readers[i].join();
}

for (unsigned long i = 0; i < ct_writer_threads_n; ++i)
{
writers[i].join();
}
}

std::cout << "Threads completed!\n\n";

// Sanity check!
{
std::uint32_t node_allocations =
g_debug_node_allocations.load(std::memory_order_relaxed);
std::uint32_t node_deallocations =
g_debug_node_deallocations.load(std::memory_order_relaxed);
std::uint32_t dtor_collect =
g_debug_dtor_collect.load(std::memory_order_relaxed);
std::uint32_t release_collect =
g_debug_release_collect.load(std::memory_order_relaxed);
std::uint32_t quiesce_complete =
g_debug_quiesce_complete.load(std::memory_order_relaxed);
std::uint32_t quiesce_begin =
g_debug_quiesce_begin.load(std::memory_order_relaxed);
std::uint32_t quiesce_complete_nodes =
g_debug_quiesce_complete_nodes.load(std::memory_order_relaxed);

std::cout << "node_allocations = " << node_allocations << "\n";
std::cout << "node_deallocations = " << node_deallocations <<
"\n\n";
std::cout << "dtor_collect = " << dtor_collect << "\n";
std::cout << "release_collect = " << release_collect << "\n";
std::cout << "quiesce_complete = " << quiesce_complete << "\n";
std::cout << "quiesce_begin = " << quiesce_begin << "\n";
std::cout << "quiesce_complete_nodes = " <<
quiesce_complete_nodes << "\n";

if (node_allocations != node_deallocations)
{
std::cout << "OH SHIT! NODE LEAK!!! SHIT! = " <<
node_allocations - node_deallocations << "\n\n";
}

}

std::cout << "\n\nTest Completed!\n\n";

return 0;
}
_______________________________

Paavo Helde <myfirstname@osa.pri.ee>: Feb 09 03:03PM +0200

09.02.2021 09:31 Chris M. Thomasson kirjutas:
> stuff to show how your system is reacting to the algorithm. I am sorry
> for missing that counting issue in version 0.0.1. If you can please,
> when you get free time, run it again? I need to see your output.

Win10 MSVC2017, Intel Xeon E-2286M, 8 phys cores

> time ../x64/Release/ConsoleTestVS2017.exe
Chris M. Thomassons Proxy Collector Port ver .0.0.2...
_______________________________________

Booting threads...
Threads running...
Threads completed!

node_allocations = 92400000
node_deallocations = 92400000

dtor_collect = 33
release_collect = 753
quiesce_complete = 786
quiesce_begin = 786
quiesce_complete_nodes = 92400000

Test Completed!

real 0m19.027s
user 0m0.015s
sys 0m0.000s

---------------------------------------------------------------------

gcc 7.4.0 on Linux: no issues (2 NUMA nodes, 24 physical cores)

> g++ -std=c++11 -O2 -Wall main.cpp -lpthread
> time ./a.out
Chris M. Thomassons Proxy Collector Port ver .0.0.2...
_______________________________________

Booting threads...
Threads running...
Threads completed!

node_allocations = 92400000
node_deallocations = 92400000

dtor_collect = 11
release_collect = 152
quiesce_complete = 163
quiesce_begin = 163
quiesce_complete_nodes = 92400000

Test Completed!

real 0m43.760s
user 34m31.243s
sys 0m3.295s

I did not measure the times yesterday, but it feels like the new version
is slower.

Manfred <noname@add.invalid>: Feb 09 03:01PM +0100

On 2/9/2021 8:31 AM, Chris M. Thomasson wrote:
> for missing that counting issue in version 0.0.1. If you can please,
> when you get free time, run it again? I need to see your output.

> https://pastebin.com/raw/CYZ78gVj

I'm posting the timed run from both yesterday's and today's versions.
Yesterday's code runs faster, I have no idea if that's due to extra
debugging.

BTW I forgot to mention that these are run on a VM with 8 cores. Later
on I'll run them on non-virtualized hardware.

$ c++ -std=c++11 -Wall -O2 -lpthread rcu_chris.cc && time ./a.out
Chris M. Thomassons Proxy Collector Port ver .0.0.1...
_______________________________________

Booting threads...
Threads running...
Threads completed!

node_allocations = 29400000
node_deallocations = 29400000

Test Completed!

real 0m6.890s
user 0m54.270s
sys 0m0.092s

$ c++ -std=c++11 -Wall -O2 -lpthread rcu_chris.0.0.2.cc && time ./a.out
Chris M. Thomassons Proxy Collector Port ver .0.0.2...
_______________________________________

Booting threads...
Threads running...
Threads completed!

node_allocations = 92400000
node_deallocations = 92400000

dtor_collect = 11
release_collect = 178
quiesce_complete = 189
quiesce_begin = 189
quiesce_complete_nodes = 92400000

Test Completed!

real 0m22.231s
user 2m56.383s
sys 0m0.249s

Manfred <noname@invalid.add>: Feb 09 03:33PM +0100

On 2/9/21 3:01 PM, Manfred wrote:
> debugging.

> BTW I forgot to mention that these are run on a VM with 8 cores. Later
> on I'll run them on non-virtualized hardware.

Here they are, running on bare metal.

GCC 9.3.1

$ c++ -std=c++11 -Wall -O2 -lpthread rcu_chris.0.0.1.cc && time ./a.out
Chris M. Thomassons Proxy Collector Port ver .0.0.1...
_______________________________________

Booting threads...
Threads running...
Threads completed!

node_allocations = 29400000
node_deallocations = 29400000

Test Completed!

real 0m8.814s
user 2m19.340s
sys 0m0.242s

$ c++ -std=c++11 -Wall -O2 -lpthread rcu_chris.0.0.2.cc && time ./a.out
Chris M. Thomassons Proxy Collector Port ver .0.0.2...
_______________________________________

Booting threads...
Threads running...
Threads completed!

node_allocations = 92400000
node_deallocations = 92400000

dtor_collect = 11
release_collect = 171
quiesce_complete = 182
quiesce_begin = 182
quiesce_complete_nodes = 92400000

Test Completed!

real 0m24.975s
user 6m36.626s
sys 0m0.587s

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 09 12:33PM -0800

On 2/9/2021 5:03 AM, Paavo Helde wrote:
> sys 0m3.295s

> I did not measure the times yesterday, but it feels like the new version
> is slower.

Thanks for running it again. Your output means everything worked as it
should. node_allocations == node_deallocations, and
quiesce_complete_nodes is <= node_allocations. Also, quiesce_complete ==
quiesce_begin.
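
Those checks can be written down as a small predicate. This is only a sketch: the `run_stats` struct and `run_is_sane` function are hypothetical names, not part of the posted program. The last condition is an extra one read off the code rather than stated in the thread: prv_quiesce_complete() is only called from the two places that bump dtor_collect and release_collect, so their sum should match quiesce_complete once all threads have finished.

```cpp
#include <cstdint>

// Hypothetical aggregate mirroring the counters printed by the test.
struct run_stats
{
    std::uint32_t node_allocations;
    std::uint32_t node_deallocations;
    std::uint32_t dtor_collect;
    std::uint32_t release_collect;
    std::uint32_t quiesce_complete;
    std::uint32_t quiesce_begin;
    std::uint32_t quiesce_complete_nodes;
};

// Returns true when a finished run satisfies the expected invariants.
inline bool run_is_sane(const run_stats& s)
{
    return s.node_allocations == s.node_deallocations
        && s.quiesce_complete == s.quiesce_begin
        && s.quiesce_complete_nodes <= s.node_allocations
        // assumption drawn from the code, see the note above.
        && s.dtor_collect + s.release_collect == s.quiesce_complete;
}
```

Plugging in the numbers from any of the ver 0.0.2 runs posted so far should give true.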

Now, this version 0.0.2 should definitely be slower than 0.0.1 because
of several things. One is that it creates more threads and allocates a
lot more nodes.

Take a careful look at ver 0.0.1 wrt:
_________________________
// Iteration settings
static constexpr unsigned long ct_reader_iters_n = 1000000;
static constexpr unsigned long ct_writer_iters_n = 100000;

// Thread counts
static constexpr unsigned long ct_reader_threads_n = 42;
static constexpr unsigned long ct_writer_threads_n = 7;
_________________________

Vs. ver 0.0.2 wrt:
_________________________
// Iteration settings
static constexpr unsigned long ct_reader_iters_n = 2000000;
static constexpr unsigned long ct_writer_iters_n = 200000;

// Thread counts
static constexpr unsigned long ct_reader_threads_n = 53;
static constexpr unsigned long ct_writer_threads_n = 11;
_________________________

Then there is the debug stuff that will make it go slower as well...
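
The node counts in the outputs follow directly from those settings. As a quick check, here is the arithmetic as a sketch; `expected_allocations` is a made-up helper, and the outer loop count of 42 is taken from the `wloop` bound in ver 0.0.2 and assumed to be the same in ver 0.0.1, which is consistent with the 29400000 reported for that version.

```cpp
#include <cstdint>

// Each writer thread runs `outer_loops` passes and pushes `writer_iters`
// freshly allocated nodes per pass.
constexpr std::uint64_t expected_allocations(std::uint64_t writer_threads,
                                             std::uint64_t writer_iters,
                                             std::uint64_t outer_loops = 42)
{
    return writer_threads * outer_loops * writer_iters;
}

// ver 0.0.1:  7 writers * 42 loops * 100000 iterations
static_assert(expected_allocations(7, 100000) == 29400000, "ver 0.0.1");

// ver 0.0.2: 11 writers * 42 loops * 200000 iterations
static_assert(expected_allocations(11, 200000) == 92400000, "ver 0.0.2");
```

So ver 0.0.2 allocates and frees a bit more than three times as many nodes as ver 0.0.1 before any other difference is considered.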

Now, there is another interesting aspect. Ver 0.0.1 can sometimes miss
collection cycles allowing the memory to grow. This means that calls to
delete are far fewer than they need to be when the threads are running.
Hence memory growing. So, it skips a lot of calls to delete, which makes
it run faster.

Ver 0.0.2 does not miss _any_ collection cycles. So, it will invoke
delete a lot more times when the threads are running.

In the properly working 0.0.2, this can be adjusted with the template
parameter std::size_t T_defer_limit in the ct_proxy class. Basically,
collect(c, n) only begins a quiescence process via prv_quiesce_begin()
once the collector's count of deferred nodes reaches T_defer_limit / 2.
(An explicit collect() call begins one regardless.)
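
If the limit were a runtime value instead of a template parameter, the threshold logic could be pulled out along these lines. This is a sketch only: `defer_gate` is a hypothetical class that does not appear in the posted code, and it keeps the same comparison against half the limit that collect(c, n) uses.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: counts deferred nodes and reports when a
// quiescence process should be started.
class defer_gate
{
    std::atomic<std::uint32_t> m_count;
    const std::uint32_t m_threshold;

public:
    explicit defer_gate(std::uint32_t defer_limit)
    :   m_count(0),
        m_threshold(defer_limit / 2)
    {
    }

    // bump the defer count; true once it reaches the threshold.
    bool note_deferred()
    {
        std::uint32_t count =
            m_count.fetch_add(1, std::memory_order_relaxed) + 1;

        return count >= m_threshold;
    }

    // to be called when a collection cycle completes.
    void reset()
    {
        m_count.store(0, std::memory_order_relaxed);
    }
};
```

With a limit of 10, as in the `ct_proxy<10>` typedef, the fifth deferred node is the first one to report true.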

Don't ask me why I introduced T_defer_limit as a template parameter.

Uggg.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 09 12:44PM -0800

On 2/9/2021 6:33 AM, Manfred wrote:

> real    0m24.975s
> user    6m36.626s
> sys    0m0.587s

Perfect. Version 0.0.2 should be a lot slower. Please read the response
I gave to Paavo. Since ver 0.0.2 creates more threads and doubles the
iterations of ver 0.0.1, adds more debug interference, well, it
basically has to run slower. The interesting part is that ver 0.0.2 does
not miss any collection cycles. So, calls to delete are more frequent
while the threads are churning along. This can actually break down into
the performance of the underlying memory allocator being hammered with a
lot of activity.

Humm... It would be fun to test raw new/delete vs a simple lock-free
pooling allocator in this proxy gc context.

Also, the template parameter std::size_t T_defer_limit in the ct_proxy
class can affect how many nodes are deferred before a collection cycle
is triggered.

Thanks again for giving it a go. Now, I think it's time for me to give it
a proper home over on github.

:^)

"The weirdest compiler bug" by Scott Rasmussen

Lynn McGuire <lynnmcguire5@gmail.com>: Feb 08 05:39PM -0600

"The weirdest compiler bug" by Scott Rasmussen
https://blog.zaita.com/mingw64-compiler-bug/

"There are approximately 7.5x10^18 grains of sand on Earth. This story
is about finding changes in an equation that has a difference of
approximately 1e-18 out of hundreds of billions of calculations. That is
7 grains of sand that are different to what we expect across the entire
planet Earth."

"After spending days generating gigabytes of debug logs and GDB
breakpoints, I finally discovered a very peculiar bug in the compiler. I
thought this would be an interesting story to tell."

Lynn

Juha Nieminen <nospam@thanks.invalid>: Feb 09 09:28AM

> "After spending days generating gigabytes of debug logs and GDB
> breakpoints, I finally discovered a very peculiar bug in the compiler. I
> thought this would be an interesting story to tell."

You ain't debugged a compiler error until you have encountered one... on a
compiler for an 8-bit CPU (yes, those are still being used) in an embedded
system where your only way to debug is by writing things to a UART, and
where adding code to the program (to write debug values to that UART)
affects the bug.

Yes, been there, done that. It was a nightmare to find. Quite rewarding
when I finally found it (and was able to circumvent it by changing
the code in such a manner that it didn't trigger the bug).

(If you are curious, the compiler in question was sdcc.)

David Brown <david.brown@hesbynett.no>: Feb 09 12:13PM +0100

On 09/02/2021 10:28, Juha Nieminen wrote:
> when I finally found it (and was able to circumvent it by changing
> the code in such a manner that it didn't trigger the bug).

> (If you are curious, the compiler in question was sdcc.)

Ha! Young folk today, complaining that they have only a UART for
debugging. When I were a lad, we used a voltmeter on a spare pin as the
debugger. The first debugger I had for assembly programming was
listening to the sound of the computer's power supply as different types
of work were done.

(Yes, compiler bugs are a real pain. I've come across a few over the
years, in different compilers.)

Manfred <noname@add.invalid>: Feb 09 04:11PM +0100

On 2/9/2021 12:39 AM, Lynn McGuire wrote:
> breakpoints, I finally discovered a very peculiar bug in the compiler. I
> thought this would be an interesting story to tell."

> Lynn

Even if, as far as I can remember, it is the only Windows port of GCC,
MinGW is not officially supported by the gcc team.

Paavo Helde <myfirstname@osa.pri.ee>: Feb 09 07:50PM +0200

09.02.2021 17:11 Manfred kirjutas:

>> Lynn

> Even if, as far as I can remember, it is the only Windows port of GCC,
> MinGW is not officially supported by the gcc team.

There is at least also Cygwin port of GCC. But I'm quite sure this is
not officially supported by the gcc team either.

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 09 12:14PM -0800

On 2/8/2021 3:39 PM, Lynn McGuire wrote:

> "After spending days generating gigabytes of debug logs and GDB
> breakpoints, I finally discovered a very peculiar bug in the compiler. I
> thought this would be an interesting story to tell."

Fwiw, a long time ago there was a "bug?" in GCC that could prevent POSIX
pthread_mutex_trylock from working correctly:

https://groups.google.com/g/comp.programming.threads/c/Y_Y2DZOWErM/m/JI3i5zlA2H0J

It was an optimization that would screw things up pretty bad. The scary
part is that it could introduce a race-condition. A lot of times the
program would work, other times it could silently corrupt data. Those
are pretty damn hard to debug.

There are a lot of smart people commenting on that thread. Heck, even I
am there. ;^)

negating unsigned....

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 08 10:48PM -0800

On 2/7/2021 3:34 PM, Chris M. Thomasson wrote:
> implementation of mine, only a single thread can collect and it's all
> unsigned. Therefore I can just use:

> if (old_refs == refs)

Oh well, shit. I should be really doing this:

if (old_refs == 0 - refs)

The proxy collector does work without the subtraction, but it can miss
collections, and memory can start to grow a bit, when it does not have to.

Need to add this into the next version.

Tuesday, February 9, 2021

Digest for comp.lang.c++@googlegroups.com - 25 updates in 4 topics
