- Looking for the tgaimg exe/code - 9 Updates
- Poor Mans RCU... - 9 Updates
- "The weirdest compiler bug" by Scott Rasmussen - 6 Updates
- negating unsigned.... - 1 Update
Lew Pitcher <lew.pitcher@digitalfreehold.ca>: Feb 08 11:38PM On Mon, 08 Feb 2021 22:53:26 +0000, Mike Garcia wrote: > I'm looking for an old unix/win95'ish commandline executable. > "TGA2IMG takes an uncompressed 24-bit Targa file and converts it to a > format that was used by the Vivid raytracer." [snip] > Anyone got a win32 exe or the source code to tga2img? Not me, nor anyone that I know. /But/... Slackware Linux still provides a tgatoppm program that converts Targa files to Portable Pixmap files. And, it appears that the tgatoppm program is part of the NetPBM package available at http://netpbm.sourceforge.net/ Since tgatoppm has an Open Source licence, you can get the sourcecode from the NetPBM site. Would the tgatoppm source code help you? -- Lew Pitcher "In Skills, We Trust" |
Eli the Bearded <*@eli.users.panix.com>: Feb 09 12:03AM > It's an old program which converted tga images to other formats, including > c code and header files (like bin2c), I can't seem to find the original > exe or code to it on github or SF or a web search. There's a lot of old software on ISOs at archive.org. Unfortunately many of the ISOs don't provide searchable file listings. If I were desperate, I'd start looking there. Elijah ------ archive.org has a real "find this item" problem |
Stef <stef33d@yahooI-N-V-A-L-I-D.com.invalid>: Feb 09 09:26AM +0100 On 2021-02-08 Mike Garcia wrote in comp.lang.c: > It's an old program which converted tga images to other formats, including > c code and header files (like bin2c), I can't seem to find the original > exe or code to it on github or SF or a web search. Are you looking for this specific program or a way to open Targa files and save them as C header files? If the latter, you can probably use GIMP. It can open Targa files and save as C header files. -- Stef (remove caps, dashes and .invalid from e-mail address to reply by mail) Cogito cogito ergo cogito sum -- "I think that I think, therefore I think that I am." -- Ambrose Bierce, "The Devil's Dictionary" |
Anton Shepelev <anton.txt@g{oogle}mail.com>: Feb 09 04:13PM +0300 Mike Garcia: > "TGA2IMG takes an uncompressed 24-bit Targa file and > converts it to a format that was used by the Vivid > raytracer." Is it a quotation from the manual? > github or SF or a web search. > Here's what the usage command looks like: > tga2img -q -r4 -i320,0 -p0,480 file.tga Is it an exact sample invocation of the program? > I've found 2 different programs but it's not it: > [...] I got the same results, but I have also found some alternative software that can convert .tga to the Vivid raytracer .img, but you seem to need that specific utility. -- () ascii ribbon campaign - against html e-mail /\ http://preview.tinyurl.com/qcy6mjc [archived] |
Eli the Bearded <*@eli.users.panix.com>: Feb 09 05:54PM > and save them as C header files? > If the latter, you can probably use GIMP. It can open targa files and > save as C header files. There's "tgatoppm foo.targa | ppmtoxpm > foo.c" using netpbm tools, for at least one flavor of image-as-C-code image. I'd expect the particular flavor of C encoding is significant. Elijah ------ for many purposes PPM is easier to use in C than XPM |
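A minimal sketch of the image-as-C-code idea discussed above, assuming the Targa file has already been converted to a binary PPM (for instance with tgatoppm). It reads a P6 file with a maximum sample value of 255 and writes the pixels out as a C array. The names `ppm_image`, `read_ppm` and `write_c_array` are hypothetical, header comments in the PPM are not handled, and this says nothing about the Vivid .img layout that tga2img produced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for a decoded binary PPM (P6) image.
struct ppm_image {
    unsigned width = 0;
    unsigned height = 0;
    std::vector<std::uint8_t> rgb; // width * height * 3 bytes
};

// Parse a P6 PPM whose maxval is 255. Assumption: no '#' comments in the header.
bool read_ppm(std::istream& in, ppm_image& img) {
    std::string magic;
    unsigned maxval = 0;
    if (!(in >> magic >> img.width >> img.height >> maxval)) return false;
    if (magic != "P6" || maxval != 255) return false;
    in.get(); // the single whitespace byte between header and pixel data
    img.rgb.resize(static_cast<std::size_t>(img.width) * img.height * 3);
    in.read(reinterpret_cast<char*>(img.rgb.data()),
            static_cast<std::streamsize>(img.rgb.size()));
    return static_cast<std::size_t>(in.gcount()) == img.rgb.size();
}

// Emit the pixels as a C array, in the spirit of bin2c.
void write_c_array(std::ostream& out, const ppm_image& img, const std::string& name) {
    out << "static const unsigned " << name << "_width = " << img.width << ";\n";
    out << "static const unsigned " << name << "_height = " << img.height << ";\n";
    out << "static const unsigned char " << name << "_rgb[] = {";
    for (std::size_t i = 0; i < img.rgb.size(); ++i) {
        if (i % 12 == 0) out << "\n    ";
        out << static_cast<unsigned>(img.rgb[i]) << ",";
    }
    out << "\n};\n";
}
```

Reading from a file opened in binary mode and writing to a `.h` file gives something a makefile rule can regenerate whenever the image changes.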
Mike Garcia <mike@mgarcia.nospam>: Feb 09 10:32PM On Mon, 08 Feb 2021 23:38:31 +0000, Lew Pitcher wrote: > Since tgatoppm has an Open Source licence, you can get the sourcecode > from the NetPBM site. > Would the tgatoppm source code help you? Hey, thanks for the reply, It's used in a makefile, I just wanted the same exe, but i'll have a look. thanks Mike. -- Mike Garcia http://mgarcia.org |
Mike Garcia <mike@mgarcia.nospam>: Feb 09 10:37PM On Tue, 09 Feb 2021 00:03:49 +0000, Eli the Bearded wrote: > I'd start looking there. > Elijah ------ > archive.org has a real "find this item" problem Hey, thanks for the reply, yeah, I think this program predates ISOs, so... it's probably on floppy images! That's why I'm asking on newsgroup first.. i'll ask DOS groups next I guess. -- Mike Garcia http://mgarcia.org |
Mike Garcia <mike@mgarcia.nospam>: Feb 09 10:39PM On Tue, 09 Feb 2021 09:26:09 +0100, Stef wrote: > and save them as C header files? > If the latter, you can probably use GIMP. It can open targa files and > save as C header files. Hey, thanks for the reply, sorry I should have been more clear, it's for a few makefiles that use it, it's no big deal, bin2c works. Just would have been nice to use the same exe. -- Mike Garcia http://mgarcia.org |
Mike Garcia <mike@mgarcia.nospam>: Feb 09 11:01PM

On Tue, 09 Feb 2021 16:13:28 +0300, Anton Shepelev wrote:
>> "TGA2IMG takes an uncompressed 24-bit Targa file and converts it to a
>> format that was used by the Vivid raytracer."
> Is it a quotation from the manual?

It's from the only info I could find on tga2img!
https://groups.google.com/g/comp.graphics.apps.paint-shop-pro/c/XXn09MH2tpM/m/W0f72jKxhCAJ

>> Here's what the usage command looks like:
>> tga2img -q -r4 -i320,0 -p0,480 file.tga
> Is it an exact sample invocation of the program?

Yes, from a makefile:

stars.img: stars.tga
	tga2img -q -r8 -i448,0 -p0,482 $<

from a dos batch file:

tga2img -r8 kf_d0_f0.tga

> I got the same results, but I have also found some alternative software
> that can convert .tga to the Vivid raytracer .img, but you seem to need
> that specific utility.

Thanks for your time, I'll keep looking. If I find it, I'll upload it to archive.org and drop a link in the thread.

--
Mike Garcia http://mgarcia.org |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 08 07:20PM -0800 On 2/8/2021 4:50 AM, Manfred wrote: > node_allocations = 29400000 > node_deallocations = 29400000 > Test Completed! Perfect output. That is the exact number I am looking for. Thank you so much for taking the time to give it a go Manfred. Now, its time to move this over to github. I am writing my response to another kind person, Paavo, who also took the time and energy to compile _and_ run the sucker. I am currently writing up some other use cases for my poor mans rcu. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 08 07:35PM -0800 On 2/8/2021 11:14 AM, Paavo Helde wrote: > node_allocations = 29400000 > node_deallocations = 29400000 > Test Completed! Wonderful! First off thanks Paavo, and Manfred. Its nice to see my code being run on a diverse set of arch's and os's. Okay, your output is perfect: you got the numbers I am looking for. Time for me to move it to github, and create other use cases. Actually, this version 0.0.1 is a pretty hardcore test. Usually RCU does not like heavy writer activity. Hence the reader to writer thread ratio, at 42:7. However, those seven writer threads are assaulting the poor mans proxy collector. I need to create a benchmark test wrt a read/write mutex vs. my proxy gc. Since rwmutex is the perfect thing to test against, well, I am currently coding one up. Wrt this type of benchmark, we are going to measure the number of reads-per-second, per-thread with my poor RCU vs rwmutex. Since writers do not block readers, and vise versa, and from past experience, well... The rwmutex does not really stand a chance. So, I will post a link to my new entry into gibhub for my code. Using pastebin is not ideal. Thanks again for taking the time to run my code Paavo. I really do appreciate it. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 08 11:05PM -0800 On 2/7/2021 8:20 PM, Chris M. Thomasson wrote: > https://pastebin.com/raw/nPVYXbWM > _____________________________________ > // Chris M. Thomassons Poor Mans RCU... Example 123... [...] > std::uint32_t old_refs = c.m_count.fetch_add(refs + > ct_proxy_quiescent, std::memory_order_release); > if (old_refs == refs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ There is a rather odd issue with this. It can sometimes miss collector cycles. Everything works as far as allowing concurrent reads and writes to the data-structure, but its still a big issue. Correcting it makes the collector more efficient. It will be corrected in my next version. Btw, Manfred and Pavvo, here is the fix: if (old_refs == 0 - refs) Its an odd bug in a sense, because it works either way wrt allowing for lockfree reads and writes to a shared data-structure, except one can hold up some collection cycles. The condition is hard to trip. > } > } > } [...] |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 08 11:31PM -0800 On 2/7/2021 8:20 PM, Chris M. Thomasson wrote: > Well, here is a test program. When you get some free time, can you > please try to run it; give it a go? I need to ask for one more favor, especially to Manfred and Pavvo. Can you please run this version 0.0.2? It corrected a reference counting issue in version 0.0.1. Even though the first version works, it can sometimes miss a collection cycle, and allow memory to grow when it does not have to. Here is the new version and it contains a lot more debug stuff to show how your system is reacting to the algorihtm. I am sorry for missing that counting issue in version 0.0.1. If you can please, when you get free time, run it again? I need to see your output. https://pastebin.com/raw/CYZ78gVj Here is my output from a run: Chris M. Thomassons Proxy Collector Port ver .0.0.2... _______________________________________ Booting threads... Threads running... Threads completed! node_allocations = 92400000 node_deallocations = 92400000 dtor_collect = 4 release_collect = 120 quiesce_complete = 124 quiesce_begin = 124 quiesce_complete_nodes = 92200000 Test Completed! ---- quiesce_complete should always equal quiesce_begin and node_allocations should always equal node_deallocations. Also, quiesce_complete_nodes should always be <= node_allocations This version performs much better. Here is the code: _______________________________ // Chris M. Thomassons Poor Mans RCU... Example 456... 
#include <iostream>
#include <atomic>
#include <thread>
#include <cstdlib>
#include <cstdint>
#include <climits>
#include <functional>


// Masks
static constexpr std::uint32_t ct_ref_mask = 0xFFFFFFF0U;
static constexpr std::uint32_t ct_ref_complete = 0x30U;
static constexpr std::uint32_t ct_ref_inc = 0x20U;
static constexpr std::uint32_t ct_proxy_mask = 0xFU;
static constexpr std::uint32_t ct_proxy_quiescent = 0x10U;


// Iteration settings
static constexpr unsigned long ct_reader_iters_n = 2000000;
static constexpr unsigned long ct_writer_iters_n = 200000;


// Thread counts
static constexpr unsigned long ct_reader_threads_n = 53;
static constexpr unsigned long ct_writer_threads_n = 11;


// Some debug/sanity check things...
// Need to make this conditional in compilation with some macros...
static std::atomic<std::uint32_t> g_debug_node_allocations(0);
static std::atomic<std::uint32_t> g_debug_node_deallocations(0);
static std::atomic<std::uint32_t> g_debug_dtor_collect(0);
static std::atomic<std::uint32_t> g_debug_release_collect(0);
static std::atomic<std::uint32_t> g_debug_quiesce_begin(0);
static std::atomic<std::uint32_t> g_debug_quiesce_complete(0);
static std::atomic<std::uint32_t> g_debug_quiesce_complete_nodes(0);


// Need to align and pad data structures! To do...
struct ct_node
{
    std::atomic<ct_node*> m_next;
    ct_node* m_defer_next;

    ct_node() : m_next(nullptr), m_defer_next(nullptr)
    {
        g_debug_node_allocations.fetch_add(1, std::memory_order_relaxed);
    }

    ~ct_node()
    {
        g_debug_node_deallocations.fetch_add(1, std::memory_order_relaxed);
    }
};


// The proxy collector itself... :^)
template<std::size_t T_defer_limit>
class ct_proxy
{
    static std::uint32_t prv_destroy(ct_node* n)
    {
        std::uint32_t count = 0;

        while (n)
        {
            ct_node* next = n->m_defer_next;
            delete n;
            count++;
            n = next;
        }

        return count;
    }


public:
    class collector
    {
        friend class ct_proxy;

    private:
        std::atomic<ct_node*> m_defer;
        std::atomic<std::uint32_t> m_defer_count;
        std::atomic<std::uint32_t> m_count;

    public:
        collector()
        :   m_defer(nullptr),
            m_defer_count(0),
            m_count(0)
        {

        }

        ~collector()
        {
            prv_destroy(m_defer.load(std::memory_order_relaxed));
        }
    };


private:
    std::atomic<std::uint32_t> m_current;
    std::atomic<bool> m_quiesce;
    ct_node* m_defer;
    collector m_collectors[2];


public:
    ct_proxy()
    :   m_current(0),
        m_quiesce(false),
        m_defer(nullptr)
    {

    }

    ~ct_proxy()
    {
        prv_destroy(m_defer);
    }


private:
    void prv_quiesce_begin()
    {
        // Try to begin the quiescence process.
        if (! m_quiesce.exchange(true, std::memory_order_acquire))
        {
            g_debug_quiesce_begin.fetch_add(1, std::memory_order_relaxed);

            // advance the current collector and grab the old one.
            std::uint32_t old = m_current.load(std::memory_order_relaxed) & ct_proxy_mask;
            old = m_current.exchange((old + 1) & 1, std::memory_order_acq_rel);
            collector& c = m_collectors[old & ct_proxy_mask];

            // decode reference count.
            std::uint32_t refs = old & ct_ref_mask;

            // increment and generate an odd reference count.
            std::uint32_t old_refs = c.m_count.fetch_add(refs + ct_proxy_quiescent, std::memory_order_release);

            if (old_refs == 0 - refs)
            {
                g_debug_dtor_collect.fetch_add(1, std::memory_order_relaxed);

                // odd reference count and drop-to-zero condition detected!
                prv_quiesce_complete(c);
            }
        }
    }


    void prv_quiesce_complete(collector& c)
    {
        g_debug_quiesce_complete.fetch_add(1, std::memory_order_relaxed);

        // the collector `c' is now in a quiescent state! :^)
        std::atomic_thread_fence(std::memory_order_acquire);

        // maintain the back link and obtain "fresh" objects from
        // this collection.
        ct_node* n = m_defer;
        m_defer = c.m_defer.load(std::memory_order_relaxed);
        c.m_defer.store(0, std::memory_order_relaxed);

        // reset the reference count.
        c.m_count.store(0, std::memory_order_relaxed);
        c.m_defer_count.store(0, std::memory_order_relaxed);

        // release the quiesce lock.
        m_quiesce.store(false, std::memory_order_release);

        // destroy nodes.
        std::uint32_t count = prv_destroy(n);

        g_debug_quiesce_complete_nodes.fetch_add(count, std::memory_order_relaxed);
    }


public:
    collector& acquire()
    {
        // increment the master count _and_ obtain current collector.
        std::uint32_t current =
            m_current.fetch_add(ct_ref_inc, std::memory_order_acquire);

        // decode the collector index.
        return m_collectors[current & ct_proxy_mask];
    }

    void release(collector& c)
    {
        // decrement the collector.
        std::uint32_t count =
            c.m_count.fetch_sub(ct_ref_inc, std::memory_order_release);

        // check for the completion of the quiescence process.
        if ((count & ct_ref_mask) == ct_ref_complete)
        {
            // odd reference count and drop-to-zero condition detected!
            g_debug_release_collect.fetch_add(1, std::memory_order_relaxed);

            prv_quiesce_complete(c);
        }
    }


    collector& sync(collector& c)
    {
        // check if the `c' is in the middle of a quiescence process.
        if (c.m_count.load(std::memory_order_relaxed) & ct_proxy_quiescent)
        {
            // drop `c' and get the next collector.
            release(c);

            return acquire();
        }

        return c;
    }


    void collect()
    {
        prv_quiesce_begin();
    }


    void collect(collector& c, ct_node* n)
    {
        if (! n) return;

        // link node into the defer list.
        ct_node* prev = c.m_defer.exchange(n, std::memory_order_relaxed);
        n->m_defer_next = prev;

        // bump the defer count and begin quiescence process if over
        // the limit.
        std::uint32_t count =
            c.m_defer_count.fetch_add(1, std::memory_order_relaxed) + 1;

        if (count >= (T_defer_limit / 2))
        {
            prv_quiesce_begin();
        }
    }
};


typedef ct_proxy<10> ct_proxy_collector;


// your basic lock-free stack...
// well, minus ABA counter and DWCAS of course! ;^)
class ct_stack
{
    std::atomic<ct_node*> m_head;


public:
    ct_stack() : m_head(nullptr)
    {

    }


public:
    void push(ct_node* n)
    {
        ct_node* head = m_head.load(std::memory_order_relaxed);

        do
        {
            n->m_next.store(head, std::memory_order_relaxed);
        }

        while (! m_head.compare_exchange_weak(
            head,
            n,
            std::memory_order_release));
    }


    ct_node* flush()
    {
        return m_head.exchange(nullptr, std::memory_order_acquire);
    }


    ct_node* get_head()
    {
        return m_head.load(std::memory_order_acquire);
    }


    ct_node* pop()
    {
        ct_node* head = m_head.load(std::memory_order_acquire);
        ct_node* xchg;

        do
        {
            if (! head) return nullptr;

            xchg = head->m_next.load(std::memory_order_relaxed);
        }

        while (!m_head.compare_exchange_weak(
            head,
            xchg,
            std::memory_order_acquire));

        return head;
    }
};


// The shared state
struct ct_shared
{
    ct_proxy<10> m_proxy_gc;
    ct_stack m_stack;
};


// Reader threads
// Iterates through the lock free stack
void ct_thread_reader(ct_shared& shared)
{
    // iterate the lockfree stack
    for (unsigned long i = 0; i < ct_reader_iters_n; ++i)
    {
        ct_proxy_collector::collector& c = shared.m_proxy_gc.acquire();

        ct_node* n = shared.m_stack.get_head();

        while (n)
        {
            // need to add in some processing...
            // std::this_thread::yield();

            n = n->m_next.load(std::memory_order_relaxed);
        }

        shared.m_proxy_gc.release(c);
    }
}


// Writer threads
// Mutates the lock free stack
void ct_thread_writer(ct_shared& shared)
{
    for (unsigned long wloop = 0; wloop < 42; ++wloop)
    {
        shared.m_proxy_gc.collect();

        for (unsigned long i = 0; i < ct_writer_iters_n; ++i)
        {
            shared.m_stack.push(new ct_node());
        }

        //std::this_thread::yield();

        ct_proxy_collector::collector& c = shared.m_proxy_gc.acquire();

        for (unsigned long i = 0; i < ct_writer_iters_n; ++i)
        {
            shared.m_proxy_gc.collect(c, shared.m_stack.pop());
        }

        shared.m_proxy_gc.release(c);

        for (unsigned long i = 0; i < ct_writer_iters_n / 2; ++i)
        {
            shared.m_proxy_gc.collect();
        }

        {
            ct_proxy_collector::collector& c = shared.m_proxy_gc.acquire();

            for (unsigned long i = 0; i < ct_writer_iters_n; ++i)
            {
                ct_node* n = shared.m_stack.pop();
                if (! n) break;

                shared.m_proxy_gc.collect(c, n);
            }

            shared.m_proxy_gc.release(c);
        }

        if ((wloop % 3) == 0)
        {
            shared.m_proxy_gc.collect();
        }
    }
}


int main()
{
    std::cout << "Chris M. Thomassons Proxy Collector Port ver .0.0.2...\n";
    std::cout << "_______________________________________\n\n";

    {
        ct_shared shared;

        std::thread readers[ct_reader_threads_n];
        std::thread writers[ct_writer_threads_n];

        std::cout << "Booting threads...\n";

        for (unsigned long i = 0; i < ct_writer_threads_n; ++i)
        {
            writers[i] = std::thread(ct_thread_writer, std::ref(shared));
        }

        for (unsigned long i = 0; i < ct_reader_threads_n; ++i)
        {
            readers[i] = std::thread(ct_thread_reader, std::ref(shared));
        }

        std::cout << "Threads running...\n";

        for (unsigned long i = 0; i < ct_reader_threads_n; ++i)
        {
            readers[i].join();
        }

        for (unsigned long i = 0; i < ct_writer_threads_n; ++i)
        {
            writers[i].join();
        }
    }

    std::cout << "Threads completed!\n\n";


    // Sanity check!
    {
        std::uint32_t node_allocations = g_debug_node_allocations.load(std::memory_order_relaxed);
        std::uint32_t node_deallocations = g_debug_node_deallocations.load(std::memory_order_relaxed);
        std::uint32_t dtor_collect = g_debug_dtor_collect.load(std::memory_order_relaxed);
        std::uint32_t release_collect = g_debug_release_collect.load(std::memory_order_relaxed);
        std::uint32_t quiesce_complete = g_debug_quiesce_complete.load(std::memory_order_relaxed);
        std::uint32_t quiesce_begin = g_debug_quiesce_begin.load(std::memory_order_relaxed);
        std::uint32_t quiesce_complete_nodes = g_debug_quiesce_complete_nodes.load(std::memory_order_relaxed);

        std::cout << "node_allocations = " << node_allocations << "\n";
        std::cout << "node_deallocations = " << node_deallocations << "\n\n";
        std::cout << "dtor_collect = " << dtor_collect << "\n";
        std::cout << "release_collect = " << release_collect << "\n";
        std::cout << "quiesce_complete = " << quiesce_complete << "\n";
        std::cout << "quiesce_begin = " << quiesce_begin << "\n";
        std::cout << "quiesce_complete_nodes = " << quiesce_complete_nodes << "\n";

        if (node_allocations != node_deallocations)
        {
            std::cout << "OH SHIT! NODE LEAK!!! SHIT! = " << node_allocations - node_deallocations << "\n\n";
        }
    }

    std::cout << "\n\nTest Completed!\n\n";

    return 0;
}
_______________________________ |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 09 03:03PM +0200

09.02.2021 09:31 Chris M. Thomasson wrote:
> stuff to show how your system is reacting to the algorithm. I am sorry
> for missing that counting issue in version 0.0.1. If you can please,
> when you get free time, run it again? I need to see your output.

Win10 MSVC2017, Intel Xeon E-2286M, 8 phys cores

> time ../x64/Release/ConsoleTestVS2017.exe
Chris M. Thomassons Proxy Collector Port ver .0.0.2...
_______________________________________

Booting threads...
Threads running...
Threads completed!

node_allocations = 92400000
node_deallocations = 92400000

dtor_collect = 33
release_collect = 753
quiesce_complete = 786
quiesce_begin = 786
quiesce_complete_nodes = 92400000

Test Completed!

real 0m19.027s
user 0m0.015s
sys 0m0.000s

---------------------------------------------------------------------

gcc 7.4.0 on Linux: no issues (2 NUMA nodes, 24 physical cores)

> g++ -std=c++11 -O2 -Wall main.cpp -lpthread
> time ./a.out
Chris M. Thomassons Proxy Collector Port ver .0.0.2...
_______________________________________

Booting threads...
Threads running...
Threads completed!

node_allocations = 92400000
node_deallocations = 92400000

dtor_collect = 11
release_collect = 152
quiesce_complete = 163
quiesce_begin = 163
quiesce_complete_nodes = 92400000

Test Completed!

real 0m43.760s
user 34m31.243s
sys 0m3.295s

I did not measure the times yesterday, but it feels like the new version is slower. |
Manfred <noname@add.invalid>: Feb 09 03:01PM +0100

On 2/9/2021 8:31 AM, Chris M. Thomasson wrote:
> for missing that counting issue in version 0.0.1. If you can please,
> when you get free time, run it again? I need to see your output.
> https://pastebin.com/raw/CYZ78gVj

I'm posting the timed run from both yesterday's and today's versions. Yesterday's code runs faster, I have no idea if that's due to extra debugging.
BTW I forgot to mention that these are run on a VM with 8 cores. Later on I'll run them on non-virtualized hardware.

$ c++ -std=c++11 -Wall -O2 -lpthread rcu_chris.cc && time ./a.out
Chris M. Thomassons Proxy Collector Port ver .0.0.1...
_______________________________________

Booting threads...
Threads running...
Threads completed!

node_allocations = 29400000
node_deallocations = 29400000

Test Completed!

real 0m6.890s
user 0m54.270s
sys 0m0.092s

$ c++ -std=c++11 -Wall -O2 -lpthread rcu_chris.0.0.2.cc && time ./a.out
Chris M. Thomassons Proxy Collector Port ver .0.0.2...
_______________________________________

Booting threads...
Threads running...
Threads completed!

node_allocations = 92400000
node_deallocations = 92400000

dtor_collect = 11
release_collect = 178
quiesce_complete = 189
quiesce_begin = 189
quiesce_complete_nodes = 92400000

Test Completed!

real 0m22.231s
user 2m56.383s
sys 0m0.249s |
Manfred <noname@invalid.add>: Feb 09 03:33PM +0100

On 2/9/21 3:01 PM, Manfred wrote:
> debugging.
> BTW I forgot to mention that these are run on a VM with 8 cores. Later
> on I'll run them on non-virtualized hardware.

Here they are, running on bare metal. GCC 9.3.1

$ c++ -std=c++11 -Wall -O2 -lpthread rcu_chris.0.0.1.cc && time ./a.out
Chris M. Thomassons Proxy Collector Port ver .0.0.1...
_______________________________________

Booting threads...
Threads running...
Threads completed!

node_allocations = 29400000
node_deallocations = 29400000

Test Completed!

real 0m8.814s
user 2m19.340s
sys 0m0.242s

$ c++ -std=c++11 -Wall -O2 -lpthread rcu_chris.0.0.2.cc && time ./a.out
Chris M. Thomassons Proxy Collector Port ver .0.0.2...
_______________________________________

Booting threads...
Threads running...
Threads completed!

node_allocations = 92400000
node_deallocations = 92400000

dtor_collect = 11
release_collect = 171
quiesce_complete = 182
quiesce_begin = 182
quiesce_complete_nodes = 92400000

Test Completed!

real 0m24.975s
user 6m36.626s
sys 0m0.587s |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 09 12:33PM -0800 On 2/9/2021 5:03 AM, Paavo Helde wrote: > sys 0m3.295s > I did not measure the times yesterday, but it feels like the new version > is slower. Thanks for running it again. Your output means everything worked as it should. node_allocations == node_deallocations, and quiesce_complete_nodes is <= node_allocations. Also, quiesce_complete == quiesce_begin. Now, this version 0.0.2 should definitely be slower than 0.0.1 because of several things. One is that it creates more threads and allocates a lot more nodes. Take a careful look at ver 0.0.1 wrt: _________________________ // Iteration settings static constexpr unsigned long ct_reader_iters_n = 1000000; static constexpr unsigned long ct_writer_iters_n = 100000; // Thread counts static constexpr unsigned long ct_reader_threads_n = 42; static constexpr unsigned long ct_writer_threads_n = 7; _________________________ Vs. ver 0.0.2 wrt: _________________________ // Iteration settings static constexpr unsigned long ct_reader_iters_n = 2000000; static constexpr unsigned long ct_writer_iters_n = 200000; // Thread counts static constexpr unsigned long ct_reader_threads_n = 53; static constexpr unsigned long ct_writer_threads_n = 11; _________________________ Then there is the debug stuff that will make it go slower as well... Now, there is another interesting aspect. Ver 0.0.1 can sometimes miss collection cycles allowing the memory to grow. This means that calls to delete are far less then they need to be when the threads are running. Hence memory growing. So, it skips a lot of calls to delete, which makes it run faster. Ver 0.0.2 does not miss _any_ collection cycles. So, it will invoke delete a lot more times when the threads are running. In the properly working 0.0.2, this can be adjusted with the template parameter std::size_t T_defer_limit in the ct_proxy class. 
Basically, it waits to actually begin a quiescence process prv_quiesce_begin() until the number of deferred nodes is greater than or equal to T_defer_limit. Don't ask me why I introduced T_defer_limit as a template parameter. Uggg. |
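A small sketch of the defer threshold on its own, with the limit as a template parameter and, as an alternative, as a constructor argument. Note that the posted 0.0.2 code actually compares the defer count against `T_defer_limit / 2`, so `ct_proxy<10>` starts a quiescence attempt at the fifth deferred node; the sketch mirrors that. The class names `defer_counter_static` and `defer_counter_runtime` are hypothetical, and the real counter is a `std::atomic` that is reset when a collection completes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the defer bookkeeping with a compile-time limit.
template<std::size_t T_defer_limit>
class defer_counter_static {
    std::size_t m_count = 0;

public:
    // Returns true when a quiescence process should be attempted.
    bool defer() { return ++m_count >= (T_defer_limit / 2); }
    void reset() { m_count = 0; }
};

// Same logic with the limit chosen at run time instead.
class defer_counter_runtime {
    std::size_t m_limit;
    std::size_t m_count = 0;

public:
    explicit defer_counter_runtime(std::size_t limit) : m_limit(limit) {}

    bool defer() { return ++m_count >= (m_limit / 2); }
    void reset() { m_count = 0; }
};
```

A run-time limit would let one binary compare several thresholds without recompiling, at the cost of an extra member load per deferred node; a larger limit trades memory held in the defer list for fewer collection cycles.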
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 09 12:44PM -0800 On 2/9/2021 6:33 AM, Manfred wrote: > real 0m24.975s > user 6m36.626s > sys 0m0.587s Perfect. version 0.0.2 should be a lot slower. Please read the response I gave to Paavo. Since ver 0.0.2 creates more threads and doubles the iterations of ver 0.0.1, adds more debug interference, well, it basically has to run slower. The interesting part is that ver 0.0.2 does not miss any collection cycles. So, calls to delete are more frequent while the threads are churning along. This can actually break down into the performance of the underlying memory allocator being hammered with a lot of activity. Humm... It would be fun to test raw new/delete vs a simple lock-free pooling allocator in this proxy gc context. Also, the template parameter std::size_t T_defer_limit in the ct_proxy class can effect how many nodes are deferred before a collection cycle is triggered. Thanks again for giving it a go. Now, I think its time for me to give it a proper home over on github. :^) |
Lynn McGuire <lynnmcguire5@gmail.com>: Feb 08 05:39PM -0600 "The weirdest compiler bug" by Scott Rasmussen https://blog.zaita.com/mingw64-compiler-bug/ "There are approximately 7.5x10^18 grains of sand on Earth. This story is about finding changes in an equation that has a difference of approximately 1e-18 out of hundreds of billions of calculations. That is 7 grains of sand that are different to what we expect across the entire planet Earth." "After spending days generating gigabytes of debug logs and GDB breakpoints, I finally discovered a very peculiar bug in the compiler. I thought this would be an interesting story to tell." Lynn |
Juha Nieminen <nospam@thanks.invalid>: Feb 09 09:28AM > "After spending days generating gigabytes of debug logs and GDB > breakpoints, I finally discovered a very peculiar bug in the compiler. I > thought this would be an interesting story to tell." You ain't debugged a compiler error until you have encountered one... on a compiler for an 8-bit CPU (yes, those are still being used) in an embedded system where your only way to debug is by writing things to a UART, and where adding code to the program (to write debug values to that UART) affects the bug. Yes, been there, done that. It was a nightmare to find. Quite rewarding when I finally found it (and was able to circumvent it by changing the code in such a manner that it didn't trigger the bug). (If you are curious, the compiler in question was sdcc.) |
David Brown <david.brown@hesbynett.no>: Feb 09 12:13PM +0100 On 09/02/2021 10:28, Juha Nieminen wrote: > when I finally found it (and was able to circumnavigate it by changing > the code in such a manner that it didn't trigger the bug). > (If you are curious, the compiler in question was sdcc.) Ha! Young folk today, complaining that they have only a UART for debugging. When I were a lad, we used a voltmeter on a spare pin as the debugger. The first debugger I had for assembly programming was listening to the sound of the computer's power supply as different types of work were done. (Yes, compiler bugs are a real pain. I've come across a few over the years, in different compilers.) |
Manfred <noname@add.invalid>: Feb 09 04:11PM +0100 On 2/9/2021 12:39 AM, Lynn McGuire wrote: > breakpoints, I finally discovered a very peculiar bug in the compiler. I > thought this would be an interesting story to tell." > Lynn Even if, as far as I can remember, it is the only Windows port of GCC, MinGW is not officially supported by the gcc team. |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 09 07:50PM +0200 09.02.2021 17:11 Manfred wrote: >> Lynn > Even if, as far as I can remember, it is the only Windows port of GCC, > MinGW is not officially supported by the gcc team. There is at least also the Cygwin port of GCC. But I'm quite sure this is not officially supported by the gcc team either. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 09 12:14PM -0800 On 2/8/2021 3:39 PM, Lynn McGuire wrote: > "After spending days generating gigabytes of debug logs and GDB > breakpoints, I finally discovered a very peculiar bug in the compiler. I > thought this would be an interesting story to tell." Fwiw, a long time ago there was a "bug?" in GCC that could prevent POSIX pthread_mutex_trylock from working correctly: https://groups.google.com/g/comp.programming.threads/c/Y_Y2DZOWErM/m/JI3i5zlA2H0J It was an optimization that would screw things up pretty bad. The scary part is that it could introduce a race-condition. A lot of times the program would work, other times it could silently corrupt data. Those are pretty damn hard to debug. There are a lot of smart people commenting on that thread. Heck, even I am there. ;^) |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Feb 08 10:48PM -0800 On 2/7/2021 3:34 PM, Chris M. Thomasson wrote: > implementation of mine, only a single thread can collect and its all > unsigned. Therefore I can just use: > if (old_refs == refs) Oh well, shit. I should be really doing this: if (old_refs == 0 - refs) The proxy collector does work without the subtraction, but it can miss collections, and memory can start to grow a bit, when it does not have to. Need to add this into the next version. |