- An argument *against* (the liberal use of) references - 13 Updates
- Why is that usually faster than a normal string_view == comparison ? - 9 Updates
- A ℙ≠ℕℙ proof for review (v1.0) - 1 Update
- Brilliant - 1 Update
- Pause event handler, go enter another one, then come back - 1 Update
Michael S <already5chosen@yahoo.com>: Dec 02 04:11AM -0800 On Thursday, December 1, 2022 at 11:54:02 PM UTC+2, Paavo Helde wrote: > > inconsistent vocabulary. > Thanks, I had to look up what is "uncacheable memory". I guess the 80186 > processor where I learned my basics did not have such a thing.

The 80186 was an "embedded" microprocessor similar at its core to the 8086. It was typically used with no cache, so it didn't need a concept of uncacheable regions. The 80286 and especially the i386 were used with (external) caches quite often, but to my understanding their caches were what we today call "memory-side caches", associated with main memory. From the system's perspective (both the CPU and other bus masters) such caches are totally transparent (except for entering/leaving deep sleep states, but back then they didn't do that), so there was still no need for uncacheable regions.

System-side caches and their associated problems first appeared in the x86 world with the i486. Still, in the original i486 the system cache had a strict write-through policy, so the problems were minor. Then came the Pentium with 8 KB of write-back data cache, and a little later new models of the i486 with an even bigger write-back cache, and the problems became quite real, especially because at approximately the same time PCI took over the I/O bus role and suddenly multiple bus masters, a high-end curiosity before then, became common in consumer PC hardware. But even then the x86 architecture lacked an adequate answer to the new challenge.

The first reasonable answer (MTRR registers) came only with the PPro, but it still had a scalability problem - too few regions. Later (P-III) they invented the PAT, which from a theoretical point of view is inferior to MTRRs because in the PAT scheme cachability is an attribute of the virtual address rather than of the physical address. But the PAT is ultimately scalable and, as long as the OS does the proper plumbing, is the one solution that can rule over all aspects of cachability. So it won. |
Michael S <already5chosen@yahoo.com>: Dec 02 05:53AM -0800 On Thursday, December 1, 2022 at 9:09:39 PM UTC+2, Scott Lurndal wrote: > It also depends on how the processor atomic instructions are implemented. > In legacy Intel/AMD systems, where the LOCK prefix is being used, the > systemwide lock is the only possibility [*].

Huh? There is absolutely no relationship between what prefix is used (an instruction encoding issue) and the implementation.

> the processing elements and the memory controllers and PCI root port > bridges, in which case, like ARM64, they can push the atomic op > all the way out to the endpoint/controller.

x86 atomic operations have global order, which is somewhat stronger than the "total order" of the rest of normal* x86 stores. The difference is that, unlike "total order", global order makes no exception for store-to-load forwarding from the core's local store queue. BTW, it means that your claim in the post above, "no overhead at all unless ...", is incorrect in the absolute sense. There is an overhead even without the "unless". But the overhead in the uncontended case is small - on the order of a dozen or two CPU clocks. So, undetectable in Paavo's case of only 1000 updates per second. For 1M updates per second the impact would be detectable with precise time measurements, and for 100M per second there would be a big slowdown.

Last September Intel published this manual: https://cdrdv2-public.intel.com/671368/architecture-instruction-set-extensions-programming-reference.pdf The manual contains new atomic instructions AADD/AAND/AOR/AXOR that provide weaker (WC) ordering in WB memory regions. The manual does not say when these instructions are going to be implemented, nor whether they will be implemented at all. It also does not explain in which situations they are expected to be useful. However, one thing is clear: they will *not* be useful in typical user-mode code that deals with reference counting. 
Maybe usable in userland in extreme fire-and-forget situations like counting events, where the counter is updated not too often, but often enough to matter, typically not from the same core as the last update, and read approximately never. My guess is that these instructions were invented to help Optane DIMMs. So today, with Optane DIMMs officially dead, it would be logical for Intel to never implement these strange instructions, which could easily lead to programmer mistakes. [*] normal in this case means WB or UC. For WC stores it's more relaxed. |
scott@slp53.sl.home (Scott Lurndal): Dec 02 02:36PM >Huh? >There is absolute no relationship between what prefix is used (instruction >encoding issue) and implementation. The only way to specify an atomic access in those chips was to use the LOCK prefix (e.g. LOCK ADD generates an atomic add, et alia). |
Paavo Helde <eesnimi@osa.pri.ee>: Dec 02 06:33PM +0200 02.12.2022 15:53 Michael S kirjutas: > clocks. So, undetectable in Paavo's case of only 1000 updates per second. > For 1M updates per second impact would be detectable with precise time > measurements and for 100M per second there would be big slowdown. Just a minor note: earlier I posted numbers for only a single smartpointer, whereas in the full program there are probably tens or hundreds of thousands of them. I just measured it, and the total rate of refcount changes is something like 5M per second. With this 5M/s rate I cannot see any slowdown (from using std::atomic<int> instead of int) in my measurements (with no contention); variations caused by other uncontrollable factors seem to be much larger. Maybe I should rerun this on a Linux box where things are more stable. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 02 12:29PM -0800 On 12/2/2022 6:36 AM, Scott Lurndal wrote: > The only way to specify an atomic access in those chips was > to use the LOCK prefix (e.g. LOCK ADD generates an atomic > add, et alia). Iirc, XCHG has an implicit LOCK prefix? |
scott@slp53.sl.home (Scott Lurndal): Dec 02 08:55PM >> to use the LOCK prefix (e.g. LOCK ADD generates an atomic >> add, et alia). >Iirc, XCHG has an implicit LOCK prefix? Not sure about XCHG, but CMPXCHG requires the prefix when used in a multiprocessor system; on a uniprocessor it will be atomic because interrupts are always taken between instructions (unlike the VAX, for instance, where certain instructions (MOVC3/5) can be interrupted and restarted). I suspect that XCHG has similar characteristics. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 02 12:58PM -0800 On 12/2/2022 12:55 PM, Scott Lurndal wrote: > where certain instructions (MOVC3/5) can be interrupted > and restarted). I suspect that XCHG has similar > characteristics. Iirc, XCHG is the _only_ atomic RMW instruction that has an _implicit_ LOCK prefix. CMPXCHG _needs_ the programmer to put in a LOCK prefix. |
Branimir Maksimovic <branimir.maksimovic@icloud.com>: Dec 02 09:09PM > where certain instructions (MOVC3/5) can be interrupted > and restarted). I suspect that XCHG has similar > characteristics. "lock" prefix causes the processor's bus-lock signal to be asserted during execution of the accompanying instruction. In a multiprocessor environment, the bus-lock signal insures that the processor has exclusive use of any shared memory while the signal is asserted. The "lock" prefix can be prepended only to the following instructions and only to those forms of the instructions where the destination operand is a memory operand: "add", "adc", "and", "btc", "btr", "bts", "cmpxchg", "cmpxchg8b", "dec", "inc", "neg", "not", "or", "sbb", "sub", "xor", "xadd" and "xchg". If the "lock" prefix is used with one of these instructions and the source operand is a memory operand, an undefined opcode exception may be generated. An undefined opcode exception will also be generated if the "lock" prefix is used with any instruction not in the above list. The "xchg" instruction always asserts the bus-lock signal regardless of the presence or absence of the "lock" prefix. -- 7-77-777 Evil Sinner! with software, you repeat same experiment, expecting different results... |
scott@slp53.sl.home (Scott Lurndal): Dec 02 09:18PM >> characteristics. >Iirc, XCHG is the _only_ atomic RMW instruction that has an _implicit_ >LOCK prefix. CMPXCHG _needs_ the programmer to put in a LOCK prefix. Yes, that is the case. XCHG: "If a memory operand is referenced, the processor's locking protocol is automatically implemented for the duration of the exchange operation, regardless of the presence or absence of the LOCK prefix or of the value of the IOPL. (See the LOCK prefix description in this chapter for more information on the locking protocol.)" |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 02 01:22PM -0800 On 12/2/2022 1:09 PM, Branimir Maksimovic wrote: > list. > The "xchg" instruction always asserts the bus-lock signal regardless of > the presence or absence of the "lock" prefix. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ BINGO! I had a strong feeling I was right. https://youtu.be/TnZrWWUFl8I |
Branimir Maksimovic <branimir.maksimovic@icloud.com>: Dec 02 09:36PM > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > BINGO! I had a strong feeling I was right. > https://youtu.be/TnZrWWUFl8I Nice music :p -- 7-77-777 Evil Sinner! with software, you repeat same experiment, expecting different results... |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 02 02:13PM -0800 On 12/2/2022 1:36 PM, Branimir Maksimovic wrote: >> BINGO! I had a strong feeling I was right. >> https://youtu.be/TnZrWWUFl8I > Nice music :p Thanks again, Branimir, for the quote of the docs. https://youtu.be/UZ2-FfXZlAU (super mario music in a live big band format? Nice... :^) My hat is off to you. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Dec 02 02:16PM -0800 On 12/2/2022 1:18 PM, Scott Lurndal wrote: > implemented for the duration of the exchange operation, regardless of the presence > or absence of the LOCK prefix or of the value of the IOPL. (See the LOCK prefix > description in this chapter for more information on the locking protocol.) Indeed. When I remembered this I kind of doubted myself for a moment. Then I said, well, I know it's true, and if I am wrong then I must have fried my brain a bit. |
Mr Flibble <flibble@reddwarf.jmc.corp>: Dec 01 11:36PM On Thu, 01 Dec 2022 21:54:55 +0100, Bonita Montero wrote: > #include <atomic> > template<typename CharType, typename TraitsType> > requires std::same_as<std::make_unsigned_t<CharType>, unsigned char> || > #if defined(_WIN32) > __try { > return !str[sv.length()] && memcmp( str, sv.data(), sv.length() * > { > using sv_cit = std::string_view::const_iterator; > for( sv_cit itSv = sv.cbegin(), itSvEnd = sv.cend(); itSv != itSvEnd; > cStr( N, '*' ), > svStr( cStr ); > auto cmpStrcmp = []( char const *str, string_view const &sv ) -> bool { > return strcmp( str, sv.data() ) == 0; }; > auto cmpSvCmp = []( char const *str, string_view const &sv ) -> bool { > return svCmp( str, sv ); }; > using cmp_sig_t = bool (*)( char const *str, string_view const &sv ); > I'm using a volatile pointer to the comparison function to prevent any > optimizations on my own function to have a fair comparison against > memcmp(). Your code is terrible, egregiously terrible: buffer overflows are NEVER a good idea even in example code; and "using namespace std;" is bad too. /Flibble |
Bonita Montero <Bonita.Montero@gmail.com>: Dec 02 05:30AM +0100 Am 02.12.2022 um 00:36 schrieb Mr Flibble: > Your code is terrible, egregiously terrible: buffer overflows are NEVER a > good idea even in example code; and "using namespace std;" is bad too. There's nothing wrong with my code and I guess you don't even understand it; you're just a bad programmer. |
Muttley@dastardlyhq.com: Dec 02 10:09AM On Fri, 2 Dec 2022 05:30:45 +0100 >> good idea even in example code; and "using namespace std;" is bad too. >There's nothing wrong with my code and I guess you >don't even understand it; you're just a bad programmer. You think everyone is a bad programmer if they don't recognise your coding genius. I don't agree with Flibble about much but he's right about this. Your code is almost always an overcomplicated mess. I don't know whether it's because you're showing off or you simply can't think clearly, but I'm glad I'll never have to maintain it. |
Bonita Montero <Bonita.Montero@gmail.com>: Dec 02 03:22PM +0100 > You think everyone is a bad programmer if they don't recognise your coding > genius. ... The code is reliable and 15 times faster than strcmp on my machine. > Your code is almost always an overcomplicated mess. ... It's rather simple for this kind of improvement. |
Mr Flibble <flibble@reddwarf.jmc.corp>: Dec 02 02:48PM On Fri, 02 Dec 2022 05:30:45 +0100, Bonita Montero wrote: >> too. > There's nothing wrong with my code and I gues you don't even understand > it; you're just a bad programmer. Of course I understand the mess that you think is good code: you are relying on undefined behaviour (buffer overruns and page faults) in the fucktarded belief you can create a faster string comparison function; if we ignore the undefined behaviour there is still an issue: most strings are shorter than a typical page size which is a problem if the length of "str" is less than the length of "sv"; if you cannot see what that problem is then it is YOU who is "just a bad programmer". /Flibble |
Bonita Montero <Bonita.Montero@gmail.com>: Dec 02 06:39PM +0100 Am 02.12.2022 um 15:48 schrieb Mr Flibble: > Of course I understand the mess that you think is good code: you are > relying on undefined behaviour (buffer overruns and page faults) in the > fucktarded belief you can create a faster string comparison function; ... On Windows this style of programming is safe. The speedup I measured is a factor of 15. > most strings are shorter than a typical page size ... Of course, but I would still get a speedup by comparing eight chars in each chunk. > which is a problem if the length of "str" is less than the length of "sv"; The mistake is the opposite, i.e. if str is longer than sv. But I've fixed that by returning !*str. |
Mr Flibble <flibble@reddwarf.jmc.corp>: Dec 02 06:17PM On Fri, 02 Dec 2022 18:39:51 +0100, Bonita Montero wrote: >> "sv"; > The mistake is the opposite, i.e. if str is longer than sv. > But I've fixed that by returning !*str. No it isn't and you haven't fixed anything: you are a bad programmer indeed. /Flibble |
Bonita Montero <Bonita.Montero@gmail.com>: Dec 02 07:25PM +0100 Am 02.12.2022 um 19:17 schrieb Mr Flibble: > No it isn't and you haven't fixed anything: you are a bad programmer > indeed. You don't understand your own bug assumptions. If the length of str is less than that of sv, this is detected inside the loop. If it is larger, it is detected by checking whether the character in str at the end-index of sv is zero. Who's the bad programmer here ? |
Mr Flibble <flibble@reddwarf.jmc.corp>: Dec 02 06:50PM On Fri, 02 Dec 2022 19:25:50 +0100, Bonita Montero wrote: > than that of sv this is detected inside the loop. If it is larger this > is detected by looking if the charater inside str at the end-index of sv > is zero. Who's the bad programmer here ? No it isn't. You are the bad programmer here. /Flibble |
wij <wyniijj5@gmail.com>: Dec 02 10:10AM -0800 Using C-like notation: ℙ ::= the set of decision problems which can be computed by a TM in P-time. ℕℙ ::= the set of decision problems q for which the positive answer can be verified by a certificate c, i.e. a decision function v(q,c) yields true in P-time. Spec: The function "bool S(Prog,UInt)" decides whether or not the argument program Prog (equally powerful as a TM) is satisfiable in the range specified by UInt, with the algorithmic steps bounded by a given polynomial formula P. IOW: S(f,n)==true iff ∃v,x, x<=n, v(f,x) proves f(x)==true (e.g. v simulates f(x)) and Time(v(f,x))<=P(|f|+|x|) proves a P-time certificate. From the specification, Problem(S)∈ℕℙ (precisely, ℕℙℂ). S can be implemented by a DTM. And a Prog u whose function is defined as follows will also exist (the implementation of u is very different; the implementations of S and u, as existence proofs, are omitted, because they are supposedly understood immediately): bool u(UInt x) { return !S(u,x); } From the Liar's paradox and the HP proof, u is a never-terminating (undecidable) instance. If S(u,n) computes in greater than P time, the answer S(u,n)==false is fine. But if S is in P time, u(x) can be in P time; then the only condition for S to return false in this case is u(x)==false, which will not happen. A P-time S is unimplementable. Therefore, Problem(S)∉ℙ, ℙ≠ℕℙ. QED. |
Mr Flibble <flibble@reddwarf.jmc.corp>: Dec 02 04:41PM https://www.tiktok.com/@looneytony4/video/7166638178647035137?is_from_webapp=v1&item_id=7166638178647035137 |
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Dec 02 12:43AM On Wed, 30 Nov 2022 05:46:54 -0800 (PST) > raised -- but there are major restrictions on what can happen inside a > signal handler. > Does anyone know how this works? I don't know about windows, but wxWidgets when running on unix-like systems uses glib/GTK+ as the underlying toolkit, and for modal dialogs of the kind you have described that generally means using recursive main loop invocations (in other words, recursive blocking calls to gtk_main()) which have the effect of multiplexing the single-threaded event loop: see https://docs.gtk.org/glib/main-loop.html. All GTK+ events (including I believe all GDK rendering) execute in the main loop thread: GTK+ itself is single threaded (GTK+2 did make some limited provision for it to run under multiple threads but this was deprecated in GTK+3 and removed in GTK4). However glib/gio provides mechanisms for launching tasks on worker threads, which when complete will post an event to the main loop so that their continuations run on the main loop thread. Some asynchronous gio i/o operates in this way. I presume (but do not know) that wxWidgets operates in a similar way when using windows as its backend. |
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |