Bonita Montero <Bonita.Montero@gmail.com>: Jan 28 02:07PM +0100
On 28.01.2024 at 11:32, Marcel Mueller wrote:
> Reverse memory access is typically slower simply because the
> last data of a cache line (after a cache miss) arrives at last.

I tested this, and for all offsets within a cache line I get the same timing on all three of my computers:

#include <iostream>
#include <vector>
#include <chrono>
#include <atomic>
#include <new> // std::hardware_constructive_interference_size

using namespace std;
using namespace chrono;

#if defined(__cpp_lib_hardware_interference_size)
constexpr size_t CL_SIZE = hardware_constructive_interference_size;
#else
constexpr size_t CL_SIZE = 64;
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 26 05:15PM -0800
On 1/25/2024 8:08 PM, Bonita Montero wrote: >> [...] >> Just make sure to take the time to model it in a race detector. > Idiot ... Sigh. I don't have the time to look over your code and find any potential issues right now. I will wait for one of your infamous corrections instead. At least if you said here are some test units and they pass, well, that would be a good sign, right? :^) | "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 26 07:23PM -0800
On 1/25/2024 8:08 PM, Bonita Montero wrote:
>> [...]
>> Just make sure to take the time to model it in a race detector.
> Idiot ...

Don't be ashamed of creating a test unit. If it finds any errors, just correct them, right? Notice how I formulated my xchg algorithm in a test unit first! https://groups.google.com/g/comp.lang.c++/c/Skv1PoQsUZo/m/bZoTXWDkAAAJ No shame in that! Right? :^)
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 26 07:24PM -0800
On 1/26/2024 7:23 PM, Chris M. Thomasson wrote: > unit first! > https://groups.google.com/g/comp.lang.c++/c/Skv1PoQsUZo/m/bZoTXWDkAAAJ > No shame in that! Right? :^) Give it a go? https://github.com/dvyukov/relacy | Bonita Montero <Bonita.Montero@gmail.com>: Jan 27 09:38AM +0100
On 25.01.2024 at 20:31, Chris M. Thomasson wrote:
>> This is the implementation
> [...]
> Just make sure to take the time to model it in a race detector.

The synchronization part is trivial. It's the state that the synchronization manages that is complex.
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 27 01:05PM -0800
On 1/25/2024 9:25 AM, Bonita Montero wrote: > #include <functional> > #include <chrono> > struct thread_pool [...] > #pragma clang diagnostic ignored "-Wparentheses" > #pragma clang diagnostic ignored "-Wunqualified-std-cast-call" >
Bonita Montero <Bonita.Montero@gmail.com>: Jan 26 05:08AM +0100
On 25.01.2024 at 20:31, Chris M. Thomasson wrote:
>> This is the implementation
> [...]
> Just make sure to take the time to model it in a race detector.

Idiot ... |
Bonita Montero <Bonita.Montero@gmail.com>: Jan 25 06:25PM +0100
I once wrote a thread pool that has an upper limit on the number of threads and a timeout after which idle threads terminate themselves. If you have something CPU-bound in userspace, you'd specify the number of hardware threads as the upper limit; if you have many threads doing I/O, you may go far beyond that, since the hardware threads aren't fully occupied anyway.

The problem with my initial thread pool class was that there could be a large number of idle threads that could have been used by other pools. So I wrote a thread pool class where each pool has an upper limit on the number of executing threads and there are no idle threads within each pool. Instead, the threads idle in a global singleton pool and attach to whichever pool needs a new thread, thereby minimizing the total number of threads. This is the implementation:

// header
#pragma once
#include <thread>
#include <mutex>
#include <condition_variable>
#include <deque>
#include <functional>
#include <chrono>
#include <memory>   // std::shared_ptr
#include <utility>  // std::pair
#include <cstddef>  // size_t
#include <cstdint>  // uint64_t

struct thread_pool
{
    using void_fn = std::function<void ()>;
    thread_pool( size_t maxThreads = 0 );
    thread_pool( thread_pool const & ) = delete;
    void operator =( thread_pool const & ) = delete;
    ~thread_pool();
    uint64_t enqueue_task( void_fn &&task );
    void_fn cancel( uint64_t queueId );
    void wait_idle();
    size_t max_threads();
    size_t resize( size_t maxThreads );
    bool clear_queue();
    void_fn idle_callback( void_fn &&fn = {} );
    std::pair<size_t, size_t> processing();
    static typename std::chrono::milliseconds timeout( std::chrono::milliseconds timeout );
private:
    struct idle_node { idle_node *next; bool notify; };
    using queue_item = std::pair<uint64_t, void_fn>;
    using task_queue_t = std::deque<queue_item>;
    bool m_quit;
    size_t m_maxThreads, m_nThreadsExecuting;
    uint64_t m_lastIdleQueueId, m_nextQueueId;
    task_queue_t m_queue;
    std::condition_variable m_idleCv;
    std::shared_ptr<void_fn> m_idleCallback;
    idle_node *m_idleList;
    inline static struct global_t
    {
        std::mutex m_mtx;
        std::chrono::milliseconds m_timeout = std::chrono::seconds( 1 );
        std::condition_variable m_cv, m_quitCv;
        bool m_quit;
        size_t m_nThreads, m_nThreadsActive;
        std::deque<thread_pool *> m_initiate;
        void theThread();
        global_t();
        ~global_t();
    } global;
    void processIdle( std::unique_lock<std::mutex> &lock );
    std::unique_lock<std::mutex> waitIdle();
};

// translation unit
#include <cassert>
#include "thread_pool.h"
#include "invoke_on_destruct.h"

#if defined(_WIN32)
#pragma warning(disable: 26110) // Caller failing to hold lock 'lock' before calling function 'func'.
#pragma warning(disable: 26111) // Caller failing to release lock 'lock' before calling function 'func'.
#pragma warning(disable: 26115) // Failing to release lock 'lock' in function 'func'.
#pragma warning(disable: 26117) // Releasing unheld lock 'lock' in function 'func'.
#pragma warning(disable: 26800) // Use of a moved from object: 'object'.
immibis <news@immibis.com>: Jan 22 01:22AM +0100
On 1/19/24 19:17, Malcolm McLean wrote: > be the same structures or incompatible structures. > But a simple standardisation would mean the end of pointless editing of > code just to conform to whatever the host program has chosen. And what should be the data type of the coefficients of the vector? And what should? Why not also have matrices? What is the maximum dimension supported? Are homogeneous coordinates a built-in feature? No, leave the graphics stuff to a graphics team. | Malcolm McLean <malcolm.arthur.mclean@gmail.com>: Jan 22 11:16AM
On 22/01/2024 00:22, immibis wrote:
> what should? Why not also have matrices? What is the maximum dimension
> supported? Are homogeneous coordinates a built-in feature? No, leave the
> graphics stuff to a graphics team.

It should take a template, so any type can be used for the coefficients. Unless you have some weird and wonderful ideas, it will of course be scalar. I'd recommend a 2D type with x and y and a 3D type with x, y and z. Humanity is not going to be elevated to a higher dimension any time soon. No homogeneous co-ordinates. No angle / magnitude notation. No need for matrices, because we already have a natural representation of these, since C++ supports two-dimensional fixed-size arrays.

Needing to store points in 2D or 3D space is a common requirement, and code needs to communicate with other modules. One of these will be the graphics system, which may well have requirements beyond simple points in space, but will include such a requirement.
-- Check out Basic Algorithms and my other books: https://www.lulu.com/spotlight/bgy1mm
| Malcolm McLean <malcolm.arthur.mclean@gmail.com>: Jan 22 11:22AM
On 21/01/2024 04:06, Kaz Kylheku wrote: > And that's just > [ c -d ] [ a ] = [ ca - db ] > [ d c ] [ b ] = [ da + cb ] Yes I know. I did complex numbers at high school. But whilst you could use the Argand plane as your graphics surface and thus represent all points as complex numbers, I've never actually seen anyone do so, and the axes are always given different labels. Except of course in Mandelbrots or other programs concerned with complex numbers themselves. -- Check out Basic Algorithms and my other books: https://www.lulu.com/spotlight/bgy1mm | "Fred. Zwarts" <F.Zwarts@HetNet.nl>: Jan 22 12:34PM +0100
On 22 Jan 2024 at 12:16, Malcolm McLean wrote:
> code needs to communicate with other modules. One of which will be the
> graphics system, which may well have requirements beyond simple points
> in space, but will include such a requirement.

According to Einstein, humanity already lives in a four-dimensional space; time is the fourth dimension. There are many problems in physics and other fields with even more than four dimensions, so it would be short-sighted to limit the library to three dimensions. In addition, one could ask how far the standard library must go. What operations must be supported? Calculating the length of a vector, allowing for non-Euclidean spaces?
| Malcolm McLean <malcolm.arthur.mclean@gmail.com>: Jan 22 12:31PM
On 22/01/2024 11:34, Fred. Zwarts wrote:
> In addition one could ask how far the standard library must go. What
> operations must be supported? Calculate the length of a vector, allowing
> non-Euclidean spaces?

No-one is saying that you can't devise your own structures if you want to write programs to solve problems in general relativity. The idea is to have a common standard for the common requirement to represent points and vectors in 2D and 3D spaces, so that routines written in C++ can communicate with each other without the need for adapter code or rewriting.

However, having decided on a representation for points, there is also a very strong case for a standard library for basic operations on those points, such as taking the length of a vector. Probably not for non-Euclidean spaces, though. Again, some people will want to write software that operates in Hilbert space or other non-Euclidean spaces, but it's likely to be specialised, and so you can't expect much support from the standard library.
-- Check out Basic Algorithms and my other books: https://www.lulu.com/spotlight/bgy1mm
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 22 12:22PM -0800
On 1/22/2024 3:22 AM, Malcolm McLean wrote:
> anyone do so, and the axes are always given different labels. Except of
> course in Mandelbrots or other programs concerned with complex numbers
> themselves.

Usually, for a 2-ary vector (x, y), x is the horizontal axis and y is the vertical axis. This matches a complex number x + yi:

        +y
         |
  -x-----0-----+x
         |
        -y

x is real, y is imaginary. :^)
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 22 12:24PM -0800
On 1/22/2024 3:16 AM, Malcolm McLean wrote: > code needs to communicate with other modules. One of which will be the > graphics system, which may well have requirements beyond simple points > in space, but will include such a requirement. And 4-ary with (x, y, z, w) Again I am quite fond of the GLM library. It's just nice to me. |
wij <wyniijj5@gmail.com>: Jan 21 06:22AM +0800
On Tue, 2024-01-16 at 21:29 +0000, bubu wrote:
> ?
> Sorry. I have a lot of problems. I will have others questions after.
> Thanks a lot.

Check out this site. They are willing to answer questions about Qt. https://www.qtcentre.org/content/
| Malcolm McLean <malcolm.arthur.mclean@gmail.com>: Jan 20 01:59PM
On 19/01/2024 18:35, Kaz Kylheku wrote: >> representation of a point or a vector. Whilst generally it's just a POD >> structure with x and y members, the name varies, and sometimes the > For code working with 2D vectors, designers should consider complex numbers. That's a nice idea. But I've never seen code where the horizontal axis is "real" and the vertical "imaginary", except of course in code designed to demonstrate complex numbers as such. Mandelbrot is my favourite test program when getting a new system. -- Check out Basic Algorithms and my other books: https://www.lulu.com/spotlight/bgy1mm | "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 20 11:22AM -0800
On 1/20/2024 5:59 AM, Malcolm McLean wrote: > vertical "imaginary", except of course in code designed to demonstrate > complex numbers as such. Mandelbrot is my favourite test program when > getting a new system. Same here! :^D |
Lynn McGuire <lynnmcguire5@gmail.com>: Jan 16 04:03PM -0600
"We are doomed" https://www.carette.xyz/posts/we_are_doomed/ "The only system with a good software compatibility that I know is Windows, and this explains a ton of things keeping very old UI/UX frameworks, software and APIs to run, for example, Windows 95 compatible games like "Roller Coaster Tycoon"." "Otherwise, you are doomed." The C++ committee has screwed up and continues to screw up by not creating a graphics standard for C++. Lynn | bubu <bruno.donati@hotmail.fr>: Jan 16 08:52PM
Hi,

Sorry for my bad English and for my limited knowledge of Qt. I would like to use a QML program with C++ libraries (via import). I use Visual Studio on Windows. I have several problems (an example follows):
- how to use C++ with QML
- how to use a library written in C++ in a QML program with the IMPORT statement
- how to build the executable and the library with CMake?

My example is here:

main.qml

ApplicationWindow {
    visible: true
    width: 400
    height: 300
    title: "Calculator"

    Calculatrice {
        id: calculatrice
    }

    Column {
        anchors.centerIn: parent
        spacing: 10

        TextField {
            id: input1
            placeholderText: "Entrez le premier nombre"
            validator: DoubleValidator {
                bottom: -1000000000.0
                top: 1000000000.0
            }
        }

        TextField {
            id: input2
            placeholderText: "Entrez le deuxième nombre"
            validator: DoubleValidator {
                bottom: -1000000000.0
                top: 1000000000.0
            }
        }

        Row {
            spacing: 10

            Button {
                text: "Additionner"
                onClicked: {
                    resultLabel.text = "Résultat: " + calculator.add(parseFloat(input1.text), parseFloat(input2.text))
                }
            }

            Button {
                text: "Soustraire"
                onClicked: {
                    resultLabel.text = "Résultat: " + calculator.subtract(parseFloat(input1.text), parseFloat(input2.text))
                }
            }

            Button {
                text: "Multiplier"
                onClicked: {
                    resultLabel.text = "Résultat: " + calculator.multiply(parseFloat(input1.text), parseFloat(input2.text))
                }
            }
        }

        Label {
            id: resultLabel
            text: "Résultat: "
        }
    }
}

main.cpp

#include <QGuiApplication>
#include <QQmlApplicationEngine>
#include <QtCore>
//#include "calculator.cpp"

int main(int argc, char* argv[])
{
    QCoreApplication::setAttribute(Qt::AA_EnableHighDpiScaling);
    QGuiApplication app(argc, argv);
    QQmlApplicationEngine engine;
    //qmlRegisterType<Calculator>("Calculator", 1, 0, "Calculator");
    const QUrl url(QStringLiteral("qrc:/main.qml"));
    QObject::connect(&engine, &QQmlApplicationEngine::objectCreated, &app,
        [url](QObject* obj, const QUrl& objUrl) {
            if (!obj && url == objUrl)
                QCoreApplication::exit(-1);
        }, Qt::QueuedConnection);
    engine.load(url);
    return app.exec();
}

calculator.h

// calculator.h
#ifndef CALCULATOR_H
#define CALCULATOR_H

class Calculator {
public:
    double add(double a, double b) const;
    double subtract(double a, double b) const;
    double multiply(double a, double b) const;
};
Tim Rentsch <tr.17687@z991.linuxsc.com>: Jan 13 09:31PM -0800
>> Does that all make sense?
> Right now, no. But that's me. I'll flag it to read again when I've
> had a better night's sleep.

I'm posting to nudge you into looking at this again, if you haven't already. I have now had a chance to get your source and run some comparisons. A program along the lines I outlined can run much faster than the code you posted (as well as needing less memory). A good target is to find all primes less than 1e11, which needs less than 4 GB of RAM. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 09 11:33PM -0800
On 1/8/2024 5:14 PM, red floyd wrote: >>> Absolutely not, not with four way associativity. >> Whatever you say; Sigh. I am done with this. > Intel just needs to call Bonita whenever they have an issue. Okay. You just made me laugh so hard I started to cough a bit! Wow. Cleaned out the pipes, so to speak. Thanks. ROFL! Cough... :^D |
Bonita Montero <Bonita.Montero@gmail.com>: Jan 08 06:48AM +0100
On 07.01.2024 at 21:46, Chris M. Thomasson wrote:
> I know that they had a problem and the provided workaround from Intel
> really did help out. ...

Absolutely not, not with four-way associativity.
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 08 12:18PM -0800
On 1/7/2024 9:48 PM, Bonita Montero wrote: >> I know that they had a problem and the provided workaround from Intel >> really did help out. ... > Absolutely not, not with four way associativity. Whatever you say; Sigh. I am done with this. |
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 05 07:21PM -0800
On 1/3/2024 7:37 PM, Bonita Montero wrote: > The Pentium 4's L1 data cache is between 16 and 32kB, so there > can't be a 64kB aliasing. And aliasing can be only on a set basis > and the sets are 4kB or 8kB large. Are you trying to tell me that the aliasing problem on those older Intel hyperthreaded processors and the workaround (from Intel) was a myth? lol. ;^) | Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 08:18AM +0100
On 06.01.2024 at 04:21, Chris M. Thomasson wrote:
> Are you trying to tell me that the aliasing problem on those older Intel
> hyperthreaded processors and the workaround (from Intel) was a myth?
> lol. ;^)

Intel just made a nerd-suggestion. With four-way associativity there's no frequent aliasing problem in the L1 data cache of the Pentium 4.
| Kaz Kylheku <433-929-6894@kylheku.com>: Jan 06 08:31AM
> Intel just made a nerd-suggestion. With four-way associativity
> there's no frequent aliasing problem in the L1 data cache of
> Pentium 4.

I think the L1 cache was 8K on that thing, and the blocks are 32 bytes. I think how it works on the P4 is that the address is structured like this:

 31          11 10                5 4                                 0
 |            | |                 | |                                 |
 [ 21 bit tag ] [ 6 bit cache set ] [ 5 bit offset into 32 byte block ]

Thus say we have an area of the stack with the address range nnnnFF80 to nnnnFFFF (128 bytes, 4 x 32 byte cache blocks). These four blocks all map to the same set: they have the same six bits in the "cache set" part of the address. So if a thread is accessing something in all four blocks, it will completely use that cache set, all by itself. If any other thread has a similar block in its stack, with the same cache set ID, it will cause evictions against this thread.

Sure, if each of these threads confines itself to working with just one cacheline-sized aperture of the stack, it looks better. You're forgetting that the sets are very small and that groups of four adjacent 32 byte blocks map to the same set. Touch four adjacent cache blocks that are aligned on a 128 byte boundary, and you have hit full occupancy in the cache set corresponding to that block!

(I suspect the references to 64K should not be kilobytes but sets. The 8K cache has 64 sets.)

In memory, a 128 byte block that is aligned maps to, and precisely covers, a cache set. If two such blocks have addresses that are equal modulo 8K, they collide in the same cache set. If one of those blocks is fully present in the cache, the other must be fully evicted.

It's really easy to see how things can go south under hyperthreading. If two hyperthreads are working with clashing 128 byte areas that each want to hog the same cache set, and the core is switching between them on a fine-grained basis, ... you get the picture.

It's very easy for the memory mapping allocations used for thread stacks to produce addresses such that the delta between them is a multiple of 8K.
-- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal Mastodon: @Kazinator@mstdn.ca NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.
| Bonita Montero <Bonita.Montero@gmail.com>: Jan 06 10:30AM +0100
On 06.01.2024 at 09:31, Kaz Kylheku wrote:
> It's very easy for the memory mapping allocations used for thread
> stacks to produce addresses such that the delta between them is a
> multiple of 8K.

Of course it's easy to intentionally provoke frequent aliasing with the P4's L1 cache, but in practice this doesn't happen often.
| "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 06 01:15PM -0800
On 1/6/2024 1:30 AM, Bonita Montero wrote: >> multiple of 8K. > Of course it's easy to intentionally provoke frequent aliasing > with the P4's L1 cache, but actually this doesn't happen often. Fwiw, some people were complaining about bad performance using hyperthreading. Turning it off in bios improved performance. Hence the paper was written to show them how to vastly improve performance when hyperthreading was turned on. You call it nerd stuff, and I still cannot figure out why? | "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>: Jan 06 01:19PM -0800
On 1/6/2024 1:15 PM, Chris M. Thomasson wrote:
> paper was written to show them how to vastly improve performance when
> hyperthreading was turned on. You call it nerd stuff, and I still cannot
> figure out why?

Humm... I can see it now. Bonita works for Intel and received the complaints... Bonita says shut up you stupid nerds! Humm... ;^o |