- Does std::regex need to be so large? - 12 Updates
- incompatible types in assignment - 6 Updates
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Aug 21 05:30PM +0200 On 21.08.2020 16:37, Juha Nieminen wrote: > I have found std::regex to be surprisingly useful and handy > for many situations. Yes yes. Unfortunately they're going to deprecate it, with the aim of replacing it with something more Unicode-aware. > [snip] - Alf |
jacobnavia <jacob@jacob.remcomp.fr>: Aug 21 07:20PM +0200 On 21/08/2020 at 17:30, Alf P. Steinbach wrote: > Yes yes. > Unfortunately they're going to deprecate it, with the aim of replacing > it with something more Unicode-aware. GREAT! Then it won't be 250K but 500K! It will pull in all the Unicode libraries and maybe a few others (ICU, etc.). toupper, tolower, etc. aren't very simple in Chinese. You can't stop progress. |
Bonita Montero <Bonita.Montero@gmail.com>: Aug 21 07:25PM +0200 > Then it won't be 250K but 500K! It will pull in all the Unicode libraries and > maybe a few others (ICU, etc.). toupper, tolower, etc. aren't very simple > in Chinese. The code-size doesn't matter. |
Paavo Helde <eesnimi@osa.pri.ee>: Aug 21 08:50PM +0300 On 21.08.2020 17:37, Juha Nieminen wrote: > I have found std::regex to be surprisingly useful and handy > for many situations. Well, if it were a small trivial piece of software doing some trivial thing only, it would not be so useful. And regex is far from trivial. > stripping all debug info from it), and increase the compilation time > of that one source file by several seconds (eg. about 2.5 seconds > in this computer I'm using.) Things depend on the perspective. For example, we have got a third-party library which basically does nothing except trivial data copying, yet takes ages to build and adds ca. 30 MB to the final executable. Everybody seems to be happy with it, makes our products more enterprisy I guess. Regex is also there in our product, but nobody has bothered to measure its size. > (in this case) is about 16 kB, that's quite a huge chunk. > Does std::regex really need to be that big? Is there really no way > for them to optimize it to be smaller and faster to compile? Google tells me there are partial regex libraries optimized for size which take 3 kB, so I'm sure a full regex could also be done in much less than 100 kB. However, I guess nobody cares: 100 kB is less than 0.001% of a typical desktop's RAM. C++ libraries tend to be optimized for runtime speed nowadays. If this conflicts with size or compilation speed, the latter will suffer. |
Jorgen Grahn <grahn+nntp@snipabacken.se>: Aug 21 07:15PM On Fri, 2020-08-21, Juha Nieminen wrote: > (in this case) is about 16 kB, that's quite a huge chunk. > Does std::regex really need to be that big? Is there really no way > for them to optimize it to be smaller and faster to compile? I don't have the answer, but I share your observations. I used it as a convenience in one of my programs, but when the (small) binary doubled in size and compilation time was more than doubled, I got angry and wrote my own manual parser for whatever it was. I thought maybe it's a quality of implementation thing which will improve over time. I also briefly wondered if Boost.Regex would have had the same effect. I know that POSIX regexec(3) doesn't. /Jorgen -- // Jorgen Grahn <grahn@ Oo o. . . \X/ snipabacken.se> O o . |
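For readers who have not used the POSIX interface Jorgen mentions, here is a minimal sketch of calling regcomp/regexec from C++. The helper name `posix_matches` and the error handling policy are assumptions made for illustration, not anything from the thread, and the code needs a POSIX system that provides <regex.h>.

```cpp
// Sketch only: matching with POSIX regcomp/regexec instead of std::regex.
// `posix_matches` is a hypothetical helper name, not part of any library.
#include <regex.h>   // POSIX header, unrelated to the C++ <regex> header
#include <cassert>
#include <string>

// Returns true if `text` matches the POSIX extended regular expression `pattern`.
bool posix_matches(const std::string& pattern, const std::string& text)
{
    regex_t compiled;
    // REG_EXTENDED selects ERE syntax; REG_NOSUB says we do not need match positions.
    if (regcomp(&compiled, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
        return false;   // this sketch simply treats an invalid pattern as "no match"
    }
    const int result = regexec(&compiled, text.c_str(), 0, nullptr, 0);
    regfree(&compiled);   // release whatever regcomp allocated
    return result == 0;   // regexec returns 0 on a match
}
```

A call such as `posix_matches("^[0-9]+$", input)` then checks whether `input` consists only of digits. The matching engine lives in the C library, so none of the std::regex templates are instantiated in the translation unit.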
Cholo Lennon <chololennon@hotmail.com>: Aug 21 05:00PM -0300 On 21/8/20 11:37, Juha Nieminen wrote: > (in this case) is about 16 kB, that's quite a huge chunk. > Does std::regex really need to be that big? Is there really no way > for them to optimize it to be smaller and faster to compile? I suffered the same problem with clang/llvm outputting WASM. I needed the class to parse some stuff in a smart contract (running inside the EOS blockchain), but my surprise was huge when the size of the contract increased from 40 Kb to 300 Kb, just for using std::regex. In a constrained RAM environment like a blockchain, that was a no no. I ended up using methods from std::string to parse the data. The resulting code was awful, but the size stayed around 40 Kb. -- Cholo Lennon Bs.As. ARG |
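Cholo does not say what data the contract parsed, so the following is only a guess at the general shape of such code: a sketch that splits a hypothetical "key=value" line using nothing but std::string member functions. The function name `split_key_value` and the input format are assumptions.

```cpp
// Sketch only: parsing with std::string members instead of std::regex.
// The "key=value" format and the name `split_key_value` are hypothetical.
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Splits `line` at the first '='.
// Returns std::nullopt when there is no '=' or when the key would be empty.
std::optional<std::pair<std::string, std::string>>
split_key_value(const std::string& line)
{
    const std::string::size_type pos = line.find('=');
    if (pos == std::string::npos || pos == 0) {
        return std::nullopt;
    }
    // Everything before '=' is the key, everything after it is the value.
    return std::make_pair(line.substr(0, pos), line.substr(pos + 1));
}
```

Code like this gets tedious quickly as the format grows, which is probably what "awful" refers to, but it only uses functions a program working with std::string is likely to link anyway.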
Christian Gollwitzer <auriocus@gmx.de>: Aug 21 10:40PM +0200 On 21.08.20 at 16:37, Juha Nieminen wrote: > stripping all debug info from it), and increase the compilation time > of that one source file by several seconds (eg. about 2.5 seconds > in this computer I'm using.) There should be ways to make that much smaller. For example, the PCRE library JIT-compiles the regex pattern into machine code (see here: https://zherczeg.github.io/sljit/ ). The only reason for JIT is that a typical program gets the pattern from user input. If, however, there is only a static pattern, you could run this compiler and store the output in the executable - I doubt very much that this would increase the code size by 100 kB unless you have extremely complicated patterns. There might still be library functions to add, like isspace() etc., but you'd need those for handwritten code as well. Christian |
jacobnavia <jacob@jacob.remcomp.fr>: Aug 21 10:59PM +0200 On 21/08/2020 at 19:25, Bonita Montero wrote: > The code-size doesn't matter. THAT'S IT! What you do not understand is that it DOES matter. Code size slows down the program because it fills the code cache with irrelevant stuff! It takes time to load it from slow RAM, time to decode it and execute it. And it does the same thing, essentially, as a C compiler does with a few dozen Kbytes of a regular expression library (5636 bytes to be accurate). A factor of 50 for doing WHAT? Is it really doing 50 times more? |
jacobnavia <jacob@jacob.remcomp.fr>: Aug 21 11:00PM +0200 On 21/08/2020 at 19:50, Paavo Helde wrote: > Well, if it were a small trivial piece of software doing some trivial > thing only, it would not be so useful. And regex is far from trivial. The C equivalent is 5636 bytes. Is this code doing 50 times more? |
jacobnavia <jacob@jacob.remcomp.fr>: Aug 21 11:01PM +0200 On 21/08/2020 at 22:00, Cholo Lennon wrote: > constrained RAM environment like a blockchain, that was a no no. I ended > up using methods from std::string to parse the data. The resulting code > was awful, but the size stayed around 40 Kb. The C library is 5636 bytes. |
Cholo Lennon <chololennon@hotmail.com>: Aug 21 07:53PM -0300 On 8/21/20 6:01 PM, jacobnavia wrote: >> ended up using methods from std::string to parse the data. The >> resulting code was awful, but the size stayed around 40 Kb. > The C library is 5636 bytes. Well, due to the way an EOS smart contract is built/used, introducing dependencies external to the EOS SDK (CDT)/C++17 (restricted, not all standard library features are available) is not ideal (and also the problem was solved using std::string and some library algorithms). But yes, any regex library (especially the one from the standard library) would have been better. -- Cholo Lennon Bs.As. ARG |
Lynn McGuire <lynnmcguire5@gmail.com>: Aug 21 06:07PM -0500 On 8/21/2020 10:30 AM, Alf P. Steinbach wrote: > it with something more Unicode-aware. >> [snip] > - Alf Wow, that will be interesting. UTF-8 or UTF-16 ? Thanks, Lynn |
Juha Nieminen <nospam@thanks.invalid>: Aug 21 04:11PM > template< class Enum_wrapper > > auto n_values() -> int { return Enum_wrapper::_; } > Yes? If you are making a switch block for that enumerated type, many compilers, given your normal warning flags, will warn if not all the enumerated names have a case (and there is no default branch). That can be slightly annoying in the case of that "amount" name there. Another issue is that the "amount" entry now becomes a valid value for anything that takes a parameter of that enumerated type. Sometimes that's perfectly ok and can in fact be useful. Other times it can be a nuisance because it's not a valid value and you need to add a check for it. |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Aug 21 06:20PM +0200 On 21.08.2020 18:11, Juha Nieminen wrote: > the enumerated names have a case (and there is no default branch). > That can be slightly annoying in the case of that "amount" name > there. Yes. I wonder if there is some C++17 or C++20 attribute that can help tell the compiler to shut up? I have a kind of feeling that I've seen that, but I can't remember. > Sometimes that's perfectly ok and can in fact be useful. Othertimes > it can be a nuisance because it's not a valid value and you need > to add a check for it. No. If you need to check for `_` then you need to check for any possible value, including unnamed ones. Presence or not of `_` does not matter for that, but `_` can help the checking. - Alf |
David Brown <david.brown@hesbynett.no>: Aug 21 08:00PM +0200 On 21/08/2020 14:53, Alf P. Steinbach wrote: > template< class Enum_wrapper > > auto n_values() -> int { return Enum_wrapper::_; } > Yes? No. This is just an alternative (though perhaps a convenient one at times) to the more traditional: enum days { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday, noOfDays }; It suffers from the same problem - "noOfDays" is a valid enumeration constant for "days" - syntactically it is a "day", but is not logically a day. Amongst the other disadvantages is that it spoils the very useful -Wswitch and -Wswitch-enum warnings in gcc (and, I think, clang) that check that switches on enum types are complete. |
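To make the interaction with -Wswitch concrete, here is a small sketch using David's `days` enumeration. The function `is_weekend` is an assumption added for illustration. With GCC, a switch over an enumeration that has no `default` label and omits an enumerator triggers -Wswitch, and the sentinel counts as an enumerator like any other.

```cpp
// Sketch only: a count sentinel in a switch that is checked by -Wswitch.
// `is_weekend` is a hypothetical function, not from the thread.
enum days {
    Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday,
    noOfDays   // sentinel: equals the number of real days
};

constexpr bool is_weekend(const days d)
{
    switch (d) {
        case Monday:
        case Tuesday:
        case Wednesday:
        case Thursday:
        case Friday:
            return false;
        case Saturday:
        case Sunday:
            return true;
        case noOfDays:
            break;   // listed only so that every enumerator has a case
    }
    return false;    // reached for the sentinel and for out-of-range values
}
```

Without the `case noOfDays:` line the compiler would report that `noOfDays` is not handled. Whether having to write that extra case counts as "spoiling" the warning is exactly what the following posts argue about.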
David Brown <david.brown@hesbynett.no>: Aug 21 08:01PM +0200 On 21/08/2020 17:07, Jorgen Grahn wrote: >> Yes? > Surely he meant one you wouldn't have to define yourself; it would > always be there, with the same name and syntax. Yes, that's what I would like. |
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Aug 21 11:41AM -0700 > Amongst the other disadvantages is that it spoils the very useful > -Wswitch and -Wswitch-enum warnings in gcc (and, I think, clang) that > check that switches on enum types are complete. enum days { Monday, firstday=Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday, lastday=Sunday }; -- Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com Working, but not speaking, for Philips Healthcare void Void(void) { Void(); } /* The recursive call of the void */ |
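Keith's one-liner can be expanded into the following sketch. The enumeration is his; `number_of_days` and `english_name` are names assumed here for illustration. Since `firstday` and `lastday` are aliases for values that already have names, a switch that covers Monday through Sunday covers every value of the enumeration, and no extra case is needed.

```cpp
// Sketch only: alias enumerators instead of a count sentinel.
// `number_of_days` and `english_name` are hypothetical names, not from the thread.
enum days {
    Monday, firstday = Monday,
    Tuesday, Wednesday, Thursday, Friday, Saturday,
    Sunday, lastday = Sunday
};

// The count is derived from the aliases rather than stored as an extra enumerator.
constexpr int number_of_days = lastday - firstday + 1;

constexpr const char* english_name(const days d)
{
    switch (d) {
        case Monday:    return "Monday";
        case Tuesday:   return "Tuesday";
        case Wednesday: return "Wednesday";
        case Thursday:  return "Thursday";
        case Friday:    return "Friday";
        case Saturday:  return "Saturday";
        case Sunday:    return "Sunday";
    }
    return "";   // only reached for values outside firstday..lastday
}

static_assert(number_of_days == 7);
```

A loop over all days can then run from `firstday` to `lastday` inclusive, at the cost of having to keep the `lastday` alias on the final enumerator when the list changes.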
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Aug 22 12:40AM +0200 On 21.08.2020 20:00, David Brown wrote: > It suffers from the same problem - "noOfDays" is a valid enumeration > constant for "days" - syntactically it is a "day", but is not logically > a day. That's a problem with using an ordinary name directly as an enumerator. That's why I used `_` as enumerator name, and `n_values` as name used elsewhere. > Amongst the other disadvantages Only one has been mentioned, and that's for your own rewrite. > is that it spoils the very useful > -Wswitch and -Wswitch-enum warnings in gcc (and, I think, clang) that > check that switches on enum types are complete. No, it doesn't. #include <iostream> using std::cout, std::endl; #include <iterator> using std::size; template< class Enum_wrapper > constexpr int n_enum_values = Enum_wrapper::_; using C_str = const char*; [[noreturn]] void ub() { throw; } struct Weekdays{ enum Enum: int{ monday, tuesday, wednesday, thursday, friday, saturday, sunday, _ }; }; void operator++( Weekdays::Enum& value ) { value = static_cast<Weekdays::Enum>( value + 1 ); } auto to_english_via_switch_on( const Weekdays::Enum day ) -> C_str { switch( day ) { case Weekdays::monday: return "Monday"; case Weekdays::tuesday: return "Tuesday"; case Weekdays::wednesday: return "Wednesday"; case Weekdays::thursday: return "Thursday"; case Weekdays::friday: return "Friday"; case Weekdays::saturday: return "Saturday"; #ifndef WARN case Weekdays::sunday: return "Sunday";