soft and program: Digest for comp.lang.c++@googlegroups.com

Does std::regex need to be so large? - 12 Updates
incompatible types in assignment - 6 Updates

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Aug 21 05:30PM +0200

On 21.08.2020 16:37, Juha Nieminen wrote:
> I have found std::regex to be surprisingly useful and handy
> for many situations.

Yes yes.

Unfortunately they're going to deprecate it, with the aim of replacing
it with something more Unicode-aware.

> [snip]

- Alf

jacobnavia <jacob@jacob.remcomp.fr>: Aug 21 07:20PM +0200

Le 21/08/2020 à 17:30, Alf P. Steinbach a écrit :

> Yes yes.

> Unfortunately they're going to deprecate it, with the aim of replacing
> it with something more Unicode-aware.

GREAT!

Then, it won't be 250K but 500K! It will pull in all the Unicode libraries
and maybe a few others (ICU, etc.). toupper, tolower, etc. aren't very simple
in Chinese.

You can't stop progress.

Bonita Montero <Bonita.Montero@gmail.com>: Aug 21 07:25PM +0200

> Then, it won't be 250K but 500K! It will pull in all the Unicode libraries
> and maybe a few others (ICU, etc.). toupper, tolower, etc. aren't very simple
> in Chinese.

The code-size doesn't matter.

Paavo Helde <eesnimi@osa.pri.ee>: Aug 21 08:50PM +0300

21.08.2020 17:37 Juha Nieminen kirjutas:
> I have found std::regex to be surprisingly useful and handy
> for many situations.

Well, if it were a small trivial piece of software doing some trivial
thing only, it would not be so useful. And regex is far from trivial.

> stripping all debug info from it), and increase the compilation time
> of that one source file by several seconds (eg. about 2.5 seconds
> in this computer I'm using.)

Things depend on the perspective. For example, we have got a third-party
library which basically does nothing except trivial data copy and which
takes ages to build and adds ca. 30 MB to the final executable. Everybody seems
to be happy with it; it makes our products more enterprisey, I guess. Regex is
also there in our product, but nobody has bothered to measure its size.

> (in this case) is about 16 kB, that's quite a huge chunk.

> Does std::regex really need to be that big? Is there really no way
> for them to optimize it to be smaller and faster to compile?

Google tells me there are partial regex libraries optimized for size
which take 3 kB, so I'm sure a full regex could also be done in much
less than 100 kB. However, I guess nobody cares, 100 kB is less than
0.001% of a typical desktop RAM.

C++ libraries tend to be optimized for runtime speed nowadays. If this
conflicts with size or compilation speed, the latter two will suffer.
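
For anyone who does want to measure it, a minimal sketch of the kind of single-file test that could be compiled with and without the regex call (for example with GCC's -Os, comparing the results with the binutils `size` tool). The helper name and the date pattern are made up for illustration; they are not from the original post.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` has the shape "YYYY-MM-DD".
// Only the digit layout is checked, not whether the date is valid.
bool looks_like_iso_date(const std::string& s)
{
    // Constructed once; building a std::regex at run time is itself costly.
    static const std::regex pattern(R"(\d{4}-\d{2}-\d{2})");
    return std::regex_match(s, pattern);
}
```

Replacing the body with a hand-written check and rebuilding would give a rough idea of what std::regex alone contributes on a given toolchain.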

Jorgen Grahn <grahn+nntp@snipabacken.se>: Aug 21 07:15PM

On Fri, 2020-08-21, Juha Nieminen wrote:
> (in this case) is about 16 kB, that's quite a huge chunk.

> Does std::regex really need to be that big? Is there really no way
> for them to optimize it to be smaller and faster to compile?

I don't have the answer, but I share your observations. I used it as a
convenience in one of my programs, but when the (small) binary doubled
in size and compilation time was more than doubled, I got angry and
wrote my own manual parser for whatever it was.

I thought maybe it's a quality of implementation thing which will
improve over time. I also briefly wondered if Boost.Regex would have
had the same effect. I know that POSIX regexec(3) doesn't.
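
For comparison, a minimal sketch of calling the POSIX interface from C++. It assumes a POSIX system that provides <regex.h>; the wrapper name is hypothetical and error handling is reduced to "no match".

```cpp
#include <regex.h>   // POSIX regcomp/regexec/regfree
#include <string>

// Hypothetical wrapper: true if `text` matches the POSIX extended
// regular expression `pattern` anywhere in the string.
bool posix_matches(const std::string& pattern, const std::string& text)
{
    regex_t compiled;
    // REG_NOSUB: we only want a yes/no answer, no capture positions.
    if (regcomp(&compiled, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
        return false;   // the pattern failed to compile
    }
    const int rc = regexec(&compiled, text.c_str(), 0, nullptr, 0);
    regfree(&compiled);
    return rc == 0;     // 0 means a match was found
}
```

Since the matching code lives in the C library, a program using it does not get the template instantiations that std::regex brings in.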

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Cholo Lennon <chololennon@hotmail.com>: Aug 21 05:00PM -0300

On 21/8/20 11:37, Juha Nieminen wrote:
> (in this case) is about 16 kB, that's quite a huge chunk.

> Does std::regex really need to be that big? Is there really no way
> for them to optimize it to be smaller and faster to compile?

I suffered the same problem with clang/llvm outputting WASM. I needed
the class to parse some stuff in a smart contract (running inside the
EOS blockchain), but my surprise was huge when the size of the contract
increased from 40 Kb to 300 Kb, just for using std::regex. In a
constrained RAM environment like a blockchain, that was a no no. I ended
up using methods from std::string to parse the data. The resulting code
was awful, but the size stayed around 40 Kb.
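
A minimal sketch of that kind of regex-free parsing, using only find and substr. The "key=value" input format and the function name are assumptions for illustration; the actual contract data is not shown in this thread.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: splits "key=value" at the first '='.
// Returns nothing if there is no '=' or the key is empty.
std::optional<std::pair<std::string, std::string>>
split_key_value(std::string_view line)
{
    const auto pos = line.find('=');
    if (pos == std::string_view::npos || pos == 0) {
        return std::nullopt;
    }
    return std::make_pair(std::string(line.substr(0, pos)),
                          std::string(line.substr(pos + 1)));
}
```

This needs C++17 for std::optional and std::string_view, and pulls in nothing beyond the string machinery.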

--
Cholo Lennon
Bs.As.
ARG

Christian Gollwitzer <auriocus@gmx.de>: Aug 21 10:40PM +0200

Am 21.08.20 um 16:37 schrieb Juha Nieminen:
> stripping all debug info from it), and increase the compilation time
> of that one source file by several seconds (eg. about 2.5 seconds
> in this computer I'm using.)

There should be ways to make that much smaller. For example, the PCRE
library jit-compiles the regex pattern into machine code (see here:
https://zherczeg.github.io/sljit/ ). The only reason for JIT is that a
typical program gets the pattern from the user input. If, however, there
is a static pattern only, you could run this compiler and store the
output in the executable - I doubt very much that this will increase the
code size by 100kB unless you have extremely complicated patterns. There
might still be library functions to add, like isspace() etc. but you'd
need those also for handwritten code.

Christian

jacobnavia <jacob@jacob.remcomp.fr>: Aug 21 10:59PM +0200

Le 21/08/2020 à 19:25, Bonita Montero a écrit :
> The code-size doesn't matter.

THAT'S IT!

What you do not understand is that it DOES matter. Code size slows down
the program because it fills the code cache with irrelevant stuff!

It takes time to load it from slow RAM, and time to decode and execute
it. And it does essentially the same thing as what a C compiler provides
in a regular expression library of a few kilobytes (5636 bytes, to be
accurate).

A factor of 50 for doing WHAT?

Is it really doing 50 times more?

jacobnavia <jacob@jacob.remcomp.fr>: Aug 21 11:00PM +0200

Le 21/08/2020 à 19:50, Paavo Helde a écrit :
> Well, if it were a small trivial piece of software doing some trivial
> thing only, it would not be so useful. And regex is far from trivial.

The C equivalent is 5636 bytes. Is this code doing 50 times more?

jacobnavia <jacob@jacob.remcomp.fr>: Aug 21 11:01PM +0200

Le 21/08/2020 à 22:00, Cholo Lennon a écrit :
> constrained RAM environment like a blockchain, that was a no no. I ended
> up using methods from std::string to parse the data. The resulting code
> was awful, but the size stayed around 40 Kb.

The C library is 5636 bytes.

Cholo Lennon <chololennon@hotmail.com>: Aug 21 07:53PM -0300

On 8/21/20 6:01 PM, jacobnavia wrote:
>> ended up using methods from std::string to parse the data. The
>> resulting code was awful, but the size stayed around 40 Kb.

> The C library is 5636 bytes.

Well, due to the way an EOS smart contract is built and used, introducing
external dependencies to the EOS SDK (CDT)/C++17 (restricted: not all
standard library features are available) is not ideal (and the problem
was solved anyway using std::string and some library algorithms). But
yes, any regex library (especially the one from the standard library)
would have been better.

--
Cholo Lennon
Bs.As.
ARG

Lynn McGuire <lynnmcguire5@gmail.com>: Aug 21 06:07PM -0500

On 8/21/2020 10:30 AM, Alf P. Steinbach wrote:
> it with something more Unicode-aware.

>> [snip]

> - Alf

Wow, that will be interesting. UTF-8 or UTF-16 ?

Thanks,
Lynn

incompatible types in assignment

Juha Nieminen <nospam@thanks.invalid>: Aug 21 04:11PM

> template< class Enum_wrapper >
> auto n_values() -> int { return Enum_wrapper::_; }

> Yes?

If you are making a switch block for that enumerated type, many
compilers, given your normal warning flags, will warn if not all
the enumerated names have a case (and there is no default branch).
That can be slightly annoying in the case of that "amount" name
there.

Another issue is that now that "amount" entry becomes a valid
value for anything that takes a parameter of that enumerated type.
Sometimes that's perfectly ok and can in fact be useful. Other times
it can be a nuisance because it's not a valid value and you need
to add a check for it.
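
A minimal sketch of both issues, assuming a made-up Color enumeration with "amount" as the count entry. Listing the sentinel as its own case is one way to keep the switch warning quiet without adding a default branch.

```cpp
#include <string_view>

// `amount` is a count sentinel, not a real color (names are illustrative).
enum Color { red, green, blue, amount };

std::string_view name_of(Color c)
{
    switch (c) {
        case red:    return "red";
        case green:  return "green";
        case blue:   return "blue";
        case amount: break;   // listed only so that every enumerator has a case
    }
    // Reached for the sentinel: this is the extra check mentioned above.
    return "<invalid>";
}
```

Because there is no default branch, a compiler that warns about unhandled enumerators should still complain if a new color is added later and not handled here.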

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Aug 21 06:20PM +0200

On 21.08.2020 18:11, Juha Nieminen wrote:
> the enumerated names have a case (and there is no default branch).
> That can be slightly annoying in the case of that "amount" name
> there.

Yes.

I wonder if there is some C++17 or C++20 attribute that can help tell
the compiler to shut up?

I have a kind of feeling that I've seen that, but I can't remember.

> Sometimes that's perfectly ok and can in fact be useful. Other times
> it can be a nuisance because it's not a valid value and you need
> to add a check for it.

No. If you need to check for `_` then you need to check for any possible
value, including unnamed ones. Presence or not of `_` does not matter
for that, but `_` can help the checking.
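
A minimal sketch of such a check, assuming the wrapper-struct shape used in this thread, with enumerators that start at 0 and are contiguous. The function name is hypothetical.

```cpp
struct Weekdays { enum Enum : int {
    monday, tuesday, wednesday, thursday, friday, saturday, sunday, _
}; };

// True if `raw` corresponds to one of the named enumerators before `_`.
// Relies on the enumerators being 0, 1, 2, ... with `_` as the last one.
template <class Enum_wrapper>
constexpr bool is_named_value(int raw)
{
    return 0 <= raw && raw < Enum_wrapper::_;
}

static_assert(is_named_value<Weekdays>(Weekdays::sunday), "sunday is a named value");
static_assert(!is_named_value<Weekdays>(Weekdays::_), "the sentinel is not");
```

The same range test rejects both `_` itself and any unnamed value outside the range, which is the point made above.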

- Alf

David Brown <david.brown@hesbynett.no>: Aug 21 08:00PM +0200

On 21/08/2020 14:53, Alf P. Steinbach wrote:

> template< class Enum_wrapper >
> auto n_values() -> int { return Enum_wrapper::_; }

> Yes?

No.

This is just an alternative (though perhaps a convenient one at times)
to the more traditional:

    enum days {
        Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday,
        noOfDays
    };

It suffers from the same problem - "noOfDays" is a valid enumeration
constant for "days" - syntactically it is a "day", but is not logically
a day.

Amongst the other disadvantages is that it spoils the very useful
-Wswitch and -Wswitch-enum warnings in gcc (and, I think, clang) that
check that switches on enum types are complete.

David Brown <david.brown@hesbynett.no>: Aug 21 08:01PM +0200

On 21/08/2020 17:07, Jorgen Grahn wrote:

>> Yes?

> Surely he meant one you wouldn't have to define yourself; it would
> always be there, with the same name and syntax.

Yes, that's what I would like.

Keith Thompson <Keith.S.Thompson+u@gmail.com>: Aug 21 11:41AM -0700

> Amongst the other disadvantages is that it spoils the very useful
> -Wswitch and -Wswitch-enum warnings in gcc (and, I think, clang) that
> check that switches on enum types are complete.

    enum days {
        Monday, firstday=Monday,
        Tuesday,
        Wednesday,
        Thursday,
        Friday,
        Saturday,
        Sunday, lastday=Sunday
    };
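
A minimal sketch of how the aliases can stand in for a count entry. The names number_of_days and short_name are hypothetical additions, not part of the enum above.

```cpp
#include <string_view>

enum days {
    Monday, firstday = Monday,
    Tuesday, Wednesday, Thursday, Friday, Saturday,
    Sunday, lastday = Sunday
};

// The count is derived from the aliases instead of an extra enumerator.
constexpr int number_of_days = lastday - firstday + 1;
static_assert(number_of_days == 7, "seven days in a week");

std::string_view short_name(days d)
{
    // firstday and lastday share values with Monday and Sunday,
    // so these seven cases cover every enumerator value.
    switch (d) {
        case Monday:    return "Mon";
        case Tuesday:   return "Tue";
        case Wednesday: return "Wed";
        case Thursday:  return "Thu";
        case Friday:    return "Fri";
        case Saturday:  return "Sat";
        case Sunday:    return "Sun";
    }
    return "?";   // only reachable for a value outside the enumeration
}
```

A loop such as `for (int d = firstday; d <= lastday; ++d)` then visits every day without a sentinel value ever being a valid `days`.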

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Aug 22 12:40AM +0200

On 21.08.2020 20:00, David Brown wrote:

> It suffers from the same problem - "noOfDays" is a valid enumeration
> constant for "days" - syntactically it is a "day", but is not logically
> a day.

That's a problem with using an ordinary name directly as an enumerator.

That's why I used `_` as enumerator name, and `n_values` as name used
elsewhere.

> Amongst the other disadvantages

Only one has been mentioned, and that's for your own rewrite.

> is that it spoils the very useful
> -Wswitch and -Wswitch-enum warnings in gcc (and, I think, clang) that
> check that switches on enum types are complete.

No, it doesn't.

#include <iostream>
using std::cout, std::endl;

#include <iterator>
using std::size;

template< class Enum_wrapper >
constexpr int n_enum_values = Enum_wrapper::_;

using C_str = const char*;

[[noreturn]] void ub() { throw; }

struct Weekdays{ enum Enum: int{
    monday, tuesday, wednesday, thursday, friday, saturday, sunday, _
}; };

void operator++( Weekdays::Enum& value )
{
    value = static_cast<Weekdays::Enum>( value + 1 );
}

auto to_english_via_switch_on( const Weekdays::Enum day )
    -> C_str
{
    switch( day ) {
        case Weekdays::monday: return "Monday";
        case Weekdays::tuesday: return "Tuesday";
        case Weekdays::wednesday: return "Wednesday";
        case Weekdays::thursday: return "Thursday";
        case Weekdays::friday: return "Friday";
        case Weekdays::saturday: return "Saturday";
        #ifndef WARN
        case Weekdays::sunday: return "Sunday";

Friday, August 21, 2020

Digest for comp.lang.c++@googlegroups.com - 18 updates in 2 topics
