soft and program: Digest for comp.lang.c++@googlegroups.com

is there a C++ version of the strtok() function? - 17 Updates
string_view problem - 6 Updates
"Simplify Your Code With Rocket Science: C++20's Spaceship Operator" - 1 Update
Undefined Behaviour - 1 Update

is there a C++ version of the strtok() function?

alexo <alelvb@inwind.it>: Jun 28 07:14PM +0200

Hello,

I would like to improve a chemical formula parser that I wrote from
scratch without using tokenizing functions that correctly handles the
formula:

Fe4[Fe(CN)6]3*6H2O

but not the following:

[Be(N(CH3)2)2]3

my asking is:
is there a purely C++ function that behaves like the C strtok() ?

thank you

alessandro

Thiago Adams <thiago.adams@gmail.com>: Jun 28 10:33AM -0700

On Friday, June 28, 2019 at 2:15:09 PM UTC-3, alexo wrote:

> my asking is:
> is there a purely C++ function that behaves like the C strtok() ?

> thank you

Maybe std::regex can help you in the way you want to do.
https://en.cppreference.com/w/cpp/regex

I don't remember how chemical formulas are expressed, but
I believe the best thing to do is write your own tokenizer
and parser.
If strtok was helping you, that means that the tokenizer
you need to do is simple.
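
As a rough illustration of the tokenizer half of that advice, here is a minimal sketch that uses std::regex only to split a formula into tokens. The function name tokenize_formula and the token rules (element symbol, integer, one of "()[]*") are assumptions made for this example, not something from the thread, and matching up the brackets is left to a separate parser.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split a formula into element symbols, integers,
// and single punctuation characters.
std::vector<std::string> tokenize_formula(const std::string& formula) {
    // An element symbol is one uppercase letter plus optional lowercase
    // letters; a count is a run of digits; each of ( ) [ ] * is a token.
    static const std::regex token_re(R"re([A-Z][a-z]*|[0-9]+|[()\[\]*])re");

    std::vector<std::string> tokens;
    // sregex_iterator silently skips characters that match no alternative,
    // so this sketch does not report malformed input.
    for (std::sregex_iterator it(formula.begin(), formula.end(), token_re), end;
         it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

Calling tokenize_formula("Na[Fe(CN)6]") yields the nine tokens Na [ Fe ( C N ) 6 ], so unlike strtok() the brackets survive and a parser can still see the grouping.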

scott@slp53.sl.home (Scott Lurndal): Jun 28 05:41PM

>[Be(N(CH3)2)2]3

>my asking is:
>is there a purely C++ function that behaves like the C strtok() ?

If it works, why "improve" it? strtok is perfectly legal C++.

Manfred <noname@add.invalid>: Jun 28 07:47PM +0200

On 6/28/2019 7:14 PM, alexo wrote:

> [Be(N(CH3)2)2]3

> my asking is:
> is there a purely C++ function that behaves like the C strtok() ?

If you want a function that behaves like strtok, why not use strtok
itself? This, like all C standard functions, is allowed in C++.
Besides, I am pretty sure that the (pure) C++ standard library does not
include a function that behaves identically to a C standard function.

That said, from what you are trying to achieve probably strtok is not
the best tokenizer for the purpose - most notably it does not handle
nesting and paired (open/close) parentheses by itself, not to mention
that it overwrites delimiters with 0's.

A long time ago I wrote a math expression parser, but that was pure C,
and not using strtok either.

Looking at C++ and the kind of problem, you probably won't be best off
with a /function/, possibly you may use some combination of string_view
with some recursive logic.
Others may give more detailed hints.

Bonita Montero <Bonita.Montero@gmail.com>: Jun 28 07:54PM +0200

> Maybe std::regex can help you in the way you want to do.
> https://en.cppreference.com/w/cpp/regex

This will work also, but I'm sure that's not nearly as fast
as strtok().

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 28 08:17PM +0200

On 28.06.2019 19:14, alexo wrote:

> [Be(N(CH3)2)2]3

> my asking is:
> is there a purely C++ function that behaves like the C strtok() ?

The requirements are not clear.

But the main problem with `strtok` is that it isn't thread safe.

If thread safety isn't a concern then just (continue to) use it.

Otherwise, consider either a regular expression (standard library
solution) or a parsing framework like Boost Spirit (3rd party library).

Cheers & hth.,

- Alf

alexo <alelvb@inwind.it>: Jun 28 09:11PM +0200

Il 28/06/19 20:17, Alf P. Steinbach ha scritto:

>> my asking is:
>> is there a purely C++ function that behaves like the C strtok() ?

> The requirements are not clear.

what is not clear in my question? I was wondering if there exists
a C++ std function that replaces strtok.

> But the main problem with `strtok` is that it isn't thread safe.

I don't need threads, so strtok is ok.

thank you

James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 28 12:13PM -0700

On Friday, June 28, 2019 at 2:17:16 PM UTC-4, Alf P. Steinbach wrote:
> On 28.06.2019 19:14, alexo wrote:
...
> > is there a purely C++ function that behaves like the C strtok() ?

> The requirements are not clear.

> But the main problem with `strtok` is that it isn't thread safe.

If an implementation pre#defines __STDC_LIB_EXT1__, you can use
std::strtok_s(), declared in <cstring>, which is thread safe.

Bonita Montero <Bonita.Montero@gmail.com>: Jun 28 09:22PM +0200

> If an implementation pre#defines __STDC_LIB_EXT1__, you can use
> std::strtok_s(), declared in <cstring>, which is thread safe.

Why don't they simply re-specify strtok() for newer language
versions with internal buffers which are thread-local?

Christian Gollwitzer <auriocus@gmx.de>: Jun 28 09:20PM +0200

Am 28.06.19 um 19:33 schrieb Thiago Adams:

>> [Be(N(CH3)2)2]3

> Maybe std::regex can help you in the way you want to do.
> https://en.cppreference.com/w/cpp/regex

A regex cannot express this grammar. Simple proof: There are nested
parentheses and for that you need a stack automaton. But generally using
a parser generator might be good advice. There are many to choose from,
I like PEG grammars. There is https://github.com/yhirose/cpp-peglib for
example.
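
To make the stack argument concrete, here is a small sketch of a bracket checker that keeps an explicit stack of open brackets. The name brackets_balanced is made up for this example; it only checks pairing and does no atom counting.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: true when every '(' and '[' is closed by the
// matching ')' or ']' in the right order.
bool brackets_balanced(const std::string& formula) {
    std::vector<char> open;  // the explicit stack of unclosed brackets
    for (char c : formula) {
        if (c == '(' || c == '[') {
            open.push_back(c);
        } else if (c == ')' || c == ']') {
            const char expected = (c == ')') ? '(' : '[';
            if (open.empty() || open.back() != expected) {
                return false;  // closer with no opener, or wrong kind
            }
            open.pop_back();
        }
    }
    return open.empty();  // anything left over was never closed
}
```

The stack depth is unbounded, which is the part a plain regular expression has no way to track.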

Christian

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 28 09:26PM +0200

On 28.06.2019 21:13, James Kuyper wrote:

>> But the main problem with `strtok` is that it isn't thread safe.

> If an implementation pre#defines __STDC_LIB_EXT1__, you can use
> std::strtok_s(), declared in <cstring>, which is thread safe.

This was news to me, and I'm unable to find much info about it.

What I've found, scattered here & there in discussions, is that

* Microsoft submitted their silly *_s bounds checking functions for
standardization in C.
* The C committee fixed the worst problems and created a technical
report, TR 24731-1.
* That TR was included as normative annex K in C11.
* To use it you're apparently supposed to #define __STDC_WANT_LIB_EXT1__
as 1 before including any C library header.

I don't have the C11 standard, unfortunately.

And nothing I've found indicates a connection with threading?

Cheers!,

- Alf

James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 28 12:37PM -0700

On Friday, June 28, 2019 at 1:15:09 PM UTC-4, alexo wrote:

> [Be(N(CH3)2)2]3

> my asking is:
> is there a purely C++ function that behaves like the C strtok() ?

Yes. It is declared in <cstring>, and it's called std::strtok().

If the length of the string which is the second argument to your
strtok() calls is 1, you might want to look into std::getline<>(),
declared in <string>, which takes an argument which is a delimiter
character.
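
A minimal sketch of that suggestion, assuming a single delimiter character; the helper name split_on is invented for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split text on one delimiter character using
// std::getline on a string stream.
std::vector<std::string> split_on(const std::string& text, char delimiter) {
    std::vector<std::string> fields;
    std::istringstream in(text);
    std::string field;
    // Each call reads up to (and discards) the next delimiter.
    while (std::getline(in, field, delimiter)) {
        fields.push_back(field);
    }
    return fields;
}
```

One difference from strtok() worth knowing: adjacent delimiters produce empty fields here, whereas strtok() collapses them, and the input string is left unmodified.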

alexo <alelvb@inwind.it>: Jun 28 09:45PM +0200

Il 28/06/19 19:47, Manfred ha scritto:
> the best tokenizer for the purpose - most notably it does not handle
> nesting and paired (open/close) parentheses by itself, not to mention
> that it overwrites delimiters with 0's.

I thought it could help, but if I use something like this:

tokens = strtok("Na[Fe(CN)6]", "()[]*");

I get: Na Fe CN 6

that is a correct but useless decomposition, because as you stated, I
can't match the '6' referring to both the 'Fe' and the 'CN' group.
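
For reference, a compilable version of that experiment might look like the sketch below. strtok() writes into its first argument, so passing a string literal directly is undefined behaviour; the helper strtok_split (a name made up here) works on a writable copy instead.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: run std::strtok over a private, writable copy of
// the text and collect the tokens.
std::vector<std::string> strtok_split(const std::string& text, const char* delims) {
    // strtok overwrites delimiters with '\0', so give it its own buffer.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

strtok_split("Na[Fe(CN)6]", "()[]*") gives the four tokens Na, Fe, CN and 6, which confirms that the bracket structure is gone by the time the tokens come back.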

> with a /function/, possibly you may use some combination of string_view
> with some recursive logic.
> Others may give more detailed hints.

The program that I've written uses a 'manual' jump from an open
parenthesis to the corresponding closing one, but can't handle trickier
formulas.

for example:

[Be(N(CH3)2)2]3

in my program is seen as having:

3 Be atoms -> correct
6 N atoms -> correct
1 C atom -> it should be 12
3 H atoms -> it should be 36

thank you,
alessandro
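
One way to get those counts is a small recursive-descent parser, sketched below. Everything in it (the names count_atoms, parse_sequence, parse_count, and the assumed grammar: element symbols, "()" and "[]" groups with an optional trailing count, and '*' separating parts that may carry a leading coefficient) is an assumption made for this example and is not the poster's program.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

using AtomCounts = std::map<std::string, long>;

namespace formula_detail {

inline bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
inline bool is_upper(char c) { return std::isupper(static_cast<unsigned char>(c)) != 0; }
inline bool is_lower(char c) { return std::islower(static_cast<unsigned char>(c)) != 0; }

// Reads an optional run of digits at pos; returns 1 when there is none.
inline long parse_count(const std::string& s, std::size_t& pos) {
    if (pos >= s.size() || !is_digit(s[pos])) return 1;
    long n = 0;
    while (pos < s.size() && is_digit(s[pos])) n = n * 10 + (s[pos++] - '0');
    return n;
}

// Parses elements and bracketed groups until a closer, '*', or the end.
inline AtomCounts parse_sequence(const std::string& s, std::size_t& pos) {
    AtomCounts total;
    while (pos < s.size()) {
        const char c = s[pos];
        AtomCounts item;
        if (c == '(' || c == '[') {
            const char closer = (c == '(') ? ')' : ']';
            ++pos;
            item = parse_sequence(s, pos);  // recursion handles any nesting depth
            if (pos >= s.size() || s[pos] != closer) {
                throw std::invalid_argument("unbalanced bracket");
            }
            ++pos;
        } else if (is_upper(c)) {
            std::string symbol(1, c);
            ++pos;
            while (pos < s.size() && is_lower(s[pos])) symbol += s[pos++];
            item[symbol] = 1;
        } else {
            break;  // ')', ']', '*' or something unexpected: caller decides
        }
        // The count after an element or group multiplies everything inside it.
        const long multiplier = parse_count(s, pos);
        for (const auto& entry : item) total[entry.first] += entry.second * multiplier;
    }
    return total;
}

}  // namespace formula_detail

// Counts atoms per element; throws std::invalid_argument on malformed input.
AtomCounts count_atoms(const std::string& formula) {
    AtomCounts total;
    std::size_t pos = 0;
    for (;;) {
        const long coefficient = formula_detail::parse_count(formula, pos);
        const AtomCounts part = formula_detail::parse_sequence(formula, pos);
        for (const auto& entry : part) total[entry.first] += entry.second * coefficient;
        if (pos < formula.size() && formula[pos] == '*') { ++pos; continue; }
        break;
    }
    if (pos != formula.size()) throw std::invalid_argument("unexpected character");
    return total;
}
```

For "[Be(N(CH3)2)2]3" this returns Be 3, N 6, C 12, H 36, the counts listed above as the expected ones. The key design point is that each group returns its own map, so the multiplier that follows a closing bracket is applied to the whole group, however deeply it is nested.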

scott@slp53.sl.home (Scott Lurndal): Jun 28 07:50PM

>> std::strtok_s(), declared in <cstring>, which is thread safe.

>Why don't they simply re-specify strtok() for newer language
>versions with internal buffers which are thread-local?

Because it makes much more sense for the caller to provide the
storage for the metadata, as POSIX realized two decades ago:

$ man strtok |head -20
STRTOK(3) Linux Programmer's Manual STRTOK(3)

NAME
strtok, strtok_r - extract tokens from strings

SYNOPSIS
#include <string.h>

char *strtok(char *str, const char *delim);

char *strtok_r(char *str, const char *delim, char **saveptr);

Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

strtok_r(): _SVID_SOURCE || _BSD_SOURCE || _POSIX_C_SOURCE >= 1 ||
_XOPEN_SOURCE || _POSIX_SOURCE

James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 28 01:35PM -0700

On Friday, June 28, 2019 at 3:26:31 PM UTC-4, Alf P. Steinbach wrote:
> On 28.06.2019 21:13, James Kuyper wrote:
...
> * That TR was included as normative annex K in C11.
> * To use it you're apparently supposed to #define __STDC_WANT_LIB_EXT1__
> as 1 before including any C library header.

That's implementation-specific. The standard does not specify how to enable support for annex K, only how to check whether support has been enabled.

> I don't have the C11 standard, unfortunately.

> And nothing I've found indicates a connection with threading?

When I said that strtok_s() is thread-safe, I should instead have said
that it can be used in a thread-safe fashion.

"The strtok function is not required to avoid data races with other
calls to the strtok function.311)" (n1570.pdf 7.24.5.8p3)

311 is a reference to the following footnote:
"The strtok_s function can be used instead to avoid data races."

The fact that the functions described in Annex K can be used in a thread
safe fashion is something you must derive from the descriptions, it's
never said explicitly in Annex K itself.

The key feature of strtok_s() that improves thread safety over strtok()
is that strtok() uses its own data area to store information about the
string it's parsing. With strtok_s(), you define your own char* pointer,
and then pass the address of that pointer to strtok_s() as its fourth
argument. That pointer will contain the information that strtok_s()
needs to continue its parsing when you call it with a null first
argument.

All of the data memory used by strtok_s() is under your control. If you
manage that data in a thread-safe fashion, then your calls to
strtok_s() will be thread safe.

Keith Thompson <kst-u@mib.org>: Jun 28 02:57PM -0700

> * To use it you're apparently supposed to #define __STDC_WANT_LIB_EXT1__
> as 1 before including any C library header.

> I don't have the C11 standard, unfortunately.

N1570 is the last pre-standard draft of C11. It's close enough for most
purposes.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

Annex K is normative but optional. An implementation that supports it
must pre#define __STDC_LIB_EXT1__.

User code should have
#define __STDC_WANT_LIB_EXT1__ 1
to enable the features of Annex K (or 0 to disable them). It's
implementation-defined whether they're enabled or not if
__STDC_WANT_LIB_EXT1__ is not defined.
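
The two macros can be combined into a compile-time check along these lines. This is only a sketch: the function name annex_k_advertised is invented here, and whether a C++ implementation honours the request at all is inherited from its C library and is an assumption.

```cpp
// Request the Annex K declarations; this has to precede the first
// inclusion of any C library header to have an effect.
#define __STDC_WANT_LIB_EXT1__ 1
#include <string.h>

// Hypothetical helper: reports whether the implementation advertises
// Annex K support by predefining __STDC_LIB_EXT1__.
bool annex_k_advertised() {
#ifdef __STDC_LIB_EXT1__
    return true;   // strtok_s and friends should be declared
#else
    return false;  // fall back to strtok or a hand-written tokenizer
#endif
}
```

Code that wants strtok_s() would normally wrap the call itself in the same #ifdef, since on an implementation without Annex K the name is not declared.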

> And nothing I've found indicates a connection with threading?

strtok() modifies the string passed to it as an argument and it
maintains internal state, so it can't be used in parallel to parse two
different strings. (That could happen either with separate threads or
with interspersed calls in non-threaded code.) strtok_s() requires the
caller to provide space for any internal state, so two different threads
should be able to use it safely as long as the storage they provide is
distinct.
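
The interspersed-calls case can be sketched with POSIX strtok_r(), which has the same caller-provided-state idea and is far more widely available than strtok_s(). The function name interleave_tokens is made up for this example, and strtok_r() being declared in <string.h> is a POSIX assumption, not ISO C or C++.

```cpp
#include <string.h>  // strtok_r: POSIX, not part of ISO C or C++
#include <string>
#include <vector>

// Hypothetical helper: tokenize two strings alternately. Each string has
// its own save pointer, so the two scans do not disturb each other.
std::vector<std::string> interleave_tokens(std::string a, std::string b,
                                           const char* delims) {
    std::vector<std::string> out;
    char* save_a = nullptr;  // scan state for a, owned by the caller
    char* save_b = nullptr;  // scan state for b, owned by the caller
    char* tok_a = strtok_r(&a[0], delims, &save_a);
    char* tok_b = strtok_r(&b[0], delims, &save_b);
    while (tok_a != nullptr || tok_b != nullptr) {
        if (tok_a != nullptr) {
            out.push_back(tok_a);
            tok_a = strtok_r(nullptr, delims, &save_a);
        }
        if (tok_b != nullptr) {
            out.push_back(tok_b);
            tok_b = strtok_r(nullptr, delims, &save_b);
        }
    }
    return out;
}
```

interleave_tokens("a b c", "1 2", " ") produces a, 1, b, 2, c. The same loop written with plain strtok() would lose track of the first string as soon as the second scan started, because there is only one hidden state.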

None of the implementations I use support Annex K, and there's a serious
proposal to remove it from C2X.
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1969.htm

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

Keith Thompson <kst-u@mib.org>: Jun 28 03:01PM -0700

>> The requirements are not clear.

> what is not clear in my question? I was wondering if there exists
> a C++ std function that replaces strtok.

What's not clear to me is how strtok() would solve your stated problem.

strtok() splits a string on a specified delimiter. How would that parse
your sample formulas?

I suggest that you have an XY problem.
https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem

lex/flex and yacc/bison could do the job, but might be overkill.

[...]

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

string_view problem

James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 27 08:17PM -0400

On 6/27/19 3:06 PM, Keith Thompson wrote:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
...
> compound statements to allow zero or more declarations followed by
> zero or more statements.

> https://www.bell-labs.com/usr/dmr/www/cman.pdf

In addition to allowing declarations in compound-statements, K&R C also
made a confusing step forward toward the modern syntax rules for
function definitions.

Looking over cman.pdf, I see that it specified the following grammar
productions in section 10.1 "External function definitions", duplicated
in Appendix 1 "Syntax Summary", section 4:

function-definition:
    type-specifier_opt function-declarator function-body

function-body:
    type-decl-list function-statement

function-statement:
    { declaration-list_opt statement-list }

In K&R C, under section 18 "Syntax summary", you can find exactly that
same grammar in sub-section 4. However, in the main text, in section
10.1 "External function definitions", the middle rule in that chain is
different:

function-body:
    declaration-list compound-statement

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 28 03:31AM +0200

On 27.06.2019 20:10, Ralf Goertz wrote:
> } while (next_permutation(permuter.begin()+1,permuter.end()));
> cout<<sum<<endl;
> }

I cooked up this code, you can check if it's faaaster or slooower:

(The $use_std macro invocation expands to `using std::string, ...;`, and
ditto for the $use_cppx invocation. The library just reduces verbosity.)

#include <cppx-core/all.hpp>    // <url: https://github.com/alf-p-steinbach/cppx-core>
using namespace std::literals;
$use_std( string, string_view, next_permutation, cout, endl );
$use_cppx( string_repeat::operator*, Range );

auto is_first_of_rotations( const string& s, string& rotations_buffer )
    -> bool
{
    const int n = s.length();
    rotations_buffer = s;
    rotations_buffer += s;
    for( const int i : Range( 1, n - 1 ) ) {
        if( string_view( &rotations_buffer[i], n ) < s ) {
            return false;
        }
    }
    return true;
}

auto main() -> int
{
    const int k = 14;
    string s = k*"A"s + k*"B"s;
    auto rotations_buffer = string( 2*s.length(), '.' );
    int count = 0;
    do
    {
        if( is_first_of_rotations( s, rotations_buffer ) ) {
            //cout << s << endl;
            ++count;
        }
    } while( next_permutation( $items_of( s ) ) );
    cout << count << " circular permutations of " << k << "A+" << k << "B." << endl;
}

Cheers!,

- Alf

Juha Nieminen <nospam@thanks.invalid>: Jun 28 06:43AM

> (The $use_std macro invocation expands to `using std::string, ...;`, and
> ditto for the $use_cppx invocation. The library just reduces verbosity.)

When you are posting here, why do you insist on using such non-standard code
that only makes it harder for somebody to test it? Or even understand it?

Are you, perhaps, naive enough to think that if you keep using your pet
library in your usenet posts, it will become popular?

This newsgroup is about *standard* C++. How about we keep all code
standard as well, and preferably not needlessly dependent on some
third-party libraries, especially when the point of the code has
absolutely nothing to do with them? Is that too much to ask?

Tim Rentsch <tr.17687@z991.linuxsc.com>: Jun 27 11:55PM -0700

> Great, thanks! I still need 2.7 seconds for 14 14, but that was not
> doable before. Maybe you've got some insight as to why I still seem to
> be much slower than you (assuming our hardware is comparable).

Briefly:

(1) My code had only one string, twice as long as the string
being permuted, with next_permutation() being done on the first
half, then copying the first half into the second half (using
memcpy()) to set up the rotations checks.

(2) Except for initializing and next_permutation(), my code did
everything with 'const char *', not strings or string_views.
Here are the functions that do the rotations check:

int
is_first( const char *s, unsigned k ){
    unsigned i;
    for( i = 1; i < k; i++ ){
        if( is_before( s+i, s, k ) )  return 0;
    }
    return 1;
}

int
is_before( const char *a, const char *b, unsigned k ){
    return
        k == 0 || *a > *b  ?  0  :
        *a < *b            ?  1  :
        is_before( a+1, b+1, k-1 );
}

(3) Optimization level -O3 gave maybe a 5% improvement over -O2.

(4) What seemed to work best was "inlining" is_before() into the
body of is_first(), but not is_first() into its caller. I don't
know why, but since it seemed to help that's what I did.
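
Points (1) and (2) can be put together into a standard-C++ sketch like the one below. It is a reconstruction from the description, not the poster's actual program: the name count_circular is invented, and std::memcmp stands in for the recursive is_before(), which gives the same answer for strings made only of 'A' and 'B'.

```cpp
#include <algorithm>
#include <cstring>
#include <string>

// Hypothetical helper: counts arrangements of k 'A's and k 'B's on a
// circle, with all rotations of an arrangement counted once.
long count_circular(unsigned k) {
    const unsigned n = 2 * k;
    // One buffer twice as long as the string being permuted.
    std::string buf(2 * n, 'A');
    std::fill(buf.begin() + k, buf.begin() + n, 'B');  // sorted start: A..AB..B

    long count = 0;
    do {
        // Copy the first half into the second half, so that the rotation
        // by i is simply the n characters starting at offset i.
        std::memcpy(&buf[0] + n, &buf[0], n);
        const char* s = buf.data();
        bool is_first = true;
        for (unsigned i = 1; i < n && is_first; ++i) {
            // A rotation that sorts before s means s is not the
            // representative of its rotation class.
            if (std::memcmp(s + i, s, n) < 0) is_first = false;
        }
        if (is_first) ++count;
    } while (std::next_permutation(buf.begin(), buf.begin() + n));
    return count;
}
```

Small cases are easy to check by hand: count_circular(2) is 2 (AABB and ABAB up to rotation) and count_circular(3) is 4. Only the first half of the buffer is ever handed to next_permutation(), so the copy in the second half never affects the enumeration.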

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 28 09:54AM +0200

On 28.06.2019 08:43, Juha Nieminen wrote:
> standard as well, and preferably not needlessly dependent on some
> third-party libraries, especially when the point of the code has
> absolutely nothing to do with them? Is that too much to ask?

You're asking that the relevant library code be duplicated in each
posting here.

That would be idiotic, if you think about it.

There's even a bunch of acronyms designed to help programmers avoid the
ungood practice of duplicating code.

One of them is "DRY": Don't Repeat Yourself.

I'm not going to add more code to postings when that common code is much
better referred to on GitHub.

Cheers!,

- Alf

Geoff <geoff@invalid.invalid>: Jun 28 01:14PM -0700

On Fri, 28 Jun 2019 09:54:54 +0200, "Alf P. Steinbach"
>> absolutely nothing to do with them? Is that too much to ask?

>You're asking that the relevant library code be duplicated in each
>posting here.

He's not asking that at all. He's asking you to stop posting code
that's dependent on your non-standard library.

>That would be idiotic, if you think about it.

The OP's code was fairly standard C++, your code is not.
That's idiotic.

>One of them is "DRY": Don't Repeat Yourself.

>I'm not going to add more code to postings when that common code is much
>better referred to on GitHub.

There's another one - GCIGC: Garbage Code Is Garbage Code.

The OP would be better off ignoring your posts.

"Simplify Your Code With Rocket Science: C++20's Spaceship Operator"

Lynn McGuire <lynnmcguire5@gmail.com>: Jun 28 02:05PM -0500

"Simplify Your Code With Rocket Science: C++20's Spaceship Operator"

https://devblogs.microsoft.com/cppblog/simplify-your-code-with-rocket-science-c20s-spaceship-operator/

"C++20 adds a new operator, affectionately dubbed the "spaceship"
operator: <=>. There was a post awhile back by our very own Simon Brand
detailing some information regarding this new operator along with some
conceptual information about what it is and does. The goal of this post
is to explore some concrete applications of this strange new operator
and its associated counterpart, the operator== (yes it has been changed,
for the better!), all while providing some guidelines for its use in
everyday code."

Hat tip to:

https://www.codeproject.com/script/Mailouts/View.aspx?mlid=14431&_z=1988477

Lynn

Undefined Behaviour

Tim Rentsch <tr.17687@z991.linuxsc.com>: Jun 27 11:23PM -0700

> [...]

Let me make one more run at trying to achieve some sort of
closure in this discussion. Can you explain what you think
is the point of view that I am espousing? I'm not looking
for any type of argument or rebuttal, just a statement of
what you think my views are, as directly as you can make
it.

(Direct is not the same as short here. My viewpoint has more
than a few different aspects, which is to say too many to be
conveyed in only one or two sentences, so please be thorough.)


Friday, June 28, 2019

Digest for comp.lang.c++@googlegroups.com - 25 updates in 4 topics
