- is there a C++ version of the strtok() function? - 17 Updates
- string_view problem - 6 Updates
- "Simplify Your Code With Rocket Science: C++20's Spaceship Operator" - 1 Update
- Undefined Behaviour - 1 Update
alexo <alelvb@inwind.it>: Jun 28 07:14PM +0200 Hello, I would like to improve a chemical formula parser that I wrote from scratch, without using tokenizing functions. It correctly handles the formula Fe4[Fe(CN)6]3*6H2O but not the following one: [Be(N(CH3)2)2]3 My question is: is there a purely C++ function that behaves like the C strtok()? Thank you, alessandro |
Thiago Adams <thiago.adams@gmail.com>: Jun 28 10:33AM -0700 On Friday, June 28, 2019 at 2:15:09 PM UTC-3, alexo wrote: > my asking is: > is there a purely C++ function that behaves like the C strtok() ? > thank you Maybe std::regex can help you with what you want to do. https://en.cppreference.com/w/cpp/regex I don't remember how chemical formulas are expressed, but I believe the best thing to do is write your own tokenizer and parser. If strtok was helping you, that means that the tokenizer you need is simple. |
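For the lexing half of the job, std::regex is a reasonable fit even though it cannot handle the nesting by itself. A minimal sketch, assuming element symbols are one capital letter optionally followed by one lowercase letter; tokenize_formula is a hypothetical helper name, not a standard function:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits a formula such as "Na[Fe(CN)6]" into tokens.
// The regex alternatives, tried in order at each position, are:
//   [A-Z][a-z]?   an element symbol ("Fe", "C", "N")
//   [0-9]+        a multiplier ("6", "12")
//   [()*] \[ \]   a single bracket or the hydrate separator '*'
// Characters matching none of these are silently skipped by the iterator.
std::vector<std::string> tokenize_formula(const std::string& formula)
{
    static const std::regex token_re(R"([A-Z][a-z]?|[0-9]+|[()*]|\[|\])");
    std::vector<std::string> tokens;
    for (auto it = std::sregex_iterator(formula.begin(), formula.end(), token_re);
         it != std::sregex_iterator(); ++it)
        tokens.push_back(it->str());
    return tokens;
}
```

For example, tokenize_formula("Na[Fe(CN)6]") should yield Na, [, Fe, (, C, N, ), 6, ]. The brackets stay in the token stream, which is exactly the information strtok throws away; a parser still has to match them.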
scott@slp53.sl.home (Scott Lurndal): Jun 28 05:41PM >[Be(N(CH3)2)2]3 >my asking is: >is there a purely C++ function that behaves like the C strtok() ? If it works, why "improve" it? strtok is perfectly legal C++. |
Manfred <noname@add.invalid>: Jun 28 07:47PM +0200 On 6/28/2019 7:14 PM, alexo wrote: > [Be(N(CH3)2)2]3 > my asking is: > is there a purely C++ function that behaves like the C strtok() ? If you want a function that behaves like strtok, why not use strtok itself? This, like all C standard functions, is allowed in C++. Besides, I am pretty sure that the (pure) C++ standard library does not include a function that behaves identically to a C standard function. That said, for what you are trying to achieve, strtok is probably not the best tokenizer - most notably it does not handle nesting and paired (open/close) parentheses by itself, not to mention that it overwrites delimiters with 0's. A long time ago I wrote a math expression parser, but that was pure C, and not using strtok either. Looking at C++ and the kind of problem, you probably won't be best off with a /function/; possibly you could use some combination of string_view with some recursive logic. Others may give more detailed hints. |
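One way to read the "string_view with some recursive logic" hint is a small recursive-descent parser. This is a sketch, not alexo's program: count_atoms, parse_sequence and read_count are hypothetical names, and it assumes that '*' separates hydrate parts whose leading number multiplies that whole part (the 6 in *6H2O). Each bracketed group is handled by a recursive call, and the number after the closing bracket scales everything that call returned.

```cpp
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>
#include <string_view>

using AtomCounts = std::map<std::string, long>;

// Consumes an optional decimal number from the front of s; returns 1 if absent.
long read_count(std::string_view& s)
{
    if (s.empty() || !std::isdigit(static_cast<unsigned char>(s.front())))
        return 1;
    long n = 0;
    while (!s.empty() && std::isdigit(static_cast<unsigned char>(s.front()))) {
        n = n * 10 + (s.front() - '0');
        s.remove_prefix(1);
    }
    return n;
}

// Parses elements and bracketed groups until a closing bracket, a '*',
// or the end of input, consuming what it reads from s.
AtomCounts parse_sequence(std::string_view& s)
{
    AtomCounts counts;
    while (!s.empty()) {
        const char c = s.front();
        if (c == '(' || c == '[') {
            const char close = (c == '(') ? ')' : ']';
            s.remove_prefix(1);
            const AtomCounts inner = parse_sequence(s);  // recurse into the group
            if (s.empty() || s.front() != close)
                throw std::invalid_argument("unbalanced brackets");
            s.remove_prefix(1);
            const long k = read_count(s);                 // multiplier after the group
            for (const auto& [elem, n] : inner)
                counts[elem] += n * k;
        } else if (c == ')' || c == ']' || c == '*') {
            break;                                        // the caller decides what this means
        } else if (std::isupper(static_cast<unsigned char>(c))) {
            std::string elem(1, c);
            s.remove_prefix(1);
            if (!s.empty() && std::islower(static_cast<unsigned char>(s.front()))) {
                elem += s.front();
                s.remove_prefix(1);
            }
            counts[elem] += read_count(s);
        } else {
            throw std::invalid_argument("unexpected character");
        }
    }
    return counts;
}

// Top level: '*'-separated parts, each with an optional leading coefficient.
AtomCounts count_atoms(std::string_view formula)
{
    AtomCounts total;
    for (;;) {
        const long coeff = read_count(formula);
        for (const auto& [elem, n] : parse_sequence(formula))
            total[elem] += n * coeff;
        if (formula.empty())
            break;
        if (formula.front() != '*')
            throw std::invalid_argument("unbalanced brackets");
        formula.remove_prefix(1);
    }
    return total;
}
```

With this sketch, count_atoms("[Be(N(CH3)2)2]3") should give Be 3, N 6, C 12, H 36, and count_atoms("Fe4[Fe(CN)6]3*6H2O") should give Fe 7, C 18, N 18, H 12, O 6. The recursion depth equals the bracket nesting depth, which is what a flat tokenizer cannot track.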
Bonita Montero <Bonita.Montero@gmail.com>: Jun 28 07:54PM +0200 > Maybe std::regex can help you in the way you want to do. > https://en.cppreference.com/w/cpp/regex This will work also, but I'm sure that's not nearly as fast as strtok(). |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 28 08:17PM +0200 On 28.06.2019 19:14, alexo wrote: > [Be(N(CH3)2)2]3 > my asking is: > is there a purely C++ function that behaves like the C strtok() ? The requirements are not clear. But the main problem with `strtok` is that it isn't thread safe. If thread safety isn't a concern then just (continue to) use it. Otherwise, consider either a regular expression (standard library solution) or a parsing framework like Boost Spirit (3rd party library). Cheers & hth., - Alf |
alexo <alelvb@inwind.it>: Jun 28 09:11PM +0200 On 28/06/19 20:17, Alf P. Steinbach wrote: >> my asking is: >> is there a purely C++ function that behaves like the C strtok() ? > The requirements are not clear. What is not clear in my question? I was wondering if there exists a C++ std function that replaces strtok. > But the main problem with `strtok` is that it isn't thread safe. I don't need threads, so strtok is OK. Thank you |
James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 28 12:13PM -0700 On Friday, June 28, 2019 at 2:17:16 PM UTC-4, Alf P. Steinbach wrote: > On 28.06.2019 19:14, alexo wrote: ... > > is there a purely C++ function that behaves like the C strtok() ? > The requirements are not clear. > But the main problem with `strtok` is that it isn't thread safe. If an implementation pre#defines __STDC_LIB_EXT1__, you can use std::strtok_s(), declared in <cstring>, which is thread safe. |
Bonita Montero <Bonita.Montero@gmail.com>: Jun 28 09:22PM +0200 > If an implementation pre#defines __STDC_LIB_EXT1__, you can use > std::strtok_s(), declared in <cstring>, which is thread safe. Why don't they simply re-specify strtok() for newer language versions with internal buffers which are thread-local? |
Christian Gollwitzer <auriocus@gmx.de>: Jun 28 09:20PM +0200 On 28.06.19 at 19:33, Thiago Adams wrote: >> [Be(N(CH3)2)2]3 > Maybe std::regex can help you in the way you want to do. > https://en.cppreference.com/w/cpp/regex A regex cannot express this grammar. Simple proof: there are nested parentheses, and for that you need a stack automaton. But generally, using a parser generator might be good advice. There are many to choose from; I like PEG grammars. There is https://github.com/yhirose/cpp-peglib for example. Christian |
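The stack-automaton remark can be made concrete without recursion: keep an explicit stack of per-group count tables. A sketch, assuming element symbols are one capital letter optionally followed by one lowercase letter; count_atoms_stack is a hypothetical name, and it deliberately leaves out the '*' hydrate separator:

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Each '(' or '[' pushes a fresh count table and remembers which closer it
// expects; the matching closer pops the table, scales it by the number that
// follows, and merges it into the table below. No recursion is involved.
std::map<std::string, long> count_atoms_stack(const std::string& f)
{
    std::vector<std::map<std::string, long>> tables(1);  // tables.back() = current group
    std::vector<char> expected_closers;
    std::size_t i = 0;

    auto is_digit = [&](std::size_t j) {
        return j < f.size() && std::isdigit(static_cast<unsigned char>(f[j]));
    };
    auto read_count = [&]() -> long {                    // optional number, default 1
        if (!is_digit(i)) return 1;
        long n = 0;
        while (is_digit(i)) n = n * 10 + (f[i++] - '0');
        return n;
    };

    while (i < f.size()) {
        const char c = f[i];
        if (c == '(' || c == '[') {
            expected_closers.push_back(c == '(' ? ')' : ']');
            tables.emplace_back();
            ++i;
        } else if (c == ')' || c == ']') {
            if (expected_closers.empty() || expected_closers.back() != c)
                throw std::invalid_argument("unbalanced brackets");
            expected_closers.pop_back();
            ++i;
            const long k = read_count();
            const auto group = std::move(tables.back());
            tables.pop_back();
            for (const auto& [elem, n] : group)
                tables.back()[elem] += n * k;
        } else if (std::isupper(static_cast<unsigned char>(c))) {
            std::string elem(1, c);
            ++i;
            if (i < f.size() && std::islower(static_cast<unsigned char>(f[i])))
                elem += f[i++];
            tables.back()[elem] += read_count();
        } else {
            throw std::invalid_argument("unexpected character");  // '*' not handled here
        }
    }
    if (!expected_closers.empty())
        throw std::invalid_argument("unclosed bracket");
    return tables.front();
}
```

count_atoms_stack("[Be(N(CH3)2)2]3") should give Be 3, N 6, C 12, H 36. The std::vector of tables is the stack that a regular expression lacks; its depth grows with the bracket nesting.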
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 28 09:26PM +0200 On 28.06.2019 21:13, James Kuyper wrote: >> But the main problem with `strtok` is that it isn't thread safe. > If an implementation pre#defines __STDC_LIB_EXT1__, you can use > std::strtok_s(), declared in <cstring>, which is thread safe. This was news to me, and I'm unable to find much info about it. What I've found, scattered here & there in discussions, is that * Microsoft submitted their silly *_s bounds checking functions for standardization in C. * The C committee fixed the worst problems and created a technical report, TR 24731-1. * That TR was included as normative annex K in C11. * To use it you're apparently supposed to #define __STDC_WANT_LIB_EXT1__ as 1 before including any C library header. I don't have the C11 standard, unfortunately. And nothing I've found indicates a connection with threading? Cheers!, - Alf |
James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 28 12:37PM -0700 On Friday, June 28, 2019 at 1:15:09 PM UTC-4, alexo wrote: > [Be(N(CH3)2)2]3 > my asking is: > is there a purely C++ function that behaves like the C strtok() ? Yes. It is declared in <cstring>, and it's called std::strtok(). If the length of the string which is the second argument to your strtok() calls is 1, you might want to look into std::getline<>(), declared in <string>, which takes an argument which is a delimiter character. |
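The std::getline suggestion in practice: a sketch of a single-character splitter (split is a hypothetical helper). Unlike strtok, it leaves the input untouched and keeps empty fields between adjacent delimiters:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text on one delimiter character using
// std::getline. The input string is copied into the stream, never modified.
std::vector<std::string> split(const std::string& text, char delim)
{
    std::vector<std::string> fields;
    std::istringstream in(text);
    std::string field;
    while (std::getline(in, field, delim))  // stops at delim or end of input
        fields.push_back(field);
    return fields;
}
```

For instance, split("Fe4[Fe(CN)6]3*6H2O", '*') should give "Fe4[Fe(CN)6]3" and "6H2O", and split("a,,b", ',') keeps the empty middle field. A trailing delimiter does not produce a final empty field.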
alexo <alelvb@inwind.it>: Jun 28 09:45PM +0200 On 28/06/19 19:47, Manfred wrote: > the best tokenizer for the purpose - most notably it does not handle > nesting and paired (open/close) parentheses by itself, not to mention > that it overwrites delimiters with 0's. I thought it could help, but if I use something like this: tokens = strtok("Na[Fe(CN)6]", "()[]*"); I get: Na Fe CN 6, which is a correct but useless decomposition, because, as you stated, I can't match the '6' referring to both the 'Fe' and the 'CN' group. > with a /function/, possibly you may use some combination of string_view > with some recursive logic. > Others may give more detailed hints. The program that I've written uses a 'manual' jump from an opening parenthesis to the corresponding closing one, but can't handle trickier formulas. For example, [Be(N(CH3)2)2]3 is seen by my program as having:
3 Be atoms -> correct
6 N atoms -> correct
1 C atom -> it should be 12
3 H atoms -> it should be 36
Thank you, alessandro |
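The strtok call in the snippet above needs two changes to behave as intended in C++. strtok returns one token per call (later calls pass a null first argument), and it writes '\0' over each delimiter it finds, so it needs a writable buffer: a string literal has type const char[N] in C++ and has not converted to char* since C++11, and modifying one is undefined behavior in C. A sketch with a hypothetical strtok_all helper:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: collects every token std::strtok finds in `text`.
// strtok overwrites delimiters with '\0', so it works on a private copy.
std::vector<std::string> strtok_all(const std::string& text, const char* delims)
{
    std::vector<char> buf(text.begin(), text.end());
    buf.push_back('\0');                              // strtok expects a C string
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buf.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims))          // nullptr: continue same buffer
        tokens.emplace_back(tok);
    return tokens;
}
```

strtok_all("Na[Fe(CN)6]", "()[]*") should return Na, Fe, CN and 6, the same decomposition reported above, with the bracket structure gone.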
scott@slp53.sl.home (Scott Lurndal): Jun 28 07:50PM >> std::strtok_s(), declared in <cstring>, which is thread safe. >Why they don't simply re-specify strtok() for newer language >-versions with internal buffers which are thread-local? Because it makes much more sense for the caller to provide the storage for the metadata, as POSIX realized two decades ago:

$ man strtok | head -20
STRTOK(3)                 Linux Programmer's Manual                STRTOK(3)

NAME
       strtok, strtok_r - extract tokens from strings

SYNOPSIS
       #include <string.h>

       char *strtok(char *str, const char *delim);

       char *strtok_r(char *str, const char *delim, char **saveptr);

   Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

       strtok_r(): _SVID_SOURCE || _BSD_SOURCE || _POSIX_C_SOURCE >= 1 || _XOPEN_SOURCE || _POSIX_SOURCE |
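The point about caller-owned state can be illustrated with POSIX strtok_r: two tokenizations can be interleaved because each keeps its own save pointer, which plain strtok cannot do even in single-threaded code, since it keeps one hidden position. This assumes a POSIX system (strtok_r is declared in <string.h> there and is not part of ISO C++); interleave_tokens is a hypothetical helper.

```cpp
#include <string.h>   // POSIX strtok_r; not part of ISO C++
#include <string>
#include <vector>

// Hypothetical helper: tokenizes two writable buffers in alternation.
// Each sequence has its own save pointer, so the calls do not interfere.
std::vector<std::string> interleave_tokens(char* a, char* b, const char* delims)
{
    std::vector<std::string> out;
    char* save_a = nullptr;
    char* save_b = nullptr;
    char* ta = strtok_r(a, delims, &save_a);
    char* tb = strtok_r(b, delims, &save_b);
    while (ta != nullptr || tb != nullptr) {
        if (ta) { out.emplace_back(ta); ta = strtok_r(nullptr, delims, &save_a); }
        if (tb) { out.emplace_back(tb); tb = strtok_r(nullptr, delims, &save_b); }
    }
    return out;
}
```

With char a[] = "H2O CO2" and char b[] = "x,y,z", interleave_tokens(a, b, " ,") should produce H2O, x, CO2, y, z. Replacing strtok_r with strtok here would make the two sequences share a single position.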
James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 28 01:35PM -0700 On Friday, June 28, 2019 at 3:26:31 PM UTC-4, Alf P. Steinbach wrote: > On 28.06.2019 21:13, James Kuyper wrote: ... > * That TR was included as normative annex K in C11. > * To use it you're apparently supposed to #define __STDC_WANT_LIB_EXT1__ > as 1 before including any C library header. That's implementation-specific. The standard does not specify how to enable support for annex K, only how to check whether support has been enabled. > I don't have the C11 standard, unfortunately. > And nothing I've found indicates a connection with threading? When I said that strtok_s() is thread-safe, I should instead have said that it can be used in a thread-safe fashion. "The strtok function is not required to avoid data races with other calls to the strtok function.311)" (n1570.pdf 7.24.5.8p3) 311 is a reference to the following footnote: "The strtok_s function can be used instead to avoid data races." The fact that the functions described in Annex K can be used in a thread-safe fashion is something you must derive from the descriptions; it's never said explicitly in Annex K itself. The key feature of strtok_s() that improves thread safety over strtok() is that strtok() uses its own data area to store information about the string it's parsing. With strtok_s(), you define your own char* pointer, and then pass the address of that pointer to strtok_s() as its fourth argument. That pointer will contain the information that strtok_s() needs to continue its parsing when you call it with a null first argument. All of the data memory used by strtok_s() is under your control. If you manage that data in a thread-safe fashion, then your calls to strtok_s() will be thread safe. |
Keith Thompson <kst-u@mib.org>: Jun 28 02:57PM -0700 > * To use it you're apparently supposed to #define __STDC_WANT_LIB_EXT1__ > as 1 before including any C library header. > I don't have the C11 standard, unfortunately. N1570 is the last pre-standard draft of C11. It's close enough for most purposes. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf Annex K is normative but optional. An implementation that supports it must pre#define __STDC_LIB_EXT1__. User code should have #define __STDC_WANT_LIB_EXT1__ 1 to enable the features of Annex K (or 0 to disable them). It's implementation-defined whether they're enabled or not if __STDC_WANT_LIB_EXT1__ is not defined. > And nothing I've found indicates a connection with threading? strtok() modifies the string passed to it as an argument and it maintains internal state, so it can't be used in parallel to parse two different strings. (That could happen either with separate threads or with interspersed calls in non-threaded code.) strtok_s() requires the caller to provide space for any internal state, so two different threads should be able to use it safely as long as the storage they provide is distinct. None of the implementations I use support Annex K, and there's a serious proposal to remove it from C2X. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1969.htm -- Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst> Will write code for food. void Void(void) { Void(); } /* The recursive call of the void */ |
Keith Thompson <kst-u@mib.org>: Jun 28 03:01PM -0700 >> The requirements are not clear. > what is not clear in my question? I was wondering if thre exists > a C++ std function that replaces strtok. What's not clear to me is how strtok() would solve your stated problem. strtok() splits a string on a specified delimiter. How would that parse your sample formulas? I suggest that you have an XY problem. https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem lex/flex and yacc/bison could do the job, but might be overkill. [...] |
James Kuyper <jameskuyper@alumni.caltech.edu>: Jun 27 08:17PM -0400 On 6/27/19 3:06 PM, Keith Thompson wrote: > James Kuyper <jameskuyper@alumni.caltech.edu> writes: ... > compound statements to allow zero or more declarations followed by > zero or more statements. > https://www.bell-labs.com/usr/dmr/www/cman.pdf In addition to allowing declarations in compound-statements, K&R C also made a confusing step forward toward the modern syntax rules for function definitions. Looking over cman.pdf, I see that it specified the following grammar productions in section 10.1 "External function definitions", duplicated in Appendix 1 "Syntax Summary", section 4:

function-definition:
    type-specifier opt function-declarator function-body
function-body:
    type-decl-list function-statement
function-statement:
    { declaration-list opt statement-list }

In K&R C, under section 18 "Syntax summary", you can find exactly that same grammar in sub-section 4. However, in the main text, in section 10.1 "External function definitions", the middle rule in that chain is different:

function-body:
    declaration-list compound-statement |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 28 03:31AM +0200 On 27.06.2019 20:10, Ralf Goertz wrote: > } while (next_permutation(permuter.begin()+1,permuter.end())); > cout<<sum<<endl; > } I cooked up this code, you can check if it's faaaster or slooower: (The $use_std macro invocation expands to `using std::string, ...;`, and ditto for the $use_cppx invocation. The library just reduces verbosity.)

#include <cppx-core/all.hpp>    // <url: https://github.com/alf-p-steinbach/cppx-core>
using namespace std::literals;
$use_std( string, string_view, next_permutation, cout, endl );
$use_cppx( string_repeat::operator*, Range );

auto is_first_of_rotations( const string& s, string& rotations_buffer )
    -> bool
{
    const int n = s.length();
    rotations_buffer = s;
    rotations_buffer += s;
    for( const int i : Range( 1, n - 1 ) ) {
        if( string_view( &rotations_buffer[i], n ) < s ) {
            return false;
        }
    }
    return true;
}

auto main() -> int
{
    const int k = 14;
    string s = k*"A"s + k*"B"s;
    auto rotations_buffer = string( 2*s.length(), '.' );
    int count = 0;
    do {
        if( is_first_of_rotations( s, rotations_buffer ) ) {
            //cout << s << endl;
            ++count;
        }
    } while( next_permutation( $items_of( s ) ) );
    cout << count << " circular permutations of " << k << "A+" << k << "B." << endl;
}

Cheers!, - Alf |
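For readers without cppx-core, here is a sketch of the same rotation check written against the standard library only. It assumes that cppx's Range(1, n - 1) visits rotations 1 through n - 1 and that $items_of(s) expands to the begin/end pair of s; count_circular_permutations is a hypothetical wrapper around the loop in Alf's main, taking k as a parameter instead of hard-coding 14.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>

// True if no proper rotation of s is lexicographically smaller than s,
// i.e. s is the smallest member of its rotation class. `doubled` is reused
// scratch storage holding s+s, so every rotation is a view into it.
bool is_first_of_rotations(const std::string& s, std::string& doubled)
{
    const std::size_t n = s.size();
    doubled = s;
    doubled += s;
    const std::string_view all(doubled);
    const std::string_view original(s);
    for (std::size_t i = 1; i < n; ++i)      // rotations 1 .. n-1
        if (all.substr(i, n) < original)
            return false;
    return true;
}

// Hypothetical wrapper: counts circular arrangements of k 'A's and k 'B's
// by visiting every distinct permutation in lexicographic order.
long count_circular_permutations(int k)
{
    const auto half = static_cast<std::size_t>(k);
    std::string s = std::string(half, 'A') + std::string(half, 'B');  // sorted start
    std::string doubled;
    long count = 0;
    do {
        if (is_first_of_rotations(s, doubled))
            ++count;
    } while (std::next_permutation(s.begin(), s.end()));
    return count;
}
```

count_circular_permutations(3) should return 4 and count_circular_permutations(4) should return 10, the numbers of distinct necklaces with three (respectively four) beads of each color. Each rotation class is counted once, via its lexicographically smallest member.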
Juha Nieminen <nospam@thanks.invalid>: Jun 28 06:43AM > (The $use_std macro invocation expands to `using std::string, ...;`, and > ditto for the $use_cppx invocation. The library just reduces verbosity.) When you are posting here, why do you insist on using such non-standard code that only makes it harder for somebody to test it? Or even understand it? Are you, perhaps, naive enough to think that if you keep using your pet library in your usenet posts, it will become popular? This newsgroup is about *standard* C++. How about we keep all code standard as well, and preferably not needlessly dependent on some third-party libraries, especially when the point of the code has absolutely nothing to do with them? Is that too much to ask? |
Tim Rentsch <tr.17687@z991.linuxsc.com>: Jun 27 11:55PM -0700 > Great, thanks! I still need 2.7 seconds for 14 14, but that was not > doable before. Maybe you've got some insight as to why I still seem to > be much slower than you (assuming our hardware is comparable). Briefly:
(1) My code had only one string, twice as long as the string being permuted, with next_permutation() being done on the first half, then copying the first half into the second half (using memcpy()) to set up the rotations checks.
(2) Except for initializing and next_permutation(), my code did everything with 'const char *', not strings or string_views. Here are the functions that do the rotations check:

int is_before( const char *a, const char *b, unsigned k ){
    return k == 0 || *a > *b ? 0 : *a < *b ? 1 : is_before( a+1, b+1, k-1 );
}

int is_first( const char *s, unsigned k ){
    unsigned i;
    for( i = 1; i < k; i++ ){
        if( is_before( s+i, s, k ) ) return 0;
    }
    return 1;
}

(3) Optimization level -O3 gave maybe a 5% improvement over -O2.
(4) What seemed to work best was "inlining" is_before() into the body of is_first(), but not is_first() into its caller. I don't know why, but since it seemed to help that's what I did. |
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 28 09:54AM +0200 On 28.06.2019 08:43, Juha Nieminen wrote: > standard as well, and preferably not needlessly dependent on some > third-party libraries, especially when the point of the code has > absolutely nothing to do with them? Is that too much to ask? You're asking that the relevant library code be duplicated in each posting here. That would be idiotic, if you think about it. There's even a bunch of acronyms designed to help programmers avoid the ungood practice of duplicating code. One of them is "DRY": Don't Repeat Yourself. I'm not going to add more code to postings when that common code is much better referred to on GitHub. Cheers!, - Alf |
Geoff <geoff@invalid.invalid>: Jun 28 01:14PM -0700 On Fri, 28 Jun 2019 09:54:54 +0200, "Alf P. Steinbach" >> absolutely nothing to do with them? Is that too much to ask? >You're asking that the relevant library code be duplicated in each >posting here. He's not asking that at all. He's asking you to stop posting code that's dependent on your non-standard library. >That would be idiotic, if you think about it. The OP's code was fairly standard C++, your code is not. That's idiotic. >One of them is "DRY": Don't Repeat Yourself. >I'm not going to add more code to postings when that common code is much >better referred to on GitHub. There's another one - GCIGC: Garbage Code Is Garbage Code. The OP would be better off ignoring your posts. |
Lynn McGuire <lynnmcguire5@gmail.com>: Jun 28 02:05PM -0500 "Simplify Your Code With Rocket Science: C++20's Spaceship Operator" https://devblogs.microsoft.com/cppblog/simplify-your-code-with-rocket-science-c20s-spaceship-operator/ "C++20 adds a new operator, affectionately dubbed the "spaceship" operator: <=>. There was a post awhile back by our very own Simon Brand detailing some information regarding this new operator along with some conceptual information about what it is and does. The goal of this post is to explore some concrete applications of this strange new operator and its associated counterpart, the operator== (yes it has been changed, for the better!), all while providing some guidelines for its use in everyday code." Hat tip to: https://www.codeproject.com/script/Mailouts/View.aspx?mlid=14431&_z=1988477 Lynn |
Tim Rentsch <tr.17687@z991.linuxsc.com>: Jun 27 11:23PM -0700 > [...] Let me make one more run at trying to achieve some sort of closure in this discussion. Can you explain what you think is the point of view that I am espousing? I'm not looking for any type of argument or rebuttal, just a statement of what you think my views are, as directly as you can make it. (Direct is not the same as short here. My viewpoint has more than a few different aspects, which is to say too many to be conveyed in only one or two sentences, so please be thorough.) |
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page. To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com. |