- "CppCast interview about the Oulu ISO C++ meeting" by Herb Sutter - 1 Update
- A case insensitive substring search? - 3 Updates
- A case insensitive substring search? - 8 Updates
- list of operating systems that can run C++? - 2 Updates
Lynn McGuire <lmc@winsim.com>: Jun 29 04:54PM -0500

"CppCast interview about the Oulu ISO C++ meeting" by Herb Sutter
https://herbsutter.com/2016/06/27/cppcast-interview-about-the-oulu-iso-c-meeting/

"On Saturday afternoon, at the ISO C++ meeting in Oulu, Finland, we completed the feature set of C++17 and approved sending out the feature-complete document for its primary international comment ballot (aka "CD" or Committee Draft ballot)."

Lynn

ram@zedat.fu-berlin.de (Stefan Ram): Jun 29 04:33AM

> There's no way to handle this situation
>using only the standard library, where case conversion is assumed to map
>one character to one character.

No way? It was always possible to use a custom character encoding, where »SS« is represented by a single code point (character); it still can be rendered »SS«, which is a question of I/O.

Especially, Unicode already has »U+1E9E LATIN CAPITAL LETTER SHARP S«.

ram@zedat.fu-berlin.de (Stefan Ram): Jun 29 05:24PM

>It was always possible to use a custom character encoding, where
>»SS« is represented by a single code point (character); it still
>can be rendered »SS«, which is a question of I/O.

And there even is a precedent for it: »"\n"« (one character), which sometimes is output as »"\015\012"« (two characters).

ram@zedat.fu-berlin.de (Stefan Ram): Jun 29 08:11PM

>>which sometimes is output as »"\015\012"« (two characters).
>But that would break the Unix-land convention for narrow streams, that
>(in Unix-land) they only shuffle bytes, with no interpretation.

In C++ we have 27.5.3.1.4p1 »binary« »perform input and output in binary mode (as opposed to text mode)« to shuffle bytes with no interpretation.

Or, we have »text mode«, which is what I was thinking about above. However, since on input under »DOS« »\015\012« is being converted to »\n«, and then on output »\n« is being converted to »\015\012« again, a plain copy, even in text mode, would not modify the bytes »\015\012«.

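The round trip described in that post can be tried out with a small sketch. The helper names (`slurp`, `spit`, `copy_text_mode`) and the file names in the usage note are made up for illustration and appear nowhere in the thread. On a Unix-like system text mode and binary mode behave the same, so the copy is trivially byte-exact there; the interesting case is a platform that translates line endings.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: read a whole file, in binary or text mode.
std::string slurp(const std::string& path, bool binary) {
    std::ios::openmode mode = std::ios::in;
    if (binary) mode |= std::ios::binary;
    std::ifstream in(path.c_str(), mode);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: write a whole file, in binary or text mode.
void spit(const std::string& path, const std::string& data, bool binary) {
    std::ios::openmode mode = std::ios::out | std::ios::trunc;
    if (binary) mode |= std::ios::binary;
    std::ofstream out(path.c_str(), mode);
    out << data;
}

// Copy src to dst with both streams in text mode. Where "\r\n" is read
// as "\n" and "\n" is written as "\r\n", the bytes "\r\n" survive the copy.
void copy_text_mode(const std::string& src, const std::string& dst) {
    spit(dst, slurp(src, false), false);
}
```

To check it, write "a\r\nb\r\n" to a scratch file in binary mode, call `copy_text_mode`, and read the copy back in binary mode; the bytes should compare equal. A lone "\n" in the source would not be expected to survive on a platform that writes "\n" as "\r\n".
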
Nobody <nobody@nowhere.invalid>: Jun 29 12:39AM +0100

On Tue, 28 Jun 2016 13:16:55 -0700, James Moe wrote:

> A case insensitive substring search: How hard can it be?.

Substring search is provided by the std::basic_string::find() method. Comparisons are performed using the eq() member of the traits class, so if you use a traits class where eq() is case-insensitive, you get a case-insensitive substring comparison.

But that requires using an explicit specialisation of std::basic_string rather than just using std::string or std::wstring (which use std::char_traits implicitly). A cast would almost certainly work, but (AFAIK) it's not guaranteed.

Whatever method you choose, there's the issue that it isn't always possible to perform a case-insensitive comparison character-by-character.

The classic example of where this fails is that the German "sharp s" character ("ß") doesn't have an equivalent upper-case character; the upper-case equivalent is "SS". There's no way to handle this situation using only the standard library, where case conversion is assumed to map one character to one character.

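A minimal sketch of the traits-class approach described above. The names `ci_char_traits`, `ci_string` and `contains_ci` are made up for illustration. Because implementations may route find() through the traits' compare() and find() as well as eq(), the sketch overrides all of them. It copies the strings instead of casting, and it compares one char at a time, so it has exactly the "ß" versus "SS" limitation the post mentions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical traits class: compares chars case-insensitively,
// strictly one char at a time (so "ß" vs "SS" is NOT handled).
struct ci_char_traits : std::char_traits<char> {
    static char fold(char c) {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return fold(a) == fold(b); }
    static bool lt(char a, char b) { return fold(a) < fold(b); }
    static int compare(const char* a, const char* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return nullptr;
    }
};

typedef std::basic_string<char, ci_char_traits> ci_string;

// Copies both arguments into ci_string rather than casting std::string.
bool contains_ci(const std::string& text, const std::string& sub) {
    ci_string t(text.begin(), text.end());
    ci_string s(sub.begin(), sub.end());
    return t.find(s) != ci_string::npos;
}
```

With these definitions, `contains_ci("Hello World", "WORLD")` should be true. The copies cost an allocation each, which is the price of avoiding the not-guaranteed cast.
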
James Moe <jimoeDESPAM@sohnen-moe.com>: Jun 28 10:25PM -0700

On 06/28/2016 03:48 PM, Ben Bacarisse wrote:
> You need search, not search_n.

Tried that. No luck.

bool strstricmp (string & s1, string & sub2)
{
    string::iterator pos;

    pos = search(s1.begin, s1.end(),
                 sub2.begin(), sub2.end(), icompare);
    return (pos != s1.end());
}

Complains that there is no match for the predicate.

> And if you are using modern C++ you can [...]

Alas, no. g++ 4.7.3.

--
James Moe
jmm-list at sohnen-moe dot com
Think.

Ben Bacarisse <ben.usenet@bsb.me.uk>: Jun 29 10:56AM +0100

> return (pos != s1.end());
> }
> Complains that there is no match for the predicate.

Seems odd. I'd expect more problems to come from the missing () after s1.begin. When I fix that, it works here.

You should consider making s1 and sub2 references to const.

>> And if you are using modern C++ you can [...]
> Alas, no. g++ 4.7.3.

The [...] went on to suggest using an anonymous function. I thought these were introduced in gcc 4.5.

--
Ben.

"Öö Tiib" <ootiib@hot.ee>: Jun 29 04:19AM -0700 On Wednesday, 29 June 2016 12:56:32 UTC+3, Ben Bacarisse wrote: > > Complains that there is no match for the predicate. > Seems odd. I'd expect more problems to come from the missing () after > s1.begin. When I fix that, it works here. May be that James has some user-defined things named 'search' and/or 'string'. With 'std::string' and 'std::search' his code will be rejected because of missing '()' after 's1.begin'. > > Alas, no. g++ 4.7.3. > The [...] went on to suggest using an anonymous function. I thought > these were introduced in gcc 4.5. Yes, lambdas were available since gcc 4.5. |
"Öö Tiib" <ootiib@hot.ee>: Jun 29 06:22AM -0700 On Wednesday, 29 June 2016 02:38:24 UTC+3, Nobody wrote: > The classic example of where this fails is that the German "sharp s" > character ("ß") doesn't have an equivalent upper-case character; the > upper-case equivalent is "SS". So some capitalized word in text (say "RUSSEN") may be one German word ("rußen") or other German word ("Russen") or same-looking word or name from some other language? > There's no way to handle this situation using only the standard > library, where case conversion is assumed to map one character > to one character. If the issue is like I described above then it can't be handled with whatever software and even for real human operator it is possible to construct ambiguous input that the operator can't decide. If treating "ß" and "ss" as equivalents is good enough hack around it then however it is doable. |
Ralf Goertz <me@myprovider.invalid>: Jun 29 05:16PM +0200

Am Wed, 29 Jun 2016 06:22:48 -0700 (PDT)
> So some capitalized word in text (say "RUSSEN") may be one German word
> ("rußen") or other German word ("Russen") or same-looking word or name
> from some other language?

If there is going to be an ambiguity like in your example, a capitalized ß becomes SZ, which is what it really is. AFAIK the origin of the ß is a ligature of ſ (long s) and z. Another, more common example of this ambiguity would be „Maße“ (measures). It will become „MASZE“, not „MASSE“ (mass).

James Moe <jimoeDESPAM@sohnen-moe.com>: Jun 29 10:48AM -0700

On 06/28/2016 10:25 PM, James Moe wrote:
> pos = search(s1.begin, s1.end(),

Bzzt! User error. That should be "s1.begin()", not "s1.begin".

--
James Moe
jmm-list at sohnen-moe dot com
Think.

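Putting the fixes from this sub-thread together, the function might end up looking like the sketch below: "s1.begin()" with its parentheses, and both parameters as references to const as Ben suggested. The thread never shows the definition of icompare, so the one here is a stand-in assumption that compares single chars after lower-casing.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Stand-in for the thread's icompare predicate (its real definition
// is not shown in the digest): char-by-char, case-insensitive equality.
bool icompare(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// Same name and logic as the posted function, with begin() called
// and the strings taken by reference to const.
bool strstricmp(const std::string& s1, const std::string& sub2) {
    std::string::const_iterator pos =
        std::search(s1.begin(), s1.end(),
                    sub2.begin(), sub2.end(), icompare);
    return pos != s1.end();
}
```

This compiles as C++98, so it should not depend on the lambda support discussed above. Taking the parameters by const reference also lets callers pass string literals and temporaries.
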
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 29 09:32PM +0200 On 29.06.2016 19:24, Stefan Ram wrote: >> can be rendered »SS«, which is a question of I/O. > And there even is a precedent for it: »"\n"« (one character), > which sometimes is output as »"\015\012"« (two characters). But that would break the Unix-land convention for narrow streams, that (in Unix-land) they only shuffle bytes, with no interpretation. Also keep in mind that in Unix-land the narrow streams are now usually UTF-8 encoded, with a variable number of bytes per Unicode code point. But I like the idea of the uppercase sharp S. I didn't know about it until your posting. Thanks! Cheers!, - Alf |
"J. Clarke" <j.clarke.873638@gmail.com>: Jun 28 08:13PM -0400 In article <nkttkg$81t$1@dont-email.me>, no@spam.net says... > nonprofessionals programming academic stuff like myself) are not aware > of the many new additions to the standard, already available in many > compilers. The trouble we are having with Fortran is that we have to train every new hire on it. It's not something that people know coming in the door and if they do it's one of those "modern" Fortrans and not the extended 77 which is what runs on the Z. Between that an JCL, it's often a long time before somebody is productive. As for linking with C, if C couldn't link with Fortran it would have been a non-starter for us, but this is IBM software and IBM hardware and however painful it may be, there's usually a way to get it to work together. |
Jerry Stuckle <jstucklex@attglobal.net>: Jun 28 08:45PM -0400

On 6/28/2016 8:13 PM, J. Clarke wrote:
> couldn't link with Fortran it would have been a non-starter for us, but
> this is IBM software and IBM hardware and however painful it may be,
> there's usually a way to get it to work together.

Gee, how times have changed. Fortran was the first language I learned :) But it seems it's only commonly used where the heavy math is required. Not as much used for general purpose things like C is.

--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
