soft and program: Digest for comp.lang.c++@googlegroups.com

comp.lang.c++@googlegroups.com


"CppCast interview about the Oulu ISO C++ meeting" by Herb Sutter - 1 Update
A case insensitive substring search? - 3 Updates
A case insensitive substring search? - 8 Updates
list of operating systems that can run C++? - 2 Updates

"CppCast interview about the Oulu ISO C++ meeting" by Herb Sutter

Lynn McGuire <lmc@winsim.com>: Jun 29 04:54PM -0500

"CppCast interview about the Oulu ISO C++ meeting" by Herb Sutter
https://herbsutter.com/2016/06/27/cppcast-interview-about-the-oulu-iso-c-meeting/

"On Saturday afternoon, at the ISO C++ meeting in Oulu, Finland, we completed the feature set of C++17 and approved sending out the
feature-complete document for its primary international comment ballot (aka "CD" or Committee Draft ballot)."

Lynn

A case insensitive substring search?

ram@zedat.fu-berlin.de (Stefan Ram): Jun 29 04:33AM

> There's no way to handle this situation
>using only the standard library, where case conversion is assumed to map
>one character to one character.

No way?

It was always possible to use a custom character encoding, where
»SS« is represented by a single code point (character); it still
can be rendered »SS«, which is a question of I/O. Especially,
Unicode already has »U+1E9E LATIN CAPITAL LETTER SHARP S«.

ram@zedat.fu-berlin.de (Stefan Ram): Jun 29 05:24PM

>It was always possible to use a custom character encoding, where
>»SS« is represented by a single code point (character); it still
>can be rendered »SS«, which is a question of I/O.

And there even is a precedent for it: »"\n"« (one character),
which sometimes is output as »"\015\012"« (two characters).

ram@zedat.fu-berlin.de (Stefan Ram): Jun 29 08:11PM

>>which sometimes is output as »"\015\012"« (two characters).
>But that would break the Unix-land convention for narrow streams, that
>(in Unix-land) they only shuffle bytes, with no interpretation.

In C++ we have 27.5.3.1.4p1 »binary« »perform input and
output in binary mode (as opposed to text mode)« to
shuffle bytes with no interpretation.

Or, we have »text mode«, which is what I was thinking
about above.

However, since on input under »DOS« »\015\012« is
being converted to »\n«, and then on output »\n« is
being converted to »\015\012« again, a plain copy,
even in text mode, would not modify the bytes »\015\012«.
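Stefan's distinction between »binary« and text mode can be tried with a small sketch. The helper name copy_file_stream is hypothetical. It opens both streams with or without std::ios_base::binary and copies the contents through rdbuf(). On Unix-like systems both modes should give byte-identical copies. On DOS/Windows, text mode reads "\015\012" as '\n' and writes '\n' back as "\015\012", so a plain copy of such a file should generally come out unchanged as well.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: copies the file `from` to `to`.
// binary == true : both streams use std::ios_base::binary, so bytes are
//                  shuffled with no interpretation on any platform.
// binary == false: text mode; on DOS/Windows "\015\012" is read as '\n'
//                  and '\n' is written back as "\015\012".
bool copy_file_stream(const std::string& from, const std::string& to,
                      bool binary)
{
    const std::ios_base::openmode extra =
        binary ? std::ios_base::binary : std::ios_base::openmode();
    std::ifstream in(from.c_str(), std::ios_base::in | extra);
    std::ofstream out(to.c_str(),
                      std::ios_base::out | std::ios_base::trunc | extra);
    if (!in || !out)
        return false;
    // operator<< with an empty streambuf sets failbit, so treat an empty
    // input file as a successful (empty) copy.
    if (in.peek() == std::ifstream::traits_type::eof())
        return true;
    out << in.rdbuf();  // copy everything that remains in the input
    out.flush();
    return static_cast<bool>(out);
}
```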

A case insensitive substring search?

Nobody <nobody@nowhere.invalid>: Jun 29 12:39AM +0100

On Tue, 28 Jun 2016 13:16:55 -0700, James Moe wrote:

> A case insensitive substring search: How hard can it be?

Substring search is provided by the std::basic_string::find() method.
Comparisons are performed using the eq() member of the traits class,
so if you use a traits class where eq() is case-insensitive, you get a
case-insensitive substring comparison.

But that requires using an explicit specialisation of std::basic_string
rather than just using std::string or std::wstring (which use
std::char_traits implicitly). A cast would almost certainly work, but
(AFAIK) it's not guaranteed.
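The traits-class approach is often written as a small "ci_char_traits" struct. The names below and the ASCII-only folding through std::toupper are assumptions for illustration, not code from the thread. The struct derives from std::char_traits<char> and overrides eq() together with the members built on it (lt(), compare(), find()), so every comparison the string makes ignores case. To get a ci_string from a std::string, construct a new one from s.data() and s.size() instead of relying on a cast.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical case-insensitive traits: only the comparison-related
// members are replaced; length/copy/move/assign are inherited.
struct ci_char_traits : std::char_traits<char>
{
    // Fold one character to upper case (per the current C locale).
    static char fold(char c)
    {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return fold(a) == fold(b); }
    static bool lt(char a, char b) { return fold(a) < fold(b); }
    static int compare(const char* s1, const char* s2, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char c)
    {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return 0;  // null pointer: not found
    }
};

typedef std::basic_string<char, ci_char_traits> ci_string;
```

With this, ci_string("Hello World").find("WORLD") searches case-insensitively through the ordinary basic_string::find().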

Whatever method you choose, there's the issue that it isn't always
possible to perform a case-insensitive comparison character-by-character.

The classic example of where this fails is that the German "sharp s"
character ("ß") doesn't have an equivalent upper-case character; the
upper-case equivalent is "SS". There's no way to handle this situation
using only the standard library, where case conversion is assumed to map
one character to one character.

James Moe <jimoeDESPAM@sohnen-moe.com>: Jun 28 10:25PM -0700

On 06/28/2016 03:48 PM, Ben Bacarisse wrote:
> You need search, not search_n.

Tried that. No luck.
bool strstricmp (string & s1, string & sub2)
{
    string::iterator pos;

    pos = search(s1.begin, s1.end(),
                 sub2.begin(), sub2.end(),
                 icompare);
    return (pos != s1.end());
}
Complains that there is no match for the predicate.

> And if you are using modern C++ you can [...]

Alas, no. g++ 4.7.3.

--
James Moe
jmm-list at sohnen-moe dot com
Think.

Ben Bacarisse <ben.usenet@bsb.me.uk>: Jun 29 10:56AM +0100

> return (pos != s1.end());
> }
> Complains that there is no match for the predicate.

Seems odd. I'd expect more problems to come from the missing () after
s1.begin. When I fix that, it works here.

You should consider making s1 and sub2 references to const.

>> And if you are using modern C++ you can [...]

> Alas, no. g++ 4.7.3.

The [...] went on to suggest using an anonymous function. I thought
these were introduced in gcc 4.5.

--
Ben.

"Öö Tiib" <ootiib@hot.ee>: Jun 29 04:19AM -0700

On Wednesday, 29 June 2016 12:56:32 UTC+3, Ben Bacarisse wrote:
> > Complains that there is no match for the predicate.

> Seems odd. I'd expect more problems to come from the missing () after
> s1.begin. When I fix that, it works here.

Maybe James has some user-defined things named 'search'
and/or 'string'. With 'std::string' and 'std::search' his code will be
rejected because of the missing '()' after 's1.begin'.

> > Alas, no. g++ 4.7.3.

> The [...] went on to suggest using an anonymous function. I thought
> these were introduced in gcc 4.5.

Yes, lambdas have been available since gcc 4.5.
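With lambdas, the comparison predicate can be written inline at the call to std::search. This is a sketch, not code from the thread: contains_icase is a hypothetical name, and folding is per character via std::tolower in the current C locale, so it has exactly the one-to-one limitation discussed in this thread. With g++ 4.7 it needs -std=c++11 (or -std=c++0x).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true if `sub` occurs in `s`, ignoring case.
bool contains_icase(const std::string& s, const std::string& sub)
{
    if (sub.empty())
        return true;  // the empty string occurs everywhere
    std::string::const_iterator pos = std::search(
        s.begin(), s.end(), sub.begin(), sub.end(),
        [](char a, char b) {
            // Cast to unsigned char: a negative char passed to tolower is UB.
            return std::tolower(static_cast<unsigned char>(a)) ==
                   std::tolower(static_cast<unsigned char>(b));
        });
    return pos != s.end();
}
```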

"Öö Tiib" <ootiib@hot.ee>: Jun 29 06:22AM -0700

On Wednesday, 29 June 2016 02:38:24 UTC+3, Nobody wrote:

> The classic example of where this fails is that the German "sharp s"
> character ("ß") doesn't have an equivalent upper-case character; the
> upper-case equivalent is "SS".

So some capitalized word in text (say "RUSSEN") may be one German
word ("rußen"), another German word ("Russen"), or a same-looking
word or name from some other language?

> There's no way to handle this situation using only the standard
> library, where case conversion is assumed to map one character
> to one character.

If the issue is as I described above, then it can't be handled by
any software at all; even for a real human operator it is possible
to construct ambiguous input that the operator can't resolve. If
treating "ß" and "ss" as equivalent is a good-enough hack around
it, however, then it is doable.
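The hack of treating "ß" and "ss" as equivalent can be sketched by case-folding both strings before searching, letting one character expand into two. This is deliberately minimal: fold_case and contains_folded are hypothetical names, only ASCII letters plus U+00DF (ß) and U+1E9E (ẞ) are handled, and std::u32string is used so that each element is one code point. It conflates "rußen" with "Russen", which is exactly the ambiguity described above.

```cpp
#include <cstddef>
#include <string>

// Folds a UTF-32 string for comparison purposes only. ASCII letters are
// lowered; U+00DF (ß) and U+1E9E (ẞ) both become "ss", so the result can
// be longer than the input. All other code points pass through unchanged.
std::u32string fold_case(const std::u32string& in)
{
    std::u32string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t c = in[i];
        if (c >= U'A' && c <= U'Z')
            out += static_cast<char32_t>(c - U'A' + U'a');
        else if (c == U'\u00DF' || c == U'\u1E9E')
            out += U"ss";  // one character maps to two
        else
            out += c;
    }
    return out;
}

// Case-insensitive substring search on the folded forms.
bool contains_folded(const std::u32string& hay, const std::u32string& needle)
{
    return fold_case(hay).find(fold_case(needle)) != std::u32string::npos;
}
```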

Ralf Goertz <me@myprovider.invalid>: Jun 29 05:16PM +0200

On Wed, 29 Jun 2016 06:22:48 -0700 (PDT), Öö Tiib wrote:

> So some capitalized word in text (say "RUSSEN") may be one German word
> ("rußen") or other German word ("Russen") or same-looking word or name
> from some other language?

If there would be an ambiguity like the one in your example, a capitalized
ß becomes SZ, which is what it really is. AFAIK the ß originated as a
ligature of ſ (long s) and z. Another, more common example of this
ambiguity is „Maße" (measures): it becomes „MASZE", not „MASSE"
(mass).

James Moe <jimoeDESPAM@sohnen-moe.com>: Jun 29 10:48AM -0700

On 06/28/2016 10:25 PM, James Moe wrote:
> pos = search(s1.begin, s1.end(),

Bzzt! User error.
That should be "s1.begin()", not "s1.begin".

--
James Moe
jmm-list at sohnen-moe dot com
Think.
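For reference, here is the whole function with the fix applied. The body of icompare never appears in the thread, so the version below is a hypothetical one (per-character std::tolower), written in C++98 so it builds with g++ 4.7.3's default mode. The parameters are references to const, as Ben suggested.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical predicate: std::search calls it with one character from
// each range and treats a true result as a match.
bool icompare(char a, char b)
{
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// Corrected: s1.begin() instead of s1.begin, const references,
// and an empty sub2 counts as found.
bool strstricmp(const std::string& s1, const std::string& sub2)
{
    std::string::const_iterator pos =
        std::search(s1.begin(), s1.end(), sub2.begin(), sub2.end(), icompare);
    return pos != s1.end() || sub2.empty();
}
```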

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Jun 29 09:32PM +0200

On 29.06.2016 19:24, Stefan Ram wrote:
>> can be rendered »SS«, which is a question of I/O.

> And there even is a precedent for it: »"\n"« (one character),
> which sometimes is output as »"\015\012"« (two characters).

But that would break the Unix-land convention for narrow streams, that
(in Unix-land) they only shuffle bytes, with no interpretation.

Also keep in mind that in Unix-land the narrow streams are now usually
UTF-8 encoded, with a variable number of bytes per Unicode code point.
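The variable length of UTF-8 shows up directly with the two sharp-s characters from this thread: ß (U+00DF) takes two bytes and ẞ (U+1E9E) takes three, so even a one-to-one case mapping can change a string's byte length. The helper below (utf8_code_points, a hypothetical name) counts code points by skipping continuation bytes, which have the bit pattern 10xxxxxx. It assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 encoded std::string: every byte
// that is not a continuation byte (10xxxxxx) starts a new code point.
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
            ++count;
    return count;
}
```

For example, the UTF-8 bytes "\xC3\x9F" (ß) have size() 2 but one code point, and "\xE1\xBA\x9E" (ẞ) have size() 3 but one code point.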

But I like the idea of the uppercase sharp S. I didn't know about it
until your posting. Thanks!

Cheers!,

- Alf

list of operating systems that can run C++?

"J. Clarke" <j.clarke.873638@gmail.com>: Jun 28 08:13PM -0400

In article <nkttkg$81t$1@dont-email.me>, no@spam.net says...
> nonprofessionals programming academic stuff like myself) are not aware
> of the many new additions to the standard, already available in many
> compilers.

The trouble we are having with Fortran is that we have to train every
new hire on it. It's not something that people know coming in the door
and if they do, it's one of those "modern" Fortrans and not the extended
Fortran 77 that runs on the Z. Between that and JCL, it's often a long
time before somebody is productive. As for linking with C, if C
couldn't link with Fortran it would have been a non-starter for us, but
this is IBM software and IBM hardware and however painful it may be,
there's usually a way to get it to work together.

Jerry Stuckle <jstucklex@attglobal.net>: Jun 28 08:45PM -0400

On 6/28/2016 8:13 PM, J. Clarke wrote:
> couldn't link with Fortran it would have been a non-starter for us, but
> this is IBM software and IBM hardware and however painful it may be,
> there's usually a way to get it to work together.

Gee, how times have changed. Fortran was the first language I learned
:) But it seems it's now commonly used only where heavy math is
required, and not as much for general-purpose work the way C is.

--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================



Wednesday, June 29, 2016

Digest for comp.lang.c++@googlegroups.com - 14 updates in 4 topics
