Wednesday, January 12, 2022

Digest for comp.lang.c++@googlegroups.com - 12 updates in 4 topics

Muttley@dastardlyhq.com: Jan 12 04:34PM

I'm curious as to why returning a reference to a local inside an inline
function isn't (apparently) allowed. eg:
 
#include <iostream>
#include <string>
 
using namespace std;
 
inline string &func()
{
    string s = "hello";
    return s;
}

int main()
{
    cout << func() << endl;
    return 0;
}
 
This causes a compilation warning with clang and when run prints out garbage
as you'd expect if it were a non inline. However surely if the function is
truly inline it should work. Are locals for inlines temporaries created on the
heap instead of being stored on the same stack as the calling function locals?
Bonita Montero <Bonita.Montero@gmail.com>: Jan 12 05:38PM +0100

> as you'd expect if it were a non inline. However surely if the function is
> truly inline it should work. Are locals for inlines temporaries created on the
> heap instead of being stored on the same stack as the calling function locals?
 
Yes, this is UB: the local object is destroyed and its storage is
released when the function returns, so the returned reference dangles.
Make the object thread_local and you can return a reference to it, and
the function can be called from any thread. Make it static and lock
every operation on it if you want a single shared object.
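
A minimal sketch of the two options mentioned above, assuming C++11 or later. The names func_per_thread, func_shared, shared_string_mutex, append_shared and check_references are made up for illustration and do not come from the thread.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// One instance per thread. It lives until the thread exits, so the
// returned reference stays valid after the call returns.
inline std::string &func_per_thread()
{
    thread_local std::string s = "hello";
    return s;
}

// Mutex guarding the shared string below (hypothetical helper).
inline std::mutex &shared_string_mutex()
{
    static std::mutex m;
    return m;
}

// One instance shared by all threads. Its initialization is thread-safe,
// but later reads and writes are not, so callers must lock around them.
inline std::string &func_shared()
{
    static std::string s = "hello";
    return s;
}

// Example of a locked update of the shared object.
inline void append_shared(const std::string &suffix)
{
    std::lock_guard<std::mutex> lock(shared_string_mutex());
    func_shared() += suffix;
}

// Small self-check: repeated calls on one thread yield the same object.
inline void check_references()
{
    std::string &a = func_per_thread();
    std::string &b = func_per_thread();
    assert(&a == &b);
    assert(a == "hello");
}
```

Neither variant returns a fresh string per call: every caller on the same thread (thread_local) or in the whole program (static) sees the same object, which may or may not be what is wanted.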
James Kuyper <jameskuyper@alumni.caltech.edu>: Jan 12 12:47PM -0500

> as you'd expect if it were a non inline. However surely if the function is
> truly inline it should work. Are locals for inlines temporaries created on the
> heap instead of being stored on the same stack as the calling function locals?
 
Calls to functions declared "inline" have the same semantics as they
would have if not so declared, they can simply be optimized by replacing
the function call with local code, but only if that local code has the
SAME semantics. The following code is equivalent to what the compiler is
allowed to do with inline functions. To make this work, I need to
explicitly declare a variable that corresponds to the return value from
the function, but I'm going to have to change it to a pointer rather
than a reference, because this re-write wouldn't work with a reference.
For a great many purposes, references are semantically equivalent to
const pointers, the differences are mainly syntactic. However, for
reasons that will hopefully be clear when you think about it, I can't
use a const pointer OR a real reference in the following re-write:
 
int main()
{
    string *tmp;

    {
        string s = "hello";
        tmp = &s;
    }

    cout << *tmp << endl;
    return 0;
}
 
The object `s` in func() and the object `s` in my re-write both have
lifetimes that end when the `}` that terminates the enclosing block is
reached. Dereferencing tmp at any time after the lifetime of the object
it refers to has ended has undefined behavior, for the same reason that
using the reference returned by func() has undefined behavior. So both
pieces of code are bad for the same reason.
 
I suspect that you thought that the inlined function code would not be
treated as if it were enclosed in a separate block, but that would
change the semantics. Objects declared local to the inlined function are
NOT treated as if they were declared in the block that contains the call
to that function - if they were, that would make them much harder to use.
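
For comparison, a minimal sketch of the usual fix, which is to return the string by value instead of by reference. This is not code from the thread, and greeting_length is a hypothetical helper added only to show lifetime extension.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the string by value: the caller receives its own object, so
// nothing refers to the local `s` after it has been destroyed.
inline std::string func()
{
    std::string s = "hello";
    return s;   // typically moved or elided rather than copied
}

// Binding the returned temporary to a const reference extends the
// temporary's lifetime to that of the reference, so `r` is safe to use.
inline std::size_t greeting_length()
{
    const std::string &r = func();
    assert(!r.empty());
    return r.size();
}
```

With this version, `cout << func() << endl;` in main() is well defined whether or not the compiler inlines the call.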
"Öö Tiib" <ootiib@hot.ee>: Jan 11 09:54PM -0800

On Tuesday, 11 January 2022 at 19:29:19 UTC+2, Bonita Montero wrote:
> any kind of mis-formatted UTF-8-string. I get the number of chars
> preceding a header-char from the table and check if there are an
> according number of 0x80-headered chars.
 
Yes, your code indeed detects some possible errors in encoding,
but it does not detect overlong encodings and sequences that
decode to an invalid code point. So both versions are safe only with
guaranteed clean data, and with such data the outcome is the same.
Bonita Montero <Bonita.Montero@gmail.com>: Jan 12 02:19PM +0100

The ultimately simple solution:
 
size_t utf8Strlen( char const *str )
{
    static size_t const sizes[8] = { 1, 0, 2, 3, 4, 0, 0, 0 };
    size_t len = 0;
    for( unsigned char c; (c = *str); )
    {
        size_t size = sizes[countl_zero<unsigned char>( ~c )];
        if( !sizes ) [[unlikely]]
            return -1;
        ++len;
        for( char const *cpEnd = str + size; ++str != cpEnd; )
            if( ((unsigned char)*str & 0x0C0) != 0x080 ) [[unlikely]]
                return -1;
    }
    return len;
}
Bonita Montero <Bonita.Montero@gmail.com>: Jan 12 02:20PM +0100

sizes instead of size; corrected:
 
size_t utf8Strlen( char const *str )
{
    // 9 entries: countl_zero<unsigned char> yields 0..8 (8 when c == 0xFF)
    static size_t const sizes[9] = { 1, 0, 2, 3, 4, 0, 0, 0, 0 };
    size_t len = 0;
    for( unsigned char c; (c = *str); )
    {
        size_t size = sizes[countl_zero<unsigned char>( ~c )];
        if( !size ) [[unlikely]]
            return -1;
        ++len;
        for( char const *cpEnd = str + size; ++str != cpEnd; )
            if( ((unsigned char)*str & 0x0C0) != 0x080 ) [[unlikely]]
                return -1;
    }
    return len;
}
Ben Bacarisse <ben.usenet@bsb.me.uk>: Jan 12 02:14PM

> }
> return len;
> }
 
You reject simpler solutions because they don't detect the one error you
have decided to look for. There are other encoding errors that this
code won't catch. Seems a bit arbitrary to me. I'd want a fast
utf8-strlen for external strings that have been validated and for
strings internally generated by valid code, or a slower one that caught
all invalid encodings.
 
--
Ben.
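
A minimal sketch of the "fast" variant described above, assuming the input is NUL-terminated and already known to be valid UTF-8. The name utf8StrlenTrusted is hypothetical; the technique is the common one of counting every byte that is not a continuation byte.

```cpp
#include <cassert>
#include <cstddef>

// Counts code points in a string that is ALREADY known to be valid UTF-8.
// Every code point has exactly one byte that is not of the form 10xxxxxx,
// so counting those bytes gives the length. No validation is performed.
inline std::size_t utf8StrlenTrusted(const char *str)
{
    assert(str != nullptr);
    std::size_t len = 0;
    for (; *str; ++str)
        if ((static_cast<unsigned char>(*str) & 0xC0) != 0x80)
            ++len;
    return len;
}
```

On malformed input this still returns a number, just not a meaningful one, which is why it only suits data validated elsewhere.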
Bonita Montero <Bonita.Montero@gmail.com>: Jan 12 05:20PM +0100

On 12.01.2022 at 15:14, Ben Bacarisse wrote:
>> }
 
> You reject simpler solutions because they don't detect the one error you
> have decided to look for. ...
 
I think that's ok
 
> There are other encoding errors that this code won't catch.
 
Not at the UTF-8 level.
"james...@alumni.caltech.edu" <jameskuyper@alumni.caltech.edu>: Jan 12 08:38AM -0800

On Wednesday, January 12, 2022 at 11:20:55 AM UTC-5, Bonita Montero wrote:
> On 12.01.2022 at 15:14, Ben Bacarisse wrote:
...
> > There are other encoding errors that this code won't catch.
> Not at the UTF-8 level.
 
"... it does not detect overlong encodings and sequences that decode to an
invalid code point." To what level do you assign those errors, if not at the
UTF-8 level? It's the specification of UTF-8 itself which identifies those as
errors.
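
To make the two error classes concrete, here is a minimal sketch of a stricter length function that also rejects overlong encodings, surrogates (U+D800..U+DFFF) and values above U+10FFFF. It is not code from the thread: utf8StrlenStrict is a hypothetical name, and it assumes NUL-terminated input and C++17 for std::optional.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Returns the number of code points, or std::nullopt if the input is not
// well-formed UTF-8.
inline std::optional<std::size_t> utf8StrlenStrict(const char *str)
{
    assert(str != nullptr);
    const unsigned char *p = reinterpret_cast<const unsigned char *>(str);
    std::size_t len = 0;
    while (*p)
    {
        unsigned char lead = *p++;
        std::uint32_t cp;    // decoded code point
        std::uint32_t min;   // smallest code point allowed for this length
        int extra;           // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; min = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; min = 0x10000; }
        else return std::nullopt;        // stray continuation byte or 0xF8..0xFF

        for (; extra > 0; --extra)
        {
            unsigned char c = *p;
            if ((c & 0xC0) != 0x80)      // also catches a NUL inside a sequence
                return std::nullopt;
            ++p;
            cp = (cp << 6) | (c & 0x3Fu);
        }

        if (cp < min)                     return std::nullopt;  // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;  // surrogate
        if (cp > 0x10FFFF)                return std::nullopt;  // beyond Unicode range
        ++len;
    }
    return len;
}
```

For example, the two-byte sequence C0 AF (an overlong encoding of '/') passes a check that only looks at header and continuation bits, but is rejected here.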
Ben Bacarisse <ben.usenet@bsb.me.uk>: Jan 12 04:55PM

>> You reject simpler solutions because they don't detect the one error you
>> have decided to look for. ...
 
> I think that's ok
 
Obviously.
 
>> There are other encoding errors that this code won't catch.
 
> Not at the UTF-8 level.
 
Not so.
 
--
Ben.