Tuesday, April 17, 2018

Digest for comp.lang.c++@googlegroups.com - 19 updates in 4 topics

wyniijj@gmail.com: Apr 17 02:08AM -0700

Can is_valid2(..) replace is_valid(..)?
I'm concerned about alignment and endian issues on different CPU.
 
// data always points to character sequence of length >=8
bool is_valid(const char* data)
{
    return (!(data[0]&'\x80'))&&
           (!(data[1]&'\x80'))&&
           (!(data[2]&'\x80'))&&
           (!(data[3]&'\x80'))&&
           (!(data[4]&'\x80'))&&
           (!(data[5]&'\x80'))&&
           (!(data[6]&'\x80'))&&
           (!(data[7]&'\x80'));
};

bool is_valid2(const char* data)
{
    return *reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
};
David Brown <david.brown@hesbynett.no>: Apr 17 11:20AM +0200

> {
> return *reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
> };
 
Endian issues are not going to be a problem. C and C++ allow for a lot
of flexibility in the representation of integer types, but uint64_t (and
similar types) are far stricter.
 
But alignment /will/ be a problem on some platforms. Some cpus are
happy with a non-aligned access, others are not. Even on platforms
which are mostly happy (such as x86), some instructions (certain SIMD
operations) require strict alignment.
 
So unless you are sure that "data" is 8-byte aligned, you risk problems.
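
A minimal sketch of how one might test that condition at run time. The helper name is_aligned_for_u64 is made up for illustration, and the pointer-to-integer conversion is implementation-defined, although on mainstream platforms it yields the address. This only speaks to alignment, not to any other concern with the cast.

```cpp
#include <cstdint>

// Hypothetical helper, not from the thread: true if p is suitably
// aligned to be read as a uint64_t on this platform.
bool is_aligned_for_u64(const char* p)
{
    // Implementation-defined conversion; assumed here to give the address.
    return reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint64_t) == 0;
}
```

A caller could fall back to the byte-by-byte is_valid() whenever this returns false.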
 
 
Also, I think, you have your logic inverted somewhere. But that should
be easily solved by an extra cup of coffee for whichever one of us has
got it wrong :-)
wyniijj@gmail.com: Apr 17 02:22AM -0700

On Tuesday, April 17, 2018 at 5:08:55 PM UTC+8, wyn...@gmail.com wrote:
> {
> return *reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
> };
 
// Correction: is_valid2(..)
bool is_valid2(const char* data)
{
    return !*reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
};
 
And similarly, if data points to a 4-character sequence, can it be
interpreted to uint32_t in this very similar function?
Paavo Helde <myfirstname@osa.pri.ee>: Apr 17 01:29PM +0300

> {
> return *reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
> };
 
Alignment mismatch would be a real danger on some platforms.
 
What about this replacement which is also basically one-liner and does
not suffer from alignment issues:
 
bool is_valid(const char* data) {
    return std::find_if(data, data+8,
        [](char c) {return c&'\x80';})==data+8;
}
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Apr 17 11:29AM +0100

On Tue, 17 Apr 2018 11:20:18 +0200
 
> Also, I think, you have your logic inverted somewhere. But that should
> be easily solved by an extra cup of coffee for whichever one of us has
> got it wrong :-)
 
is_valid2() is technically undefined behaviour unless the object
pointed to by the 'data' argument began life as a uint64_t object,
otherwise dereferencing the return value of the reinterpret_cast
expression breaches the strict aliasing rules. However in practice
that doesn't matter unless is_valid2() is an inline function. If its
definition is in a different translation unit, the compiler cannot
deduce its dynamic type anyway so you are fine.
 
If you want the operation to be done inline then the bullet-proof and
standard conforming approach is to memcpy() the 8 bytes of 'data' into a
uint64_t object and bitwise-and that. The compiler will optimize out
the memcpy() and produce optimal code if 'data' was correctly aligned
and isn't mutated; if not it will at least end up correctly aligned. An
alternative is to type pun through a union and rely on gcc's and clang's
language extension which allows this.
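
A minimal sketch of the memcpy() approach described above. It assumes 8-bit bytes and that "valid" means no byte has its high bit set, as in the original is_valid(). The name is_valid_memcpy is made up for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical name. Returns true when none of the 8 bytes at data
// has its high bit set.
bool is_valid_memcpy(const char* data)
{
    std::uint64_t word;
    // Copying the bytes places no alignment requirement on data and
    // does not read the char sequence through a uint64_t lvalue.
    std::memcpy(&word, data, sizeof word);
    // Every byte is masked with 0x80, so byte order does not affect the result.
    return (word & 0x8080808080808080ULL) == 0;
}
```

The same shape should work for a 4-character sequence with std::uint32_t and the mask 0x80808080.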
 
Chris
David Brown <david.brown@hesbynett.no>: Apr 17 01:38PM +0200

On 17/04/18 12:29, Chris Vine wrote:
> pointed to by the 'data' argument began life as a uint64_t object,
> otherwise dereferencing the return value of the reinterpret_cast
> expression breaches the strict aliasing rules.
 
I don't know the details of the C++ standard well enough to know about that.
 
> uint64_t object and bitwise-and that. The compiler will optimize out
> the memcpy() and produce optimal code if 'data' was correctly aligned
> and isn't mutated; if not it will at least end up correctly aligned.
 
Agreed.
 
> An
> alternative to type pun through a union and rely on gcc's and clang's
> language extension which allows this.
 
I don't see how that could work without having either aliasing or
alignment problems. Maybe using "packed" and "may_alias" attributes
would help. But the memcpy seems simpler.
 
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Apr 17 01:04PM +0100

On Tue, 17 Apr 2018 13:38:41 +0200
 
> I don't see how that could work without having either aliasing or
> alignment problems. Maybe using "packed" and "may_alias" attributes
> would help. But the memcpy seems simpler.
 
Constructing a union would have the same effect as memcpy() in
practice. If alignment is correct then construction of the union can
be elided. Otherwise the type punning union will have to be
constructed on the stack in which case it is obliged to have the
correct alignment for all its members. It doesn't have an aliasing
problem because the gcc/clang language extension says it doesn't (see
also the sixth bullet of §3.10/10 of the C++ standard).
 
memcpy() is stupendously fast on modern hardware, being an
"intrinsic" (VS) or "built-in" (gcc/clang) which where relevant will do
a direct memory blit rather than have effect as a function call. And
because it is a built-in it can (and will) be trivially elided if not
necessary, as in the case of is_valid2().
 
Given that memcpy() is standard conforming and a union relies on an
extension I would go for the former. I would wager that when tested it
will turn out considerably faster than is_valid() and at least as fast
as is_valid2() with a reinterpret_cast. Measurement by the OP is easy
here and will reveal all.
 
Chris
"Öö Tiib" <ootiib@hot.ee>: Apr 17 05:16AM -0700

On Tuesday, 17 April 2018 12:20:29 UTC+3, David Brown wrote:
 
> Also, I think, you have your logic inverted somewhere. But that should
> be easily solved by an extra cup of coffee for whichever one of us has
> got it wrong :-)
 
To me it seemed that (either coffee or) the usual comparison trick to
check that all bits were set was missing:

return (*reinterpret_cast<const uint64_t*>(data)&0x8080808080808080ULL)
== 0x8080808080808080LLU;
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 17 02:35PM +0200

> {
> return *reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
> };
 
If a byte is 8 bits and the data is properly aligned so that you avoid
UB for that, the two functions still compute two different things.
 
Consider an input of all zeroes. `is_valid` then returns true while
`is_valid2` returns false.
 
Cheers & hth.,
 
- Alf
Barry Schwarz <schwarzb@dqel.com>: Apr 17 09:35AM -0700

>};
 
>And similarly, if data points to a 4-character sequence, can it be
>interpreted to uint32_t in this very similar function?
 
Since ! has higher precedence than &, isn't the expression evaluated
as
!(*reinterpret_cast<const uint64_t*>(data)) & 0x8080808080808080L
and since ! evaluates to 0 or 1 won't this always return 0?
 
After resolving the alignment issue, you would need
!(*reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L)
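
To make the difference concrete, here is a small sketch that works on an already-loaded 64-bit value, so alignment is left out of the picture. The names high_bits, without_parens and with_parens are made up for illustration. The first function spells out the grouping the compiler applies to the posted correction.

```cpp
#include <cstdint>

constexpr std::uint64_t high_bits = 0x8080808080808080ULL;

// Grouping applied to the posted correction: (!word) & mask.
// !word is 0 or 1 and bit 0 of the mask is clear, so the result is always 0.
constexpr bool without_parens(std::uint64_t word)
{
    return (!word) & high_bits;
}

// Intended test: true when no byte of word has its high bit set.
constexpr bool with_parens(std::uint64_t word)
{
    return !(word & high_bits);
}
```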
 
You did not tell us what valid means in this context. It appears you
are looking for "normal" characters but you specifically raised the
issue of different CPUs. If so, be aware there are systems that don't
use ASCII (such as IBM mainframes that use EBCDIC) and on such systems
normal text like ABCD1234 would fail either test.
 
--
Remove del for email
Ralf Goertz <me@myprovider.invalid>: Apr 17 11:01AM +0200

Hi,
 
is it possible to catch multiple matches with the repetition operators
"*", "+" and "{,}"?
 
#include <iostream>
#include <regex>
#include <string>
 
using namespace std;
 
int main() {
    string s("foobar");
    regex r("([fb][ao][or]){2}");
    smatch sm;
    if (regex_search(s,sm,r)) {
        for (auto i:sm) cout<<i<<endl;
    }
}
 
That program only gives
 
foobar
bar
 
I would like to also catch the "foo" alone. Of course I could rewrite
the regex but in my real world problem I don't know how many iterations
there will be and I want to catch them all. Is there a way to do that?
Ben Bacarisse <ben.usenet@bsb.me.uk>: Apr 17 11:43AM +0100


> is it possible to catch multiple matches with the repetition operators
> "*", "+" and "{,}"?
 
No. These operators describe a single pattern to be searched for.
 
When they apply to a pattern containing ()s the std::smatch just tells
you something about how the pattern was matched -- what was last matched
by that sub-expression.
 
There is no provision to store the arbitrary number of matches that
might result from one single sub-expression.
 
 
> I would like to also catch the "foo" alone. Of course I could rewrite
> the regex but in my real world problem I don't know how many iterations
> there will be and I want to catch them all. Is there a way to do that?
 
You will need to write a regexp that matches the smallest part you want and
match it repeatedly. Even then you might not get exactly what you want
because that is never entirely clear from a single example.
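
As a rough sketch of matching the smallest part repeatedly, std::sregex_iterator can walk over every non-overlapping match of a regex that describes one unit. The helper name all_matches is made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every non-overlapping match of unit in text,
// in order of appearance.
std::vector<std::string> all_matches(const std::string& text, const std::regex& unit)
{
    std::vector<std::string> out;
    // A default-constructed sregex_iterator serves as the end marker.
    for (std::sregex_iterator it(text.begin(), text.end(), unit), end; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

With the example from the question, all_matches("foobar", std::regex("[fb][ao][or]")) yields "foo" followed by "bar". Unlike the {2} form, this does not require the units to be adjacent, so that would need a separate check if it matters.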
 
--
Ben.
Ralf Goertz <me@myprovider.invalid>: Apr 17 02:41PM +0200

On Tue, 17 Apr 2018 11:43:36 +0100
> matched by that sub-expression.
 
> There is no provision to store the arbitrary number of matches that
> might result from one single sub-expression.
 
Well, that's a pity.
 
 
> You will need to write a regexp that matches smallest part you want
> and match it repeatedly. Even then you might not get exactly what
> you want because that is never entirely clear from a single example.
 
One of my real-world examples (there are many with differing complexity)
is the following:
 
some text 4.7 ( 2.3 ) 5.8 (6.2) 4.3 23.4 (2.9)
 
 
I need the numbers after "some text". There can be any number of numbers
but they come in pairs with the second parenthesized. However, that second
number is optional. So my regex looks something like
 
(([0-9.]+) +(\( *([0-9.]+) *\))?)+$
 
(of course there is potential for improvement since I don't want the
number to start or end with a "." and there should only be one "." in
each number)
 
Of course here I could use a regex without the trailing "+" and match
repeatedly. And in all my other use cases I could probably use other
tricks. But having many different scenarios where it would be beneficial
to be able to match repetitive patterns I wonder why it isn't possible.
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 17 04:05PM +0200

On 17.04.2018 14:41, Ralf Goertz wrote:
> repeatedly. And in all my other use cases I could probably use other
> tricks. But having many different scenarios where it would be beneficial
> to be able to match repetitive patterns I wonder why it isn't possible.
 
Maybe invent a higher level pattern matching language?
 
Cheers!,
 
- Alf
Ben Bacarisse <ben.usenet@bsb.me.uk>: Apr 17 03:55PM +0100


>> There is no provision to store the arbitrary number of matches that
>> might result from one single sub-expression.
 
> Well, that's a pity.
 
Yes, it can be useful but it's not widely supported. I imagine that's
because it's often clearer to do repeated matching and it would complicate
getting the results from otherwise simple patterns (though I suppose it
could be an option).
 
You can do it in Python with named captures using the extended
(3rd-party) regex module.
 
<snip>
 
> (([0-9.]+) +(\( *([0-9.]+) *\))?)+$
 
> Of course here I could use a regex without the trailing "+" and match
> repeatedly.
 
Yes, that's probably what you'll have to do, though in this case you
could just use sscanf or >> (again, in a loop).
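
For the number-pair example, a sketch of the repeated-match loop might look like the following. The helper name extract_pairs is made up for illustration, and the pattern is a variant of the one quoted above: the trailing "+" and the "$" anchor are dropped, the space before the parenthesis is made optional, and the outer group is non-capturing. It keeps the loose [0-9.]+ number syntax from the original, so a stray "." in the leading text would also be picked up.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects (number, parenthesized number) pairs.
// The second element is empty when the optional part is absent.
std::vector<std::pair<std::string, std::string>> extract_pairs(const std::string& line)
{
    // One pair per match: group 1 is the number, group 2 the optional
    // parenthesized number.
    static const std::regex pair_re(R"re(([0-9.]+) *(?:\( *([0-9.]+) *\))?)re");

    std::vector<std::pair<std::string, std::string>> pairs;
    for (std::sregex_iterator it(line.begin(), line.end(), pair_re), end; it != end; ++it)
        pairs.emplace_back(it->str(1), it->str(2));  // str(2) is "" if group 2 did not match
    return pairs;
}
```

On the sample line this gives four pairs, the third of which ("4.3") has an empty second element.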
 
> And in all my other use cases I could probably use other
> tricks.
 
You might be able to generalise the loop into a function so that all
the cases are done in essentially the same way but that's impossible to
tell from here.
 
--
Ben.
gazelle@shell.xmission.com (Kenny McCormack): Apr 16 09:38PM

In article <fa58d0b4-1ae4-41d2-be08-f4b1bb5bc7d5@googlegroups.com>,
>Does anybody have any tools they use that are similar in ability
>to compile and debug C++ projects, but do so faster than Visual
>Studio?
 
How about RDC?
 
Oh, I just remembered. RDC is (and always will be) vaporware.
 
You want something that actually exists, right?
 
--
"Unattended children will be given an espresso and a free kitten."
gazelle@shell.xmission.com (Kenny McCormack): Apr 16 09:51PM

In article <8b8ba1fe-2a0c-4ce8-96b1-02bcdea6e0a0@googlegroups.com>,
Rick C. Hodgin <rick.c.hodgin@gmail.com> wrote:
...
>I keep looking, but I'm thinking CAlive is still my best bet,
>assuming I can ever get it started. :-)
 
FIFY
--
Those on the right constantly remind us that America is not a
democracy; now they claim that Obama is a threat to democracy.
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Apr 16 06:08PM -0400

On 4/16/2018 5:38 PM, Kenny McCormack wrote:
>> to compile and debug C++ projects, but do so faster than Visual
>> Studio?
 
> How about RDC?
 
Wouldn't that be great? :-)
 
> Oh, I just remembered. RDC is (and always will be) vaporware.
 
My life isn't over yet, Kenny. Your statement is speculation at best.
 
> You want something that actually exists, right?
 
Yes. I haven't found anything. I'm still stuck with VS 2015.
 
--
Rick C. Hodgin
"Chris M. Thomasson" <invalid_chris_thomasson@invalid.invalid>: Apr 16 02:19PM -0700

On 4/15/2018 4:46 PM, Rick C. Hodgin wrote:
 
> Still a different website, one that's obviously visible.
> And, it still conveys the message, "Jesus loves you, will
> forgive your sin."
 
Fair enough. Just be aware of anything like that. Think if somebody
registered a real website with a close enough url on purpose, well, then
it just might not be so obvious. Now, think about it... For somebody to
actually do that, well, imvvho, that would mean a real, sort of demonic
entity is actively chasing you. Humm...
 
If you have your own server, just do an HMAC and keep the secret key
encrypted on your own server with another key that only you can
remember. The stored encrypted key that will ultimately be used to sign
public plaintexts can be very large, but the key that you use to decrypt
it can be smaller. Do not store this smaller key on any device. Keep it
in your brain.
 
Create a plaintext, and send it to your server. When your own server
sends you a first pass confirmation, you send in your personal key to
decrypt the private key that you use to HMAC the public plaintexts with.
Then the server sends you the plaintext and the HMAC derived from the
decrypted key in a single message. If you manage to buy up some close
possible fake urls, and keep them for yourself, well, this should be
fairly secure.
 
Well, humm... Think about people directing users into false websites by
hijacking their routers?
 
Humm... Like you said, it might be unlikely that somebody would go through
all of the trouble to break your setup. They would have to forge the url
into a site that looks the same.