- alignment and endian issues - 10 Updates
- regex_(search)|(match) with repetition operators - 5 Updates
- Alternatives to Visual Studio 2015 or later - 3 Updates
- [Jesus Loves You] Confirmation system - 1 Update
wyniijj@gmail.com: Apr 17 02:08AM -0700
Can is_valid2(..) replace is_valid(..)? I'm concerned about alignment and endian issues on different CPUs.

    // data always points to a character sequence of length >= 8
    bool is_valid(const char* data)
    {
      return (!(data[0]&'\x80'))&&
             (!(data[1]&'\x80'))&&
             (!(data[2]&'\x80'))&&
             (!(data[3]&'\x80'))&&
             (!(data[4]&'\x80'))&&
             (!(data[5]&'\x80'))&&
             (!(data[6]&'\x80'))&&
             (!(data[7]&'\x80'));
    };

    bool is_valid2(const char* data)
    {
      return *reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
    };
David Brown <david.brown@hesbynett.no>: Apr 17 11:20AM +0200
> {
> return *reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
> };
Endian issues are not going to be a problem. C and C++ allow for a lot of flexibility in the representation of integer types, but uint64_t (and similar types) are far stricter.
But alignment /will/ be a problem on some platforms. Some CPUs are happy with a non-aligned access, others are not. Even on platforms which are mostly happy (such as x86), some instructions (certain SIMD operations) require strict alignment. So unless you are sure that "data" is 8-byte aligned, you risk problems.
Also, I think, you have your logic inverted somewhere. But that should be easily solved by an extra cup of coffee for whichever one of us has got it wrong :-)
wyniijj@gmail.com: Apr 17 02:22AM -0700
On Tuesday, 17 April 2018 at 5:08:55 PM UTC+8, wyn...@gmail.com wrote:
> {
> return *reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
> };

    // Correction: is_valid2(..)
    bool is_valid2(const char* data)
    {
      return !*reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
    };

And similarly, if data points to a 4-character sequence, can it be interpreted as a uint32_t in this very similar function?
Paavo Helde <myfirstname@osa.pri.ee>: Apr 17 01:29PM +0300
> {
> return *reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
> };
Alignment mismatch would be a real danger on some platforms. What about this replacement, which is also basically a one-liner and does not suffer from alignment issues:

    #include <algorithm> // std::find_if

    bool is_valid(const char* data) {
      return std::find_if(data, data+8, [](char c) {return c&'\x80';})==data+8;
    }
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Apr 17 11:29AM +0100
On Tue, 17 Apr 2018 11:20:18 +0200
> Also, I think, you have your logic inverted somewhere. But that should
> be easily solved by an extra cup of coffee for whichever one of us has
> got it wrong :-)
is_valid2() is technically undefined behaviour unless the object pointed to by the 'data' argument began life as a uint64_t object; otherwise dereferencing the return value of the reinterpret_cast expression breaches the strict aliasing rules. However, in practice that doesn't matter unless is_valid2() is an inline function. If its definition is in a different translation unit, the compiler cannot deduce its dynamic type anyway, so you are fine.
If you want the operation to be done inline, then the bullet-proof and standard-conforming approach is to memcpy() the 8 bytes of 'data' into a uint64_t object and bitwise-and that. The compiler will optimize out the memcpy() and produce optimal code if 'data' was correctly aligned and isn't mutated; if not, it will at least end up correctly aligned. An alternative is to type-pun through a union and rely on gcc's and clang's language extension, which allows this.
Chris
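Chris's memcpy() suggestion might look like the sketch below. The names is_valid_memcpy and is_valid4_memcpy are hypothetical, and the code assumes 8-bit chars (which the 0x80 test already does). It returns true when no high bit is set, i.e. the intended meaning of is_valid(), and the 4-byte variant answers the OP's uint32_t question with the same technique.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True if none of the 8 chars starting at 'data' has its high bit set.
// std::memcpy copies the bytes into a properly aligned local object, so there
// is no alignment or strict-aliasing problem; optimizing compilers typically
// turn the copy into a single load. Because every byte of the mask is 0x80,
// the byte order of the platform does not affect the result.
bool is_valid_memcpy(const char* data)
{
    std::uint64_t word;
    std::memcpy(&word, data, sizeof word);
    return (word & 0x8080808080808080ULL) == 0;
}

// Same idea for a 4-character sequence.
bool is_valid4_memcpy(const char* data)
{
    std::uint32_t word;
    std::memcpy(&word, data, sizeof word);
    return (word & 0x80808080UL) == 0;
}
```

The pointer passed in may have any alignment, e.g. `buffer + 1`.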
David Brown <david.brown@hesbynett.no>: Apr 17 01:38PM +0200
On 17/04/18 12:29, Chris Vine wrote:
> pointed to by the 'data' argument began life as a uint64_t object,
> otherwise dereferencing the return value of the reinterpret_cast
> expression breaches the strict aliasing rules.
I don't know the details of the C++ standard well enough to know about that.
> uint64_t object and bitwise-and that. The compiler will optimize out
> the memcpy() and produce optimal code if 'data' was correctly aligned
> and isn't mutated; if not it will at least end up correctly aligned.
Agreed.
> An
> alternative to type pun through a union and rely on gcc's and clang's
> language extension which allows this.
I don't see how that could work without having either aliasing or alignment problems. Maybe using "packed" and "may_alias" attributes would help. But the memcpy seems simpler.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Apr 17 01:04PM +0100
On Tue, 17 Apr 2018 13:38:41 +0200
> I don't see how that could work without having either aliasing or
> alignment problems. Maybe using "packed" and "may_alias" attributes
> would help. But the memcpy seems simpler.
Constructing a union would have the same effect as memcpy() in practice. If alignment is correct, then construction of the union can be elided. Otherwise the type-punning union will have to be constructed on the stack, in which case it is obliged to have the correct alignment for all its members. It doesn't have an aliasing problem because the gcc/clang language extension says it doesn't (see also the sixth bullet of §3.10/10 of the C++ standard).
memcpy() is stupendously fast on modern hardware, being an "intrinsic" (VS) or "built-in" (gcc/clang) which, where relevant, will do a direct memory blit rather than being executed as a function call. And because it is a built-in, it can (and will) be trivially elided if not necessary, as in the case of is_valid2().
Given that memcpy() is standard-conforming and a union relies on an extension, I would go for the former. I would wager that when tested it will turn out considerably faster than is_valid() and at least as fast as is_valid2() with a reinterpret_cast. Measurement by the OP is easy here and will reveal all.
Chris
"Öö Tiib" <ootiib@hot.ee>: Apr 17 05:16AM -0700
On Tuesday, 17 April 2018 12:20:29 UTC+3, David Brown wrote:
> Also, I think, you have your logic inverted somewhere. But that should
> be easily solved by an extra cup of coffee for whichever one of us has
> got it wrong :-)
To me it seemed that (either coffee or) the usual comparison trick to check that all bits were set was missing:

    return (*reinterpret_cast<const uint64_t*>(data)&0x8080808080808080ULL)
           == 0x8080808080808080LLU;
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 17 02:35PM +0200
> {
> return *reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
> };
If a byte is 8 bits and the data is properly aligned so that you avoid UB for that, the two functions still compute two different things.
Consider an input of all zeroes. `is_valid` then returns true while `is_valid2` returns false.
Cheers & hth.,
- Alf
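Alf's all-zeroes counterexample is easy to check. The sketch below keeps the logic of the OP's first is_valid2() but reads the bytes with std::memcpy instead of reinterpret_cast, so alignment and aliasing are out of the picture; is_valid is rewritten as an equivalent loop. The name is_valid2_as_posted is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Loop form of the OP's is_valid(): every one of the 8 chars must have
// its high bit clear.
bool is_valid(const char* data)
{
    for (int i = 0; i < 8; ++i)
        if (data[i] & '\x80')
            return false;
    return true;
}

// The OP's first is_valid2() logic, unchanged, but loading via memcpy.
// The result is true if ANY high bit is set, the opposite question.
bool is_valid2_as_posted(const char* data)
{
    std::uint64_t word;
    std::memcpy(&word, data, sizeof word);
    return word & 0x8080808080808080ULL;
}
```

For every 8-byte input the two results are complements: one asks "are all high bits clear?", the other "is any high bit set?". All zeroes is simply the first case anyone tries.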
Barry Schwarz <schwarzb@dqel.com>: Apr 17 09:35AM -0700
>};
>And similarly, if data points to a 4-character sequence, can it be
>interpreted to uint32_t in this very similar function?
Since ! has higher precedence than &, isn't the expression evaluated as

    !(*reinterpret_cast<const uint64_t*>(data)) & 0x8080808080808080L

and since ! evaluates to 0 or 1, won't this always return 0?
After resolving the alignment issue, you would need

    !(*reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L)

You did not tell us what "valid" means in this context. It appears you are looking for "normal" characters, but you specifically raised the issue of different CPUs. If so, be aware that there are systems that don't use ASCII (such as IBM mainframes that use EBCDIC), and on such systems normal text like ABCD1234 would fail either test.
--
Remove del for email
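Barry's precedence point can be demonstrated directly. This sketch separates loading the word (a hypothetical load8 helper using memcpy) from the test itself, so only the two expressions are compared: the corrected line as posted, and the version with Barry's parentheses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr std::uint64_t kHighBits = 0x8080808080808080ULL;

// The "correction" as posted: parsed as (!word) & kHighBits.
// !word is 0 or 1, and bit 0 of kHighBits is clear, so this is always false.
// Some compilers warn about this construct.
bool is_valid2_corrected_as_posted(std::uint64_t word)
{
    return !word & kHighBits;
}

// With the parentheses Barry suggests: true only when no high bit is set.
bool is_valid2_parenthesized(std::uint64_t word)
{
    return !(word & kHighBits);
}

// Loads 8 chars without alignment or aliasing concerns.
std::uint64_t load8(const char* data)
{
    std::uint64_t word;
    std::memcpy(&word, data, sizeof word);
    return word;
}
```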
Ralf Goertz <me@myprovider.invalid>: Apr 17 11:01AM +0200
Hi,
is it possible to catch multiple matches with the repetition operators "*", "+" and "{,}"?

    #include <iostream>
    #include <regex>
    #include <string>
    using namespace std;

    int main() {
      string s("foobar");
      regex r("([fb][ao][or]){2}");
      smatch sm;
      if (regex_search(s,sm,r)) {
        for (auto i:sm) cout<<i<<endl;
      }
    }

That program only gives

    foobar
    bar

I would like to also catch the "foo" alone. Of course I could rewrite the regex, but in my real-world problem I don't know how many iterations there will be and I want to catch them all. Is there a way to do that?
Ben Bacarisse <ben.usenet@bsb.me.uk>: Apr 17 11:43AM +0100
> is it possible to catch multiple matches with the repetition operators
> "*", "+" and "{,}"?
No. These operators describe a single pattern to be searched for. When they apply to a pattern containing ()s, the std::smatch just tells you something about how the pattern was matched: what was last matched by that sub-expression. There is no provision to store the arbitrary number of matches that might result from one single sub-expression.
> I would like to also catch the "foo" alone. Of course I could rewrite
> the regex but in my real world problem I don't know how many iterations
> there will be and I want to catch them all. Is there a way to do that?
You will need to write a regexp that matches the smallest part you want and match it repeatedly. Even then you might not get exactly what you want, because that is never entirely clear from a single example.
--
Ben.
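Ben's "match the smallest part repeatedly" advice maps onto std::sregex_iterator, which walks all non-overlapping matches from left to right. A minimal sketch; the helper name all_matches is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Returns every non-overlapping match of 'unit' in 's', left to right,
// instead of trying to capture each repetition of ( ... ){2}.
std::vector<std::string> all_matches(const std::string& s, const std::regex& unit)
{
    std::vector<std::string> out;
    for (std::sregex_iterator it(s.begin(), s.end(), unit), end; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

With the single unit `[fb][ao][or]`, the input "foobar" yields "foo" and then "bar". Note that, unlike `{2}`, this does not require the pieces to be adjacent.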
Ralf Goertz <me@myprovider.invalid>: Apr 17 02:41PM +0200
On Tue, 17 Apr 2018 11:43:36 +0100
> matched by that sub-expression.
> There is no provision to store the arbitrary number of matches that
> might result from one single sub-expression.
Well, that's a pity.
> You will need to write a regexp that matches smallest part you want
> and match it repeatedly. Even then you might not get exactly what
> you want because that is never entirely clear from a single example.
One of my real-world examples (there are many, of differing complexity) is the following:

    some text 4.7 ( 2.3 ) 5.8 (6.2) 4.3 23.4 (2.9)

I need the numbers after "some text". There can be any number of numbers, but they come in pairs with the second one parenthesized. However, that second number is optional. So my regex looks something like

    (([0-9.]+) +(\( *([0-9.]+) *\))?)+$

(of course there is potential for improvement, since I don't want a number to start or end with a "." and there should only be one "." in each number)
Of course here I could use a regex without the trailing "+" and match repeatedly. And in all my other use cases I could probably use other tricks. But with so many different scenarios where it would be beneficial to be able to match repetitive patterns, I wonder why it isn't possible.
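Dropping the trailing "+" and matching one pair at a time could look like the sketch below. The names Entry and parse_pairs are hypothetical, and `\s*` replaces the literal spaces of Ralf's pattern. Like his pattern, `[0-9.]+` does not validate the numbers, so an input such as a lone "." would make std::stod throw.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// One number, optionally followed by a parenthesized number.
struct Entry {
    double value;
    bool has_paren;      // true if a "( x )" part followed
    double paren_value;  // 0.0 when has_paren is false
};

// Matches one "unit" of the pattern repeatedly instead of using a trailing '+'.
std::vector<Entry> parse_pairs(const std::string& line)
{
    // group 1: the number; group 2: the optional parenthesized number
    static const std::regex unit(R"(([0-9.]+)\s*(?:\(\s*([0-9.]+)\s*\))?)");
    std::vector<Entry> out;
    for (std::sregex_iterator it(line.begin(), line.end(), unit), end; it != end; ++it) {
        const std::smatch& m = *it;
        bool has_paren = m[2].matched;
        out.push_back(Entry{std::stod(m[1].str()), has_paren,
                            has_paren ? std::stod(m[2].str()) : 0.0});
    }
    return out;
}
```

Because "some text" contains no digits or dots, the whole line can be passed in as-is.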
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Apr 17 04:05PM +0200
On 17.04.2018 14:41, Ralf Goertz wrote:
> repeatedly. And in all my other use cases I could probably use other
> tricks. But having many different scenarios where it would be beneficial
> to be able to match repetitive patterns I wonder why it isn't possible.
Maybe invent a higher-level pattern matching language?
Cheers!,
- Alf
Ben Bacarisse <ben.usenet@bsb.me.uk>: Apr 17 03:55PM +0100
>> There is no provision to store the arbitrary number of matches that
>> might result from one single sub-expression.
> Well, that's a pity.
Yes, it can be useful, but it's not widely supported. I imagine that's because it's often clearer to do repeated matching, and it would complicate getting the results from otherwise simple patterns (though I suppose it could be an option). You can do it in Python with named captures using the extended (3rd-party) regex module.
<snip>
> (([0-9.]+) +(\( *([0-9.]+) *\))?)+$
> Of course here I could use a regex without the trailing "+" and match
> repeatedly.
Yes, that's probably what you'll have to do, though in this case you could just use sscanf or >> (again, in a loop).
> And in all my other use cases I could probably use other
> tricks.
You might be able to generalise the loop into a function so that all the cases are done in essentially the same way, but that's impossible to tell from here.
--
Ben.
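Ben's alternative of using >> in a loop avoids regexes entirely for this format. A minimal sketch, assuming the caller has already removed the leading "some text"; the names Entry and parse_with_streams are hypothetical, and on a malformed "( x )" part the loop simply stops.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct Entry {
    double value;
    bool has_paren;      // true if a "( x )" part followed
    double paren_value;  // 0.0 when has_paren is false
};

std::vector<Entry> parse_with_streams(const std::string& numbers)
{
    std::istringstream in(numbers);
    std::vector<Entry> out;
    double v;
    while (in >> v) {                    // read the main number
        Entry e{v, false, 0.0};
        char c;
        if (in >> c) {                   // next non-blank character, if any
            if (c == '(') {
                char close = 0;
                in >> e.paren_value >> close;
                if (!in || close != ')')
                    break;               // malformed "( x )" part: stop parsing
                e.has_paren = true;
            } else {
                in.putback(c);           // start of the next number: give it back
            }
        }
        out.push_back(e);
    }
    return out;
}
```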
gazelle@shell.xmission.com (Kenny McCormack): Apr 16 09:38PM In article <fa58d0b4-1ae4-41d2-be08-f4b1bb5bc7d5@googlegroups.com>, >Does anybody have any tools they use that are similar in ability >to compile and debug C++ projects, but do so faster than Visual >Studio? How about RDC? Oh, I just remembered. RDC is (and always will be) vaporware. You want something that actually exists, right? -- "Unattended children will be given an espresso and a free kitten." |
gazelle@shell.xmission.com (Kenny McCormack): Apr 16 09:51PM In article <8b8ba1fe-2a0c-4ce8-96b1-02bcdea6e0a0@googlegroups.com>, Rick C. Hodgin <rick.c.hodgin@gmail.com> wrote: ... >I keep looking, but I'm thinking CAlive is still my best bet, >assuming I can ever get it started. :-) FIFY -- Those on the right constantly remind us that America is not a democracy; now they claim that Obama is a threat to democracy. |
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Apr 16 06:08PM -0400 On 4/16/2018 5:38 PM, Kenny McCormack wrote: >> to compile and debug C++ projects, but do so faster than Visual >> Studio? > How about RDC? Wouldn't that be great? :-) > Oh, I just remembered. RDC is (and always will be) vaporware. My life isn't over yet, Kenny. Your statement is speculation at best. > You want something that actually exists, right? Yes. I haven't found anything. I'm still stuck with VS 2015. -- Rick C. Hodgin |
"Chris M. Thomasson" <invalid_chris_thomasson@invalid.invalid>: Apr 16 02:19PM -0700
On 4/15/2018 4:46 PM, Rick C. Hodgin wrote:
> Still a different website, one that's obviously visible.
> And, it still conveys the message, "Jesus loves you, will
> forgive your sin."
Fair enough. Just be aware of anything like that. Suppose somebody registered a real website with a close-enough URL on purpose; then it just might not be so obvious. Now, think about it... For somebody to actually do that, well, imvvho, that would mean a real, sort of demonic entity is actively chasing you. Humm...
If you have your own server, just do an HMAC and keep the secret key encrypted on your own server with another key that only you can remember. The stored encrypted key that will ultimately be used to sign public plaintexts can be very large, but the key that you use to decrypt it can be smaller. Do not store this smaller key on any device. Keep it in your brain. Create a plaintext, and send it to your server. When your own server sends you a first-pass confirmation, you send in your personal key to decrypt the private key that you use to HMAC the public plaintexts. Then the server sends you the plaintext and the HMAC derived from the decrypted key in a single message. If you manage to buy up some of the likely look-alike URLs and keep them for yourself, well, this should be fairly secure.
Well, humm... What about people directing users to fake websites by hijacking their routers? Humm...
Like you said, it might be unlikely that somebody would go through all of the trouble to break your setup. They would have to forge the URL into a site that looks the same.