Thursday, October 5, 2017

Digest for comp.lang.c++@googlegroups.com - 23 updates in 3 topics

Daniel <danielaparker@gmail.com>: Oct 04 08:34PM -0700

I'd like to define
 
using bytes_view = std::basic_string_view<uint8_t>;
 
and have it work cross platform.
 
It compiles with vs2015. But do I need to worry if the specialization
std::char_traits<uint8_t> always exists? Or would it be safer to
define my own character traits?
 
Thanks,
Daniel
"Öö Tiib" <ootiib@hot.ee>: Oct 04 10:59PM -0700

On Thursday, 5 October 2017 06:34:34 UTC+3, Daniel wrote:
 
> It compiles with vs2015. But do I need to worry if the specialization
> std::char_traits<uint8_t> always exists? Or would it be safer to
> define my own character traits?
 
Theoretically there can be a platform where uint8_t does not exist but
in practice I don't think that any such platform has a C++ compiler.
When there is uint8_t then it is either unsigned char or some "extended
unsigned integer type".
 
IIRC the standard requires char_traits only for char, wchar_t, char16_t
and char32_t. So classes that need char_traits
(like std::basic_fstream<uint8_t> or std::basic_string_view<uint8_t>
or std::basic_string<uint8_t>) are not required to work.
 
Why do you want to use uint8_t for text?
Daniel <danielaparker@gmail.com>: Oct 05 06:35AM -0700

On Thursday, October 5, 2017 at 1:59:18 AM UTC-4, Öö Tiib wrote:
> unsigned integer type".
 
> IIRC the standard requires char_traits only for char, wchar_t, char16_t
> and char32_t.
 
I guess that means it's not required for "unsigned char" or "signed char"
either.
 
> (like std::basic_fstream<uint8_t> or std::basic_string_view<uint8_t>
> or std::basic_string<uint8_t>) are not required to work.
 
> Why do you want to use uint8_t for text?
 
What is text :-)
 
The use is for binary strings, which would be written as base64 for
JSON, or the bytes themselves for CBOR.
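For the JSON side, a minimal base64 encoder sketch, assuming the standard alphabet with '=' padding as in RFC 4648. The name base64_encode is hypothetical and not taken from any library mentioned in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encodes arbitrary bytes as padded base64 text.
inline std::string base64_encode(const std::vector<std::uint8_t>& bytes) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((bytes.size() + 2) / 3) * 4);

    std::size_t i = 0;
    // Each full group of 3 bytes becomes 4 output characters.
    for (; i + 3 <= bytes.size(); i += 3) {
        const std::uint32_t group = (std::uint32_t{bytes[i]} << 16) |
                                    (std::uint32_t{bytes[i + 1]} << 8) |
                                    std::uint32_t{bytes[i + 2]};
        out += alphabet[(group >> 18) & 0x3F];
        out += alphabet[(group >> 12) & 0x3F];
        out += alphabet[(group >> 6) & 0x3F];
        out += alphabet[group & 0x3F];
    }

    // A trailing group of 1 or 2 bytes is zero-filled and padded with '='.
    const std::size_t rest = bytes.size() - i;
    if (rest == 1) {
        const std::uint32_t group = std::uint32_t{bytes[i]} << 16;
        out += alphabet[(group >> 18) & 0x3F];
        out += alphabet[(group >> 12) & 0x3F];
        out += "==";
    } else if (rest == 2) {
        const std::uint32_t group = (std::uint32_t{bytes[i]} << 16) |
                                    (std::uint32_t{bytes[i + 1]} << 8);
        out += alphabet[(group >> 18) & 0x3F];
        out += alphabet[(group >> 12) & 0x3F];
        out += alphabet[(group >> 6) & 0x3F];
        out += '=';
    }
    return out;
}
```

The input is a std::vector<std::uint8_t> here only to keep the sketch independent of whichever byte-view type ends up being chosen.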
 
Daniel
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>: Oct 05 04:24PM +0200

On 10/5/2017 7:59 AM, Öö Tiib wrote:
>> define my own character traits?
 
> Theoretically there can be a platform where uint8_t does not exist but
> in practice I don't think that any such platform has a C++ compiler.
 
The usual example of CHAR_BIT > 8 has been Texas Instruments digital
signal processors, with CHAR_BIT = 16, and C++ compilers.
 
I guess one could check for existence via UINT8_MAX macro.
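A small sketch of that check, assuming a C++11 or later <cstdint> that defines the limit macros. The names byte_type and kHasExactUint8 are made up for illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

#ifdef UINT8_MAX
// The exact-width type exists on this platform.
using byte_type = std::uint8_t;
constexpr bool kHasExactUint8 = true;
#else
// Fallback: the smallest unsigned type with at least 8 bits.
using byte_type = std::uint_least8_t;
constexpr bool kHasExactUint8 = false;
#endif

// Holds either way: both alternatives occupy at least 8 bits.
static_assert(sizeof(byte_type) * CHAR_BIT >= 8,
              "byte_type must hold at least 8 bits");
```

On a CHAR_BIT = 16 target such as the DSPs mentioned above, the fallback branch would be taken and byte_type would be wider than an octet.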
 
 
Cheers!,
 
- Alf
"James R. Kuyper" <jameskuyper@verizon.net>: Oct 05 12:11PM -0400

On 2017-10-05 09:35, Daniel wrote:
> On Thursday, October 5, 2017 at 1:59:18 AM UTC-4, Öö Tiib wrote:
...
>> and char32_t.
 
> I guess that means it's not required for "unsigned char" or "signed char"
> either.
 
Correct. In particular, std::char_traits<uint8_t> will exist only if
uint8_t is a typedef for char; if it's a typedef for unsigned char, that
specialization will not exist. That's true even if char is an unsigned
type: char, unsigned char, and signed char are always three distinct
types, even though char is required to represent the same range of
values as one of the other two types.
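The "three distinct types" point can be checked at compile time. A short sketch (the overload set kind is hypothetical, used only to show that all three overloads can coexist):

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// None of the three character types is the same type as another,
// whether plain char happens to be signed or unsigned.
static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are distinct");
static_assert(!std::is_same<char, unsigned char>::value,
              "char and unsigned char are distinct");
static_assert(!std::is_same<signed char, unsigned char>::value,
              "signed char and unsigned char are distinct");

// Because the types are distinct, these are three separate overloads.
inline const char* kind(char) { return "char"; }
inline const char* kind(signed char) { return "signed char"; }
inline const char* kind(unsigned char) { return "unsigned char"; }
```

A character literal such as 'a' has type char, so kind('a') selects the first overload on every implementation.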
 
>> or std::basic_string<uint8_t>) are not required to work.
 
>> Why do you want to use uint8_t for text?
 
> What is text :-)
 
Text is the purpose for which std::basic_string<> was created. If you're
just looking for an array of uint8_t, then use one of the other standard
containers, such as std::vector<uint8_t>. If there's any feature that
std::basic_string<> has, which isn't shared by any of the other
standard containers, and you need to make use of that feature, then what
you're working with probably is text, in some sense.
Daniel <danielaparker@gmail.com>: Oct 05 10:23AM -0700

On Thursday, October 5, 2017 at 12:11:31 PM UTC-4, James R. Kuyper wrote:
> std::basic_string<> has, which isn't shared by any of the other
> standard containers, and you need to make use of that feature, then what
> you're working with probably is text, in some sense.
 
The question was about the need for a bytes_view, for which there's nothing
in the standard library, and whether it was sensible or stupid to base it on
std::basic_string_view<uint8_t,?>. I'm leaning towards stupid :-)
 
Regarding text, as far as I can tell, std::basic_string<> doesn't really
offer much more than a sequence container of 8, 16 or 32 bit items, with the
additional favour of appending a zero with c_str(), with no text semantics
except when they coincide with the usual operations on fixed sized items in
a sequence container, at least for the default definitions of
std::char_traits<>. In practice people seem to either (1) use std::string to
hold utf-8 octets, using the member functions when they make sense, and for
the rest, using extra functions for determining length in characters
(codepoints), iterating over characters (codepoints), etc. Or
(2), what you see sometimes on Windows platforms, using std::wstring to hold
utf-16 units and using extra functions.
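As an illustration of the kind of "extra function" meant in (1), a minimal sketch that counts code points in a std::string holding UTF-8. It assumes the input is well-formed and does no validation; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for bytes of the form 10xxxxxx, which continue a multi-byte sequence.
inline bool is_utf8_continuation(unsigned char byte) {
    return (byte & 0xC0) == 0x80;
}

// Counts code points by counting the bytes that start a sequence.
// Assumes well-formed UTF-8; malformed input is not detected.
inline std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if (!is_utf8_continuation(byte)) {
            ++count;
        }
    }
    return count;
}
```

Here s.size() still reports octets, while utf8_length(s) reports code points, which is the split between member functions and helper functions described above.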
 
Daniel
"James R. Kuyper" <jameskuyper@verizon.net>: Oct 05 02:18PM -0400

On 2017-10-05 13:23, Daniel wrote:
> std::basic_string_view<uint8_t,?>. I'm leaning towards stupid :-)
 
> Regarding text, as far as I can tell, std::basic_string<> doesn't really
> offer much more than a sequence container of 8, 16 or 32 bit items, with the
 
basic_string has no such restriction on the sizes of the things it can
contain. It has implementation-provided specializations for char,
wchar_t, char16_t and char32_t, but there's no requirement that either
of those first two types have a size that matches any of the three sizes
you've listed. And you can specialize for any user-defined non-array POD
type, as long as you provide a specialization of char_traits<> for that
same type which meets the requirements specified in 21.2.1.
 
> additional favour of appending a zero with c_str(),
 
In the general case, that's charT() (21.4.5p2) which is not necessarily
zero.
 
> except when they coincide with the usual operations on fixed sized items in
> a sequence container, at least for the default definitions of
> std::char_traits<>.
 
Most of the features that distinguish basic_string from other container
types are those listed in 21.4.7, which is, unsurprisingly, titled
"String operations". If you don't intend to use any of those operations,
you probably should use an ordinary container type.
"James R. Kuyper" <jameskuyper@verizon.net>: Oct 05 02:31PM -0400

On 2017-10-05 14:18, James R. Kuyper wrote:
> On 2017-10-05 13:23, Daniel wrote:
...
> wchar_t, char16_t and char32_t, but there's no requirement that either
> of those first two types have a size that matches any of the three sizes
> you've listed.
 
Actually, there's no such requirement for any of those types, char16_t
and char32_t are required to be typedefs for uint_least16_t and
uint_least32_t, respectively, which need not have a size of exactly 16
or 32 bits, respectively. It's extremely likely that each of those four
types will have one of those three sizes, but it's not a requirement.
Daniel <danielaparker@gmail.com>: Oct 05 12:22PM -0700

On Thursday, October 5, 2017 at 2:18:34 PM UTC-4, James R. Kuyper wrote:
 
> Most of the features that distinguish basic_string from other container
> types are those listed in 21.4.7, which is, unsurprisingly, titled
> "String operations".
 
Except for c_str(), and variants of the "string operations" that apply
to "null terminated" strings, it seems to me that all of those operations
would apply equally to CBOR binary strings. There are no text semantics.
std::string, for example, doesn't know about utf8, about continuation bytes,
even though that's often what it holds these days.
 
> If you don't intend to use any of those operations,
> you probably should use an ordinary container type.
 
Rather, my question was about the need for a bytes_view, for which there's
nothing in the standard library, and about the advisability or lack thereof
of basing one on std::basic_string_view<uint8_t,?>, I'm leaning towards no.
 
Daniel
"James R. Kuyper" <jameskuyper@verizon.net>: Oct 05 03:56PM -0400

On 2017-10-05 15:22, Daniel wrote:
 
> Except for c_str(), and variants of the "string operations" that apply
> to "null terminated" strings, it seems to me that all of those operations
> would apply equally to CBOR binary strings.
 
Everything I know about CBOR is from what I just read at
<https://en.wikipedia.org/wiki/CBOR>. Is that what you're referring to?
 
How and why would you want to apply any of the basic_string<>::find*()
member functions to CBOR binary strings? It would look at bytes that
contain the header, the payload, or the data, without discriminating
between them. I can't imagine why you'd want to use any facility on a
CBOR string that wasn't aware of the distinction between those parts of
the data format. Similarly, how and why would you want to use substr()?
 
I can imagine a use for a container that was aware of the CBOR format,
and which parsed the items in a CBOR string into actual data items. I
imagine that this container might internally use an array or a standard
container of uint8_t to work on the string. But why would any of
basic_string<>'s special capabilities be of any particular use for that
purpose? As I said before, one of the non-string oriented standard
containers would seem to be a better choice.
 
 
> Rather, my question was about the need for a bytes_view, for which there's
> nothing in the standard library, and about the advisability or lack thereof
> of basing one on std::basic_string_view<uint8_t,?>, I'm leaning towards no.
 
You haven't really explained anything about what bytes_view is supposed
to do, which makes it hard to answer that question. You indicated that
it has something to do with JSON and CBOR, which would incline me to
agree with your "no".
"Öö Tiib" <ootiib@hot.ee>: Oct 05 01:36PM -0700

On Thursday, 5 October 2017 22:22:55 UTC+3, Daniel wrote:
> would apply equally to CBOR binary strings. There are no text semantics.
> std::string, for example, doesn't know about utf8, about continuation bytes,
> even though that's often what it holds these days.
 
Currently one likely uses std::string to represent UTF8 in C++. The
literal u8"text" is of type const char[] and so there are no additional
conversions needed.
 
 
> Rather, my question was about the need for a bytes_view, for which there's
> nothing in the standard library, and about the advisability or lack thereof
> of basing one on std::basic_string_view<uint8_t,?>, I'm leaning towards no.
 
If there is a need for a class referring to a contiguous sequence of values
of type T (that are not characters) somewhere in memory, then maybe use
some non-standard library class like gsl::span<T>?

Specializing 'std::char_traits' for uint8_t values that are not really meant
to be characters, just to get 'std::basic_string' to work, just to get
'std::basic_string_view' to work, can (I feel) confuse more. Where do you
get these "byte strings" whose "byte views" you need?
Don't you also need 'std::codecvt<uint8_t>' for that?
 
On the other hand Microsoft apparently did it. On third hand Microsoft
has questionable practices. And on fourth hand I don't really know your
use cases and rest of software and plans. ;)
Daniel <danielaparker@gmail.com>: Oct 05 01:46PM -0700

On Thursday, October 5, 2017 at 3:56:56 PM UTC-4, James R. Kuyper wrote:
 
> Everything I know about CBOR is from what I just read at
> <https://en.wikipedia.org/wiki/CBOR>. Is that what you're referring to?
 
https://tools.ietf.org/html/rfc7049 is a better reference. CBOR supports
two types of strings: utf8 encoded, and binary. A binary string is just a
contiguous sequence of arbitrary bytes. If formatted to text, it would
typically be output as base64.
 
> between them. I can't imagine why you'd want to use any facility on a
> CBOR string that wasn't aware of the distinction between those parts of
> the data format. Similarly, how and why would you want to use substr()?
 
Point taken :-) On the other hand, you can't sensibly use substr on a utf8
encoded string either, at least for arbitrary indices. find can work, but
only because of UTF-8's self-synchronizing features.
 
> and which parsed the items in a CBOR string into actual data items. I
> imagine that this container might internally use an array or a standard
> container of uint8_t to work on the string.
 
Yes, I have one, to encode/decode between CBOR and an unpacked
JSON variant. https://github.com/danielaparker/jsoncons/blob/master/doc/ref/cbor/encode_cbor.md
 
 
> You haven't really explained anything about what bytes_view is supposed
> to do
 
Analogous to string_view, a non-mutable, non-owning holder of a contiguous
sequence of bytes, supporting member functions const uint8_t* data() const,
length(), operator==, operator[], begin(), end(), perhaps a couple of others.
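A minimal sketch of such a hand-written class, covering only the members listed above. It assumes uint8_t exists on the target platform, and it is an illustration rather than the jsoncons implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical bytes_view: a read-only, non-owning view of contiguous bytes.
class bytes_view {
public:
    using value_type = std::uint8_t;
    using const_iterator = const std::uint8_t*;

    constexpr bytes_view() noexcept : data_(nullptr), size_(0) {}
    constexpr bytes_view(const std::uint8_t* data, std::size_t size) noexcept
        : data_(data), size_(size) {}

    constexpr const std::uint8_t* data() const noexcept { return data_; }
    constexpr std::size_t length() const noexcept { return size_; }
    constexpr bool empty() const noexcept { return size_ == 0; }

    constexpr const_iterator begin() const noexcept { return data_; }
    constexpr const_iterator end() const noexcept { return data_ + size_; }

    // Unchecked access, in the same spirit as string_view::operator[].
    constexpr std::uint8_t operator[](std::size_t pos) const {
        return data_[pos];
    }

    // Two views are equal when they have the same length and the same bytes.
    friend bool operator==(const bytes_view& lhs, const bytes_view& rhs) noexcept {
        return lhs.size_ == rhs.size_ &&
               (lhs.size_ == 0 ||
                std::memcmp(lhs.data_, rhs.data_, lhs.size_) == 0);
    }
    friend bool operator!=(const bytes_view& lhs, const bytes_view& rhs) noexcept {
        return !(lhs == rhs);
    }

private:
    const std::uint8_t* data_;
    std::size_t size_;
};
```

Writing the class by hand avoids any dependence on std::char_traits<uint8_t>, at the cost of not inheriting find, substr and the other string_view members.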
 
I was going to write one, but I noticed that somebody else's project in this
space was using
 
using bytes_view = std::experimental::basic_string_view<char>;
 
so I thought I'd run that by here, to see what people here thought. All
other things equal, I'd prefer to leverage existing things than to
introduce new things. That's all.
 
Daniel
Daniel <danielaparker@gmail.com>: Oct 05 02:43PM -0700

On Thursday, October 5, 2017 at 2:32:03 PM UTC-4, James R. Kuyper wrote:
> uint_least32_t, respectively, which need not have a size of exactly 16
> or 32 bits, respectively. It's extremely likely that each of those four
> types will have one of those three sizes, but it's not a requirement.
 
Thanks for remarking on that, I'd overlooked that. I find it lacking that
there's nothing in basic_string that tags the encoding, and have been
using sizeof(CharT) as an indicator of that, e.g. assuming wchar_t holds
utf16 if sizeof(wchar_t) == 16, or utf32 if sizeof(wchar_t) == 32. I realize
this isn't technically correct. Is there at least a presumption that
char32_t holds utf32? as there's nothing that prevents you from stuffing
utf8 or utf16 into it.
 
Daniel
"James R. Kuyper" <jameskuyper@verizon.net>: Oct 05 06:18PM -0400

On 2017-10-05 17:43, Daniel wrote:
 
>> Actually, there's no such requirement for any of those types, char16_t
>> and char32_t are required to be typedefs for uint_least16_t and
>> uint_least32_t, respectively, which need not have a size of exactly 16
 
That's not quite right - I was thinking of C, where that statement was
perfectly correct. In C++, char16_t and char32_t are their own distinct
types. But it's still correct to say that 16 and 32 bits, respectively,
are only minimum values for the widths of those types. There's no
requirement that they be exactly that size.
 
> been using sizeof(CharT) as an indicator of that, e.g. assuming
> wchar_t holds utf16 if sizeof(wchar_t) == 16, or utf32 if sizeof(wchar_t)
> == 32. ...
 
I presume you mean sizeof(...)*CHAR_BIT?
The encoding used for narrow (char), and wide (wchar_t) strings and
characters is completely implementation-defined. There's no guarantee
that it has anything to do with either ASCII or Unicode. I gather that,
particularly in Japan, it is (or at least, used to be) commonplace for
neither of them to have either encoding.
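A sketch of the sizeof(...)*CHAR_BIT computation, together with the minimum widths that are guaranteed. The helper name storage_bits is made up for illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Number of bits of storage occupied by CharT. This is the object size
// in bits, so it says nothing about which encoding the values use.
template <typename CharT>
constexpr std::size_t storage_bits() {
    return sizeof(CharT) * CHAR_BIT;
}

// Guaranteed minimums; exact equality is typical but not required.
static_assert(storage_bits<char>() == CHAR_BIT, "sizeof(char) is 1");
static_assert(storage_bits<char16_t>() >= 16, "char16_t holds at least 16 bits");
static_assert(storage_bits<char32_t>() >= 32, "char32_t holds at least 32 bits");
```

storage_bits<wchar_t>() is commonly 16 on Windows and 32 on Linux, but, as noted above, neither value tells you that the contents are UTF-16 or UTF-32.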
 
> ... I realize this isn't technically correct. Is there at least a
> presumption that char32_t holds utf32? as there's nothing that prevents
> you from stuffing utf8 or utf16 into it.
 
You're right - there's nothing to prevent you from stuffing an arbitrary
numeric value that's within range into any object of either type.
However, there are facilities for creating and interpreting utf-8, utf-16
and utf-32 strings, and those facilities use char, char16_t, and
char32_t, respectively.
 
"A string literal that begins with u8, such as u8"asdf", is a UTF-8
string literal and is initialized with the given characters as encoded
in UTF-8.
Ordinary string literals and UTF-8 string literals are also referred to
as narrow string literals. A narrow string literal has type "array of n
const char", where n is the size of the string as defined below, and has
static storage duration (3.7).
A string literal that begins with u, such as u"asdf", is a char16_t
string literal. A char16_t string literal has type "array of n const
char16_t", where n is the size of the string as defined below; it has
static storage duration and is initialized with the given characters. A
single c-char may produce more than one char16_t character in the form
of surrogate pairs.
A string literal that begins with U, such as U"asdf", is a char32_t
string literal. A char32_t string literal has type "array of n const
char32_t", where n is the size of the string as defined below; it has
static storage duration and is initialized with the given characters."
(2.14.5p7-10)
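The literal types quoted above, and the surrogate-pair remark, can be checked with a short sketch. The helper element_count is hypothetical. The u8 prefix is left out because the element type of u8 literals differs between language versions.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to the array.
static_assert(std::is_same<decltype(u"asdf"), const char16_t(&)[5]>::value,
              "u\"asdf\" is an array of 5 const char16_t");
static_assert(std::is_same<decltype(U"asdf"), const char32_t(&)[5]>::value,
              "U\"asdf\" is an array of 5 const char32_t");

// Number of array elements in a literal, including the terminator.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) {
    return N;
}
```

For a character outside the Basic Multilingual Plane, element_count(u"\U0001F600") is 3 (a surrogate pair plus the terminator), while element_count(U"\U0001F600") is 2.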
 
"... The specialization codecvt<char16_t, char, mbstate_t> converts
between the UTF-16 and UTF-8 encoding forms, and the specialization
codecvt <char32_t, char, mbstate_t> converts between the UTF-32 and
UTF-8 encoding forms." (22.4.1.4p3).
 
"For the facet codecvt_utf8:
— The facet shall convert between UTF-8 multibyte sequences and UCS2 or
UCS4 (depending on the size of Elem) within the program.
...
For the facet codecvt_utf16:
— The facet shall convert between UTF-16 multibyte sequences and UCS2 or
UCS4 (depending on the size of Elem) within the program.
...
For the facet codecvt_utf8_utf16:
— The facet shall convert between UTF-8 multibyte sequences and UTF-16
(one or two 16-bit codes) within the program." (22.5p4-6)
Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 05 07:02PM

On Wed, 2017-10-04, Jerry Stuckle wrote:
> On 10/3/2017 11:48 PM, Ian Collins wrote:
>> On 10/ 4/17 03:10 PM, Jerry Stuckle wrote:
...
>>> real answer to the problem.  I think this is a case of poor design.
 
>> Representing JSON objects or something similar.
 
> And why would that be necessary?
 
(He didn't say it was necessary.)
 
> objects, i.e. between systems or in files. Every time I've used JSON
> objects the first thing I've done in getting one is create a real object
> out of it - and do whatever is appropriate for that object.
 
I tend to do the same. There's already the external representation
(JSON) and the internal one (the data structures my code does concrete
work with); I don't want a third one, especially if it's more
open-ended than the other two.
 
I guess design bias like that is the reason I've never used variant/any/etc.
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Jerry Stuckle <jstucklex@attglobal.net>: Oct 05 05:35PM -0400

On 10/5/2017 3:02 PM, Jorgen Grahn wrote:
 
>>> Representing JSON objects or something similar.
 
>> And why would that be necessary?
 
> (He didn't say it was necessary.)
 
Then why bring it up?
 
> open-ended than the other two.
 
> I guess design bias like that is the reason I've never used variant/any/etc.
 
> /Jorgen
 
For me it's not bias. I've just never found a need for it - there have
always been better ways.
 
 
 
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
ribeiroalvo@gmail.com: Oct 05 11:44AM -0700

I need help implementing an algorithm of my own in C.
Can someone help me?
 
Here is the algorithm:
 
Melgo a csprng by Ribeiro Alvo 2017

Description

a,b,c,d and n as ( 16 bit )
i as ( 64 bit )
n = 2**16-1
a = Initialized in [ 0 , n ]
b = Initialized in [ 0 , n ]
c = Initialized in [ 0 , n ]
d = 2**13
X[ from 0 to n ] = Initialized with 61 bit values
 

Key-scheduling algorithm

for i from 0 to n
a = 1 + [ a + c ] mod n
b = 1 + [ b + a ] mod n
c = 1 + [ c + b ] mod n
X[i] = X[i] + a * b * c * d + a
endfor

Pseudo-random generation algorithm

i = 0
while GeneratingOutput:
 
i = i + 1
X(a) = [X(a) + X(b)] mod 2**62
a = [a + c + i] mod [n + 1]
Output [X(a) + X(b)] mod 2**56
b = [b + a] mod [n + 1]
Output [X(b) + X(c)] mod 2**56
c = [c + b] mod [n + 1]
Output [X(c) + X(a)] mod 2**56
 
endwhile
 
 
 
Thank you
red floyd <dont.bother@its.invalid>: Oct 05 01:18PM -0700

> I need help implementing an algorithm of my own in C.
> Can someone help me?
> [redacted]
 
You may have come to the wrong place. This is comp.lang.c++.
You may want to try comp.lang.c instead.
ribeiroalvo@gmail.com: Oct 05 01:40PM -0700

On Thursday, 5 October 2017 at 20:19:00 UTC, red floyd wrote:
> > [redacted]
 
> You may have come to the wrong place. This is comp.lang.c++.
> You may want to try comp.lang.c instead.
 
Thanks
I'll do it
But if a C++ version is possible, I would also appreciate it.
Ben Bacarisse <ben.usenet@bsb.me.uk>: Oct 05 09:43PM +0100


> I need help implementing an algorithm of my own in C.
 
For C, I'd post in comp.lang.c. This group is for C++.
 
Since I've included C source, I've set the followup-to header to avoid a
language debate. If you reply, please honour that header.
 
> c = Initialized in [ 0 , n ]
> d = 2**13
> X[ from 0 to n ] = Initialized with 61 bit values
 
How are a, b, c and X initialised? Below, there's a hint you mean
62-bit values.
 
> Key-scheduling algorithm
 
> for i from 0 to n
 
Does i ever get to n? I.e. is the upper bound of the for inclusive or
not?
 
> a = 1 + [ a + c ] mod n
> b = 1 + [ b + a ] mod n
> c = 1 + [ c + b ] mod n
 
What do the []s mean here? Is it grouping the "mod n"? I.e.
 
a = 1 + ((a + c) mod n)
 
> X[i] = X[i] + a * b * c * d + a
 
Are the X[i] supposed to be reduced to being 61-bit (or 62-bit) values?
I'm guessing yes.
 
> c = [c + b] mod [n + 1]
> Output [X(c) + X(a)] mod 2**56
 
> endwhile
 
It would be much better to use consistent notation. Keep [] for
indexing and use () for arithmetic grouping.
 
It's not clear if the output refers to three separate outputs or if the
generator is to make one 168-bit number at a time. Since making three
separate 56-bit values is more interesting, that's the interpretation
I've chosen.
 
Here's a first draft. You need C99 or later.
 
#include <stdio.h>

/* Returns the next 56-bit output; each round of the generator yields three. */
unsigned long long csprng(void)
{
    const unsigned d = 1u << 13;
    const unsigned n = 0xFFFF;
    const unsigned long long mask_62_bits = (1ull << 62) - 1;
    const unsigned long long mask_56_bits = (1ull << 56) - 1;

    static unsigned a, b, c;
    static unsigned long long X[0x10000];

    static int state = 0;               /* which of the three outputs is next */
    static unsigned long long i = 0;    /* round counter */

    switch (state) {
    case 0:
        if (i == 0) {
            /*
             * Here we want to set initial values for a, b, c and X,
             * but I don't know how that is supposed to be done.
             */

            /* Key scheduling; k avoids shadowing the static counter i. */
            for (unsigned long k = 0; k <= n; k++) {
                a = 1 + ((a + c) % n);
                b = 1 + ((b + a) % n);
                c = 1 + ((c + b) % n);
                X[k] += (unsigned long long)a * b * c * d + a;
                X[k] &= mask_62_bits;
            }
        }

        i += 1;
        X[a] = (X[a] + X[b]) & mask_62_bits;

        a = (a + c + i) & n;    /* n + 1 is 2**16, so "& n" is "mod (n + 1)" */
        state = 1;
        return (X[a] + X[b]) & mask_56_bits;

    case 1:
        b = (b + a) & n;
        state = 2;
        return (X[b] + X[c]) & mask_56_bits;

    case 2:
        c = (c + b) & n;
        state = 0;
        return (X[c] + X[a]) & mask_56_bits;
    }
    return 0;   /* not reached: state is always 0, 1 or 2 */
}

int main(void)
{
    for (int i = 0; i < 100; i++)
        printf("%llu\n", csprng());
    return 0;
}
 
 
--
Ben.
"James R. Kuyper" <jameskuyper@verizon.net>: Oct 05 04:49PM -0400


> Thanks
> I'll do it
> But if a C++ version is possible, I would also appreciate it.
 
If it can be done in C, it can generally also be done in C++, using
almost exactly the same code. There might also be a C++ way of doing it
that's better, using radically different code.
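As a rough illustration of what "a C++ way" might look like, here is a sketch that wraps the state of the C draft posted earlier in this thread in a class. It follows that draft's interpretation of the pseudocode (inclusive key-scheduling loop, three separate 56-bit outputs per round). The class name MelgoSketch and the constructor parameters are assumptions, because the thread never says how a, b, c and X are to be initialised, and nothing here has been checked against a reference implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr std::uint32_t kN = 0xFFFF;                          // n = 2**16 - 1
constexpr std::uint64_t kD = std::uint64_t{1} << 13;          // d = 2**13
constexpr std::uint64_t kMask62 = (std::uint64_t{1} << 62) - 1;
constexpr std::uint64_t kMask56 = (std::uint64_t{1} << 56) - 1;

class MelgoSketch {
public:
    // Seeding is unspecified in the thread: the caller supplies a, b, c
    // and a table of n + 1 initial values.
    MelgoSketch(std::uint32_t a, std::uint32_t b, std::uint32_t c,
                std::vector<std::uint64_t> x)
        : a_(a & kN), b_(b & kN), c_(c & kN), x_(std::move(x)) {
        assert(x_.size() == kN + 1u);
        // Key scheduling, with the upper bound taken as inclusive.
        for (std::size_t k = 0; k <= kN; ++k) {
            a_ = 1 + (a_ + c_) % kN;
            b_ = 1 + (b_ + a_) % kN;
            c_ = 1 + (c_ + b_) % kN;
            x_[k] = (x_[k] + std::uint64_t{a_} * b_ * c_ * kD + a_) & kMask62;
        }
    }

    // Returns the next 56-bit output; every round produces three of them.
    std::uint64_t next() {
        switch (phase_) {
        case 0:
            ++i_;
            x_[a_] = (x_[a_] + x_[b_]) & kMask62;
            a_ = static_cast<std::uint32_t>((a_ + c_ + i_) & kN);
            phase_ = 1;
            return (x_[a_] + x_[b_]) & kMask56;
        case 1:
            b_ = (b_ + a_) & kN;
            phase_ = 2;
            return (x_[b_] + x_[c_]) & kMask56;
        default:
            c_ = (c_ + b_) & kN;
            phase_ = 0;
            return (x_[c_] + x_[a_]) & kMask56;
        }
    }

private:
    std::uint32_t a_, b_, c_;
    std::vector<std::uint64_t> x_;
    std::uint64_t i_ = 0;   // round counter
    int phase_ = 0;         // which of the three outputs comes next
};
```

Compared with the function-local statics of the C draft, the class lets several independently seeded generators coexist, which also makes it easy to check that two generators with the same seed produce the same stream.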
 
Note: regardless of what language you want to use, if this is a homework
assignment, many of the people who can give you the best help will
generally not give you that help until you've first made an attempt to
do it yourself. If your code doesn't work as intended, or maybe even
fails to compile, you can post your code here and people will be quite
happy to help you fix it - but they won't do your homework for you.
 
If this isn't homework, people who are competent to do so generally
expect to get paid for doing programming work for you. How much are you
willing to offer, and by what payment method?
ribeiroalvo@gmail.com: Oct 05 02:08PM -0700

On Thursday, 5 October 2017 at 20:49:25 UTC, James R. Kuyper wrote:
 
> If this isn't homework, people who are competent to do so generally
> expect to get paid for doing programming work for you. How much are you
> willing to offer, and by what payment method?
 
 
This is neither homework nor for commercial purposes.
See:
www.number.com/Melgo.html
From now on this subject will be dealt with in
https://groups.google.com/forum/#!forum/comp.lang.c
ribeiroalvo@gmail.com: Oct 05 02:10PM -0700

Correction.
http://www.number.com.pt/Melgo.html