Wednesday, February 19, 2020

Digest for comp.lang.c++@googlegroups.com - 16 updates in 4 topics

Lynn McGuire <lynnmcguire5@gmail.com>: Feb 19 03:35PM -0600

"Doing UTF-8 in Windows" by Mircea Neacsu
https://www.codeproject.com/Articles/5252037/Doing-UTF-8-in-Windows
 
"This is (yet another!) article on how to handle UTF-8 encoding on a
platform that still encourages the UTF-16 encoding. I am also providing
a small library for this purpose. The code works, it is clean, easy to
understand and small."
 
"This is an implementation of the solution advocated in the UTF-8
Everywhere manifesto. I would strongly encourage you to go read the
whole document to get indoctrinated ☺."
http://utf8everywhere.org/
 
We are finally moving our software to UTF-8. It is horrendous so far.
 
Lynn
Jorgen Grahn <grahn+nntp@snipabacken.se>: Feb 19 10:16PM

On Wed, 2020-02-19, Lynn McGuire wrote:
...
> We are finally moving our software to UTF-8. It is horrendous so far.
 
Can you expand on that? E.g. moving from what?
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Lynn McGuire <lynnmcguire5@gmail.com>: Feb 19 04:26PM -0600

On 2/19/2020 4:16 PM, Jorgen Grahn wrote:
>> We are finally moving our software to UTF-8. It is horrendous so far.
 
> Can you expand on that? E.g. moving from what?
 
> /Jorgen
 
ASCII. Our Windows user interface has 450,000 lines of code in C++.
Our Calculation Engine has 700,000 lines of F77 and 10,000+ lines of C
and C++.
 
Lynn
Jorgen Grahn <grahn+nntp@snipabacken.se>: Feb 19 10:48PM

On Wed, 2020-02-19, Lynn McGuire wrote:
 
>> Can you expand on that? E.g. moving from what?
 
>> /Jorgen
 
> ASCII.
 
Then you're already doing UTF-8! (Only half-joking.)
 
> Our Windows user interface has 450,000 lines of code in C++.
> Our Calculation Engine has 700,000 lines of F77 and 10,000+ lines of C
> and C++.
 
I guess this is much work or little, depending on how much that code cares
about the actual contents of strings.
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
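
Editorial sketch for the half-joke above: a string made only of 7-bit ASCII bytes is already valid UTF-8, byte for byte, so such data needs no conversion. The helper name `is_pure_ascii` is made up for illustration and does not come from the thread or from the linked library.

```cpp
#include <string>

// Hypothetical helper (not from the thread): returns true when every byte
// is 7-bit ASCII, in which case the string is already valid UTF-8 as-is.
bool is_pure_ascii(const std::string& s)
{
    for (unsigned char ch : s)
    {
        if (ch >= 0x80)   // high bit set: part of a multi-byte UTF-8 sequence
            return false;
    }
    return true;
}
```

A check like this could be used to find which existing strings or files actually need attention during a migration; anything that fails it has to be examined for its real encoding.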
Lynn McGuire <lynnmcguire5@gmail.com>: Feb 19 05:06PM -0600

On 2/19/2020 4:48 PM, Jorgen Grahn wrote:
 
> I guess this is much work or little, depending on how much that code cares
> about the actual contents of strings.
 
> /Jorgen
 
Anything that calls the Win32 API or opens a file ...
 
Lynn
Lynn McGuire <lynnmcguire5@gmail.com>: Feb 19 04:08PM -0600

"2020-02 Prague ISO C++ Committee Trip Report — 🎉 C++20 is Done! 🎉"

https://www.reddit.com/r/cpp/comments/f47x4o/202002_prague_iso_c_committee_trip_report_c20_is/
 
Wow, that is a lot of new stuff that I probably will not use.
 
Lynn
Jorgen Grahn <grahn+nntp@snipabacken.se>: Feb 19 10:21PM

On Wed, 2020-02-19, Lynn McGuire wrote:
> "2020-02 Prague ISO C++ Committee Trip Report — 🎉 C++20 is Done! 🎉"
 
> https://www.reddit.com/r/cpp/comments/f47x4o/202002_prague_iso_c_committee_trip_report_c20_is/
 
> Wow, that is a lot of new stuff that I probably will not use.
 
I wonder what happened to Stroustrup's request to slow down the
development? I read no C++-related news, so I am unaware of any
results.
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Daniel <danielaparker@gmail.com>: Feb 18 08:40PM -0800

I've been following the "Union type punning in C++" posts with some interest,
but not exactly sure of the conclusion.
 
Would the following be legal C++?
 
#include <string>
#include <new>
#include <cstdint>

// pod type
struct A
{
    uint8_t tag;
};

// non-pod
struct B
{
    A a;
    std::string s;

    B(const std::string& s)
        : a{1}, s(s)
    {
    }
};

struct C
{
    A a;
    double d;

    C(double d)
        : a{ 2 }, d(d)
    {
    }
};

class V
{
    union
    {
        A a;
        B b;
        C c;
    };

public:
    V(const std::string& s)
    {
        ::new(&b) B(s);
    }
    V(double d)
    {
        ::new(&c) C(d);
    }
    ~V()
    {
        switch (tag())
        {
        case 1:
            b.~B();
            break;
        case 2:
            c.~C();
            break;
        default:
            break;
        }
    }
    uint8_t tag() const
    {
        return a.tag;
    }
};
 
What if B and C were defined through inheritance from A instead, i.e.
 
struct B : A
{
    std::string s;

    B(const std::string& s)
        : A{1}, s(s)
    {
    }
};

struct C : A
{
    double d;

    C(double d)
        : A{ 2 }, d(d)
    {
    }
};
 
Thanks,
Daniel
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Feb 19 01:21AM -0500

Daniel wrote:
> return a.tag;
> }
> };
I think no: it is not allowed to inspect a.tag regardless of the constructor
used to construct V because `tag' is not a part of the common initial sequence
of either V::a and V::b or of V::a and V::c (I think the common initial sequence
is empty in both cases).
> {
 
> }
> };
I think no, for the same reason.

I think changing the implementation of tag() to either { return b.tag; } or
{ return c.tag; } would make the code valid (then of course having the V::a
member would be unnecessary).
 
> Thanks,
> Daniel
 
FWIW,
 
-Pavel
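
Editorial sketch of the "common initial sequence" rule referred to above, under the assumption that each union member is a standard-layout struct that declares the tag itself as its first member. The names `IntCase`, `DoubleCase`, `Payload` and `tag_of` are illustrative and not from the thread.

```cpp
#include <cstdint>
#include <type_traits>

// Both structs begin with a member of the same type, so `tag` forms their
// common initial sequence. (Illustrative names, not from the thread.)
struct IntCase    { std::uint8_t tag; std::uint64_t n; };
struct DoubleCase { std::uint8_t tag; double d; };

static_assert(std::is_standard_layout<IntCase>::value,
              "the rule only covers standard-layout structs");
static_assert(std::is_standard_layout<DoubleCase>::value,
              "the rule only covers standard-layout structs");

union Payload
{
    IntCase    i;
    DoubleCase d;
};

inline std::uint8_t tag_of(const Payload& p)
{
    // Reading i.tag is permitted even while d is the active member,
    // because tag lies in the common initial sequence of both structs.
    return p.i.tag;
}
```

By contrast, in the posted code the first member of A is a uint8_t while the first member of B and C is an A, so the member types do not line up and nothing is shared.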
Daniel <danielaparker@gmail.com>: Feb 19 06:09AM -0800

On Wednesday, February 19, 2020 at 1:21:36 AM UTC-5, Pavel wrote:
 
> > Would the following be legal C++?
 
> > snipped
 
> I think no
 
Would the following (non-union) alternative be legal C++?
 
#include <string>
#include <new>
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <type_traits>

enum class tag_type : uint8_t {b,c};

struct A
{
    tag_type tag;
};

struct B : A
{
    uint8_t extra;
    uint64_t n;

    B(uint64_t n)
        : A{tag_type::b}, n(n)
    {
    }
};

struct C : A
{
    double d;

    C(double d)
        : A{tag_type::c}, d(d)
    {
    }
};

class V
{
    static constexpr size_t data_size = std::max(sizeof(B), sizeof(C));
    static constexpr size_t data_align = std::max(alignof(B), alignof(C));

    typedef typename std::aligned_storage<data_size, data_align>::type data_t;

    data_t data_;

public:
    V(uint8_t n)
    {
        ::new(&data_) B(n);
    }
    V(double d)
    {
        ::new(&data_) C(d);
    }
    ~V()
    {
        switch (tag())
        {
        case tag_type::b:
            reinterpret_cast<const B*>(&data_)->~B();
            break;
        case tag_type::c:
            reinterpret_cast<const C*>(&data_)->~C();
            break;
        default:
            break;
        }
    }
    tag_type tag() const
    {
        return reinterpret_cast<const A*>(&data_)->tag;
    }
};
"Öö Tiib" <ootiib@hot.ee>: Feb 19 08:25AM -0800

On Wednesday, 19 February 2020 16:10:23 UTC+2, Daniel wrote:
 
> > > snipped
 
> > I think no
 
> Would the following (non-union) alternative be legal C++?
 
No. You have the wrong idea that base classes are better.

The resulting types are not standard-layout types, because
the requirement (in [class.prop]) "has all non-static data
members and bit-fields in the class and its base classes
first declared in the same class" is not fulfilled.

And so "common initial sequence" does not apply, and
"address of class object is same as address of its first
non-static data member object" does not apply either.
 
> return reinterpret_cast<const A*>(&data_)->tag;
> }
> };
 
I suggest getting rid of base classes with data members
and having A as the first data member. Then it is a proper
approach.

Additional note:
When you want to have standard library classes as data members of
your standard-layout classes, always static_assert in code
that they are standard layout:
 
static_assert(std::is_standard_layout<std::string>::value
, "This code needs std::string to be standard layout");
 
This is because the standard does not require it, and that can turn
a reinterpret_cast from a pointer to an object into a pointer to its
first member into undefined behavior.
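
Editorial sketch of the distinction drawn in this reply: spreading data members across a base and a derived class loses the standard-layout property, while holding the tag struct as the first data member keeps it. `Tag`, `Derived`, `Composed` and `first_member_tag` are illustrative names, not from the thread.

```cpp
#include <cstdint>
#include <type_traits>

struct Tag { std::uint8_t tag; };

// Data members in both the base and the derived class: not standard layout.
struct Derived : Tag { double d; };

// Same data, with the tag struct as the first member: standard layout.
struct Composed { Tag a; double d; };

static_assert(!std::is_standard_layout<Derived>::value,
              "members split across base and derived");
static_assert(std::is_standard_layout<Composed>::value,
              "all members declared in one class");

inline std::uint8_t first_member_tag(const Composed& c)
{
    // For a standard-layout class, a pointer to the object can be
    // converted to a pointer to its first non-static data member.
    return reinterpret_cast<const Tag*>(&c)->tag;
}
```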
Daniel <danielaparker@gmail.com>: Feb 19 09:08AM -0800

On Wednesday, February 19, 2020 at 11:26:22 AM UTC-5, Öö Tiib wrote:
 
> <snipped>
 
> No.
 
> <snipped>
 
That's very helpful, thanks.
 
For this exercise, the design goals are correctness, compactness (assume tens
of millions of V's), and encapsulation of the B's and C's, in that order. For
the last example, assuming eight byte alignment, it would be desirable to have
sizeof(V) == 16.
 
To that end, my next question is, would this be legal C++:
 
#include <string>
#include <new>
#include <algorithm>
#include <cstring>
#include <cstddef>
#include <cstdint>
#include <type_traits>

struct B
{
    uint8_t tag;
    uint8_t extra;
    uint64_t n;

    B(uint64_t n, uint8_t extra = 0)
        : tag{1}, extra(extra), n(n)
    {
    }
};

struct C
{
    uint8_t tag;
    double d;

    C(double d)
        : tag{2}, d(d)
    {
    }
};

class V
{
    static constexpr size_t data_size = std::max(sizeof(B), sizeof(C));
    static constexpr size_t data_align = std::max(alignof(B), alignof(C));

    typedef typename std::aligned_storage<data_size, data_align>::type data_t;

    data_t data_;

public:
    V(uint8_t n)
    {
        ::new(&data_) B(n);
    }
    V(double d)
    {
        ::new(&data_) C(d);
    }
    ~V()
    {
        switch (tag())
        {
        case 1:
            reinterpret_cast<const B*>(&data_)->~B();
            break;
        case 2:
            reinterpret_cast<const C*>(&data_)->~C();
            break;
        default:
            break;
        }
    }
    uint8_t tag() const
    {
        uint8_t t;
        std::memcpy(&t, &data_, sizeof(uint8_t));
        return t;
    }
};
 
 
> This is because the standard does not require it, and that can turn
> a reinterpret_cast from a pointer to an object into a pointer to its
> first member into undefined behavior.
 
Thanks for pointing that out. For my purposes, I do need to support
 
std::allocator_traits<Alloc>::pointer
 
including fancy pointers.
 
Daniel
"Öö Tiib" <ootiib@hot.ee>: Feb 19 12:22PM -0800

On Wednesday, 19 February 2020 19:09:20 UTC+2, Daniel wrote:
> {
> case 1:
> reinterpret_cast<const B*>(&data_)->~B();
 
Why const B* not B*?
 
> return t;
> }
> };
 
Yes, the classes are standard layout and therefore in tag()
you could just return *reinterpret_cast<uint8_t const*>(&data_);
as well.
 
 
 
> std::allocator_traits<Alloc>::pointer
 
> including fancy pointers.
 
> Daniel
 
Yes, when you are unsure whether a certain member of your B or C
is standard layout, check std::is_standard_layout
for the member or for the whole of B or C.
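
Editorial sketch pulling the thread's last variant together, with two deliberate swaps that are assumptions of this sketch rather than anything the posters wrote: an `alignas` byte array stands in for `std::aligned_storage` (which C++23 deprecates), and the constructors are `explicit` and take `std::uint64_t` and `double`. It assumes C++14 or later for `constexpr std::max`; a stricter C++17 version would likely also pass the storage pointer through `std::launder` before using it as a `B*` or `C*`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

struct B
{
    std::uint8_t  tag;
    std::uint8_t  extra;
    std::uint64_t n;
    explicit B(std::uint64_t n_, std::uint8_t extra_ = 0)
        : tag{1}, extra(extra_), n(n_) {}
};

struct C
{
    std::uint8_t tag;
    double       d;
    explicit C(double d_) : tag{2}, d(d_) {}
};

class V
{
    static constexpr std::size_t data_size  = std::max(sizeof(B), sizeof(C));
    static constexpr std::size_t data_align = std::max(alignof(B), alignof(C));

    // Raw storage large and aligned enough for either alternative.
    alignas(data_align) unsigned char data_[data_size];

public:
    explicit V(std::uint64_t n) { ::new (static_cast<void*>(data_)) B(n); }
    explicit V(double d)        { ::new (static_cast<void*>(data_)) C(d); }

    // Copying is left out to keep the sketch short.
    V(const V&) = delete;
    V& operator=(const V&) = delete;

    ~V()
    {
        switch (tag())
        {
        case 1: reinterpret_cast<B*>(data_)->~B(); break;
        case 2: reinterpret_cast<C*>(data_)->~C(); break;
        default: break;
        }
    }

    std::uint8_t tag() const
    {
        // tag is the first member of both standard-layout structs, so it
        // sits at offset 0; copy that byte out of the storage.
        std::uint8_t t;
        std::memcpy(&t, data_, sizeof t);
        return t;
    }
};
```

On a typical 64-bit platform both B and C occupy 16 bytes, so this layout should meet the sizeof(V) == 16 goal stated earlier in the thread, though that depends on the platform's alignment rules.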
Daniel <danielaparker@gmail.com>: Feb 19 12:59PM -0800

On Wednesday, February 19, 2020 at 3:22:52 PM UTC-5, Öö Tiib wrote:
> > reinterpret_cast<const B*>(&data_)->~B();
> > break;
 
> Why const B* not B*?
 
No reason. Copied that piece from code that accesses the object.
 
 
> Yes, the classes are standard layout and therefore in tag()
> you could just return *reinterpret_cast<uint8_t const*>(&data_);
> as well.
 
Thanks! very much appreciate your feedback.
 
Daniel
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Feb 19 12:28AM -0500

Bonita Montero wrote:
>> your "nice thing" above. Readable code separates concerns and your "nice
>> thing" does the opposite.
 
> The code I've shown doesn't have a bad readability.
I have never said it had bad readability. I said your "nice thing" serving no
purpose was an obfuscation pretending to be simple while not being so. It would
be a bad idea (now I am saying "bad") to use it as an example to do anything useful.

Your original "transaction" code was rather readable, although it had an
unnecessary "else" clause that made it more complex than necessary.

While fixing the bug you worsened the readability from ok to rather poor (again
I am not calling it "bad").

More readable code for "transaction", after the bug fix and preserving your
preferences for indentation and braces, but not for new lines or for magic
numbers over symbolic constants, could be, for example:
 
#include <immintrin.h> // _xbegin, _xend, _XBEGIN_STARTED, _XABORT_*

enum TsxStatusCategory: int {
    TSX_ABORTED_DONT_RETRY = -1,
    TSX_ABORTED_CAN_RETRY = 0,
    TSX_COMMITTED = 1
};

template<typename L>
inline
TsxStatusCategory
doTsxTransaction( L &lambda ) // function name should be a verb [clause]
{
    unsigned code = _xbegin(); // unsinged is not unsigned BTW

    if( code == _XBEGIN_STARTED )
    {
        lambda();
        _xend();
        return TSX_COMMITTED;
    } // `else' served no purpose here other than obfuscation

    if (code & _XABORT_EXPLICIT)
        return TSX_ABORTED_DONT_RETRY;

    if (code & _XABORT_RETRY)
        return TSX_ABORTED_CAN_RETRY;

    return TSX_ABORTED_DONT_RETRY;
}
 
 
> Yours has a bad readability.
Yes, it does and, as I said earlier, that is its point.
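
Editorial sketch of how a caller might consume the three status categories from the post above. The driver `runWithRetries` and its `maxAttempts` parameter are hypothetical and not from the thread; `attempt` stands for any callable that returns a `TsxStatusCategory`, for example a small wrapper around doTsxTransaction, so the sketch itself needs no TSX hardware.

```cpp
enum TsxStatusCategory : int {
    TSX_ABORTED_DONT_RETRY = -1,
    TSX_ABORTED_CAN_RETRY  = 0,
    TSX_COMMITTED          = 1
};

// Hypothetical driver (not from the thread): calls `attempt` until it
// commits, reports a non-retryable abort, or the attempt budget runs out.
// Returns true only when a transaction committed.
template<typename Attempt>
bool runWithRetries( Attempt attempt, int maxAttempts )
{
    for( int i = 0; i < maxAttempts; ++i )
    {
        switch( attempt() )
        {
        case TSX_COMMITTED:
            return true;
        case TSX_ABORTED_DONT_RETRY:
            return false;
        case TSX_ABORTED_CAN_RETRY:
            break; // leaves the switch; the loop tries again
        }
    }
    return false; // budget exhausted; a caller would likely fall back to a lock
}
```

Keeping the retry policy outside the transaction function is one way to separate the concerns argued about in this exchange: the transaction reports what happened, and the caller decides what to do next.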
Bonita Montero <Bonita.Montero@gmail.com>: Feb 19 07:28AM +0100

> I have never said it had bad readability. I said your "nice thing" serving
> no purpose was an obfuscation pretending to be simple while not being so. ...
Whoever finds the contradiction may keep it.
 
> return TSX_ABORTED_CAN_RETRY;
 
> return TSX_ABORTED_DONT_RETRY;
> }
 
That's a matter of taste.