Wednesday, October 18, 2017

Digest for comp.lang.c++@googlegroups.com - 25 updates in 3 topics

asetofsymbols@gmail.com: Oct 17 08:17PM -0700

Some time ago I heard someone say that for a
boolean variable it is better to be an int...
I agree, possibly it can be unsigned int;
for me can not be a bit nor one char
because conversion until int
C language has to follow
 
A boolean has some practical use as a single bit only when it is in an array of booleans...
Paavo Helde <myfirstname@osa.pri.ee>: Oct 18 08:22AM +0300

> for me can not be a bit nor one char
> because conversion until int
> C language has to follow
 
Wow, a poem!
Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 18 05:28AM

On Tue, 2017-10-17, Rick C. Hodgin wrote:
 
> value = 10;
> std::cout << ((flag) ? "True" : "False") << std::endl;
 
> Is there a proper term for populating value (flag's internal value)?
 
I think the problem is you're trying for too much abstraction.
A "bool with metadata" doesn't make sense to me, because bools
don't have metadata.
 
While if you take a more concrete example, an errno value (or an error
code in general) is in some sense a boolean with (in case of an error)
additional data.
 
> And does anyone know if C or C++ compilers will ever strip the value
> down to its true/false state when passing the value to a function?
 
I'm ignoring your union above, but in C++ with a class, you as the
author control conversion to bool. (Before C++11 it was a bit tricky
IIRC, if you wanted nice syntax.)
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
David Brown <david.brown@hesbynett.no>: Oct 18 09:02AM +0200

On 18/10/17 07:28, Jorgen Grahn wrote:
 
> I'm ignoring your union above, but in C++ with a class, you as the
> author control conversion to bool. (Before C++11 it was a bit tricky
> IIRC, if you wanted nice syntax.)
 
"A bit tricky" ? The "safe boolean idiom" was a fine example of just
how hideous C++ could be in order to do something so useful and so
simple to describe. The introduction of explicit conversions in C++11
is then a fine example of how the C++ committee pay attention to the
real world C++ challenges, and add features that make the language
significantly better. Safe boolean conversions are now simple to write,
and very useful.
David Brown <david.brown@hesbynett.no>: Oct 18 09:08AM +0200

On 17/10/17 22:18, Rick C. Hodgin wrote:
 
> Is there a proper term for populating value (flag's internal value)?
> And does anyone know if C or C++ compilers will ever strip the value
> down to its true/false state when passing the value to a function?
 
If you are adding metadata to a bool, it is not a bool any more. You
should consider it from the other side - think of a type holding some
information, that you will sometimes want to view or test as a simple
boolean flag.
 
There is a lot of use in having something that holds data and can be
considered as "true" or "false", or "valid" or "invalid". This concept
goes back to the earliest days of C, using a null pointer 0 to indicate
an invalid pointer - thus letting you test pointer validity with "if (p)
{ ... }". In C, you basically take a pointer or an integer type and use
that - any time you treat it in a boolean context (like in "if", or when
assigning to a bool) then 0 is "false", non-0 is "true".
 
With C++, you make a class with an explicit conversion to bool. That
lets you have more freedom in picking which values (of the data members)
are considered "false", and which are considered "true".
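
As a rough sketch of that approach (the `Status` type, its members and `check_positive` are made up for illustration and are not taken from Rick's code): a small class carrying an error code that can be tested like a flag through an explicit conversion to bool, which C++11 and later allow.

```cpp
// Hypothetical type: holds an error code and tests as "true" when the
// code is 0, i.e. when there is no error.
class Status {
public:
    constexpr Status() : code_(0) {}
    constexpr explicit Status(int code) : code_(code) {}

    // explicit: works in if/while/!/&&, but does not convert silently
    // to bool or int in ordinary expressions.
    constexpr explicit operator bool() const { return code_ == 0; }

    constexpr int code() const { return code_; }

private:
    int code_;
};

// Hypothetical function: succeeds for non-negative input, otherwise
// reports an arbitrary error code (22 here).
constexpr Status check_positive(int n) {
    return n >= 0 ? Status() : Status(22);
}
```

With this, `if (check_positive(n)) { ... }` compiles, while `int x = check_positive(n);` does not, so the extra data cannot leak out through an accidental conversion.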
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Oct 18 12:38AM -0700

Is there a proper name for exploiting otherwise unused data space,
such as in the example I gave, which does not alter program behavior,
but does introduce additional data conveyance mechanisms atop legacy
code bases (for example)?
 
If you don't know the proper name, don't answer. I am looking to know
if there is a proper name for this ability.
 
I do not know the proper name, or even if one exists in computer science,
hence my asking. I presume it does exist.
 
Thank you,
Rick C. Hodgin
David Brown <david.brown@hesbynett.no>: Oct 18 11:03AM +0200

On 18/10/17 09:38, Rick C. Hodgin wrote:
> such as in the example I gave, which does not alter program behavior,
> but does introduce additional data conveyance mechanisms atop legacy
> code bases (for example)?
 
I don't know of any name - perhaps "tagging" of some sort?
 
I am sceptical of the idea of applying this to unchanged existing code.
Either it /will/ change program behaviour to at least some extent, or
it is going to be unreliable. For example, if you try to use values
other than 0 or 1 in place of a boolean, to convey additional
information, then there are three things that can go badly wrong. One
is that any use of "bool", "!!", or similar constructs, will wipe the
extra information. Another is that code that expects the value to be
either 0 or 1, will break. Finally, the existing code could already be
using the extra bits for its own purposes in exactly the same way.
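
A small sketch of the first point (the function name `as_int` is made up for the example): once a value passes through a real `bool` or through `!!`, only 0 or 1 is left, so a 10 stored "inside" a flag does not survive.

```cpp
// The parameter is a genuine bool, so the argument is converted at the
// call: any non-zero value becomes true, and true converts back to 1.
constexpr int as_int(bool flag) { return flag; }

static_assert(as_int(10) == 1, "10 arrives as true; the 10 itself is gone");
static_assert(static_cast<int>(static_cast<bool>(10)) == 1, "bool(10) is 1");
static_assert(static_cast<int>(!!10) == 1, "!!10 is 1");
static_assert(as_int(0) == 0, "0 stays false");
```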
 
It is a different matter if you are talking about a C++ type with
careful access control, and you change the type itself - if all access
to the data members, copying, etc., happens via methods that /you/
control, you can add more information and safely re-compile the code
that uses it. But for low-level types, you would have to be sure that
the legacy code uses them safely.
 
It is certainly possible to make use of otherwise unused space in types
- look up "tagged pointer" for an example. But that is done in
cooperation with code that uses the pointer, not as an add-on to
existing code.
 
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Oct 18 04:44AM -0700

What is it called when padded data is used to convey information?
Like on BMP files. Every row must be 32-bit aligned, but RGB()
color data can be 24-bits, leaving up to 3 bytes per row available
that is otherwise wasted data space.
 
If the use of that data has a name, then the name for this
type of other data use I describe is probably similar.
 
Thank you,
Rick C. Hodgin
David Brown <david.brown@hesbynett.no>: Oct 18 02:29PM +0200

On 18/10/17 13:44, Rick C. Hodgin wrote:
> What is it called when padded data is used to convey information?
 
My immediate instinct is to call it "a mistake". I am still not sure
what you are hoping to do here, so it would be wrong of me to be too
categorical - but please tread carefully. You risk making something
that will work during your testing, but break with other code because
either you or the original code author are making assumptions.
 
> that is otherwise wasted data space.
 
> If the use of that data has a name, then the name for this
> type of other data use I describe is probably similar.
 
Are you thinking of this?
<https://en.wikipedia.org/wiki/Steganography>
 
That is more about putting information in less significant bits
(effectively adding noise) rather than unused bits.
 
(In a BMP file, there will only be padding if the image width is not a
multiple of 4. And any manipulation of the picture data may change or
drop data stored in the padding.)
 
If you are really just adding information into previously unused spaces,
in a safe manner, then you are simply extending the existing format.
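
For reference, a sketch of the row arithmetic behind the BMP remark, assuming uncompressed rows padded to a multiple of 4 bytes (the function names are made up for the example):

```cpp
#include <cstdint>

// Bytes of actual pixel data in one row, before padding.
constexpr std::uint32_t row_bytes(std::uint32_t width, std::uint32_t bits_per_pixel) {
    return (width * bits_per_pixel + 7) / 8;
}

// Size of one stored row: pixel data rounded up to a multiple of 4 bytes.
constexpr std::uint32_t row_stride(std::uint32_t width, std::uint32_t bits_per_pixel) {
    return (width * bits_per_pixel + 31) / 32 * 4;
}

// Padding bytes at the end of each row (0 to 3).
constexpr std::uint32_t row_padding(std::uint32_t width, std::uint32_t bits_per_pixel) {
    return row_stride(width, bits_per_pixel) - row_bytes(width, bits_per_pixel);
}
```

For 24-bit pixels this gives 3 padding bytes at width 3, none at width 4 and 1 at width 5; 32-bit pixels never need padding.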
scott@slp53.sl.home (Scott Lurndal): Oct 18 12:30PM

>such as in the example I gave, which does not alter program behavior,
>but does introduce additional data conveyance mechanisms atop legacy
>code bases (for example)?
 
The proper name is "foolish". Many attempts in the past to
"exploit otherwise unused data space" have never lasted. Many
systems used to use the unused high bits of a pointer, for example,
which led to problems when VA/PA sizes changed on processors,
and to crappy code.
 
Other attempts (e.g. storing data in unused portions of the
instruction stream - like the Burroughs B200) have their own
issues.
Bo Persson <bop@gmb.dk>: Oct 18 02:44PM +0200

On 17/10/17 22:18, Rick C. Hodgin wrote:
 
> Is there a proper term for populating value (flag's internal value)?
> And does anyone know if C or C++ compilers will ever strip the value
> down to its true/false state when passing the value to a function?
 
It is not unusual to see compilers only test the low bit of a boolean,
for example using BT [flag],1 on an x86 system.
 
So would fail for even values, like 10.
 
This sounds like holding one foot over the edge of a cliff, hoping not
to fall.
 
 
 
Bo Persson
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Oct 18 05:59AM -0700

On Wednesday, October 18, 2017 at 8:30:32 AM UTC-4, Scott Lurndal wrote:
> >but does introduce additional data conveyance mechanisms atop legacy
> >code bases (for example)?
 
> The proper name is "foolish"...
 
I'm not asking about the ethical nature, or security or validity of
using this technique. I'm simply asking what it's called (if there's
a technical name for it, if it's something noted (perhaps by computer
virus researchers) as something that is known, and is employed to
convey more information than the original intent in the data).
 
I've searched and I don't know what to search for to find the name,
so I thought I would ask other developers who may know.
 
If you don't know that's fine. All I'm asking for is the name.
 
Thank you,
Rick C. Hodgin
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Oct 18 06:01AM -0700

On Wednesday, October 18, 2017 at 8:44:36 AM UTC-4, Bo Persson wrote:
> It is not unusual to see compilers only test the low bit of a boolean,
> for example using BT [flag],1 on an x86 system.
 
> So would fail for even values, like 10.
 
This is useful information, Bo. Appreciated.
 
Thank you,
Rick C. Hodgin
David Brown <david.brown@hesbynett.no>: Oct 18 03:19PM +0200

On 18/10/17 14:44, Bo Persson wrote:
 
> It is not unusual to see compilers only test the low bit of a boolean,
> for example using BT [flag],1 on an x86 system.
 
> So would fail for even values, like 10.
 
I remember helping someone with code that was something equivalent to:
 
extern void readFromEEprom(uint16_t eepromAddress, uint16_t count,
                           void * destination);

int foo(void) {
    bool b;
    readFromEEprom(1000, 1, &b);
    if (b) {
        return 1;
    } else {
        return 2;
    }
}
 
He was most surprised that this function was returning values other than
1 or 2 - apparently his bool was neither true nor false. It turned out
that the value stored in the eeprom was something like 10, rather than 0
or 1. The compiler had optimised the conditional into the equivalent of
"return 2 - b;" - which is perfectly correct as a bool can only hold
either 0 or 1, as long as you have used it in a proper manner.
 
The moral here is don't lie to your compiler - it will get its revenge.
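
One way to stay out of that trap, sketched under the assumption that the stored flag occupies a single byte (`fake_eeprom` and `read_from_eeprom` below are stand-ins for the real driver, not its actual interface): read the byte into an integer type and let a comparison produce the bool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for the device contents; byte 0 holds a "flag" stored as 10.
const std::uint8_t fake_eeprom[4] = {10, 0, 1, 255};

// Stand-in for the real read routine: copies raw bytes to destination.
void read_from_eeprom(std::uint16_t address, std::uint16_t count, void* destination) {
    assert(static_cast<std::size_t>(address) + count <= sizeof fake_eeprom);
    std::memcpy(destination, fake_eeprom + address, count);
}

// Read the byte as an integer; the comparison then yields a genuine
// bool, so the object of type bool never holds anything but 0 or 1.
bool read_flag(std::uint16_t address) {
    std::uint8_t raw = 0;
    read_from_eeprom(address, 1, &raw);
    return raw != 0;
}

int foo(std::uint16_t address) {
    return read_flag(address) ? 1 : 2;
}
```

Here the compiler may still turn the conditional into arithmetic on the bool, but that is harmless because the bool was produced by a comparison rather than by raw bytes.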
 
 
> This sounds like holding one foot over the edge of a cliff, hoping not
> to fall.
 
Indeed.
 
(Further comments from Rick suggest he simply wants to know about this,
perhaps to be sure of avoiding it, rather than because he wants to
balance on a cliff edge. I still can't help him with any names to aid
his search.)
"Öö Tiib" <ootiib@hot.ee>: Oct 18 06:53AM -0700

On Wednesday, 18 October 2017 15:59:26 UTC+3, Rick C. Hodgin wrote:
> a technical name for it, if it's something noted (perhaps by computer
> virus researchers) as something that is known, and is employed to
> convey more information than the original intent in the data).
 
It is called "bit packing" and more generally "data compression".
The exact technique that you describe has additionally "foolish"
before it and "in the wrong way" after it.
David Brown <david.brown@hesbynett.no>: Oct 18 04:07PM +0200

On 18/10/17 15:53, Öö Tiib wrote:
 
> It is called "bit packing" and more generally "data compression".
> The exact technique that you describe has additionally "foolish"
> before it and "in the wrong way" after it.
 
It might have other adjectives attached, like "deceitful" or "secret" -
as Rick suggests, it could be used in connection with malware or other
hidden information (watermarking is a related technology). Presumably
if that's what Rick is thinking of, it is in terms of spotting the
malware or preventing it.
scott@slp53.sl.home (Scott Lurndal): Oct 18 02:09PM

>perhaps to be sure of avoiding it, rather than because he wants to
>balance on a cliff edge. I still can't help him with any names to aid
>his search.)
 
ARM calls it TBI (top byte ignored) when applied to pointers. A processor
state bit can be set by the OS to cause the high byte of 64-bit pointers
to be ignored by the hardware. This does, however, limit the virtual
address space to only 56 bits (to be fair, it's currently limited by the
architecture (v8.2) to 52-bits).
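
Without such hardware support the masking has to be done in software before every dereference. A sketch of that, assuming 64-bit pointers whose top byte is zero for ordinary user-space addresses (common on current x86-64 and AArch64 systems, but not something the language guarantees); all names are made up for the example:

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(std::uintptr_t) == 8, "sketch assumes 64-bit pointers");

constexpr int kTagShift = 56;
constexpr std::uintptr_t kAddressMask = (std::uintptr_t{1} << kTagShift) - 1;

// Store an 8-bit tag in the top byte of the pointer's integer value.
// Assumption: the top byte of the original address is zero.
inline std::uintptr_t set_top_byte(const void* p, std::uint8_t tag) {
    const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    assert((bits & ~kAddressMask) == 0);  // top byte must be unused
    return bits | (static_cast<std::uintptr_t>(tag) << kTagShift);
}

// Recover the tag.
inline std::uint8_t top_byte(std::uintptr_t tagged) {
    return static_cast<std::uint8_t>(tagged >> kTagShift);
}

// Clear the tag again; this must happen before the pointer is used.
template <class T>
T* strip_top_byte(std::uintptr_t tagged) {
    return reinterpret_cast<T*>(tagged & kAddressMask);
}
```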
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Oct 18 07:24AM -0700

On Wednesday, October 18, 2017 at 10:09:58 AM UTC-4, Scott Lurndal wrote:
> to be ignored by the hardware. This does, however, limit the virtual
> address space to only 56 bits (to be fair, it's currently limited by the
> architecture (v8.2) to 52-bits).
 
I think it has to be something like data co-opting, or non-design
reclamation, or something along those lines.
 
Thank you,
Rick C. Hodgin
"James R. Kuyper" <jameskuyper@verizon.net>: Oct 18 11:38AM -0400

On 2017-10-18 09:19, David Brown wrote:
>>> };
 
>>> flag = true; // value is 0x01
>>> flag = false; // value is 0x00
 
Unless bool is your own typedef, you're making way too many unjustified
assumptions.
Assuming that <stdbool.h> has been #included, so bool is a macro that
expands to _Bool, the only thing that the standard guarantees about
flag is that it has at least one value bit. There's no upper limit on
the number of padding or value bits it may contain. sizeof(bool) is
permitted to be > 1 (it might be the same as sizeof(int_fast8_t)). If it
contains only one value bit, there's no guaranteeing which one that is.
It might be the same bit that signed char uses as a sign bit.
 
Think for a minute about how your code might fail if sizeof(flag) >
sizeof(value). Consider how it might fail if the only value bit of
flag corresponds to any bit other than the lowest order bit of value.
Then re-design.
 
> perhaps to be sure of avoiding it, rather than because he wants to
> balance on a cliff edge. I still can't help him with any names to aid
> his search.)
 
I get the impression he's having trouble seeing the forest because
there's too many trees in the way. He should simply use 'value', and
take advantage of the fact that for the purpose of conditional
expressions, 0 is treated as false, and non-zero is treated as true. By
declaring it simply and correctly as a char, rather than a union with
bool, he avoids the problems that Bo Persson and you have pointed out.
Those are valid optimizations only because of the unique properties of _Bool.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Oct 18 04:46PM +0100

On Wed, 18 Oct 2017 14:09:46 GMT
> 64-bit pointers to be ignored by the hardware. This does, however,
> limit the virtual address space to only 56 bits (to be fair, it's
> currently limited by the architecture (v8.2) to 52-bits).
 
The bottom bits (rather than top bits) of pointers are sometimes used in
dynamically typed language implementations to hold a limited amount of
type information. If the allocator allocates memory aligned on the
pointer size, with 64-bit pointers the bottom 3 bits hold 000 and are
available for use; likewise the bottom 2 bits for 32-bit pointers[1].
This seems to go by the name of pointer tagging.
 
Chris
 
[1] glibc malloc() and cognates reputedly always align addresses on
8-byte boundaries even on 32-bit systems, so allowing the bottom 3 bits
to be used in that case.
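
A minimal sketch of low-bit tagging as described above, assuming that 8-byte alignment shows up as three zero low bits in the pointer's integer value (true on mainstream platforms, though the pointer-to-integer mapping is implementation-defined); `Node` and the helper names are made up for the example:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical node type; alignas(8) keeps the low 3 address bits zero.
struct alignas(8) Node {
    int payload;
};

constexpr std::uintptr_t kTagMask = 0x7;

// Combine an aligned pointer and a 3-bit tag into one integer.
inline std::uintptr_t pack(Node* p, unsigned tag) {
    const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    assert((bits & kTagMask) == 0);  // relies on 8-byte alignment
    assert(tag <= kTagMask);         // only 3 bits are available
    return bits | tag;
}

// Extract the tag from the low 3 bits.
inline unsigned tag_of(std::uintptr_t tagged) {
    return static_cast<unsigned>(tagged & kTagMask);
}

// Mask the tag off to get the usable pointer back.
inline Node* pointer_of(std::uintptr_t tagged) {
    return reinterpret_cast<Node*>(tagged & ~kTagMask);
}
```

This only works because every piece of code that touches the value goes through `pack`, `tag_of` and `pointer_of`; it is not something that can be layered over code that dereferences the pointer directly.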
Gareth Owen <gwowen@gmail.com>: Oct 18 07:32PM +0100

> systems used to use the unused high bits of a pointer, for example,
> which led to problems when VA/PA sizes changed on processors,
> and to crappy code.
 
Counterpoint: Chandler Carruth on how they use every single bit of composite
objects in Clang/LLVM https://www.youtube.com/watch?v=vElZc6zSIXM (key
bits from 22:00 ish)
"Rick C. Hodgin" <rick.c.hodgin@gmail.com>: Oct 18 11:37AM -0700

On Wednesday, October 18, 2017 at 2:33:01 PM UTC-4, gwowen wrote:
> Counterpoint: Chandler Carruth on how they use every single bit of composite
> objects in Clang/LLVM https://www.youtube.com/watch?v=vElZc6zSIXM (key
> bits from 22:00 ish)
 
You can embed timestamps in YouTube URLs using &t=XxhYymZzs:
 
Jump right to 22:00 using this url:
https://www.youtube.com/watch?v=vElZc6zSIXM&t=22m0s
 
Thank you,
Rick C. Hodgin
Gareth Owen <gwowen@gmail.com>: Oct 18 07:47PM +0100


> You can embed timestamps in YouTube URLs using &t=XxhYymZzs:
 
> Jump right to 22:00 using this url:
> https://www.youtube.com/watch?v=vElZc6zSIXM&t=22m0s
 
Thank you
JiiPee <no@notvalid.com>: Oct 18 07:34PM +0100

I've wondered about this many times: when you have to "zero" an object
back to its initial values (the same values as when the object is
created), which one is normally better: having an initialize/reset
function, or destroying the object and just creating a new one (which
would then automatically have everything initialized by the constructor)?
Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 18 05:37AM

On Tue, 2017-10-17, Christopher Pisz wrote:
> On Friday, October 13, 2017 at 1:30:27 PM UTC-5, Stuart Redmann wrote:
...
> people looked over the comparator and none saw the problem.
 
> Let this be a lesson learned. Unit Test that (A < B) && (B < A)
> should never be true.
 
Interesting! I would never have imagined (and have never seen) that a
flawed T < T could cause a crash, but it's arguably better than
silently producing incorrect results.
 
You'd have to be careful to produce a reasonably complete unit test
though ... probably as careful as someone writing a correct T < T
in the first place.
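
A sketch of the kind of check being discussed, using a made-up `Version` type with one correct and one deliberately flawed comparator (none of this is from the original code). It only tests that `a < a` is never true and that `a < b` and `b < a` are never both true over a set of samples; it does not check transitivity.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record type used only for this example.
struct Version {
    int release;
    int patch;
};

// A correct ordering: compare release first, then patch.
inline bool good_less(const Version& a, const Version& b) {
    return a.release != b.release ? a.release < b.release : a.patch < b.patch;
}

// A flawed ordering: can report a < b and b < a at the same time.
inline bool bad_less(const Version& a, const Version& b) {
    return a.release < b.release || a.patch < b.patch;
}

// Returns true if, over the given samples, less(a, a) is never true and
// less(a, b) and less(b, a) are never both true.
template <class T, class Less>
bool is_asymmetric(const std::vector<T>& samples, Less less) {
    for (const T& a : samples) {
        if (less(a, a)) return false;
        for (const T& b : samples) {
            if (less(a, b) && less(b, a)) return false;
        }
    }
    return true;
}
```

The samples matter: the flaw in `bad_less` only shows up for a pair such as {1, 2} and {2, 1}, which is the "reasonably complete" part of the problem.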
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .