soft and program: Digest for comp.lang.c++@googlegroups.com

comp.lang.c++@googlegroups.com


Stored in Big Endian. . . . . . inside Little Endian - 15 Updates

Stored in Big Endian. . . . . . inside Little Endian

Frederick Virchanza Gotham <cauldwell.thomas@gmail.com>: Feb 12 03:40PM -0800

So as you all know I'm combining three programs into one to make a program that can connect to any SSH server and use it as a VPN (without admin rights on the remote server). I actually already have it working and so now I'm just cleaning it up.

Anyway, at one point I had to deal with the IP address held inside a "struct sockaddr_in". This struct has a member called "sin_addr", which has a member called "s_addr". The 32-Bit unsigned number that I'm looking for is inside the integer variable "s_addr". I checked the Linux manual and it says that 's_addr' is always in Network Byte Order (i.e. Big Endian), even on modern desktop PCs running MS-Windows, which are all Little Endian.

So first I wrote code like this:

char unsigned const *p = static_cast<char unsigned const*>(static_cast<void const*>(&s_addr));
cout << static_cast<unsigned>(p[0]) << "." << static_cast<unsigned>(p[1]) << "." << static_cast<unsigned>(p[2]) << "." << static_cast<unsigned>(p[3]);

But before clicking Save, I checked the type of "s_addr". I thought it might be uint32_t, but actually it's long unsigned. So then I stopped and thought... hmm... on a few Linux systems this could be a 64-Bit type, and so I'll change my code to:

static_assert(CHAR_BIT==8u, "Can't deal with 16-Bit char's or whatever size they are");
unsigned constexpr n = sizeof(s_addr) - 4u;
cout << static_cast<unsigned>(p[n+0]) << "." << static_cast<unsigned>(p[n+1]) << "." << static_cast<unsigned>(p[n+2]) << "." << static_cast<unsigned>(p[n+3]);

I saved this code, compiled, linked and tested it, and the IP address came out as "0.0.0.0". So then I tried my original code that didn't account for 's_addr' not being a uint32_t, and it came out as "192.168.1.1".

So on Linux on a 64-Bit x86 CPU, which is Little Endian, the 'long' type is 64-Bit, and it stores an IP address in 's_addr' as bytes as follows:

[192][168][1][1][0][0][0][0]

So they've sort of stored a 32-Bit number as Big Endian inside the lower 32 Bits of a 64-Bit Little Endian unsigned long. I suppose the cool thing about this is that they can do:

uint32_t my_ip = s_addr;

And now 'my_ip' will have the actual IP address instead of the four zeroed-out bytes.

Still though it's a bit mad.
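A small sketch of the layout described above. It models the 8-byte case with a `uint64_t` rather than the real `s_addr` (on glibc, `s_addr` is declared as `in_addr_t`, a 32-bit type, so the wide field here is an assumption), and the helper names are hypothetical. Widening a value only adds high-order zero bytes; on a little-endian host those sit at the higher addresses, which is why the four octets stay at the front.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model: four octets already in network (wire) order are loaded
// into a 32-bit integer without any reordering, the integer is widened to an
// 8-byte unit standing in for a 64-bit s_addr, and that unit's bytes are
// returned in memory order.
std::array<unsigned char, 8> widen_network_octets(std::array<unsigned char, 4> const &octets)
{
    std::uint32_t narrow;
    std::memcpy(&narrow, octets.data(), sizeof narrow);  // bytes copied as-is

    std::uint64_t const wide = narrow;  // widening adds high-order zero bytes

    std::array<unsigned char, 8> bytes;
    std::memcpy(bytes.data(), &wide, sizeof wide);
    return bytes;
}

// Narrowing back, as in "uint32_t my_ip = s_addr;", keeps the low 32 bits of
// the value, which are exactly the original octets on any host.
std::array<unsigned char, 4> narrow_to_octets(std::array<unsigned char, 8> const &bytes)
{
    std::uint64_t wide;
    std::memcpy(&wide, bytes.data(), sizeof wide);
    std::uint32_t const my_ip = static_cast<std::uint32_t>(wide);

    std::array<unsigned char, 4> octets;
    std::memcpy(octets.data(), &my_ip, sizeof my_ip);
    return octets;
}

// True when the lowest-addressed byte of an integer is its least significant one.
bool host_is_little_endian()
{
    std::uint16_t const probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

With {192, 168, 1, 1} as input, a little-endian host gives [192][168][1][1][0][0][0][0] and a big-endian host gives [0][0][0][0][192][168][1][1]; `narrow_to_octets` returns {192, 168, 1, 1} on both.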

red floyd <no.spam.here@its.invalid>: Feb 12 09:22PM -0800

On 2/12/2023 3:40 PM, Frederick Virchanza Gotham wrote:

> uint32_t my_ip = s_addr;

> And now 'my_ip' will have the actual IP address instead of the four zeroed-out bytes.

> Still though it's a bit mad.

htonl() to go from native to network format, or ntohl() to go from
network to native.
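A minimal sketch of that suggestion, assuming a POSIX system with `<arpa/inet.h>`. The helper names `make_ipv4` and `dotted_quad` are illustrative only.

```cpp
#include <arpa/inet.h>   // htonl, htons, ntohl
#include <netinet/in.h>  // sockaddr_in
#include <sys/socket.h>  // AF_INET
#include <cassert>
#include <cstdint>
#include <string>

// Build an IPv4 socket address from host-order values (native -> network).
sockaddr_in make_ipv4(std::uint32_t host_order_addr, std::uint16_t host_order_port)
{
    sockaddr_in sa{};
    sa.sin_family = AF_INET;
    sa.sin_port = htons(host_order_port);
    sa.sin_addr.s_addr = htonl(host_order_addr);
    return sa;
}

// Read the address back (network -> native); shifts then pick out the octets,
// most significant (first on the wire) first, independent of host byte order.
std::string dotted_quad(sockaddr_in const &sa)
{
    std::uint32_t const host = ntohl(sa.sin_addr.s_addr);
    return std::to_string((host >> 24) & 0xFFu) + '.' +
           std::to_string((host >> 16) & 0xFFu) + '.' +
           std::to_string((host >> 8) & 0xFFu) + '.' +
           std::to_string(host & 0xFFu);
}
```

For example, `dotted_quad(make_ipv4(0xC0A80101u, 22))` yields "192.168.1.1" on both little- and big-endian hosts (22 is just an example port).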

Paavo Helde <eesnimi@osa.pri.ee>: Feb 13 09:53AM +0200

13.02.2023 01:40 Frederick Virchanza Gotham wrote:

> So as you all know I'm combining three programs into one to make a program that can connect to any SSH server and use it as a VPN (without admin rights on the remote server). I actually already have it working and so now I'm just cleaning it up.

> Anyway, at one point I had to deal with the IP address held inside a "struct sockaddr_in". This struct has a member called "sin_addr", which has a member called "s_addr". The 32-Bit unsigned number that I'm looking for is inside the integer variable "s_addr". I checked the Linux manual and it says that 's_addr' is always in Network Byte Order (i.e. Big Endian), even on modern desktop PCs running MS-Windows, which are all Little Endian.

So read a little bit more documentation and use the appropriate
conversion functions as red floyd suggested.

Also, do not forget to also support IPv6, this is a must nowadays.
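One way to cover both families, sketched under the assumption of a POSIX system: keep the peer address in a `sockaddr_storage` and let `inet_ntop` render it. `address_to_string` is a hypothetical helper name. Because `inet_ntop` reads the address as raw octets, no byte-order conversion is needed for either family.

```cpp
#include <arpa/inet.h>   // inet_ntop, INET6_ADDRSTRLEN
#include <netinet/in.h>  // sockaddr_in, sockaddr_in6
#include <sys/socket.h>  // sockaddr_storage, AF_INET, AF_INET6
#include <cassert>
#include <cstring>
#include <string>

// Format an IPv4 or IPv6 address held in a sockaddr_storage.
// Returns an empty string for other families or on failure.
std::string address_to_string(sockaddr_storage const &ss)
{
    char buf[INET6_ADDRSTRLEN] = {};  // large enough for either family
    char const *ok = nullptr;

    if (ss.ss_family == AF_INET) {
        sockaddr_in v4;
        std::memcpy(&v4, &ss, sizeof v4);  // copy out rather than alias the storage
        ok = inet_ntop(AF_INET, &v4.sin_addr, buf, sizeof buf);
    } else if (ss.ss_family == AF_INET6) {
        sockaddr_in6 v6;
        std::memcpy(&v6, &ss, sizeof v6);
        ok = inet_ntop(AF_INET6, &v6.sin6_addr, buf, sizeof buf);
    }
    return ok ? std::string(buf) : std::string();
}
```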

Frederick Virchanza Gotham <cauldwell.thomas@gmail.com>: Feb 13 12:41AM -0800

On Monday, February 13, 2023 at 7:54:03 AM UTC, Paavo Helde wrote:

> Also, do not forget to also support IPv6, this is a must nowadays.

I have IPv6 disabled on my laptop.

Frederick Virchanza Gotham <cauldwell.thomas@gmail.com>: Feb 13 12:46AM -0800

On Monday, February 13, 2023 at 7:54:03 AM UTC, Paavo Helde wrote:

> So read a little bit more documentation and use the appropriate
> conversion functions as red floyd suggested.

I don't see anywhere in the documentation that the IP address will be stored in the way they store it.

Frederick Virchanza Gotham <cauldwell.thomas@gmail.com>: Feb 13 12:52AM -0800

On Monday, February 13, 2023 at 5:22:43 AM UTC, red floyd wrote:

> htonl() to go from native to network format, or ntohl() to go from
> network to native.

On this page:

https://linux.die.net/man/3/htonl

It says that those two functions take a 'uint32_t' -- not an unsigned long.

Paavo Helde <eesnimi@osa.pri.ee>: Feb 13 11:53AM +0200

13.02.2023 10:52 Frederick Virchanza Gotham wrote:

> On this page:

> https://linux.die.net/man/3/htonl

> It says that those two functions take a 'uint32_t' -- not an unsigned long.

An IPv4 address is 32 bits.

In case you haven't noticed, this protocol is used for communication
between different computers, so it cannot depend on how 'unsigned long'
might be defined by some particular C implementation in some particular
OS on some particular computer.

gazelle@shell.xmission.com (Kenny McCormack): Feb 13 10:14AM

In article <656fa1e3-a88d-4ddf-b7c6-af4f27dfc11an@googlegroups.com>,
>On Monday, February 13, 2023 at 7:54:03 AM UTC, Paavo Helde wrote:

>> Also, just forget to also support IPv6, this is a must avoid nowadays.

>I have IPv6 disabled on my laptop.

Smart man.

--
People sleep peaceably in their beds at night only because rough
men stand ready to do violence on their behalf.

George Orwell

Bonita Montero <Bonita.Montero@gmail.com>: Feb 13 11:28AM +0100

Don't program sockets yourself; use Boost.ASIO.
That is much less work, and you get more efficient code.

Frederick Virchanza Gotham <cauldwell.thomas@gmail.com>: Feb 13 02:53AM -0800

On Monday, February 13, 2023 at 9:53:24 AM UTC, Paavo Helde wrote:

> between different computers, so it cannot depend on how 'unsigned long'
> might be defined by some particular C implementation in some particular
> OS on some particular computer.

The documentation says that the IP address is stored in network-byte order inside an unsigned long. On systems where long is 8 bytes, this would mean:

[0][0][0][0][a][b][c][d]

However that is not how they store it. They do:

[a][b][c][d][0][0][0][0]

David Brown <david.brown@hesbynett.no>: Feb 13 12:40PM +0100

On 13/02/2023 00:40, Frederick Virchanza Gotham wrote:

> uint32_t my_ip = s_addr;

> And now 'my_ip' will have the actual IP address instead of the four zeroed-out bytes.

> Still though it's a bit mad.

It is not "mad" - the problem is that you are trying to think in terms
of numbers and integers, instead of the data itself. The term
"endianness" does not really apply here - the IP address is a sequence
of 4 octets, not an integer. The "long unsigned" is not a number, it's
just a storage unit for holding the four octets in one lump. The size
doesn't matter as long as it is big enough - I expect the use of "long
unsigned" comes from a history that stretches back to support for the
same code on 16-bit int systems.

"Öö Tiib" <ootiib@hot.ee>: Feb 13 05:21AM -0800

On Monday, 13 February 2023 at 10:53:02 UTC+2, Frederick Virchanza Gotham wrote:
> On this page:

> https://linux.die.net/man/3/htonl

> It says that those two functions take a 'uint32_t' -- not an unsigned long.

The unsigned long you have is simply convertible into uint32_t without
overflow (as its value does not exceed UINT32_MAX, or
std::numeric_limits<uint32_t>::max(), whichever you prefer).
If you do so and pass it to ntohl then you get uint32_t that has expected
value. If you do your own byte gymnastics then you get your own
results. Such is life.
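The conversion described here, sketched for a hypothetical platform where the field really is an `unsigned long` (assumes POSIX `ntohl`; `host_order_from_ulong` is an illustrative name):

```cpp
#include <arpa/inet.h>  // ntohl
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical case: the network-order address arrives in an unsigned long.
// An IPv4 address never needs more than 32 bits, so the narrowing cast loses
// nothing; the assert documents that assumption. ntohl then yields host order.
std::uint32_t host_order_from_ulong(unsigned long network_order)
{
    assert(network_order <= std::numeric_limits<std::uint32_t>::max());
    return ntohl(static_cast<std::uint32_t>(network_order));
}
```

For example, a value produced by `htonl(0xC0A80101u)` and stored in an `unsigned long` comes back as `0xC0A80101u`, i.e. 192.168.1.1.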

Paavo Helde <eesnimi@osa.pri.ee>: Feb 13 03:53PM +0200

13.02.2023 12:53 Frederick Virchanza Gotham wrote:
>> might be defined by some particular C implementation in some particular
>> OS on some particular computer.

> The documentation says that the IP address is stored in network-byte order inside an unsigned long.

No, it doesn't. The documentation says the IP address is stored in
network-byte order in 4 octets, or in a 32-bit field.

Whenever you see a documentation page saying the IPv4 address is stored
in an unsigned long, let them know they need to fix their page.
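The 32-bit claim can be checked at compile time. This sketch assumes a POSIX system, where `in_addr_t` is specified as equivalent to `uint32_t`; the size checks on the structs hold on common implementations such as glibc but are assumptions rather than guarantees.

```cpp
#include <netinet/in.h>  // in_addr, in_addr_t, in6_addr
#include <cstdint>
#include <type_traits>

// POSIX specifies in_addr_t as equivalent to uint32_t, so the IPv4 address
// field is 32 bits wide however large 'long' happens to be.
static_assert(std::is_same<in_addr_t, std::uint32_t>::value,
              "in_addr_t should be uint32_t");

// Assumptions that hold on common implementations such as glibc: the address
// structs contain exactly the octets of the address and nothing else.
static_assert(sizeof(in_addr) == 4, "struct in_addr should be 4 octets");
static_assert(sizeof(in6_addr) == 16, "struct in6_addr should be 16 octets");
```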

scott@slp53.sl.home (Scott Lurndal): Feb 13 02:55PM

>> So read a little bit more documentation and use the appropriate
>> conversion functions as red floyd suggested.

>I don't see anywhere in the documentation that the IP address will be stored in the way they store it.

https://www.amazon.com/TCP-Illustrated-Vol-Addison-Wesley-Professional/dp/0201633469

Chris Vine <vine.chris@gmail.com>: Feb 13 12:22PM -0800

On Sunday, 12 February 2023 at 23:40:16 UTC, Frederick Virchanza Gotham wrote:
[snip]
> So first I wrote code like this:

> char unsigned const *p = static_cast<char unsigned const*>(static_cast<void const*>(&s_addr));
> cout << static_cast<unsigned>(p[0]) << "." << static_cast<unsigned>(p[1]) << "." << static_cast<unsigned>(p[2]) << "." << static_cast<unsigned>(p[3]);

On a point not relating to your question about endianness, C++ being what it is, this appears to have undefined behaviour. This is not because it breaches the strict aliasing rules (it doesn't by virtue of [basic.lval]/11), but because of the rules on pointer arithmetic.

This is because the array subscript operator implies arithmetic: according to [expr.sub]/1 "the expression E1[E2] is identical (by definition) to *((E1)+(E2))". However, pointer arithmetic is only allowed on pointers pointing into arrays, and only within the range of the array, unless E1 is a null pointer and E2 is 0 ([expr.add]/4). Here, s_addr is required to be composed of contiguous bytes but this may take the form of a 32-bit scalar rather than formally of an array of unsigned char meeting the definition in [dcl.array].

This seems an oversight in the C++ standard, or at least poor drafting of [expr.add]/4 with respect to its spraying about of undefined behaviour on pointer arithmetic. Leaving aside endianness, this usage is reasonable, is widely employed in practice and has defined behaviour in C. The up side is that it seems highly improbable that any compiler is going to do other than what you expect.
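If you would rather sidestep the question altogether, copy the bytes into a real array first. A sketch, assuming a POSIX `in_addr` whose `s_addr` is 4 bytes; `dotted_quad_from_bytes` is a hypothetical name. Copying the underlying bytes of a trivially copyable object into an `unsigned char` array with `std::memcpy` is explicitly permitted, and indexing that array stays within an array object, so [expr.add]/4 is satisfied as far as I can tell.

```cpp
#include <netinet/in.h>  // in_addr
#include <cassert>
#include <cstring>       // std::memcpy
#include <sstream>
#include <string>

// Format an IPv4 address by copying its four network-order octets into a
// genuine unsigned char array and reading them in memory (wire) order.
std::string dotted_quad_from_bytes(in_addr const &addr)
{
    static_assert(sizeof addr.s_addr == 4, "expected a 4-octet IPv4 address field");
    unsigned char octets[4];
    std::memcpy(octets, &addr.s_addr, sizeof octets);

    std::ostringstream out;
    out << static_cast<unsigned>(octets[0]) << '.' << static_cast<unsigned>(octets[1]) << '.'
        << static_cast<unsigned>(octets[2]) << '.' << static_cast<unsigned>(octets[3]);
    return out.str();
}
```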

You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.


Monday, February 13, 2023

Digest for comp.lang.c++@googlegroups.com - 15 updates in 1 topic
