Sunday, October 26, 2014

Digest for comp.lang.c++@googlegroups.com - 24 updates in 5 topics

Bo Persson <bop@gmb.dk>: Oct 26 12:52AM +0200

On 2014-10-25 19:09, Öö Tiib wrote:
> string literals. Translation company? No. Coder? No. End user? No.
> So non-ASCII string literal is basically red herring and bear trap
> for naive novice developer.
 
"Everybody" doesn't use ASCII. Some of us use EBCDIC since before ASCII
even existed.
 
How is a language standard going to change that?
 
 
Bo Persson
"Öö Tiib" <ootiib@hot.ee>: Oct 25 05:36PM -0700

On Sunday, 26 October 2014 01:52:35 UTC+3, Bo Persson wrote:
> > for naive novice developer.
 
> "Everybody" doesn't use ASCII. Some of us use EBCDIC since before ASCII
> even existed.
 
Sure, everybody doesn't use C++ either, but we weren't discussing
string literals in COBOL. ASCII was first used commercially in 1963 ...
neither C, C++ nor the string literals of those languages existed
back then.
 
> How is a language standard going to change that?
 
Most internet text content is UTF8, rest is its subset ASCII. How can
some computing device in modern world expect to communicate with rest
of it if it does not know those encodings? So programming language
standard can easily just accept what the objective reality anyway
is.
 
IBM (or is it Lenovo now?) is the General Motors of IT industry and
if it needs its EBCDIC then it may provide "extensions" that support
its EBCDIC, not other way around.
Martijn Lievaart <m@rtij.nl.invlalid>: Oct 26 01:03PM +0100

On Fri, 24 Oct 2014 23:37:21 +0200, Bo Persson wrote:
 
>> big a part in this mess as Microsoft does.
 
> IBM has used EBCDIC for file names on their mainframes since the 1960's.
> No programming language is likely to change that.
 
It's actually EBCDIC-UTF8 now, which more or less reinforces the point
made.
 
M4
David Brown <david.brown@hesbynett.no>: Oct 26 01:09PM +0100

On 25/10/14 19:09, Öö Tiib wrote:
> string literals. Translation company? No. Coder? No. End user? No.
> So non-ASCII string literal is basically red herring and bear trap
> for naive novice developer.
 
That is not remotely true.
 
Most programs are written to be used only in the language of the country
they are written, and are written by people who speak that language. So
when I am writing a program where the user-visible strings will be in
Norwegian, it's because I am writing a single-language program (and it
will /always/ be single language) and I want to write my string literals
using Norwegian letters (æ ø å) - without the fuss and effort of
external translations.
 
When you are working with external translations, or coders that don't
speak the language, then I agree that plain ASCII is the way to go. But
that is not the way that most code is written - most coders want to be
able to write strings (and comments) in their own language.
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Oct 26 03:01PM

On Sun, 26 Oct 2014 13:09:48 +0100
> Most programs are written to be used only in the language of the
> country they are written
 
That seems to me to be most improbable. Casting my mind to all the
devices I have about the home, car or office, and which have some sort
of programmatic control, they all seem to be marketed to more than one
country.
 
I am not sure where a metric on this can be obtained. What were you
basing your conclusion on - is this personal anecdote/experience?
 
> single-language program (and it will /always/ be single language) and
> I want to write my string literals using Norwegian letters (æ ø å) -
> without the fuss and effort of external translations.
 
However, even if you are correct about single-language targets for
"most programs", that does not mean that the user is using the same
codeset as the programmer to represent the alphabet of the language in
question. To have such control of the system in use I guess you are
thinking of embedded devices or single supplier domain specific
computing. But as I say, most embedded devices seem to be marketed to
more than one country.
 
Chris
"Öö Tiib" <ootiib@hot.ee>: Oct 26 09:00AM -0700

On Sunday, 26 October 2014 14:10:01 UTC+2, David Brown wrote:
 
> That is not remotely true.
 
> Most programs are written to be used only in the language of the country
> they are written, and are written by people who speak that language.
 
Most? I have directly opposite observation.
 
Most open source C or C++ software has English comments and names in
code and is either not dealing with texts at all, works English-only
or is internationalized to several languages.
Most commercial C++ software is internationalized to several languages.
 
One reason is that the C++ itself does not support universal encoding.
That u8"Whatever" is pointless and even that is not supported by most
compilers anyway. Correct would be that utf8 text can be immediately
written "اللغة العربية الفصحى" and every other crap should look ugly
like ebcdic"Lenovo stuff".

> will /always/ be single language) and I want to write my string literals
> using Norwegian letters (æ ø å) - without the fuss and effort of
> external translations.
 
I applaud such welcome nationalism ... you have every right to do like
that. The only thing that is strange ... why you, majority, haven't
beaten the C++ standard committee to senses so it has decent UTF-8
support in it?
 
I myself avoid writing for subset of 5 million people when there is
likely similar subset on market of 7 billion people but that is only
my greedy and mercantile world-view. ;)
 
> When you are working with external translations, or coders that don't
> speak the language, then I agree that plain ASCII is the way to go.
 
Yes, we here are in constant difficulties to find enough decent
engineers so we always want to hire bright people from where ever who
happened to move here. If they can understand our bad English and
we can understand their C++ code then deal it likely is. That is
other reason why we keep all names and comments in English.
 
> that is not the way that most code is written - most coders want to be
> able to write strings (and comments) in their own language.
 
Oh, let's speak about the place where there are most coders then?
 
India is a country with at least 4 major local alphabets (Hindi,
Bengali, Punjabi, Beharati) and about 40 major local languages (more
than million of speakers each). There are about 200 times more people
than in Norway and about 400 times more coders than in Norway.
 
One reason why they have such a booming software industry there, with
millions and millions of Indian developers is that *none* of them
stubbornly insists to write C or C++ using those alphabets or languages.
The compilers do not accept it and they themselves would be in trouble
to understand and to maintain each others code if they did. Instead they
stick with Latin alphabet, ASCII and English.
Bo Persson <bop@gmb.dk>: Oct 26 08:15PM +0100

On 2014-10-26 17:00, Öö Tiib wrote:
> code and is either not dealing with texts at all, works English-only
> or is internationalized to several languages.
> Most commercial C++ software is internationalized to several languages.
 
Wanna make a guess if "most software" is open source or closed source
inhouse software? :-)
 
Quite often the business is the service produced by the software, not
the software itself.
 
 
Bo Persson
David Brown <david.brown@hesbynett.no>: Oct 26 09:20PM +0100

On 26/10/14 16:01, Chris Vine wrote:
> country.
 
> I am not sure where a metric on this can be obtained. What were you
> basing your conclusion on - is this personal anecdote/experience?
 
I am basing my argument on the "fact" (meaning I can't give decent
references or statistics - so if you don't agree with me, that's fair
enough) that most programs written are for very small audiences. It is
more common to write code for a single customer, or for a single
specialised application, than to write something that ends up spread all
over the world.
 
In fact, I expect that the majority of programs ever written have not
made it off the developer's own PC.
 
> thinking of embedded devices or single supplier domain specific
> computing. But as I say, most embedded devices seem to be marketed to
> more than one country.
 
If you count in terms of number of embedded devices, then most will be
international - but if you count in terms of the number of types of
embedded devices, or the number of embedded programs sold, then only a
very small proportion get sold internationally. (Of course, most
embedded devices don't have any user-level strings anyway.)
David Brown <david.brown@hesbynett.no>: Oct 26 09:42PM +0100

On 26/10/14 17:00, Öö Tiib wrote:
 
> Most open source C or C++ software has English comments and names in
> code and is either not dealing with texts at all, works English-only
> or is internationalized to several languages.
 
I would agree about open source software. But most software written is
not open source software.
 
> Most commercial C++ software is internationalized to several languages.
 
I disagree (again, I have no statistics to back this up). Most publicly
available commercial software is written in English only, and only
supports English. But most commercially written software is not
publicly available at all - it is written for specific uses. (And a lot
of software that hopes to be international never makes it that far.)
 
However, I don't think that C++ is the most common language for such
software.
 
> compilers anyway. Correct would be that utf8 text can be immediately
> written "اللغة العربية الفصحى" and every other crap should look ugly
> like ebcdic"Lenovo stuff".
 
It doesn't matter what "most compilers" support - for most software, it
matters what /your/ compiler supports. Again I am basing this on my
unsupported claim that software is typically written for specific
purposes and specific uses.
 
On /my/ compiler (gcc), I can write:
 
#include <iostream>

int main() {
    std::cout << "This is a test åøæ ÅØÆ\n";

    return 0;
}
 
 
The output is:
 
This is a test åøæ ÅØÆ
 
 
That's all I need to be able to write software in Norwegian.
 
> that. The only thing that is strange ... why you, majority, haven't
> beaten the C++ standard committee to senses so it has decent UTF-8
> support in it?
 
See above.
 
 
> I myself avoid writing for subset of 5 million people when there is
> likely similar subset on market of 7 billion people but that is only
> my greedy and mercantile world-view. ;)
 
If I were writing software for other markets, then I agree. Sometimes I
do - I have written international software in Python (using unicode) and
in embedded systems in C and assembly (with latin-1 or home-made
encodings to handle the limitations of the display).
 
> happened to move here. If they can understand our bad English and
> we can understand their C++ code then deal it likely is. That is
> other reason why we keep all names and comments in English.
 
Certainly that is true. And with an increasing globalisation of the
market, it is becoming increasingly common that the people writing the
software don't speak the language of the users even if the users are all
from one country.
 
> Bengali, Punjabi, Beharati) and about 40 major local languages (more
> than million of speakers each). There are about 200 times more people
> than in Norway and about 400 times more coders than in Norway.
 
The only serious common language in India is English (very handy for the
rest of us - and it's no coincidence that India is a centre of
programming). Indians prefer to code in English to Hindi because it is
easier for them, even if they have a free choice. Indians from
different parts of the country will often talk to each other in English,
even though in theory they have Hindi in common.
 
But if you look at code from European countries that is not targeted
internationally (either for users or for other developers), you'll find
most of the strings and comments will be in their own language. Most
names of functions, variables, classes, etc., will be Anglicised
spellings of words from their own language.
 
Chris Vine <chris@cvine--nospam--.freeserve.co.uk>: Oct 26 09:44PM

On Sun, 26 Oct 2014 21:42:26 +0100
David Brown <david.brown@hesbynett.no> wrote:
[snip]
 
> The output is:
 
> This is a test åøæ ÅØÆ
 
> That's all I need to be able to write software in Norwegian.
 
Maybe, but did you just get lucky or really look it up in the gcc
documentation and have all your compiler switches correctly set?
 
C++ (and C) have the concept of a source character set (the encoding of
the source files) and an execution character set (the encoding for
string literals in the binary) and the two need not be the same. And
they need not be the same as the character set(s) for the locale used by
your output stream and terminal, even in the same machine. The
execution character set is implementation defined.
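
One way to see which execution character set actually ended up in the binary is to dump the bytes of a literal. The following is only a minimal sketch; the names hex_bytes and dump_literal are made up for illustration, and the non-ASCII literal assumes the compiler reads the source file in the encoding it was saved in.

```cpp
#include <cstdio>
#include <string>

// Render each byte of a narrow string as two lowercase hex digits,
// separated by single spaces.
std::string hex_bytes(const std::string& s) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : s) {
        if (!out.empty()) out += ' ';
        out += digits[c >> 4];
        out += digits[c & 0x0f];
    }
    return out;
}

// Hypothetical helper: print the bytes the compiler stored for a literal
// that contains Norwegian letters.
void dump_literal() {
    std::printf("%s\n", hex_bytes("åøæ").c_str());
}
```

If the execution character set is UTF-8, dump_literal() should print c3 a5 c3 b8 c3 a6; with ISO-8859-1 it would be e5 f8 e6. Anything else suggests the input and execution character sets were not what you assumed.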
 
As you use gcc, http://gcc.gnu.org/onlinedocs/cpp/Character-sets.html
suggests you should be OK in assuming UTF-8 as the default for the
encoding of the narrow character execution character set (but you will
be in trouble with the default if your locale codeset is, say,
ISO-8859-15). You can use the -fexec-charset compiler flag to put
something else in the binary though. However, you still have the input
character set to contend with. Here gcc is quite complicated. It first
converts the encoding of the input files passed to it into its own
notion of the source character set. One curiosity is that if the input
charset is not specified via -finput-charset, gcc tries to obtain the
locale character set to perform this conversion:
 
"-finput-charset=charset: Set the input character set, used for
translation from the character set of the input file to the source
character set used by GCC. If the locale does not specify, or GCC
cannot get this information from the locale, the default is UTF-8.
This can be overridden by either the locale or this command line
option. Currently the command line option takes precedence if there's
a conflict. charset can be any encoding supported by the system's
iconv library routine."
 
This means that with gcc source code may not be portable in the absence
of -finput-charset being passed to the compiler. That is not
necessarily problematic where the source is written on the machine on
which the binary runs provided (see the extract above) the source file
is in fact in the locale encoding.
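
A source file that contains only ASCII bytes does not depend on -finput-charset or on the build machine's locale. The sketch below shows two ways of getting there; the function names and the word "blåbær" are made up for illustration. Universal character names are still converted to the execution character set, while hex escapes put fixed bytes into the binary whatever the compiler settings are.

```cpp
#include <string>

// Universal character names: \u00e5 is å and \u00e6 is æ. The source
// bytes are plain ASCII, but the compiler still converts these characters
// to the execution character set (UTF-8 by default with gcc).
std::string greeting_ucn() {
    return "bl\u00e5b\u00e6r";
}

// Hex escapes: the UTF-8 bytes are spelled out explicitly, so the result
// does not depend on the input or the execution character set.
// The literal is split after \xa5 because a hex escape consumes every
// following hex digit, and the next letter 'b' is one.
std::string greeting_bytes() {
    return "bl\xc3\xa5" "b\xc3\xa6r";
}
```

With a UTF-8 execution character set both functions should return the same eight bytes. With a different -fexec-charset only greeting_ucn() changes.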
 
It follows that you can get string literals outside the ASCII range to
compile with gcc and subsequently run correctly on your machine, but
only if you are meticulous about your compiler switches (or the planets
are in conjunction with Venus in the ascension). However, everyone I
know writes literals in ASCII and uses dynamic string conversion at
runtime (but I accept people you know don't). The ASCII approach just
works.
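
A minimal sketch of the "ASCII literal plus runtime conversion" approach, assuming a simple in-memory table; Catalog, norwegian_catalog and translate are hypothetical names. A real program would more likely load the table from a message catalog file (gettext style) than hard-code it.

```cpp
#include <map>
#include <string>

// Maps the ASCII string used in the source to the text shown to the user.
using Catalog = std::map<std::string, std::string>;

// Hypothetical catalog. The translated text is given as explicit UTF-8
// bytes so that this sketch itself stays pure ASCII.
Catalog norwegian_catalog() {
    return {
        {"File not found", "Fant ikke filen"},
        {"Save", "Lagre"},
        {"Run", "Kj\xc3\xb8r"},  // "Kjør" in UTF-8
    };
}

// Return the translation if there is one, otherwise fall back to the
// ASCII key, so an incomplete catalog still gives readable output.
std::string translate(const Catalog& catalog, const std::string& key) {
    auto it = catalog.find(key);
    return it != catalog.end() ? it->second : key;
}
```

The source code then only ever contains ASCII literals such as translate(catalog, "Save"), and the question of source and execution character sets does not come up.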
 
Chris
Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 26 07:56AM

On Thu, 2014-10-23, James K. Lowden wrote:
> On Wed, 22 Oct 2014 07:29:05 CST
> agent@drrob1.com wrote:
...
 
> (The whole TAOUP is worth reading.)
 
> PMake - A Tutorial
> http://docs.freebsd.org/44doc/psd/12.make/paper.pdf
 
Also this one by the late Peter Miller:
 
Recursive Make Considered Harmful
http://aegis.sourceforge.net/auug97.pdf
 
> you ignore some of the specifics, you'll get a nice introduction to the
> basic ideas. That background will make your definitive resource, "info
> make" (assuming you're using GNU make), easier to understand.
 
Now I get the urge to write a paper called "The useful subset of Make"
... IME a well-written Makefile is easy to understand, read and use,
but people often seem to get trapped in the exotic constructs which
are really only there for backwards compatibility or rarely
encountered situations.
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Francis Glassborow <francis.glassborow@btinternet.com>: Oct 26 07:27AM -0600

On 10/25/2014 2:34 PM, Rosario193 wrote:
>> Rob
 
> there was someone that said that is not good break code in many files
> if not dll...
 
Well, if they were talking about C++ they do not know very much about it. Breaking code into separate files is normal. Indeed, whether you realise it or not, all C++ executables rely on multiple files of code (because that is how libraries are provided).
 
Indeed it is usually considered good design to isolate classes into their own files (though sometimes, where classes are inter-dependent, it makes some sense to place them in the same file).
 
Francis
 
 
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 26 04:27PM

On Sun, 2014-10-26, Francis Glassborow wrote:
 
> On 10/25/2014 2:34 PM, Rosario193 wrote:
...
 
> Indeed it is usually considered good design to isolate classes into
> their own files (though sometimes where classes are inter-dependent it
> makes some sense to place them in the same file).
 
(And sometimes it makes sense to split the implementation of a class
across several source files.)
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Rosario193 <Rosario@invalid.invalid>: Oct 26 09:37PM +0100

On Sun, 26 Oct 2014 07:27:46 CST, Francis Glassborow wrote:
>Breaking code into separate files is normal. Indeed whether
>you realise it or not all C++ executables rely on multiple
>files of code (because that is how libraries are provided)
 
i use monofile for dll... one dll for language etc...
 
one for cpp one for assembly
but possibly i put not too much code there...
 
Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 26 08:21AM

On Wed, 2014-10-22, JiiPee wrote:
>> have to deal with the issue that often.)
 
> I used like 15 years i++. Then forced myself to use ++i and now do not
> have much problem doing it. Our brains get rewired doing new things....
 
Mine does too now and then, but for some reason I could never
learn to like ++i.
 
> I feel good when seeing ++i because i know am doing faster code now :)
 
While I was using ++i, that was what I used to tell myself. But see
above -- is it really faster, if you're using one of the iterators
from the standard library?
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
"Öö Tiib" <ootiib@hot.ee>: Oct 26 02:48AM -0700

On Sunday, 26 October 2014 10:22:05 UTC+2, Jorgen Grahn wrote:
> > have much problem doing it. Our brains get rewired doing new things....
 
> Mine does too now and then, but for some reason I could never
> learn to like ++i.
 
It is again some sort of "taste" issue? Experienced developer should
be capable of reading both forms.
 
 
> While I was using ++i, that was what I used to tell myself. But see
> above -- is it really faster, if you're using one of the iterators
> from the standard library?
 
When one of the '++i' and 'i++' is faster then it is '++i'. However
often neither is faster.
 
Compilers inline 'i++' if they only can and reorder the two
operations "increment value of i" and "use previous value of i" into
"use value of i" and "increment value of i". Also on case of 'for'
the value of 'i++' is not used so they simply discard that part.
 
Compilers do not stop there with optimizations. A novice who debugs
optimized code often asks "why does the debugger show me ??? as the value of i?".
Often it is because 'i' is an integer indexing contiguous memory and
so has been optimized out entirely. The compiler has replaced 'i++' and 'arr[i]'
with '++parr' and '*parr' and there is no 'i' left whatsoever.
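
A rough sketch of that transformation, written out by hand. The function names are made up, and whether a given compiler really produces the second form depends on the compiler and the optimization level.

```cpp
#include <cstddef>

// As typically written: an integer index with post-increment.
int sum_indexed(const int* arr, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; i++) {
        total += arr[i];
    }
    return total;
}

// Roughly what an optimizer may turn it into: the index is gone and a
// pointer walks over the contiguous memory instead.
int sum_pointer(const int* arr, std::size_t n) {
    int total = 0;
    for (const int* parr = arr, *end = arr + n; parr != end; ++parr) {
        total += *parr;
    }
    return total;
}
```

Both functions compute the same result. In the second one there is simply no 'i' for a debugger to display.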
Jorgen Grahn <grahn+nntp@snipabacken.se>: Oct 26 10:14AM

On Sun, 2014-10-26, Öö Tiib wrote:
>> learn to like ++i.
 
> It is again some sort of "taste" issue? Experienced developer should
> be capable of reading both forms.
 
Yes, it's a taste issue (and a minor one). I can read both, but I
cannot help preferring one or the other.
 
/Jorgen
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Wouter van Ooijen <wouter@voti.nl>: Oct 26 11:45AM +0100

Öö Tiib wrote on 26-Oct-14 10:48 AM:
> operations "increment value of i" and "use previous value of i" into
> "use value of i" and "increment value of i". Also on case of 'for'
> the value of 'i++' is not used so they simply discard that part.
 
For complex 'i' objects the ++i operator is simpler than the i++
operator because i++ must first make a copy of the old value, then
increment, and then return the copy (by value!). The ++i can simply
increment and then return itself (by reference).
 
class foo {
public:
    // foo++: the dummy int argument marks this as the postfix version
    foo operator++( int ){
        foo temp{ *this };   // copy of the old value
        // do the incrementing here
        return temp;         // returned by value
    }

    // ++foo
    foo & operator++(){
        // do the incrementing here
        return *this;        // returned by reference
    }
};
 
Hence at least for *some* types of i the ++i is the preferred form. For
me this is sufficient argument to use ++i whenever both would be OK.
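
To make the extra copy visible, here is a small sketch with a copy constructor that counts how often it runs. Counter and its copies member are hypothetical names used only for this illustration.

```cpp
// A tiny incrementable type that counts its own copies.
struct Counter {
    int value = 0;
    static int copies;  // how many times the copy constructor has run

    Counter() = default;
    Counter(const Counter& other) : value(other.value) { ++copies; }

    // ++c: increment in place and return a reference, no copy needed.
    Counter& operator++() {
        ++value;
        return *this;
    }

    // c++: the old value has to be copied before incrementing.
    Counter operator++(int) {
        Counter old{*this};
        ++value;
        return old;
    }
};

int Counter::copies = 0;
```

After ++c the counter stays at zero. After c++ it is at least one, even when the returned value is thrown away, because a copy constructor with a visible side effect cannot simply be optimized out (only copy elision on the return may save a second copy).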
 
Wouter van Ooijen
Bo Persson <bop@gmb.dk>: Oct 26 12:59PM +0100

On 2014-10-26 09:21, Jorgen Grahn wrote:
 
> While I was using ++i, that was what I used to tell myself. But see
> above -- is it really faster, if you're using one of the iterators
> from the standard library?
 
An optimizing compiler is likely to remove any unused copies of the
iterator, so probably there is no difference.
 
On the other hand, when there is a difference, I have never seen a case
where i++ is faster.
 
 
Bo Persson
JiiPee <no@notvalid.com>: Oct 26 04:40PM

On 26/10/2014 10:45, Wouter van Ooijen wrote:
 
> Hence at least for *some* types of i the ++i is the preferred form.
> For me this is sufficient argument to use ++i whenever both would be OK.
 
> Wouter van Ooijen
 
Yes, because then your code is similar everywhere... not that sometimes
you use ++i and sometimes not. It's good to use the same style everywhere,
I think. So if it's sometimes required to use ++i, then I would also use
it with integers even though the benefit is small there. Just to keep
the code in the same style everywhere.
David Brown <david.brown@hesbynett.no>: Oct 26 09:13PM +0100

On 26/10/14 11:45, Wouter van Ooijen wrote:
> operator because i++ must first make a copy of the old value, then
> increment, and then return the copy (by value!). The ++i can simply
> increment and then return itself (by reference).
 
This is not nearly as common as the myth has it.
 
First, if you actually need the return value of i++ or ++i, then you
must use the correct one, since the return values are different.
 
Most of the time, you don't need the return value - and the compiler
knows that. It can generate the same code for either.
 
In many cases, even for "complex" objects, ++ operators are inlined in
class definitions, and the compiler will generate optimal code for
either choice.
 
It is only in the case where you have a complex ++ operator that is
defined externally, that there will be any difference. In such cases,
++i is typically more efficient.
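
A minimal illustration of the first point, that the two forms are not interchangeable once the value of the expression is used. The helper names are made up.

```cpp
// Value of the expression i++: the old value. i is incremented afterwards.
int post_value(int& i) {
    return i++;
}

// Value of the expression ++i: the new value. i is incremented first.
int pre_value(int& i) {
    return ++i;
}
```

Starting from i == 5, post_value(i) returns 5 and leaves i at 6, while a following pre_value(i) returns 7 and leaves i at 7. Only when the result is discarded, as in the increment part of a 'for' loop, is the choice purely a matter of style or efficiency.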
woodbrian77@gmail.com: Oct 25 04:42PM -0700

> things that others like about BSD are Dtrace and
> jails. I don't know much about either of those,
> but expect to use Dtrace eventually.
 
I've been trying Dtrace on BSD today. I have to
run dtrace from /boot/kernel for it to kind of work.
When I tried to get a user stack (ustack()) on the
C++ Middleware Writer, it lists a number of addresses
rather than function names. Is there something I
have to do to get the names?
 
And after I stop dtrace with control-c, the C++ Middleware
Writer hangs. I have to restart the back tier (cmw) to
get things working again. Is that to be expected?
 
Brian
Ebenezer Enterprises - So far G-d has helped us.
http://webEbenezer.net
Ian Collins <ian-news@hotmail.com>: Oct 26 02:40PM +1300

> C++ Middleware Writer, it lists a number of addresses
> rather than function names. Is there something I
> have to do to get the names?
 
Capture the process data before dtrace exits. If the process exits
before dtrace is able to read the symbolic data, you will just get the
stack frame addresses. I usually use dtrace -c to run the command I
want to trace. This avoids the lack of symbolic data problem. For
example, a grab from one of my unit tests that intercepts malloc calls:
 
libc.so.1`__pread+0x15
libc.so.1`read_safe+0x38
libc.so.1`walkcontext+0x63
f1`_ZN4test6malloccvPvEv+0x1a0
f1`malloc+0x24
libstdc++.so.6.0.18`_Znwj+0x29
libstdc++.so.6.0.18`_ZNSs4_Rep9_S_createEjjRKSaIcE+0x65
libstdc++.so.6.0.18`_ZNSs12_S_constructIPKcEEPcT_S3_RKSaIcESt20forward_iterator_tag+0x4
libstdc++.so.6.0.18`_ZNSsC1EPKcRKSaIcE+0x41
libcppunit.so`_ZN7CppUnit10TestRunnerC1Ev+0x55
f1`main+0x97
f1`_start+0x72
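
The C++ names in such a stack are still mangled. Assuming a GCC or Clang toolchain, they can be made readable with the c++filt tool or, from inside a program, with abi::__cxa_demangle from <cxxabi.h>. The wrapper below is only a sketch, and demangle is a made-up name.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Turn a mangled symbol name into readable C++. If the name cannot be
// demangled, return it unchanged.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result
    // with malloc, so it has to be released with free.
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);
    return result;
}
```

For example, demangle("_ZN7CppUnit10TestRunnerC1Ev") gives CppUnit::TestRunner::TestRunner() and demangle("_Znwj") gives operator new(unsigned int).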
 
> And after I stop dtrace with control-c, the C++ Middleware
> Writer hangs. I have to restart the back tier (cmw) to
> get things working again. Is that to be expected?
 
No idea, it depends how you are running things.
--
Ian Collins
"Öö Tiib" <ootiib@hot.ee>: Oct 25 04:46PM -0700

On Sunday, 26 October 2014 00:34:15 UTC+3, jacob navia wrote:
> using the hardware advances. Modern processors do get bogged down
> compiling template-ridden code but it is masked with machines with 8/16
> processors and 32 GB of memory.
 
Good C++ does not lose in performance compared to good C. It is easy to
produce badly performing and inefficient programs in C++ and in C and
even in assembler.
 
> Nobody is able to understand 100% of all C++. Not even Stroustrup, that
> after years of efforts was forced to acknowledge that he could not add
> an essential new feature to the language ("concepts" anyone?)
 
The "concepts" was bit too optimistic plan anyway, no language has it.
They still aim to simplify type traits checking and to produce better
diagnostics with templates with reduced plan "concepts lite". What it
is to you? You won't use templates anyway, you use macros that are
often even far worse to debug than templates.
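
One classic example of why macros can be harder to debug than templates is double evaluation of arguments. This is only a sketch; MAX_MACRO, max_template and next_id are made-up names.

```cpp
// Function-like macro: each argument is pasted in textually, so the
// larger one is evaluated twice.
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

// Function template: each argument is evaluated exactly once.
template <typename T>
const T& max_template(const T& a, const T& b) {
    return a > b ? a : b;
}

// Helper with a side effect, to make the number of evaluations visible.
int next_id(int& counter) {
    return ++counter;
}
```

With a counter starting at 0, MAX_MACRO(next_id(n), 0) calls next_id twice and yields 2, while max_template(next_id(k), 0) calls it once and yields 1. The macro version also never shows up as a function in a debugger or in a stack trace.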
 
> Look, C++ has its strengths but also has a lot of weakness, the
> principal one is its sheer SIZE.
 
C++ language itself is indeed rather complex but the binaries that
the compilers produce are not that large. C++ is well-performing and
efficient language in good hands.
 
> Of course (as you point out) you can go to the computer store and buy
> more RAM and accommodate an even bigger language.
 
C++ does not have anything that forces data or code of resulting
application to be bigger than same thing achieved with C.
 
> But there aren't any stores to buy a BRAIN EXTENSION to accommodate the
> ever increasing amount of C++ trivia we are supposed to swallow!
 
What is most difficult in our work (and where I sometimes feel a need
for brain extension) is to understand correctly what the humans want
to do with our programs and why. Once that is clear then to put it
to programming language is lot easier. With C it is just less convenient
than with C++.