soft and program: comp.lang.c++ - 25 new messages in 16 topics

comp.lang.c++
http://groups.google.com/group/comp.lang.c++?hl=en

comp.lang.c++@googlegroups.com

Today's topics:

==============================================================================
TOPIC: Automating Serialization?
http://groups.google.com/group/comp.lang.c++/t/c2a2010a11f58705?hl=en
==============================================================================

== 1 of 2 ==
Date: Fri, Nov 27 2009 12:51 pm
From: Brian

Recently on the Boost Users list someone started a thread with the
title "Automating Serialization?". I was thinking about copying the
thread over here and then someone in the thread got a little defensive
about things so I decided to discuss it here. I'll use a line of
dashes to separate the posts from each other. In the middle of the
dashes I'll describe how that post relates to the others.

------------------------------------------ OP -------------------------------------------

Hi,

By design Boost Serialization requires the user to list each field
that needs to be serialised.

This is in contrast, for example, with Java and some other languages
where serialisation is supported (kind of) at the language level.

The current approach requires some code duplication. We have to
declare a field. We have to manually (de)serialise it. If we need to
make a change, we make it in at least two distinct places.

Is there any way to automate the process of serialisation, perhaps
harnessing the power of the preprocessor? E.g. we could label the
fields that need to be serialised. Is there anything in Boost that
could help?

Many thanks,
Paul

Paul Bilokon, Vice President
Citigroup | FX - Options Trading | Quants
33 Canada Square | Canary Wharf | Floor 3
London, E14 5LB
Phone: +44 20 798-62191
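
A rough sketch of the preprocessor idea raised above: an "X-macro" names each field once and is expanded both into the member declarations and into the save/load code. This is plain C++ and does not use Boost Serialization; the Account type, the field names, the whitespace-separated text format and round_trip() are all assumptions made up for illustration (the format also assumes the string field contains no spaces).

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical field list: each field is named exactly once.
#define ACCOUNT_FIELDS(X) \
    X(int, id)            \
    X(double, balance)    \
    X(std::string, owner)

struct Account {
    // Expand the list into member declarations.
#define DECLARE_FIELD(type, name) type name{};
    ACCOUNT_FIELDS(DECLARE_FIELD)
#undef DECLARE_FIELD

    // Expand the same list into save code (whitespace-separated text).
    void save(std::ostream& out) const {
#define SAVE_FIELD(type, name) out << name << ' ';
        ACCOUNT_FIELDS(SAVE_FIELD)
#undef SAVE_FIELD
    }

    // Expand the same list into load code, in the same order.
    void load(std::istream& in) {
#define LOAD_FIELD(type, name) in >> name;
        ACCOUNT_FIELDS(LOAD_FIELD)
#undef LOAD_FIELD
    }
};

// Round-trips an Account through an in-memory stream.
inline Account round_trip(const Account& a) {
    std::stringstream buffer;
    a.save(buffer);
    Account b;
    b.load(buffer);
    assert(!buffer.fail());  // every field was read back
    return b;
}
```

Adding a field then means adding one X(...) line; the declaration and both marshalling functions follow automatically.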

---------------------------------------- first reply to OP -----------------------------------

The C++ Middleware Writer doesn't have that problem --
http://webEbenezer.net/comparison.html . (There's a performance
section on that page using Boost 1.38. We're in the process of
updating that page using Boost 1.41 and hope to have those results
on line in the next two weeks.)

---------------------------------- second reply to OP ---------------------------

Stefan Strasser <strasser@uni-bremen.de>
To: boost-users@lists.boost.org
[...]

not that I'm aware of, but I even think that's a good thing.
the serialize() function represents a file format, and you want file
formats to be stable and not being changed because someone added a
runtime field to a class.
usually when you do want to add a serialized field you'd also want old
versions of the file still to be readable, so you end up writing custom
(versioned) deserialization code anyway, even if your language has
built-in serialization support.

you could use some compile time code generator to write default
serialization code for you, using e.g. OpenC++, GCC-XML, or Doxygen,
but I doubt those generated functions would stay there very long.

--------------------------------------- third reply to OP ------------------------------------

On Thu, Nov 26, 2009 at 5:05 AM, Bilokon, Paul <paul.bilokon@citi.com>
wrote:
> Hi,
>
> By design Boost Serialization requires the user to list each field that
> needs to be serialised.
>
> This is in contrast, for example, with Java and some other languages where
> serialisation is supported (kind of) at the language level.

No, Paul, C++ has no reflection and the reason is that unlike Java it
does not define ABI. Consider that in C++ not only the size of the
built-in types is not standardized, but even some operations have
"implementation-defined" semantics.

> The current approach requires some code duplication. We have to declare a
> field. We have to manually (de)serialise it. If we need to make a change, we
> make it in at least two distinct places.

You are thinking in terms of reflection. Think in terms of states and
invariants and it'll make more sense. For example, if you have an
array of items and a pointer, and your invariant is that the pointer
always points the last element in the array, the pointer should not be
serialized.

Emil Dotchevski
Reverge Studios, Inc.
http://www.revergestudios.com/reblog/index.php?n=ReCode
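
The array-and-pointer example can be sketched as follows. Only the elements are written out, and the pointer is recomputed on load so the invariant holds again. The Samples class, its member names and the text format are hypothetical; this is not Boost Serialization code.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <vector>

// Hypothetical class: `last` must always point at the final element of `items`.
class Samples {
public:
    Samples() = default;
    // Copying would leave `last` pointing into the source object, so forbid it.
    Samples(const Samples&) = delete;
    Samples& operator=(const Samples&) = delete;

    void add(int v) {
        items.push_back(v);   // may reallocate, so the pointer is refreshed
        restore_invariant();
    }
    const int* last_item() const { return last; }
    std::size_t size() const { return items.size(); }

    // Only the state is written: the element count, then the elements.
    void save(std::ostream& out) const {
        out << items.size();
        for (int v : items) out << ' ' << v;
    }

    // The pointer is never read from the stream; it is recomputed.
    void load(std::istream& in) {
        std::size_t n = 0;
        in >> n;
        items.assign(n, 0);
        for (int& v : items) in >> v;
        restore_invariant();
    }

private:
    void restore_invariant() {
        last = items.empty() ? nullptr : &items.back();
        assert(items.empty() || *last == items.back());
    }

    std::vector<int> items;
    const int* last = nullptr;
};
```

A field-by-field dump of this class would have written a raw address that means nothing in the process that reads it back.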

------------------------------- my reply to Stefan Strasser ------------------------

> not that I'm aware of, but I even think that's a good thing.
> the serialize() function represents a file format, and you want file
> formats to be stable and not being changed because someone
> added a runtime field to a class.

"runtime field" ?

> usually when you do want to add a serialized field you'd also want old
> versions of the file still to be readable, so you end up writing custom
> (versioned) deserialization code anyway, even if your language has built-in
> serialization support.

I recommend avoiding the versioning support in Boost Serialization.
It runs counter to good development practices by averting the type
system. Consider for example a class called Account that uses
versioning to support multiple releases of a product. In the usual
case, later releases will have more fields and added complexity
than earlier releases. Support then for a client at an early release,
say 1.1, becomes inefficient since Account is being used to
handle both 1.2 and 1.1 users. If instead Account_11 and
Account_12 are used -- with Account_12 probably derived from
Account_11 -- this weakness is avoided. Additionally this approach
is beneficial from a testing perspective. In a 1.2 server, 1.1
clients are supported using Account_11, which has already been
tested and is not messed with to support 1.2 clients.
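
A minimal sketch of the Account_11 / Account_12 arrangement described above, assuming a trivial text format. The fields, the currency default and reply_for() are invented for illustration and are not taken from any real product or from the C++ Middleware Writer.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical release 1.1 type; frozen once 1.1 ships.
struct Account_11 {
    int id = 0;
    double balance = 0.0;

    void save(std::ostream& out) const { out << id << ' ' << balance; }
    void load(std::istream& in) { in >> id >> balance; }
};

// Hypothetical release 1.2 type: adds a field without touching Account_11.
struct Account_12 : Account_11 {
    std::string currency = "USD";

    void save(std::ostream& out) const {
        Account_11::save(out);
        out << ' ' << currency;
    }
    void load(std::istream& in) {
        Account_11::load(in);
        in >> currency;
    }
};

// A 1.2 server picks the type from the client's release, so 1.1 traffic
// is handled by the already tested 1.1 code path.
inline std::string reply_for(int client_minor_version, int id, double balance) {
    assert(client_minor_version == 1 || client_minor_version == 2);
    std::ostringstream out;
    if (client_minor_version == 1) {
        Account_11 a;
        a.id = id;
        a.balance = balance;
        a.save(out);
    } else {
        Account_12 a;
        a.id = id;
        a.balance = balance;
        a.save(out);
    }
    return out.str();
}
```

The trade-off is that each supported release keeps its own type alive in the server, in exchange for never editing code that older clients depend on.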

>
> you could use some compile time code generator to write default
> serialization code for you, using e.g. OpenC++, GCC-XML, or
> Doxygen, but I doubt those generated functions would stay there
> very long.

I'm not exactly sure what you are saying here, but if a user is
simply improving the names of some of his fields, the functions
won't stay the same very long. Automating this helps with a
common problem of forgetting to update the serialization
functions and the compiler then barking. I don't claim though
that every class should be handled this way. The C++ Middleware
Writer allows users to turn off the automated generation of
marshalling functions if that is desired. In my experience, it is
unusual to turn that functionality off.

------------------------------- my reply to Emil Dotchevski --------------------

> You are thinking in terms of reflection. Think in terms of states and
> invariants and it'll make more sense. For example, if you have an
> array of items and a pointer, and your invariant is that the pointer
> always points the last element in the array, the pointer should not be
> serialized.
>

I don't know if that is a common situation, but there are two ways to
handle that with the C++ Middleware Writer. One as I mentioned
in another post is to turn off the automatic generation of marshalling
functions. The other is to place those fields which shouldn't be
included in the marshalling process within an #ifdef SERVER_SIDE
-- http://webEbenezer.net/ifdefSERVER.html .
The macro is turned on when servers are built and off when clients
are built. This approach enables servers to have a complete view of
a type and clients to have an accurate, but limited view of the same
type.
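
For readers who have not seen the #ifdef SERVER_SIDE arrangement, here is a hand-written sketch of the general idea. It does not show generated output from the C++ Middleware Writer; the Session type, its fields and the text format are assumptions.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical type shared by client and server builds. Building with
// -DSERVER_SIDE adds the server-only member; marshalling never touches it.
struct Session {
    int user_id = 0;
    std::string token;
#ifdef SERVER_SIDE
    int failed_logins = 0;  // server bookkeeping, never sent to clients
#endif

    void marshal(std::ostream& out) const {
        // The toy text format assumes the token contains no spaces.
        assert(token.find(' ') == std::string::npos);
        out << user_id << ' ' << token;
    }
    void unmarshal(std::istream& in) { in >> user_id >> token; }
};

// Convenience wrapper returning the marshalled form as a string.
inline std::string marshalled(const Session& s) {
    std::ostringstream out;
    s.marshal(out);
    return out.str();
}
```

Both builds produce and accept the same bytes, because the guarded member sits outside the marshalling functions.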

-------------------------------- Stefan Strasser's reply to me ------------------

> "runtime field" ?

field that exists only at runtime. I was assuming serialization used
for persistence.
non-serialized field.

>
> I recommend avoiding the versioning support in Boost Serialization.
> It runs counter to good development practices by averting the type
> system. Consider for example a class called Account that uses
> versioning to support multiple releases of a product. In the usual
> case, later releases will have more fields and added complexity
> than earlier releases. Support then for a client at an early release,
> say 1.1, becomes inefficient since Account is being used to

that's a very specific case. you can use different types for
versioning using boost.serialization.
more often you want the evolved types to handle old files/streams/...

> >code for you, using e.g. OpenC++, GCC-XML, or Doxygen, but I doubt those
> >generated functions would stay there very long.
>
> I'm not exactly sure what you are saying here, but if a user is
> simply improving the names of some of his fields, the functions
> won't stay the same very long. Automating this helps with a

the same as above, that most serialization functions you might want to
generate automatically at the start of a project end up being custom
serialization functions anyway.

> common problem of forgetting to update the serialization
> functions and the compiler then barking. I don't claim though
> that every class should be handled this way. The C++ Middleware
> Writer allows users to turn off the automated generation of
> marshalling functions if that is desired. In my experience, it is
> unusual to turn that functionality off.

I don't want to start anything, but you already spend the better part
of your messages on this list advertising your ebenezer thing, so I
don't think we need yet another discussion about it.

-------------------------------------------------------------------------------------------------------

I'm going to reply to Stefan Strasser's last post in a separate
post.

Brian Wood
http://www.webEbenezer.net

== 2 of 2 ==
Date: Fri, Nov 27 2009 1:27 pm
From: Brian

On Nov 27, 2:51 pm, Brian <c...@mailvault.com> wrote:

> > "runtime field" ?
>
> field that exists only at runtime. I was assuming serialization used
> for persistence.
> non-serialized field.
>
>
>
> > I recommend avoiding the versioning support in Boost Serialization.
> > It runs counter to good development practices by averting the type
> > system. Consider for example a class called Account that uses
> > versioning to support multiple releases of a product. In the usual
> > case, later releases will have more fields and added complexity
> > than earlier releases. Support then for a client at an early release,
> > say 1.1, becomes inefficient since Account is being used to
>
> that's a very specific case.

I don't think it is a specific case.

> you can use different types for
> versioning using boost.serialization.

Yes, and should in my opinion.

> more often you want the evolved types to handle old files/streams/...

I would agree with that except change the word types to system
or product. I think accomplishing that is best done without
using the versioning of the Boost Serialization library.

>
> > >code for you, using e.g. OpenC++, GCC-XML, or Doxygen, but I doubt those
> > >generated functions would stay there very long.
>
> > I'm not exactly sure what you are saying here, but if a user is
> > simply improving the names of some of his fields, the functions
> > won't stay the same very long. Automating this helps with a
>
> the same as above, that most serialization functions you might want to
> generate automatically at the start of a project end up being custom
> serialization functions anyway.
>
> > common problem of forgetting to update the serialization
> > functions and the compiler then barking. I don't claim though
> > that every class should be handled this way. The C++ Middleware
> > Writer allows users to turn off the automated generation of
> > marshalling functions if that is desired. In my experience, it is
> > unusual to turn that functionality off.
>
> I don't want to start anything, but you already spend the better
> part of your messages on this list advertising your ebenezer thing,
> so I don't think we need yet another discussion about it.
>

This reminds me of something in "The Abolition of Man" by
C. S. Lewis. I'm paraphrasing... With a sort of ghastly
simplicity they castrate and bid the gelding to be fruitful
and multiply. It seems to me he would like it if I never
mentioned that I have experience with a concrete implementation
of the ideas being discussed. I suggest to any who are
sympathetic to his view, that they work to improve the
Boost Serialization library. For example, the performance
tests that I'm updating using Boost 1.41 show that the
new version of the library hasn't improved the implementation
of deserializing some associative containers like std::set.
At David Abrahams' suggestion, I opened a ticket on this --
https://svn.boost.org/trac/boost/ticket/2945 .

By working on improving the Boost Serialization library,
they will reduce the amount of material that I have to
demonstrate the strengths of the C++ Middleware Writer
over the Boost Serialization library.

Brian Wood
http://www.webEbenezer.net

==============================================================================
TOPIC: Looking for a good memory and CPU profiler
http://groups.google.com/group/comp.lang.c++/t/ac40d4006ea2c71c?hl=en
==============================================================================

== 1 of 2 ==
Date: Fri, Nov 27 2009 12:54 pm
From: Ian Collins

Johnson wrote:
> I am at a stage to test the footprint of my projects, such as the CPU
> usage and memory usage during its running. My project is written in
> standard c++.
> Could anybody please recommend a memory and CPU profiler, easy to learn,
> better open-source and free?

Such tools are invariably tool-chain or platform specific, so you should
try asking in a more relevant forum.

--
Ian Collins

== 2 of 2 ==
Date: Fri, Nov 27 2009 1:02 pm
From: Johnson

Ian Collins wrote:
> Johnson wrote:
>> I am at a stage to test the footprint of my projects, such as the CPU
>> usage and memory usage during its running. My project is written in
>> standard c++.
>> Could anybody please recommend a memory and CPU profiler, easy to
>> learn, better open-source and free?
>
> Such tools are invariably tool-chain or platform specific, so you should
> try asking in a more relevant forum.
>
Thanks for the info, Ian. This project is developed under Microsoft
Visual C++ 2008. Do you have any recommendation for the profiler?
BTW, can you recommend a few relevant forums?

==============================================================================
TOPIC: Boost unit test?
http://groups.google.com/group/comp.lang.c++/t/f33796b358d2b00e?hl=en
==============================================================================

== 1 of 2 ==
Date: Fri, Nov 27 2009 1:15 pm
From: Gert-Jan de Vos

On Nov 27, 9:50 pm, "carl" <carl@.com> wrote:
> I am writing unit-test using boost. Now for all the tests (5 at the moment)
> I need to load two rather large images.
>
> Is it possible to define a Boost unit test function that gets run once
> before all the tests where I can put this loading of images so I don't have
> to do it in each of the tests?

Read about fixtures or the test initialization function in the boost
unit test documentation. The examples show what you want.

== 2 of 2 ==
Date: Fri, Nov 27 2009 1:33 pm
From: "carl"

"Gert-Jan de Vos" <gert-jan.de.vos@onsneteindhoven.nl> wrote in message
news:7500bf59-9bed-4b6b-9add-7ce3904c398b@o10g2000yqa.googlegroups.com...
On Nov 27, 9:50 pm, "carl" <carl@.com> wrote:
> I am writing unit-test using boost. Now for all the tests (5 at the
> moment)
> I need to load two rather large images.
>
> Is it possible to define a Boost unit test function that gets run once
> before all the tests where I can put this loading of images so I don't
> have
> to do it in each of the tests?

Read about fixtures or the test initialization function in the boost
unit test documentation. The examples show what you want.

Thanks this was perfect:

http://www.boost.org/doc/libs/1_37_0/libs/test/doc/html/utf/user-guide/fixture/global.html

==============================================================================
TOPIC: I don't have to tell you...
http://groups.google.com/group/comp.lang.c++/t/f615b948e5cca45b?hl=en
==============================================================================

== 1 of 4 ==
Date: Fri, Nov 27 2009 1:18 pm
From: "Alf P. Steinbach"

* Howard Beale:
> Alf P. Steinbach wrote:
>
>> [Now that I've been asked not to just explain the spec, I don't want to
>> talk about it any more]
>
> Perfectly acceptable.

The above alleged quote of me is Howard's invention.

Cheers,

- Alf

== 2 of 4 ==
Date: Fri, Nov 27 2009 1:24 pm
From: "Alf P. Steinbach"

* Pavel:
> Bo Persson wrote:
>> Balog Pal wrote:
>>> "Alf P. Steinbach"<alfps@start.no>
>>>>>>>>> http://www.artima.com/cppsource/nevercall.html
>>>>>>>>> - "Never Call Virtual Functions during Construction or
>>>>>>>>> Destruction"
>>>>>>>>
>>>>>>>> This, however, is total bullshit.
>>>>>>>
>>>>>>> It isn't. It's Item#9 from EC++,
>>>>>>
>>>>>> I don't care about the messenger. The message is bullshit.
>>>>>> Really bad advice. FUD. Even in the context of programming for
>>>>>> constrained embedded systems (there's no connection).
>>>>>
>>>>> I'm confused. Please quote me the part of the message which fits
>>>>> the description, or elaborate.
>>>>
>>>> Every part of the quoted part is meaningless FUD.
>>>>
>>>> So, I haven't looked at the rest. :-)
>>>
>>> And doing so you flushed the baby with the bathwater. FUD is
>>> something that puts down a claim and either not include rationale
>>> at all or creates a phony one. You shouldn't address content such
>>> without reading it.
>>>> The point is that you don't have to care whether the methods are
>>>> virtual or not.
>>>> In C++ it's safe to call them anyway.
>>> ...
>>>> But as an abstract example, if a base class has a non-virtual
>>>> method foo(), that calls a virtual method bar(), and your derived
>>>> class T overrides bar(), then in your T constructor you can call
>>>> foo and foo's call of bar() ends up in T::bar.
>>>>
>>>> It's a not uncommon scenario. The mentioned bugs in Java programs
>>>> are mainly due to this scenario occurring often in actual code. In
>>>> C++ it's no problem. :-)
>>>
>>> It *IS* a problem. One you appear to miss entirely, or turn a
>>> blind eye.
>>> The problem is not of the nature you argue against. Technically
>>> the call works, and what it does is fully defined. And what
>>> happens (IMO) makes more sense too, than say in java.
>>>
>>> The problem is a human one -- when calls to virtuals are done,
>>> directly or indirectly, the expectation is they end up in the most
>>> derived object. In the real-life and not the technical sense.
>>> People are (appear) just not aware that in ctor and dtor different
>>> rules apply, and expect the code just work by magic -- as it does
>>> in every other context.
>>
>> Why, oh why, should the base class constructor be bothered to call
>> code in the derived class?
> Because
>
> 1. It is what the user wants and the compiler has enough information to
> generate this code and thereby satisfy the user (the information about
> object of which most derived class is actually being created is known at
> compile time). Note: if what the user wants is to call the base class's
> virtual function, s/he can always force it by explicit qualification.

Incorrect.

Consider, in class T's constructor, a call to a base class' foo which calls
virtual bar, which is overridden in T.

> 2. It allows more efficient implementation (the known-to-me C++
> implementations first write a pointer to virtual table of the base class
> to the object; then override it with the pointer to derived class's
> table; this is unnecessary and unjustified overhead breaking the promise
> of C++ to be as efficient as possible)

Yes, efficiency can be improved.

> 3. The promise of C++ to be as powerful and dangerous to allow the
> programmer to blow up his/her entire leg is broken.

No, you can do whatever you want. But if you want to do nonsensical
initialization you'll have to do it yourself. Not via the language's mechanisms.

> 4. It is proven to work (by Java for example).

Meaningless unless you define "work". Java programs generally have problems here.

Cheers & hth.,

- Alf
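
The scenario discussed in this post can be sketched like this. A call made while the base class constructor runs resolves to the base version, and the same call made from T's constructor resolves to T's override. The call_log() helper and construct_t() exist only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records which override ran; a hypothetical helper for this sketch.
inline std::vector<std::string>& call_log() {
    static std::vector<std::string> log;
    return log;
}

struct Base {
    Base() { foo(); }            // T part not constructed yet: runs Base::bar
    virtual ~Base() = default;
    void foo() { bar(); }        // non-virtual, calls the virtual
    virtual void bar() { call_log().push_back("Base::bar"); }
};

struct T : Base {
    T() { foo(); }               // dynamic type is now T: runs T::bar
    void bar() override { call_log().push_back("T::bar"); }
};

// Constructs one T and returns the calls recorded while doing so.
inline std::vector<std::string> construct_t() {
    call_log().clear();
    T t;
    assert(call_log().size() == 2);
    return call_log();
}
```

Neither call is undefined behaviour in C++; the disagreement in the thread is about which of the two results a programmer expects.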

== 3 of 4 ==
Date: Fri, Nov 27 2009 9:18 pm
From: Pavel

Alf P. Steinbach wrote:
> * Pavel:
>> Bo Persson wrote:
>>> Balog Pal wrote:
>>>> "Alf P. Steinbach"<alfps@start.no>
>>>>>>>>>> http://www.artima.com/cppsource/nevercall.html
>>>>>>>>>> - "Never Call Virtual Functions during Construction or
>>>>>>>>>> Destruction"
>>>>>>>>>
>>>>>>>>> This, however, is total bullshit.
>>>>>>>>
>>>>>>>> It isn't. It's Item#9 from EC++,
>>>>>>>
>>>>>>> I don't care about the messenger. The message is bullshit.
>>>>>>> Really bad advice. FUD. Even in the context of programming for
>>>>>>> constrained embedded systems (there's no connection).
>>>>>>
>>>>>> I'm confused. Please quote me the part of the message which fits
>>>>>> the description, or elaborate.
>>>>>
>>>>> Every part of the quoted part is meaningless FUD.
>>>>>
>>>>> So, I haven't looked at the rest. :-)
>>>>
>>>> And doing so you flushed the baby with the bathwater. FUD is
>>>> something that puts down a claim and either not include rationale
>>>> at all or creates a phony one. You shouldn't address content such
>>>> without reading it.
>>>>> The point is that you don't have to care whether the methods are
>>>>> virtual or not.
>>>>> In C++ it's safe to call them anyway.
>>>> ...
>>>>> But as an abstract example, if a base class has a non-virtual
>>>>> method foo(), that calls a virtual method bar(), and your derived
>>>>> class T overrides bar(), then in your T constructor you can call
>>>>> foo and foo's call of bar() ends up in T::bar.
>>>>>
>>>>> It's a not uncommon scenario. The mentioned bugs in Java programs
>>>>> are mainly due to this scenario occurring often in actual code. In
>>>>> C++ it's no problem. :-)
>>>>
>>>> It *IS* a problem. One you appear to miss entirely, or turn a
>>>> blind eye.
>>>> The problem is not of the nature you argue against. Technically
>>>> the call works, and what it does is fully defined. And what
>>>> happens (IMO) makes more sense too, than say in java.
>>>>
>>>> The problem is a human one -- when calls to virtuals are done,
>>>> directly or indirectly, the expectation is they end up in the most
>>>> derived object. In the real-life and not the technical sense.
>>>> People are (appear) just not aware that in ctor and dtor different
>>>> rules apply, and expect the code just work by magic -- as it does
>>>> in every other context.
>>>
>>> Why, oh why, should the base class constructor be bothered to call
>>> code in the derived class?
>> Because
>>
>> 1. It is what the user wants and the compiler has enough information
>> to generate this code and thereby satisfy the user (the information
>> about object of which most derived class is actually being created is
>> known at compile time). Note: if what the user wants is to call the
>> base class's virtual function, s/he can always force it by explicit
>> qualification.
>
> Incorrect.
>
> Consider, in class T's constructor, a call to a base class' foo which calls
> virtual bar, which is overridden in T.
I am not sure I am following your sentence above. Did you mean this?

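#include <iostream>
using std::cout;

class BT { public:
    void foo() { bar(); }                          // non-virtual, calls the virtual
    virtual void bar() { cout << "BT::bar()\n"; }
};

class T : public BT { public:
    void bar() { cout << "T::bar()\n"; }           // overrides BT::bar
    T() { foo(); }                                 // ends up in T::bar
};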

This does not present a problem in either specs (existing or desired) as
the derived class' bar() is called in both cases. If you meant anything
different could you please show sample code?

>
>
>> 2. It allows more efficient implementation (the known-to-me C++
>> implementations first write a pointer to virtual table of the base
>> class to the object; then override it with the pointer to derived
>> class's table; this is unnecessary and unjustified overhead breaking
>> the promise of C++ to be as efficient as possible)
>
> Yes, efficiency can be improved.
>
>
>> 3. The promise of C++ to be as powerful and dangerous to allow the
>> programmer to blow up his/her entire leg is broken.
>
> No, you can do whatever you want.
Thanks! I guess I did not know that before.

> But if you want to do nonsensical
> initialization you'll have to do it yourself. Not via the language's
> mechanisms.
You can do whatever you want, too, in particular boldly call business
requirements "nonsensical". Fortunately for both of us, I am not your
employer.

>
>
>> 4. It is proven to work (by Java for example).
>
> Meaningless unless you define "work". Java programs generally have
> problems here.
Fair enough. I define "Work" as "satisfy requirements" and "work well"
as "work at lesser cost than known alternatives". IMHO this aspect of
initialization (calling virtuals from constructors) works well in Java
and not so well in C++. Does it make more sense now?

>
>
> Cheers & hth.,
>
> - Alf

== 4 of 4 ==
Date: Fri, Nov 27 2009 10:53 pm
From: Joshua Maurice

On Nov 26, 10:19 pm, Howard Beale <n...@none.none> wrote:
> [1] As in the example I gave (and any example that could be given), you
> wouldn't know, by just looking at the code, that you're going to get the
> wrong output. You would have to also know about the order in which
> constructors / destructors are called, which is exactly what is at issue
> here. You may say that the output isn't "wrong," but I personally
> believe that if the area of a circle is pi*r^2 during its entire
> existence, then its area should also be pi*r^2 when it is being created
> or destroyed. And most programmers, unaware of this little gotcha,
> would say the same thing. You, being a C++ proponent and expert, know
> differently, but most programmers prefer consistent behavior. Yes, I'll
> grant that programmers also want safety, but it is entirely possible for
> a compiler to safely allow virtual methods during construction and
> destruction.

This sounds like the "wrong" position of the circle vs ellipse
problem. Just because in math "all circles are ellipses" does not mean
that "circle should be a subclass of ellipse". It doesn't matter if
the lack of inheritance will surprise "most programmers". In this
case, "most programmers" are wrong, and they will be making a bad
design decision if they went with it.

The writers of the standard, when they wrote the standard, decided to
try and make it difficult to shoot yourself in the foot. Generally,
when one shoots themself in the foot, it's by accident, and generally
by incompetence.

Put another way, "truth" is not democratic. Just because most
programmers think that a circle should "be an" ellipse does not make
it good code.

As another example, what about those programmers who find it
surprising that they can't access an object after it's deleted? Does
that mean manual memory management is bad and we should all be in a
garbage collected environment? Perhaps we should change POSIX as well
to support common anti-patterns, like double checked locking.

Virtual calls made in destructors and constructors do not go down to
the unconstructed most derived object because this would be a logical
error in nearly all cases. It seems as though you want such calls to
act on unconstructed objects, and in which case you are simply wrong,
as based upon years of evidence.

> [2] Do you know of a way to change the order in which constructors /
> destructors are called? I don't, outside of writing a new compiler.

The order of construction of subobjects is designed as "syntactic
sugar". All of this you could do in C, but it would be a huge pain to
write out and maintain. Thus, the standard writers decided on the most
useful ordering. Frankly, it's quite a good ordering. They decided to
write a "syntactic sugar" which works really well for 99% of cases. If
you're really in that 1% of cases, you can still write out whatever
manually. However, I don't think you actually hit such a case, and are
instead complaining out of your ass that it doesn't do X when no one
actually needs X. (Where to be very clear, X is the order of
construction and destruction. Thus far, you haven't seemed to disagree
with this, just how virtual functions work in them.)
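
One way to "write it out manually" for the rare case that needs derived behaviour during set-up is a two-phase scheme: construct the object fully, then make the virtual call. This is only a sketch; Shape, Circle, init() and make_initialized() are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
    // Second phase: meant to run after the most derived constructor finished.
    void init() { label = "shape:" + name(); }
    std::string label;
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// Hypothetical factory: construct fully, then make the virtual call.
template <class T>
std::unique_ptr<T> make_initialized() {
    std::unique_ptr<T> p(new T());
    p->init();                    // dispatches to the most derived override
    assert(!p->label.empty());
    return p;
}
```

Routing all creation through the factory keeps callers from forgetting the second phase, at the cost of an extra function per hierarchy.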

> [4] My compiler doesn't give me any warning when I call a virtual method
> from a constructor or destructor. I don't know what more to tell you.

Yes. I agree that Quality of Implementation issues, like decent
warnings, is a huge problem with modern C++ implementations. (For
example, it annoys me to no end that I cannot find an option on \any\
compiler to flag deleting an incomplete type as a fatal error, or to
flag deleting a void pointer as a fatal error. Personally, I think
it's a shortcoming in the standard and implementations. Both errors
should require a diagnostic, and apart from a requirement from the
standard, any nonshitty compiler should have an option to flag both as
a fatal error turned on by default. I've spent days tracking down
problems from such things.)
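
Until compilers offer such an option, a small wrapper can turn both mistakes into compile errors for code that routes its deletes through it. The sketch below needs C++11 for static_assert; Widget and leaked_after_checked_delete() are made up for the demonstration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type that counts live instances.
struct Widget {
    static int& live() { static int n = 0; return n; }
    Widget() { ++live(); }
    ~Widget() { --live(); }
};

// Refuses to compile for void* and for incomplete types, because sizeof
// requires a complete type at the point of instantiation.
template <class T>
void checked_delete(T* p) {
    static_assert(!std::is_void<T>::value, "cannot delete through void*");
    static_assert(sizeof(T) > 0, "cannot delete an incomplete type");
    delete p;
}

// Returns how many Widgets remain from this call (0 if the destructor ran).
inline int leaked_after_checked_delete() {
    const int before = Widget::live();
    Widget* w = new Widget;
    assert(Widget::live() == before + 1);
    checked_delete(w);
    return Widget::live() - before;
}
```

This only protects deletes that go through the wrapper; a plain delete expression elsewhere is still at the mercy of the compiler's warnings.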

==============================================================================
TOPIC: How to get an insertion hint for an unordered associated container?
http://groups.google.com/group/comp.lang.c++/t/21e47512ffcaf8a2?hl=en
==============================================================================

== 1 of 1 ==
Date: Fri, Nov 27 2009 1:24 pm
From: Pavel

James Kanze wrote:
> On Nov 27, 4:30 am, Pavel
> <pauldontspamt...@removeyourself.dontspam.yahoo> wrote:
>> James Kanze wrote:
>>> On Nov 24, 5:24 am, Pavel
>>> <pauldontspamt...@removeyourself.dontspam.yahoo> wrote:
>>>> James Kanze wrote:
>>>>> On Nov 21, 8:34 pm, Pavel
>>
>>> [...]
>>>> they have to re-compute same information in insert() again (at
>>>> least bucket index and hash code). I hope they will use it in
>>>> the future though and that API is there to allow optimization,
>>>> not just for compatibility..
>
>>> Explain how? The iterator doesn't contain the hash code in any
>>> way.
>
>> Iterator may contain anything that addresses the element, in
>> particular hash_code (it does so in one version of GCC
>> hashtable, more precisely, points to the node that contains
>> hash_code).
>
> That's true, but how would that help? The hint has to be
> checked; it's not an absolute. So the insertion code would
> still have to calculate the hash code for the element to be
> inserted. And having done that, having the iterator will not
> make much difference.
>
>> It has to contain something to allow navigation (++, --). For
>> example, it could contain:
>> 1. A handle for constant-time access to the bucket, to be able to find
>> out where the bucket ends (like an index of the bucket in some
>> array/vector/deque or a pointer to it),
>> 2. A handle for constant-time access to element in the bucket (another
>> index or whatever)
>> 3. A handle for constant-time access to the container (to know how to
>> navigate to the next bucket as you need for ++, --). Again, a pointer or
>> similar.
>
>> If buckets contain pointers to actual objects (which sounds
>> feasible as the Standard guarantees the references and
>> pointers to the objects are not invalidated during re-hashes),
>> the above is quite enough to insert the object "at" given
>> iterator or at first available space in the bucket pointed to
>> by the iterator.
>
> You seem to be forgetting that the iterator is only a hint, and
> that the insertion has to work correctly even if the hint isn't
> appropriate.
point taken -- the hint-using code must still compute the hash code.
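
To make the "only a hint" point concrete, here is a small sketch using std::unordered_map from C++11 (an assumption; the thread predates the final standard). Insertion behaves correctly whether the hint is unrelated to the new key or the key already exists. insert_with_hints() is a made-up name.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

inline std::unordered_map<std::string, int> insert_with_hints() {
    typedef std::unordered_map<std::string, int> Map;
    Map m;
    m.insert(Map::value_type("a", 1));

    // Hint points at an unrelated element: "b" is still inserted correctly.
    Map::iterator hint = m.find("a");
    Map::iterator it = m.insert(hint, Map::value_type("b", 2));
    assert(it->first == "b" && it->second == 2);

    // Key already present: nothing is overwritten, and the returned
    // iterator refers to the existing element.
    it = m.insert(m.end(), Map::value_type("a", 99));
    assert(it->second == 1);
    return m;
}
```

Whether a given implementation gains any speed from the hint is a separate question that this sketch does not measure.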

>>> Most of the hash maps I've written in the past would cache
>>> the last element found, in order to support things like:
>
>>> if ( m.contains(x) ) {
>>> m[x] = 2*m[x] ; // or whatever...
>>> }
>
>> I understand you can optimize the set for the usage pattern
>> you think is common. This comes at cost even for this use,
>> however, as the comparison may be much more expensive than
>> hash-function computation and you guarantee you will have an
>> extra comparison in m[x] at all times. We know we can do
>> without (see above) so why should we live with it.
>
> It's a common usage pattern with my hashed containers: since []
> has a precondition that the entry exists,
You mean -- in your container, not STL? I don't like STL's inserting
things into maps behind my back either, e.g.:

std::map<int, int> m;
int i = m[5];
/* it's not exactly intuitive that we have just created a (5, 0) entry
on the map.. */

That's why I mostly use explicit insert, find etc.
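
To make the point above concrete, here is a minimal sketch contrasting operator[] with find() on a std::map. The two helper function names are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical helper: reads a missing key with operator[] and
// reports how many entries the map holds afterwards.
std::size_t size_after_subscript_read() {
    std::map<int, int> m;
    int i = m[5];   // inserts the entry (5, 0) as a side effect
    (void)i;
    return m.size();
}

// Hypothetical helper: looks up a missing key with find() and
// reports how many entries the map holds afterwards.
std::size_t size_after_find() {
    std::map<int, int> m;
    std::map<int, int>::const_iterator it = m.find(5);  // no insertion
    (void)it;
    return m.size();
}
```

The first helper returns 1 and the second returns 0, which is why find() is the safer choice when a lookup must not modify the map.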

> you often end up
> writing things like the above (an if (m.contains(x)) followed by
> use). With the STL idiom, of course, the above would probably be
> written:
>
> Map::iterator elem = m.find(x);
> if ( elem != m.end() ) {
> elem->second = 2 * elem->second ; // or whatever...
> }
>
> The STL avoids multiple look-up, at the cost of IMHO a less
> natural syntax. Given my syntax, however, caching will be a
> win: you do have the comparison each time, but you at least
> avoid the extra calculation of the hash code.
Sometimes hash code computation is more expensive than comparison and
sometimes it isn't. Depends on how you calculate hashcode: sometimes it
is feasible to only use a part of the whole that is known to be usually
more distinctive.
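
For reference, the find-then-update idiom discussed here can be written as a small compilable sketch. It assumes a C++11 std::unordered_map (the thread predates its standardization), and the function name double_if_present is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

typedef std::unordered_map<std::string, int> Map;

// Doubles the value stored under key, if there is one.
// The hash code is computed once (inside find) and nothing is
// inserted when the key is absent.
bool double_if_present(Map& m, const std::string& key) {
    Map::iterator elem = m.find(key);
    if (elem == m.end()) {
        return false;
    }
    elem->second = 2 * elem->second;
    return true;
}
```

The iterator returned by find() is reused for the update, so there is a single lookup regardless of whether hashing or comparison is the more expensive step.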

>
>>> efficiently. It still required a comparison for each
>>> access, in order to know whether I could use the cached
>>> value, but it did avoid multiple calculations of the hash
>>> code when accessing the same element several times in
>>> succession.
>
>> You are making assumptions that may be very true for your
>> particular problem but I would not use these in a
>> general-purpose library like STL.
>
> I think it more a question of the idioms associated with the
> container, than whether it is general purpose or not. With the
> STL, you use find and then the returned iterator; with mine, you
> use contains(), and then [].
I meant the assumption that hash code calculation is the most expensive
part of the search. Value comparison can sometimes be more expensive, and
walking the bucket may take some time (a bucket of size 5 or 6 is
still traversable in O(1), but the constant hidden in the O becomes too
big :-))

>> Why not? You can have a bucket and an index in the bucket and
>> the bucket has the index of the last element. If a pointer to
>> the element stored in the bucket (think of the bucket as an
>> array/vector of pointers although it does not have to be one)
>> is NULL, insert your element into this spot; otherwise, if the
>> bucket has free space insert it at the end of the bucket;
>> otherwise, you have to re-hash (but you would have anyway;
>> supposedly, you still have "amortized constant" for the
>> average insertion time)
>
> Several points, but the most important one is what I mentioned
> before: the function must work even if the hint is incorrect.
True, as admitted above. Some gain is still possible.
> And typically, the bucket doesn't have a fixed length: you don't
> rehash because there are more than n elements in a single
> bucket; you rehash because size()/bucket_count() passes a
> pre-established limit.
Yes, point taken. What I was alluding to was a map implementation that
was not STL-compliant. The essence does not change, though: please replace
"re-hash" with "add more space to the end of the bucket".

>
> With regards to the original poster's problem (he didn't want to
> construct a complete object if the entry was already there, and
> he didn't want to calculate the hash code multiple times)
I was the original poster BTW :-)

> , the
> interface could be extended with a additional function,
> insert_into_bucket, so that he could get the bucket using
> bucket, then iterate over it to see if his target was already
> present, and if not use this new function for the insertion.
> Whether it's worth it is another question: it's a very
> specialized use, and most users would probably be content with
> just insert (which only inserts if not present), despite having
> to create a new complete object.
That's what I ended up with for a while (sigh). But quite probably I
will have to return to it, as the lookup in this map seems to take the
lion's share of CPU time, and the nature of the problem is such that 99% of
all inserted values get removed by the end of the day, so I never know
whether the given key is already in or not. I need to measure more to
see what actually takes time within this lookup.

>
>>>> What is the benefit of requiring them both be end()? Checking
>>>> for "not found" condition costs one iterator comparison either
>>>> way.. seems like waste to me.
>
>>> No benefit, perhaps, but no real harm either. Ordered and
>>> unordered containers are fundamentally different beasts.
>
>> If you returned two identical iterators pointing to a free
>> spot in the correct bucket (say the first free spot), you
>> would know to insert your element there without any
>> computations whatsoever.
>
> You still need to calculate the hash code of the object being
> inserted, to validate the hint (and in a unique hash table, do
> some comparisons to ensure that the element isn't already
> present). And once you've done that, there's not much
> calculation left anyway.
As I said above, there is some: comparison and potential navigation in
the bucket. It is usually linear and the unpleasant part is that
reasonable bucket length is only guaranteed on average; wherever
selecting a good hash function is difficult (this is not the case in my
current problem), it is possible to accidentally push some 50% of the
whole key population into one bucket and then spend n/4 iterations on
average for each lookup while the average number of elements per bucket
is still below whatever implementation believes is the reasonable limit.
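
The scenario described above can be reproduced with a deliberately poor hash function. This is only a sketch, assuming a C++11 std::unordered_set; HalfCollidingHash and longest_bucket are names invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Deliberately poor hash (an assumption for illustration): every even
// key gets hash code 0, every odd key hashes to its own value.
struct HalfCollidingHash {
    std::size_t operator()(int k) const {
        return (k % 2 == 0) ? 0 : static_cast<std::size_t>(k);
    }
};

// Returns the number of elements in the fullest bucket of s.
template <class Set>
std::size_t longest_bucket(const Set& s) {
    std::size_t longest = 0;
    for (std::size_t b = 0; b < s.bucket_count(); ++b) {
        if (s.bucket_size(b) > longest) {
            longest = s.bucket_size(b);
        }
    }
    return longest;
}
```

After inserting the keys 0..999 into a std::unordered_set<int, HalfCollidingHash>, the 500 even keys share one bucket, because keys with the same hash code must land in the same bucket, while load_factor() still stays at or below max_load_factor().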

>
> [...]
>> The cost of
>> insertion can be anything -- in case of conflict it may even be some
>> secondary hash or linear or non-linear search in the overflow area or
>> similar. The Standard does not define how exactly the overflows are
>> processed.
>
> It does require buckets, since it exposes this detail at the
> interface level. The guarantees with regards to this part of
> the interface are rather vague, but I would expect that if
> (double)(m.size()+1)/m.bucket_count() < m.max_load_factor(), then:
> size_t b = m.bucket(obj);
> m.insert(obj);
> assert(b == m.bucket(obj));
My reading is opposite:

It says (1) "Keys with the same hash code appear in the same bucket" and
(2) "The number of buckets is automatically increased as elements are
added to an unordered associative container"

So, if there is one bucket now, m.bucket(obj) has to return 0.
If there must be two buckets after the insertion as per (2), the
inserted element may have gone to bucket 1 along with its peers. (A
bucket may contain elements with more than one hash code; it is just that
two elements with the same hash code must appear in the same bucket
per (1). So if obj had hash code 2 and bucket 0 contained all elements
with codes 1 and 2, all the twos may need to move to bucket 1 during the
insertion.)
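
A small sketch of the effect under discussion: the bucket count can grow as elements are added, and bucket() is then answered relative to the new count. It assumes a C++11 std::unordered_set; BucketReport and insert_many_and_report are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Snapshot of the bucket layout around a series of insertions.
struct BucketReport {
    std::size_t buckets_before;
    std::size_t buckets_after;
    std::size_t index_before;
    std::size_t index_after;
};

// Inserts probe, records its bucket, then inserts `extra` further
// distinct keys and records the bucket of probe again.
BucketReport insert_many_and_report(int probe, int extra) {
    std::unordered_set<int> s;
    s.insert(probe);

    BucketReport r;
    r.buckets_before = s.bucket_count();
    r.index_before = s.bucket(probe);

    for (int i = 1; i <= extra; ++i) {
        s.insert(probe + i);
    }

    r.buckets_after = s.bucket_count();
    r.index_after = s.bucket(probe);
    return r;
}
```

With enough extra keys the container has to add buckets to respect max_load_factor(), so buckets_after exceeds buckets_before. Whether index_after equals index_before is implementation-dependent, which is the crux of the disagreement above.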

> should hold. If so, that pretty much defines how overflows are
> handled.

>
>> To summarize my point:
>
>> The Standard recognizes the hint may be useful (it is still in
>> the API for unordered ass. containers)
>
> Does it recognize a utility, other than compatibility of
> interface (so e.g. you can instantiate an insert_iterator on
> an unordered ass. container)? (I'm not sure I understand the
> meaning of "The iterator q is a hint pointing to where the
> search should start" in the standard. It doesn't make too much
> sense when buckets are used for collision handling, at least to
> me.)
>
>> and I am ok that GCC STL does not use it now -- it or another
>> implementation may do it in the future or I can write it
>> myself and the client code will continue to rely on the
>> Standard-compliant unordered ass. container while enjoying
>> the faster implementation.
>
> I'd be interested in seeing an actual implementation which does
> use it somehow.
I will post a notice here if I make the change in the GNU implementation
(I will be legally obliged to contribute it back anyway). But I will only
work on it if I figure out that it would make a difference for my
particular problem, so it's not guaranteed.
-Pavel
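
For readers unfamiliar with the hinted overload under discussion, here is a minimal sketch of how it is called. It assumes a C++11 std::unordered_map; insert_with_hint is a hypothetical wrapper, and nothing here shows whether a given implementation actually exploits the hint.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

typedef std::unordered_map<std::string, int> Map;

// Inserts (key, value) using the hinted overload of insert().
// Returns true if a new element was added. The result must be
// correct whatever the hint points at, including end().
bool insert_with_hint(Map& m, Map::const_iterator hint,
                      const std::string& key, int value) {
    const std::size_t before = m.size();
    m.insert(hint, Map::value_type(key, value));
    return m.size() == before + 1;
}
```

The hint only affects where the search may start; an existing key is left untouched and a missing key is inserted, exactly as with the unhinted insert().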

>
>> But, I do have an issue with the requirement to return two
>> end() iterators in equal_range() on "not found" condition
>> instead of such a hint. I think it is a defect in the Standard
>> that limits possible optimizations without good reason.
>
> I would agree that it seems to be a case of overspecification.
> Even if I don't see any possible advantages in returning
> something else at present, that doesn't mean that they can't
> exist, if not now, in the future, and there's no real advantage
> in excluding them.
>
> --
> James Kanze
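
The "not found" behaviour of equal_range() debated in this thread can be checked with a short sketch. It assumes a C++11 std::unordered_multiset; the function name not_found_is_end_pair is hypothetical.

```cpp
#include <cassert>
#include <unordered_set>
#include <utility>

typedef std::unordered_multiset<int> MultiSet;

// Returns true if equal_range(key) yields the pair (end(), end()),
// which is what the unordered containers return for a missing key.
bool not_found_is_end_pair(const MultiSet& s, int key) {
    std::pair<MultiSet::const_iterator, MultiSet::const_iterator> r =
        s.equal_range(key);
    return r.first == s.end() && r.second == s.end();
}
```

Because both iterators are end(), the result carries no information about the bucket where the key would belong, which is the limitation objected to above.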

==============================================================================
TOPIC: How can I sort this set?
http://groups.google.com/group/comp.lang.c++/t/0bd32b5633e0030a?hl=en
==============================================================================

== 1 of 4 ==
Date: Fri, Nov 27 2009 1:47 pm
From: petertwocakes

I have

typedef vector<string> TokenVector
which is an array of string (words) making up a sentence.
For the purposes of my app I need it structured like this, rather than
a single composite string.

then I have

typedef set<TokenVector, compare > TokenVectorSet;

which is a set of such sentences.

I want to sort the set in to alphabetical order, as if I was dealing
with composite strings.

I think I need something like this to use as a compare function:

class compare
{
public:
std::vector<TokenVector> * myObjects;
compare(std::vector<TokenVector>* obj) : myObjects(obj) {}
bool operator ()(const int p1,const int p2)
{
.... construct composite strings from the TokenVectors and compare
them
}
};

Once I get down to operator (), I know to re-construct the strings and
compare them. It's the protocol/syntax of setting up the class/
function I'm struggling with.

Am I barking up the wrong tree? ...the compiler is just going mental!

Thanks for any clues.

== 2 of 4 ==
Date: Fri, Nov 27 2009 3:12 pm
From: Sam

petertwocakes writes:

> Hi
>
> I have
>
> typedef vector<string> TokenVector
> which is an array of string (words) making up a sentence.
> For the purposes of my app I need it structured like this, rather than
> a single composite string.
>
> then I have
>
> typedef set<TokenVector, compare > TokenVectorSet;
>
> which is a set of such sentences.
>
> I want to sort the set in to alphabetical order, as if I was dealing
> with composite strings.
>
> I think I need something like this to use as a compare function:
>
> class compare
> {
> public:
> std::vector<TokenVector> * myObjects;
> compare(std::vector<TokenVector>* obj) : myObjects(obj) {}
> bool operator ()(const int p1,const int p2)
> {
> .... construct composite strings from the TokenVectors and compare
> them
> }
> };

The comparison object of a set compares the actual elements of the set, and
nothing else. Your set contains TokenVector objects, and not ints.

As such the correct prototype for a comparison object that you intend to use
with your actual set should be:

bool operator()(const TokenVector &a, const TokenVector &b) const;

It's arguable whether the comparison operator should be a const member
function.
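
Putting that prototype to work, a comparison functor for the set might look like the following sketch. The names join_tokens and CompareAsSentence are made up here, and joining with a single space is an assumption about what "composite string" means.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

typedef std::vector<std::string> TokenVector;

// Joins the tokens with single spaces (the separator is an assumption).
std::string join_tokens(const TokenVector& tokens) {
    std::string out;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i != 0) {
            out += ' ';
        }
        out += tokens[i];
    }
    return out;
}

// Orders two sentences by comparing their joined forms.
struct CompareAsSentence {
    bool operator()(const TokenVector& a, const TokenVector& b) const {
        return join_tokens(a) < join_tokens(b);
    }
};

typedef std::set<TokenVector, CompareAsSentence> TokenVectorSet;
```

No constructor argument is needed, so TokenVectorSet can be default-constructed. Note that two different token vectors that join to the same string are treated as equivalent by this comparator, so only one of them would be kept in the set.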

== 3 of 4 ==
Date: Fri, Nov 27 2009 3:06 pm
From: Jiří Paleček

On Fri, 27 Nov 2009 22:47:46 +0100, petertwocakes
<petertwocakes@googlemail.com> wrote:

> Hi

Hello,

> I have
>
> typedef vector<string> TokenVector
> which is an array of string (words) making up a sentence.
> For the purposes of my app I need it structured like this, rather than
> a single composite string.
>
> then I have
>
> typedef set<TokenVector, compare > TokenVectorSet;
>
> which is a set of such sentences.
>
> I want to sort the set in to alphabetical order, as if I was dealing
> with composite strings.
>
> I think I need something like this to use as a compare function:

You mean functor.

> class compare
> {
> public:
> std::vector<TokenVector> * myObjects;

No, this is totally unneeded here.

> compare(std::vector<TokenVector>* obj) : myObjects(obj) {}
> bool operator ()(const int p1,const int p2)

No, the operator should take two "TokenVector const&", and compare them.

> {
> .... construct composite strings from the TokenVectors and compare
> them
> }
> };
>
> Am I barking up the wrong tree? ...the compiler is just going mental!
>
> Thanks for any clues.

Just for other possibilities:

- you can use an ordinary function, and wrap it in some library function
object (boost::function or std::pointer_to_binary_function) to, e.g., use
it with a container.

- you can check if the default (lexicographic) ordering on vectors isn't
consistent with your needs. Note that the lexicographic ordering of lists
of words is the same as the lexicographic ordering of sentences formed by
joining the list with a character with a lower value than any of the
characters used in the words.

Regards
Jiri Palecek
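
The second suggestion can be tried without writing any comparator at all. The sketch below relies on the default std::less<TokenVector>, which uses the lexicographic operator< of std::vector; DefaultOrderedSet and sentences_in_order are names invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

typedef std::vector<std::string> TokenVector;

// No comparator given: the set orders elements with vector's operator<.
typedef std::set<TokenVector> DefaultOrderedSet;

// Returns the sentences in set order, tokens joined with single spaces.
std::vector<std::string> sentences_in_order(const DefaultOrderedSet& s) {
    std::vector<std::string> out;
    for (DefaultOrderedSet::const_iterator it = s.begin();
         it != s.end(); ++it) {
        std::string line;
        for (std::size_t i = 0; i < it->size(); ++i) {
            if (i != 0) {
                line += ' ';
            }
            line += (*it)[i];
        }
        out.push_back(line);
    }
    return out;
}
```

For the sentences {"ab"}, {"a", "c"} and {"a"} this yields "a", "a c", "ab", the same order as sorting the space-joined strings, since a space sorts below the letters. Joining with no separator at all would give "ab" before "ac" instead, which is why the choice of separator matters.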

== 4 of 4 ==
Date: Fri, Nov 27 2009 11:06 pm
From: David Harmon

On Fri, 27 Nov 2009 13:47:46 -0800 (PST) in comp.lang.c++, petertwocakes
<petertwocakes@googlemail.com> wrote,
>I have
>
>typedef vector<string> TokenVector
>which is an array of string (words) making up a sentence.
>For the purposes of my app I need it structured like this, rather than
>a single composite string.
>
>then I have
>
>typedef set<TokenVector, compare > TokenVectorSet;
>
>which is a set of such sentences.
>
>I want to sort the set in to alphabetical order, as if I was dealing
>with composite strings.

Make life easy for yourself, construct the composite strings, then use
std::map<string, TokenVector>
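
A rough sketch of that suggestion follows. SentenceMap and add_sentence are hypothetical names, and the single-space separator is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::vector<std::string> TokenVector;
typedef std::map<std::string, TokenVector> SentenceMap;

// Stores tokens under their space-joined form, so the map iterates
// in alphabetical order of the composite strings.
void add_sentence(SentenceMap& m, const TokenVector& tokens) {
    std::string key;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i != 0) {
            key += ' ';
        }
        key += tokens[i];
    }
    m[key] = tokens;
}
```

The composite string is built once per insertion rather than on every comparison, at the cost of storing each sentence twice.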

==============================================================================
TOPIC: Article on possible improvements to C++
http://groups.google.com/group/comp.lang.c++/t/e46e9b3e07711d05?hl=en
==============================================================================

== 1 of 1 ==
Date: Fri, Nov 27 2009 4:42 pm
From: Paavo Helde

Nick Keighley <nick_keighley_nospam@hotmail.com> wrote in
news:808cc67a-f8ce-4db4-bcf2-65897d504e17@d10g2000yqh.googlegroups.com:

> On 21 Nov, 00:02, Paavo Helde <myfirstn...@osa.pri.ee> wrote:
>> sfuerst <svfue...@gmail.com> wrote in news:4d48f5b8-3aa3-4726-81f9-
>> 12542c22d...@f20g2000prn.googlegroups.com:
>> > On Nov 20, 8:33 am, "Bo Persson" <b...@gmb.dk> wrote:
>> >> sfuerst wrote:
>
>> >> > Hello, I've written an article [...] detailing ten
>> >> > perceived problems with C
>
> <snip>
>
>> >> > 2) Exceptions
>>
>> >> Ok, I'll take on number 2. :-)
>>
>> >> Constructors is just one reason for exceptions. Overloaded
>> >> operators is another one. How would you return your failed code
>> >> from an operator+()?
>>
>> > You are right, basically you are stuck. Exceptions are sometimes
>> > the only way to communicate failure conditions in some cases.
>> > [...] the problem with exceptions is [...] that
>> > they provide a poorly documented interface to functions. In
>> > theory, all functions should list what potential exceptions they
>> > could throw as part of their definition / external documented
>> > interface.
>> [...]
>> > The reason is that far too many
>> > exceptions are possible in any non-trivial code.
>> [...]
>> For reducing the combinatorial explosion, all exceptions propagating
>> out of a library should be derived from std::exception.
>
> which rules out using much of boost

Like which? All Boost exceptions I have seen have been derived from
std::exception. But I admit I have used only a few libraries.

In any case if there are other unrelated exception types, it does not
rule out anything, it just makes life a bit harder.

>> In my experience, most exceptions finally get logged as a text, or
>> displayed to the end user as a text.
>
> not in mine. many exceptions are swallowed silently. Even if they are
> reported to a log file that isn't the primary reason for the
> exception. Think automatic invocation of dtors and passing control
> back up the stack.

This is even easier. If you are using a particular exception type, you
can throw and catch it exactly in the points you like. This has nothing
to do with the original problem raised up in the thread, at least not as
far as I have understood, namely that by using multiple libraries one
often does not know exactly which exceptions might arise, and how to deal
with them. Obviously, if you are using a custom exception just to unwind
the stack, this is no problem.

>
>> So what I need is just to convert
>> every exception to text.
>
> use what()

Yes, if the exception is derived from std::exception. It gets more
difficult when it isn't.

>
>
>> This can be done by a single function, which is
>> called from all catch(...) clauses (... is verbatim here!). The
>> function rethrows and catches the exception, converting it into the
>> text appropriately for all kind of known exception types (language
>> translation can be done at this point as well).
>
> sounds complicated

Not really. The alternative would be to have multiple catch clauses in
each try block. This was actually needed for MSVC++ 6.0 because of a
compiler bug.

I typically have something like this, in pseudocode:

while(true) {
try {
choose-and-process-request;
return-result-to-client;
} catch(...) {
Log("request failed: " + RecoverExceptionTextInCatch());
return-failure-to-client;
}
}

Where RecoverExceptionTextInCatch() might look something like:

std::string RecoverExceptionTextInCatch() {
try {
throw;
} catch(const std::exception& e) {
return e.what();
} catch(const char* e) {
return e? e: "NULL";
#ifdef _MSC_VER
} catch(CException* pe) {
std::string message = TranslateCException(pe);
pe->Delete();
return message;
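
The listing above breaks off at this point in the digest. As a stand-in, here is a self-contained sketch of the same rethrow-in-a-helper technique. The MSVC-specific CException branch is left out, the final catch(...) branch and its "unknown exception" text are assumptions, and run_and_describe is a hypothetical driver replacing the request loop.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Must only be called from inside a catch block: the bare "throw;"
// rethrows the exception currently being handled so that it can be
// classified by the handlers below.
std::string RecoverExceptionTextInCatch() {
    try {
        throw;
    } catch (const std::exception& e) {
        return e.what();
    } catch (const char* e) {
        return e ? e : "NULL";
    } catch (...) {
        return "unknown exception";  // assumed fallback text
    }
}

// Hypothetical driver: runs f and turns any exception into text.
template <class F>
std::string run_and_describe(F f) {
    try {
        f();
        return "ok";
    } catch (...) {
        return "request failed: " + RecoverExceptionTextInCatch();
    }
}
```

Each call site needs only a single catch(...) clause, and support for a new exception type is added in one place, inside the helper.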
