Thursday, September 10, 2015

Digest for comp.lang.c++@googlegroups.com - 22 updates in 6 topics

Richard Hartman <rmhartman@gmail.com>: Sep 10 03:37PM -0700

We have a 16-bit signed int with a value of -1.
If it gets cast to a 32 bit unsigned int, does it go:
 
a) convert to 16 bit unsigned (0xFFFF)
b) convert to 32 bit unsigned (0x0000FFFF)
 
or
 
a) convert to 32 bit signed (-1)
b) convert to 32 bit unsigned (0xFFFFFFFF)
 
and is this order fixed, or undefined (basically left up to the compiler)?
bartekltg <bartekltg@gmail.com>: Sep 11 12:49AM +0200

On 11.09.2015 00:37, Richard Hartman wrote:
 
> a) convert to 32 bit signed (-1)
> b) convert to 32 bit unsigned (0xFFFFFFFF)
 
> and is this order fixed, or undefined (basically left up to the compiler)?
 
4.7 Integral conversions [conv.integral]
 
2 If the destination type is unsigned, the resulting value is the
least unsigned integer congruent to the source integer (modulo 2^n,
where n is the number of bits used to represent the unsigned type).
 
So 0xFFFFFFFF.
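 
A minimal check of that rule, assuming the <cstdint> fixed-width types:
 
#include <cstdint>
#include <cstdio>
 
int main()
{
    std::int16_t s = -1;
    // Converted straight to the destination type modulo 2^32,
    // so the value becomes 2^32 - 1.
    std::uint32_t u = static_cast<std::uint32_t>(s);
    std::printf("%08X\n", static_cast<unsigned>(u));   // prints FFFFFFFF
}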
 
bartekltg
Marcel Mueller <news.5.maazl@spamgourmet.org>: Sep 10 10:14PM +0200

Is there a pattern to return a result set from a function without
filling a temporary container with the results? I.e. a concept similar to
.NET iterator blocks.
 
E.g.:
 
#include <stdio.h>
#include <vector>
 
using namespace std;
 
vector<int> even_numbers(vector<int> numbers)
{
    vector<int> result;
    for (int num : numbers)
        if ((num & 1) == 0)
            result.push_back(num);
    return result;
}
 
int main()
{
    for (int num : even_numbers(vector<int>({ 1,5,3,2,3,6,8 })))
        printf("%i\t", num);
}
 
The function even_numbers just picks the even numbers from its input. But
it creates a collection with all the results. That is no problem in this
simple example, but when the input is large transient data instead of a
vector<>, this is no longer desirable.
 
So I would prefer to return a virtual container that just supports input
iteration and returns the requested results on the fly.
 
Of course, I could define my own container class and iterator class each
time, but I am looking for a simpler way to return an object with the
required properties, since writing STL-compatible containers is not that
easy.
 
Is there a common pattern for use cases like this?
 
 
Marcel
bartekltg <bartekltg@gmail.com>: Sep 10 11:24PM +0200

On 10.09.2015 22:14, Marcel Mueller wrote:
> #include <vector>
 
> using namespace std;
 
> vector<int> even_numbers(vector<int> numbers)
 
vector<int> even_numbers(const vector<int> &numbers)
 
 
 
> required properties, since writing STL compatible containers is not that
> easy.
 
> Is there a common pattern for use cases like this?
 
Maybe this will work.
 
http://www.boost.org/doc/libs/1_59_0/libs/iterator/doc/filter_iterator.html
Look at the examples.
 
This gets you a pair of iterators that skip elements, not a container,
so you can't use a range-based loop, but I don't think that is a big problem.
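 
For illustration, a minimal sketch of the original even_numbers example
rewritten around boost::filter_iterator (header-only; my reconstruction
from the linked docs):
 
#include <cstdio>
#include <vector>
#include <boost/iterator/filter_iterator.hpp>
 
struct is_even {
    bool operator()(int x) const { return (x & 1) == 0; }
};
 
int main()
{
    std::vector<int> numbers{ 1,5,3,2,3,6,8 };
    // A pair of filtering iterators over the untouched input; no result
    // container is ever built.
    auto first = boost::make_filter_iterator(is_even(), numbers.begin(), numbers.end());
    auto last  = boost::make_filter_iterator(is_even(), numbers.end(), numbers.end());
    for (; first != last; ++first)
        std::printf("%i\t", *first);
}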
 
 
bartekltg
Luca Risolia <luca.risolia@linux-projects.org>: Sep 10 11:34PM +0200

On 10/09/2015 22:14, Marcel Mueller wrote:
> for (int num : even_numbers(vector<int>({ 1,5,3,2,3,6,8 })))
 
what's wrong with:
for (int num : even_numbers({ 1,5,3,2,3,6,8 }))
 
anyway:
 
> vector<> this is no longer desirable.
 
> So I would prefer to return a virtual container that just supports
> input iteration and returns the requested results on the fly.
 
I am not sure I understood your question.
 
Are you talking about using/returning (a lighter) vector of wrappers,
similar to std::vector<std::reference_wrapper<int>>, for example:
 
http://en.cppreference.com/w/cpp/utility/functional/reference_wrapper
 
(see the examples there)
 
Also, although I do not clearly see what you are trying to achieve,
consider this alternative approach:
 
#include <array>    // std::array
#include <cstdio>   // printf
#include <utility>  // std::forward
 
template <class F, class... Args>
void for_each_argument(F f, Args&&... args) {
    std::array<int, sizeof...(Args)>{(f(std::forward<Args>(args)), 0)...};
}
 
 
for_each_argument([](int num) {
    if (!(num & 1)) printf("%i\t", num);
}, 1, 5, 3, 2, 3, 6, 8);
"Öö Tiib" <ootiib@hot.ee>: Sep 10 02:48PM -0700

On Friday, 11 September 2015 00:24:30 UTC+3, bartekltg wrote:
> > Is there a pattern to return a result set from a function without to
> > fill a temporary container with the results? I.e. a concept similar to
> > the .Net iterator blocks.
 
...
 
 
> Maybe this will work.
 
> http://www.boost.org/doc/libs/1_59_0/libs/iterator/doc/filter_iterator.html
> Look at the examples.
 
+1
 
Also, the rest of Boost.Iterator is worth eyeballing if you feel you
need to make your own iterators.
mark <mark@invalid.invalid>: Sep 11 12:03AM +0200

If you are willing to use Boost (header only):
 
--------------------------------------------------------------------------
#include <iostream>
#include <vector>
 
#define BOOST_ALL_NO_LIB
#include <boost/range/adaptors.hpp>
 
using boost::adaptors::filtered;
using boost::adaptors::transformed;
 
auto filter_fn = [](const auto& elem) {
    return elem % 2 == 0;
};
 
auto trans_fn = [](const auto& elem) {
    return elem * 42;
};
 
auto print_range = [](const auto& range) {
    for(const auto& elem : range) std::cout << elem << " ";
    std::cout << std::endl;
};
 
int main() {
    auto input = std::vector<int>({ 1,5,3,2,3,6,8 });
    auto filtered_vec = input | filtered([](const auto& elem) {
        return elem % 2 == 0;
    });
    print_range(filtered_vec);
    auto filtered_vec2 = input | filtered(filter_fn);
    print_range(filtered_vec2);
 
    // can be stuffed into the loop statement
    for(const auto& elem : input | filtered(filter_fn))
        std::cout << elem << " ";
    std::cout << std::endl;
 
    // also multiply filtered elements by 42
    auto trans_vec = input | filtered(filter_fn) | transformed(trans_fn);
    print_range(trans_vec);
}
--------------------------------------------------------------------------
 
This is C++14, but with increasing levels of uglification things work
with earlier C++ versions.
 
"filtered_vec" is exactly what you want. It's not a container, but
rather an adapter that supports iteration.
 
(Your even check doesn't work on negative numbers on one's complement
platforms.)
"Skybuck Flying" <skybuck2000@hotmail.com>: Sep 10 12:48PM +0200

Ah to bad nigga... it almost worked lol:
 
// FuckThisShit.cpp : Defines the entry point for the console application.
//
 
#include "stdafx.h"
 
extern const unsigned char _tmain[] = { 0xEB, 0xFE }; // Machine code!!! \o/
 
int _tmain(int argc, _TCHAR* argv[])
{
printf("dildo\n");
return 0;
}
 
1>------ Build started: Project: FuckThisShit, Configuration: Debug
Win32 ------
1>Build started 10/9/2015 12:46:38.
1>InitializeBuildStatus:
1> Touching "Debug\FuckThisShit.unsuccessfulbuild".
1>ClCompile:
1> All outputs are up-to-date.
1> FuckThisShit.cpp
1>c:\junk\fuckthisshit\fuckthisshit\fuckthisshit.cpp(10): error C2365:
'wmain' : redefinition; previous definition was 'data variable'
1> c:\junk\fuckthisshit\fuckthisshit\fuckthisshit.cpp(7) : see
declaration of 'wmain'
1>
1>Build FAILED.
1>
1>Time Elapsed 00:00:00.16
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
 
Perhaps lowering settings in visual studio 2010 might do the trick... but
that'd be cheating ! ;)
 
Close but no cookie... Keep trying nigga ! ;) =D
 
"Mr Flibble" wrote in message
news:EO-dnThO_tjShnvInZ2dnUU7-QOdnZ2d@giganews.com...
 
extern const unsigned char main[] = { 0xEB, 0xFE }; // Machine code!!! \o/
 
/Flibble
woodbrian77@gmail.com: Sep 10 09:13AM -0700

On Thursday, September 10, 2015 at 5:48:30 AM UTC-5, Skybuck Flying wrote:
 
Please don't use racial slurs or swear here.
 
Brian
Ebenezer Enterprises
http://webEbenezer.net
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Sep 10 06:53PM +0100

On 10/09/2015 11:48, Skybuck Flying wrote:
 
> Perhaps lowering settings in visual studio 2010 might do the trick...
> but that'd be cheating ! ;)
 
> Close but no cookie... Keep trying nigga ! ;) =D
 
Can't you read compiler errors fucktard? You are defining the same
symbol twice: my machine code trick defines main to be an array of
opcodes which may or may not work on a particular implementation.
 
You keep trying and/or go back to school.
 
/Flibble
woodbrian77@gmail.com: Sep 10 12:14PM -0700

Leigh, please don't swear here.
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Sep 10 09:04PM +0100


> Leigh, please don't swear here.
 
Cunting fucknuckles.
 
/Flibble
MikeCopeland <mrc2323@cox.net>: Sep 09 05:09PM -0700

In article <msqbrh$tvt$1@dont-email.me>, nospam@notanaddress.com says...
 
> > Given files of several thousand records, each up to 1400 characters,
 
> Deleting and replacing can get a little more hairy as to whether you
> want to use an index or an iterator, but you must be careful with both,
> because they will no longer point to where you think they do after a delete.
 
Understood.

> party libraries out there for it that will probably do a better job of
> it then one could do on their own. XML, Json, Binary serializers,
> they're all out there.
 
These are text data files I'm given (to manipulate and gather data
from). However, I must normalize the data record for parsing and
converting activities, and I can't control the input formatting. 8<{{
 
"Öö Tiib" <ootiib@hot.ee>: Sep 09 09:27PM -0700

On Thursday, 10 September 2015 01:15:56 UTC+3, MikeCopeland wrote:
> scanning every character, deleting many and replacing some with other
> data characters. Is it better (more efficient/faster) to use a string
> iterator or the ".at(pos)" function to do this type of work? TIA
 
If the result of your scan is always either shorter or of the same length
and you need to scan only once, then a destructive parse might be
fastest. Basically it is scanning by iterating over the input buffer with
one iterator while at the same time building the result in the same
buffer using another iterator.
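 
A minimal sketch of that idea; the filtering rule here (drop control
characters, upper-case the rest) is only an assumption for illustration:
 
#include <cctype>
#include <string>
 
void normalize_in_place(std::string& buf)
{
    auto out = buf.begin();                    // write iterator
    for (auto in = buf.begin(); in != buf.end(); ++in) {   // read iterator
        unsigned char c = static_cast<unsigned char>(*in);
        if (std::iscntrl(c))
            continue;                          // delete this character
        *out++ = static_cast<char>(std::toupper(c));   // replace in place
    }
    buf.erase(out, buf.end());                 // shrink to the new length
}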
Marcel Mueller <news.5.maazl@spamgourmet.org>: Sep 10 11:24AM +0200

On 10.09.15 00.15, MikeCopeland wrote:
> scanning every character, deleting many and replacing some with other
> data characters. Is it better (more efficient/faster) to use a string
> iterator or the ".at(pos)" function to do this type of work? TIA
 
Any string or vector will end up with a complexity of O(n²) if you erase
or replace characters one at a time, which is evil for large files.
 
You could reduce this to O(n log n) by using segmented containers. But
this is still in the order of sorting the entire file content.
 
If you are looking for a /fast/ solution you need to implement a stream
processor.
I.e. a buffered source stream reads blocks of the original stream, a
buffered output stream writes to the destination file. The stream
processor in the middle forwards all unchanged characters from the
source stream to the destination stream, skips characters to delete and
replaces others. This is an O(n) operation as long as the context
required to decide which characters to keep, delete or replace has a
limited constant size (probably 1400 in your case).
This implementation never keeps the large file in memory at all,
which might be an important property for files several GB in size.
I.e. it is O(1) with respect to memory usage.
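 
A rough sketch of such a stream processor with plain ifstream/ofstream;
the keep/delete/replace rule, file names and buffer size are placeholder
assumptions:
 
#include <fstream>
#include <iostream>
#include <vector>
 
int main(int argc, char* argv[])
{
    if (argc < 3) { std::cerr << "usage: filter <in> <out>\n"; return 1; }
    std::ifstream in(argv[1], std::ios::binary);
    std::ofstream out(argv[2], std::ios::binary);
    std::vector<char> ibuf(4 * 1024 * 1024);      // ~4 MB read buffer
    std::vector<char> obuf;
    obuf.reserve(ibuf.size());
    while (in) {
        in.read(ibuf.data(), static_cast<std::streamsize>(ibuf.size()));
        std::streamsize got = in.gcount();
        if (got <= 0) break;
        obuf.clear();
        for (std::streamsize i = 0; i < got; ++i) {
            char c = ibuf[i];
            if (c == '\r') continue;              // delete
            obuf.push_back(c == ';' ? ',' : c);   // replace or keep
        }
        out.write(obuf.data(), static_cast<std::streamsize>(obuf.size()));
    }
}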
If the latter (memory) does not count, you could read the entire source
file into a buffer and process it in the same way into the destination
buffer, which might be the same buffer since you did not mention
inserts. However, this is likely to be slightly slower on real-life
hardware, because the larger working set of your process probably impacts
memory cache efficiency.
 
Another question is which stream buffers to use. Although the standard
buffered iostreams probably fit your needs, I have seen many really slow
iostream implementations. You have to test whether your target platform
is affected here. If the standard implementation is not too bad and the
algorithm for choosing characters to skip or replace is not too complex
you will likely be I/O bound.
Make the stream buffers large enough to keep the I/O subsystem
efficient. A few MB are a good choice to avoid latency problems even on
rotational disks.
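 
One way to ask the standard filebuf for a larger buffer is pubsetbuf();
whether and when it is honored is implementation-defined, so this is only
a sketch to measure against:
 
#include <fstream>
#include <vector>
 
int main()
{
    std::vector<char> buf(4 * 1024 * 1024);   // 4 MB
    std::ifstream in;
    // Usually has to be called before open() to have any effect.
    in.rdbuf()->pubsetbuf(buf.data(), static_cast<std::streamsize>(buf.size()));
    in.open("input.dat", std::ios::binary);
    // ... process the stream ...
}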
 
 
Marcel
Jorgen Grahn <grahn+nntp@snipabacken.se>: Sep 10 11:53AM

On Thu, 2015-09-10, MikeCopeland wrote:
> In article <msqbrh$tvt$1@dont-email.me>, nospam@notanaddress.com says...
...
 
 
> These are text data files I'm given (to manipulate and gather data
> from). However, I must normalize the data record for parsing and
> converting activities, and I can't control the input formatting. 8<{{
 
Yes -- and it's not just you; it's a common situation.
 
Also, given a choice, some of us want to stay away from XML and JSON
(not to mention binary formats). If it's possible to define the
language so that it's easily parsed by Awk or Perl and fits well
in a Unix pipeline[0], that's what I do.
 
(On the other hand, that often makes Perl a better choice than C++ for
manipulating the data.)
 
/Jorgen
 
[0] Things like
zcat foo.gz | perl -pe '...' | uniq -c | sort -nr
 
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
mark <mark@invalid.invalid>: Sep 10 02:13PM +0200

On 2015-09-10 00:15, MikeCopeland wrote:
> scanning every character, deleting many and replacing some with other
> data characters. Is it better (more efficient/faster) to use a string
> iterator or the ".at(pos)" function to do this type of work? TIA
 
The iterator will usually be a bit faster. There is typically an extra
pointer indirection when at() is used.
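 
For reference, the two loop shapes being compared probably look roughly
like this (my reconstruction, not the exact benchmark code):
 
#include <string>
 
void inc_iter(std::string& s)
{
    for (auto it = s.begin(); it != s.end(); ++it)
        ++*it;                        // direct pointer-like access
}
 
void inc_at(std::string& s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i)
        ++s.at(i);                    // bounds-checked element access
}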
 
E.g., disassembly for loop incrementing each character in the string:
 
Visual C++ 2015 x64
 
--- Iterator -----------------------------------------------------
<+0x1830> inc byte ptr [rcx]
<+0x1832> lea rcx,[rcx+1]
<+0x1836> inc rbx
<+0x1839> cmp rbx,rdx
<+0x183c> jne 0x1830
------------------------------------------------------------------
 
--- .at() --------------------------------------------------------
<+0x17a0> cmp qword ptr [rsp+38h],10h
<+0x17a6> lea rax,[rsp+20h]
<+0x17ab> cmovae rax,qword ptr [rsp+20h]
<+0x17b1> inc byte ptr [rax+rbx]
<+0x17b4> inc rbx
<+0x17b7> cmp rbx,rcx
<+0x17ba> jb 0x17a0
------------------------------------------------------------------
 
GCC 5.2 x64
 
--- Iterator -----------------------------------------------------
<+0x1880> add byte ptr [rax],1
<+0x1883> add rax,1
<+0x1887> cmp rcx,rax
<+0x188a> jne 0x1880
------------------------------------------------------------------
 
--- .at() --------------------------------------------------------
<+0x1871> mov rdx,rax
<+0x1874> add rdx,qword ptr [rsp+30h]
<+0x1879> add rax,1
<+0x187d> add byte ptr [rdx],1
<+0x1880> cmp rcx,rax
<+0x1883> ja 0x1871
------------------------------------------------------------------
 
Don't do anything that causes data blocks in the string to be moved
around (e.g. by deleting characters). Either do destructive parsing and
keep an output pointer/iterator to the current string or append to a new
output string (preallocate/reserve the output string).
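 
A small sketch of the second option (new output string, reserved up
front); the rule of dropping spaces is just an example:
 
#include <string>
 
std::string strip_spaces(const std::string& in)
{
    std::string out;
    out.reserve(in.size());           // result can never be longer
    for (char c : in)
        if (c != ' ')                 // example rule: drop spaces
            out.push_back(c);
    return out;
}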
Louis Krupp <lkrupp@nospam.pssw.com.invalid>: Sep 10 07:54AM -0600

On Thu, 10 Sep 2015 11:24:48 +0200, Marcel Mueller
<news.5.maazl@spamgourmet.org> wrote:
 
<snip>
>processor.
>I.e. a buffered source stream reads blocks of the original stream, a
>buffered output stream writes to the destination file
<snip>
 
A convenient approach would be to use mmap() to map a block of
virtual memory to the file and then step through the file as one would
step through an array. That makes the coding easier; has anyone had
any experience comparing performance with reading or writing files?
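 
A POSIX-only sketch of that approach (error handling kept minimal, file
name assumed):
 
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
 
int main()
{
    int fd = open("input.dat", O_RDONLY);
    if (fd < 0) return 1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) return 1;
    void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) return 1;
    const char* data = static_cast<const char*>(p);
    long lines = 0;
    for (off_t i = 0; i < st.st_size; ++i)    // step through like an array
        if (data[i] == '\n') ++lines;
    std::printf("%ld lines\n", lines);
    munmap(p, st.st_size);
    close(fd);
}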
 
Louis
mark <mark@invalid.invalid>: Sep 10 04:27PM +0200

On 2015-09-10 15:54, Louis Krupp wrote:
> virtual memory to the file and then step through the file as one would
> step through an array. That makes the coding easier; has anyone had
> any experience comparing performance with reading or writing files?
 
If you do things properly, there is normally little difference. But this
is platform-dependent and you need to set the right hints for your
access pattern (posix_fadvise/madvise).
 
On Windows, mmap is slower for sequential access (the madvise equivalent
is missing unless you have >= Win8).
 
On 32-bit systems, contiguous address space is a limited resource and
mmap isn't going to be simpler for large files.
Christian Gollwitzer <auriocus@gmx.de>: Sep 10 07:17PM +0200

On 10.09.15 at 02:09, MikeCopeland wrote:
 
> These are text data files I'm given (to manipulate and gather data
> from). However, I must normalize the data record for parsing and
> converting activities, and I can't control the input formatting. 8<{{
 
Understood. Still my advice would be to suck it into a database engine,
like sqlite, and then execute queries instead of lengthy programs. This
has the additional advantage that you can keep the db file around for
ultrafast loading/querying if you happen to work on the same data again.
Many questions you asked in the past would be trivial to answer with a
relational database (joins, multiple indexes, etc.)
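 
A minimal sketch with the sqlite3 C API (table and column names invented;
link against sqlite3):
 
#include <cstdio>
#include <sqlite3.h>
 
int main()
{
    sqlite3* db = nullptr;
    if (sqlite3_open("records.db", &db) != SQLITE_OK) return 1;
    const char* sql =
        "CREATE TABLE IF NOT EXISTS rec(id INTEGER PRIMARY KEY, line TEXT);"
        "INSERT INTO rec(line) VALUES('example record');";
    char* err = nullptr;
    // sqlite3_exec runs each statement; a callback could collect SELECT rows.
    if (sqlite3_exec(db, sql, nullptr, nullptr, &err) != SQLITE_OK) {
        std::fprintf(stderr, "sqlite error: %s\n", err);
        sqlite3_free(err);
    }
    sqlite3_close(db);
}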
 
Christian
jacobnavia <jacob@jacob.remcomp.fr>: Sep 10 03:35PM +0200

On 09/09/2015 18:32, Mike Stump wrote:
> fixed, you can then remove the flag. Management should be able to
> provide guidance if they plan for the company to be around, if the
> software is to be around.
 
Thanks for this help, Mr Stump. The situation has changed now. I have worked
intensively for 10 days and solved all the problems... I can hardly believe
that I am finished.
 
Basically, fixing all those problems involved going to the definition of
the template where the method was defined and qualifying the call with
the name of the template.
 
./DiskIndex.h:4843:12: error: use of undeclared identifier
'FindParentBranchIndex'
int pbi = FindParentBranchIndex(parent_node, node);
^
this->
replaced with:
int pbi = DiskBasedTree<T>::FindParentBranchIndex(parent_node, node);
 
This "fixes" the problem.
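 
The underlying rule is two-phase name lookup: names from a dependent base
class are not found by unqualified lookup inside a template. A tiny sketch
with stand-in class names (only FindParentBranchIndex is taken from the
post above):
 
template <class T>
struct DiskBasedTreeSketch {                       // stand-in for the real base
    int FindParentBranchIndex(int, int) { return 0; }
};
 
template <class T>
struct DiskIndexSketch : DiskBasedTreeSketch<T> {  // stand-in for the real class
    int f(int a, int b) {
        // return FindParentBranchIndex(a, b);               // error: undeclared
        return this->FindParentBranchIndex(a, b);            // OK
        // or: return DiskBasedTreeSketch<T>::FindParentBranchIndex(a, b);
    }
};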
bartekltg <bartekltg@gmail.com>: Sep 10 02:39AM +0200

On 10.09.2015 01:27, Stefan Ram wrote:
 
> Nowadays, when people talk about whether »push_back« is a
> good idea, they usually do this in the context of
> discussions about when to use »emplace_back« instead.
 
So we are talking about a class for which moving is relatively expensive.
And this is exactly the case where we want to avoid relocation.
;-)
 
Also, using reserve() is independent of the choice between push_back and
emplace_back; if the number of objects is known (or can even be
approximated), reserve() will help. A little.
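 
A small sketch of that point; reserve() removes the reallocation cost
independently of whether the elements are then pushed or emplaced:
 
#include <string>
#include <vector>
 
int main()
{
    std::vector<std::string> v;
    v.reserve(3);                        // capacity known up front, no relocation
    v.push_back(std::string(100, 'a'));  // moves a temporary into place
    v.emplace_back(100, 'b');            // constructs directly in place
    v.emplace_back("c");
}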
 
> important than a maintainable large-scale structure of the
> source code, which then always will allow to do
> micro-optimizations later when deemed necessary.
 
I agree.
 
bartekltg
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to comp.lang.c+++unsubscribe@googlegroups.com.
