Thursday, September 24, 2020

Digest for comp.lang.c++@googlegroups.com - 5 updates in 2 topics

Frederick Gotham <cauldwell.thomas@gmail.com>: Sep 24 02:15AM -0700

I tried to post this here yesterday but it ended up in comp.lang.c because the new Google Groups interface can't handle a + sign in the name of a group.
 
I'm trying to write a function just like "set_intersection", except that the containers don't need to be sorted.
 
#include <algorithm> /* sort, set_intersection */
#include <type_traits> /* remove_cv, remove_reference */
#include <vector> /* vector */
 
template<class InputIt1, class InputIt2, class OutputIt>
OutputIt set_intersection_unsorted(InputIt1 first1, InputIt1 last1,
                                   InputIt2 first2, InputIt2 last2,
                                   OutputIt d_first)
{
    /* The iterators might be plain ol' pointers, e.g. char* */
    /* To keep compatibility with C++11, don't use 'remove_cv_t' */

    typedef typename std::remove_cv< typename std::remove_reference<decltype(*first1)>::type >::type type1;
    typedef typename std::remove_cv< typename std::remove_reference<decltype(*first2)>::type >::type type2;

    /* Work on sorted copies so that the caller's ranges are left untouched */
    std::vector<type1> container1(first1,last1);
    std::vector<type2> container2(first2,last2);
 
    std::sort(container1.begin(), container1.end());
    std::sort(container2.begin(), container2.end());
 
    return std::set_intersection(container1.begin(), container1.end(),
                                 container2.begin(), container2.end(),
                                 d_first);
}
 
 
Does this look okay? Is there a better way of doing it?
Paavo Helde <myfirstname@osa.pri.ee>: Sep 24 10:07PM +0300

24.09.2020 12:15 Frederick Gotham kirjutas:
> d_first);
> }
 
> Does this look okay? Is there a better way of doing it?
 
The above seems OK, but let's see if it can be enhanced.
 
Let's say the lengths of ranges are M and N. This algorithm sorts both
ranges, so has complexity O(M*log(M) + N*log(N)).
 
An alternative would be to sort only the shorter range, say the one of
length M, then iterate over the longer range and look up each value in
the sorted copy via std::binary_search(). The complexity of that would be
O(M*log(M) + N*log(M)). This is not worse than the original, and should
perform better if M << N. It is also better memory-wise, because only the
shorter range is copied. So this algorithm ought to be better for this
specific task (not tested!)
 
Potential drawbacks are that the result range remains unsorted and the
behavior with repeated elements is different from std::set_intersection.
So it isn't an exact replacement.
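 
A minimal sketch of the idea above, not a tested replacement. The names intersection_sort_shorter and copy_if_found are made up for illustration. It assumes both ranges hold the same copyable value type with operator<, and that both iterators are at least forward iterators so that std::distance can measure the ranges before they are read.

```cpp
#include <algorithm>   // std::sort, std::binary_search
#include <iterator>    // std::iterator_traits, std::distance
#include <vector>

// Hypothetical helper: copies every element of [first, last) that occurs
// in the sorted vector. Costs O(N*log(M)) for N elements and M sorted values.
template<class T, class FwdIt, class OutputIt>
OutputIt copy_if_found(std::vector<T> const& sorted,
                       FwdIt first, FwdIt last, OutputIt d_first)
{
    for (; first != last; ++first)
        if (std::binary_search(sorted.begin(), sorted.end(), *first))
            *d_first++ = *first;
    return d_first;
}

// Hypothetical name. Sorts a copy of the shorter range only, then walks
// the longer range in its original order.
template<class FwdIt1, class FwdIt2, class OutputIt>
OutputIt intersection_sort_shorter(FwdIt1 first1, FwdIt1 last1,
                                   FwdIt2 first2, FwdIt2 last2,
                                   OutputIt d_first)
{
    typedef typename std::iterator_traits<FwdIt1>::value_type value_type;

    if (std::distance(first1, last1) <= std::distance(first2, last2)) {
        std::vector<value_type> shorter(first1, last1);
        std::sort(shorter.begin(), shorter.end());
        return copy_if_found(shorter, first2, last2, d_first);
    }

    std::vector<value_type> shorter(first2, last2);
    std::sort(shorter.begin(), shorter.end());
    return copy_if_found(shorter, first1, last1, d_first);
}
```

The output follows the order of the longer range, and a value repeated in the longer range is emitted once per occurrence, which is exactly the difference from std::set_intersection mentioned above.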
"Öö Tiib" <ootiib@hot.ee>: Sep 24 03:29PM -0700

On Thursday, 24 September 2020 12:15:53 UTC+3, Frederick Gotham wrote:
> d_first);
> }
 
> Does this look okay? Is there a better way of doing it?
 
It is generally OK. However, it only covers the generic case where
operator<(type1,type2) exists and the destination range accepts
type1. When I care about performance I do not write a generic
silver bullet, as such a thing doesn't exist.
 
Instead I take into account everything about the given input sequences,
the given type, the output sequence and the constraints on resource usage.
Isn't it usually the same type? Can't that type usually be cheaply hashed?
Are the sequence lengths usually similar or different? Is there plenty
of memory? Can there be several equal values in the input sequences? Is
the output iterator really an output iterator or just a pointer into a
buffer that I can sort afterwards? Is it even needed that the result is
sorted?
 
Making two equal capacity hash multisets, intersecting those and
then sorting the result is a bit cheaper in my tests than your algorithm,
but uses more memory.
 
Paavo Helde's suggestion is also great when hashing is unavailable
or memory is precious, especially when the size difference between
the two sequences is large, but it takes some trick of marking
already matched elements in case the sequences can contain multiple
equal elements.
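 
A rough sketch of a hash-based variant, under stated assumptions. It does not build two hash multisets as described above. Instead it swaps in a single std::unordered_map of counts, which also serves as the "marking already matched elements" trick. The name intersection_hashed is hypothetical, and the sketch assumes one value type for both ranges with a std::hash specialization and operator==, plus operator< for the final sort.

```cpp
#include <algorithm>      // std::sort, std::copy
#include <cstddef>        // std::size_t
#include <iterator>       // std::iterator_traits
#include <unordered_map>
#include <vector>

// Hypothetical name. Emits each common value min(m, n) times, where m and n
// are its counts in the two ranges, and sorts the result.
template<class InputIt1, class InputIt2, class OutputIt>
OutputIt intersection_hashed(InputIt1 first1, InputIt1 last1,
                             InputIt2 first2, InputIt2 last2,
                             OutputIt d_first)
{
    typedef typename std::iterator_traits<InputIt1>::value_type value_type;
    typedef std::unordered_map<value_type, std::size_t> count_map;

    // Count how often each value occurs in the first range.
    count_map counts;
    for (; first1 != last1; ++first1)
        ++counts[*first1];

    // A value from the second range matches while unmatched copies remain.
    std::vector<value_type> matched;
    for (; first2 != last2; ++first2) {
        typename count_map::iterator it = counts.find(*first2);
        if (it != counts.end() && it->second > 0) {
            --it->second;               // mark one copy as used
            matched.push_back(*first2);
        }
    }

    // Sorting is optional; drop it if the caller does not need sorted output.
    std::sort(matched.begin(), matched.end());
    return std::copy(matched.begin(), matched.end(), d_first);
}
```

Expected cost is roughly linear in M + N for the hashing, plus the sort of the (usually smaller) result. The memory use is the count map plus the result buffer.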
Frederick Gotham <cauldwell.thomas@gmail.com>: Sep 24 02:13AM -0700

I've recently started doing web GUI programming.
 
On the web server, I have a PHP script that uses the "exec" function to run my C++ program.
 
My C++ program performs two HTTPS requests, and depending on the data it gets back, it might perform 2 or 3 more HTTPS requests. My program then prints HTML code to stdout. The PHP script takes this HTML and throws it up on the end user's screen as a webpage.
 
My C++ program could fall down in several ways. Any of the HTTPS requests could fail, or return partial (or corrupt) data. There could be an uncaught exception from the networking code, or a segfault in a 3rd party library. It could fail in lots of ways.
 
My C++ code at the moment is quite clean, and I don't want to litter it with error-handling code.
 
One thing I could do is throw an "std::runtime_error" whenever anything goes wrong, then let these exceptions propagate up to 'main', and then in 'main' just restart the whole program.
 
Another option would be to kill my program with "std::exit(EXIT_FAILURE)" when anything goes wrong. Then I would have a Linux shell script that restarts my program. The rationale of the Linux shell script would be:
"Run the C++ program and check that it returns EXIT_SUCCESS. If it doesn't return EXIT_SUCCESS, then try to restart it. If it fails 5 times in a row, stop trying."
 
I would also make it a little more complicated:
"Put a time limit of 4 seconds on the C++ program -- if it runs into 5 seconds then kill it and start it again (up to a max of 5 times)".
 
A simple Linux script to constantly restart a program if it fails looks like this:
 
#!/bin/sh
until my_program; do
echo "Program 'my_program' crashed with exit code $?. Respawning.." >&2
sleep 1
done
 
So next to try 5 times, I could do:
 
#!/bin/sh
succeeded=0
 
# A plain word list is used because {1..5} is a bash extension, not POSIX sh
for i in 1 2 3 4 5
do
    output=$(./myprogram)
    status=$?
 
    if [ "${status}" -eq 0 ]; then
        printf '%s' "${output}" # This prints the HTML to stdout
        succeeded=1
        break
    fi
 
    sleep 1
done
 
if [ "${succeeded}" -eq 0 ]; then
    printf '%s' "<h2>Error</h2>"
    exit 1
fi
 
And then finally, to give it a max time of 4 seconds, use the program "timeout". It exits with status 124 when the time limit is hit, or with 137 (128+9) when the program is killed with SIGKILL as below:
 
#!/bin/sh
succeeded=0
 
# A plain word list is used because {1..5} is a bash extension, not POSIX sh
for i in 1 2 3 4 5
do
    output=$(timeout --signal SIGKILL 4 ./myprogram)
    status=$?
 
    if [ "${status}" -eq 0 ]; then
        printf '%s' "${output}" # This prints the HTML to stdout
        succeeded=1
        break
    fi
 
    sleep 1
done
 
if [ "${succeeded}" -eq 0 ]; then
    printf '%s' "<h2>Error</h2>"
    exit 1
fi
 
And so then in my C++ program, I'd have:
 
inline void exitfail(void) { std::exit(EXIT_FAILURE); }
 
And then in my C++ program if I'm parsing the HTML I get back, and something's wrong:
 
string const html = PerformHTTPSrequest(. . .);
 
size_t const i = html.rfind("<diameter>");
 
if ( string::npos == i ) exitfail();
 
So this way, if my C++ program fails in any way, an entire new process is spawned to try again (which might be the right thing to do if it's a runtime error for example to do with loading a shared library).
 
Any thoughts or advice on this?
Ian Collins <ian-news@hotmail.com>: Sep 25 09:12AM +1200

On 24/09/2020 21:13, Frederick Gotham wrote:
 
> I've recently started doing web GUI programming.
 
> On the web server, I have a PHP script that uses the "exec" function
> to run my C++ program.
 
Why don't you just run it from the web-server?
 
> it gets back, it might perform 2 or 3 more HTTPS requests. My program
> then prints HTML code to stdout. The PHP script takes this HTML and
> throws it up on the end user's screen as a webpage.
 
Same question...
 
> 3rd party library. It could fail in lots of ways.
 
> My C++ code at the moment is quite clean, and I don't want to litter
> it with error-handling code.
 
Bad call. Error handling should be part of the design (and tested).
 
> One thing I could do is throw an "std::runtime_error" whenever
> anything goes wrong, then let these exceptions propagate up to
> 'main', and then in 'main' just restart the whole program.
 
Not an uncommon approach.
 
Your code should handle its errors and pass any failures back to the
web-server.
 
--
Ian.
