Technical Difficulties from on Top of the Mountain: September 2013

Technical Difficulties from on Top of the Mountain

2013-09-30

Lost by looking at the map.

I'm not really sure what the real value of rosettacode.org is, but while looking at implementations for continued fractions in C++, my mind wandered down to the Perl 6 section and was sucked into a black hole:

Perl6 allows us to define a custom composition operator. We can then use it with the triangular reduction metaoperator, and map each resulting function with an infinite value for x.
sub infix:<⚬>(&f, &g) { -> $x { &f(&g($x)) } }
sub continued-fraction(@a, @b) { [\⚬] map { @a[$_] + @b[$_] / * }, 0 .. * }

I program a fair amount in Perl 5 and I don't recognize a thing here. I remember seeing a bit of an interview of Larry Wall, and he was talking about how much fun he was having posting code to rosettacode.org. I'm going to blame this one on him.

Labels: perl

¶ 10:00 PM 0 comments

2013-09-25

Relearning sorting.

I had a prospective client ask me to provide some code samples doing sorting. They asked for three different sorts, but I ended up doing 12. Ok, so I kind of got into it.

The first version (testsort1.cpp), had some warts. I used inline macros for some boilerplate expansion, inheritance for the different sorters, and I used a self organizing global singleton list to collect the tasks to run. Plus the version I turned in had an error in one of the three versions of merge sort I wrote. Not that surprising that they didn't call back.

But I decided to press on, and see if I could do better. For one thing, the timing I was getting from shell sort (one of my favorites) was not that great; and I was watching a lot of industry experts in online conferences railing against inheritance (see Sean Parent, an Adobe principal scientist, talk about how Inheritance Is The Base Class of Evil). So my second version (testsort2.cpp), used initializer lists, but needed this really ugly adapter to deal with having two different signatures for the sort functors depending on whether they operated on iterators or raw pointers. I also had a little fight with formatting floating point numbers with iostreams, but that's a whole other story.

Finally, after running over the type erasure/pimpl pattern several times in the Going Native 2013 lectures, I decided to give it a try. Its a really convoluted strategy. Basically, instead of having an abstract base class and a descendant with its own data and extra stuff that implements the base interface, you create a concrete object that doesn't do anything, but has a safe pointer to a virtual base class, which is implemented by a descendant adapter class which is templated to dispatch all its calls to a helper object which it owns. Hopefully the compiler is cleaning up most of this mess.

Anyways, I implemented all of this in version three (testsort3.cpp) which I think is pretty good. I also templated some of the sort routines, though I should use the traits of containers to make the interface a little nicer. One interesting thing though were the results.

I ran GCC and Clang on a 1U server first, and realized these days that -O0 is pretty horrible, especially on the iterator based code:

Compiler n Bubble Sort Selection Insertion Sort

Iter Raw Iter Raw Iter Raw

gcc -O0 10000 2515.88 836.98 711.25 156.54 304.98 60.87

gcc -O2 10000 191.08 182.4 173.08 173.07 22.29 22.17

gcc -O3 10000 197.23 213.65 87.49 87.45 22.35 22.2

Clang -O0 10000 1952.49 686.21 675.17 158.1 287.81 79.07

Clang -O2 10000 215.3 210.12 63.93 63.92 22.15 22.08

Clang -O3 10000 212.86 213.72 63.67 63.67 22.49 22.51

Compiler	n	Bubble Sort	Selection	Insertion Sort
Iter	Raw	Iter	Raw	Iter	Raw
gcc -O0	10000	2515.88	836.98	711.25	156.54	304.98	60.87
gcc -O2	10000	191.08	182.4	173.08	173.07	22.29	22.17
gcc -O3	10000	197.23	213.65	87.49	87.45	22.35	22.2
Clang -O0	10000	1952.49	686.21	675.17	158.1	287.81	79.07
Clang -O2	10000	215.3	210.12	63.93	63.92	22.15	22.08
Clang -O3	10000	212.86	213.72	63.67	63.67	22.49	22.51

Thankfully with some optimization the difference between iterators and raw pointers is down to noise, though it also seems these days that -O3 vs -O2 is pretty much a wash.

It was also amazing how wide the spread was between the worst and best sorts:

Compiler n Bubble Selection Insertion Shell Merge buf-Merge qsort() std::sort

gcc -O2 50000 5226.38 4353.58 576.49 4.44 7.67 5.15 6.75 3.3

gcc -O3 50000 5347.93 2198.94 580.26 4.84 7.61 3.97 6.88 3.36

clang -O2 50000 5361.56 1587.31 578.19 4.77 7.36 4.81 6.97 3.12

clang -O3 50000 5345.02 1587.3 584.16 4.59 8.13 4.83 6.88 3.18

Compiler	n	Bubble	Selection	Insertion	Shell	Merge	buf-Merge	qsort()	std::sort
gcc -O2	50000	5226.38	4353.58	576.49	4.44	7.67	5.15	6.75	3.3
gcc -O3	50000	5347.93	2198.94	580.26	4.84	7.61	3.97	6.88	3.36
clang -O2	50000	5361.56	1587.31	578.19	4.77	7.36	4.81	6.97	3.12
clang -O3	50000	5345.02	1587.3	584.16	4.59	8.13	4.83	6.88	3.18

Everybody knows that bubble sort is just the worst sort ever, but I was surprised that it was 1000X worse than the cluster of the fast ones. Even selection and insertion sort are way out there, being 300X and 100X times the best times. But I was also curious how well the fast ones would fare as the set sizes continued to grow.

So I pushed on into bigger datasets with the fast ones (sorry, I wasn't going to time n=1000000 for bubble sort with it growing at O(n^2)). Letting it run overnight, I let the fast ones push the bounds of the server's physical memory:

Compiler n Shell Merge buf-Merge qsort() std::sort

gcc -O3 1000000 121.52 174.83 98.2 176.69 83.99

gcc -O3 10000000 1519.97 1950.48 1124.86 1984.32 967.28

gcc -O3 100000000 18622.18 21343.25 12492.76 22463.4 10886.02

gcc -O3 1000000000 67849.88 171505.77 204308.42 204446.49 47208.55

clang -O3 1000000 124.86 176.53 119.42 174.18 79.76

clang -O3 10000000 1564.63 1982.01 1392.44 2010.94 926.81

clang -O3 100000000 19084.54 21694.59 15671.13 22861.94 10585.51

clang -O3 1000000000 73168.25 1040334.6 144162.16 187819.62 45389.01

Compiler	n	Shell	Merge	buf-Merge	qsort()	std::sort
gcc -O3	1000000	121.52	174.83	98.2	176.69	83.99
gcc -O3	10000000	1519.97	1950.48	1124.86	1984.32	967.28
gcc -O3	100000000	18622.18	21343.25	12492.76	22463.4	10886.02
gcc -O3	1000000000	67849.88	171505.77	204308.42	204446.49	47208.55
clang -O3	1000000	124.86	176.53	119.42	174.18	79.76
clang -O3	10000000	1564.63	1982.01	1392.44	2010.94	926.81
clang -O3	100000000	19084.54	21694.59	15671.13	22861.94	10585.51
clang -O3	1000000000	73168.25	1040334.6	144162.16	187819.62	45389.01

Here we see performance between clang and gcc varying a bit more (though not in a consistent fashion). Strangely enough as the set size gets huge, qsort() takes the biggest hit under gcc, while merge sort goes off the rails under clang.

Clearly std::sort is the winner hands down, though shell sort keeps right up with it until the end. This is important to keep in mind as shell sort's implementation is compact (the key loop is 8 lines long), and its non-recursive. Two very useful properties in embedded applications. It also remains mysterious, as academia can still not explain why it works so well. That's probably one of the reasons I like it so much.

Ok, and now for some pretty pictures:
gcc clang

And a comparison between the timings of gcc and clang for shell & std::sort. I think the units are in ns/N.
diff

Not as fancy as using Google Visualization charts live like in this benchmark of std::containers, but it will have to do for now.

Labels: c++, c++11, sort

¶ 10:00 PM 0 comments

2013-09-09

Pointing the barrel at your foot.

As always, language design is some steps forward, some steps back. You just have to hope that the forward trends larger than the back over time. Some times its hard to tell.

I thought the delegating constructors addition to c++11 was great. Not something I'd need all the time, but it'd probably come in handy once in a while. But as always, there's a hole.

The standard specifies that if a constructor delegates to itself, the program is ill-formed. It also states that in that case no diagnostic is required.
Ref: thenewcpp.wordpress.com

So given something like this:

class   C
{
        public:
                C() { }

                C(int aval) : C('i') { }
                C(char) : C(42) { }

} ;

int main(int N, char ** S)
{
        C       test(1) ;

        return 0 ;
}

gcc 4.8 compiles without warning, and then crashes when you run. Thankfully clang does better:

testerror.cpp:9:13: error: constructor for 'C' creates a delegation cycle [-Wdelegating-ctor-cycles] C(char) : C(42) { } ^ testerror.cpp:8:3: note: it delegates to C(int aval) : C('i') { } ^ testerror.cpp:9:3: note: which delegates to C(char) : C(42) { } ^ 1 error generated.

I think there's a serious chance that gcc could be irrelevant in the coming future.

Labels: c++, c++11, programming

¶ 10:00 PM 0 comments

2013-09-02

Cleaning up the C++ past.

Once apon a time, I bought a C++ library. Shocking I know, people thought they could make money selling libraries. Most people have gotten over that misconception.

But I didn't buy it to use it, I bought it to learn from. I had written my own cross platform networking library which ran on SunOS, HPUX, SGI IRIX, Ultrix, BSD and Windows NT; and it worked really well. But I was now slowly delving into threading and didn't want to suffer through same five year learning curve as I had with networking. I had scratched the surface of pthreads, and I figured if I could get a Rosetta stone, I'd be most of the way there. So several hundred dollars later, I had the Object Space C++ cross platform libraries in my hands.

Their code for threads was great, because windows gives you a lot of high level calls while pthreads was basically parts; and it showed how you could make the windows things out of the pthread parts. But there was more on the CD besides this. There were these strange template things called Containers and Algorithms. These things warped my brain the way Flatland does the first time you read it, and for ten minutes you can visualize four dimensions in your head. But like my grasp of the thermal cycle of a propane refrigerator, after a night of rest, it was gone.

Some time down the road, I was ready to come back to Algorithms, and started off trying to do this:

void  someclass::somemethod(int x)
{
  struct { bool operator(something * ptr) { return ptr->sometest( x ) ; } } anon_test ;
  something * foundptr = stl::find_if( somelist.begin(), somelist.end(), anon_test) ;
}

This didn't work for all kinds of reasons, but even when I knocked down the easy ones (like creating local storage in the struct for x), I ended up hitting language limits like not being able to have an un-named struct, and I abandoned the whole thing.

There were really ugly work-arounds involving horrible bind macros and worse types so that just to compare a list of ints against x, you'd end up doing something like:

  binder2nd testor = bind2nd( greater(), x) ;

Ick. Wasn't going to happen, and for the most part, didn't. Not one project, or company I worked at in the next 15 years ever used STL containers, or algorithms. I'm sure somewhere, somebody used it, but I never ran into it.

Thankfully, the standards body took up all the patchwork that boost had been doing for ten years, and cleaned up C++ to make things easier and better. We got lambdas, closures, and a lot of duct tape like auto and range-for.

So now I can do my previous example in a much cleaner way, and even better, its readable:

void someclass::somemethod( int x)
{
  something * foundptr= std::find_if( somelist.begin(), somelist.end(), [x](something * ptr){ return ptr->sometest( x) ; } ) ;
}

The worst part of this is the little magic: [x] but you can go read about lambda syntax and you'll get it. And for performance, if you're going to use that predicate a lot, you can save it away ahead of time with just a tad more effort:

  auto xtest = [x](something * ptr){ return ptr->sometest( x) ; } ;

I think this is big.

Labels: c++, c++11, lambda

¶ 10:00 PM 0 comments