Technical Difficulties from on Top of the Mountain
  The power in thinking about hard problems.
I am an engineer by trade, not a scientist, because I have a certain amount of impatience with thinking for thinking's sake.  But as I have spent great amounts of time solving simple problems, I have gained more and more appreciation for big thinking.  It can be a great place to go steal ideas to use tomorrow.

One big thinker was Feynman.  He made many contributions to physics, finding ways to solve problems that were intractable with traditional tools, but he was also curious about a great many things and was able to predict the future just by wondering what would happen if you took things to their extremes.  He predicted molecular machines (or MEMS) just by thinking about a series of 10:1 reduction levers that kept getting smaller and smaller.  But it turns out he had also been pondering the future of computing and its intersection with physics way back when.

The thread for this started from watching the quantum computing talk (LIQUi|>) from Lang.NEXT 2014, hosted on Channel 9.

There was a reference to a list of publications at the end of the talk with a link: Search Results for Dave Wecker, and skimming through the paper 'Improving Quantum Algorithms for Quantum Chemistry', I noticed that the first reference was by Feynman: "Simulating physics with computers".  This was actually a keynote talk at a conference in 1981, but back then not everything got recorded for uploading to YouTube, so you'll have to be satisfied with a scan of the original paper (or you can pay $40 to Springer Publishing for a copy of the transcript, for which they paid nothing).

Basically, over 30 years ago, Feynman did the thought experiment of whether you could simulate quantum physics--and to be complete he considered both classical discrete computers and then-nonexistent quantum computers.  Short answer?  Classical computers would never be able to tackle big enough problems in a scalable way, but if engineers ever figured out how to build usable quantum computers for the physicists to use... well, we might just be in for more interesting times.

  Amazing progress.
It was about eight years ago, while travelling back and forth for work, that I decided to splurge and buy a top-of-the-line memory card for my "smart" phone. This meant shelling out $100 for the biggest, most massive SD card you could buy -- a whole 2GB. That was pretty awesome. I could get all kinds of music, or several videos on there (yes, the Palm Treo had the wonderful TCPMP media player, which would play standard-def AVI files scrunched down onto the 320x320 screen), and I was all set for plane rides and any other occasion of idleness.

If anyone doesn't think technology is racing along at breakneck speed, they just have to check flash densities against what week it is:

Still about $100, not counting the fake flash card from Foxx*

That's over a trillion bits, unless the flash vendors have decided to skimp and only give us 6.75 bits per byte, kind of like the hard drive vendors redefining K=1000 instead of K=1024. Let's hope they're honest, but in any case, that's still a mind-blowing number of bits.

Heck, I remember getting my hands on one of these back when I was first playing with a soldering iron,

That's an 8 by 8 array of BITS, that's right, 64 whole bits of memory. Awesome. Ok, that was a long time ago. Never mind, I'm going to start copying my entire library of kids' videos onto this trillion-bit speck, which will probably take all night.

* Yes, in this age of Kickstarters and cheap knockoffs, it's pretty easy to buy some second-hand flash chips and reprogram an SD controller to lie and say you have 128GB, but start failing after filling up the paltry 16GB or whatever size they actually back it with. Figure half your customers will be too lazy to return it, and you have yourself a money-making machine. So any listing on Amazon with no reviews and a price too good to be true is probably too good to be true.


  Staying inside the lines.
When it comes to dealing with binary data, especially binary packets from public interfaces, you can't take any chances. C was built a long time ago, when doing things safely was very expensive, and so its designers chose speed. For a very long time I built systems where speed was essential as well (in computer graphics, an extra instruction can be multiplied by a billion), but I eventually moved into more general computing problems, and at the same time computers got thousands of times faster.

In a modern system, random memory access is the killer. The CPU has cycles to burn. Some of the performance lessons I've learned recently are the exact opposite of what was true twenty years ago.

To that end, a modern style of dealing with containers has to be bounded. You need to know where you are and what your limits are. There's some question of what operator++() should do when you reach those limits, but one answer is certain: it can't just step past the end and start stomping on whatever's next. So my current libraries have the following concepts for both buffers and containers:

A Range is a beginning and an end.

These never own the storage, they just indicate where it is. This can be used for both the available space to write to, and for the used space containing data within a buffer or container.

A cursor is a range + an iterator.

This is where I've gone a bit past all the other work in C++ containers, but I think this is important. Modern iterators (at least the fun ones) are bidirectional or random access. That means the beginning is as important to keep track of as the end. And copying a cursor should not narrow you to the space you had left; the copy should be able to head backwards to the front just as easily as progress to the end.

This also gives us a great data structure for those algorithms like std::rotate() that operate on three iterators.
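A minimal sketch of the two concepts, assuming contiguous pointer-based storage (the names here are illustrative, not the actual library's):

```cpp
#include <cstddef>

// A Range marks where storage is; it never owns it.
template <typename T>
struct range {
    T* begin_;
    T* end_;
};

// A cursor is a range plus a position; the copy keeps the full
// bounds, so it can rewind to the front as well as advance.
template <typename T>
struct cursor {
    range<T> bounds;
    T* pos;
    explicit cursor(range<T> r) : bounds(r), pos(r.begin_) {}
    explicit operator bool() const { return pos != bounds.end_; }
    cursor& operator++() { if (pos != bounds.end_) ++pos; return *this; }  // clamps, never steps past
    T& operator*() const { return *pos; }
};
```

Copy semantics fall out naturally: copying a cursor copies both the bounds and the position, so the copy is free to head back toward begin_.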

There have been a lot of people banging around on this for some time. Andrei Alexandrescu wrote a great paper On Iteration that had lots of stuff to say about his implementation of containers and iterators for D.


  The fractured pieces of C++
A lot of early programming in C++ was simply objects and polymorphism. This was pretty powerful stuff for those of us used to C, and we built some pretty big things with it. But after implementing linked lists several times, we began wondering about that other odd uncle in the toolbox: templates.

What I and many others ended up with after diving in there was encumbered lists. This is generally frowned upon by advanced C++ people, who point out that there is a standard library with a number of "high quality" containers in it. But we arrived at encumbered lists because we were making lists of polymorphic objects, and the STL containers are all made for uniform types.

To provide a bridge, the boost people threw the PIMPL hammer at it.

As I understand it, the original use of PIMPL was to hide the full class definition from users of the class (usually at library boundaries), and also to significantly speed up compile times for large systems. Compiling has never been an instantaneous process (sadly); over my career I've had to deal with more than one system that took hours to rebuild, so knocking time off is more than an academic curiosity.

Applied to the problem of polymorphic objects in containers, you end up with a PIMPL front-ing object with a hidden implementation behind an abstract interface. Throw in a smart pointer to manage life-cycle and you suddenly have a memory coherency problem, because every logical object you allocate turns into several separate heap blocks scattered around memory.
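Roughly, the shape is something like this sketch (all the names here are hypothetical), where one logical object means one allocation for the hidden implementation plus the smart pointer's bookkeeping, none of it adjacent to the front object:

```cpp
#include <memory>

struct shape_impl {                    // abstract interface, hidden from users
    virtual ~shape_impl() = default;
    virtual double area() const = 0;
};

struct circle_impl : shape_impl {      // the real data lives in this heap block
    double r;
    explicit circle_impl(double r_) : r(r_) {}
    double area() const override { return 3.14159265358979 * r * r; }
};

struct shape {                         // the front-ing object users see
    std::shared_ptr<shape_impl> impl;  // pointee + control block, elsewhere on the heap
    explicit shape(double r) : impl(std::make_shared<circle_impl>(r)) {}
    double area() const { return impl->area(); }
};
```

Put a few thousand of these in a std::vector and the vector's nice contiguous storage is just a field of pointers chasing off into the heap.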

And heaven help you if you have a string or something else dynamic in your object.

I think there's a better way, but it's a shame that no one else has actually built anything that gives you the power of containers and the algorithms, but plays well with polymorphic objects.


  Auto is for temporaries, not for documentation.
One thing that C++11 added to make life easier for programmers was the auto type. It is for temporaries, where the compiler can figure out what you wanted:

auto ptr= new std::string("Hello World") ;

You have a pretty good idea what you're going to get back from new, and the compiler can figure it out, so here you don't have to say it twice.

But when you go to call some advanced routine in the standard libraries, and you really don't have any idea what you're going to get back but want to save the value in a structure, then having the examples use auto is really annoying. This page on std::async left me with very little idea what to do if I wanted to create a data structure that captured running threads:

  auto handle = std::async(std::launch::async,
                              parallel_sum, mid, end);
  int sum = parallel_sum(beg, mid);
  return sum + handle.get();
One has to guess at this point that the return values from async are very close to un-nameable, and that one should use templates instead to capture them. This is the very thing that std::function does to wrap up lambdas, which are another construct that is practically un-nameable. Interestingly enough, one place auto saves the day when not even template argument deduction would work is with another C++11 feature called initializer lists:
templated_fn({1, 2, 3});  // FAIL - initializer list has no type, and so T cannot be deduced
auto al = {10, 11, 12};   // special magic for auto
though you can get around the template failure by spelling things out:
templated_fn<std::initializer_list<int>>({1, 2, 3});  // OK
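As it happens, the handle in the async example above does have a spellable type, std::future<int>, so you can store such handles in a structure after all; a small self-contained sketch (the worker_pool name and the lambda are mine, not from the documentation):

```cpp
#include <future>
#include <vector>

// Naming the return type of std::async explicitly, so handles to
// running tasks can live in an ordinary data structure.
struct worker_pool {
    std::vector<std::future<int>> handles;
};

int run_pool() {
    worker_pool pool;
    pool.handles.push_back(std::async(std::launch::async, [] { return 40 + 2; }));
    return pool.handles.back().get();   // blocks until the task finishes
}
```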


  C++ has a long way to go for growing up.
In the second-day keynote at the 2012 C++ Going Native conference, the head of the ISO C++ standards committee got up and pointed out the relative sizes of the C++ language vs. Java and C# (they were comparable in terms of capabilities and specifications), but then he compared the scale of the C++ standard library against those of C# and Java. It wasn't even close. The standard C++ libraries (even with all the new sections added) didn't come close to one tenth the size of C#'s, and the gap with Java was even larger.

He then took the independent C++ library groups to task, not for lack of effort, but for failure of consistency. Even within some of the large libraries (like Boost), pieces don't always work well together.

But unfortunately it's worse than that.

Maybe I'm just stuck in my ways, but I worry about a lot of low level details. I worry about memory allocations, I worry about buffer overflow attacks. I try to keep a clear eye on system resource use and even kernel calls. So, for instance, I don't want to turn std::string loose on a socket and let the inbound communication possibly allocate megabytes of memory. I also want to avoid buffer overflow problems with a minimum of overhead, so I almost universally refer to messages as buffers which contain a pointer and a size, and I scan through them using smart pointers which check bounds but add minimal overhead:

   buffer_scan pscan( abuffer) ;
   while ( pscan ) { putchar( * (pscan ++)) ; }
Other than coming up with a shorter test for the end condition, this isn't much different from how you'd scan through data in C. For iterators on lists, I went even closer to historic C syntax:

  btl::tlist piter( alist) ;
  for ( ; piter ; ++ piter ) { dosomething( * piter ) ; }
But I don't even know what I'm getting myself into when I call into std algorithms. Does std::sort or std::merge allocate and use extra memory? Or are they entirely in place? As I scale up am I going to need O(n), O(log n) or just a few extra bytes? And if these things worry about multi-threading, do I end up acquiring system locks and other things just to get my 20,000 items in order?

And where are basic things like memory-mapping a file? Or how do you do the equivalent of a printf("%.1f\n", dtmp) using iostreams? The best I could come up with was cout << 0.1 * ( std::floor( 10 * dtmp)) << std::endl. Ugh.
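For what it's worth, the closest iostreams equivalent I know of uses the <iomanip> manipulators, which round the way printf does rather than truncating like the floor trick, though it's still a mouthful:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Format a double with one digit after the decimal point,
// matching printf("%.1f", dtmp) behavior (rounds, not floors).
std::string one_decimal(double dtmp) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(1) << dtmp;
    return out.str();
}
```

Note also that std::fixed and std::setprecision are sticky: they change the stream's state for every number that follows, unlike printf's per-call format string.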

So while I'm happy to read through the documentation and code for Boost, and see if there's anything cool in Poco, I'm more looking for things I can steal, than libraries I can use. And unfortunately that means I'm not helping move the state of C++ libraries forward very fast. But at least for the projects I work on, the interfaces will be solid and consistent.


  Lost by looking at the map.
I'm not really sure what the real value of the site is, but while looking at implementations of continued fractions in C++, my mind wandered down to the Perl 6 section and was sucked into a black hole:

Perl6 allows us to define a custom composition operator. We can then use it with the triangular reduction metaoperator, and map each resulting function with an infinite value for x.

sub infix:<⚬>(&f, &g) { -> $x { &f(&g($x)) } }
sub continued-fraction(@a, @b) { [\⚬] map { @a[$_] + @b[$_] / * }, 0 .. * }

I program a fair amount in Perl 5, and I don't recognize a thing here. I remember seeing a bit of an interview with Larry Wall where he talked about how much fun he was having posting code there. I'm going to blame this one on him.


  Relearning sorting.
I had a prospective client ask me to provide some code samples doing sorting. They asked for three different sorts, but I ended up doing 12. Ok, so I kind of got into it.

The first version (testsort1.cpp) had some warts. I used inline macros for some boilerplate expansion, inheritance for the different sorters, and a self-organizing global singleton list to collect the tasks to run. Plus, the version I turned in had an error in one of the three versions of merge sort I wrote. Not that surprising that they didn't call back.

But I decided to press on and see if I could do better. For one thing, the timing I was getting from shell sort (one of my favorites) was not that great, and I was watching a lot of industry experts in online conferences railing against inheritance (see Sean Parent, an Adobe principal scientist, talk about how Inheritance Is The Base Class of Evil). So my second version (testsort2.cpp) used initializer lists, but needed a really ugly adapter to deal with having two different signatures for the sort functors, depending on whether they operated on iterators or raw pointers. I also had a little fight with formatting floating-point numbers with iostreams, but that's a whole other story.

Finally, after running over the type erasure/PIMPL pattern several times in the Going Native 2013 lectures, I decided to give it a try. It's a really convoluted strategy. Basically, instead of having an abstract base class and a descendant with its own data and extra stuff that implements the base interface, you create a concrete object that doesn't do anything itself, but holds a safe pointer to a virtual base class, which is implemented by a descendant adapter class templated to dispatch all its calls to a helper object it owns. Hopefully the compiler is cleaning up most of this mess.
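In compressed form, the pattern reads something like this sketch (the sorter name and the name() method are my illustration; the concept/model naming loosely follows Parent's talk):

```cpp
#include <memory>
#include <string>
#include <utility>

// Type erasure: a concrete handle owning a virtual base, implemented
// by a templated adapter that forwards everything to a held helper.
class sorter {
    struct concept_t {                          // hidden virtual base
        virtual ~concept_t() = default;
        virtual std::string name() const = 0;
    };
    template <typename T>
    struct model_t : concept_t {                // templated adapter descendant
        T helper;
        explicit model_t(T h) : helper(std::move(h)) {}
        std::string name() const override { return helper.name(); }
    };
    std::unique_ptr<concept_t> self;
public:
    template <typename T>
    sorter(T helper) : self(std::make_unique<model_t<T>>(std::move(helper))) {}
    std::string name() const { return self->name(); }
};

// The helper needs no base class at all -- just the right methods.
struct shell_sorter { std::string name() const { return "shell"; } };
```

The payoff is that shell_sorter stays a plain value type with no inheritance, yet a std::vector<sorter> can hold any mix of such helpers.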

Anyway, I implemented all of this in version three (testsort3.cpp), which I think is pretty good. I also templated some of the sort routines, though I should use container traits to make the interface a little nicer. One interesting thing, though, was the results.

I ran GCC and Clang on a 1U server first, and realized that these days -O0 is pretty horrible, especially on the iterator-based code:

Compiler    n      Bubble (iter/ptr)   Selection (iter/ptr)   Insertion (iter/ptr)
gcc -O0     10000  2515.88 / 836.98    711.25 / 156.54        304.98 / 60.87
gcc -O2     10000   191.08 / 182.4     173.08 / 173.07         22.29 / 22.17
gcc -O3     10000   197.23 / 213.65     87.49 / 87.45          22.35 / 22.2
clang -O0   10000  1952.49 / 686.21    675.17 / 158.12         87.81 / 79.07
clang -O2   10000   215.3  / 210.12     63.93 / 63.92          22.15 / 22.08
clang -O3   10000   212.86 / 213.72     63.67 / 63.67          22.49 / 22.51

Thankfully with some optimization the difference between iterators and raw pointers is down to noise, though it also seems these days that -O3 vs -O2 is pretty much a wash.

It was also amazing how wide the spread was between the worst and best sorts:

Compiler    n      Bubble    Selection   Insertion   Shell   Merge   buf-Merge   qsort()   std::sort
gcc -O2     50000  5226.38   4353.58     576.49      4.44    7.67    5.15        6.75      3.3
gcc -O3     50000  5347.93   2198.94     580.26      4.84    7.61    3.97        6.88      3.36
clang -O2   50000  5361.56   1587.31     578.19      4.77    7.36    4.81        6.97      3.12
clang -O3   50000  5345.02   1587.3      584.16      4.59    8.13    4.83        6.88      3.18

Everybody knows that bubble sort is just the worst sort ever, but I was surprised that it was 1000X worse than the cluster of fast ones. Even selection and insertion sort are way out there, at hundreds of times the best times. But I was also curious how well the fast ones would fare as the set sizes continued to grow.

So I pushed on into bigger datasets with the fast ones (sorry, I wasn't going to time bubble sort at n=1000000 with it growing at O(n^2)). Letting it run overnight, I let the fast ones push the bounds of the server's physical memory:

Compiler    n           Shell      Merge        buf-Merge   qsort()     std::sort
gcc -O3     1000000     121.52     174.83       98.2        176.69      83.99
gcc -O3     10000000    1519.97    1950.48      1124.86     1984.32     967.28
gcc -O3     100000000   18622.18   21343.25     12492.76    22463.4     10886.02
gcc -O3     1000000000  67849.88   171505.77    204308.42   204446.49   47208.55
clang -O3   1000000     124.86     176.53       119.42      174.18      79.76
clang -O3   10000000    1564.63    1982.01      1392.44     2010.94     926.81
clang -O3   100000000   19084.54   21694.59     15671.13    22861.94    10585.51
clang -O3   1000000000  73168.25   1040334.61   44162.16    187819.62   45389.01

Here we see performance between clang and gcc varying a bit more (though not in a consistent fashion). Strangely enough as the set size gets huge, qsort() takes the biggest hit under gcc, while merge sort goes off the rails under clang.

Clearly std::sort is the winner hands down, though shell sort keeps right up with it until the end. This is important to keep in mind, as shell sort's implementation is compact (the key loop is 8 lines long) and non-recursive: two very useful properties in embedded applications. It also remains mysterious, as academia still cannot fully explain why it works so well. That's probably one of the reasons I like it so much.
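For reference, that key loop is small enough to quote in full; here's a minimal shell sort sketch using the familiar 3x+1 gap sequence (my paraphrase, not the exact testsort3.cpp code):

```cpp
#include <cstddef>

// Shell sort: gapped insertion sort over a shrinking gap sequence.
// Compact and non-recursive -- no call stack, no extra memory.
void shell_sort(int* a, std::size_t n) {
    std::size_t gap = 1;
    while (gap < n / 3) gap = 3 * gap + 1;      // 1, 4, 13, 40, ...
    for (; gap > 0; gap /= 3)
        for (std::size_t i = gap; i < n; ++i)   // insertion sort with stride 'gap'
            for (std::size_t j = i; j >= gap && a[j - gap] > a[j]; j -= gap) {
                int t = a[j]; a[j] = a[j - gap]; a[j - gap] = t;
            }
}
```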

Ok, and now for some pretty pictures:
[charts: timing plots for gcc and clang]

And a comparison between the timings of gcc and clang for shell & std::sort. I think the units are in ns/N.

Not as fancy as using Google Visualization charts live like in this benchmark of std::containers, but it will have to do for now.


Life in the middle of nowhere, remote programming to try and support it, startups, children, and some tinkering when I get a chance.
