But that's not what this is about. This is about performance.
Once you get all the bad module and dependency stuff behind you, there's some interesting speed available for getting work done. Even with this small experiment, the data answers a few questions and raises some more. But let's set up the problem first.
In Jane Herriman's video (mentioned in the last post), she pretends to be introducing us to Julia, but instead leads us on a whirlwind tour through all kinds of interesting work, including benchmarking some functions: built-in, hand-written, and external C libraries. Because a few bits were clipped off the edge of her screenshots, I tried to re-create her work from that section, but with my own twist. She was benchmarking mean(), or taking the average. I decided to do rms(), or the root mean square. This is probably because I'm both a programming nerd and a power electronics nerd, and rms is one of those useful things you do from time to time to figure out how much delivered energy you're getting from a changing voltage. The name basically describes the operations involved (square, take the mean, take the root), but the TeX version would be:
$\sqrt{ \frac{1}{N} \sum_{i=1}^N x_i^2}$
(I looked into putting a pretty figure here, but Chrome doesn't support most of MathML yet, and I didn't figure there would be that many people using Firefox circa 2020.)
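As a quick sanity check of the formula, take just two samples, 3 and 4 (values picked here purely for illustration, they are not from the benchmark):

$\sqrt{\frac{1}{2}\left(3^2 + 4^2\right)} = \sqrt{12.5} \approx 3.54$

The plain mean of the same two samples is 3.5, so the squaring step weights the larger sample a little more heavily.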
Julia doesn't have a built-in for rms(), but the forums have a number of suggestions:
    sqrt(mean(A .^ 2.))
    sqrt(sum(x->x^2, A)/length(A))
    norm(A)/sqrt(length(A))

The first version is kind of slow (3.5 times slower than the baseline). The second one saves having to generate a temporary copy of the entire array, instead squaring each term one by one as it's consumed by sum(), and it comes in only four percent slower than the baseline. The third one I expected great things from, as the norm() operation is basically the root of the sum of the squares, but it was actually slower than the second version, at thirteen percent over the baseline. So what was the baseline? Well, I wrote it out:

    function rms(A)
        s = 0
        @simd for e in A
            s += e * e
        end
        sqrt( s / length(A) )
    end

That little bit of magic dust before the for is required. The @simd annotation tells the compiler that it is allowed to reorder the loop iterations, including the floating-point additions into s, which lets it vectorize the sum and generally go crazy. I actually wrote this in C as well, using two different styles (traditional and performant), and while there was a small variation in the timings (one was a smidge faster, and the other a sliver slower), the Julia version and the C versions were practically identical when the C was compiled with clang. The gcc builds didn't do as well. But the whole ordeal with compile flags and such is a story for another time.
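For anyone who wants to try the comparison without opening the notebook, here is a minimal sketch of a timing harness. It is not the notebook's code: the names rms_broadcast, rms_sum, rms_norm, rms_loop, and best_time are made up for this example, the array size is arbitrary, and it uses only Base's @elapsed plus the Statistics and LinearAlgebra standard libraries rather than a dedicated benchmarking package, so expect noisier numbers than a proper benchmark would give.

```julia
using Statistics      # mean
using LinearAlgebra   # norm

# The three forum suggestions, wrapped as functions (names are hypothetical).
rms_broadcast(A) = sqrt(mean(A .^ 2.0))                  # builds a temporary array of squares
rms_sum(A)       = sqrt(sum(x -> x^2, A) / length(A))    # squares each term as sum() consumes it
rms_norm(A)      = norm(A) / sqrt(length(A))             # 2-norm scaled by sqrt(N)

# The hand-written loop from the post, under a different name.
function rms_loop(A)
    s = 0
    @simd for e in A
        s += e * e
    end
    sqrt(s / length(A))
end

# Best-of-N wall-clock time in seconds; the first call is a warm-up
# so that compilation is not included in the measurement.
function best_time(f, A; reps = 20)
    f(A)
    best = Inf
    for _ in 1:reps
        t = @elapsed f(A)
        best = min(best, t)
    end
    return best
end

A = rand(10^6)  # arbitrary test data, an assumption of this sketch

for f in (rms_loop, rms_sum, rms_norm, rms_broadcast)
    t = best_time(f, A)
    println(rpad(string(f), 16), round(t * 1e3, digits = 3), " ms")
end
```

One variation that might be worth trying: replacing s = 0 with s = zero(eltype(A)) keeps the accumulator in the array's element type from the first iteration. Whether that changes the timing is not something this post measured.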
Reference: benchmark_rms.jl Pluto notebook.
Labels: benchmark, julia, performance