However beautiful the strategy, you should occasionally look at the results.
Americans can always be trusted to do the right thing, once all other possibilities have been exhausted.
Both of these are attributed to Churchill, but as with everything on the internet, YMMV.
But that's not what this is about. This is about performance.
Once you get all the bad module and dependency stuff behind you, there's some interesting speed available for getting work done. Even with this small experiment, the data answers a few questions and raises some more. But let's set up the problem first.
In Jane Herriman's video (mentioned in the last post), she pretends to be introducing us to Julia, but instead leads us on a whirlwind tour through all kinds of interesting work, including benchmarking some functions: built-in, hand-written, and external C libraries. Because a few bits were clipped off the edge of her screenshots, I tried to re-create her work from that section, but with my own twist. She was benchmarking mean(), or taking the average; I decided to do rms(), the root mean square. This is probably because I'm both a programming nerd and a power electronics nerd, and rms is one of those useful things you do from time to time to figure out how much delivered energy you're getting from a changing voltage. The name basically describes the operations involved, but the TeX version would be:
$\sqrt{ \frac{1}{N} \sum_{i=1}^N x_i^2}$
(I looked into putting a pretty figure here, but Chrome doesn't support most of MathML yet, and I didn't figure there would be that many people using firefox circa 2020.)
Julia doesn't have a built-in for rms(), but the forums have a number of suggestions:
    sqrt(mean(A .^ 2.))
    sqrt(sum(x->x^2, A)/length(A))
    norm(A)/sqrt(length(A))

The first version is kind of slow (3.5 times slower than the baseline). The second one saves having to generate a temporary copy of the entire array, instead squaring each term one by one as it's consumed by sum(); it comes in only four percent slower than the baseline. The third one I expected great things from, since norm() is basically the root of the sum of the squares, but it was actually slower than the second version, at thirteen percent over the baseline. So what was the baseline? Well, I wrote it out:
    function rms(A)
        s = 0
        @simd for e in A
            s += e * e
        end
        sqrt(s / length(A))
    end

That little bit of magic dust before the for is required, probably to specify that none of the iterations depend on any other (in this case, mostly giving the compiler permission to reorder the floating-point additions in the reduction), and to let the compiler go crazy. I actually wrote this in C as well, using two different styles (traditional and performant), and while there was a small variation in the timings (one was a smidge faster, and the other a sliver slower), the Julia version and the C versions were practically identical under clang. The gcc build didn't do as well. But the whole ordeal with compile flags and such is a story for another time.
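For reproduction's sake, a minimal sketch of how those ratios can be measured against this baseline (the variable names are mine, not the notebook's):

    using BenchmarkTools, Statistics, LinearAlgebra

    A = 2 * rand(10^7)

    t_base = @belapsed rms($A)    # $A interpolates the value, keeping the global lookup out of the timing
    t_mean = @belapsed sqrt(mean($A .^ 2.))
    t_sum  = @belapsed sqrt(sum(x -> x^2, $A) / length($A))
    t_norm = @belapsed norm($A) / sqrt(length($A))

    (t_mean, t_sum, t_norm) ./ t_base    # the ratios quoted above: roughly (3.5, 1.04, 1.13)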
Reference: benchmark_rms.jl Pluto notebook.
Labels: benchmark, julia, performance
Since I'm doing this all on a whim, I'm still mostly using Steve Brunton's classes as the exercises. I've downshifted to the Beginning Scientific Computing series, which is kind of a review, so I can zip through the lectures faster. Unfortunately the videos are not at all organized, and it's somewhat of a puzzle to work out the order. I'm doing my best to document what I think is the progression in the comments of each of my julia files here:
As I've been going along, one thing that has raised its head is the mismatch between the form of data plot() likes for time series and the form the linear algebra solvers use. The best I've come up with for pulling one component out of a vector of vectors is [e[1] for e in vu], but I fear this makes a copy of the data, which makes me sad.
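A few alternatives, sketched with made-up data (vu here is illustrative, not from the course files):

    using Plots

    vu = [rand(3) for _ in 1:100]   # e.g. a solver's output: one state vector per time step

    x1 = [e[1] for e in vu]         # comprehension: allocates a new Vector (a copy)
    x1 = getindex.(vu, 1)           # broadcasted getindex: same result, still a copy
    M  = reduce(hcat, vu)           # one 3x100 Matrix up front; rows then slice cheaply
    plot(@view M[1, :])             # first component as a time series, without another copy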
To brush up on my basics, I found this interesting introduction, which wasn't so much an introduction as a tour through a lot of interesting topics like benchmarking, multiple plots in one pane, inline C code, and even SIMD. It's also just fun watching someone get excited by the ternary operator.
    using BenchmarkTools
    using Random
    using Statistics
    using LinearAlgebra
    using Plots

    A = 2 * rand(10^7)                  # ten million samples, uniform on [0, 2)

    T_bench  = @benchmark sqrt(mean(A .^ 2.))
    T_bench2 = @benchmark sqrt(sum(x -> x*x, A) / length(A))
    T_bench3 = @benchmark norm(A) / sqrt(length(A))

    histogram(T_bench.times)            # distribution of the individual trial times
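One caveat I'm fairly sure of: A is a global here, and BenchmarkTools recommends interpolating globals with $ so the variable lookup doesn't pollute the measurement. For example:

    @benchmark sqrt(mean($A .^ 2.))    # $A splices in the value itself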
Inlining C code, though the video cut the right edge off and I had to guess at what was missing,
    using Libdl

    C_code = """
    #include <stddef.h>
    #include <math.h>

    double c_rms(size_t n, double * X) {
        double s = 0.0;
        for (size_t i = n; (i--); X++) { s += (*X * *X); }
        return sqrt(s / n);
    }

    double c_rmse(size_t n, double * X) {
        double s = 0.0;
        for (size_t i = 0; (i < n); i++) { s += X[i] * X[i]; }
        return sqrt(s / n);
    }
    """

    const Clib = tempname()

    # pipe the source through gcc, producing a shared library at Clib
    open(`gcc -fPIC -O3 -msse3 -xc -shared -ffast-math -o $(Clib * "." * Libdl.dlext) -`, "w") do f
        print(f, C_code)
    end

    c_rms(X::Array{Float64})  = ccall((:c_rms,  Clib), Float64, (Csize_t, Ptr{Float64}), length(X), X)
    c_rmse(X::Array{Float64}) = ccall((:c_rmse, Clib), Float64, (Csize_t, Ptr{Float64}), length(X), X)

    c_rms(A)
And finally, some parallel coding in Julia,
    function rms(A)
        s = zero(eltype(A))   # generic version: accumulator matches the element type
        @simd for e in A
            s += e * e
        end
        sqrt(s / length(A))
    end
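A quick sanity check I'd expect to hold, comparing the Julia and C versions on the same data (assuming both are defined as above):

    rms(A) ≈ c_rms(A)      # true: equal up to floating-point reordering
    @benchmark rms($A)     # timings should land close to the C versions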
To try these Pluto notebooks out without having to have Julia running locally, there's a Binder transform here, but I think I may eventually set up a Pluto instance on my server.
Labels: julia