However beautiful the strategy, you should occasionally look at the results.
Americans can always be trusted to do the right thing, once all other possibilities have been exhausted.
Both of these are attributed to Churchill, but as with everything on the internet, YMMV.
The three candidate implementations:

    sqrt(mean(A .^ 2.))
    sqrt(sum(x -> x^2, A) / length(A))
    norm(A) / sqrt(length(A))

The first version is kind of slow (3.5 times slower than the baseline). The second one saves generating a temporary copy of the entire array by squaring each term one by one as it's consumed by sum(); it comes in only four percent slower than the baseline. The third one I expected great things from, since norm() is basically the square root of the sum of the squares, but it was actually slower than the second version, at thirteen percent over the baseline. So what was the baseline? Well, I wrote it out:
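To see where the first version loses time, it helps to look at allocations: broadcasting A .^ 2 materializes a whole temporary array before mean() ever runs, while the mapped sum() squares each element on the fly. A minimal sketch of that comparison (the rms_broadcast / rms_mapped names and the @allocated measurement are mine, not from the post):

```julia
# Hypothetical illustration: allocation behavior of the broadcast
# version versus the mapped-sum version of RMS.
A = 2 .* rand(10^6)

rms_broadcast(A) = sqrt(sum(A .^ 2) / length(A))       # materializes A .^ 2
rms_mapped(A)    = sqrt(sum(x -> x^2, A) / length(A))  # no temporary array

rms_broadcast(A); rms_mapped(A)   # warm up so compilation isn't counted

alloc_b = @allocated rms_broadcast(A)   # roughly sizeof(A) bytes
alloc_m = @allocated rms_mapped(A)      # near zero
```

Both return the same value; only the memory traffic differs, which is where most of the 3.5x gap comes from.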
    function rms(A)
        s = 0
        @simd for e in A
            s += e * e
        end
        sqrt( s / length(A) )
    end

That little bit of magic dust before the for is required to tell the compiler that no iteration depends on any other, freeing it to vectorize the loop. I actually wrote this in C as well, using two different styles (traditional and performant), and while there was a small variation in the timings (one was a smidge faster and the other a sliver slower), the Julia version and the C versions were practically identical for the clang build. The gcc version didn't do as well. But the whole ordeal with compile flags and such is a story for another time. Reference: benchmark_rms.jl Pluto notebook.
Labels: benchmark, julia, performance
    using BenchmarkTools
    using Random
    using Statistics
    using LinearAlgebra
    using Plots

    A = 2 * rand(10^7)

    T_bench  = @benchmark sqrt(mean(A .^ 2.))
    T_bench2 = @benchmark sqrt(sum(x -> x * x, A) / length(A))
    T_bench3 = @benchmark norm(A) / sqrt(length(A))

    histogram( T_bench.times )

Inlining C code, though the video cut the right edge off and I had to guess at what was missing:
    using Libdl

    C_code = """
    #include <stddef.h>
    #include <math.h>
    double c_rms(size_t n, double * X)
    {
        double s = 0.0 ;
        for ( size_t i = n ; ( i-- ) ; X++ ) { s += ( *X * *X ) ; }
        return sqrt( s / n ) ;
    }
    double c_rmse(size_t n, double * X)
    {
        double s = 0.0 ;
        for ( size_t i = 0 ; ( i < n ) ; i++ ) { s += X[i] * X[i] ; }
        return sqrt( s / n ) ;
    }
    """

    const Clib = tempname()

    open(`gcc -fPIC -O3 -msse3 -xc -shared -ffast-math -o $(Clib * "." * Libdl.dlext) -`, "w") do f
        print(f, C_code)
    end

    c_rms( X::Array{Float64})  = ccall((:c_rms, Clib),  Float64, (Csize_t, Ptr{Float64}), length(X), X )
    c_rmse( X::Array{Float64}) = ccall((:c_rmse, Clib), Float64, (Csize_t, Ptr{Float64}), length(X), X )

    c_rms( A )

And finally, some parallel coding in Julia,
    function rms(A)
        s = zero(eltype(A))   # generic version
        @simd for e in A
            s += e * e
        end
        sqrt( s / length(A) )
    end

To try these Pluto notebooks out without having to have Julia running locally, there's a Binder transform here, but I think I may eventually set up a Pluto instance on my server.
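The snippet above is still the single-threaded @simd version; one way to actually parallelize it is to give each task a contiguous chunk and combine per-task partial sums. This is my own sketch, not the notebook's code (the rms_threaded name and the chunking scheme are assumptions), and it only pays off when Julia is started with multiple threads, e.g. julia -t 4:

```julia
using Base.Threads

# Hypothetical threaded RMS: each task sums the squares of its own
# contiguous chunk, then the partial sums are combined at the end.
function rms_threaded(A)
    n  = length(A)
    nt = nthreads()
    partials = zeros(eltype(A), nt)    # one slot per task, so no data race
    @threads for t in 1:nt
        lo = div((t - 1) * n, nt) + 1  # chunk boundaries for task t
        hi = div(t * n, nt)
        s  = zero(eltype(A))
        @simd for i in lo:hi
            s += A[i] * A[i]
        end
        partials[t] = s
    end
    sqrt(sum(partials) / n)
end
```

With a single thread this reduces to the @simd loop above, so it's a safe drop-in either way.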
Labels: julia