<div class=leader>
<i class="twa twa-axe"></i><i class="twa twa-carpentry-saw"></i><i class="twa twa-screwdriver"></i><i class="twa twa-wrench"></i><i class="twa twa-hammer"></i><br>
Bootstrapping Julia
# Installing Julia
Recommended method:
- Download an archive from
- Execute `julia` as-is
- Link it to your `$PATH`
Distribution packages usually work well too:
- **Debians&Ubuntus**: `apt install julia`
- **Iris/Aion**: `module add lang/Julia`
# Life in REPL
user@pc $ julia
julia> sqrt(1+1)
julia> println("Well hello there!")
Well hello there!
julia> ?
help?> sqrt
Computes the square root .....
- *If you like notebooks*, Julia kernels are available too (but compared to
the REPL they are quite impractical)
- VSCode extension exists too (feels very much like RStudio)
# REPL modes
Julia interprets some additional keys to make our life easier:
- `?`: help mode
- `;`: shell mode
- `]`: packaging mode (looks like a box!)
- `Backspace`: quits special mode
- `Tab`: autocomplete anything
- `\`... `Tab`: expand math characters
# Managing packages from the package management environment
- Install a package
] add UnicodePlots
- Uninstall a package
] remove UnicodePlots
# Loading libraries, modules and packages
- Load a local file (with shared functions etc.)
- Load a package, add its exports to the global namespace
using UnicodePlots
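The loading mechanisms can be compared in a small sketch. The file name `mylibrary.jl` and the function `shared_helper` are made up for illustration, and `Statistics` and `LinearAlgebra` are used only because they ship with Julia.

```julia
# Write a tiny "library" file so that the example is self-contained
# (mylibrary.jl and shared_helper are hypothetical names).
dir = mktempdir()
libfile = joinpath(dir, "mylibrary.jl")
write(libfile, "shared_helper(x) = 2x + 1\n")

# Load a local file: its definitions land in the current module.
include(libfile)
helper_result = shared_helper(20)

# Load a package and bring its exported names into scope.
using Statistics
mean_result = mean([1, 2, 3, 4])

# Load a package without importing its exports; names stay qualified.
import LinearAlgebra
norm_result = LinearAlgebra.norm([3.0, 4.0])
```

Using `import` instead of `using` keeps the global namespace clean, at the cost of longer names.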
# <i class="twa twa-light-bulb"> </i> How to write a standalone program?
*Your scripts should communicate well with the environment!*
(that includes, among others, you)
#!/usr/bin/env julia
function process_file(filename)
    @info "Processing $filename..."
    # ... do something ...
    if error_detected
        @error "something terrible has happened"
        exit(1)
    end
end
for file in ARGS
    process_file(file)
end
Correct processing of commandline arguments makes your scripts *repurposable*
and *configurable*.
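One possible way to structure the argument handling is sketched below. The `--verbose` flag and the helper names `parse_arguments` and `main` are assumptions made for this example, not a convention required by Julia.

```julia
# Hypothetical convention: `--verbose` is a flag, everything else is an input file.
function parse_arguments(args)
    verbose = "--verbose" in args
    files = filter(a -> !startswith(a, "--"), args)
    return (verbose = verbose, files = files)
end

function main(args)
    options = parse_arguments(args)
    if isempty(options.files)
        @error "no input files given"
        return 1          # non-zero exit code signals failure to the shell
    end
    for file in options.files
        options.verbose && @info "Processing $file..."
        # ... do something ...
    end
    return 0
end

# In a real script the last line would be: exit(main(ARGS))
```

Keeping the logic in `main` lets you call the same code from the REPL with a hand-made argument list.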
# <i class="twa twa-light-bulb"></i> Workflow: Make a local environment for your script
- Enter a local project with separate package versions
] activate path/to/project
- Install dependencies of the local project
] instantiate
- Execute a script with the project environment
$ julia --project=path/to/project script.jl
(Project data is stored in `Project.toml`, `Manifest.toml`.)
<div class=leader>
<i class="twa twa-rocket"></i>
<i class="twa twa-rocket"></i>
<i class="twa twa-rocket"></i><br>
Parallel Julia
# Note about MPI
If you're into MPI, you can use it perfectly well through the `MPI.jl` package.
Here we show the `Distributed.jl` approach, because:
- it is slightly more user-friendly
- it is super easy to use it for any Julia code
# Julia model of distributed computation
<img src="slides/img/distrib.svg" width="50%">
# Basic parallel processing
**Using `Threads`:**
1. start Julia with parameter `-t N`
2. parallelize (some) loops with `Threads.@threads`
a = zeros(100000)
Threads.@threads for i = eachindex(a)
    a[i] = hardfunction(i)
end
**Using `Distributed`:**
using Distributed
newVector = pmap(myFunction, myVector)
We will use the `Distributed` approach.
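For comparison, the threaded variant can be written as a complete program. Here `hardfunction` is a made-up stand-in for an expensive computation; with the default single thread the loop still runs, just without any speedup.

```julia
# hardfunction is a hypothetical stand-in for an expensive computation.
hardfunction(i) = sum(sqrt(i * k) for k in 1:100)

a = zeros(100_000)
# Iterations are split among the threads Julia was started with (-t N);
# each iteration writes to its own element, so no locking is needed.
Threads.@threads for i in eachindex(a)
    a[i] = hardfunction(i)
end

println("computed $(length(a)) items on $(Threads.nthreads()) thread(s)")
```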
# Managing your workers
using Distributed
Running commands on workers:
@spawnat 3 @info "Message from worker"
@spawnat :any myid()
Getting results from workers:
job = @spawnat :any begin sleep(10); return 123+321; end
fetch(job)
Cleaning up:
rmprocs(workers())
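Put together, the life cycle of a worker might look like the following sketch. The number of workers (2) and the short sleep are arbitrary choices for the demonstration.

```julia
using Distributed

# Start 2 local worker processes (the number is arbitrary here).
new_workers = addprocs(2)

# Run something on a specific worker and wait for its result.
worker_id = fetch(@spawnat new_workers[1] myid())

# @spawnat returns a Future immediately; fetch blocks until the value is ready.
job = @spawnat :any begin
    sleep(1)
    123 + 321
end
result = fetch(job)

# Cleaning up: remove the workers when they are no longer needed.
rmprocs(new_workers)
remaining = nprocs()
```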
# Processing lots of data items in parallel
datafiles = ["file$i.csv" for i=1:20]
@everywhere function process_file(name)
    println("Processing file $name")
    # ... do something ...
end
pmap(process_file, datafiles)
<i class="twa twa-light-bulb"></i><i class="twa twa-light-bulb"></i> Doing it manually:
@sync for f in datafiles
    @async @spawnat :any process_file(f)
end
# Gathering results from workers
items = collect(1:1000)
@everywhere compute_item(i) = 123 + 321*i
pmap(compute_item, items)
<i class="twa twa-light-bulb"></i><i class="twa twa-light-bulb"></i><i class="twa twa-light-bulb"></i> Doing manually with `@spawnat`:
futures = [@spawnat :any compute_item(item) for item in items]
fetch.(futures)
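The manual variant only becomes useful once the futures are collected. A minimal self-contained sketch, assuming two local workers are enough for a demonstration:

```julia
using Distributed
nprocs() == 1 && addprocs(2)   # assumption: two local workers suffice for a demo

items = collect(1:1000)
@everywhere compute_item(i) = 123 + 321*i

# Each @spawnat returns a Future right away; the work runs in the background.
futures = [@spawnat :any compute_item(item) for item in items]

# fetch waits for a Future and returns its value; the dot broadcasts it over all of them.
results = fetch.(futures)

rmprocs(workers())
```

The results come back in the order of `items`, regardless of which worker finished first.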
# How to design for parallelization?
**Recommended way:** *Utilize the high-level looping primitives!*
- use `map`, parallelize by just switching to `pmap`
- use `reduce` or `mapreduce`, parallelize by just switching to `dmapreduce` (DistributedData.jl)
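The switch from serial to parallel primitives can be sketched as follows. `slow_square` is a made-up function, and because `dmapreduce` lives in an external package, the reduction here uses the `@distributed` macro from the standard library as a stand-in.

```julia
using Distributed
nprocs() == 1 && addprocs(2)   # assumption: two local workers for the demo

@everywhere slow_square(x) = (sleep(0.01); x^2)   # hypothetical "expensive" function

inputs = 1:20

serial   = map(slow_square, inputs)
parallel = pmap(slow_square, inputs)      # same call shape, work spread over workers

serial_sum   = mapreduce(slow_square, +, inputs)
parallel_sum = @distributed (+) for x in inputs   # stdlib stand-in for a distributed mapreduce
    slow_square(x)
end

rmprocs(workers())
```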
# <i class="twa twa-light-bulb"></i> Parallel → distributed processing
It is very easy to organize *multiple computers* to work for you!
You need a working `ssh` connection:
user@pc1 $ ssh server1
Last login: Wed Jan 13 15:29:34 2021 from 2001:a18:....
user@server $ _
Spawning remote processes on remote machines:
julia> using Distributed
julia> addprocs([("server1", 10), ("pc2", 2)])
**Benefit:** No additional changes to the parallel programs!
<div class=leader>
<i class="twa twa-abacus"></i>
<i class="twa twa-laptop"></i>
<i class="twa twa-desktop-computer"></i>
<i class="twa twa-flag-luxembourg"></i><br>
Utilizing ULHPC <i class="twa twa-light-bulb"></i>
# Reminder: ULHPC (iris)
<img src="slides/img/iris.png" width="30%">
# Running Julia on the computing nodes
Start an allocation and connect to it:
0 [mkratochvil@access1 ~]$ srun -p interactive -t 30 --pty bash -i
(You can also use `si`.)
After a short wait, you should get a shell on a compute node. There you can load and start Julia as usual:
0 [mkratochvil@iris-131 ~](2696005 1N/T/1CN)$ module add lang/Julia
0 [mkratochvil@iris-131 ~](2696005 1N/T/1CN)$ julia
               _
   _       _ _(_)_     |  Documentation:
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.5 (2023-01-08)
 _/ |\__'_|_|_|\__'_|  |  Official release
|__/                   |
# Making an HPC-compatible Julia script
Main challenges:
1. discover the available resources
2. spawn worker processes at the right place
using ClusterManagers
addprocs_slurm(parse(Int, ENV["SLURM_NTASKS"]))
# ... continue as usual
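Resource discovery can be made robust against running outside of Slurm, for example while testing on a laptop. In this sketch `requested_tasks` and its `default` keyword are hypothetical helpers; only the `SLURM_NTASKS` variable comes from Slurm itself.

```julia
# Slurm exports SLURM_NTASKS inside an allocation; outside of one the
# variable is missing, so fall back to a default instead of crashing.
function requested_tasks(env = ENV; default = 1)
    return parse(Int, get(env, "SLURM_NTASKS", string(default)))
end

# The environment can be faked with a Dict to test the logic without Slurm.
inside_allocation  = requested_tasks(Dict("SLURM_NTASKS" => "10"))
outside_allocation = requested_tasks(Dict{String,String}())

# In the real script one would then continue with
#   addprocs_slurm(requested_tasks())    # from ClusterManagers
```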
# Scheduling an analysis script
Normally, you write a "batch script" and add it to a queue using `sbatch`.
Script in `runAnalysis.sbatch`:
#!/bin/bash
#SBATCH -J MyAnalysisInJulia
#SBATCH -n 10
#SBATCH -c 1
#SBATCH -t 30
#SBATCH --mem-per-cpu 4G
module add lang/Julia
julia runAnalysis.jl
You start the script using:
$ sbatch runAnalysis.sbatch
<div class=leader>
<i class="twa twa-volcano"></i>
<i class="twa twa-mount-fuji"></i>
<i class="twa twa-snow-capped-mountain"></i>
<i class="twa twa-mountain"></i>
<i class="twa twa-sunrise-over-mountains"></i>
Utilizing GPUs
# Note about CUDA
Julia can serve as an extremely user-friendly front-end for CUDA: it abstracts away the ugly steps that plain CUDA requires, yet leaves enough flexibility to write high-performance low-level compute kernels.
The approach here demonstrates what `CUDA.jl` does.
There's also:
- `AMDGPU.jl`
- `Metal.jl` for <i class="twa twa-green-apple"></i>
- `Vulkan.jl` (less user friendly but works everywhere)
# Using your GPU for accelerating simple stuff
julia> data = randn(10000,10000);
julia> @time data*data;
julia> using CUDA
julia> data = cu(data);
julia> @time data*data;
# What's available?
The "high-level" API spans most of the CU* helper tools:
- broadcasting numerical operations via translation to simple kernels (`.+`, `.*`, `.+=`, `ifelse.`, `sin.`, ...)
- matrix and vector operations using `CUBLAS`
- `CUSOLVER` (solvers, decompositions etc.) via `LinearAlgebra.jl`
- ML ops (in `Flux.jl`): `CUTENSOR`
- `CUSPARSE` via `SparseArrays.jl`
- limited support for reducing operations (`findall`, `findfirst`, `findmin`, ...) -- these do not translate easily to GPU code
- very limited support for array index processing
# Programming kernels in Julia!
CUDA kernels (`__device__` functions) are generated transparently directly from Julia code.
a = cu(someArray)
function myKernel(a)
    i = threadIdx().x
    a[i] += 1
    return
end
@cuda threads=length(a) myKernel(a)
Some Julia constructions will not be feasible on the GPU (mainly allocating complex structures); these will trigger a compiler message from `@cuda`.
# Programming kernels -- usual tricks
The number of threads and blocks is limited by hardware; let's make a
grid-stride loop to process a lot of data quickly!
a = cu(someArray)
b = cu(otherArray)
function applySomeMath(a, b)
    index = threadIdx().x + blockDim().x * (blockIdx().x-1)
    gridStride = gridDim().x * blockDim().x
    for i = index:gridStride:length(a)
        a[i] += someMathFunction(b[i])
    end
    return
end
@cuda threads=1024 blocks=32 applySomeMath(a, b)
Typical CUDA trade-offs:
- too many blocks won't work, insufficient blocks won't cover your SMs
- too many threads per block will fail or spill to memory (slow), insufficient threads won't allow parallelization/latency hiding in SM
- thread divergence destroys performance
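The index arithmetic of a grid-stride loop can be checked on the CPU, without a GPU. The sketch below replays the loop for every (block, thread) pair using plain integers; `simulate_grid_stride` is a made-up helper and not part of `CUDA.jl`.

```julia
# Count how many times each array index is visited by a grid-stride loop
# launched with the given number of threads per block and blocks.
function simulate_grid_stride(n, threads, blocks)
    visits = zeros(Int, n)
    gridStride = blocks * threads
    for block in 1:blocks, thread in 1:threads
        index = thread + threads * (block - 1)   # same formula as in the kernel
        for i in index:gridStride:n
            visits[i] += 1
        end
    end
    return visits
end

visits = simulate_grid_stride(100_000, 1024, 32)
covered_once = all(==(1), visits)

# If each thread handled exactly one element instead, the launch would
# need enough blocks to cover the array, rounded up.
blocks_needed = cld(100_000, 1024)
```

Every element is visited exactly once for any launch configuration, which is why the grid size can be tuned for the hardware rather than for the data size.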
# CUDA.jl interface
Functions available in the kernel:
- `gridDim`, `blockDim`
- `blockIdx`, `threadIdx`
- `warpsize`, `laneid`, `active_mask`
- `sync_threads`, `sync_warp`, `threadfence`, ...
- `vote_all`, `vote_ballot`, `shfl_sync`, ...
Parameters for the `@cuda` spawn:
- `threads=nnn` per block
- `blocks=nnn` per grid
- `shmem=nnn` how many bytes of dynamic shared memory to request (available in the kernel via `CuDynamicSharedArray`)
# Julia for newcomers
## June 8th, 2022
<div style="top: 6em; left: 0%; position: absolute;">
<img src="theme/img/lcsb_bg.png">
<div style="top: 1em; left: 60%; position: absolute;">
<img src="slides/img/r3-training-logo.png" height="200px">
<img src="slides/img/julia.svg" height="200px">
<h1 style="margin-top:3ex; margin-bottom:3ex;">Julia on HPCs</h1>
Miroslav Kratochvíl, Ph.D.<br>
Laurent Heirendt, Ph.D.<br>
R3 Team - <a href=""></a><br>
<i>Luxembourg Centre for Systems Biomedicine</i>
<link rel="stylesheet" href="">
code {border: 2pt dotted #f80; padding: .4ex; border-radius: .7ex; color:#444; }
.reveal pre code {border: 0; font-size: 18pt; line-height:27pt;}
em {color: #e02;}
li {margin-bottom: 1ex;}
div.leader {font-size:400%; line-height:120%; font-weight:bold; margin: 1em;}
section {padding-bottom: 10em;}
# Motivation first!
*Why is it good to work in a compiled language?*
- Programs become much faster for free.
- Even if you use the language as a package glue, at least the glue is not slow.
*What do we gain by having types in the language?*
- Generic programming, and lots of optimization possibilities for the compiler.
*Is the Julia ecosystem ready for my needs? <i class="twa twa-thinking-face"></i>*
- Likely. If not, extending the packages is super easy.
- Base includes most of the functionality of Matlab, R and Python with numpy,
and many useful bits of C++
# Why Julia?
<center><img src="slides/img/whyjulia.png" width="80%"></center>
(Source: JuliaCon 2016, Arch D. Robison)
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">$OTHERLANG</span> to Julia<br>in 15 minutes
# Always remember
- you can `Tab` through almost anything in the REPL
- functions have useful help with examples, try `?cat`
- `typeof(something)` may give good info
# Everything has a type that determines storage and value handling
- `Vector{Int}`
[1, 2, 5, 10]
- `Matrix{Float64}`
[1.0 2.0; 2.0 1.0]
- `Tuple`
(1, 2.0, "SomeLabel")
- `Set{Int}`
- `Dict{Int,String}`
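The types listed above can be inspected directly with `typeof`. The values for `Set` and `Dict` below are made up for illustration.

```julia
v = [1, 2, 5, 10]
m = [1.0 2.0; 2.0 1.0]
t = (1, 2.0, "SomeLabel")
s = Set([1, 2, 2, 3])                 # duplicates collapse, 3 elements remain
d = Dict(1 => "one", 2 => "two")

# Print the concrete type of each value.
for x in (v, m, t, s, d)
    println(typeof(x))
end
```

Note that a tuple carries the type of each of its elements, so differently typed elements cost nothing at run time.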
# Basic functionality and expectable stuff
Most concepts from C, Python and MATLAB are portable as they are.
Surprising parts:
- arrays are indexed from `1` (for a relatively good reason)
- Arrays: `array[1]`, `array[2:5]`, `array[begin+1:end-1]`, `size`, `length`, `cat`, `vcat`, `hcat`, ...
- code blocks are delimited by keywords (`begin` ... `end`)
- you can stuff everything on one line!
- all functions can (and should) be overloaded
- simply add a type annotation to parameter with `::` to distinguish between implementations for different types
- overloading is cheap
- *specialization to known simple types* is precisely the reason why compiled code can be *fast*
- adding type annotations to code and parameters helps the compiler to do the right thing
# <i class="twa twa-light-bulb"></i> Structured cycles
Functional-style loops are *much less prone to indexing errors*.
- Transform an array, original:
for i=eachindex(arr)
    arr[i] = sqrt(arr[i])
end
- Transform an array, structured:
map(sqrt, [1,2,3,4,5])
map((x,y) -> (x^2 - exp(y)), [1,2,3], [-1,0,1])
- Summarize an array:
reduce(+, [1,2,3,4,5])
reduce((a,b) -> "$b $a", ["Use", "the Force", "Luke"])
reduce(*, [1 2 3; 4 5 6], dims=1)
**Tricky question (<i class="twa twa-light-bulb"></i><i class="twa twa-light-bulb"></i><i class="twa twa-light-bulb"></i>):** What is the overhead of the "nice" loops?
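The index-based and the structured forms compute the same thing, which a short sketch can confirm. The sample array is arbitrary.

```julia
arr = [1.0, 4.0, 9.0, 16.0]

# Index-based loop: works, but the index bookkeeping is manual.
looped = similar(arr)
for i in eachindex(arr)
    looped[i] = sqrt(arr[i])
end

mapped = map(sqrt, arr)              # no indices to get wrong
total  = reduce(+, mapped)           # summarizes the mapped array
fused  = mapreduce(sqrt, +, arr)     # map and reduce together, no temporary array
```

`mapreduce` avoids materializing the intermediate array, which is one reason the "nice" loops need not be slower than hand-written ones.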
# Array-creating loops and generators
julia> [i*10 + j for i = 1:3, j = 1:5]
3×5 Matrix{Int64}:
11 12 13 14 15
21 22 23 24 25
31 32 33 34 35
julia> join(sort([c for word in ["the result is 123", "what's happening?", "stuff"]
for c in word
if isletter(c)]))
julia> Dict('a'+i => i for i=1:26)
Dict{Char, Int64} with 26 entries:
'n' => 13
'f' => 5
# Control flow: subroutines (functions)
- Multi-line function definition
function combine(a,b)
    return a + b
end
- "Mathematical" neater definition
combine(a,b) = a + b
- <i class="twa twa-light-bulb"></i> Definition with types specified (prevents errors, allows optimizations!)
function combine(a::Int, b::Int)::Int
    return a + b
end
function combine(a::Vector, b::Vector)::Vector
    return a .+ b
end
combine(a::String, b::String)::String = "$a and $b"
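How the typed definitions are selected at call time can be seen in a short usage sketch. It redefines the typed methods in compact form so that it runs on its own; the argument values are made up.

```julia
combine(a::Int, b::Int)::Int = a + b
combine(a::Vector, b::Vector)::Vector = a .+ b
combine(a::String, b::String)::String = "$a and $b"

# Julia picks the method based on the types of all arguments.
ints    = combine(1, 2)
vectors = combine([1, 2], [10, 20])
strings = combine("salt", "pepper")

# No method matches mixed argument types, so the call fails early
# with a MethodError instead of silently computing something odd.
mixed_fails = try
    combine(1, "pepper")
    false
catch e
    e isa MethodError
end
```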
# Broadcasting over iterable things (aka The Magic Dot)
- Broadcasting operators by prepending a dot
matrix[row, :] .+= vector1 .* vector2
- Broadcasting a function
x = [1,2,3,4]
x' .* x
- Making generators
```julia
myarray_index = Dict(myarray .=> eachindex(myarray))
```
<i class="twa twa-light-bulb"></i> The "magic dot" is a shortcut for calling `broadcast(...)`.
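The equivalence between the dot syntax and `broadcast` can be checked directly. The sample data is arbitrary, and `@.` is shown as one more convenience that dots every call in an expression.

```julia
x = [1, 2, 3, 4]

dotted   = sqrt.(x)
explicit = broadcast(sqrt, x)      # what the dot expands to

# A row vector times a column vector broadcasts into a full table.
table = x' .* x

# Several dotted operations fuse into a single loop without temporaries.
fused = @. 2x + x^2

# Broadcasting the pair constructor builds a value => index lookup.
myarray = ["a", "b", "c"]
myarray_index = Dict(myarray .=> eachindex(myarray))
```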
# Overview
1. Why would you learn another programming language again?
2. `$OTHERLANG` to Julia in 15 minutes
3. Running distributed Julia on ULHPC
4. Easy GPU programming with CUDA.jl
<div class=leader>
<i class="twa twa-bar-chart"></i>
<i class="twa twa-blue-book"></i>
<i class="twa twa-computer-disk"></i>
<i class="twa twa-chart-increasing"></i><br>
Packages for <br>doing useful things
# How do I do ... ?
- Structuring the data: `DelimitedFiles`, `CSV`, `DataFrames`
- Working with large data: `DistributedArrays`, `LabelledArrays`
- Stats: `Distributions`, `StatsBase`, `Statistics`
- Math: `ForwardDiff`, `Symbolics`
- Problem solving: `JuMP`, `DifferentialEquations`
- ML: `Flux`
- Bioinformatics: `BioSequences`, `GenomeGraphs`
- Plotting: `Makie`, `UnicodePlots`
- Writing notebooks: `Literate`
# Data frames
Package `DataFrames.jl` provides a work-alike of the data frames from
other environments (pandas, `data.frame`, tibbles, ...)
using DataFrames
mydata = DataFrame(id = [32,10,5], text = ["foo", "bar", "baz"])
mydata.text[mydata.id .>= 10]
Main difference from `Matrix`: *columns are labeled and their types may differ*; entries may also be missing
# DataFrames
Popular way of importing data:
using CSV
df = CSV.read("database.csv", DataFrame) # can also do a Matrix
CSV.write("backup.csv", df)
Popular among computer users:
using XLSX
x = XLSX.readxlsx("important_results.xls")
DataFrame(XLSX.gettable(x["Results sheet"])...)
<small>(Please do not export data to XLSX.)</small>
# Plotting
<img src="slides/img/unicodeplot.png" width="40%" />
<div class=leader>
<i class="twa twa-blueberries"></i>
<i class="twa twa-red-apple"></i>
<i class="twa twa-melon"></i>
<i class="twa twa-grapes"></i><br>
# Thank you!
<center><img src="slides/img/r3-training-logo.png" height="200px"></center>
Contact us if you need help:
<a href=""></a>
