Parallel Julia
# Julia model of distributed computation
<img src="slides/img/distrib.svg" width="50%">
# Basic parallel processing
**Using `Threads`:**
1. start Julia with parameter `-t N`
2. parallelize (some) loops with `Threads.@threads`
```julia
a = zeros(100000)
Threads.@threads for i = eachindex(a)
    a[i] = hardfunction(i)
end
```
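The threaded loop above can be tried out with a stand-in workload. This is a minimal sketch: `hardfunction` here is a hypothetical placeholder for whatever expensive computation you actually have, and the array is kept small so it finishes quickly.

```julia
using Base.Threads

# Hypothetical stand-in for an expensive computation.
hardfunction(i) = sum(sqrt(k) for k in 1:i)

a = zeros(1000)
@threads for i in eachindex(a)
    # Each iteration writes only to its own slot, so no locking is needed.
    a[i] = hardfunction(i)
end

println("computed $(length(a)) items on $(nthreads()) thread(s)")
```

If Julia was started without `-t N`, the loop still runs correctly, just on a single thread.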
**Using `Distributed`:**
```julia
using Distributed
newVector = pmap(myFunction, myVector)
```
We will use the `Distributed` approach.
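For `pmap` to do anything in parallel, worker processes must exist and the mapped function must be defined on them. A minimal sketch, assuming two local workers and a hypothetical `myFunction` that squares its input:

```julia
using Distributed

addprocs(2)                              # start two local worker processes

# Hypothetical example function; @everywhere defines it on all processes.
@everywhere myFunction(x) = x^2

myVector = collect(1:10)
newVector = pmap(myFunction, myVector)   # same result as map(myFunction, myVector)

rmprocs(workers())                       # shut the workers down again
```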
# Managing your workers
```julia
using Distributed
```
Running commands on workers:
```julia
@spawnat 3 @info "Message from worker"
@spawnat :any myid()
```
Getting results from workers:
```julia
job = @spawnat :any begin sleep(10); return 123+321; end
fetch(job)  # blocks until the job finishes, then returns its value
```
Cleaning up: remove workers you no longer need with `rmprocs(workers())`.
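The steps on this slide, from starting workers to cleaning up, can be sketched end to end. This assumes two local workers; the variable names are illustrative, and the sleep is shortened to one second.

```julia
using Distributed

pids = addprocs(2)                 # returns the IDs of the new workers
w = first(pids)

id_future = @spawnat w myid()      # run on one specific worker
worker_id = fetch(id_future)       # the worker reports its own ID

job = @spawnat :any begin          # let Julia pick a worker
    sleep(1)
    123 + 321
end
result = fetch(job)                # blocks until the job finishes

rmprocs(pids)                      # cleaning up
remaining = nworkers()             # only the main process is left
```

`@spawnat` returns a `Future` immediately, so the main process stays free to do other work until `fetch` is called.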
# Processing lots of data items in parallel
```julia
datafiles = ["file$i.csv" for i=1:20]
@everywhere function process_file(name)
    println("Processing file $name")
    # ... do something ...
end
pmap(process_file, datafiles)
```
<i class="twa twa-light-bulb"></i><i class="twa twa-light-bulb"></i> Doing it manually:
```julia
@sync for f in datafiles
    @async @spawnat :any process_file(f)
end
```
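A runnable variant of the manual approach, as a sketch: here the hypothetical `process_file` does no real work and only reports which worker handled which file, so that the distribution of jobs becomes visible.

```julia
using Distributed
addprocs(2)

# Placeholder for real processing: returns the file name and the worker ID.
@everywhere process_file(name) = (name, myid())

datafiles = ["file$i.csv" for i = 1:6]
results = Vector{Any}(undef, length(datafiles))

@sync for (k, f) in enumerate(datafiles)
    # Each task starts one remote job and waits for its result;
    # @sync returns once all of the tasks have finished.
    @async results[k] = fetch(@spawnat :any process_file(f))
end

for (name, pid) in results
    println("$name was processed by worker $pid")
end

rmprocs(workers())
```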
# Gathering results from workers
```julia
items = collect(1:1000)
@everywhere compute_item(i) = 123 + 321*i
pmap(compute_item, items)
```
<i class="twa twa-light-bulb"></i><i class="twa twa-light-bulb"></i><i class="twa twa-light-bulb"></i> Doing it manually with `@spawnat`:
```julia
futures = [@spawnat :any compute_item(item) for item in items]
fetch.(futures)  # wait for every job and collect the results in order
```
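To check that the manual approach and `pmap` gather the same results, the two can be run side by side. A minimal sketch, assuming two local workers and a shorter item list than on the slide:

```julia
using Distributed
addprocs(2)

@everywhere compute_item(i) = 123 + 321*i

items = collect(1:100)

# Manual: start all jobs (returns immediately), then wait for each result.
futures = [@spawnat :any compute_item(item) for item in items]
manual = fetch.(futures)

# Automatic: pmap schedules the jobs and gathers the results for us.
automatic = pmap(compute_item, items)

rmprocs(workers())
```

Both vectors keep the order of `items`, regardless of which worker finished first.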
# How to design for parallelization?
**Recommended way:** *Utilize the high-level looping primitives!*
- use `map`, parallelize by just switching to `pmap`
- use `reduce` or `mapreduce`, parallelize by just switching to `dmapreduce` (DistributedData.jl)
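The switch from serial to parallel primitives can be sketched using only the standard library. Note the substitution: instead of `dmapreduce` from DistributedData.jl, this sketch uses `@distributed (+)` from `Distributed` as a stand-in for a distributed reduction; `score` and `parallel_total` are hypothetical names.

```julia
using Distributed
addprocs(2)

@everywhere score(x) = x^2 + 1     # hypothetical per-item work

xs = 1:1000

serial_map   = map(score, xs)
parallel_map = pmap(score, xs; batch_size = 100)   # same call shape, now on workers

serial_sum = mapreduce(score, +, xs)

# Each worker reduces its share of the range; the partial sums are combined
# on the main process.
parallel_total(data) = @distributed (+) for x in data
    score(x)
end
parallel_sum = parallel_total(xs)

rmprocs(workers())
```

Code written against `map` and `mapreduce` from the start needs only these small edits to run in parallel, which is the point of the recommendation.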
# <i class="twa twa-light-bulb"></i> Parallel → distributed processing
It is very easy to organize *multiple computers* to work for you!
You need a working `ssh` connection:
```sh
user@pc1 $ ssh server1
Last login: Wed Jan 13 15:29:34 2021 from 2001:a18:....
user@server $ _
```
Spawning remote processes on remote machines:
```julia
julia> using Distributed
julia> addprocs([("server1", 10), ("pc2", 2)])
```
**Benefit:** No additional changes to the parallel programs!
Utilizing ULHPC <i class="twa twa-light-bulb"></i>
# What does the cluster look like? (Iris)
<img src="slides/img/iris.png" width="30%">
# Running Julia on the computing nodes
Start an allocation and connect to it:
```sh
0 [mkratochvil@access1 ~]$ srun -p interactive -t 30 --pty bash -i
```
(You can also use `si`.)
After a short wait, you should get a shell on a compute node. There you can load the Julia module and start Julia as usual:
```
0 [mkratochvil@iris-131 ~](2696005 1N/T/1CN)$ module add lang/Julia
0 [mkratochvil@iris-131 ~](2696005 1N/T/1CN)$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.2 (2021-07-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
```
# Making a HPC-compatible Julia script
Main challenges:
1. discover the available resources
2. spawn worker processes at the right place
```julia
using ClusterManagers
addprocs_slurm(parse(Int, ENV["SLURM_NTASKS"]))
# ... continue as usual
```
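One way to keep a single script usable both on a laptop and inside a Slurm allocation is to branch on whether `SLURM_NTASKS` is set. The following is a sketch under assumptions: `requested_workers` is a hypothetical helper (not part of ClusterManagers.jl), the fallback of 2 local workers is arbitrary, and the Slurm branch assumes ClusterManagers.jl is installed.

```julia
using Distributed

# Hypothetical helper: the number of tasks Slurm granted, or a small
# fallback when the script runs outside of an allocation.
function requested_workers(env = ENV; fallback = 2)
    return haskey(env, "SLURM_NTASKS") ? parse(Int, env["SLURM_NTASKS"]) : fallback
end

n = requested_workers()
if haskey(ENV, "SLURM_NTASKS")
    @eval using ClusterManagers    # only needed inside a Slurm allocation
    @eval addprocs_slurm($n)       # workers are started through Slurm
else
    addprocs(n)                    # plain local workers
end

# ... continue as usual
@everywhere double(i) = 2i
results = pmap(double, 1:10)

rmprocs(workers())
```

The rest of the script does not need to know where the workers came from, which matches the "no additional changes" benefit noted earlier.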
# Scheduling an analysis script
Normally, you write a "batch script" and add it to a queue using `sbatch`.
Script in `runAnalysis.sbatch`:
```sh
#!/bin/bash
#SBATCH -J MyAnalysisInJulia
#SBATCH -n 10
#SBATCH -c 1
#SBATCH -t 30
#SBATCH --mem-per-cpu 4G
julia runAnalysis.jl
```
You start the script using:
```sh
$ sbatch runAnalysis.sbatch
```
Let's do some hands-on problem solving (expected around 15 minutes)