julia training slides for 2022-06-08

Merged Miroslav Kratochvil requested to merge mk-juliatraining into develop
<div class=leader>
<i class="twa twa-rocket"></i>
<i class="twa twa-rocket"></i>
<i class="twa twa-rocket"></i><br>
Parallel Julia
</div>
# Julia model of distributed computation
<center>
<img src="slides/img/distrib.svg" width="50%">
</center>
# Basic parallel processing
**Using `Threads`:**
1. start Julia with parameter `-t N`
2. parallelize (some) loops with `Threads.@threads`
```julia
a = zeros(100000)
Threads.@threads for i in eachindex(a)
    a[i] = hardfunction(i)
end
```
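A self-contained variant of the loop above, as a minimal sketch. `hardfunction` here is a hypothetical stand-in for real work (the slide does not define it), and the code also runs, just serially, if Julia was started without `-t`:

```julia
# `hardfunction` is a hypothetical stand-in for an expensive computation
hardfunction(i) = sum(sqrt(k) for k in 1:i)

a = zeros(1000)
Threads.@threads for i in eachindex(a)
    # each iteration writes only to its own slot, so no locking is needed
    a[i] = hardfunction(i)
end

println("computed $(length(a)) items on $(Threads.nthreads()) thread(s)")
```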
**Using `Distributed`:**
```julia
using Distributed
addprocs(N)
newVector = pmap(myFunction, myVector)
```
We will use the `Distributed` approach.
# Managing your workers
```julia
using Distributed
addprocs(4)
myid()
workers()
```
Running commands on workers:
```julia
@spawnat 3 @info "Message from worker"
@spawnat :any myid()
```
Getting results from workers:
```julia
job = @spawnat :any begin sleep(10); return 123+321; end
fetch(job)
```
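`@spawnat` returns a `Future` immediately; only `fetch` blocks. A small sketch of that behaviour, assuming nothing beyond the standard `Distributed` library (the sleep is shortened to 1 second for convenience):

```julia
using Distributed
nprocs() == 1 && addprocs(2)   # make sure there is at least one worker

job = @spawnat :any begin
    sleep(1)
    123 + 321
end

# normally false here, because the worker is still sleeping
println("ready right after spawning? ", isready(job))

result = fetch(job)            # blocks until the value arrives
println("result: ", result)
```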
Cleaning up:
```julia
rmprocs(workers())
```
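The whole worker lifecycle can be sketched in one piece. The variable names are illustrative only; `gethostname` is used just to show where each worker runs:

```julia
using Distributed

pids = addprocs(2)                 # returns the ids of the newly started workers

for w in pids
    host = remotecall_fetch(gethostname, w)
    println("worker $w runs on $host")
end

# ask every new worker for its own id
whoami = [fetch(@spawnat w myid()) for w in pids]

rmprocs(pids)                      # remove only the workers started here
```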
# Processing lots of data items in parallel
```julia
datafiles = ["file$i.csv" for i=1:20]
@everywhere function process_file(name)
    println("Processing file $name")
    # ... do something ...
end
pmap(process_file, datafiles)
```
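With many input files, a single failure should usually not abort the whole run. One possible approach is the `on_error` keyword of `pmap`. In this sketch, `process_file` is a hypothetical stand-in that fails on one chosen file name:

```julia
using Distributed
nprocs() == 1 && addprocs(2)

# hypothetical stand-in for real file processing: fails on one chosen name
@everywhere function process_file(name)
    name == "file3.csv" && error("cannot read $name")
    return length(name)
end

datafiles = ["file$i.csv" for i = 1:5]

# `on_error` supplies a replacement result for a failed item instead of aborting `pmap`
results = pmap(process_file, datafiles; on_error = e -> missing)

failed = datafiles[ismissing.(results)]
println("failed: ", failed)
```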
<i class="twa twa-light-bulb"></i><i class="twa twa-light-bulb"></i> Doing it manually:
```julia
@sync for f in datafiles
    @async @spawnat :any process_file(f)
end
```
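A variant of the manual loop that also records which worker handled which file, as a sketch. Here `process_file` is a hypothetical stand-in that just returns the id of the worker that ran it, and each task waits for its own remote call with `fetch`:

```julia
using Distributed
nprocs() == 1 && addprocs(2)

# hypothetical stand-in: pretend to work, then report the worker id
@everywhere process_file(name) = (sleep(0.1); myid())

datafiles = ["file$i.csv" for i = 1:8]
handled_by = Dict{String,Int}()

@sync for f in datafiles
    # each task blocks on its own remote call; `@sync` waits for all tasks
    @async handled_by[f] = fetch(@spawnat :any process_file(f))
end

for f in datafiles
    println("$f was processed by worker ", handled_by[f])
end
```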
# Gathering results from workers
```julia
items = collect(1:1000)
@everywhere compute_item(i) = 123 + 321*i
pmap(compute_item, items)
```
<i class="twa twa-light-bulb"></i><i class="twa twa-light-bulb"></i><i class="twa twa-light-bulb"></i> Doing it manually with `@spawnat`:
```julia
futures = [@spawnat :any compute_item(item) for item in items]
fetch.(futures)
```
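A quick sanity check, as a sketch: the serial `map`, the `pmap`, and the manual futures should all produce the same results in the same order. The item count is reduced to 100 to keep the run short:

```julia
using Distributed
nprocs() == 1 && addprocs(2)

@everywhere compute_item(i) = 123 + 321 * i
items = collect(1:100)

serial   = map(compute_item, items)       # on the main process
parallel = pmap(compute_item, items)      # distributed, results kept in input order
futures  = [@spawnat :any compute_item(item) for item in items]
manual   = fetch.(futures)                # wait for each future in turn

println("all three agree: ", serial == parallel == manual)
```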
# How to design for parallelization?
**Recommended way:** *Utilize the high-level looping primitives!*
- use `map`, parallelize by just switching to `pmap`
- use `reduce` or `mapreduce`, parallelize by just switching to `dmapreduce` (DistributedData.jl)
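
The two switches can be sketched as follows. The slides do not show the call signature of `dmapreduce`, so the reduction below uses `@distributed` from the standard `Distributed` library as a stand-in. `square` is a hypothetical work function:

```julia
using Distributed
nprocs() == 1 && addprocs(2)

@everywhere square(x) = x^2        # hypothetical work function
xs = 1:1000

# map -> pmap (batching reduces the per-item communication overhead)
squares_serial   = map(square, xs)
squares_parallel = pmap(square, xs; batch_size = 100)

# mapreduce -> distributed reduction (stand-in for `dmapreduce`)
total_serial   = mapreduce(square, +, xs)
total_parallel = @distributed (+) for x in xs
    square(x)
end

println(total_serial == total_parallel)
```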
# <i class="twa twa-light-bulb"></i> Parallel → distributed processing
It is very easy to organize *multiple computers* to work for you!
You need a working `ssh` connection:
```sh
user@pc1 $ ssh server1
Last login: Wed Jan 13 15:29:34 2021 from 2001:a18:....
user@server1 $ _
```
Spawning worker processes on remote machines:
```julia
julia> using Distributed
julia> addprocs([("server1", 10), ("pc2", 2)])
```
**Benefit:** No additional changes to the parallel programs!
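
In practice, the remote machines may need a few extra hints. The sketch below uses the `exename`, `dir`, and `tunnel` keywords of `addprocs`; the values are assumptions about the remote setup, and `RUN_SSH_DEMO` is a hypothetical switch so the snippet can be loaded without access to the machines:

```julia
using Distributed

# host names and worker counts from the slide; they require passwordless `ssh`
machines = [("server1", 10), ("pc2", 2)]
expected_workers = sum(last, machines)

# RUN_SSH_DEMO is a hypothetical environment variable guarding the real connection
if get(ENV, "RUN_SSH_DEMO", "0") == "1"
    pids = addprocs(machines;
                    exename = "julia",  # assumption: `julia` is on the remote PATH
                    dir = "/tmp",       # assumption: a directory that exists on all machines
                    tunnel = true)      # route worker traffic through the ssh connection
    @assert length(pids) == expected_workers
end
```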
<div class=leader>
<i class="twa twa-abacus"></i>
<i class="twa twa-laptop"></i>
<i class="twa twa-desktop-computer"></i>
<i class="twa twa-flag-luxembourg"></i><br>
Utilizing ULHPC <i class="twa twa-light-bulb"></i>
</div>
# What does the cluster look like? (Iris)
<center>
<img src="slides/img/iris.png" width="30%">
<br>
<tt>hpc-docs.uni.lu/systems/iris</tt>
</center>
# Running Julia on the computing nodes
Start an allocation and connect to it:
```sh
0 [mkratochvil@access1 ~]$ srun -p interactive -t 30 --pty bash -i
```
(You can also use `si`.)
After a short wait, you should get a shell on a compute node. There you can load the Julia module and start Julia as usual:
```
0 [mkratochvil@iris-131 ~](2696005 1N/T/1CN)$ module add lang/Julia
0 [mkratochvil@iris-131 ~](2696005 1N/T/1CN)$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.2 (2021-07-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia>
```
# Making an HPC-compatible Julia script
Main challenges:
1. discover the available resources
2. spawn worker processes at the right place
```julia
using Distributed, ClusterManagers
addprocs_slurm(parse(Int, ENV["SLURM_NTASKS"]))
# ... continue as usual
```
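To try the same script outside the cluster, one option is to fall back to local workers when no Slurm allocation is present. This is a sketch under assumptions: the fallback of 2 workers is arbitrary, and inside an allocation the `addprocs_slurm` call from above would be used instead of the local `addprocs`:

```julia
using Distributed

# Slurm sets SLURM_NTASKS inside an allocation; the fallback "2" is an assumption for local runs
ntasks = parse(Int, get(ENV, "SLURM_NTASKS", "2"))
in_slurm = haskey(ENV, "SLURM_JOB_ID")

if !in_slurm
    # local stand-in; on the cluster, call `addprocs_slurm(ntasks)` here instead
    addprocs(ntasks)
end

# check that the workers ended up where expected
for w in workers()
    println("worker $w on ", remotecall_fetch(gethostname, w))
end
```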
# Scheduling an analysis script
Normally, you write a "batch script" and add it to a queue using `sbatch`.
Script in `runAnalysis.sbatch`:
```sh
#!/bin/bash
#SBATCH -J MyAnalysisInJulia
#SBATCH -n 10
#SBATCH -c 1
#SBATCH -t 30
#SBATCH --mem-per-cpu 4G
julia runAnalysis.jl
```
You start the script using:
```sh
$ sbatch runAnalysis.sbatch
```
<div class=leader>
<i class="twa twa-blueberries"></i>
<i class="twa twa-red-apple"></i>
<i class="twa twa-melon"></i>
<i class="twa twa-grapes"></i><br>
Questions?
</div>
Let's do some hands-on problem solving (expected to take around 15 minutes).