Miroslav Kratochvil authored
# Parallel Julia

## Julia model of distributed computation

## Basic parallel processing
Using `Threads`:

- start Julia with parameter `-t N`
- parallelize (some) loops with `Threads.@threads`

```julia
a = zeros(100000)
Threads.@threads for i = eachindex(a)
    a[i] = hardfunction(i)
end
```
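A runnable version of the loop above — `hardfunction` is not defined on the slide, so the body below is a made-up stand-in for any CPU-heavy, independent per-element computation:

```julia
# `Threads` lives in Base, so no extra package is needed.
# Stand-in for the slide's `hardfunction` (the real workload is up to you):
hardfunction(i) = sum(sqrt(j) for j in 1:i)

a = zeros(1000)
Threads.@threads for i in eachindex(a)
    a[i] = hardfunction(i)   # iterations are split across the `-t N` threads
end
```

With `julia -t 4` the iterations are divided among 4 threads; with a single thread the loop still runs correctly, just serially.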
Using `Distributed`:

```julia
using Distributed
addprocs(N)
newVector = pmap(myFunction, myVector)
```

We will use the `Distributed` approach.
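A minimal end-to-end sketch of the `Distributed` approach; `myFunction` and `myVector` are the slide's placeholder names, filled in here with toy values:

```julia
using Distributed

addprocs(2)                        # spawn 2 local worker processes

# Code must be defined on all workers, hence @everywhere:
@everywhere myFunction(x) = x^2

myVector = [1, 2, 3, 4]
newVector = pmap(myFunction, myVector)   # [1, 4, 9, 16]

rmprocs(workers())                 # release the workers again
```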
## Managing your workers

```julia
using Distributed
addprocs(4)
myid()
workers()
```

Running commands on workers:

```julia
@spawnat 3 @info "Message from worker"
@spawnat :any myid()
```

Getting results from workers:

```julia
job = @spawnat :any begin sleep(10); return 123+321; end
fetch(job)
```

Cleaning up:

```julia
rmprocs(workers())
```
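The commands above, combined into one self-contained sketch (in a fresh session, `addprocs(2)` creates workers with IDs 2 and 3, since ID 1 is the master process):

```julia
using Distributed
addprocs(2)

job1 = @spawnat 2 myid()         # run on worker 2 specifically
job2 = @spawnat :any 123 + 321   # let the scheduler pick a worker

fetch(job1)    # 2
fetch(job2)    # 444

rmprocs(workers())   # after cleanup, only the master process remains
```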
## Processing lots of data items in parallel

```julia
datafiles = ["file$i.csv" for i=1:20]

@everywhere function process_file(name)
    println("Processing file $name")
    # ... do something ...
end

@sync for f in datafiles
    @async @spawnat :any process_file(f)
end
```
## Gathering results from workers

```julia
items = collect(1:1000)

@everywhere compute_item(i) = 123 + 321*i

pmap(compute_item, items)
```

The same with `@spawnat`:

```julia
futures = [@spawnat :any compute_item(item) for item in items]
fetch.(futures)
```
## How to design for parallelization?

Recommended way: utilize the high-level looping primitives!

- use `map`, parallelize by just switching to `pmap`
- use `reduce` or `mapreduce`, parallelize by just switching to `dmapreduce` (DistributedData.jl)
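A sketch of the `map` → `pmap` switch; `f` is a hypothetical per-item function. (`dmapreduce` from DistributedData.jl works analogously on pre-distributed data, so its call shape is not shown here.)

```julia
using Distributed
addprocs(2)
@everywhere f(x) = 2x + 1     # hypothetical per-item function

xs = collect(1:10)

serial   = map(f, xs)    # serial version
parallel = pmap(f, xs)   # identical call shape, work distributed to workers
serial == parallel       # true -- only the looping primitive changed

total = mapreduce(f, +, xs)   # the reduction variant parallelizes the same way
rmprocs(workers())
```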
## 💡 Parallel → distributed processing

It is very easy to organize multiple computers to work for you!

You need a working `ssh` connection:

```
user@pc1 $ ssh server1
Last login: Wed Jan 13 15:29:34 2021 from 2001:a18:....
user@server $ _
```

Spawning remote processes on remote machines:

```julia
julia> using Distributed
julia> addprocs([("server1", 10), ("pc2", 2)])
```

Benefit: No additional changes to the parallel programs!
# Utilizing ULHPC

## What does ULHPC look like?

hpc-docs.uni.lu/systems/iris
## Making a HPC-compatible Julia script

Main challenges:

- discover the available resources
- spawn worker processes at the right place

```julia
using ClusterManagers
addprocs_slurm(parse(Int, ENV["SLURM_NTASKS"]))
# ... continue as usual
```
## Scheduling the script

Normally, you write a "batch script" and add it to a queue using `sbatch`.

Script in `runAnalysis.sbatch`:

```sh
#!/bin/bash
#SBATCH -J MyAnalysisInJulia
#SBATCH -n 10
#SBATCH -c 1
#SBATCH -t 30
#SBATCH --mem-per-cpu 4G

julia runAnalysis.jl
```

(Note: the `#SBATCH` directives must have no space after `#`, otherwise Slurm ignores them as plain comments.)

You start the script using:

```sh
$ sbatch runAnalysis.sbatch
```
## Questions?

Let's do some hands-on problem solving (expected around 15 minutes).