Commit ab75a4da authored by Miroslav Kratochvil

Merge branch 'develop' into 'master'

Develop

See merge request R3/school/courses!171
parents 6f044354 127ab9ca
Showing 255 additions and 0 deletions
<div class=leader>
<i class="twa twa-rocket"></i>
<i class="twa twa-rocket"></i>
<i class="twa twa-rocket"></i>
<i class="twa twa-rocket"></i>
<br>
Parallel Julia
</div>
# Usual ways to gain performance:
1. Do not work on data that is too far away *(cache)*
2. Do not waste energy on organizing trivial stuff *(SIMD/SIMT)*
3. Do not waste time waiting for data *(HT/GPU)*
4. Organize the computation so that more computers don't trip over each other on the task *(SMP)*
5. Move the computers closer to data *(distributed computing)*
Let's spend a moment explaining these technologies...
# Data distance and physical limits
<div class=leader>
How far does light go in 1 cycle of a 3GHz CPU?
</div>
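A back-of-the-envelope answer, computed in Julia. This assumes the vacuum speed of light; electrical signals in wires and on chips travel slower than that, so the practical distance is even shorter.

```julia
c = 299_792_458        # speed of light in vacuum, m/s
f = 3e9                # CPU clock frequency, Hz
distance = c / f       # metres travelled during one clock cycle
println(round(100 * distance, digits = 1), " cm")   # prints "10.0 cm"
```

A memory request also has to travel there and back, so data stored more than about 5 cm away cannot arrive within a single cycle, even in principle.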
# CPUs vs GPUs (SIMD vs SIMT)
<center>
<img src="slides/img/cpu.png" width="40%" />
<img src="slides/img/gpu.png" width="40%" />
</center>
# SIMD problem: Your Data Looks Like This™
<center>
<img src="slides/img/maze.jpg" width="75%" />
</center>
# Parallel programming
<center>
<img src="slides/img/threads.jpeg" width="50%" /><br>
(notice the false sharing at threaddog 5)
</center>
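A minimal sketch of avoiding false sharing in a threaded reduction, assuming Julia 1.3+ (`Threads.@spawn`). The function name `threaded_sum` is hypothetical. Each task keeps its running total in a local variable and writes a result only once, instead of repeatedly updating a shared `partial[threadid()]` array whose neighbouring slots sit on the same cache line.

```julia
using Base.Threads   # provides @spawn and nthreads()

# Sum f(x) over a non-empty collection xs, one chunk of indices per thread.
function threaded_sum(f, xs)
    chunk_len = max(1, cld(length(xs), nthreads()))
    chunks = Iterators.partition(eachindex(xs), chunk_len)
    tasks = map(chunks) do idx
        @spawn begin
            # Task-local accumulator: lives in a register or on this task's
            # stack, so no two cores keep writing to the same cache line.
            acc = zero(f(xs[first(idx)]))
            for i in idx
                acc += f(xs[i])
            end
            acc          # the only value this task hands back
        end
    end
    return sum(fetch.(tasks))   # combine the per-task results
end
```

For example, `threaded_sum(x -> x^2, collect(1:1000))` gives the same result as `sum(x -> x^2, 1:1000)`, whatever the number of threads.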
Distributed computing can give you:
- more memory
- more total memory bandwidth (!)
- more synchronization problems
# Julia tools
Let's implement a matrix multiplication manually, and try:
- checking if SIMD instructions are used
- reordering the loops
- using smaller floats
- tiling
- `@threads`
<center><img src="slides/img/tiling.jpeg" width="33%" /></center>
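A possible starting point for the exercise, as a sketch rather than the course's reference solution (all function names are made up for illustration). The versions differ only in loop order, SIMD hints, threading, and tiling, and all of them should agree with the built-in `A * B`.

```julia
using Base.Threads: @threads

# Allocate the result; promote_type keeps Float32 inputs in Float32
# (smaller floats halve the memory traffic and fit twice as many values per SIMD register).
function alloc_result(A, B)
    size(A, 2) == size(B, 1) || throw(DimensionMismatch("inner dimensions differ"))
    return zeros(promote_type(eltype(A), eltype(B)), size(A, 1), size(B, 2))
end

# Naive i-j-k order: the innermost loop walks A along a row, which in Julia's
# column-major layout jumps size(A, 1) elements per step (poor cache use).
function matmul_naive(A, B)
    C = alloc_result(A, B)
    for i in axes(C, 1), j in axes(C, 2), k in axes(A, 2)
        C[i, j] += A[i, k] * B[k, j]
    end
    return C
end

# Reordered loops: the innermost loop runs over the first (contiguous) index of
# both C and A, so memory is read sequentially; @inbounds + @simd let the
# compiler vectorize it.
function matmul_reordered(A, B)
    C = alloc_result(A, B)
    @inbounds for j in axes(C, 2), k in axes(A, 2)
        b = B[k, j]
        @simd for i in axes(C, 1)
            C[i, j] += A[i, k] * b
        end
    end
    return C
end

# Threaded: each thread fills a disjoint set of columns of C.
function matmul_threaded(A, B)
    C = alloc_result(A, B)
    @threads for j in axes(C, 2)
        @inbounds for k in axes(A, 2)
            b = B[k, j]
            @simd for i in axes(C, 1)
                C[i, j] += A[i, k] * b
            end
        end
    end
    return C
end

# Tiled: work on tile-by-tile blocks so that the pieces of A, B and C in use
# stay in cache. Assumes ordinary 1-based matrices.
function matmul_tiled(A, B; tile = 64)
    C = alloc_result(A, B)
    n, p, m = size(A, 1), size(A, 2), size(B, 2)
    for jj in 1:tile:m, kk in 1:tile:p, ii in 1:tile:n
        @inbounds for j in jj:min(jj + tile - 1, m), k in kk:min(kk + tile - 1, p)
            b = B[k, j]
            @simd for i in ii:min(ii + tile - 1, n)
                C[i, j] += A[i, k] * b
            end
        end
    end
    return C
end
```

To check whether SIMD instructions are used, run `@code_llvm debuginfo=:none matmul_reordered(A, B)` in the REPL and look for vector types such as `<4 x double>` or `<8 x float>`. Time the variants with `@time` (or BenchmarkTools.jl) on matrices of a few hundred rows, and start Julia with `-t auto` to give `@threads` more than one thread.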
# Distributed computing with Julia
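```julia
using Distributed
addprocs(10)                                     # start 10 local worker processes
# myfunction must also be defined on the workers (e.g. with @everywhere)
pmap(myfunction, WorkerPool(workers()), mydata)  # the worker pool is a positional argument
```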
(Spoiler: you can add processes on remote machines using SSH.)
# Distributed computing with Julia (on the HPC)
```julia
using Distributed, ClusterManagers
addprocs_slurm(parse(Int, ENV["SLURM_NTASKS"]))  # one worker per allocated Slurm task
pmap(myfunction, WorkerPool(workers()), mydata)
```
You typically want each worker to load its own part of the data locally, rather than having the main process send all of it over the network.
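A small local sketch of that pattern. It assumes the data is split into files that every worker can read, for example on a shared cluster filesystem. The file layout and the name `process_chunk` are hypothetical: only file paths go to the workers, and only small results come back.

```julia
using Distributed
addprocs(2)   # a couple of local workers; on the HPC use addprocs_slurm instead

# Hypothetical input: four small text files in a temporary directory.
dir = mktempdir()
paths = [joinpath(dir, "chunk$i.txt") for i in 1:4]
for (i, p) in enumerate(paths)
    write(p, join(10i .+ (1:10), "\n"))   # chunk i holds 10i+1, ..., 10i+10
end

# Each worker opens and parses its own file; the main process never
# touches the numbers themselves.
@everywhere function process_chunk(path)
    numbers = parse.(Int, readlines(path))
    return sum(numbers)
end

results = pmap(process_chunk, WorkerPool(workers()), paths)
```

`pmap` keeps the order of its input, so `results[i]` belongs to `paths[i]`.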
For more complex schemes:
- `Dagger.jl` provides complex synchronization/task dependency schemes
- `DistributedData.jl` provides primitives for manipulating the data precisely
# How to use a GPU?
```julia
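using CUDA
A = cu(randn(1000,1000));   # copy to GPU memory (cu also converts Float64 to Float32)
B = cu(randn(1000,1000));
A = A * B;                  # the multiplication runs on the GPU (cuBLAS)
```
...transparently uses CUDA and its libraries (cuBLAS, cuSPARSE, cuDNN and many others) to run the computation on the GPU.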
# How to actually program a GPU?
CUDA.jl can compile Julia code into CUDA kernel code.
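```julia
using CUDA

function fill_with_indexes!(arr)
    # global index of this thread, and the total number of threads in the grid
    index = threadIdx().x + blockDim().x * (blockIdx().x - 1)
    stride = gridDim().x * blockDim().x
    # grid-stride loop: each thread handles every `stride`-th element
    for i = index:stride:length(arr)
        arr[i] = i
    end
    return
end

A = cu(zeros(9999999))
@cuda threads=1024 blocks=16 fill_with_indexes!(A)
```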
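To see why the grid-stride loop covers the whole array, here is a plain-Julia emulation of it (a sketch that runs on the CPU and is not CUDA.jl code; `grid_stride_coverage` is a made-up name). It enumerates every (block, thread) pair that `@cuda threads=1024 blocks=16` would launch and counts how often each element gets written.

```julia
# Count how many times each element of an n-element array is visited by the
# grid-stride loop, for the given launch configuration.
function grid_stride_coverage(n; threads = 1024, blocks = 16)
    visits = zeros(Int, n)
    for block in 1:blocks, thread in 1:threads   # 1-based, like blockIdx().x and threadIdx().x
        index = thread + threads * (block - 1)   # same formula as in the kernel
        stride = blocks * threads                # gridDim().x * blockDim().x
        for i in index:stride:n
            visits[i] += 1
        end
    end
    return visits
end

all(==(1), grid_stride_coverage(9_999_999))   # true: every element is written exactly once
```

The starting indexes are exactly `1:16384`, one per launched thread. Every array index therefore belongs to exactly one thread's `index:stride:n` sequence, no matter how large the array is.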
# Homeworks
## Homework 2 update
Feel free to apply whatever we did today for bonus points.
- think about cache efficiency
- remember that most forces are repulsive forces
- `@threads` may help for larger graphs
## Homework 3
- We will learn to handle ugly and hairy data.
- The last lecture will go over the methodology, but you can start earlier.
# Homework 3
We will simulate a cookie distribution network:
- there's one central *cookie factory*
- cookie *transports* move cookies in loads of N cookies
  - they require 1 cookie to sustain themselves for each batch of cookies transported
  - N is different for each transport
- cookie *distribution points* divide incoming cookies among the next paths in the network
  - each has an exact distribution ratio among the paths
  - no cookies are consumed
- cookie *munchers* are at the end of the transport chain
  - each of them munches N cookies per day (again different for each muncher)
# Homework 3 (Data)
```json
{ "type": "distribution point",
  "serves": [
    { "type": "muncher", "consumption": 3 },
    { "type": "transport",
      "capacity": 5,
      "serves": {
        "type": "distribution point",
        "serves": [
          { "type": "muncher", "consumption": 7 },
          { "type": "muncher", "consumption": 2 },
          { "type": "muncher", "consumption": 1 }
        ],
        "ratios": [1, 1, 1]
      }
    }
  ],
  "ratios": [1, 5]
}
```
Pretty-printed (one possibility):
```
1 -> munch 3
5 -> transport 5 -> 1 -> munch 7
                    1 -> munch 2
                    1 -> munch 1
```
# Homework 3 (Assignment)
Tasks:
- read the cookie network from a JSON file (we'll provide example data, use `JSON.jl`)
- make a nice data structure to hold this problem, make sure the input is valid
- make functions that:
  - find the length of the *longest chain* (by transport "steps") from the factory to the muncher
  - find out how many cookies the factory needs to produce daily so that *all munchers are fed*
  - construct a network where all *transports are split in half*, each half with half cookie consumption
  - find out *how many cookies are wasted* by being routed to munchers who can't eat them
  - construct a network where the *distribution points are balanced* so that no cookies get wasted
  - *BONUS: print the network nicely*
- for simplicity, data structures and functions may be recursive
- performance optimization _is not_ a goal
- nice short code _is_ a goal
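One possible shape for the recursive data structure, as a sketch only. All type and function names below are hypothetical, and the input is assumed to be the `Dict` that a JSON parser such as `JSON.parse` from JSON.jl returns for the example above. It validates the input while converting it and shows one small recursive function. It deliberately does not solve the tasks listed above.

```julia
abstract type Node end

struct Muncher <: Node
    consumption::Int              # cookies eaten per day
end

struct Transport <: Node
    capacity::Int                 # cookies per load (N)
    serves::Node                  # the single node this transport delivers to
    function Transport(capacity, serves)
        capacity > 0 || throw(ArgumentError("transport capacity must be positive"))
        new(capacity, serves)
    end
end

struct DistributionPoint <: Node
    serves::Vector{Node}
    ratios::Vector{Int}
    function DistributionPoint(serves, ratios)
        length(serves) == length(ratios) ||
            throw(ArgumentError("need exactly one ratio per served node"))
        all(>=(0), ratios) && sum(ratios) > 0 ||
            throw(ArgumentError("ratios must be non-negative and not all zero"))
        new(serves, ratios)
    end
end

# Convert a parsed JSON object into the types above, rejecting malformed input.
function build(d::AbstractDict)
    t = d["type"]
    t == "muncher" && return Muncher(d["consumption"])
    t == "transport" && return Transport(d["capacity"], build(d["serves"]))
    t == "distribution point" &&
        return DistributionPoint(Node[build(s) for s in d["serves"]], Int.(d["ratios"]))
    throw(ArgumentError("unknown node type: $t"))
end
build(x) = throw(ArgumentError("expected a JSON object, got $(typeof(x))"))

# Example of a recursive function over the network: count the munchers.
count_munchers(::Muncher) = 1
count_munchers(t::Transport) = count_munchers(t.serves)
count_munchers(d::DistributionPoint) = sum(count_munchers, d.serves)
```

With JSON.jl installed, something like `build(JSON.parsefile("network.json"))` would load the example network, and `count_munchers` on it should give 4.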
# Programming with Julia
## March 2023
<div style="top: 6em; left: 0%; position: absolute;">
<img src="theme/img/lcsb_bg.png">
</div>
<div style="top: 1em; left: 60%; position: absolute;">
<img src="slides/img/julia.svg" height="200px">
<h1 style="margin-top:3ex; margin-bottom:3ex;">5: Performance and parallelism</h1>
<h4>
Miroslav Kratochvíl<br>
Laurent Heirendt<br>
LCSB, DSSE<br>
</h4>
</div>
<link rel="stylesheet" href="https://lcsb-biocore.github.io/icons-mirror/twemoji-amazing.css">
<style>
code {border: 2pt dotted #f80; padding: .4ex; border-radius: .7ex; color:#444; }
.reveal pre code {border: 0; font-size: 18pt; line-height:27pt;}
em {color: #e02;}
li {margin-bottom: 1ex;}
div.leader {font-size:400%; line-height:120%; font-weight:bold; margin: 1em;}
section {padding-bottom: 10em;}
</style>
[
{ "filename": "index.md" },
{ "filename": "5a-para.md" }
]