Verified commit 73953545, authored by Laurent Heirendt: "add some content" (parent 9e5c90e4).
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">Session I - practical</span>
</div>
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">Session I</span>
</div>
# Motivation first!
*Why is it good to work in a compiled language?*
- Programs become much faster for free.
- Even if you use the language only as package glue, at least the glue is not slow.
*What do we gain by having types in the language?*
- Generic programming, and lots of optimization possibilities for the compiler.
*Is the Julia ecosystem ready for my needs? <i class="twa twa-thinking-face"></i>*
- Likely. If not, extending the packages is super easy.
- Base includes most of the functionality of MATLAB, R, and Python with NumPy,
  plus many useful bits of C++.
# Why Julia?
<center><img src="slides/img/whyjulia.png" width="80%"></center>
(Source: JuliaCon 2016, Arch D. Robison)
# Always remember
- you can `Tab`-complete through almost anything in the REPL
- functions have useful help with examples; try `?cat`
- `typeof(something)` may give good info
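For instance, `typeof` quickly reveals how values are stored (a minimal check; the concrete integer type is `Int64` on 64-bit machines):

```julia
# Inspecting types interactively or in scripts
println(typeof(1.0))          # Float64
println(typeof([1, 2, 5]))    # Vector{Int64} on 64-bit machines
println(typeof((1, "a")))     # Tuple{Int64, String}
```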
# Everything has a type that determines storage and value handling
- `Vector{Int}`
```julia
[1, 2, 5, 10]
```
- `Matrix{Float64}`
```julia
[1.0 2.0; 2.0 1.0]
```
- `Tuple`
```julia
(1, 2.0, "SomeLabel")
```
- `Set{Int}`
- `Dict{Int,String}`
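The latter two have no literal syntax; a minimal sketch of constructing them:

```julia
# Constructors infer the element/key/value types from the arguments
Set([1, 2, 5])                  # Set{Int64}
Dict(1 => "one", 2 => "two")    # Dict{Int64, String}
```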
# Basic functionality and expectable stuff
Most concepts from C, Python and MATLAB are portable as they are.
Surprising parts:
- arrays are indexed from `1` (for a relatively good reason)
- arrays: `array[1]`, `array[2:5]`, `array[begin+1:end-1]`, `size`, `length`, `cat`, `vcat`, `hcat`, ...
- code blocks open with a keyword (or `begin`) and close with `end`
- you can stuff everything on one line!
- all functions can (and should) be overloaded
  - simply add a type annotation to a parameter with `::` to distinguish between implementations for different types
  - overloading is cheap
  - *specialization to known simple types* is precisely the reason why compiled code can be *fast*
  - adding type annotations to code and parameters helps the compiler to do the right thing
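As a sketch of such overloading (the `describe` function here is hypothetical, not from the slides):

```julia
# Two implementations distinguished purely by the :: annotations;
# dispatch picks one based on the argument's type.
describe(x::Integer) = "an integer: $x"
describe(x::String)  = "a string: $x"

describe(42)    # "an integer: 42"
describe("hi")  # "a string: hi"
```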
# <i class="twa twa-light-bulb"></i> Structured cycles
Using functional-style loops is *much less prone to indexing errors*.
- Transform an array, original:
```julia
for i = eachindex(arr)
    arr[i] = sqrt(arr[i])
end
```
Structured:
```julia
map(sqrt, [1,2,3,4,5])
map((x,y) -> (x^2 - exp(y)), [1,2,3], [-1,0,1])
```
- Summarize an array:
```julia
reduce(+, [1,2,3,4,5])
reduce((a,b) -> "$b $a", ["Use", "the Force", "Luke"])
reduce(*, [1 2 3; 4 5 6], dims=1)
```
**Tricky question (<i class="twa twa-light-bulb"></i><i class="twa twa-light-bulb"></i><i class="twa twa-light-bulb"></i>):** What is the overhead of the "nice" loops?
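For intuition, the combinators above evaluate like this (using `foldl` for the string case, since it guarantees left-to-right association, whereas `reduce`'s association order is unspecified):

```julia
# map applies the function element-wise; foldl folds pairwise from the left
map(sqrt, [1.0, 4.0, 9.0])                          # [1.0, 2.0, 3.0]
reduce(+, [1, 2, 3, 4, 5])                          # 15
foldl((a, b) -> "$b $a", ["Use", "the Force", "Luke"])  # "Luke the Force Use"
```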
# Array-creating loops and generators
```julia
julia> [i*10 + j for i = 1:3, j = 1:5]
3×5 Matrix{Int64}:
 11  12  13  14  15
 21  22  23  24  25
 31  32  33  34  35

julia> join(sort([c for word in ["the result is 123", "what's happening?", "stuff"]
                    for c in word
                    if isletter(c)]))
"aaeeeffghhhiilnnpprssssttttuuw"

julia> Dict('a'+i => i for i=1:26)
Dict{Char, Int64} with 26 entries:
  'n' => 13
  'f' => 5
  ...
```
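Generators also work without materializing an array, e.g. fed straight into a reducer (a small check):

```julia
# Generator expression: no intermediate array is allocated
sum(i^2 for i = 1:10)             # 385
Dict(c => Int(c) for c in "abc")  # maps each character to its code point
```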
# Control flow: subroutines (functions)
- Multi-line function definition
```julia
function combine(a, b)
    return a + b
end
```
- "Mathematical" neater definition
```julia
combine(a, b) = a + b
```
- <i class="twa twa-light-bulb"></i> Definition with types specified (prevents errors, allows optimizations!)
```julia
function combine(a::Int, b::Int)::Int
    return a + b
end

function combine(a::Vector, b::Vector)::Vector
    return a .+ b
end

combine(a::String, b::String)::String = "$a and $b"
```
# Broadcasting over iterable things (aka The Magic Dot)
- Broadcasting operators by prepending a dot
```julia
matrix[row, :] .+= vector1 .* vector2
```
- Broadcasting a function
```julia
sqrt.(1:10)
maximum.(eachcol(rand(100,100)))
x = [1,2,3,4]
x' .* x
```
- Making collections of pairs in bulk (e.g. a lookup table)
```julia
myarray_index = Dict(myarray .=> eachindex(myarray))
```
<i class="twa twa-light-bulb"></i> The "magic dot" is a shortcut for calling `broadcast(...)`.
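To illustrate both the shortcut and the outer product above, these identities hold:

```julia
# The dot syntax lowers to a call of broadcast(...)
@assert sqrt.(1:4) == broadcast(sqrt, 1:4)

# Broadcasting a 1×3 row (x') against a 3-element column (x)
# expands both to 3×3, i.e. an outer product
x = [1, 2, 3]
@assert x' .* x == [1 2 3; 2 4 6; 3 6 9]
```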
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">Session II - practical</span>
</div>
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">Session II</span>
</div>
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">Session III - practical</span>
</div>
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">Session III</span>
</div>
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">Session IV - practical</span>
</div>
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">Session IV</span>
</div>
# Note about CUDA
Julia can serve as an extremely user-friendly front-end for CUDA, abstracting away the ugly steps you'd otherwise need with plain CUDA, while still leaving enough flexibility to write high-performance low-level compute kernels.
The approach here demonstrates what `CUDA.jl` does.
There's also:
- `AMDGPU.jl`
- `Metal.jl` for <i class="twa twa-green-apple"></i>
- `Vulkan.jl` (less user-friendly, but works everywhere)
# Using your GPU for accelerating simple stuff
```julia
julia> data = randn(10000,10000);
julia> @time data*data;
julia> using CUDA
julia> data = cu(data);
julia> @time data*data;
```
# What's available?
The "high-level" API spans most of the CU* helper tools:
- broadcasting numerical operations via translation to simple kernels (`.+`, `.*`, `.+=`, `ifelse.`, `sin.`, ...)
- matrix and vector operations using `CUBLAS`
- `CUSOLVER` (solvers, decompositions etc.) via `LinearAlgebra.jl`
- ML ops (in `Flux.jl`): `CUTENSOR`
- `CUFFT`
- `CUSPARSE` via `SparseArrays.jl`
- limited support for reducing operations (`findall`, `findfirst`, `findmin`, ...) -- these do not translate easily to GPU code
- very limited support for array index processing
(See: https://github.com/NVIDIA/CUDALibrarySamples)
# Programming kernels in Julia!
CUDA kernels (`__device__` functions) are generated transparently directly from Julia code.
```julia
a = cu(someArray)

function myKernel(a)
    i = threadIdx().x
    a[i] += 1
    return
end

@cuda threads=length(a) myKernel(a)
```
Some Julia constructs are not feasible on the GPU (mainly allocation of complex structures); these trigger a compiler message from `@cuda`.
# Programming kernels -- usual tricks
The number of threads and blocks is limited by hardware; let's make a
grid-stride loop to process a lot of data quickly!
```julia
a = cu(someArray)
b = cu(otherArray)

function applySomeMath(a, b)
    index = threadIdx().x + blockDim().x * (blockIdx().x - 1)
    gridStride = gridDim().x * blockDim().x
    for i = index:gridStride:length(a)
        a[i] += someMathFunction(b[i])
    end
    return
end

@cuda threads=1024 blocks=32 applySomeMath(a, b)
```
Typical CUDA trade-offs:
- too many blocks won't work, insufficient blocks won't cover your SMs
- too many threads per block will fail or spill to memory (slow), insufficient threads won't allow parallelization/latency hiding in SM
- thread divergence destroys performance
# CUDA.jl interface
Functions available in the kernel:
- `gridDim`, `blockDim`
- `blockIdx`, `threadIdx`
- `warpsize`, `laneid`, `active_mask`
- `sync_threads`, `sync_warp`, `threadfence`, ...
- `vote_all`, `vote_ballot`, `shfl_sync`, ...
Parameters for the `@cuda` spawn:
- `threads=nnn` per block
- `blocks=nnn` per grid
- `shmem=nnn` how much shared memory to request (available via `CuStaticSharedArray`)
[
  { "filename": "index.md" },
  { "filename": "overview.md" },
  { "filename": "1-session.md" },
  { "filename": "1-practical.md" },
  { "filename": "2-session.md" },
  { "filename": "2-practical.md" },
  { "filename": "3-session.md" },
  { "filename": "3-practical.md" },
  { "filename": "4-session.md" },
  { "filename": "4-practical.md" },
  { "filename": "thanks.md" }
]
# Overview
0. Subject 1
1. Subject 2
# Thank you.
<div class=leader>
<i class="twa twa-blueberries"></i>
<i class="twa twa-red-apple"></i>
<i class="twa twa-melon"></i>
<i class="twa twa-grapes"></i><br>
Questions?
</div>
# Thank you!
<center><img src="slides/img/r3-training-logo.png" height="200px"></center>