Verified commit 73953545, authored by Laurent Heirendt: "add some content" (parent 9e5c90e4).
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">Session I - practical</span>
</div>
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">Session I</span>
</div>
# Motivation first!
*Why is it good to work in a compiled language?*
- Programs become much faster for free.
- Even if you use the language only as package glue, at least the glue is not slow.
*What do we gain by having types in the language?*
- Generic programming, and lots of optimization possibilities for the compiler.
*Is the Julia ecosystem ready for my needs? <i class="twa twa-thinking-face"></i>*
- Likely. If not, extending the packages is super easy.
- Base includes most of the functionality of MATLAB, R, and Python with NumPy,
  plus many useful bits of C++.
# Why Julia?
<center><img src="slides/img/whyjulia.png" width="80%"></center>
(Source: JuliaCon 2016, Arch D. Robison)
# Always remember
- you can `Tab`-complete through almost anything in the REPL
- functions have useful help with examples; try `?cat`
- `typeof(something)` may give good info
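For instance, `typeof` quickly reveals how values are stored (a minimal check; the concrete integer type is `Int64` on 64-bit machines):

```julia
# Inspecting types interactively or in scripts
println(typeof(1.0))          # Float64
println(typeof([1, 2, 5]))    # Vector{Int64} on 64-bit machines
println(typeof((1, "a")))     # Tuple{Int64, String}
```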
# Everything has a type that determines storage and value handling
- `Vector{Int}`
```julia
[1, 2, 5, 10]
```
- `Matrix{Float64}`
```julia
[1.0 2.0; 2.0 1.0]
```
- `Tuple`
```julia
(1, 2.0, "SomeLabel")
```
- `Set{Int}`
- `Dict{Int,String}`
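The latter two have no literal syntax; a minimal sketch of constructing them:

```julia
# Constructors infer the element/key/value types from the arguments
Set([1, 2, 5])                  # Set{Int64}
Dict(1 => "one", 2 => "two")    # Dict{Int64, String}
```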
# Basic functionality and expectable stuff
Most concepts from C, Python and MATLAB are portable as they are.
Surprising parts:
- arrays are indexed from `1` (for a relatively good reason)
- arrays: `array[1]`, `array[2:5]`, `array[begin+1:end-1]`, `size`, `length`, `cat`, `vcat`, `hcat`, ...
- code blocks open with a keyword (or `begin`) and close with `end`
- you can stuff everything on one line!
- all functions can (and should) be overloaded
  - simply add a type annotation to a parameter with `::` to distinguish between implementations for different types
  - overloading is cheap
  - *specialization to known simple types* is precisely the reason why compiled code can be *fast*
  - adding type annotations to code and parameters helps the compiler to do the right thing
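As a sketch of such overloading (the `describe` function here is hypothetical, not from the slides):

```julia
# Two implementations distinguished purely by the :: annotations;
# dispatch picks one based on the argument's type.
describe(x::Integer) = "an integer: $x"
describe(x::String)  = "a string: $x"

describe(42)    # "an integer: 42"
describe("hi")  # "a string: hi"
```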
# <i class="twa twa-light-bulb"></i> Structured cycles
Using functional-style loops is *much less prone to indexing errors*.
- Transform an array, original:
```julia
for i = eachindex(arr)
    arr[i] = sqrt(arr[i])
end
```
Structured:
```julia
map(sqrt, [1,2,3,4,5])
map((x,y) -> (x^2 - exp(y)), [1,2,3], [-1,0,1])
```
- Summarize an array:
```julia
reduce(+, [1,2,3,4,5])
reduce((a,b) -> "$b $a", ["Use", "the Force", "Luke"])
reduce(*, [1 2 3; 4 5 6], dims=1)
```
**Tricky question (<i class="twa twa-light-bulb"></i><i class="twa twa-light-bulb"></i><i class="twa twa-light-bulb"></i>):** What is the overhead of the "nice" loops?
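For intuition, the combinators above evaluate like this (using `foldl` for the string case, since it guarantees left-to-right association, whereas `reduce`'s association order is unspecified):

```julia
# map applies the function element-wise; foldl folds pairwise from the left
map(sqrt, [1.0, 4.0, 9.0])                          # [1.0, 2.0, 3.0]
reduce(+, [1, 2, 3, 4, 5])                          # 15
foldl((a, b) -> "$b $a", ["Use", "the Force", "Luke"])  # "Luke the Force Use"
```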
# Array-creating loops and generators
```julia
julia> [i*10 + j for i = 1:3, j = 1:5]
3×5 Matrix{Int64}:
 11  12  13  14  15
 21  22  23  24  25
 31  32  33  34  35

julia> join(sort([c for word in ["the result is 123", "what's happening?", "stuff"]
                    for c in word
                    if isletter(c)]))
"aaeeeffghhhiilnnpprssssttttuuw"

julia> Dict('a'+i => i for i=1:26)
Dict{Char, Int64} with 26 entries:
  'n' => 13
  'f' => 5
  ...
```
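Generators also work without materializing an array, e.g. fed straight into a reducer (a small check):

```julia
# Generator expression: no intermediate array is allocated
sum(i^2 for i = 1:10)             # 385
Dict(c => Int(c) for c in "abc")  # maps each character to its code point
```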
# Control flow: subroutines (functions)
- Multi-line function definition
```julia
function combine(a, b)
    return a + b
end
```
- "Mathematical" neater definition
```julia
combine(a, b) = a + b
```
- <i class="twa twa-light-bulb"></i> Definition with types specified (prevents errors, allows optimizations!)
```julia
function combine(a::Int, b::Int)::Int
    return a + b
end

function combine(a::Vector, b::Vector)::Vector
    return a .+ b
end

combine(a::String, b::String)::String = "$a and $b"
```
# Broadcasting over iterable things (aka The Magic Dot)
- Broadcasting operators by prepending a dot
```julia
matrix[row, :] .+= vector1 .* vector2
```
- Broadcasting a function
```julia
sqrt.(1:10)
maximum.(eachcol(rand(100,100)))
x = [1,2,3,4]
x' .* x
```
- Making collections of pairs in bulk (e.g. a lookup table)
```julia
myarray_index = Dict(myarray .=> eachindex(myarray))
```
<i class="twa twa-light-bulb"></i> The "magic dot" is a shortcut for calling `broadcast(...)`.
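To illustrate both the shortcut and the outer product above, these identities hold:

```julia
# The dot syntax lowers to a call of broadcast(...)
@assert sqrt.(1:4) == broadcast(sqrt, 1:4)

# Broadcasting a 1×3 row (x') against a 3-element column (x)
# expands both to 3×3, i.e. an outer product
x = [1, 2, 3]
@assert x' .* x == [1 2 3; 2 4 6; 3 6 9]
```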
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">Session II - practical</span>
</div>
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">Session II</span>
</div>
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">Session III - practical</span>
</div>
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">Session III</span>
</div>
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">Session IV - practical</span>
</div>
<div class=leader>
<i class="twa twa-blue-circle"></i>
<i class="twa twa-red-circle"></i>
<i class="twa twa-green-circle"></i>
<i class="twa twa-purple-circle"></i><br>
<span style="color:#888">Session IV</span>
</div>
# Note about CUDA
Julia can serve as an extremely user-friendly front-end for CUDA, abstracting away the ugly steps you'd otherwise need with plain CUDA, while still leaving enough flexibility to write high-performance low-level compute kernels.
The approach here demonstrates what `CUDA.jl` does.
There's also:
- `AMDGPU.jl`
- `Metal.jl` for <i class="twa twa-green-apple"></i>
- `Vulkan.jl` (less user-friendly, but works everywhere)
# Using your GPU for accelerating simple stuff
```julia
julia> data = randn(10000,10000);
julia> @time data*data;
julia> using CUDA
julia> data = cu(data);
julia> @time data*data;
```
# What's available?
The "high-level" API spans most of the CU* helper tools:
- broadcasting numerical operations via translation to simple kernels (`.+`, `.*`, `.+=`, `ifelse.`, `sin.`, ...)
- matrix and vector operations using `CUBLAS`
- `CUSOLVER` (solvers, decompositions etc.) via `LinearAlgebra.jl`
- ML ops (in `Flux.jl`): `CUTENSOR`
- `CUFFT`
- `CUSPARSE` via `SparseArrays.jl`
- limited support for reducing operations (`findall`, `findfirst`, `findmin`, ...) -- these do not translate easily to GPU code
- very limited support for array index processing
(See: https://github.com/NVIDIA/CUDALibrarySamples)
# Programming kernels in Julia!
CUDA kernels (`__device__` functions) are generated transparently directly from Julia code.
```julia
a = cu(someArray)

function myKernel(a)
    i = threadIdx().x
    a[i] += 1
    return
end

@cuda threads=length(a) myKernel(a)
```
Some Julia constructs are not feasible on the GPU (mainly allocation of complex structures); these trigger a compiler message from `@cuda`.
# Programming kernels -- usual tricks
The number of threads and blocks is limited by hardware; let's make a
grid-stride loop to process a lot of data quickly!
```julia
a = cu(someArray)
b = cu(otherArray)

function applySomeMath(a, b)
    index = threadIdx().x + blockDim().x * (blockIdx().x - 1)
    gridStride = gridDim().x * blockDim().x
    for i = index:gridStride:length(a)
        a[i] += someMathFunction(b[i])
    end
    return
end

@cuda threads=1024 blocks=32 applySomeMath(a, b)
```
Typical CUDA trade-offs:
- too many blocks won't work, insufficient blocks won't cover your SMs
- too many threads per block will fail or spill to memory (slow), insufficient threads won't allow parallelization/latency hiding in SM
- thread divergence destroys performance
# CUDA.jl interface
Functions available in the kernel:
- `gridDim`, `blockDim`
- `blockIdx`, `threadIdx`
- `warpsize`, `laneid`, `active_mask`
- `sync_threads`, `sync_warp`, `threadfence`, ...
- `vote_all`, `vote_ballot`, `shfl_sync`, ...
Parameters for the `@cuda` spawn:
- `threads=nnn` per block
- `blocks=nnn` per grid
- `shmem=nnn` how much shared memory to request (available via `CuStaticSharedArray`)
[
  { "filename": "index.md" },
  { "filename": "overview.md" },
  { "filename": "1-session.md" },
  { "filename": "1-practical.md" },
  { "filename": "2-session.md" },
  { "filename": "2-practical.md" },
  { "filename": "3-session.md" },
  { "filename": "3-practical.md" },
  { "filename": "4-session.md" },
  { "filename": "4-practical.md" },
  { "filename": "thanks.md" }
]
# Overview
0. Subject 1
1. Subject 2
# Thank you.
<div class=leader>
<i class="twa twa-blueberries"></i>
<i class="twa twa-red-apple"></i>
<i class="twa twa-melon"></i>
<i class="twa twa-grapes"></i><br>
Questions?
</div>
# Thank you!
<center><img src="slides/img/r3-training-logo.png" height="200px"></center>