Commit 9d9aaee6 authored by Miroslav Kratochvil, committed by Miroslav Kratochvil

julia training slides v0 (still a few todos)

parent 88f3e8ff
2 merge requests: !138 "[release] Regular merge of develop", !131 "julia training slides for 2022-06-08"
This commit is part of merge request !131.
Showing
with 1972 additions and 0 deletions
<div class=leader>
Bootstrapping Julia
</div>
# Installing Julia
Recommended method:
- Download an archive from https://julialang.org/downloads/
- Execute `julia` or `julia.exe` as-is
- Link it to your `$PATH`
Distribution packages usually work well too:
- **Debians&Ubuntus**: `apt install julia`
- **Iris/Aion**: `module add lang/Julia`
# Life in REPL
```julia
user@pc $ julia
julia> sqrt(1+1)
1.4142135623730951
julia> println("Well hello there!")
Well hello there!
julia> ?
help?> sqrt
sqrt(x)
Computes the square root .....
```
# REPL modes
Julia interprets some additional keys to make our lives easier:
- `?`: help mode
- `;`: shell mode
- `]`: packaging mode (looks like a box!)
- `Backspace`: quits special mode
- `Tab`: autocomplete anything
- `\`... `Tab`: expand math characters
# Loading libraries, modules and packages
- Load a local file (with shared functions etc.)
```julia
include("mylibrary.jl")
```
- Load a package, add its exports to the global namespace
```julia
using UnicodePlots
```
- Load a package without exports
```julia
import UnicodePlots
```
- Trick: load package exports to a custom namespace
```julia
module Plt
using UnicodePlots
end
```
# Managing packages from the package management environment
- Install a package
```julia
] add UnicodePlots
```
- Uninstall a package
```julia
] remove UnicodePlots
```
- Enter a local project with separate package versions
```julia
] activate path/to/project
```
- Install dependencies of the local project
```julia
] instantiate
```
(Project data is stored in `Project.toml`, `Manifest.toml`)
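The same steps can also be scripted through the `Pkg` standard library, which is handy in batch jobs where no REPL is available. Below is a minimal sketch, assuming a throwaway project directory created with `mktempdir()`; the directory and the commented-out package name are only illustrative.

```julia
using Pkg

# Hypothetical project location; a real project would live in your repository.
project_dir = mktempdir()

# Equivalent of `] activate path/to/project`
Pkg.activate(project_dir)

# Equivalents of `] add UnicodePlots` and `] instantiate`. Both need network
# access, so they are left commented out in this sketch:
# Pkg.add("UnicodePlots")
# Pkg.instantiate()

# Show which project file is currently in effect
active = Base.active_project()
println("Active project file: ", active)
```

Activating a project first keeps the package versions of different analyses from interfering with each other.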
# Workflow: Testing in REPL
- Code in REPL
- Paste pieces of code back and forth to editor/IDE
- VS Code etc.: `Ctrl`+`Enter`
- Linuxes: magic middleclick
- A script is eventually materialized
- ...or picked from history in `.julia/logs/` :)
# Workflow: Write a good standalone script
*Your scripts should communicate well with the environment!* (that includes, among others, you)
```julia
#!/usr/bin/env julia
global_param = get(ENV, "MY_SETTING", "default")
function process_file(fn::String)
    println("working on $fn...")
    #...
    if error_detected
        @error "something terrible has happened" fn
        exit(1)
    end
end
process_file.(ARGS)
exit(0)
```
# Workflow: What makes your script sustainable?
Main UNIX facilities:
- Commandline arguments tell the script where to do the work (make it *repurposable*)
- Environment lets you customize stuff that doesn't easily fit into arguments (makes it *reconfigurable*)
- Proper success & error reporting tells the other programs that something broke (makes the pipeline *robust*)
- `#!` (aka "shebang") converts your script to a normal program (makes the user (you) much happier)
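The four facilities above can be combined in a few lines. The following is only a sketch: the environment variable `MY_LABEL` and the helper names `report_file` and `main` are made up for illustration, and the "work" is just counting lines.

```julia
#!/usr/bin/env julia

# Environment: an optional setting with a default (hypothetical variable name)
const LABEL = get(ENV, "MY_LABEL", "lines")

# Process one file; return `true` on success and `false` on failure
function report_file(fn::AbstractString)
    if !isfile(fn)
        @error "cannot read the input file" fn
        return false
    end
    println("$fn: $(countlines(fn)) $LABEL")
    return true
end

# Arguments: process every file named on the command line and
# turn any failure into a non-zero status
function main(args)
    results = [report_file(fn) for fn in args]
    return all(results) ? 0 : 1
end

status = main(ARGS)
# A real script would finish with `exit(status)` so that the shell,
# `make`, or the scheduler can see whether the run succeeded.
```

Returning a status from `main` instead of calling `exit` deep inside the code keeps the functions reusable from the REPL.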
<div class=leader>
PAUSE
</div>
Let's have *10 minutes* for a coffee or something.
(Questions?)
<div class=leader>
Parallel Julia on HPCs
</div>
# Julia model of distributed computation
<center>
<img src="slides/img/distrib.svg" width="50%">
</center>
# What does ULHPC look like?
<center>
<img src="slides/img/iris.png" width="30%">
<br>
<tt>hpc-docs.uni.lu/systems/iris</tt>
</center>
# Basic parallel processing
**Using Threads:**
1. start Julia with parameter `-t N`
2. parallelize any loops with `Threads.@threads`
**Using `Distributed`:**
```julia
using Distributed
addprocs(N)
newVector = pmap(myfunction, oldVector) # myfunction: any function of one element
```
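The thread-based variant can be sketched as follows. This is an illustrative example rather than part of the original slides; it also runs (serially) when Julia is started without `-t N`.

```julia
using Base.Threads

xs = collect(1.0:1000.0)
ys = similar(xs)

# Each iteration writes only to its own slot, so no state is shared between threads
@threads for i in eachindex(xs)
    ys[i] = sqrt(xs[i])
end

println("computed with ", nthreads(), " thread(s); sum = ", sum(ys))
```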
# How to design for parallelization?
- *Divide software into completely independent parts*
- avoid shared writeable state (to allow reentrancy)
- avoid global variables (to allow separation from the "mother" process)
- avoid complicated indexing in arrays (to allow slicing)
- avoid tiny computation steps (to allow high-yield computation)
- *Design for utilization of the high-level looping primitives*
- use `map`
- use `reduce` or `mapreduce`
- parallelize programs using `pmap` and `dmapreduce` (DistributedData.jl)
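Code written around `map` and `mapreduce` usually parallelizes by swapping in the distributed counterpart. A minimal sketch, assuming two local worker processes and a made-up function `slow_square`; `dmapreduce` is left out because it needs the external DistributedData.jl package.

```julia
using Distributed

addprocs(2)  # two local workers, just for the demonstration

# The function must be defined on all processes
@everywhere slow_square(x) = (sleep(0.01); x^2)

serial   = map(slow_square, 1:20)            # runs in this process
parallel = pmap(slow_square, 1:20)           # same call shape, runs on the workers
total    = mapreduce(slow_square, +, 1:20)   # transform and summarize in one step

println("results agree: ", serial == parallel, ", total = ", total)

rmprocs(workers())  # clean up the workers
```

Because `slow_square` touches no global state, nothing else has to change when the work moves to other processes.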
# Parallel → distributed processing
You need a working `ssh` connection to the server, ideally with keys:
```sh
user@pc1 $ ssh server1
Last login: Wed Jan 13 15:29:34 2021 from 2001:a18:....
user@server $ _
```
Spawning worker processes on remote machines:
```julia
julia> using Distributed
julia> addprocs([("server1", 10), ("pc2", 2)])
```
**Benefit:** No additional changes to the parallel programs!
# Making an HPC-compatible script
Main problems:
1. discover the available resources
2. spawn worker processes at the right place
```julia
using ClusterManagers
addprocs_slurm(parse(Int, ENV["SLURM_NTASKS"]))
# ... continue as usual
```
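Resource discovery can be made a little more forgiving, so that the same script also runs on a laptop where Slurm variables are absent. This is a sketch under that assumption; the helper name `slurm_task_count` is hypothetical.

```julia
# Read the number of Slurm tasks, falling back to `default` outside of Slurm.
# `env` can be any dictionary-like object, which makes the function easy to test.
function slurm_task_count(env = ENV; default = 1)
    return parse(Int, get(env, "SLURM_NTASKS", string(default)))
end

println("worker processes to start: ", slurm_task_count())
```

The result can then be passed to `addprocs_slurm` on the cluster or to plain `addprocs` locally.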
# Scheduling the script
Normally, you write a "batch script" and add it to a queue using `sbatch`.
Script in `runAnalysis.sbatch`:
```sh
#!/bin/bash
#SBATCH -J MyAnalysisInJulia
#SBATCH -n 10
#SBATCH -c 1
#SBATCH -t 30
#SBATCH --mem-per-cpu 4G
julia runAnalysis.jl
```
You start the script using:
```sh
$ sbatch runAnalysis.sbatch
```
<div class=leader>
Questions?
</div>
Let's do some hands-on problem solving (expected around 15 minutes).
2022/2022-06-08_JuliaForNewcomers/slides/img/favicon.ico

39.9 KiB

2022/2022-06-08_JuliaForNewcomers/slides/img/iris.png

3.37 MiB

2022/2022-06-08_JuliaForNewcomers/slides/img/r3-training-logo.png

32.4 KiB

2022/2022-06-08_JuliaForNewcomers/slides/img/unicodeplot.png

9.93 KiB

# Julia for newcomers
## June 8th, 2022
<div style="top: 6em; left: 0%; position: absolute;">
<img src="theme/img/lcsb_bg.png">
</div>
<div style="top: 5em; left: 60%; position: absolute;">
<img src="slides/img/r3-training-logo.png" height="200px">
<br><br><br>
<h1>Julia for newcomers</h1>
<br><br><br>
<h4>
Laurent Heirendt, Ph.D.<br>
Miroslav Kratochvíl, Ph.D.<br><br>
R3 Team - <a href="mailto:lcsb-r3@uni.lu">lcsb-r3@uni.lu</a><br>
<i>Luxembourg Centre for Systems Biomedicine</i>
</h4>
</div>
<style>
code {border: 2pt dotted #f80; padding: .4ex; border-radius: .7ex; color:#444; }
pre code {border: 0;}
em {color: #e02;}
li {margin-bottom: 1ex;}
div.leader {font-size:400%; font-weight:bold; margin: 1em;}
</style>
# Motivation first!
*Why is it good to work in a compiled language?*
- Programs become much faster for free.
*What do we gain by having types in the language?*
- Generic programming, and lots of optimization possibilities for the compiler.
*Is the Julia ecosystem ready for my needs?*
- Likely. If not, extending the packages is unbelievably easy.
# How to lose performance?
Type `a+1` in a typical interpreted language.
Computer has to do this:
1. Check if `a` exists in the available variables
2. Find the address of `a`
3. Check if `a` is an actual object or null
4. Find if there is `__add__` in the object, get its address
5. Find if `__add__` is a function with 2 parameters
6. Load the value of `a`
7. Call the function, push call stack
8. Find if 1 is an integer and can be added
9. Check if `a` has a primitive representation (i.e. not a big-int)
10. Run the `add` instruction (this takes 1 CPU cycle!)
11. Pop call stack
12. Save the result to the place where the runtime can work with it
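For contrast, a compiled language resolves most of these steps once, at compile time. The small sketch below (the function name `add_one` is only illustrative) lets you inspect what Julia generates for a concrete argument type. The exact output depends on the Julia version and the CPU, so none is shown here.

```julia
using InteractiveUtils  # provides @code_llvm outside of the REPL

add_one(a) = a + 1

# Julia compiles a separate specialization for each argument type it sees
@code_llvm add_one(1)     # integer version
@code_llvm add_one(1.0)   # floating-point version
```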
<div class=leader>
Working with data
</div>
# What do we usually need?
- some nice abstraction over "long" tabular data → DataFrames
- getting data in and out → IO functions
- making pictures → plotting packages
# Loading plaintext files
TODO
# Writing plaintext files
TODO
# DataFrames
Package `DataFrames.jl` provides a work-alike of the data frames from
other environments (pandas, `data.frame`, tibbles, ...)
```julia
using DataFrames
mydata = DataFrame(id = [32,10,5], text = ["foo", "bar", "baz"])
mydata.text
```
Main difference from `Matrix`: *columns may have different types*
# DataFrames
TODO CSV.jl, XLSX.jl
# Plotting
<center>
<img src="slides/img/unicodeplot.png" height="80%" />
</center>
# Usual plotting packages
- `UnicodePlots.jl` (useful in terminal)
- `Plots.jl` (matplotlib workalike)
- `GLMakie.jl` (interactive plots)
- `CairoMakie.jl` (PDF export of Makie plots)
Native `ggplot` and `cowplot` ports are in development.
<div class=leader>
Julia language primer
</div>
# Expressions and types
You can discover types of stuff using `typeof`.
Common types:
- `Bool`
```julia
false, true
```
- `Char`
```julia
'a', 'b', ...
```
- `String`
```julia
"some random text"
```
- `Int`
```julia
1, 0, -1, ...
```
- `Float64`
```julia
1.1, -1e6, ...
```
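A quick way to check the list above is to ask `typeof` directly. This is an illustrative sketch; `Int` is printed as `Int64` on 64-bit machines.

```julia
samples = (true, 'a', "some random text", 1, 1.1)

for s in samples
    println(repr(s), " is a ", typeof(s))
end

# `map` over a tuple returns a tuple of the results
types = map(typeof, samples)
```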
# Types may have parameters (usually "contained type")
- `Vector{Int}`
```julia
[1, 2, 5, 10]
```
- `Matrix{Float64}`
```julia
[1.0 2.0; 2.0 1.0]
```
- `Tuple`
```julia
(1, 2.0, "SomeLabel")
```
- `Set{Int}`
- `Dict{Int,String}`
(default parameter value is typically `Any`)
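The type parameters are usually inferred from the contents, which the following sketch illustrates (variable names are arbitrary):

```julia
v = [1, 2, 5, 10]
m = [1.0 2.0; 2.0 1.0]
t = (1, 2.0, "SomeLabel")
s = Set([1, 2, 2, 3])
d = Dict(1 => "one", 2 => "two")
untyped = []   # nothing to infer from, so the element type falls back to Any

for x in (v, m, t, s, d, untyped)
    println(typeof(x))
end
```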
# Supertype hierarchy
Types possess a single supertype, which allows you to easily group
multiple types under e.g. `Real`, `Function`, `Type`, `Any`, ...
```julia
julia> Int
Int64
julia> Int.super
Signed
julia> Int.super.super
Integer
julia> Int.super.super.super
Real
julia> Int.super.super.super.super
Number
julia> Int.super.super.super.super.super
Any
```
These are useful when restricting what can go into your functions!
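The documented way to walk the hierarchy is the `supertype` function, and `<:` tests the subtype relation. A small sketch, with the hypothetical helpers `supertype_chain` and `half`:

```julia
# Collect a type and all its supertypes up to Any
function supertype_chain(T::Type)
    chain = Type[T]
    while T !== Any
        T = supertype(T)
        push!(chain, T)
    end
    return chain
end

println(join(supertype_chain(Int), " <: "))
println("Int <: Real? ", Int <: Real)

# Restricting arguments: this method accepts any real number, but not strings
half(x::Real) = x / 2
println(half(3))
```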
# Basic functionality and expectable stuff
- Math: `+`, `-`, `*`, `/`, `^`, ...
- Logic: `==`, `!=`, `<`, `>`, `<=`, `>=`, `&&`, `||`, `!`, ...
- Assignment: `=`, `+=`, `-=`, `*=`, ...
- I/O: `open`, `println`, `read`, `readlines`, ...
- Arrays: `array[1]`, `array[2:5]`, `array[begin+1:end-1]`, `size`, `length`, `cat`, `vcat`, `hcat`, ...
Most functions are *overloaded* to *efficiently* work with multiple types of data.
Functionality is easy to discover by `Tab`-completing names, and with `methods(...)` and `methodswith(...)`.
# Control flow: Commands and code blocks
Typically you write one command per line.
Commands can be separated by semicolons, and grouped using code blocks:
```julia
begin
    a = 10
    b = 20; b += 20
    a + b # implicit return!
end
```
Many constructs (loops, function definitions) start the block
automatically; you only write `end`.
# Control flow: Conditional execution
- Traditional `if`:
```julia
if condition
    actions
else # optional
    actions # optional
end
```
- One-sided shell-like shortcuts:
```julia
a<0 && (a = 0)
isfinite(a) || throw_infinite_a_error()
```
- Shorter inline condition:
```julia
myfunction( index<=10 ? array[index] : default_value )
```
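The shortcuts above, put into runnable form. This is only a sketch; the helper names `safe_get` and `clamp_nonnegative` are made up for illustration.

```julia
# Inline condition: pick a value, falling back to a default for bad indices
safe_get(array, index, default_value) =
    1 <= index <= length(array) ? array[index] : default_value

function clamp_nonnegative(a)
    isfinite(a) || error("a must be finite")  # right side runs only if the check fails
    a < 0 && (a = zero(a))                    # right side runs only if a is negative
    return a
end

println(safe_get([10, 20, 30], 2, 0))
println(safe_get([10, 20, 30], 7, 0))
println(clamp_nonnegative(-3))
```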
# Control flow: Doing stuff many times
Iteration count-based loop:
```julia
for var = iterable # , var2 = iterable2, ...
    code(var, var2)
end
```
Examples:
```julia
for i=1:10
    @info "iterating!" i
end
for i=1:10, j=1:10
    matrix[i,j] = i*j
end
```
Utilities: `eachindex`, `enumerate`
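A short sketch of the two utilities, together with a self-contained version of the nested loop (the array names are arbitrary):

```julia
words = ["alpha", "beta", "gamma"]

# enumerate yields (position, value) pairs
for (i, w) in enumerate(words)
    println("$i: $w")
end

# eachindex yields the valid indices of an array
for i in eachindex(words)
    println(words[i], " has ", length(words[i]), " letters")
end

# Two loop variables in one `for` fill a small multiplication table
table = zeros(Int, 3, 3)
for i = 1:3, j = 1:3
    table[i, j] = i * j
end
println(table)
```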
# Control flow: Doing stuff many times
Condition satisfaction-based loop:
```julia
while condition
    do_something() # condition is true
end
# condition is false
```
Example:
```julia
number = 123519
digit_sum = 0
while number > 0
    digit_sum += number % 10
    number ÷= 10
end
@info "We've got results!" digit_sum
```
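The example above works when typed into the REPL. In a script file, recent Julia versions require the `global` keyword to assign to global variables from inside a loop, so wrapping the loop in a function is the safer habit. A sketch of that variant (here `digit_sum` is a function name):

```julia
function digit_sum(number::Integer)
    total = 0
    while number > 0
        total += number % 10   # take the last digit
        number ÷= 10           # drop the last digit
    end
    return total
end

@info "We've got results!" digit_sum(123519)
```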
# Structured cycles!
Functional-style loops are *much less prone* to indexing errors.
- Transform an array:
```julia
map(sqrt, [1,2,3,4,5])
map((x,y) -> (x^2 - exp(y)), [1,2,3], [-1,0,1])
```
- Summarize an array:
```julia
reduce(+, [1,2,3,4,5])
reduce((a,b) -> "$b $a", ["Use", "the Force", "Luke"])
reduce(*, [1 2 3; 4 5 6], dims=1)
```
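A few variations with their results checked, as an illustrative sketch. `foldl` is used for the string example because it guarantees left-to-right evaluation, which `reduce` does not promise in general.

```julia
roots = map(sqrt, [1, 4, 9])
sums = map((x, y) -> x + y, [1, 2, 3], [10, 20, 30])

total = reduce(+, [1, 2, 3, 4, 5])
sentence = foldl((a, b) -> "$b $a", ["Use", "the Force", "Luke"])
column_products = reduce(*, [1 2 3; 4 5 6], dims=1)

# mapreduce transforms and summarizes in a single pass
sum_of_squares = mapreduce(x -> x^2, +, 1:5)

println(sentence)
```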
# Making new arrays with loops
```julia
julia> [i*10 + j for i=1:3, j=1:5]
3×5 Matrix{Int64}:
11 12 13 14 15
21 22 23 24 25
31 32 33 34 35
julia> join(sort([c for word = ["the result is 123", "what's happening?", "stuff"]
for c = word
if isletter(c)]))
"aaeeeffghhhiilnnpprssssttttuuw"
```
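The same syntax also works without the brackets (a generator, which avoids building a temporary array) and for other containers. A short sketch with arbitrary names:

```julia
# Comprehension with a filter
even_squares = [x^2 for x in 1:10 if iseven(x)]

# Generator passed directly to a summarizing function
sum_sq = sum(x^2 for x in 1:3)

# Generator of pairs used to build a dictionary
lengths = Dict(w => length(w) for w in ["foo", "quux"])

println(even_squares)
```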
# Control flow: subroutines (functions and methods)
- Multi-line function definition
```julia
function f(a,b)
    return a + b
end
```
- "Mathematical" definition
```julia
f(a,b) = a + b
```
- Definition with types specified (creates a *method* of a function)
```julia
f(a::Int, b::Int)::Int = a + b
```
- Overloading (adds another *method* to the function)
```julia
f(a::Complex, b::Complex)::Complex = complex(a.re+b.re, a.im+b.im)
```
(Upon calling the function, Julia picks the *most specific* method.)
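How the most specific method gets picked can be seen in a small sketch, using a hypothetical function `describe`:

```julia
describe(x) = "something"            # fallback for any type
describe(x::Number) = "a number"     # more specific
describe(x::Integer) = "an integer"  # most specific for Int arguments

println(describe(1))
println(describe(1.5))
println(describe("text"))
println(length(methods(describe)), " methods defined")
```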
# Function arguments
- Keyword arguments (cannot be used for overloading)
```julia
function f(a, b=0; extra=0)
    return a + b + extra
end
f(123, extra=321)
```
- Managing arguments en masse
```julia
euclidean(x; kwargs...) = sqrt.(sum(x.^2; kwargs...))
max(args...) = maximum(args)
```
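The calling conventions, sketched with hypothetical names (`add3`, `largest`, `norm_along`) so that nothing from `Base` is shadowed:

```julia
# Positional argument with a default, plus a keyword argument after the semicolon
add3(a, b=0; extra=0) = a + b + extra

# Varargs: all positional arguments arrive as a tuple
largest(args...) = maximum(args)

# Keyword arguments are forwarded unchanged to `sum`
norm_along(x; kwargs...) = sqrt.(sum(x .^ 2; kwargs...))

println(add3(123, extra=321))
println(largest(3, 7, 5))
println(norm_along([3 4; 6 8]; dims=2))
```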
# Broadcasting over iterable things
- Broadcasting operators by prepending a dot
```julia
matrix[row, :] .+= vector1 .* vector2
```
- Broadcasting a function
```julia
sqrt.(1:10)
maximum.(eachcol(rand(100,100)))
x = [1,2,3,4]
x' .* x
```
Internally handled by `broadcast()`.
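A few small cases with checkable results, as an illustrative sketch:

```julia
x = [1, 2, 3, 4]

roots = sqrt.([1, 4, 9])          # function applied to each element
table = x' .* x                   # row times column expands to a 4×4 table
shifted = broadcast(+, x, 10)     # the explicit form of x .+ 10
column_maxima = maximum.(eachcol([1 5; 7 2]))

println(table)
```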
# Advanced container types
- Dictionaries (`Dict{KeyType, ValueType}`) are hash tables that allow O(1)
average-time indexing, great for lookups or keyed data structures. Contents
may be typed for increased efficiency.
```julia
person = Dict("name" => "John", "surname" => "Foo", "age" => 30)
person["age"]
indexof(v::Vector) = Dict(v .=> eachindex(v))
```
- Sets are value-less dictionaries.
```julia
function unique(x)
    elems = Set{eltype(x)}()
    push!.(Ref(elems), x)
    return collect(elems)
end
```
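Typical operations on both containers, as a minimal sketch (`index_of` is a renamed copy of the helper above, and the remaining names are arbitrary):

```julia
person = Dict("name" => "John", "surname" => "Foo", "age" => 30)

has_age = haskey(person, "age")          # test for a key
email = get(person, "email", "unknown")  # lookup with a default value
person["age"] += 1                       # update in place

# Map each element of a vector to its position
index_of(v::Vector) = Dict(v .=> eachindex(v))
positions = index_of(["a", "b", "c"])

# A set keeps every value only once
seen = Set([3, 1, 3, 2, 1])

println(positions["b"], " ", length(seen), " ", 2 in seen)
```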
[
{ "filename": "index.md" },
{ "filename": "overview.md" },
{ "filename": "intro.md" },
{ "filename": "bootstrap.md" },
{ "filename": "language.md" },
{ "filename": "io.md" },
{ "filename": "distributed.md" },
{ "filename": "thanks.md" }
]
# Overview today
0. Why would you learn another programming language again?
1. Bootstrapping, working with packages, writing a script (45m), pause (10m)
2. Language and syntax primer (20m)
3. Getting the data in and out (15m)
4. Running programs on ULHPC (15m)
5. Questions, hands-on, time buffer (15m)
# Thank you!
<center><img src="slides/img/r3-training-logo.png" height="200px"></center>
Contact us if you need help:
<a href="mailto:lcsb-r3@uni.lu">lcsb-r3@uni.lu</a>