../../../2021-04-20_IT101-DM/slides/img/scripts
\ No newline at end of file
../../../2021-04-20_IT101-DM/slides/img/undraw_secure_server_s9u8.png
\ No newline at end of file
../../../2021-04-20_IT101-DM/slides/img/wordcloud.png
\ No newline at end of file
../../2021-04-20_IT101-DM/slides/ingestion.md
\ No newline at end of file
# Data housekeeping
## Available data storage
<div class='fragment' style="position:absolute">
<img src="slides/img/LCSB_storages_full.png" height="750px">
</div>
<div class='fragment' style="position:relative">
<img src="slides/img/LCSB_storages_personal-crossed.png" height="750px">
<div style="position:absolute;left:65%;top:60%">
* Unless consortium/project has formally agreed to use a secure commercial cloud
</div>
</div>
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-02" style="color:grey; font-size:0.8em;">Data Storage and Backup Policy</a>
</div>
# Data ingestion: Transfer and Integrity
* When sending data: <font color="red">Do not use emails, use secure platforms (Cloud, Aspera, Atlas share...)!</font>
<div class="fragment">
Data can be corrupted:
* (non-)malicious modification
* faulty file transfer
* disk corruption
</div>
<div class="fragment">
### Solution
* disable write access to the source data
* generate checksums!
<div style="position:absolute;left:40%;top:30%">
<img src="slides/img/checksum.png" width="500px">
</div>
</div>
<div class="fragment" style="position:relative; left:0%">
## When to generate checksums?
* before data transfer
    - new dataset from collaborator
    - upload to remote repository
* long-term storage
    - master version of dataset
    - snapshot of data for publication
</div>
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-02" style="color:grey; font-size:0.8em;">Data Storage and Backup Policy</a>
</div>
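A minimal sketch of the checksum workflow above, assuming GNU coreutils (`sha256sum`) is installed. The directory and file names are hypothetical placeholders, and the "transfer" is simulated with a plain copy.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical working directory and file names, for illustration only.
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir source received
printf 'sample_id,value\nS1,0.42\n' > source/dataset.csv

# 1. Disable write access to the source data.
chmod a-w source/dataset.csv

# 2. Generate the checksum before the transfer.
(cd source && sha256sum dataset.csv > dataset.csv.sha256)

# 3. "Transfer" the data together with its checksum file (a plain copy here).
cp source/dataset.csv source/dataset.csv.sha256 received/

# 4. Verify the received copy against the recorded checksum.
if (cd received && sha256sum --check --quiet dataset.csv.sha256 >/dev/null 2>&1); then
    intact_result="OK"
else
    intact_result="CORRUPTED"
fi

# 5. Simulate a corrupted transfer and verify again.
chmod u+w received/dataset.csv
printf 'S2,9.99\n' >> received/dataset.csv
if (cd received && sha256sum --check --quiet dataset.csv.sha256 >/dev/null 2>&1); then
    corrupted_result="OK"
else
    corrupted_result="CORRUPTED"
fi

echo "intact copy:   $intact_result"
echo "modified copy: $corrupted_result"
```

Sending the `.sha256` file along with the data lets the recipient run the same check on their side.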
# Data ingestion/Integrity
## Encryption
<div class='fragment' style="position:relative;left:25%;top:60%">
<img align="middle" height="300px" src="slides/img/encryption.png">
</div>
<div class='fragment'>
* Guaranteed confidentiality
</div>
<div class='fragment'>
* Encryption keys need to be kept safe
</div>
<div class='fragment'>
* <font color= red>Losing your encryption key means losing your data!</font>
</div>
<div class='fragment'>
* Make an off-site backup of your data
</div>
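A hedged sketch of symmetric file encryption, assuming the `openssl` command-line tool (version 1.1.1 or newer) is available. The slides do not prescribe a tool, so OpenSSL is only one possible choice, and all file names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file names. The passphrase file stands in for a key that
# would normally be kept in a password manager, never next to the data.
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf 'sample_id,value\nS1,0.42\n' > dataset.csv

# Generate a random passphrase and restrict access to it.
openssl rand -base64 32 > key.txt
chmod 600 key.txt

# Encrypt with AES-256; the key is derived from the passphrase via PBKDF2.
openssl enc -aes-256-cbc -pbkdf2 -salt \
    -in dataset.csv -out dataset.csv.enc -pass file:key.txt

# Decrypt with the same passphrase and compare with the original.
openssl enc -d -aes-256-cbc -pbkdf2 \
    -in dataset.csv.enc -out restored.csv -pass file:key.txt

if cmp -s dataset.csv restored.csv; then
    roundtrip="identical"
else
    roundtrip="different"
fi
echo "restored file is $roundtrip to the original"
```

Without `key.txt` the encrypted file cannot be restored, which is why the key needs its own safe storage and backup.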
../../2021-04-20_IT101-DM/slides/introduction.md
\ No newline at end of file
# Introduction
<div class="fragment" style="position:absolute">
<img height="450px" src="slides/img/wordcloud.png"><br>
## Learning objectives
* How to manage your data
* How to look at and analyze your data
* Solving issues with computers
* Reproducibility in the research data life cycle
</div>
<div class="fragment" style="position:relative;left:50%; width:40%">
<div >
<center>
<img height="405px" src="slides/img/rudi_balling.jpg"><br>
Prof. Dr. Rudi Balling, director
</center>
</div>
## Pertains to practically all people at LCSB
* Scientists
* PhD candidates
* Technicians
* Administrators
</div>
../../2021-04-20_IT101-DM/slides/list.json
\ No newline at end of file
[
{ "filename": "index.md" },
{ "filename": "introduction.md" },
{ "filename": "access_management.md" },
{ "filename": "data-introduction.md" },
{ "filename": "data_flow.md" },
{ "filename": "ingestion.md" },
{ "filename": "storage_setup.md" },
{ "filename": "data-housekeeping.md" },
{ "filename": "howtos.md" },
{ "filename": "reproducibility.md" },
{ "filename": "code_versioning.md" },
{ "filename": "visualization.md" },
{ "filename": "data_life_cycle.md" },
{ "filename": "problem_solving.md" },
{ "filename": "fair-principles.md" },
{ "filename": "r3_group.md" },
{ "filename": "thanks.md" }
]
\ No newline at end of file
../../2021-04-20_IT101-DM/slides/overview.md
\ No newline at end of file
## Overview
0. Introduction - learning objectives + target audience
1. Data workflow
1. Ingestion:
    * receiving/sending/sharing data
    * file naming
    * checksums
    * backup
1. Making data tidy
    * what is a table
1. Learning to code workflows and analyses - Excel files, coding
1. Code versioning and reproducibility
1. Visualization
    * see the data
1. Problem solving
    * guide
    * rubberducking
    * google for help
    * oracle
1. R3 team
1. Acknowledgment
1. Data minimization
../../2021-04-20_IT101-DM/slides/reproducibility.md
\ No newline at end of file
# Reproducibility
* ensures credibility
* key requirement for follow-up and collaborative studies
<div style="position:absolute">
<img src="slides/img/reproducibility_nature.png" height="650px">
</div>
<div class="fragment" style="position:relative;left:50%">
## Why is our workflow not reproducible?
Lack of provenance:
* Input data downloaded from “some website”
* Copy & paste operations
* Manual text entry
* Analysis not coded
</div>
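One way to address the missing provenance is to record where and when an input file was obtained, together with its checksum. The following is a minimal sketch, assuming GNU coreutils; the URL, the file names and the `PROVENANCE.txt` layout are hypothetical, and the download itself is replaced by a stand-in file so that the sketch runs offline.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical source URL; replace with the real location of the data.
source_url="https://example.org/data/measurements.csv"
data_file="measurements.csv"

# In practice the file would be fetched here, e.g. with curl or wget.
# A stand-in file is created instead to keep the sketch offline.
printf 'sample_id,value\nS1,0.42\n' > "$data_file"

# Record file name, origin, retrieval time (UTC) and checksum.
{
    echo "file:      $data_file"
    echo "source:    $source_url"
    echo "retrieved: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "sha256:    $(sha256sum "$data_file" | cut -d' ' -f1)"
} > PROVENANCE.txt

provenance="$(cat PROVENANCE.txt)"
echo "$provenance"
```

Keeping such a record next to the data, and under version control, replaces "some website" with a traceable source.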
# Reproducibility
## Learning to code workflows and analyses
<div style="display:inline-grid;grid-gap: 40px;grid-template-columns: auto auto;position:relative;left:12%">
<div class="fragment">
<div class="content-box">
<div class="box-title red">Spreadsheets alone</div>
<div class="content">
* Is great for looking at data.
* Data entry is fast.
* Analysis flow is hidden and not in focus.
</div>
</div>
<div style="text-align:center">
<img src="slides/img/excel_data-sheet.png" height="280px">
</div>
</div>
<div class="fragment">
<div class="content-box">
<div class="box-title">Coding</div>
<div class="content">
* Is great for controlling analysis
* Data is hidden.
* Flow is visible.
</div>
</div>
<img src="slides/img/code-example.png" height="280px">
</div>
</div>
<div class="content-box fragment" style="left:15%;width:60%;position:relative">
<div class="box-title green">Develop data science skills</div>
<div class="content">
* Develop good data management and analysis habits.
* Start coding your analysis within spreadsheets.
* Familiarize yourself with a statistics environment such as R, Python or Matlab.
* No need to learn a general-purpose compiled language such as C++ or Java.
</div>
</div>
</div>
# Table
<div style="position:absolute">
"Tabular format of data"
### Header
* one line!
* **good** names of columns
### Rows
* represent observations/entities
### Columns
* represent properties of the observations
* one data type
</div>
<div style="left:50%; position:relative; top:-2em">
<img src="slides/img/excel_data-sheet.png" width="700px">
<div class="fragment" data-fragment-index="3" style="position:absolute">
<img src="slides/img/excel_analyses-sheet.jpeg" width="700px"><br>
</div>
<div class="fragment" data-fragment-index="4" style="position:relative">
<img src="slides/img/red-cross.png" width="700px"><br>
</div>
</div>
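The table rules above (one header line, one observation per row, one data type per column) can be checked automatically. A small sketch using `awk`, with a hypothetical CSV file and hypothetical column names:

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
table="$workdir/measurements.csv"

# Hypothetical example table: one header line, one observation per row.
cat > "$table" <<'EOF'
sample_id,condition,weight_mg
S1,control,12.4
S2,control,11.9
S3,treated,14.1
EOF

# Count rows whose number of fields differs from the header line.
bad_rows="$(awk -F',' 'NR == 1 { n = NF; next } NF != n { bad++ } END { print bad + 0 }' "$table")"

# Count rows whose third column (weight_mg) is not a plain number.
non_numeric="$(awk -F',' 'NR > 1 && $3 !~ /^-?[0-9]+(\.[0-9]+)?$/ { bad++ } END { print bad + 0 }' "$table")"

echo "rows with wrong column count: $bad_rows"
echo "non-numeric weight_mg values: $non_numeric"
```

Both counts are zero for a well-formed table; a merged cell, a second header line or a unit typed into a value would show up as a non-zero count.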
../../2021-04-20_IT101-DM/slides/storage_setup.md
\ No newline at end of file
# Storage set-up
* Install antivirus software
* Regularly update your software and operating system
* Encrypt movable media
<div class="fragment" >
### Backup
* take care of your own backups!
* don't work on your backup copy!
* minimum is the <b>3-2-1 backup rule</b>: 3 copies, on 2 different storage media, with 1 copy off-site
<div style="position:absolute;right:10%;top:10%">
<img src="slides/img/undraw_secure_server_s9u8.png" height="750px">
</div>
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-02" style="color:grey; font-size:0.8em;">Data Storage and Backup Policy</a>
</div>
</div>
<div class="fragment">
### Passwords
* Strong passwords
* Password manager
* Safe password exchange channels
* Expiration time on password share
</div>
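If no password manager is at hand to generate a strong password, a random one can be drawn from the operating system's random source. A minimal sketch, assuming a Unix-like system with `/dev/urandom` and the `base64` tool; the length is an arbitrary choice:

```bash
#!/usr/bin/env bash
set -euo pipefail

# 18 random bytes encode to exactly 24 base64 characters (no padding).
password="$(head -c 18 /dev/urandom | base64)"

echo "$password"
```

Store the result in a password manager rather than in a text file or an email.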
# Storage set-up
## Password exchange channels
<div style="position:relative">
<img src="slides/img/privateBin.png" height="350px">
</div>
<div style="position:absolute;left:65%;top:85%">
* Free service provided by LCSB at <a href="https://privatebin.lcsb.uni.lu" style="color:blue; font-size:0.8em;">privatebin.lcsb.uni.lu</a>
* **LUMS** account is required
* Set expiry period
* Can expire upon first access
* Password only accessible by sender and recipient
</div>
# Storage set-up
## Backup - Central IT/LCSB
<div style="position:relative">
<img src="slides/img/LCSB_storages_backed-up.png" height="750px">
</div>
<div style="position:absolute;left:65%;top:60%">
Server administrators take care of:
* server backups
* LCSB OwnCloud backups
* group/application server backups (not always)
</div>
# Storage set-up
## Backup - personal research data
<div style="position:relative">
<img src="slides/img/LCSB_storages_backup.png" height="750px">
</div>
<div style="position:absolute;left:55%;top:70%">
<font color="red">One version should reside on Atlas!</font>
</div>
../../2021-04-20_IT101-DM/slides/thanks.md
\ No newline at end of file
# Thank you.<sup> </sup>
<center><img src="slides/img/r3-training-logo.png" height="200px"></center>
<br>
<br>
<br>
<br>
<center>
Contact us if you need help:
<a href="mailto:lcsb-r3@uni.lu">lcsb-r3@uni.lu</a>
</center>
<div style="position:absolute">
Links:
HowTo Cards / Policies: https://howto.lcsb.uni.lu/
Course Slides: https://courses.lcsb.uni.lu/
Internal Presentations: https://presentations.lcsb.uni.lu/
LCSB GitLab: https://gitlab.lcsb.uni.lu/
HPC: https://hpc.uni.lu/
Service Portal: https://service.uni.lu/sp
LCSB intranet: https://intranet.uni.lux
</div>
<div style="position:relative;top:1.5em;left:55%;width:45%">
Available SW and tools:
<div style="margin-left: 20px;">
SIU managed:
&ensp; - Service Portal > All Catalogs > IT > Softwares
</div>
<div style="margin-left: 20px;">
LCSB managed:
&ensp; - Service Portal > Knowledge > FAQ - Corporate Software\
&ensp; - LCSB intranet > Science tab > Tools
</div>
</div>
../../2021-04-20_IT101-DM/slides/visualization.md
\ No newline at end of file
# Visualization
<center>
**Plot your data!**
<figure>
<img src="slides/img/DinoSequentialSmaller.gif" height="500px">
<blockquote>"never trust summary statistics alone; always visualize your data"</blockquote>
<figcaption>--Alberto Cairo</figcaption>
</figure>
</center>
# Visualization
<center>
**Plot your data!**
<figure>
<img src="slides/img/plot-data.png" height="800px">
</figure>
</center>
# Best practices
* `pull` before `push` and, generally, before starting to work
* Work on your <font color="red">own</font> branch (in your own fork), and **not** on `master` and **not** on `develop`
* Do **not push** to `master`, but **submit a Pull Request (PR)**
* Get your code **reviewed** by your peers (submit a PR!)
* Submit a PR **often**!
* `clone` a repository, do not download the `.zip` file.
* Do **not** combine `git` commands
```bash
$ git commit -am "myMessage" # do not do this
```
* Stage only one file at a time using
```bash
$ git add myFile.md
```
* Commit **only a few files** at once (after multiple separate `git add` commands)
* `Push` often - avoid conflicts
Remember: **A `push` a day keeps conflicts away!**
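
The practices above can be combined into one sequence. The following is a sketch only: a local repository in a temporary directory stands in for your fork, and the branch name, file name and user identity are hypothetical. The Pull Request itself is opened in the web interface and is not shown.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# A local repository stands in for the remote fork (hypothetical setup).
git init --quiet "$workdir/fork"
git -C "$workdir/fork" config user.name "Example User"
git -C "$workdir/fork" config user.email "user@example.org"
git -C "$workdir/fork" commit --quiet --allow-empty -m "Initial commit"

# clone the repository instead of downloading the .zip file
git clone --quiet "$workdir/fork" "$workdir/courses"
cd "$workdir/courses"
git config user.name "Example User"
git config user.email "user@example.org"

# pull before starting to work
git pull --quiet --ff-only

# work on your own branch, not on master or develop
git checkout --quiet -b myBranch

# stage one file, then commit it as a separate step
echo "# My notes" > myFile.md
git add myFile.md
git commit --quiet -m "Add myFile.md"

# push the branch; the Pull Request is then submitted from it
git push --quiet origin myBranch

committed_files="$(git show --name-only --format= HEAD)"
echo "files in last commit: $committed_files"
```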