Merge branch 'file_naming' into 'develop'

File naming

See merge request R3/outreach/presentations!21

Commit d3b1e26d authored 5 years ago by Laurent Heirendt
--- a/2019/2019-06-25_file-naming/slides/checksums_intro.md
+++ b/2019/2019-06-25_file-naming/slides/checksums_intro.md
+## Checksums - What are they?
+
+<center><img src="slides/img/checksum.png" height="600px"></center>
+
+
+
+# Checksums - When?
+
+* transmission
+  - new dataset from collaborator
+  - upload to remote repository
+
+* long term storage
+  - master version of dataset
+  - snapshot of data for publication
+
+
+
+# Checksums - How?
+
+* Linux and macOS
+  - command-line tools `shasum`, `md5sum` (Linux), `md5` (macOS)
+```bash
+shasum -a 256 my_file.csv > my_file.sha256
+```
+  - verifying the checksum
+```bash
+shasum -c my_file.sha256
+my_file.csv: OK
+```
+
+* Windows
+  - creation
+```
+certutil -hashfile my_file.csv SHA256 > my_file.sha256
+```
+  - verification - re-run **certutil** and manually compare the generated checksum with the stored one
+
+  - **MD5Summer** - free checksum tool with GUI
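
As a cross-platform alternative to the commands on this slide, the same create-and-verify cycle can be sketched in Python with only the standard library `hashlib`. The helper names `sha256_of`, `write_checksum` and `verify` are hypothetical, and the checksum file is assumed to use the `<hex digest>  <file name>` layout that `shasum` writes.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(path):
    """Write '<digest>  <name>' next to the file (hypothetical helper)."""
    path = Path(path)
    checksum_file = path.with_suffix(".sha256")
    checksum_file.write_text(f"{sha256_of(path)}  {path.name}\n")
    return checksum_file


def verify(checksum_file):
    """Recompute the digest and compare it with the stored one."""
    checksum_file = Path(checksum_file)
    expected, name = checksum_file.read_text().split(maxsplit=1)
    return sha256_of(checksum_file.parent / name.strip()) == expected
```

Reading in chunks keeps memory use flat for large datasets. `verify` returns `False` as soon as a single byte of the data file differs from the version that was hashed.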
--- a/2019/2019-06-25_file-naming/slides/final-notes.md
+++ b/2019/2019-06-25_file-naming/slides/final-notes.md
+
+# Final notes
+
+* plan ahead
+
+* follow guidelines of your team
+
+* ensure proper dissemination of your naming policy
+
+* consistency overrules personal preference
--- a/2019/2019-06-25_file-naming/slides/human_readable.md
+++ b/2019/2019-06-25_file-naming/slides/human_readable.md
+
+# Human readable
+name carries information about content
+* use English
+
+* balance name length against clarity
+
+* don't be too creative and stay professional
+<!--   "PEPA_d-pic.jpeg" - a fourth **pic**ture from your paper on **P**erformance **E**valuation **P**rocess **A**lgebra -->
+
+
+* **never** use suffixes (or prefixes) like **"final"**, **"old"**, **"new"**, **"current"**, **"obsolete"**, **"recent"**, **"latest"**, **"best"**...
+
+* leave out meaningless words, e.g. "the", "and", "a", "file", "data" ...
+
+
+
+# Human readable
+* do not forget about extensions
+  ```
+  Iris-setosa-table.csv  
+  video-2019-annual_meeting.avi
+  2019-12-11-notes.log  
+  ATAC-seq1-London-mapped.bam  
+  A2452-description_tutorial.info
+  ```
+* be specific
+
+| Bad name                  | Better name                                           |
+| ------------------------- | ----------------------------------------------------- |
+| myabstract.txt            | John-White_Sensitivity-of-PLFA-analyses_abstract.txt  |
+| samples_project_start.csv | PA324_samples_2019-12-11.csv                          |
+| ms_cresp_final.doc        | John-White_Cell-respiration_manuscript_2019-12-11.doc |
+| fig_1.png                 | John-White_Cell-respiration_fig-1_2019-12-11.png      |
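
The rule against status and filler words can be checked mechanically. The sketch below is one possible approach, not part of the slides: the function name `discouraged_words` is hypothetical, the word list is copied from the bullets in this file, and names are assumed to be tokenised on `_`, `-`, `.` and spaces.

```python
import re

# Status words and filler words that the slides advise against.
DISCOURAGED = {
    "final", "old", "new", "current", "obsolete", "recent", "latest", "best",
    "the", "and", "a", "file", "data",
}


def discouraged_words(filename):
    """Return the discouraged words found in a file name, in order."""
    tokens = re.split(r"[_\-. ]+", filename.lower())
    return [token for token in tokens if token in DISCOURAGED]
```

For `ms_cresp_final.doc` the function reports `final`, while the "better" names in the table produce an empty list.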
--- a/2019/2019-06-25_file-naming/slides/img/checksum.png
+++ b/2019/2019-06-25_file-naming/slides/img/checksum.png
--- a/2019/2019-06-25_file-naming/slides/img/favicon.ico
+++ b/2019/2019-06-25_file-naming/slides/img/favicon.ico
--- a/2019/2019-06-25_file-naming/slides/img/r3-training-logo.png
+++ b/2019/2019-06-25_file-naming/slides/img/r3-training-logo.png
--- a/2019/2019-06-25_file-naming/slides/index.md
+++ b/2019/2019-06-25_file-naming/slides/index.md
+## File naming
+<br><br>
+#### June 25th, 2019
+
+<div style="top: 6em; left: 0%; position: absolute;">
+    <img src="theme/img/lcsb_bg.png">
+</div>
+
+<div style="top: 5em; left: 60%; position: absolute;">
+    <img src="slides/img/r3-training-logo.png" height="200px">
+    <br><br><br><br>
+    <h3>Naming files and checksums</h3>
+    <br><br><br><br>
+    <h4>
+        Vilem Ded<br><br>
+        vilem.ded@uni.lu<br><br>
+        <i>Luxembourg Centre for Systems Biomedicine</i>
+    </h4>
+</div>
--- a/2019/2019-06-25_file-naming/slides/list.json
+++ b/2019/2019-06-25_file-naming/slides/list.json
+[
+    {
+        "filename": "index.md",
+        "attr": {
+        }
+    },
+    {
+        "filename": "three_principles.md",
+        "attr": {
+        }
+    },
+    {
+        "filename": "machine_readable.md",
+        "attr": {
+        }
+    },
+    {
+        "filename": "human_readable.md",
+        "attr": {
+        }
+    },
+    {
+        "filename": "ordering.md",
+        "attr": {
+        }
+    },
+    {
+        "filename": "final-notes.md",
+        "attr": {
+        }
+    },
+    {
+        "filename": "checksums_intro.md",
+        "attr": {
+        }
+    },
+    {
+        "filename": "thanks.md",
+        "attr": {
+        }
+    }
+]
--- a/2019/2019-06-25_file-naming/slides/machine_readable.md
+++ b/2019/2019-06-25_file-naming/slides/machine_readable.md
+
+# Machine readable
+
+* case sensitivity
+
+* <span style="color:red">special characters</span>:
+ **&#35;&#36;&#37;&#38;&#39;&#40;&#34;&#41;&#42;&#43;&#44; &#45;&#46;&#47;&#58;&#59;&#60;&#61;&#62;&#63;&#64;&#91;&#92; &#93;&#94;&#95;&#96;&#123;&#124;&#125;&#126;** and
+whitespace characters such as **space** or **tab**
+
+* <span style="color:red">accented characters</span>:
+  **&#231;**, **&#228;**, **&#244;**,
+ **&#283;**, **&#341;**, ...
+
+* <span style="color:red">lack of consistency</span>:
+```
+2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFractions_B03.csv
+2013-06-26_BRAFWTNEGASSAY_Plasmid-Celline-100-1MutantFraction_B03.csv
+2013-06-26_BRAFWTNEGASSAY_Plazmid-Cellline-100-1MutantFraction_B03.csv
+2013-06-26_BRAFWTNEGASSAY_plasmid-Celline-100-1MutantFraction_B03.csv
+```
+
+
+
+# Machine readable
+So what should I use?
+
+* alphanumeric characters
+* hyphen "**&#45;**" to combine t-h-i-n-g-s
+* underscore "**&#95;**" to separate
+
+Example:  
+
+"projectID_method-name_sampleID_YYYY-MM-DD.ext"  
+"Author-Name_Paper-Name_manuscript_YYYY-MM-DD.doc"
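
The character rules above can be turned into a quick check. This is a minimal sketch under stated assumptions: the pattern `SAFE_NAME` and the function `is_machine_readable` are hypothetical names, and a name is assumed to consist of ASCII alphanumerics, hyphens and underscores followed by a single alphanumeric extension.

```python
import re

# ASCII alphanumerics, hyphens and underscores, then one alphanumeric extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")


def is_machine_readable(filename):
    """True if the whole name uses only the recommended characters."""
    return SAFE_NAME.fullmatch(filename) is not None
```

The check covers characters only. A name such as `ATACseq1Londonmapped.bam` passes even though it has no separators, so readability still needs a human eye.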
+
+
+
+
+# Machine readable
+**Ugly** names:
+```
+ "Hlad.jez.M-L-průtoky JíObj.z Ohře-od 10-2011.xlsx"
+ "Finacial detailes BIocore 19/11/12.xls"
+ "ATACseq1Londonmapped.bam"
+```
+
+**Nice** names:
+```
+Hipo-1631_CA1_iba-1488_GFAP-647.tiff
+2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A01.csv
+2013-06-26_BRAFWTNEGASSAY_FFPEDNA-CRC-1-41_A01.csv
+PI102_Mouse12_EEG_2018-11-03_1245.tsv
+Bioinfiniti_FullProposal_2018-11-15_1655.docx
+```
+
+
+
+# Machine readable
+ * globbing and pattern search
+ ```{r}
+  > list.files(pattern = "Plasmid")
+ ```
+<!--   Python:   re.search("Plasmid", txt)
+  Bash:     ls -l *Plasmid* -->
+ Result
+ ```
+ ...
+2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A01.csv  
+2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A02.csv  
+2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A03.csv  
+2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_B01.csv  
+2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_B02.csv  
+2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_B03.csv  
+...
+```
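
The same pattern search can be sketched in Python. The list `names` below is illustrative sample data assembled from file names used elsewhere in these slides. `fnmatch.fnmatchcase` is used so that matching stays case-sensitive on every platform.

```python
import fnmatch

names = [
    "2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A01.csv",
    "2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A02.csv",
    "2013-06-26_BRAFWTNEGASSAY_FFPEDNA-CRC-1-41_A01.csv",
]

# Shell-style wildcard match, the same pattern shape as `ls *Plasmid*`.
plasmid_files = [name for name in names if fnmatch.fnmatchcase(name, "*Plasmid*")]
```

For files on disk rather than an in-memory list, `glob.glob("*Plasmid*")` applies the same wildcard syntax to the current directory.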
+
+
+
+# Machine readable
+ * (meta)data extraction
+ ```{r}
+library(stringr)  # provides str_split_fixed()
+str_split_fixed(flist, "[_\\.]", 5)  # split on "_" and "." into 5 columns
+```
+<!-- names(flist_df) <- c("Date", "Project", "Origin", "SampleID", "Format") -->
+
+| Date         | Project          | Origin                                 | SampleID | Format |
+|--------------|------------------|----------------------------------------|----------|--------|
+| "2013-06-26" | "BRAFWTNEGASSAY" | "Plasmid-Cellline-100-1MutantFraction" | "A01"    | "csv"  |
+| "2013-06-26" | "BRAFWTNEGASSAY" | "Plasmid-Cellline-100-1MutantFraction" | "A02"    | "csv"  |
+| "2013-06-26" | "BRAFWTNEGASSAY" | "Plasmid-Cellline-100-1MutantFraction" | "A03"    | "csv"  |
+| "2013-06-26" | "BRAFWTNEGASSAY" | "Plasmid-Cellline-100-1MutantFraction" | "B01"    | "csv"  |
+| "2013-06-26" | "BRAFWTNEGASSAY" | "Plasmid-Cellline-100-1MutantFraction" | "B02"    | "csv"  |
+| "2013-06-26" | "BRAFWTNEGASSAY" | "Plasmid-Cellline-100-1MutantFraction" | "B03"    | "csv"  |
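
A rough Python counterpart of the split shown on this slide, as a sketch only: the field names in `FIELDS` mirror the table header, and `parse_name` is a hypothetical helper that rejects names that do not yield exactly five fields.

```python
import re

FIELDS = ("date", "project", "origin", "sample_id", "format")


def parse_name(filename):
    """Split a name on '_' and '.' into the five fields shown in the table."""
    parts = re.split(r"[_.]", filename, maxsplit=4)
    if len(parts) != len(FIELDS):
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return dict(zip(FIELDS, parts))
```

This only works because hyphens join words inside a field while underscores separate fields. An inconsistent name fails loudly instead of silently shifting columns.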
--- a/2019/2019-06-25_file-naming/slides/ordering.md
+++ b/2019/2019-06-25_file-naming/slides/ordering.md
+
+## Default ordering
+use of inbuilt ordering
+
+*  terms in general-to-specific order - logical ordering  
+"projectID_method-name_sampleID_YYYY-MM-DD.ext"
+<!-- TODO -->
+* date first - chronological ordering
+
+* digits first - explicit ordering
+
+
+
+## Default ordering
+use of inbuilt ordering
+
+* terms in general-to-specific order - logical ordering
+
+<!-- TODO -->
+* date first - chronological ordering
+
+  ```
+  2013-06-26_Plasmid_A01.csv
+  2014-06-26_Plasmid_C02.csv  
+  2015-06-30_Plasmid_A03.csv  
+  2015-07-12_Plasmid_B01.csv  
+  2015-07-13_Plasmid_B02.csv  
+  2015-11-10_Plasmid_B03.csv  
+  ```
+* digits first - explicit ordering
+
+
+
+## Default ordering
+use of inbuilt ordering
+
+* terms in general-to-specific order - logical ordering
+
+<!-- TODO -->
+* date first - chronological ordering
+
+* digits first - explicit ordering
+  ```
+  01_Plasmid_A01_2013-06-26.csv
+  02_Plasmid_C02_2014-06-26.csv  
+  03_Plasmid_A03_2015-06-30.csv  
+  10_Plasmid_B01_2015-07-12.csv  
+  11_Plasmid_B02_2015-07-13.csv  
+  25_Plasmid_B03_2015-11-10.csv  
+  ```
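
Both orderings rely on plain string comparison, which a short Python sketch can make concrete. The date-first names come from the slides. The unpadded names `1_Plasmid.csv`, `2_Plasmid.csv` and `10_Plasmid.csv` are hypothetical, added only to show why the numeric prefixes are zero-padded.

```python
# ISO 8601 dates (YYYY-MM-DD) sort chronologically as plain strings.
iso_names = [
    "2015-07-12_Plasmid_B01.csv",
    "2013-06-26_Plasmid_A01.csv",
    "2015-11-10_Plasmid_B03.csv",
]
iso_sorted = sorted(iso_names)

# Without zero padding, "10" sorts before "1_" and "2_" because strings
# are compared character by character.
unpadded_sorted = sorted(["2_Plasmid.csv", "10_Plasmid.csv", "1_Plasmid.csv"])

# With zero padding, string order and numeric order agree.
padded_sorted = sorted(["02_Plasmid.csv", "10_Plasmid.csv", "01_Plasmid.csv"])
```

Day-first or month-first dates lose this property, which is why the slides use the `YYYY-MM-DD` form throughout.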
--- a/2019/2019-06-25_file-naming/slides/summary.md
+++ b/2019/2019-06-25_file-naming/slides/summary.md
+## Three main principles
+* Machine readable:
+   * easily search for files later
+   * easily narrow file lists based on names
+   * easily extract info from file names, e.g. by splitting, regex,...
+* Human readable:
+   * easily understand what the file is and what it contains
+   * easily share files with others
+* Plays well with default ordering:
+   * logically ordered/clustered
+   * fast manual search
--- a/2019/2019-06-25_file-naming/slides/thanks.md
+++ b/2019/2019-06-25_file-naming/slides/thanks.md
+## Thank you.
+
+<center><img src="slides/img/r3-training-logo.png" height="150px"></center>
+
+Contact us if you need help:
+<a href="mailto:r3lab.core@uni.lu">r3lab.core@uni.lu</a>
+<br><br>
+
+Resources:  
+Jenny Bryan's [slides](https://speakerdeck.com/jennybc/how-to-name-files) on "Naming things" from the Reproducible Science Workshop, Duke, 2015  
+Semantic versioning - [semverdoc.org](https://semverdoc.org/)  
+LCSB *IT101* training [presentation](https://git-r3lab.uni.lu/R3/labCards/uploads/738930b9a533a2f308cc62c431d9246f/it101.html)  
--- a/2019/2019-06-25_file-naming/slides/three_principles.md
+++ b/2019/2019-06-25_file-naming/slides/three_principles.md
+
+
+## Three main principles
+* Machine readable:
+   * search for files later
+   * narrow file lists based on names
+   * extract info from file names, e.g. by splitting, regex,...
+
+* Human readable:
+   * understand what the file is and what it contains
+   * share files with others
+
+* Plays well with default ordering:
+   * logically ordered/clustered
+   * fast manual search