formatting of naming card

Verified Commit aeb4d844 authored 5 years ago by Laurent Heirendt
--- a/external/integrity/naming/file_naming.md
+++ b/external/integrity/naming/file_naming.md
@@ -7,63 +7,65 @@ redirect_from:
  - /external/cards/integrity:naming
 ---
 # Naming files
+
 (Re)naming a file is a very easy operation, usually one or two clicks away (*right click+rename, F2, ...*). Maybe that's why people do not pay enough attention when choosing a proper file name, even though it can have a big impact on their ability to find those files later and to understand what they contain.

 A good file name follows three basic principles:

- * machine readable
- * human readable
- * plays well with default ordering
+* machine readable
+* human readable
+* plays well with default ordering

 ## Machine readable
-  Special characters can have different meanings for different operating systems or software. The most commonly found are
+
+Special characters can have different meanings for different operating systems or software. The most commonly found are

 **&#35;&#36;&#37;&#38;&#39;&#40;&#34;&#41;&#42;&#43;&#44;&#45;&#46;&#47;&#58;&#59;&#60;&#61;&#62;&#63;&#64;&#91;&#92;&#93;&#94;&#95;&#96;&#123;&#124;&#125;&#126;**
  and
 whitespace characters like **space** or **tab**.

-  The only two which are recommended in file names are the hyphen "**&#45;**" and the underscore "**&#95;**". You can use underscores to separate fields and hyphens to combine words within a field.
-  File name
-
-```
-2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A01.csv
-```
-
+The only two which are recommended in file names are the hyphen "**&#45;**" and the underscore "**&#95;**". You can use underscores to separate fields and hyphens to combine words within a field.
+The file name `2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A01.csv`
 already tells us the date of creation (2013-06-26), the assay (BRAFWTNEGASSAY), the sample set (Plasmid-Cellline-100-1MutantFraction) and the well (A01), while the following names

-```
+```text
 2013-06-26-BRAFWTNEGASSAY-Plasmid-Cellline-100-1MutantFraction-A01.csv
 .csv
 2013_06_26_BRAFWTNEGASSAY_Plasmid_Cellline_100_1MutantFraction_A01.csv
 ```
+
 are much more prone to misinterpretation.
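
The rule above can be checked automatically. The following is only a minimal sketch in base R: the helper name `is_machine_readable` and the exact pattern are assumptions made for illustration (letters, digits, hyphens and underscores, followed by one dot and an extension), not part of any established tool.

```r
# Hypothetical helper: TRUE when a file name consists only of ASCII letters,
# digits, hyphens and underscores, followed by a single dot and an extension.
is_machine_readable <- function(x) {
  grepl("^[A-Za-z0-9_-]+\\.[A-Za-z0-9]+$", x, perl = TRUE)
}

candidates <- c(
  "2013-06-26_BRAFWTNEGASSAY_A01.csv",  # only recommended characters
  "my file (final).csv",                # spaces and parentheses
  ".csv"                                # no name in front of the extension
)
ok <- is_machine_readable(candidates)
print(data.frame(name = candidates, ok = ok))
```

A check like this only tells you that a name is safe to process; whether it is also meaningful is still up to you.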
-#### Accented characters
-  Your language might be very rich in accented or special characters,
-  but both your colleagues and your machines will have a hard time working with them.
-  Special letters like  **&#231;**, **&#228;**, **&#244;**,
-  **&#283;**, **&#341;**, etc. require special encoding and might cause trouble when used in file names.

+## Accented characters
+
+Your language might be very rich in accented or special characters,
+but both your colleagues and your machines will have a hard time working with them.
+Special letters like  **&#231;**, **&#228;**, **&#244;**,
+**&#283;**, **&#341;**, etc. require special encoding and might cause trouble when used in file names.

 Beware of typos and avoid using multiple names that vary in small ways unless the difference carries real meaning. The following file names are distinct, but can you tell where exactly they differ?

-```
+```text
 2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFractions_B03.csv
 2013-06-26_BRAFWTNEGASSAY_Plasmid-Celline-100-1MutantFraction_B03.csv
 2013-06-26_BRAFWTNEGASSAY_Plazmid-Cellline-100-1MutantFraction_B03.csv
 ```

-#### Exploiting machine readable names
+## Exploiting machine readable names
+
 You may already have a lot of files collected for your project, or you may have received a big dataset from one of your collaborators. Then you might think about organizing and renaming them to be compliant with your new or existing naming policy.
 If the names are consistent and you don't want to lose time renaming them by hand, you may try to use dedicated tools (e.g. [PSRenamer](https://www.powersurgepub.com/products/psrenamer/index.html)) or simple commands in your command line (**mv** or **rename** on Mac and Linux, **ren** on Windows).
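
Batch renaming can also be scripted. The sketch below uses base R's `file.rename` on throw-away files in a temporary directory; the file names and the rule (replace spaces by underscores) are assumptions chosen only for illustration. Try such a script on a copy of your data first.

```r
# Create a scratch directory with two badly named (empty) files.
demo_dir <- tempfile("rename-demo-")
dir.create(demo_dir)
old_names <- c("Plasmid A01.csv", "Plasmid A02.csv")
invisible(file.create(file.path(demo_dir, old_names)))

# Build the new names: every space becomes an underscore.
new_names <- gsub(" ", "_", old_names, fixed = TRUE)

# Rename all files in one vectorised call; TRUE marks each success.
renamed <- file.rename(file.path(demo_dir, old_names),
                       file.path(demo_dir, new_names))
print(list.files(demo_dir))
```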

 Once your skills develop, you will be able to use machines and machine readable file names to perform advanced operations on them, e.g. searching with regular expressions.
 Imagine a folder with thousands of files. Running a simple R command
-```
+
+```R
 flist <- list.files(pattern = "Plasmid")
 ```
+
 will give you all file names containing the word "Plasmid".

-```
+```text
 2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A01.csv
 2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A02.csv
 2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A03.csv
@@ -74,7 +76,7 @@ will give you all file names containing word "Plasmid".

 This result can easily be processed further into an awesome meta-data table by splitting the names at underscores and dots:

-```
+```R
 flist_df <- stringr::str_split_fixed(flist, "[_\\.]", 5)
 colnames(flist_df) <- c("Date", "Assay", "Sample_set", "Well", "Format")
 ```
@@ -90,24 +92,24 @@ names(flist_df) <- c("Date", "Assay", "Sample_set", "Well", "Format")

 Of course, similarly simple and powerful commands can be found in every programming language/interpreter (Python, Bash, ...).
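
For readers without `stringr`, the same split can be sketched with base R only. The two file names below are copied from the listing above; the object names `parts` and `meta` are made up for this example.

```r
flist <- c(
  "2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A01.csv",
  "2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A02.csv"
)

# Split every name at underscores and dots; each name yields five fields.
parts <- strsplit(flist, "[_.]")

# Stack the fields row by row and label the columns.
meta <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
colnames(meta) <- c("Date", "Assay", "Sample_set", "Well", "Format")

# ISO 8601 dates convert to a proper Date column without a format string.
meta$Date <- as.Date(meta$Date)
print(meta)
```

This only works because hyphens are used inside a field and underscores between fields; a stray underscore in one name would shift its columns.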

-#### Case sensitivity
+## Case sensitivity
+
 It is generally recommended **not** to use upper case letters.
 Firstly, matching patterns and splitting names with upper case letters is much harder and more error prone. Another drawback is that file systems treat case differently: Windows (and, by default, macOS) is case insensitive, whereas Linux is case sensitive.

 If you really want to extend the hyphen-underscore semantic separation, you can use so-called [**camelCase**](https://en.wikipedia.org/wiki/Camel_case) - replacing the spaces between words by upper-casing their first letters.
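
To see why mixed case causes trouble, here is a small base R sketch; the file names are invented for illustration. It finds names that differ only by case (and would therefore clash on a case-insensitive file system) and shows that pattern matching is case sensitive unless told otherwise.

```r
files <- c("Plasmid_A01.csv", "plasmid_a01.csv", "Plasmid_A02.csv")

# Names that become identical after lower-casing would collide.
lower <- tolower(files)
collides <- duplicated(lower) | duplicated(lower, fromLast = TRUE)
print(files[collides])

# A case-sensitive search misses the lower-case variant ...
strict <- grepl("Plasmid", files)
# ... unless the match is explicitly made case insensitive.
relaxed <- grepl("Plasmid", files, ignore.case = TRUE)
```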

+## Machine readable names allow us to

-
-
-#### Machine readable names allow us to:
- * easily search for files later
- * easily narrow file lists based on names
- * easily extract info from file names, e.g. by splitting
+* easily search for files later
+* easily narrow file lists based on names
+* easily extract info from file names, e.g. by splitting

 Remember that the rules on machine readability also apply to naming your **folders** (now containing your nicely named files). In fact, it is good practice to stick to these rules even when naming **variables** in your data files.
+
 ## Human readable

-  * Be specific. It is generally better to create a longer file name that fulfills its purpose than to use short abbreviations that might be hard to grasp for your colleagues, and eventually for yourself after some time. Stay away from cryptic names and non-standard or unclear abbreviations.
+* Be specific. It is generally better to create a longer file name that fulfills its purpose than to use short abbreviations that might be hard to grasp for your colleagues, and eventually for yourself after some time. Stay away from cryptic names and non-standard or unclear abbreviations.

 | Bad name                  | Better name                                           |
 | ------------------------- | ----------------------------------------------------- |
@@ -116,92 +118,110 @@ Remember that the rules on machine readability apply also for naming your **fold
 | ms_cresp_final.doc        | John-White_Cell-respiration-manuscript_2019-12-11.doc |
 | fig_1.png                 | John-White_Cell-respiration_fig-1_2019-12-11.png      |

-  * Usually, the file extension already tells you something about the file itself.
-  Here are some examples of file names which are unnecessarily long and could be easily shortened:
-  `````
-  Iris-setosa_table.csv
-  video_2019_annual-meeting.avi
-  2019-12-11_notes.log
-  ATAC_seq1_London_mapped.bam
-  A2452_description-tutorial.info
-  `````
-  * Never use suffixes (or prefixes) like **"final"**, **"old"**, **"new"**, **"current"**, **"obsolete"**, **"recent"**, **"latest"**, **"best"**...
-  A file is hardly ever in such a state, and it will change sooner or later anyway.
-
-  * The name should naturally explain why the file exists. If you have to search for additional information (either asking your colleagues or reading some README files), the file name is probably not chosen properly. Name the file in a way that even a total stranger could understand it easily.
-
-  * Leave out meaningless or redundant words, e.g. "the", "and", "a", "file", "data" ...
-
-  * Do not be too creative, do not pun and stay professional. Bad examples:
-
-  ```
-  bio-rect_UM.csv - data related to bio-reactors at University of Michigan
-  PEPA_d-pic.jpeg - a fourth picture from your paper on Performance Evaluation Process Algebra
-  ```
-#### Semantic versioning
+* Usually, the file extension already tells you something about the file itself.
+
+Here are some examples of file names which are unnecessarily long and could be easily shortened:
+
+```text
+Iris-setosa_table.csv
+video_2019_annual-meeting.avi
+2019-12-11_notes.log
+ATAC_seq1_London_mapped.bam
+A2452_description-tutorial.info
+```
+
+* Never use suffixes (or prefixes) like **"final"**, **"old"**, **"new"**, **"current"**, **"obsolete"**, **"recent"**, **"latest"**, **"best"**...
+A file is hardly ever in such a state, and it will change sooner or later anyway.
+
+* The name should naturally explain why the file exists. If you have to search for additional information (either asking your colleagues or reading some README files), the file name is probably not chosen properly. Name the file in a way that even a total stranger could understand it easily.
+
+* Leave out meaningless or redundant words, e.g. "the", "and", "a", "file", "data" ...
+
+* Do not be too creative, do not pun and stay professional. Bad examples:
+
+```text
+bio-rect_UM.csv - data related to bio-reactors at University of Michigan
+PEPA_d-pic.jpeg - a fourth picture from your paper on Performance Evaluation Process Algebra
+```
+
+## Semantic versioning
+
 If your files or documents change very often and you want to track the versions manually instead of using some sophisticated versioning software<!-- TODO: link to GIT howto-card -->, you might follow the semantic versioning scheme widely used in software development.
 It is based on adding several numbers (the standard is three) as a suffix of your file name, where:

-    * the first number, called the **MAJOR** version, is increased once the document has undergone **significant changes**
-    * the second number, called the **MINOR** version, is incremented once some new information is added to the document or something is deleted
-    * the last number, called **PATCH**, refers to very minor changes like fixing typos or rephrasing a sentence.
-
-  These numbers can be preceded by the letter „V“ to indicate that version information follows.
+* the first number, called the **MAJOR** version, is increased once the document has undergone **significant changes**
+* the second number, called the **MINOR** version, is incremented once some new information is added to the document or something is deleted
+* the last number, called **PATCH**, refers to very minor changes like fixing typos or rephrasing a sentence.

+These numbers can be preceded by the letter „V“ to indicate that version information follows.
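
The scheme can be sketched as a small base R function. The name `bump_version` is hypothetical, and resetting the lower numbers to zero when a higher one increases is an assumption taken from the usual semantic versioning convention rather than from this card.

```r
# Hypothetical helper: increase one part of a "v1.2.3"-style version string.
bump_version <- function(version, level = c("major", "minor", "patch")) {
  level <- match.arg(level)
  # Drop an optional leading "v"/"V" and split into three integers.
  v <- as.integer(strsplit(sub("^[vV]", "", version), ".", fixed = TRUE)[[1]])
  v <- switch(level,
    major = c(v[1] + 1L, 0L, 0L),    # significant changes
    minor = c(v[1], v[2] + 1L, 0L),  # information added or deleted
    patch = c(v[1], v[2], v[3] + 1L) # typos, rephrasing
  )
  paste0("v", paste(v, collapse = "."))
}

after_typo_fix <- bump_version("v1.2.3", "patch")
after_new_section <- bump_version("v1.2.3", "minor")
after_rewrite <- bump_version("v1.2.3", "major")
```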

 Human readable names allow us to:
-  * easily understand what the file is and what it contains
-  * easily share files with others
+
+* easily understand what the file is and what it contains
+* easily share files with others

 ## Default ordering
+
 Built-in tools (e.g. the file explorer) allow you to order files by name in alphanumerical order. Make the best out of this great feature.

-  * Put the terms in general-to-specific order. That way, you will have files grouped in logical order and related files will be naturally close to each other.
-    ```
-    Ares-triticum_samples_redundant_2010-04-12.csv
-    Ares-hordeum_samples_redundant_2010-05-12.csv
-    Iris-setosa_samples_1927-05-12.csv
-    Iris-setosa_samples_1954-06-24.csv
-    Iris-versicolor_samples_1945-04-12.csv
-    ```
-  * Put the date first to get chronological ordering:
-    ```
-    2013-06-26_Plasmid_A01.csv
-    2014-06-26_Plasmid_C02.csv
-    2015-06-30_Plasmid_A03.csv
-    2015-07-12_Plasmid_B01.csv
-    2015-07-13_Plasmid_B02.csv
-    2015-11-10_Plasmid_B03.csv
-    ```
-  * Put a number defining the explicit order first. Remember that the ordering is done character by character, not by the whole number, so you might want to add leading zeros to be sure that the ordering stays correct with a growing number of files.
-    ```
-    01_Plasmid_A01_2013-06-26.csv
-    02_Plasmid_C02_2014-06-26.csv
-    03_Plasmid_A03_2015-06-30.csv
-    10_Plasmid_B01_2015-07-12.csv
-    11_Plasmid_B02_2015-07-13.csv
-    25_Plasmid_B03_2015-11-10.csv
-    ```
+* Put the terms in general-to-specific order. That way, you will have files grouped in logical order and related files will be naturally close to each other.
+
+```text
+Ares-triticum_samples_redundant_2010-04-12.csv
+Ares-hordeum_samples_redundant_2010-05-12.csv
+Iris-setosa_samples_1927-05-12.csv
+Iris-setosa_samples_1954-06-24.csv
+Iris-versicolor_samples_1945-04-12.csv
+```
+
+* Put the date first to get chronological ordering:
+
+```text
+2013-06-26_Plasmid_A01.csv
+2014-06-26_Plasmid_C02.csv
+2015-06-30_Plasmid_A03.csv
+2015-07-12_Plasmid_B01.csv
+2015-07-13_Plasmid_B02.csv
+2015-11-10_Plasmid_B03.csv
+```
+
+* Put a number defining the explicit order first. Remember that the ordering is done character by character, not by the whole number, so you might want to add leading zeros to be sure that the ordering stays correct with a growing number of files.
+
+```text
+01_Plasmid_A01_2013-06-26.csv
+02_Plasmid_C02_2014-06-26.csv
+03_Plasmid_A03_2015-06-30.csv
+10_Plasmid_B01_2015-07-12.csv
+11_Plasmid_B02_2015-07-13.csv
+25_Plasmid_B03_2015-11-10.csv
+```
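
The effect of leading zeros is easy to reproduce. This base R sketch uses made-up sequence numbers; `method = "radix"` is chosen so that the sort compares plain character codes regardless of your locale.

```r
ids <- c(1L, 2L, 10L, 25L)

# Sorted as text, "10" comes before "2" because "1" < "2".
as_text <- sort(as.character(ids), method = "radix")

# Padding to a fixed width of two digits restores the numeric order.
padded <- sprintf("%02d_Plasmid.csv", ids)
in_order <- sort(padded, method = "radix")
print(in_order)
```

Pick the width with some room to spare: two digits stop working at the hundredth file.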

 ## Dates
-  Including the date in your file names allows you to sort them easily and find exactly the one you want in a very short time.
+
+Including the date in your file names allows you to sort them easily and find exactly the one you want in a very short time.
 Remember that recording dates using anything other than numbers (e.g. month abbreviations) can, due to different language backgrounds, result in formats like "*11dic2019*" or "*11Dez2019*", etc., which might not be recognized as dates at all.
 It is much better to use a purely numeric format, but even then a date can be written in endless variations that are hard to read or, more importantly, ambiguous, like the date **11th of December 2019** in the following examples:
+
+```text
+19/11/12
+19/12/11
+20191112
+11.12.2019
+11-12-19
+...
 ```
-  19/11/12
-  19/12/11
-  20191112
-  11.12.2019
-  11-12-19
-  ...
+
+Luckily, there is a standard for date format, YYYY-MM-DD ([*ISO 8601*](https://en.wikipedia.org/wiki/ISO_8601)), which complies nicely with all three principles above. Therefore, the **only** correct format of 11th of December 2019 is:
+
+```text
+2019-12-11
 ```
-  Luckily, there is a standard for date format, YYYY-MM-DD ([*ISO 8601*](https://en.wikipedia.org/wiki/ISO_8601)), which complies nicely with all three principles above. Therefore, the **only** correct format of 11th of December 2019 is:
- ```
-  2019-12-11
- ```
+
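
If you already have dates in another numeric layout, they can be converted once you know which layout was meant. The sketch below assumes the input really is day.month.year; base R cannot guess that for you, which is exactly the ambiguity described above. The file name stem is invented for illustration.

```r
# Dates written as day.month.year (an assumption about the source data).
raw <- c("11.12.2019", "12.11.2019")

# Parse with an explicit format, then write them back as YYYY-MM-DD.
parsed <- as.Date(raw, format = "%d.%m.%Y")
iso <- format(parsed, "%Y-%m-%d")

# Use the ISO date as the leading field of a file name.
file_names <- paste0(iso, "_Plasmid_A01.csv")
print(file_names)
```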
 <!-- TODO: stability of names in shared repository which is not read-only - e.g. someone gets nuts and starts to rename everything. Dangerous if there is any analyses link directly to a file. -->
 <!-- TODO: do some guidelines/rules/recommendations apply to different classes of files - source code, data, documents -->
+
 ## Final notes
+
 When starting your project or creating a new repository, give yourself time to set up a proper naming design.
 Remember that it should also be accepted by your teammates and other collaborators accessing the files.
 To make dissemination of the naming design as easy as possible, don't forget to document it and include it in the policies of your group/project.
@@ -212,7 +232,8 @@ But the truth is that it will pay off once the projects get more complex and you
 If you don't agree with the naming rules adopted in your group, either follow them anyway or make an effort to change them globally.
 **Consistency** is much more important than your preferred naming.

-# Resources
-Jenny Bryan's [slides](https://speakerdeck.com/jennybc/how-to-name-files) on "Naming things" from Reproducible Science Workshop, Duke, 2015
-Semantic versioning - [semverdoc.org](https://semverdoc.org/)
-LCSB *IT101* training [presentation](https://git-r3lab.uni.lu/R3/howto-cardsrds/uploads/738930b9a533a2f308cc62c431d9246f/it101.html)
+## Resources
+
+* Jenny Bryan's [slides](https://speakerdeck.com/jennybc/how-to-name-files) on "Naming things" from Reproducible Science Workshop, Duke, 2015
+* Semantic versioning - [semverdoc.org](https://semverdoc.org/)
+* LCSB *IT101* training [presentation](https://git-r3lab.uni.lu/R3/howto-cardsrds/uploads/738930b9a533a2f308cc62c431d9246f/it101.html)