Expired
Milestone
Feb 28, 2022–Mar 31, 2022
PathoFact v2
Milestone ID: 407
Notes
To be updated
Requirements
- new(er/est) version of `snakemake`
- `rgi`: should now be able to handle `*` in FAA files
  - important: does removing `*` from FAA sequences affect some results or should that be done in any case (?)
- `signalp`, v6
- optional: `plasflow` --> should alternatives be added?
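
To help answer the `*` question above, a minimal sketch of how the stop symbol could be stripped from FAA records before passing them on. The function names and the `internal` switch are illustrative assumptions, not part of PathoFact. Removing only a trailing `*` is the conservative default here, since internal `*` may carry information.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines (wrapped sequences are joined)."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def strip_stop_symbols(records, internal=False):
    """Remove '*' from protein sequences.

    internal=False: drop only trailing '*' (hypothetical default).
    internal=True:  drop every '*', including those inside the sequence.
    """
    for header, seq in records:
        cleaned = seq.replace("*", "") if internal else seq.rstrip("*")
        yield header, cleaned


faa = [">p1 desc\n", "MKT\n", "AL*\n", ">p2\n", "MK*TA*\n"]
for header, seq in strip_stop_symbols(read_fasta(faa)):
    print(f">{header}\n{seq}")
```

Running each affected tool on both the original and the stripped FAA for a small test sample would show whether the results differ.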
Configuration
Config
- no runtime configuration (only in rules and profiles)
- rm `runtime`, `mem`
- rm
- sample table (see also below)
- ID, FNA, (FAA, FNA/FAA mapping)
- no `project` and `datadir`
- either only one output path or use work folder via profiles
- paths to DBs: single path? group by attribute?
- allow multiple steps (see also below)
Sample table
- columns: ID, FNA, (FAA, FNA/FAA mapping)
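
A possible shape for loading and checking the sample table, as a sketch only. It assumes a tab-separated file with a header row. The column name `MAP` for the FNA/FAA mapping is a placeholder, and the rule that FAA and mapping must be given together is an assumption drawn from the parentheses above. Within the workflow itself, Snakemake's `snakemake.utils.validate` with a schema file would likely be the more natural route.

```python
import csv
import io

REQUIRED = ("ID", "FNA")
OPTIONAL = ("FAA", "MAP")  # "MAP" is a hypothetical name for the FNA/FAA mapping column


def load_sample_table(handle):
    """Parse a TSV sample table and return {sample_id: {column: value}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    columns = reader.fieldnames or []
    missing = [c for c in REQUIRED if c not in columns]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    unknown = [c for c in columns if c not in REQUIRED + OPTIONAL]
    if unknown:
        raise ValueError(f"unknown column(s): {', '.join(unknown)}")

    samples = {}
    for row in reader:
        sample_id = (row.get("ID") or "").strip()
        if not sample_id:
            raise ValueError("empty sample ID")
        if sample_id in samples:
            raise ValueError(f"duplicate sample ID: {sample_id}")
        if not (row.get("FNA") or "").strip():
            raise ValueError(f"{sample_id}: FNA path is required")
        has_faa = bool((row.get("FAA") or "").strip())
        has_map = bool((row.get("MAP") or "").strip())
        if has_faa != has_map:
            # Assumption: FAA and the mapping file only make sense as a pair.
            raise ValueError(f"{sample_id}: FAA and MAP must be given together")
        samples[sample_id] = {c: (row.get(c) or "").strip() for c in columns if c != "ID"}
    return samples


table = "ID\tFNA\tFAA\tMAP\ns1\ts1.fna\ts1.faa\ts1.tsv\ns2\ts2.fna\t\t\n"
print(load_sample_table(io.StringIO(table)))
```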
Profiles
- different types:
- generic (w/o a scheduler)
- HPC w/ `slurm` (simple setup)
- working directory (???)
Workflow
- standardized structure
- standardized rule structure
- benchmark and log files
- use `tmp`, shadow rules for temp output
- conda env. YAML files: update/clean
- config validation
- schemas for config and sample table
- save config to output
- set working directory
- replace checkpoints with split/gather
- consider whether splitting should be kept for all steps or not
- seqkit: split2
- todo: need to know the total number of seq.s
- todo: `signalp` limit for number of seq.s?
- allow different step combinations
- output sub-folder per sample
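
On the split/gather todo above: if splitting is done by a fixed number of sequences per chunk (e.g. `seqkit split2 --by-size`), the number of chunks follows from the total sequence count. A rough sketch of that calculation is below. The three-digit chunk labels are a hypothetical naming choice, not necessarily what `seqkit` writes.

```python
import math


def count_sequences(lines):
    """Count FASTA records by counting header lines."""
    return sum(1 for line in lines if line.startswith(">"))


def plan_chunks(total, chunk_size):
    """Return labels for the chunks needed to hold `total` sequences."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    n_chunks = math.ceil(total / chunk_size)
    return [f"{i:03d}" for i in range(1, n_chunks + 1)]


fasta = [">a\n", "MK\n", ">b\n", "ML\n", ">c\n", "MM\n"]
total = count_sequences(fasta)
print(total, plan_chunks(total, chunk_size=2))
```

Splitting into a fixed number of parts instead (`seqkit split2 --by-part`) would make the chunk count known up front, without counting sequences first.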
# example for a step combination
workflows:
- vir: true
- amr: true
- tox: false
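
One way the step switches above could be turned into per-sample targets, sketched under assumptions: the config is already parsed into Python objects (a list of single-key mappings, as the example would load), and the `<outdir>/<sample>/<step>` layout is a hypothetical reading of "output sub-folder per sample".

```python
from pathlib import PurePosixPath

KNOWN_STEPS = ("vir", "amr", "tox")


def enabled_steps(config):
    """Return the names of enabled steps, in config order."""
    steps = []
    for entry in config.get("workflows", []):
        for name, enabled in entry.items():
            if name not in KNOWN_STEPS:
                raise ValueError(f"unknown step: {name}")
            if enabled and name not in steps:
                steps.append(name)
    return steps


def target_dirs(outdir, samples, steps):
    """Build one output directory per sample and enabled step (hypothetical layout)."""
    return [str(PurePosixPath(outdir) / sample / step) for sample in samples for step in steps]


config = {"workflows": [{"vir": True}, {"amr": True}, {"tox": False}]}
steps = enabled_steps(config)
print(steps)
print(target_dirs("results", ["s1", "s2"], steps))
```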
Rules
Testing
- Use `snakemake`'s unit test utility
- CI: GitLab