Something is wrong with one of the rules for long reads: try to comment out all the other targets in rule ASSEMBLE_AND_COVERAGE and try again.
I did that on my branch where I am currently cleaning up: I added your changes, kept only "assembly_annotation" in the steps in the config file, kept only the target "assemble_and_coverage.done" in the main Snakemake file, commented out all targets in rule ASSEMBLE_AND_COVERAGE.
MissingInputException in line 265 of /mnt/irisgpfs/users/vgalata/projects/ONT_pilot/workflow/rules/assembly_annotation.smk:
Missing input files for rule build_minimap2_ont_index:
results/assembly/megahit/lr/merged/no_barcode/assembly.fna
That might cause the problem if there are no other similar issues.
Since we expand over ASSEMBLERS, the lines below resolve to targets that include megahit, metaspades, etc. We may need an additional option in the CONFIG.yaml, such as LR_assembler (thoughts?). Or we handle this on a per-target basis in the rules of the .smk files.
rule ASSEMBLE_AND_COVERAGE:
    input:
        # long reads on long read contigs
        expand(os.path.join(RESULTS_DIR, "genomecov/lr/merged/{barcode}/{barcode}_reads-x-{barcode}-{assembler}_contigs.avg_cov.txt"), barcode=BARCODES, assembler=ASSEMBLERS),
Can you please elaborate? Not sure what you mean.
Sure, we can have LR_ASSEMBLER and SR_ASSEMBLER variables.
However, one of the main reasons for using Snakemake here is that the dependencies are resolved automatically from the target filenames.
In the Snakefile we can create a list of all tools, a list of SR tools and a list of LR tools.
One can use these in wildcard_constraints or in expand(...) to restrict the rules to specific assemblers.
I think we should use the dependency utility, and specify the assemblers within the targets as I did above.
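To illustrate the idea of separate tool lists, here is a minimal Python sketch. It does not import Snakemake: `expand_targets` is a hypothetical stand-in that only mimics the cross-product behaviour of `expand(...)`. The split into `LR_ASSEMBLERS`, `SR_ASSEMBLERS` and `HYBRID_ASSEMBLERS` is an assumption based on this thread (flye for long reads), and `BARCODES = ["no_barcode"]` is taken from the path in the error message.

```python
import os
from itertools import product

RESULTS_DIR = "results"
BARCODES = ["no_barcode"]

# Assumed split of the assemblers from the config (not confirmed by the workflow).
LR_ASSEMBLERS = ["flye"]
SR_ASSEMBLERS = ["megahit", "metaspades"]
HYBRID_ASSEMBLERS = ["metaspades_hybrid"]
ASSEMBLERS = LR_ASSEMBLERS + SR_ASSEMBLERS + HYBRID_ASSEMBLERS


def expand_targets(pattern, **wildcards):
    """Hypothetical stand-in for expand(): fill the pattern with every
    combination of the given wildcard values."""
    keys = list(wildcards)
    combos = product(*(wildcards[k] for k in keys))
    return [pattern.format(**dict(zip(keys, combo))) for combo in combos]


pattern = os.path.join(
    RESULTS_DIR,
    "genomecov/lr/merged/{barcode}/{barcode}_reads-x-{barcode}-{assembler}_contigs.avg_cov.txt",
)

# Generic target: also requests LR coverage on megahit/metaspades contigs.
all_targets = expand_targets(pattern, barcode=BARCODES, assembler=ASSEMBLERS)

# Restricted target: only long-read assemblers are requested.
lr_targets = expand_targets(pattern, barcode=BARCODES, assembler=LR_ASSEMBLERS)
```

With the generic list, four targets are requested (one per assembler); with the restricted list, only the flye target remains, so rules that need a long-read megahit assembly are never triggered.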
Example: In our config file, we have assembler=["flye", "megahit", "metaspades", "metaspades_hybrid"]
This causes an issue when we have generic targets as shown here:
# long reads on long read contigs
expand(os.path.join(RESULTS_DIR, "genomecov/lr/merged/{barcode}/{barcode}_reads-x-{barcode}-{assembler}_contigs.avg_cov.txt"), barcode=BARCODES, assembler=ASSEMBLERS)
Since this target concerns LR contigs only, we'd have to specify assembler="flye" each time in the .smk files.
I was wondering whether we could avoid that some other way. Now that I think about it, that may be too much work and is chasing perfection. Let's not go there yet ;)
It is probably easier to work around this via the wildcard_constraints suggestion by @valentina.galata. That would make it more transparent to the user, as such cases would be handled at the level of the rules.
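As a rough sketch of what such a constraint does: in a rule, something like `wildcard_constraints: assembler="|".join(LR_ASSEMBLERS)` limits the `{assembler}` wildcard to the listed names via a regular expression. The Python snippet below only imitates that matching with `re`. The path layout `results/assembly/{assembler}/lr/merged/{barcode}/assembly.fna` is inferred from the error message above, and `LR_ASSEMBLERS` and `matches_lr_assembly` are hypothetical names.

```python
import re

# Assumed list of long-read assemblers (flye is the only one named in this thread).
LR_ASSEMBLERS = ["flye"]

# Regex alternation as it could be passed to wildcard_constraints.
lr_constraint = "|".join(re.escape(name) for name in LR_ASSEMBLERS)

# Hypothetical output pattern with the constrained {assembler} wildcard.
lr_assembly_pattern = re.compile(
    r"results/assembly/(?P<assembler>" + lr_constraint + r")"
    r"/lr/merged/(?P<barcode>[^/]+)/assembly\.fna"
)


def matches_lr_assembly(path):
    """Return True if the path could be produced by the constrained rule."""
    return lr_assembly_pattern.fullmatch(path) is not None


flye_ok = matches_lr_assembly("results/assembly/flye/lr/merged/no_barcode/assembly.fna")
megahit_ok = matches_lr_assembly("results/assembly/megahit/lr/merged/no_barcode/assembly.fna")
```

Here the flye path matches and the megahit path does not. Note that a constraint only stops a rule from matching a file; if a target still requests the megahit path, Snakemake would still report it as missing, so the target lists need to be restricted as well.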