10 Converting scripts to Nextflow
In practice, getting started with Nextflow often means converting a set of scripts from an existing project. How should the code be split into processes?
Example:
tool1 --opts file.txt > file.tool1.txt
tool2 --opts file.tool1.txt > file.tool2.a.txt
tool2 --opts file.tool2.a.txt > file.tool2.b.txt
while read line; do
    tool2 --opts $line >> file.tool2.c.txt
done < file.tool2.b.txt
tool2 --opts file.tool2.c.txt > file.tool2.d.txt
In this example, the code blocks for tool1 and tool2 would first be separated into their own processes. Typically these would also have their own containers. Then the code block for tool2 would be examined for parallelization possibilities. Lines 2-3 of the script would become one process, line 5 another process, and line 7 yet another. The loop on lines 4 and 6 would not be part of any process; instead, its parallelization would be handled in the workflow block, which runs line 5 once per input line.
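workflow {
    ch_input = Channel.fromPath( params.infile, checkIfExists: true )
    TOOL1( ch_input )
    TOOL2_AB( TOOL1.out.txt )
    // Scatter: one TOOL2_C task per line; trim the trailing newline that splitText keeps
    TOOL2_C( TOOL2_AB.out.b_txt.splitText().map { line -> line.trim() } )
    // Gather: concatenate all TOOL2_C outputs into a single file
    TOOL2_D( TOOL2_C.out.txt.collectFile( name:'tool2.c.txt' ) )
}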
process TOOL1 {
    input:
    path file
    script:
    """
    tool1 --opts $file > ${file.baseName}.tool1.txt
    """
    output:
    path "*.tool1.txt", emit: txt
}
process TOOL2_AB {
    input:
    path file
    script:
    """
    tool2 --opts $file > ${file.simpleName}.tool2.a.txt
    tool2 --opts ${file.simpleName}.tool2.a.txt > ${file.simpleName}.tool2.b.txt
    """
    output:
    path "*.tool2.a.txt", emit: a_txt
    path "*.tool2.b.txt", emit: b_txt
}
process TOOL2_C {
    input:
    val record
    script:
    """
    tool2 --opts $record > tool2.c.txt
    """
    output:
    path "tool2.c.txt", emit: txt
}
process TOOL2_D {
    input:
    path file
    script:
    """
    tool2 --opts $file > ${file.simpleName}.tool2.d.txt
    """
    output:
    path "*.tool2.d.txt", emit: txt
}
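The text above notes that each tool would typically get its own container. One way this might be wired up is through process selectors in a nextflow.config file placed next to the workflow script. The following is only a sketch: the image names are placeholders rather than real images, and Docker is assumed as the container engine.

```groovy
// nextflow.config (sketch; image names are hypothetical placeholders)
process {
    // TOOL1 runs in an image that provides tool1
    withName: 'TOOL1' {
        container = 'example.org/tool1:1.0'   // hypothetical image
    }
    // A regular expression selects TOOL2_AB, TOOL2_C and TOOL2_D together,
    // since all three call the same tool
    withName: 'TOOL2_.*' {
        container = 'example.org/tool2:1.0'   // hypothetical image
    }
}

// Assumption: Docker is available; other engines have their own scopes
docker.enabled = true
```

Keeping the container assignment in the configuration rather than in each process definition means the three tool2 processes share one image without repeating it, and the image can be changed without touching the workflow code.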