This lesson is still being designed and assembled (Pre-Alpha version)

The Nextflow Language

Overview

Teaching: 40 min
Exercises: 20 min
Questions
  • What language is Nextflow written in?

  • How do I write a Nextflow script?

  • How can I run a Nextflow script?

Objectives
  • Describe the Groovy syntax.

  • Write a Nextflow workflow.

  • Run a Nextflow workflow.

Nextflow is a Domain Specific Language (DSL) designed to ease the writing of computational pipelines. It is an extension of the Groovy programming language, which is in turn largely a superset of Java. As such, Nextflow can execute any piece of Groovy code and use any library for the JVM platform. Full documentation can be found in the Nextflow documentation.

This section describes how to write and run a Nextflow workflow.

The Groovy language

What’s your scripting language experience?

  • How many of you use a scripting language on a regular basis?
    • Bash
    • Python
    • Perl
    • R
    • Groovy

Before writing a workflow, here are some fundamental concepts of the Groovy language.
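As a rough illustration of those concepts, the sketch below shows variables, strings, lists, maps, closures, and basic control flow in plain Groovy. The variable names (name, samples, counts, square) are made up for this example and are not part of the lesson's scripts.

```groovy
// Variables are dynamically typed; 'def' declares a local variable.
def x = 1
def name = "Nextflow"

// Double-quoted strings interpolate variables; single-quoted strings do not.
println "Hello, ${name}!"      // prints: Hello, Nextflow!
println 'Hello, ${name}!'      // prints: Hello, ${name}!

// Lists are ordered and zero-indexed.
def samples = ["a", "b", "c"]
println samples[0]             // prints: a
println samples.size()         // prints: 3

// Maps hold key-value pairs.
def counts = [reads: 100, contigs: 5]
println counts.reads           // prints: 100

// Closures are blocks of code that can be assigned and passed around;
// 'it' is the implicit single parameter.
def square = { it * it }
println square(4)              // prints: 16
println samples.collect { it.toUpperCase() }   // prints: [A, B, C]

// Conditionals and loops resemble Java.
if (x < 2) {
    println "x is small"
}
for (s in samples) {
    println s
}
```

Closures matter most for what follows, because Nextflow channel operators take closures as arguments.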

Groovy is syntax-rich and supports many more operations than can be covered here. A full description of Groovy semantics can be found in the Groovy Documentation.

Can I do X in Groovy / Nextflow too?

  • Are there other data and control structures that you commonly use in your analyses?
  • What kind of things would you like to do in your computational pipeline?

Writing a workflow

To ease the writing of computational pipelines, Nextflow introduces two high-level data structures: channels and processes (see Basic concepts).

A channel is a data-flow object that passes data asynchronously from one process to another. Channels provide methods for reading in data from various sources. Nextflow was developed with a strong emphasis on supporting bioinformatics, and as such includes methods for common file formats like FASTA and FASTQ. Channels send data in a first in, first out (FIFO) manner; however, data may arrive at the next channel in a different order (asynchrony) because process execution times differ, or because channel operators manipulate the channel values.

A process is a task that executes a user script. The user script can be written in any interpreted language, although the default is Bash. Each task defined by a process is executed independently and in isolation, so processes can only exchange data through channels.
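To make the idea of channel operators concrete, here is a minimal sketch of a Nextflow script that creates a channel, transforms each value with the map operator, and prints the result with view. It assumes a Nextflow version that provides Channel.of (19.10 or later); the values are arbitrary.

```groovy
// Nextflow script: create a channel, transform each value, print the results.
Channel
    .of(1, 2, 3, 4)          // emit four values in order
    .map { it * it }         // operator: square each value as it flows through
    .view()                  // print each value: 1, 4, 9, 16
```

No process is involved here, so the values keep their original order; reordering only becomes visible once values pass through tasks that finish at different times.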

Example (example.nf):

#! /usr/bin/env nextflow

number_ch = Channel.of(1,2,3,4)

process Sequence {

    input:
    val num from number_ch

    output:
    stdout into out_ch

    script:
    """
    echo $num
    """

}

out_ch.view()
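Because the user script can be written in any interpreted language, a variation of example.nf could run Python instead of Bash. The sketch below is illustrative only: the process name Square and the channel name square_ch are made up, it uses the same DSL1 syntax as example.nf, and it assumes python3 is available on the command line.

```groovy
#! /usr/bin/env nextflow

// Hypothetical variation of example.nf: the task script is Python, not Bash.
number_ch = Channel.of(1, 2, 3, 4)

process Square {

    input:
    val num from number_ch

    output:
    stdout into square_ch

    script:
    // The shebang on the first line of the script selects the interpreter.
    """
    #!/usr/bin/env python3
    print($num * $num)
    """

}

square_ch.view()
```

Nextflow substitutes $num into the script text before the task runs, so each task executes a small Python program with a literal number in it. The squares may be printed in any order, since the four tasks run independently.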

Write a Nextflow script

Create your own Nextflow script containing the following:

  • A directive to use the Nextflow interpreter.
  • A channel containing the words “This”, “is”, “my”, “Nextflow”, “script”.
  • A statement to see the contents of the above channel using the view method.
  • A process that prints each channel value using the shell command echo.

Solution

#! /usr/bin/env nextflow

// view() prints each value and passes it on, so word_ch can still feed the process.
word_ch = Channel.of("This","is","my","Nextflow","script").view()

process Display_Words {

    input:
    val word from word_ch

    script:
    """
    echo $word
    """

}

Running a workflow

A Nextflow workflow is executed using the nextflow run <script.nf> command. By default, each task is executed locally (on your computer), and Nextflow expects all the commands in your process scripts to be available on the command line. While local execution is suitable for small-scale data processing, Nextflow also supports large-scale data processing by integrating with third-party package management tools, job schedulers, and distributed compute infrastructure, which are covered later on.

$ nextflow run example.nf
N E X T F L O W  ~  version 20.01.0
Launching `example.nf` [marvelous_ride] - revision: 614fc2b804
executor >  local (4)
[3e/7b764f] process > Sequence [100%] 4 of 4 ✔
1
2
3
4

Nextflow is also able to run workflows from online version control repositories. If the named script is not available locally, Nextflow will by default look for a matching repository on GitHub. The repository and other settings can be configured as described in the Pipeline Sharing documentation. Configuration is discussed in more detail in the Configuration section.

Parameters to both Nextflow and the pipeline script can also be passed on the command line (use nextflow help run to see all the available Nextflow options).

# nextflow run -c <config> <workflow_script> --<workflow_parameter> <value>
nextflow run -c nxf.conf my_workflow.nf --welcome_message hello

Parameters to the Nextflow workflow engine are prefixed with a single dash -, while parameters used in the workflow script are prefixed with a double dash --. Additional information on parameter passing is provided later.
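As a small sketch of how a double-dash parameter reaches the script, the hypothetical my_workflow.nf below declares a default for welcome_message and prints whichever value is in effect. The file contents and the default value "hi" are assumptions made for illustration.

```groovy
// my_workflow.nf (hypothetical contents)
// Default value, used when --welcome_message is not given on the command line.
params.welcome_message = "hi"

println "Message: ${params.welcome_message}"
```

Run with nextflow run my_workflow.nf --welcome_message hello, this prints Message: hello, because the command-line value overrides the default set in the script. Without the option it prints Message: hi.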

Run your own workflows

  • Run your myscript.nf script

Solution

$ nextflow run myscript.nf

  • Modify and run myscript.nf to display the output of the Display_Words process.

Solution

#! /usr/bin/env nextflow

// view() prints each value and passes it on, so word_ch can still feed the process.
word_ch = Channel.of("This","is","my","Nextflow","script").view()

process Display_Words {

    input:
    val word from word_ch

    output:
    stdout into out_ch

    script:
    """
    echo $word
    """

}

out_ch.view()

$ nextflow run myscript.nf

  • Run the hello script from https://github.com/nextflow-io/hello

Solution

$ nextflow run nextflow-io/hello

or

$ nextflow run https://github.com/nextflow-io/hello

Key Points

  • Nextflow is written in the Groovy computer language.

  • Channels and Processes are the fundamental data structures of Nextflow.

  • A workflow script is run using nextflow run <script.nf>.