This lesson is still being designed and assembled (Pre-Alpha version)

Channels

Overview

Teaching: 30 min
Exercises: 20 min
Questions
  • How do I read data into the workflow?

  • How do I pass data around the workflow?

Objectives
  • Learn how to read data into the workflow.

  • Learn how data is passed between processes.

  • Learn how to use channel operations to wrangle data into the required input form.

What is a Channel?

A Channel is a data structure designed to efficiently pass data from one process to another. The primary property of channels is that they are asynchronous: as soon as a process completes, its results are placed in the next channel without waiting for other processes, so a following process can start sooner.

Nextflow uses two types of channel: queue channels and value channels. Queue channels are consumed in a first in, first out manner as process inputs are assembled, so each item can be used only once. The data in a value channel, by contrast, can be reused any number of times. Because a process consumes the items of a queue channel when it builds its input and spawns a task, each process needs its own input channel. This is achieved by applying the into operator to a channel, which creates two or more channels that different processes can use.

// Split one queue channel into two, one for each consuming process.
Channel.of(1,2,3,4).into { sq_ch; db_ch }

process square_it {

    input:
    val x from sq_ch

    script:
    """
    echo $x*$x | bc -l
    """
}

process double_it {

    input:
    val x from db_ch

    script:
    """
    echo 2*$x | bc -l
    """
}

Reading data into a workflow

The primary way to read data into a Nextflow workflow is to use channel factories, which are methods that create channels.

Several channel factories are available, including Channel.of and Channel.value, both of which appear in this lesson.

Information on these and other channel factories can be found in the Nextflow Channel factory documentation.
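
As a minimal sketch of the factories used in this lesson, the snippet below builds a queue channel with Channel.of and a value channel with Channel.value, then prints their contents with the view operator. Channel.fromPath is a further factory that emits files. The glob data/*.txt is a hypothetical path chosen only for illustration.

```nextflow
// Queue channel: emits 1, 2 and 3 in order; each item is consumed once.
num_ch = Channel.of(1, 2, 3)
num_ch.view { "number: $it" }

// Value channel: holds a single value that can be read any number of times.
greeting_ch = Channel.value('hello')
greeting_ch.view { "greeting: $it" }

// Queue channel of files matching a glob pattern.
// 'data/*.txt' is a hypothetical path; nothing is emitted if no file matches.
txt_ch = Channel.fromPath('data/*.txt')
txt_ch.view { "file: $it" }
```

Printing a channel with view is a convenient way to check what a factory produces before connecting it to a process.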

Channel operators

Channels pass data from one process to another using the into and from keywords, which put data into and take data out of channels. The into keyword creates a queue channel that is named by the creating process and is available in the global scope to other processes.

process WriteHello {

    output:
    file "myfile.txt" into file_ch

    script:
    """
    <commands>
    """
}

process AddWorld {

    input:
    file 'myfile.txt' from file_ch

    script:
    """
    <commands>
    """
}

Channel operators allow you to manipulate data within channels. Common examples include into, mix and join.

Many more channel operators are described in the Nextflow Channel Operator Documentation.
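
To illustrate how operators transform channel contents, here is a small sketch using map and mix, with view to print the results. The values are arbitrary and chosen only for illustration.

```nextflow
// map applies a closure to every item: emits 1, 4 and 9.
Channel.of(1, 2, 3)
    .map { it * it }
    .view { "squared: $it" }

// mix merges several channels into one.
// The order of items across the source channels is not guaranteed.
Channel.of('a', 'b')
    .mix(Channel.of('c', 'd'))
    .view { "mixed: $it" }
```

Operators return a new channel, so they can be chained one after another as shown here.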

Multiple input channels

It is important to understand how multiple input channels are processed. When two or more channels are declared as process inputs, the process waits until it receives an input value from all the channels declared as input.

Two or more queue type channels.

process foo {

    echo true

    input:
    val x from Channel.of(1,2)
    val y from Channel.of('a','b','c')

    script:
    """
    echo $x and $y
    """
}

In this case the process foo runs only twice, since the first channel holds only two items. Channel values are consumed as they are paired, so nothing is left to pair with 'c', which is discarded.

In the example above the process executes on the pairings 1 and a, and 2 and b. In more complex workflows, however, channels are filled asynchronously, so there is no guarantee of that pairing: if 'c' is emitted first, the result is a 1 and c pairing. If certain files must be processed together, use a combining operator such as join or groupTuple to generate the correct pairing before it is passed as input.
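
One way to make the pairing explicit is to key each item and combine the channels with join. The sketch below assumes each item is a tuple whose first element is a shared key. The sample IDs and file names are hypothetical.

```nextflow
// Each item is a tuple whose first element is the matching key.
// Sample IDs and file names are hypothetical.
reads_ch = Channel.of(['sampleA', 'A.fastq'], ['sampleB', 'B.fastq'])
index_ch = Channel.of(['sampleB', 'B.idx'], ['sampleA', 'A.idx'])

// join pairs items that share a key, regardless of emission order,
// producing tuples of the form [key, read file, index file].
reads_ch
    .join(index_ch)
    .view()
```

Because the match is made on the key rather than on arrival order, each sample is paired with its own index even though the two channels emit their items in different orders.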

Value channels with queue channels.

process foo {

    echo true

    input:
    val x from Channel.value(1)
    val y from Channel.of('a','b','c')

    script:
    """
    echo $x and $y
    """
}

In this example, the process foo runs three times since the input data from the value type channel can be reused to make a complete input declaration. This produces the pairings 1 and a, 1 and b, and 1 and c.

Exercises

  • mix operator
  • join operator
  • each -> cross

Key Points

  • Channels pass data into and out of processes.

  • There are two types of channel, queue and value channels.

  • Each process must have its own input channel.

  • Channels can be manipulated using channel operators.