2 Summary of work habits
- Use an organised folder structure.
- Make a private Project repository on Github, and clone it on Uppmax and then locally.
- Have a stable
main
git branch. - Git branches are used to develop new features and add exploratory analyses.
- Make a test data set for development purposes.
- Use toy examples for exploring Nextflow functionality.
- Write processes in a modular way to use existing containers.
- Use Docker to make containers for tools which are not available as existing container images.
- Use Nextflow to manage intermediate files.
- If a script is failing, debug it in the Nextflow work directory.
- parameters and config are included in version control.
2.1 Folder structure
The structure I’ve found that works well for me is:
/proj/naiss20XX-YY-ZZ/NBIS_support_<id>/ (NAISS Compute Allocation)
|
| - README.md Project details summary
|
| - analyses/ Analysis configuration files
| | - 01_workflow_dev_dardel Configuration to use test data
| \ - 02_full_data_analysis_dardel Configuration to use all the data
| - conda/nextflow-env Conda Environment containing tools and dependancies to run Nextflow
| - docs/ Project documentation
\ - workflow/ Nextflow workflow
| - bin Custom script folder
| - configs General workflow configuration
\ - containers Custom container definitions
/proj/naiss20xx-yy-zz/ (NAISS Storage Allocation)
|
| - nobackup/nxf-work Intermediate computations
\ - NBIS_support_<id>_data/ Project data
| - deliveries Read only copy of data from sequencing center
| - raw_data Symlinked reorganized relevant raw data in deliveries
| - outputs Saved outputs from the workflow
\ - frozen Curated outputs for publishing
This is flexible enough for both data analysis and pipeline development projects. For public pipeline development projects the public GitHub repo is used, instead of a repository workflow
folder.
The files and folders README.md
, analyses
, docs
, and workflow
are also tracked using Git.