metagWGS
Introduction
metagWGS is a Nextflow bioinformatics analysis pipeline used for metagenomic Whole Genome Shotgun sequencing data (Illumina HiSeq3000 or NovaSeq, paired, 2*150bp).
Pipeline graphical representation
The workflow processes raw data from .fastq or .fastq.gz inputs and runs the modules represented in this figure:
metagWGS steps
metagWGS is split into several steps, each corresponding to a part of the bioinformatics analysis:
- 01_clean_qc (can be skipped)
  - trims adapter sequences and removes low-quality reads (Cutadapt, Sickle)
  - removes host contaminants (BWA + Samtools + Bedtools)
  - checks the quality of raw and cleaned data (FastQC)
  - taxonomically classifies cleaned reads (Kaiju MEM + kronaTools + Generate_barplot_kaiju.py + merge_kaiju_results.py)
- 02_assembly
  - assembles cleaned reads (when the 01_clean_qc step is run) or raw reads (when the --skip_01_clean_qc parameter is set) (metaSPAdes or Megahit)
  - assesses the quality of the assembly (metaQUAST)
  - deduplicates cleaned reads (when the 01_clean_qc step is run) or raw reads (when the --skip_01_clean_qc parameter is set) (BWA + Samtools + Bedtools)
- 03_filtering (can be skipped)
  - filters out contigs with a low CPM value (Filter_contig_per_cpm.py + metaQUAST)
- 04_structural_annot
  - performs the structural annotation of genes (Prokka + Rename_contigs_and_genes.py)
- 05_alignment
- 06_func_annot
  - clusters genes per sample and globally (cd-hit-est + cd_hit_produce_table_clstr.py)
  - quantifies the reads that align to the genes (featureCounts + Quantification_clusters.py)
  - performs the functional annotation of genes and quantifies reads by function (eggNOG-mapper + best_bitscore_diamond.py + merge_abundance_and_functional_annotations.py + quantification_by_functional_annotation.py)
- 07_taxo_affi
  - taxonomically affiliates the genes (Samtools + aln2taxaffi.py)
  - taxonomically affiliates the contigs (Samtools + aln2taxaffi.py)
  - counts the number of reads and contigs for each taxonomic affiliation, per taxonomic level (Samtools + merge_contig_quantif_perlineage.py + quantification_by_contig_lineage.py)
- 08_binning from nf-core/mag 1.0.0
  - bins contigs (MetaBAT2)
  - assesses bins (BUSCO + metaQUAST + summary_busco.py and combine_tables.py from nf-core/mag)
  - taxonomically affiliates the bins (BAT)
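To give an idea of what the CPM filter in 03_filtering does, here is a minimal Python sketch. It is not the code of Filter_contig_per_cpm.py. It assumes the usual definition of CPM (reads mapped to a contig, divided by the total number of mapped reads, multiplied by one million) and an inclusive threshold. The function names, the `min_cpm` parameter and the example counts are hypothetical and only serve the illustration.

```python
def cpm(read_counts):
    """Return counts per million for each contig.

    read_counts maps a contig name to its number of mapped reads.
    Assumed definition: count * 1e6 / total mapped reads.
    """
    total = sum(read_counts.values())
    if total == 0:
        # No mapped reads at all: every contig gets a CPM of 0.
        return {contig: 0.0 for contig in read_counts}
    return {contig: count * 1_000_000 / total
            for contig, count in read_counts.items()}


def filter_contigs_by_cpm(read_counts, min_cpm):
    """Keep contigs whose CPM is at least min_cpm (hypothetical threshold)."""
    values = cpm(read_counts)
    return [contig for contig in read_counts if values[contig] >= min_cpm]


# Made-up counts for three contigs of a single sample.
counts = {"contig_1": 900, "contig_2": 99, "contig_3": 1}
kept = filter_contigs_by_cpm(counts, min_cpm=10_000)
print(kept)
```

With these made-up counts, contig_3 has a CPM of 1000 and is discarded, while the other two contigs are kept. How the pipeline combines samples and which threshold it applies are described in the metagWGS documentation, not in this sketch.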
An HTML report is generated at the end of the workflow with MultiQC.
The pipeline is built using Nextflow, a bioinformatics workflow tool to run tasks across multiple compute infrastructures in a very portable manner.
Three Singularity containers are available, which makes installation straightforward and results highly reproducible.
Documentation
metagWGS documentation is available here.
License
metagWGS is distributed under the GNU General Public License v3.
Copyright
2021 INRAE
Funded by
Anti-Selfish (Labex ECOFECT – N° 00002455-CT15000562)
France Génomique National Infrastructure (funded as part of Investissement d’avenir program managed by Agence Nationale de la Recherche, contract ANR-10-INBS-09)
With participation of SeqOccIn members financed by FEDER-FSE MIDI-PYRENEES ET GARONNE 2014-2020.
Citation
metagWGS has been presented at JOBIM 2020:
Poster "Whole metagenome analysis with metagWGS", J. Fourquet, C. Noirot, C. Klopp, P. Pinton, S. Combes, C. Hoede, G. Pascal.
https://www.sfbi.fr/sites/sfbi.fr/files/jobim/jobim2020/posters/compressed/jobim2020_poster_9.pdf
metagWGS has been presented at JOBIM 2019 and at Genotoul Biostat Bioinfo day:
Poster "Whole metagenome analysis with metagWGS", J. Fourquet, A. Chaubet, H. Chiapello, C. Gaspin, M. Haenni, C. Klopp, A. Lupo, J. Mainguy, C. Noirot, T. Rochegue, M. Zytnicki, T. Ferry, C. Hoede.