panTEannot

panTEannot is described in a preprint available on bioRxiv.

Description

panTEannot serially annotates transposable elements (TEs) in multiple whole-genome assemblies using a common reference library. It is a lightweight version of TEannot from the REPET package. Because we are interested in inter-individual TE variability, which is due in particular to recent TE transpositions, we focus on non-degenerate sequences, using complete TE copies as a proxy.

The annotation files (GFF3) can be used as input for the panREPET tool.
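As a quick way to inspect an annotation file before passing it on, the sketch below counts features and sums their lengths per sequence. It assumes only the nine standard tab-separated GFF3 columns (start and end are 1-based and inclusive). The name gff_summary is a hypothetical helper, not part of panTEannot, and overlapping features are counted twice in the length total.

```shell
# gff_summary is a hypothetical helper, not part of panTEannot.
# Prints: sequence name, number of features, summed feature length (bp).
gff_summary() {
    awk -F'\t' '
        /^#/ || NF < 9 { next }                      # skip comments, directives and malformed lines
        { count[$1]++; bp[$1] += $5 - $4 + 1 }       # GFF3 coordinates are 1-based, inclusive
        END { for (s in count) printf "%s\t%d\t%d\n", s, count[s], bp[s] }
    ' "$1" | sort
}
```

Call it with the path of one merged .gff file, for example gff_summary path/to/file.gff.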

[Figure: panTEannot_figure1]

Installation

Install Conda (>=4.12.0)

Install Snakemake via mamba (mamba >=0.22.1, snakemake >=7.3.8)

Install Singularity (>=3.8.7-1.el7)

Pull the TEfinder2.31 Singularity Image File (SIF), which contains the cutterDB, Blaster, Matcher and pathnum2id tools:

cd containers
singularity pull --arch amd64 library://hquesneville/default/te_finder:2.31
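Before running the workflow, it can help to confirm which of the required tools are on the PATH and which versions they report. The following is a minimal sketch: check_prereqs is a hypothetical helper name, and it only prints what each tool returns for --version, leaving the comparison with the minimum versions listed above to the reader.

```shell
# check_prereqs is a hypothetical helper, not part of panTEannot.
# It prints one line per tool: the first line of its --version output, or "not found".
check_prereqs() {
    local tool
    for tool in conda mamba snakemake singularity; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: $("$tool" --version 2>&1 | head -n 1)"
        else
            echo "$tool: not found"
        fi
    done
}

check_prereqs
```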

For more details about the Blaster and Matcher tools, see: Quesneville, H., Nouaud, D. & Anxolabéhère, D. Detection of New Transposable Element Families in Drosophila melanogaster and Anopheles gambiae Genomes. J Mol Evol 57 (Suppl 1), S50–S59 (2003). https://doi.org/10.1007/s00239-003-0007-2

Directory tree

.
├── README.md
├── containers
│   └── how_install_te-finder.txt
└── workflow
    ├── Snakefile
    ├── config
    │   └── test.yaml
    ├── run_Snakemake.sh
    ├── dag.png
    ├── results
    └── scripts
        ├── convCoord.py
        └── AnnotationStats
            ├── AnnotationStats.py
            ├── getCumulLengthFromTEannot.py
            ├── Stat.py
            └── AnnotationStatsWriter.py

Build DAG in png

To visualize your DAG (Directed Acyclic Graph):

cd workflow
snakemake --forceall --dag --configfile config/test.yaml | dot -Tpng > dag.png

Execution

How to set up the configfile:

project_dir: path to the project directory

TE_ref: library of reference TE sequences (specify an absolute path)

accessions: paths to the genomes to annotate, for example:

accessions :
  acc1: acc1.fa
  acc2: acc2.fa

param :

  • batch_size: genomes are split into chunks, which are then grouped into batches; this sets the batch size (10 by default)
  • blaster_sensitivity : sensitivity of Blaster tool (1 by default)

containers: path to the TEfinder Singularity container (tefinder2.31.sif)

statistics: Simple or Long (Simple by default)
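Putting the keys above together, a new configfile could be created as sketched below. The key names come from the list above, but the file name config/my_project.yaml, all /abs/path/... values and the exact nesting are assumptions made for illustration; workflow/config/test.yaml remains the reference for the real layout.

```shell
# Sketch only: writes a hypothetical configfile from the workflow directory.
# Paths are placeholders and the nesting is assumed; compare with config/test.yaml.
mkdir -p config
cat > config/my_project.yaml <<'EOF'
project_dir: /abs/path/to/project
TE_ref: /abs/path/to/TE_library.fa
accessions:
  acc1: acc1.fa
  acc2: acc2.fa
param:
  batch_size: 10
  blaster_sensitivity: 1
containers: /abs/path/to/containers/tefinder2.31.sif
statistics: Simple
EOF
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the file body, so the YAML is written exactly as shown.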

Test

You can run a test on the data provided in the data folder.

If necessary, add the argument --singularity-args '--bind /home/myhome' in the workflow/run_Snakemake.sh file.

The configfile in workflow/config/test.yaml is already set up.

cd workflow
nohup ./run_Snakemake.sh &> test.log &

Expected results are written to the results folder (results/*/run_blaster-S2.align.clean_match.path.total.out.coordchr.merged.gff and results/*/annotStats/run_blaster-S2.align.clean_match.path.total.out.coordchr.merged.globalAnnotStatsPerTE.txt). The test run took 78 minutes (wall-clock time) on 8 cores with 16 GB of RAM (specify --cores 8 --resources mem_gb=16).
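Once the run has finished, one way to confirm that both expected files were produced for every accession is sketched below. check_outputs is a hypothetical helper, not part of panTEannot. The file-name prefix is copied from the test output names above and may differ with other settings.

```shell
# check_outputs is a hypothetical helper, not part of panTEannot.
# For each accession directory it reports whether the two expected files exist and are non-empty.
check_outputs() {
    local results_dir="${1:-results}"
    local prefix="run_blaster-S2.align.clean_match.path.total.out.coordchr.merged"
    local found=0 missing=0 acc_dir f
    for acc_dir in "$results_dir"/*/; do
        [ -d "$acc_dir" ] || continue        # the glob matched nothing
        found=$((found + 1))
        for f in "${acc_dir}${prefix}.gff" \
                 "${acc_dir}annotStats/${prefix}.globalAnnotStatsPerTE.txt"; do
            if [ -s "$f" ]; then
                echo "OK       $f"
            else
                echo "MISSING  $f"
                missing=$((missing + 1))
            fi
        done
    done
    # Succeed only if at least one accession directory exists and nothing is missing.
    [ "$found" -gt 0 ] && [ "$missing" -eq 0 ]
}
```

From the workflow directory, call it as check_outputs results. The exit status is non-zero if any file is missing or no accession directory is found.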