panTEannot
panTEannot is described in a preprint available on bioRxiv.
Description
panTEannot serially annotates transposable elements (TEs) in multiple whole-genome assemblies using a common reference library. It is a lightweight version of TEannot from the REPET package. Because we are interested in inter-individual TE variability, which is driven in particular by recent TE transpositions, panTEannot focuses on non-degenerate sequences, using complete TE copies as a proxy.
The annotation files (GFF3) can be used as input for the panREPET tool.
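As a quick sanity check on the GFF3 annotation files, the sketch below sums the annotated length per sequence. It relies only on the standard GFF3 layout (nine tab-separated columns, with 1-based inclusive start and end in columns 4 and 5). The function name `annotated_length_per_sequence` and the example records are hypothetical and are not real panTEannot output.

```python
from collections import defaultdict


def annotated_length_per_sequence(gff_lines):
    """Sum feature lengths per sequence ID from GFF3 lines.

    Overlapping features are counted twice: no merging is attempted.
    """
    totals = defaultdict(int)
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines, directives ("##gff-version 3") and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        totals[seqid] += end - start + 1  # coordinates are inclusive
    return dict(totals)


# Hypothetical records for illustration only (source and type are made up).
example = [
    "##gff-version 3",
    "chr1\tpanTEannot\tmatch\t101\t200\t.\t+\t.\tID=te1",
    "chr1\tpanTEannot\tmatch\t501\t550\t.\t-\t.\tID=te2",
    "chr2\tpanTEannot\tmatch\t1\t10\t.\t+\t.\tID=te3",
]
print(annotated_length_per_sequence(example))
```

For a real file, the same function can be called on an open file handle, since it only iterates over lines.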
Installation
Install Conda (>=4.12.0)
Install Snakemake via mamba (mamba >=0.22.1, snakemake >=7.3.8)
Install Singularity (>=3.8.7-1.el7)
Load the TEfinder2.31 Singularity Image File (SIF) containing the cutterDB, Blaster, Matcher and pathnum2id scripts/tools:
cd containers
singularity pull --arch amd64 library://hquesneville/default/te_finder:2.31
For more details about the Blaster and Matcher tools: Quesneville, H., Nouaud, D. & Anxolabéhère, D. Detection of New Transposable Element Families in Drosophila melanogaster and Anopheles gambiae Genomes. J Mol Evol 57 (Suppl 1), S50–S59 (2003). https://doi.org/10.1007/s00239-003-0007-2.
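The minimum versions listed in the installation steps can be compared against what each tool reports with a small helper. This is a minimal sketch: the names `MINIMUMS`, `parse_version` and `meets_minimum` are hypothetical, and the packaging suffix of the Singularity version (`-1.el7`) is deliberately ignored. Versions are compared numerically, since a plain string comparison would rank `4.9.2` above `4.12.0`.

```python
import re

# Minimum versions taken from the installation steps.
MINIMUMS = {
    "conda": "4.12.0",
    "mamba": "0.22.1",
    "snakemake": "7.3.8",
    "singularity": "3.8.7",
}


def parse_version(text):
    """Return the first dotted numeric version found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(tool, reported):
    """Check a tool's reported version string against its minimum."""
    return parse_version(reported) >= parse_version(MINIMUMS[tool])


# The strings below mimic typical `--version` output; adapt them as needed.
print(meets_minimum("snakemake", "7.32.4"))
print(meets_minimum("conda", "conda 4.9.2"))
print(meets_minimum("singularity", "singularity version 3.8.7-1.el7"))
```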
Directory tree
.
├── README.md
├── containers
│ └── how_install_te-finder.txt
└── workflow
    ├── Snakefile
    ├── config
    │   └── test.yaml
    ├── run_Snakemake.sh
    ├── dag.png
    ├── results
    └── scripts
        ├── convCoord.py
        └── AnnotationStats
            ├── AnnotationStats.py
            ├── getCumulLengthFromTEannot.py
            ├── Stat.py
            └── AnnotationStatsWriter.py
Build DAG in png
To visualize your DAG (Directed Acyclic Graph):
cd workflow
snakemake --forceall --dag --configfile config/test.yaml | dot -Tpng > dag.png
Execution
How to set up the config file:
project_dir: path to the project directory
TE_ref: TE reference library (please specify an absolute path)
accessions: paths to the genomes to annotate, for example:
accessions:
  acc1: acc1.fa
  acc2: acc2.fa
param:
- batch_size: genomes are cut into chunks, which are then grouped into batches; this sets the size of the batches (10 by default)
- blaster_sensitivity: sensitivity of the Blaster tool (1 by default)
containers: path to the TEfinder Singularity container (tefinder2.31.sif)
statistics: Simple or Long (Simple by default)
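The keys described above can be checked before launching the workflow. The sketch below works on a config that has already been loaded into a Python dictionary. It assumes that `param` is a mapping holding `batch_size` and `blaster_sensitivity`; the function name `check_config` and all example paths are hypothetical, and the checks are not part of panTEannot itself.

```python
import os

ALLOWED_STATISTICS = {"Simple", "Long"}


def check_config(config):
    """Return a list of problems found in a panTEannot-style config dict."""
    problems = []
    for key in ("project_dir", "TE_ref", "accessions", "containers"):
        if not config.get(key):
            problems.append(f"missing or empty key: {key}")

    te_ref = config.get("TE_ref")
    if te_ref and not os.path.isabs(te_ref):
        problems.append("TE_ref should be an absolute path")

    # Assumption: 'param' is a mapping; anything else is reported and ignored.
    param = config.get("param") or {}
    if not isinstance(param, dict):
        problems.append("param should be a mapping")
        param = {}
    batch_size = param.get("batch_size", 10)  # 10 is the documented default
    if not isinstance(batch_size, int) or batch_size < 1:
        problems.append("param.batch_size should be a positive integer")

    statistics = config.get("statistics", "Simple")  # Simple is the default
    if statistics not in ALLOWED_STATISTICS:
        problems.append("statistics should be 'Simple' or 'Long'")
    return problems


# Hypothetical config; TE_ref is relative on purpose to trigger a message.
example = {
    "project_dir": "/path/to/project",
    "TE_ref": "TE_library.fa",
    "accessions": {"acc1": "acc1.fa", "acc2": "acc2.fa"},
    "param": {"batch_size": 10, "blaster_sensitivity": 1},
    "containers": "/path/to/tefinder2.31.sif",
    "statistics": "Simple",
}
print(check_config(example))
```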
Test
You can run a test using the data in the data folder, which contains:
- 2 whole-genome assemblies of Brachypodium distachyon from the Phytozome database (Gordon et al. 2017): data/Bdis_TEdenovoGr.fa (Bd21 v3.2) and data/ABR2_337.fa.formated
- a TE library built de novo with the TEdenovo pipeline from REPET v3.0 (https://urgi.versailles.inra.fr/Tools/REPET, Flutre et al. 2011) on the Bd21 v3.2 genome (data/Bdis_TEannotGr_chr_allTEs_nr_noSSR_join_path.annotStatsPerTE_FullLengthCopy.fa)
If necessary, add the argument --singularity-args '--bind /home/myhome' in the workflow/run_Snakemake.sh file.
The config file workflow/config/test.yaml is already set up.
cd workflow
nohup ./run_Snakemake.sh &> test.log &
Expected results are in the results folder (results/*/run_blaster-S2.align.clean_match.path.total.out.coordchr.merged.gff and results/*/annotStats/run_blaster-S2.align.clean_match.path.total.out.coordchr.merged.globalAnnotStatsPerTE.txt). The test run took 78 minutes (real time) on 8 cores with 16 GB of RAM (specify --cores 8 --resources mem_gb=16).
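Once the test has finished, the two expected files can be looked up in each subfolder of results. This is a minimal sketch that assumes every subdirectory of results corresponds to one annotated genome (the `*` in the paths above). The names `GFF_NAME`, `STATS_NAME` and `missing_outputs` are hypothetical.

```python
from pathlib import Path

# File names copied from the expected results of the test run.
GFF_NAME = "run_blaster-S2.align.clean_match.path.total.out.coordchr.merged.gff"
STATS_NAME = (
    "annotStats/"
    "run_blaster-S2.align.clean_match.path.total.out.coordchr.merged."
    "globalAnnotStatsPerTE.txt"
)


def missing_outputs(results_dir):
    """Return the expected output paths that are absent under results_dir."""
    missing = []
    subdirs = sorted(p for p in Path(results_dir).iterdir() if p.is_dir())
    for sub in subdirs:
        for relative in (GFF_NAME, STATS_NAME):
            if not (sub / relative).is_file():
                missing.append(sub / relative)
    return missing


# Example call from the workflow folder (left commented out so that the
# sketch has no side effects when the results folder does not exist yet):
# for path in missing_outputs("results"):
#     print("missing:", path)
```

An empty list means that both files were found for every subfolder; it does not check that their contents are complete.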