panTEannot is described in a preprint available on [bioRxiv](https://doi.org/10.1101/2024.06.17.598857).
panTEannot serially annotates transposable elements (TEs) in multiple whole-genome assemblies using a common reference library. It is a lightweight version of [TEannot](https://doi.org/10.1371/journal.pcbi.0010022) from the [REPET](https://urgi.versailles.inra.fr/Tools/REPET) package.
Because we are interested in inter-individual TE variability, which results in particular from recent TE transpositions, we focused on non-degenerate sequences, using complete TE copies as a proxy.
The annotation files (GFF3) can be used as input for the [panREPET](https://forgemia.inra.fr/urgi-anagen/panREPET) tool.
Install [Conda](https://docs.conda.io/en/latest/miniconda.html) (>=4.12.0)
Install Snakemake via [mamba](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) (mamba >=0.22.1, snakemake >=7.3.8)
Install [Singularity](https://docs.sylabs.io/guides/3.0/user-guide/installation.html) (>=3.8.7-1.el7)
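The minimum versions listed above can be checked programmatically. The following is a minimal sketch, not part of panTEannot: the function names and the dictionary are hypothetical, and the version strings passed in are assumed to be whatever each tool prints for `--version`.

```python
import re

# Minimum versions taken from the installation list above.
# The dictionary keys are illustrative labels, not panTEannot settings.
MINIMUM_VERSIONS = {
    "conda": "4.12.0",
    "mamba": "0.22.1",
    "snakemake": "7.3.8",
    "singularity": "3.8.7",
}


def parse_version(text):
    """Return the first dotted numeric version found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(tool, reported):
    """Return True if the reported version string is at least the minimum for tool."""
    return parse_version(reported) >= parse_version(MINIMUM_VERSIONS[tool])
```

Versions are compared as integer tuples rather than as strings, so that, for example, `7.32.4` is correctly treated as newer than `7.3.8`. Suffixes such as `-1.el7` are ignored.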
Pull the TEfinder2.31 Singularity Image File (SIF), which contains the cutterDB, Blaster, Matcher and pathnum2id tools:
```
cd containers
singularity pull --arch amd64 library://hquesneville/default/te_finder:2.31
```
For more details about the Blaster and Matcher tools, see: Quesneville, H., Nouaud, D. & Anxolabéhère, D. Detection of New Transposable Element Families in Drosophila melanogaster and Anopheles gambiae Genomes. J Mol Evol 57 (Suppl 1), S50–S59 (2003). https://doi.org/10.1007/s00239-003-0007-2
│ └── scripts
│ │ ├── convCoord.py
│ │ └── AnnotationStats
│ │ │ ├── AnnotationStats.py
│ │ │ ├── getCumulLengthFromTEannot.py
│ │ │ ├── Stat.py
To render the workflow DAG as an image: `snakemake --forceall --dag --configfile config/test.yaml | dot -Tpng > dag.png`
* accessions: paths of the genomes to annotate, for example:
```
accessions:
  acc1: acc1.fa
  acc2: acc2.fa
```
* batch_size: size of the batches; genomes are chunked, then the chunks are batched (10 by default)
* blaster_sensitivity: sensitivity of the Blaster tool (1 by default)
* containers: path to the TEfinder Singularity container (tefinder2.31.sif)
* statistics: Simple or Long (Simple by default)
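The parameters above can be illustrated with a short sketch that fills in the documented defaults and shows what batching means in general. This is not the workflow's own code: `complete_config` and `make_batches` are hypothetical helpers, and how panTEannot actually chunks and batches genomes is not described here.

```python
# Defaults as stated in the parameter list above.
DEFAULTS = {"batch_size": 10, "blaster_sensitivity": 1, "statistics": "Simple"}


def complete_config(config):
    """Return a copy of config with documented defaults filled in, after basic checks."""
    if not config.get("accessions"):
        raise ValueError("'accessions' must map at least one name to a genome path")
    if not config.get("containers"):
        raise ValueError("'containers' must give the path to the TEfinder SIF")
    merged = {**DEFAULTS, **config}  # user-supplied values override defaults
    if merged["statistics"] not in ("Simple", "Long"):
        raise ValueError("'statistics' must be 'Simple' or 'Long'")
    if int(merged["batch_size"]) < 1:
        raise ValueError("'batch_size' must be a positive integer")
    return merged


def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Example with placeholder values mirroring the 'accessions' snippet above.
config = complete_config({
    "accessions": {"acc1": "acc1.fa", "acc2": "acc2.fa"},
    "containers": "tefinder2.31.sif",
})
```

With these inputs, `config["batch_size"]` is 10 and `config["statistics"]` is `"Simple"`, since neither was set explicitly.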
### Test
You can run a test on the data provided in the data folder, which contains:
- 2 whole-genome assemblies of Brachypodium distachyon from the [Phytozome database](https://phytozome.jgi.doe.gov/) (Gordon _et al._ 2017): [data/Bdis_TEdenovoGr.fa](https://phytozome.jgi.doe.gov/info/556) (Bd21 v3.2) and [data/ABR2_337.fa.formated](https://phytozome.jgi.doe.gov/info/337)
- a TE library built de novo with the TEdenovo pipeline from REPET v3.0 (https://urgi.versailles.inra.fr/Tools/REPET, Flutre _et al._ 2011) on the Bd21 v3.2 genome (data/Bdis_TEannotGr_chr_allTEs_nr_noSSR_join_path.annotStatsPerTE_FullLengthCopy.fa)
If necessary, add the argument `--singularity-args '--bind /home/myhome'` in the workflow/run_Snakemake.sh file.
The config file workflow/config/test.yaml is already set up.
```
cd workflow
nohup ./run_Snakemake.sh &> test.log &
```
Expected results are in the results folder (results/\*/run_blaster-S2.align.clean_match.path.total.out.coordchr.merged.gff and results/\*/annotStats/run_blaster-S2.align.clean_match.path.total.out.coordchr.merged.globalAnnotStatsPerTE.txt). The test run took 78 minutes (real time) on 8 cores with 16 GB of RAM (specify `--cores 8 --resources mem_gb=16`).
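To take a quick look at a resulting GFF file, the annotated length per sequence can be summed with a few lines of Python. This is a minimal sketch that assumes the file follows the standard 9-column, tab-separated GFF3 layout; `coverage_per_sequence` is a hypothetical helper, and the sample lines are invented for illustration, not real panTEannot output. Overlapping features, if any, are counted more than once.

```python
import io
from collections import defaultdict


def coverage_per_sequence(gff_handle):
    """Sum annotated feature lengths per sequence ID from GFF3 lines."""
    totals = defaultdict(int)
    for line in gff_handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip lines that are not complete GFF3 records
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        totals[seqid] += end - start + 1  # GFF3 coordinates are 1-based, inclusive
    return dict(totals)


# Invented sample records (hypothetical sequence names, sources and IDs).
sample = io.StringIO(
    "##gff-version 3\n"
    "chr1\tsrc\tmatch\t1\t100\t.\t+\t.\tID=a\n"
    "chr1\tsrc\tmatch\t201\t250\t.\t-\t.\tID=b\n"
    "chr2\tsrc\tmatch\t10\t19\t.\t+\t.\tID=c\n"
)
result = coverage_per_sequence(sample)
print(result)  # {'chr1': 150, 'chr2': 10}
```

For a real file, pass an open file handle instead of the `io.StringIO` sample, for example `with open(path) as handle: coverage_per_sequence(handle)`.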