# panTEannot

panTEannot is described in a preprint available on [bioRxiv](https://doi.org/10.1101/2024.06.17.598857).

## Description

panTEannot serially annotates transposable elements (TEs) in multiple whole-genome assemblies using a common reference library. It is a lightweight version of [TEannot](https://doi.org/10.1371/journal.pcbi.0010022) from the [REPET](https://urgi.versailles.inra.fr/Tools/REPET) package.
Because we are interested in inter-individual TE variability, which arises in particular from recent TE transpositions, we focus on non-degenerate sequences, using complete TE copies as a proxy.

The annotation files (GFF3) can be used as input for the [panREPET](https://forgemia.inra.fr/urgi-anagen/panREPET) tool.
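
To get a quick overview of a GFF3 file before passing it on, the shell function below tallies features by type. It is a minimal sketch, not part of panTEannot: `gff_type_counts` is a hypothetical helper name, it assumes only the standard nine tab-separated GFF3 columns (feature type in column 3) and skips `#` comment lines, and it makes no assumption about which feature types panTEannot actually writes.

```shell
# Hypothetical helper (not shipped with panTEannot): count GFF3 features per type.
# Usage: gff_type_counts file.gff
gff_type_counts() {
  awk 'BEGIN { FS = "\t" }
       /^#/    { next }           # skip directives and comments
       NF >= 9 { n[$3]++ }        # count only well-formed 9-column lines
       END     { for (t in n) print t "\t" n[t] }' "$1" |
    LC_ALL=C sort                 # locale-independent byte order
}
```

For example, `gff_type_counts results/acc1/run_blaster-S2.align.clean_match.path.total.out.coordchr.merged.gff` prints one `type<TAB>count` line per feature type (the path assumes an accession named `acc1`).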

![panTEannot_figure1](panTEannot.png)

## Installation

Install [Conda](https://docs.conda.io/en/latest/miniconda.html) (>=4.12.0)

Install Snakemake via [mamba](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) (mamba >=0.22.1, snakemake >=7.3.8)

Install [Singularity](https://docs.sylabs.io/guides/3.0/user-guide/installation.html) (>=3.8.7-1.el7)

Pull the TEfinder 2.31 Singularity Image File (SIF), which contains the cutterDB, Blaster, Matcher and pathnum2id tools:
```
cd containers
singularity pull --arch amd64 library://hquesneville/default/te_finder:2.31
```
For more details about the Blaster and Matcher tools, see: Quesneville, H., Nouaud, D. & Anxolabéhère, D. Detection of New Transposable Element Families in _Drosophila melanogaster_ and _Anopheles gambiae_ Genomes. J Mol Evol 57 (Suppl 1), S50–S59 (2003). https://doi.org/10.1007/s00239-003-0007-2

## Directory tree

```
.
├── README.md
├── containers
│   └── how_install_te-finder.txt
└── workflow
    ├── Snakefile
    ├── config
    │   └── test.yaml
    ├── run_Snakemake.sh
    ├── dag.png
    ├── results
    └── scripts
        ├── convCoord.py
        └── AnnotationStats
            ├── AnnotationStats.py
            ├── getCumulLengthFromTEannot.py
            ├── Stat.py
            └── AnnotationStatsWriter.py
```

## Build the DAG as a PNG

To visualize the workflow's DAG (directed acyclic graph), run the following (the `dot` command comes from Graphviz):

```
cd workflow
snakemake --forceall --dag --configfile config/test.yaml | dot -Tpng > dag.png
```

## Execution

### How to set up the config file

`project_dir`: path to the project directory

`TE_ref`: library of TE reference sequences (FASTA); please specify an absolute path

`accessions`: paths to the genomes to annotate, one entry per accession, for example:

```
accessions:
  acc1: acc1.fa
  acc2: acc2.fa
```

`param`:

* `batch_size`: genomes are split into chunks, which are then grouped into batches; this sets the size of the batches (10 by default)
* `blaster_sensitivity`: sensitivity of the Blaster tool (1 by default)
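
The grouping controlled by `batch_size` can be pictured with the sketch below, which prints the range of chunk indices that would fall into each batch. `batch_ranges` is a hypothetical helper: it assumes chunks are numbered from 1 and grouped consecutively, which may not match exactly how the Snakefile assigns chunks to batches.

```shell
# Hypothetical helper (illustration only): print chunk-index ranges, one batch per line.
# Usage: batch_ranges N_CHUNKS [BATCH_SIZE]   (BATCH_SIZE defaults to 10)
batch_ranges() {
  n_chunks=$1
  batch_size=${2:-10}
  start=1
  while [ "$start" -le "$n_chunks" ]; do
    end=$((start + batch_size - 1))
    # the last batch may hold fewer than batch_size chunks
    if [ "$end" -gt "$n_chunks" ]; then end=$n_chunks; fi
    echo "${start}-${end}"
    start=$((start + batch_size))
  done
}
```

For example, `batch_ranges 23` prints `1-10`, `11-20` and `21-23`: under these assumptions, 23 chunks give three batches, the last one holding only 3 chunks.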

`containers`: path to the TEfinder Singularity image (e.g. tefinder2.31.sif)

`statistics`: `Simple` or `Long` (`Simple` by default)
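
Putting these keys together, a complete config file could look like the sketch below, written from the `workflow` directory with a shell heredoc. All paths are placeholders, the file name `config/my_config.yaml` is hypothetical, and the nesting of `batch_size` and `blaster_sensitivity` under `param` is inferred from the list above; compare with `workflow/config/test.yaml` before relying on it.

```shell
# Write a hypothetical config file; replace every placeholder path with your own.
# The nesting under `param` is inferred from this README, not checked against the Snakefile.
mkdir -p config
cat > config/my_config.yaml <<'EOF'
project_dir: /absolute/path/to/project
TE_ref: /absolute/path/to/TE_library.fa
accessions:
  acc1: /path/to/acc1.fa
  acc2: /path/to/acc2.fa
param:
  batch_size: 10
  blaster_sensitivity: 1
containers: /absolute/path/to/containers/tefinder2.31.sif
statistics: Simple
EOF
```

The file can then be passed to Snakemake with `--configfile config/my_config.yaml`, the same option used in the DAG command above.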

### Test

You can run a test on the data provided in the `data` folder, which contains:

- two whole-genome assemblies of _Brachypodium distachyon_ from the [Phytozome database](https://phytozome.jgi.doe.gov/) (Gordon _et al._ 2017): [data/Bdis_TEdenovoGr.fa](https://phytozome.jgi.doe.gov/info/556) (Bd21 v3.2) and [data/ABR2_337.fa.formated](https://phytozome.jgi.doe.gov/info/337)
- a TE library built _de novo_ with the TEdenovo pipeline from REPET v3.0 (https://urgi.versailles.inra.fr/Tools/REPET, Flutre _et al._ 2011) on the Bd21 v3.2 genome (data/Bdis_TEannotGr_chr_allTEs_nr_noSSR_join_path.annotStatsPerTE_FullLengthCopy.fa)

If necessary, add the argument `--singularity-args '--bind /home/myhome'` to the `workflow/run_Snakemake.sh` file so that the container can access your data directories.

The config file `workflow/config/test.yaml` is already set up for this test.

```
cd workflow
nohup ./run_Snakemake.sh &> test.log &
```

Expected results are written to the `results` folder: `results/*/run_blaster-S2.align.clean_match.path.total.out.coordchr.merged.gff` and `results/*/annotStats/run_blaster-S2.align.clean_match.path.total.out.coordchr.merged.globalAnnotStatsPerTE.txt`. The test took 78 minutes (wall-clock time) on 8 cores with 16 GB of RAM (`--cores 8 --resources mem_gb=16`).
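
After the run finishes, the sketch below is one way to confirm that every accession produced both expected files. `check_results` is a hypothetical helper: it assumes one sub-directory per accession under `results/` and the two file names given above, so any other sub-directory there is reported as `MISSING`. It only checks that the files exist and are non-empty, not that their content is correct.

```shell
# Hypothetical helper: report whether each accession directory under
# RESULTS_DIR holds the two expected, non-empty output files.
# Usage: check_results [RESULTS_DIR]   (defaults to ./results)
check_results() {
  results_dir=${1:-results}
  prefix=run_blaster-S2.align.clean_match.path.total.out.coordchr.merged
  rc=0
  for acc_dir in "$results_dir"/*/; do
    [ -d "$acc_dir" ] || continue   # glob matched nothing
    for f in "${acc_dir}${prefix}.gff" \
             "${acc_dir}annotStats/${prefix}.globalAnnotStatsPerTE.txt"; do
      if [ -s "$f" ]; then
        echo "OK      $f"
      else
        echo "MISSING $f"
        rc=1
      fi
    done
  done
  return "$rc"
}
```

Run `check_results` from the `workflow` directory; it returns a non-zero status if any expected file is missing or empty, which makes it usable in a follow-up script.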