# SAMBA
## Description
SAMpling Biomarker Analysis (SAMBA) is a toolset for running flux sampling on metabolic networks, predicting biomarkers (or metabolic profiles) for specific metabolic conditions, and representing the results visually.

It uses a Snakemake pipeline to manage the entire workflow, starting from a metabolic network and a set of scenarios of reactions or genes to disrupt, and ending with a change score (z-score) for each exchange metabolite in the network, for each disruption scenario.


## Getting started
### Requirements
- [cobrapy](https://pypi.org/project/cobra/)
- [sambaflux](https://pypi.org/project/sambaflux/) (see `cluster_install.sh` for installation commands)
- A solver (GLPK, CPLEX, or Gurobi). Note that CPLEX 12.10 does not work with Python 3.8+.
- Access to a computer cluster or powerful computer for large metabolic networks
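
As a quick sanity check before launching anything, the sketch below reports whether the two PyPI packages listed above are installed for the current Python. It assumes the interpreter is available as `python3` and that `pip` is installed for it; `check_pkg` is a helper name introduced here for illustration. It does not check for a solver.

```bash
# Report whether each required PyPI package is installed for this Python.
# Package names come from the requirements list above; using `python3` as
# the interpreter name is an assumption about your environment.
check_pkg() {
    python3 -m pip show "$1" > /dev/null 2>&1
}

for pkg in cobra sambaflux; do
    if check_pkg "$pkg"; then
        echo "found:   $pkg"
    else
        echo "missing: $pkg"
    fi
done
```

Run it inside the environment created by `cluster_install.sh`, otherwise it will report on whichever Python is first on your `PATH`.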

### Installation and usage
0. Connect to a cluster using `ssh`
    - *Example*: `ssh <username>@genologin.toulouse.inra.fr`
1. On the cluster, use `pipeline/cluster_install.sh` to install the environments and requirements. You can either submit the file as a cluster job using `sbatch cluster_install.sh`, or connect interactively to a node using `srun --pty bash` and run the commands manually. Set the `ENVPATH` and `WORKINGDIR` paths beforehand, and make sure both folders exist. `ENVPATH` is where the Python environment will be created. `WORKINGDIR` is where the SAMBA project (this project) will be cloned, and where you will run the cluster jobs from.
    - You can also run `WORKINGDIR=/path/to/folder` to be able to use `$WORKINGDIR` in your cluster or local terminal in the next steps.
    - If you're using CPLEX, make sure to set up CPLEX by adding CPLEX to the PYTHONPATH (see commented line in `cluster_install.sh`).
    - Make sure the Python module you're using contains Snakemake.
    - Once installed, you only need to run `git clone --depth 1 https://forgemia.inra.fr/metexplore/cbm/samba-project/samba.git` in a different folder to launch a new sampling run with different parameters.

2. Using your preferred file copy method, copy the metabolic network file and the file listing the reactions/genes to knock out (KO) to `$WORKINGDIR/samba/pipeline/data/` on the cluster.
    - *Example from a local PC*: `rsync -aP /path/to/local/folder/data/ <username>@genologin.toulouse.inra.fr:$WORKINGDIR/samba/pipeline/data/`
2. `cd` into your WORKINGDIR and edit `config.yaml` with the correct parameters. You can change these to use different models and KO files.
    - *Example*: 
    ```bash
    ```
    In vim, type `:q` to quit or `:x` to save and quit.
    - **Parameters**:
        - `model`: model filename added in `data/`
        - `samba_path`: path to the samba scripts folder. Since you will be running files from the pipeline/ folder, this can be set to a relative path `../local/scripts`, or absolute path `$WORKINGDIR/samba/local/scripts`.
        - `ids`: whether you are using an input gene/reaction KO file (`"simple"`) or a scenario-based KO (`"scenario"`).
        - `ids_file`: file containing the genes or reactions to KO if `ids: "simple"`. The first column, `Scenario`, contains descriptive scenario names (these do not have to be unique). The second column, `IDs`, contains the gene or reaction IDs to KO; separate multiple IDs in one scenario with a space. The optional third column, `Reduction`, contains a value between 0 and 1 giving the fraction of maximum flux that the group of reactions or genes will be set to (e.g. 0.3 means the corresponding reactions will be set to 30% of their maximum fluxes). This option is ignored if using `ids: "scenario"`.
        - `scale`: if using `ids: "scenario"`, set the scale to `"pathway"` to KO within pathways or `"network"` to KO within the entire network. Ignored if using `ids: "simple"`.
        - `scenario`: if using `ids: "scenario"`, set the scenario to `"singlerxn"` to KO one random reaction or `"allrxn"` to KO all reactions in `<scale>`. Ignored if using `ids: "simple"`.
        - `each`: if using `ids: "scenario"`, set each to `"--each"` to enable looping over `<scale>` to generate `<scenario>` KO IDs. Ignored if using `ids: "simple"`.
        - `nsamples`: number of samples to use. `100000` is recommended for large human metabolic networks.
        - `biomass`: minimum amount of biomass to optimise for.
        - `biomassfile`: TSV file containing a `Model` column with model names and a `Biomass` column with each model's biomass reaction. You can add or replace a row in the existing file where needed. Only used if `biomass` is not 0.
        - `exchangemin`: value to set the exchange reaction lower bounds to (will be made negative), e.g. `1` will result in exchange reaction bounds being set to `[-1, 1000]`.
        - `rxns_to_output`: reactions to output flux samples for: `"all"`, `"exchanges"`, or `"<filename>"` for a file containing reaction IDs.
        - `fva`: `--fva` or `""` to also calculate FVA bounds in the same conditions as sampling.
        - `onlyfva`: `--onlyfva` or `""` to only run FVA instead of sampling.
        - `zscoresample`: fraction of the total samples to use when calculating z-scores, between 0 and 1. For example, setting `zscoresample` to 0.6 will make SAMBA use a random 60% of all samples to calculate the z-scores.
3. Edit `submit_slurm.sh` to set ENVPATH to the same path you used before. You can also change the job name, error and out filenames, and add other SBATCH parameters.
4. Make sure you are in `$WORKINGDIR/samba/pipeline/`, and run `sbatch submit_slurm.sh`.
    - You can watch the job submissions using `watch squeue -u <username>` (`Ctrl+C` to exit the watch).
5. Results can be found in `<model_name>_<KO_file_name>/output/`.
    - `zscores.tsv` contains the z-scores for all exchange metabolites and all scenarios.
    - `densities.json` contains a density version of the sampling distributions for all exchange metabolites and all scenarios. Used in R for plotting purposes.
6. You may also need the files in `<model_name>_<KO_file_name>/dict/` to convert from the unique scenario IDs back to the original scenarios, and to convert metabolite IDs to names for plotting/readability purposes.
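
To make the `ids_file` layout described in the parameters above more concrete, here is a minimal sketch that writes a small example file and checks it. Several things are assumptions, not taken from the SAMBA code: the file is written as tab-separated text with a header row, `example_ko.tsv` is a hypothetical filename, `RXN_A`, `RXN_B` and `GENE_1` are placeholder IDs, and a `Reduction` of `0` is read as a full knock-out (0% of maximum flux). `validate_ids_file` is a helper introduced here for illustration only.

```bash
# Assumption: tab-separated with a header row; the README only names the columns.
DATA_DIR="${WORKINGDIR:-.}/samba/pipeline/data"
IDS_FILE="$DATA_DIR/example_ko.tsv"   # hypothetical filename
mkdir -p "$DATA_DIR"

# Placeholder IDs; replace them with gene or reaction IDs from your model.
printf 'Scenario\tIDs\tReduction\n'   >  "$IDS_FILE"
printf 'single_ko\tRXN_A\t0\n'        >> "$IDS_FILE"
printf 'double_ko\tRXN_A RXN_B\t0\n'  >> "$IDS_FILE"   # several IDs, space-separated
printf 'partial\tGENE_1\t0.3\n'       >> "$IDS_FILE"   # 30% of maximum flux

# Check that every data row has a Scenario and IDs value, and that any
# Reduction value is a number between 0 and 1. Returns non-zero on failure.
validate_ids_file() {
    awk -F'\t' '
        NR == 1 { next }   # skip the header row
        NF < 2 || $1 == "" || $2 == "" {
            printf "line %d: missing Scenario or IDs\n", NR; bad = 1
        }
        NF >= 3 && $3 != "" && ($3 !~ /^[0-9]*\.?[0-9]+$/ || $3 > 1) {
            printf "line %d: Reduction must be between 0 and 1\n", NR; bad = 1
        }
        END { exit bad }
    ' "$1"
}

if validate_ids_file "$IDS_FILE"; then
    echo "OK: $IDS_FILE"
fi
```

With `ids: "simple"` set in `config.yaml`, `ids_file` would then refer to this file.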

SAMBAR is not required to run SAMBA, but it is useful for importing and plotting SAMBA results in R scripts.

Requirements for SAMBAR:
- R (currently used version: 4.2.2)
- R.utils
- optparse
- ggh4x
- [sambar](https://forgemia.inra.fr/metexplore/cbm/samba-project/sambar/-/releases/permalink/latest/downloads/sambar)

Install via bash:
```bash
Rscript -e 'install.packages("https://forgemia.inra.fr/metexplore/cbm/samba-project/sambar/-/releases/permalink/latest/downloads/sambar", repos = NULL)'
```
Install via R:
```R
install.packages("https://forgemia.inra.fr/metexplore/cbm/samba-project/sambar/-/releases/permalink/latest/downloads/sambar", repos = NULL)
```

## Confirmed working models
- Human1 (HumanGEM)
- Recon2
- Toy model (small4m) with added exchange reactions

## Roadmap
- [x] Add support for multiple KOs at once
- [x] Add the option to reduce flux rates in addition to KO
## Authors
Juliette Cooke

## Acknowledgments


## License
MIT License (see LICENSE file)

## Project status
This project is currently active.