Workflow description

Input data

MMASeq (Mixed Microbial Analysis on Sequencing data) accepts either raw sequencing reads (FASTQ format) or assembled genomes (FASTA format) as input. The pipeline currently handles short reads only; long-read support is planned for future releases. Input data is specified through a samplesheet, which allows multiple samples to be processed in a single run. Each sample can be associated with its own set of reads and/or assemblies, enabling flexible analysis across different species or strains. The pipeline also supports species-specific configuration files, which can be linked to each sample in the samplesheet to tailor the analysis to the organism being studied. This modular approach lets the pipeline be adapted to a wide range of bacterial species and genomic contexts in microbial genomics research and surveillance.
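
To illustrate how a samplesheet can tie reads, assemblies, and a species configuration to each sample, the following minimal sketch parses a tab-separated sheet. The column names (`sample`, `reads_1`, `reads_2`, `assembly`, `species_config`) and the file names in the example are hypothetical, chosen for illustration only; the actual samplesheet layout used by the pipeline may differ, so check the example sheets shipped in the repository.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One samplesheet row. Field names mirror hypothetical column names."""
    name: str
    reads_1: Optional[str] = None
    reads_2: Optional[str] = None
    assembly: Optional[str] = None
    species_config: Optional[str] = None


def _cell(row, key):
    """Return a stripped cell value, or None when the cell is empty or absent."""
    value = (row.get(key) or "").strip()
    return value or None


def parse_samplesheet(handle) -> List[Sample]:
    """Parse a tab-separated samplesheet into Sample records."""
    samples = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(handle, delimiter="\t"), start=2):
        name = _cell(row, "sample")
        if name is None:
            raise ValueError(f"line {line_no}: missing sample name")
        sample = Sample(
            name=name,
            reads_1=_cell(row, "reads_1"),
            reads_2=_cell(row, "reads_2"),
            assembly=_cell(row, "assembly"),
            species_config=_cell(row, "species_config"),
        )
        # Every sample needs at least one kind of input: reads or an assembly.
        if sample.reads_1 is None and sample.assembly is None:
            raise ValueError(f"line {line_no}: sample {name!r} has no reads or assembly")
        samples.append(sample)
    return samples


# Hypothetical sheet: one reads-based sample and one assembly-based sample.
EXAMPLE_SHEET = (
    "sample\treads_1\treads_2\tassembly\tspecies_config\n"
    "s1\ts1_R1.fastq.gz\ts1_R2.fastq.gz\t\tspecies_a.yaml\n"
    "s2\t\t\ts2.fasta\tspecies_b.yaml\n"
)

samples = parse_samplesheet(io.StringIO(EXAMPLE_SHEET))
for s in samples:
    input_kind = "reads" if s.reads_1 else "assembly"
    print(s.name, input_kind, s.species_config)
```

Keeping reads and assembly columns optional per row is one way to let FASTQ-based and FASTA-based samples share a single sheet, each pointing at its own species configuration.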

Repository Structure

The MMAseq repository is organized into several key directories and files that support the workflow and its management:

src/

The top-level directory containing all Snakemake-related files: rule definitions, species-specific configs, conda environments, and helper scripts.

docs/

Contains the Markdown documentation files and assets used to build the project documentation.

LICENSE

The LICENSE file contains the terms under which the software is distributed, specifying the permissions and limitations for users of the software.

README.md

The README.md file provides an overview of the project and other relevant information for users.

mkdocs.yml

The mkdocs.yml file is the configuration file for MkDocs, a static site generator used to build the project documentation. It defines the structure and settings for the documentation site.

pyproject.toml

The pyproject.toml file is a configuration file for Python projects, specifying the project metadata, dependencies, and build system requirements. It is used to manage the Python environment and dependencies for the project, ensuring that all necessary packages are installed and compatible with the workflow.

version_guide.md

The version_guide.md file provides information about the versioning strategy and release notes for the project.

Full project tree structure

.
├── LICENSE
├── README.md
├── mkdocs.yml
├── pyproject.toml
├── version_guide.md
└── src
    └── mmaseq/
        ├── config/
        │   ├── target_screening/              # Metadata folder
        │   ├── species_configs/               # Folder containing all species configuration files
        │   ├── results_catalogue.yaml         # Information about the results generated by the pipeline
        │   └── Test.yaml                      # Main pipeline default configuration file
        ├── data/
        │   ├── assemblies/                    # Example assemblies
        │   ├── reads/                         # Reads folder
        │   │   └── reads.urls                 # Example input reads URLs file
        │   ├── samplesheet_small.tsv          # Reduced example input sheet
        │   └── samplesheet.tsv                # Complete example input sheet
        ├── workflow/
        │   ├── Snakefile                      # Main Snakemake workflow definition
        │   ├── envs/                          # Folder containing all environment configuration files
        │   ├── rules/                         # Folder containing all Snakemake rules
        │   └── scripts/                       # Folder containing all helper scripts
        ├── utils/                             # Folder containing all helper functions and classes
        │   ├── PATH.py                        # All paths used in the pipeline
        │   ├── helpers.py                     # Helper functions and classes
        │   ├── logging_setup.py               # Logging setup for the pipeline
        │   ├── results_paths.py               # Functions that generate the paths for the pipeline results
        │   ├── results_aggregator.py          # Functions that aggregate the pipeline results
        │   └── sample_config.py               # Functions that parse the sample configuration files
        ├── deploy.py                          # Script to deploy the pipeline to a specified location and run it on a test dataset
        └── mmaseq.py                          # Main launcher of the pipeline
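
As a quick sanity check after cloning, the sketch below reports which of the entries listed above are missing from a checkout. It is not part of the pipeline: the function name `missing_entries` is hypothetical, and the paths assume that `config/`, `data/`, `workflow/`, `utils/`, `deploy.py`, and `mmaseq.py` all sit directly under `src/mmaseq/`.

```python
import tempfile
from pathlib import Path

# Entries taken from the project tree; nesting under src/mmaseq/ is an assumption.
EXPECTED_ENTRIES = [
    "LICENSE",
    "README.md",
    "mkdocs.yml",
    "pyproject.toml",
    "version_guide.md",
    "src/mmaseq/config",
    "src/mmaseq/data",
    "src/mmaseq/workflow/Snakefile",
    "src/mmaseq/utils",
    "src/mmaseq/deploy.py",
    "src/mmaseq/mmaseq.py",
]


def missing_entries(repo_root):
    """Return the expected entries that do not exist under repo_root."""
    root = Path(repo_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


# Demonstration on a throwaway directory that contains only two of the entries.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "LICENSE").write_text("placeholder\n")
    (root / "src" / "mmaseq" / "workflow").mkdir(parents=True)
    (root / "src" / "mmaseq" / "workflow" / "Snakefile").write_text("# placeholder\n")
    missing = missing_entries(root)

for entry in missing:
    print("missing:", entry)
```

Against a real checkout, `missing_entries(".")` run from the repository root should return an empty list if the layout matches the tree above.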