# Introduction

Current metagenomics workflows can exploit three different streams of analysis: read-based, assembly-based, and binning-based. Read-based and/or assembly-based analyses are often neglected in favor of binning-driven inferences because of their differing reliability and sensitivity. However, the filtering steps involved in moving from reads to bins progressively reduce the amount of information available, and thus the meaningfulness of the resulting data. Hence the need for a metagenomic workflow that combines all three streams of analysis.

Geomosaic was created for this purpose. The pipeline performs annotations on all three streams of analysis and was specifically designed to integrate the various programs and packages they require. This approach maximizes the information gathered from raw data. At the same time, Geomosaic is flexible: users can fully customize the analysis by choosing any stream of analysis and refining it further with modules and packages. In this way, each Geomosaic workflow fits the user's own goals.

## Geomosaic Graph Structure

The base graph structure is made up of three main analyses that must be taken into account when choosing the desired workflow:

| Stream | Module | Depends on |
|-------|------|--------|
| `Read-based` | Pre-processing | - |
| `Assembly-based` | Assembly | Pre-processing |
| `Binning-based` | Binning | Pre-processing, Assembly |

These dependencies cannot be overlooked when generating a workflow with Geomosaic. For instance, ignoring the `Assembly` module prevents the `Binning` module from running, because `Binning` depends on `Assembly`. The full tree of dependencies among all modules is shown below.
![modules_DAG](_static/images/modules_DAG.png)

## Integrated modules

To summarise, the dependency tree must be considered when ignoring specific modules, as doing so may inadvertently block other modules in the same or the next stream of analysis.

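
The blocking rule described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Geomosaic's own code or API: it encodes only the three base modules from the graph structure table as a `module -> dependencies` mapping (the name `BASE_DEPENDENCIES` and both helper functions are assumptions), walks the modules in topological order with the standard-library `graphlib`, and marks a module as blocked when it is ignored or when any of its dependencies is blocked.

```python
from graphlib import TopologicalSorter

# Hypothetical encoding of the base graph structure (module -> modules it depends on).
# Only the three base modules are included; the full Geomosaic module DAG is larger.
BASE_DEPENDENCIES: dict[str, set[str]] = {
    "Pre-processing": set(),
    "Assembly": {"Pre-processing"},
    "Binning": {"Pre-processing", "Assembly"},
}


def runnable_modules(dependencies: dict[str, set[str]], ignored: set[str]) -> list[str]:
    """Return the modules that can still run, in a valid execution order.

    A module is blocked if the user ignores it, or if any module it depends
    on is blocked, so blocking propagates downstream through the graph.
    """
    # static_order() yields every module after all of its dependencies.
    order = list(TopologicalSorter(dependencies).static_order())
    blocked = set(ignored)
    for module in order:
        if dependencies[module] & blocked:
            blocked.add(module)
    return [module for module in order if module not in blocked]


def blocked_modules(dependencies: dict[str, set[str]], ignored: set[str]) -> list[str]:
    """Return, sorted by name, every module that cannot run."""
    runnable = set(runnable_modules(dependencies, ignored))
    return sorted(set(dependencies) - runnable)


if __name__ == "__main__":
    # Ignoring Assembly also blocks Binning, which depends on it.
    print(runnable_modules(BASE_DEPENDENCIES, {"Assembly"}))  # ['Pre-processing']
    print(blocked_modules(BASE_DEPENDENCIES, {"Assembly"}))   # ['Assembly', 'Binning']
```

Processing modules in topological order means each module's dependencies are already classified by the time it is checked, so a single pass is enough to propagate blocking through the whole graph.
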
| Stream-level | Modules | Packages |
|---|---|---|
| Read-based | Pre-processing | fastp, trimgalore, trimmomatic |
| | Reads Quality Check | fastqc + reads count |
| | Functional Annotation | mi-faser |
| | Taxonomic Annotation | Kaiju, metaPhlAn |
| Assembly-based | Assembly | metaSPAdes, Megahit |
| | Assembly Quality Check | Quast, Meta-Quast |
| | Read Mapping | Bowtie2, Bowtie2 (output without unmapped reads), BBMap, BBMap (output without unmapped reads) |
| | Read Coverage | CoverM (contigs) |
| | Taxonomic Annotation | Kraken2 |
| | ORF Prediction | Prodigal |
| | Domain Annotation | reCOGnizer |
| | HMM Annotation | HMMSearch |
| | ORF Annotation | eggNOG-mapper, KOfam Scan |
| | Functional Annotation | Bakta |
| Binning-based | Binning | Multi-Binners (Metabat2 + MaxBin2 + SemiBin2) |
| | Binning De-replication | DAS Tool |
| | Binning Quality Assessment | CheckM |
| | MAGs Retrieval | MAGs Retrieval |
| | MAGs Functional Annotation | DRAM, Bakta |
| | MAGs Taxonomic Annotation | GTDBtk |
| | MAGs ORF Prediction | Prodigal |
| | MAGs Domain Annotation | reCOGnizer |
| | MAGs ORF Annotation | KOfam Scan |
| | MAGs Coverage | CoverM (Genome) |
| | MAGs HMM Annotation | HMMSearch |