Introduction

Current metagenomics workflows can exploit three different streams of analysis: read-based, assembly-based, and binning-based. Read-based and assembly-based analyses are often neglected in favor of binning-driven inferences because of the differences in reliability and sensitivity among the three streams. However, the filtering steps involved in moving from reads to bins progressively reduce the amount of information retained, and with it the meaningfulness of the resulting data. There was therefore a need for a metagenomic workflow that combines all three streams of analysis, and Geomosaic was created for this purpose. The pipeline supports annotation in all three streams and was specifically designed to make it easy to integrate the various programs and packages they require. This approach maximizes the information extracted from the raw data. At the same time, Geomosaic's flexibility lets users fully customize the analysis by choosing any combination of streams and refining it further with specific modules and packages. In this way, each Geomosaic workflow can be tailored to the user's purposes.

Geomosaic Graph Structure

The base graph structure consists of three main modules, one for each stream of analysis, whose dependencies must be taken into account when designing the desired workflow:

Stream	Module	Depends on
`Read-based`	Pre-processing	-
`Assembly-based`	Assembly	Pre-processing
`Binning-based`	Binning	Pre-processing, Assembly

These dependencies cannot be overlooked when generating a workflow with Geomosaic. For instance, skipping the Assembly module prevents the Binning module from running, because Binning depends on Assembly.
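The blocking rule can be illustrated with a small Python sketch. It encodes only the three base modules from the table above in a hypothetical `DEPENDS_ON` mapping (the full module-level graph is larger), and the helpers `blocked_by` and `execution_order` are illustrative names, not part of Geomosaic's code or command-line interface. It relies on the standard-library `graphlib` module (Python 3.9+) to visit modules in dependency order.

```python
from graphlib import TopologicalSorter

# Hypothetical encoding of the base structure: module -> set of modules it depends on.
# Only the three base modules are included; the real Geomosaic graph has many more.
DEPENDS_ON: dict[str, set[str]] = {
    "Pre-processing": set(),
    "Assembly": {"Pre-processing"},
    "Binning": {"Pre-processing", "Assembly"},
}


def blocked_by(ignored: set[str], graph: dict[str, set[str]]) -> set[str]:
    """Return every module that cannot run because it depends,
    directly or transitively, on an ignored module."""
    blocked: set[str] = set()
    # static_order() yields each module after all of its dependencies,
    # so a module's prerequisites are classified before the module itself.
    for module in TopologicalSorter(graph).static_order():
        if module in ignored:
            continue
        if graph[module] & (ignored | blocked):
            blocked.add(module)
    return blocked


def execution_order(ignored: set[str], graph: dict[str, set[str]]) -> list[str]:
    """Return the modules that can still run, in an order respecting dependencies."""
    skipped = ignored | blocked_by(ignored, graph)
    return [m for m in TopologicalSorter(graph).static_order() if m not in skipped]


if __name__ == "__main__":
    print(blocked_by({"Assembly"}, DEPENDS_ON))       # {'Binning'}
    print(execution_order({"Assembly"}, DEPENDS_ON))  # ['Pre-processing']
```

Because the check is transitive, the same helper also shows that skipping Pre-processing would block both Assembly and Binning, which is why the dependency graph matters when deciding which modules to leave out.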

The full graph of dependencies among all modules is shown below.

Figure: modules_DAG (dependency graph of all Geomosaic modules)

Integrated modules

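To summarise, the dependency graph must be considered when skipping specific modules, since doing so may inadvertently block other modules in the same or in a downstream stream of analysis. The modules currently integrated in Geomosaic, and the packages available for each, are listed below.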

Stream-level	Modules	Packages
Read-based	Pre-processing	fastp
		trimgalore
		trimmomatic
	Reads Quality Check	fastqc + reads count
	Functional Annotation	ARGs-OAP with Custom DB
	Functional Annotation	mi-faser
	Taxonomic Annotation	Kaiju
	Taxonomic Annotation	MetaPhlAn
Assembly-based	Assembly	metaSPAdes
	Assembly	Megahit
	Assembly Quality Check	Quast
	Assembly Quality Check	Meta-Quast
	Read Mapping	Bowtie2
		Bowtie2 - Output without unmapped reads
		BBMap
		BBMap - Output without unmapped reads
	Read Coverage	CoverM (contigs)
	Taxonomic Annotation	Kraken2
	ORF Prediction	Prodigal
	Domain Annotation	reCOGnizer
	HMM Annotation	HMMSearch
	ORF Annotation	eggNOG-mapper
	ORF Annotation	KOfam Scan
	Functional Annotation	Bakta
Binning-based	Binning	Multi-Binners (Metabat2 + MaxBin2 + SemiBin2)
	Binning De-replication	DAS Tool
	Binning Quality Assessment	CheckM
	MAGs Retrieval	MAGs Retrieval
	MAGs Functional Annotation	DRAM
	MAGs Functional Annotation	Bakta
	MAGs Taxonomic Annotation	GTDBtk
	MAGs ORF Prediction	Prodigal
	MAGs Domain Annotation	reCOGnizer
	MAGs ORF Annotation	KOfam Scan
	MAGs Coverage	CoverM (Genome)
	MAGs HMM Annotation	HMMSearch

Modules that could be integrated in the future

The following modules are under evaluation for future integration.

Read-based:

Functional Annotation:
- mi-faser (with custom database)

Assembly-based:

Functional Annotation:
- Prokka

Taxonomic Annotation:
- cat/bat

If there is a module or package you would like to see integrated into Geomosaic, you can open an issue requesting the integration and including all the relevant information. At the moment, we only accept packages that can be installed from a Conda channel.