geomosaic unit

Overview

This command lets you choose and run a single module, for example to execute an alternative package for that module. It creates an additional Snakefile and config file for the chosen module in the geomosaic directory.

geomosaic unit --help
usage: geomosaic unit -s SETUP_FILE -m MODULE [-t THREADS] [-h]

DESCRIPTION: It allows to choose and run just one module, for example to execute an alternative package for that module. The command create another Snakefile a config file (both in the geomosaic directory) with the chosen module

Required Arguments:
  -s SETUP_FILE, --setup_file SETUP_FILE
                        Geomosaic setup file created from the geomosaic setup ... command.
  -m MODULE, --module MODULE
                        Modules to execute.

Optional Arguments:
  -t THREADS, --threads THREADS
                        Threads to use (per sample).

Available Modules:
  
  - pre_processing - Module: Pre processing with quality check
  - reads_qc - Module: Quality check (and/or Reads Count) of the reads after Pre-Processing
  - func_classification_rb - Module: Functional classification [Read-based]
  - tax_classification_rb - Module: Taxonomic classification [Read-based]
  - assembly - Module: Assembly
  - assembly_func_annotation - Module: Functional Annotation on Assembly [Assembly-based]
  - assembly_qc - Module: Assembly quality evaluation with metrics [Assembly-based]
  - assembly_readmapping - Module: Read mapping the processed reads to the Assembly [Assembly-based]
  - assembly_coverage - Module: Assembly coverage computation [Assembly-based]
  - assembly_tax_annotation - Module: Taxonomic Annotation on Assembly [Assembly-based]
  - orf_prediction - Module: perform ORF prediction [Assembly-based]
  - domain_annotation - Module: Domain annotation on Assembly [Assembly-based]
  - assembly_hmm_annotation - Module: tracking coverage for genes of interest through HMM models [Assembly-based]
  - assembly_orf_annotation - Module: Functional Annotation on Assembly (ORFs) [Assembly-based]
  - binning - Module: Binning
  - binning_derep - Module: Binning Deeplication [Binning-based]
  - binning_qa - Module: Binning Quality Evaluation [Binning-based]
  - mags_retrieval - Module: get MAGs based on Contamination and Completeness threshold [Binning-based]
  - mags_func_annotation - Module: perform functional annotation on MAGs [Binning-based]
  - mags_tax_annotation - Module: Perform taxonomic annotation of filtered MAGs [Binning-based]
  - mags_orf_prediction - Module: perform ORF prediction for each retrieved MAG [Binning-based]
  - mags_domain_annotation - Module: perform domain-based annotation for each retrieved MAG [Binning-based]
  - mags_orf_annotation - Module: perform functional annotation on ORF retrieved from filtered MAGs [Binning-based]
  - mags_coverage - Module: Computing read coverage for each retrieved MAG [Binning-based]
  - mags_hmm_annotation - Module: tracking coverage for genes of interest through HMM models on MAGs [Binning-based]

Help Arguments:
  -h, --help            show this help message and exit

Arguments

This command has two required arguments and one optional argument:

  • REQUIRED

    • (-s) Specify the name of the Geomosaic setup file, obtained with the setup command.

    • (-m) Module to execute, chosen from the Available Modules listed in the help output.

  • OPTIONAL

    • (-t) Threads to use (per sample)
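
As a minimal sketch, the invocation can be assembled and checked from Python before it is run. The helper name build_unit_command and the AVAILABLE_MODULES constant are hypothetical and not part of Geomosaic; the module names are copied from the help output, and only the -s, -m, and -t flags shown there are used.

```python
import shlex

# Module names copied from the `geomosaic unit --help` output.
AVAILABLE_MODULES = (
    "pre_processing", "reads_qc", "func_classification_rb",
    "tax_classification_rb", "assembly", "assembly_func_annotation",
    "assembly_qc", "assembly_readmapping", "assembly_coverage",
    "assembly_tax_annotation", "orf_prediction", "domain_annotation",
    "assembly_hmm_annotation", "assembly_orf_annotation", "binning",
    "binning_derep", "binning_qa", "mags_retrieval", "mags_func_annotation",
    "mags_tax_annotation", "mags_orf_prediction", "mags_domain_annotation",
    "mags_orf_annotation", "mags_coverage", "mags_hmm_annotation",
)


def build_unit_command(setup_file, module, threads=None):
    """Return the argument list for `geomosaic unit` (hypothetical helper)."""
    if module not in AVAILABLE_MODULES:
        raise ValueError(f"unknown module: {module!r}")
    cmd = ["geomosaic", "unit", "-s", setup_file, "-m", module]
    if threads is not None:
        if threads < 1:
            raise ValueError("threads must be a positive integer")
        cmd += ["-t", str(threads)]
    return cmd


# Build and print the command without executing it.
cmd = build_unit_command("gmsetup_exp2023.yaml", "assembly", threads=20)
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True). Because the command is interactive, standard input should stay attached to the terminal so that the package prompts can be answered.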

What to expect from this command

After this command completes, Geomosaic generates up to three files in its working directory:

  • Snakefile_unit.smk - the Snakefile with the code for the chosen module and package.

  • config_unit.yaml - the config file for snakemake execution.

  • Snakefile_extdb.smk - created only if the chosen module needs external data. As described above for the geomosaic workflow, this file should be executed before Snakefile_unit.smk.
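
The ordering rule above can be sketched in Python. This is only an illustration under assumptions: the function name snakefiles_in_run_order is hypothetical, and the directory passed to it is whatever geomosaic directory your setup uses. It returns the Snakefiles that exist, with the optional external-data one first.

```python
from pathlib import Path


def snakefiles_in_run_order(geomosaic_dir):
    """List the unit Snakefiles that exist, external-data one first."""
    base = Path(geomosaic_dir)
    ordered = []
    # Snakefile_extdb.smk is optional; when present it must run first.
    extdb = base / "Snakefile_extdb.smk"
    if extdb.is_file():
        ordered.append(extdb)
    # Snakefile_unit.smk is always expected after `geomosaic unit`.
    unit = base / "Snakefile_unit.smk"
    if not unit.is_file():
        raise FileNotFoundError(f"{unit} not found; run `geomosaic unit` first")
    ordered.append(unit)
    return ordered
```

Each returned path can then be handed to snakemake through its -s/--snakefile option, one run per file, in the order given.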

Example usage of geomosaic unit

IMPORTANT: the following example may not refer to the modules that are integrated in the current version of Geomosaic. However, it is still useful for understanding how the single-module unit is created.

geomosaic unit -s gmsetup_exp2023.yaml -m assembly -t 20

After you execute this command, Geomosaic asks you to choose the package for the selected module, and then the packages you used for its dependencies. The dependencies are assumed to have been run already with Geomosaic and are not executed again.

Geomosaic Process: Loading variables from Geomosaic setup file... 
--> OK <--

Module: Assembly
0) -- Ignore this module (and all successors) --
1) MetaSpades
2) MegaHit
1
Geomosaic Note: It is assumed also that those modules dependencies have already been run with Geomosaic
Geomosaic Note: 'assembly' depends on the following modules:
	- pre_processing

Now you need to specify the package/s that you used for those dependencies.

Module: Pre processing with quality check
0) -- Ignore this module (and all successors) --
1) fastp
2) Trim-Galore
3) Trimmomatic
1
Building DAG of jobs...
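
The prompt sequence in this session can be modelled with a small Python sketch. It is a toy model, not Geomosaic's own code: the names PACKAGES, DEPENDENCIES, and resolve_choices are hypothetical, the data is limited to what the session above shows, and option 0 ("Ignore this module") is not modelled.

```python
# Package menus and the single dependency shown in the example session.
PACKAGES = {
    "assembly": ["MetaSpades", "MegaHit"],
    "pre_processing": ["fastp", "Trim-Galore", "Trimmomatic"],
}
DEPENDENCIES = {"assembly": ["pre_processing"]}


def resolve_choices(module, answers):
    """Map the numbers typed at each prompt to package names.

    `answers` holds one 1-based menu number per prompt: first for the
    chosen module, then for each of its dependencies in order.
    """
    prompts = [module] + DEPENDENCIES[module]
    if len(answers) != len(prompts):
        raise ValueError(f"expected {len(prompts)} answers, got {len(answers)}")
    chosen = {}
    for name, number in zip(prompts, answers):
        options = PACKAGES[name]
        if not 1 <= number <= len(options):
            raise ValueError(f"invalid choice {number} for {name}")
        chosen[name] = options[number - 1]
    return chosen


# The two answers typed in the session above were 1 and 1.
print(resolve_choices("assembly", [1, 1]))
```

With the answers from the session, the sketch pairs the assembly module with MetaSpades and its pre_processing dependency with fastp.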