QIIME2 Workflow with custom database

Hello! Today I’ll be your guide through the main Qiime2 workflow, which we use to obtain ASVs (amplicon sequence variants).

🧬 Quick Introduction: Qiime2 and DADA2

Qiime2 is an open-source, modular, and extensible bioinformatics platform designed for microbiome and amplicon sequence data analysis. It provides a full pipeline from raw sequencing reads to final taxonomic and diversity analysis, and offers a powerful plugin system with reproducible tracking (“provenance”) of all steps.

Key features of Qiime2:

Integrates multiple tools (e.g., DADA2, Deblur, feature classifiers).
Tracks all parameters and steps for full reproducibility.
Outputs shareable artifacts (.qza) and visualizations (.qzv).
Community-supported plugins and standard formats.

DADA2, on the other hand, is an independent algorithm (and also an R package) specialized in high-resolution denoising of amplicon data. It performs:

Error correction of raw sequences.
Exact sequence variant inference (ASVs), avoiding arbitrary OTU clustering.
Chimera removal.

Main differences:

Qiime2 is a full pipeline platform; DADA2 is mainly an error-correction and ASV inference tool.
In Qiime2, you can use DADA2 as one module (via qiime dada2 denoise-* commands), integrating it into a broader pipeline including demultiplexing, taxonomy assignment, diversity analyses, etc.
DADA2 can also be used separately in R, giving more flexibility for custom error models and filtering.

In this workflow, we use DADA2 inside Qiime2 for denoising and ASV generation, and then continue the analysis in Qiime2, followed by export to R (microeco).

📁 1️⃣ Prepare folder and files

We ran this analysis on the LBCM 5.0 server. The files are stored with the following structure:

/…/Cris/
├── reads/
│ ├── F26_1.fq.gz
│ ├── F26_2.fq.gz
│ ├── …
│ └── sample_sheet.txt
├── MaarjAM/
│ ├── maarjam_database_onlyITS.qiime.fasta
│ └── maarjam_database_onlyITS.qiime.txt

💻 2️⃣ Import reads into Qiime2

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /Dados2/Cris/reads/sample_sheet.txt \
  --output-path demux-paired-end.qza \
  --input-format PairedEndFastqManifestPhred33V2

🗂️ 📄 Manifest file format

Your sample_sheet.txt file should be a tab-separated text file with three columns, formatted exactly as shown below.

sample-id	forward-absolute-filepath	reverse-absolute-filepath
F26	/Dados2/Cris/reads/F26_1.fq.gz	/Dados2/Cris/reads/F26_2.fq.gz
F27	/Dados2/Cris/reads/F27_1.fq.gz	/Dados2/Cris/reads/F27_2.fq.gz
F28	/Dados2/Cris/reads/F28_1.fq.gz	/Dados2/Cris/reads/F28_2.fq.gz
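
If you have many samples, typing the manifest by hand gets error-prone. Here is a minimal sketch that builds it from a reads folder, assuming every sample follows the `<sample>_1.fq.gz` / `<sample>_2.fq.gz` naming shown above. The function name `build_manifest` is a hypothetical helper, not part of Qiime2.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Qiime2 command): prints a paired-end manifest
# for every <sample>_1.fq.gz that has a matching <sample>_2.fq.gz.
build_manifest() {
  local reads_dir fwd rev sample
  # Resolve the folder to an absolute path, as the manifest requires.
  reads_dir="$(cd "$1" && pwd)" || return 1
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
  for fwd in "$reads_dir"/*_1.fq.gz; do
    [ -e "$fwd" ] || continue            # folder contains no forward reads
    rev="${fwd%_1.fq.gz}_2.fq.gz"
    if [ ! -e "$rev" ]; then
      echo "warning: no reverse file for $fwd, skipping" >&2
      continue
    fi
    sample="$(basename "$fwd" _1.fq.gz)"
    printf '%s\t%s\t%s\n' "$sample" "$fwd" "$rev"
  done
}

# Example use (adjust the folder to your own setup):
#   build_manifest /Dados2/Cris/reads > sample_sheet.txt
```

Samples with a missing reverse file are skipped with a warning on stderr, so the manifest itself stays clean.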

✅ Important:

Must be tab-separated, not spaces or commas.
File paths must be absolute (starting with /).
The header line must match exactly as shown.
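
The three rules above can be checked before importing. This is only a sketch: `check_manifest` is a hypothetical helper, and it checks the format only, not whether the files actually exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Qiime2 command): checks the manifest header,
# tab separation, absolute paths, and duplicate sample IDs.
# Returns 0 if no problem was found, 1 otherwise.
check_manifest() {
  awk -F'\t' '
    NR == 1 {
      if ($0 != "sample-id\tforward-absolute-filepath\treverse-absolute-filepath") {
        print "line 1: header does not match the expected format"; bad = 1
      }
      next
    }
    NF != 3 {
      printf "line %d: expected 3 tab-separated fields, found %d\n", NR, NF
      bad = 1; next
    }
    $2 !~ /^\// || $3 !~ /^\// {
      printf "line %d: file path is not absolute\n", NR; bad = 1
    }
    seen[$1]++ {
      printf "line %d: duplicate sample-id %s\n", NR, $1; bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Example use:
#   check_manifest sample_sheet.txt && echo "manifest looks OK"
```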

🔎 3️⃣ Summarize and check quality

qiime demux summarize \
  --i-data demux-paired-end.qza \
  --o-visualization demux-summary.qzv

✅ Open demux-summary.qzv at https://view.qiime2.org and check where the quality scores drop. That position sets the truncation length for the next step (we chose ~210 bp).

✂️ 4️⃣ Denoise with DADA2

Use DADA2 to denoise, merge, and infer ASVs (Amplicon Sequence Variants).

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 210 \
  --p-trunc-len-r 210 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
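
A quick sanity check on the truncation lengths: after truncation, the forward and reverse reads still have to overlap to be merged. The sketch below assumes DADA2's default minimum overlap of 12 bases (please confirm with `qiime dada2 denoise-paired --help` for your installed version). `max_merged_length` is a hypothetical helper used only for the arithmetic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: longest amplicon that can still be merged, given the
# two truncation lengths and a minimum overlap (assumed default: 12 bases).
max_merged_length() {
  local trunc_f="$1" trunc_r="$2" min_overlap="${3:-12}"
  echo $(( trunc_f + trunc_r - min_overlap ))
}

max_merged_length 210 210   # prints 408
```

With 210/210, amplicons longer than about 408 bp would fail to merge, so if many reads are lost at the merging step it may be worth revisiting these values.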

✅ 5️⃣ Summarize denoising outputs

These commands help you check how many reads were kept and examine your ASV table and sequences.

qiime feature-table summarize \
  --i-table table.qza \
  --o-visualization table.qzv \
  --m-sample-metadata-file /Dados2/Cris/reads/sample_sheet.txt

qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv

qiime metadata tabulate \
  --m-input-file denoising-stats.qza \
  --o-visualization denoising-stats.qzv
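
If you prefer a number per sample over clicking through the visualization, the stats can also be summarized on the command line. This sketch assumes you exported denoising-stats.qza with `qiime tools export` and that the resulting TSV (assumed to be named stats.tsv) has `input` and `non-chimeric` columns plus a `#q2:types` line. `retained_per_sample` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of input reads that survive denoising,
# merging, and chimera removal, per sample.
# Assumed columns in the TSV header: "input" and "non-chimeric".
retained_per_sample() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("input" in col) || !("non-chimeric" in col)) {
        print "expected columns not found in header" > "/dev/stderr"
        exit 1
      }
      next
    }
    /^#/ { next }                      # skip the "#q2:types" line
    $(col["input"]) > 0 {              # samples with zero input are skipped
      printf "%s\t%.1f\n", $1, 100 * $(col["non-chimeric"]) / $(col["input"])
    }
  ' "$1"
}

# Example use (file name is an assumption):
#   retained_per_sample exported-stats/stats.tsv
```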

Pro tip: Open the .qzv files at https://view.qiime2.org and use the interactive plots for quality control.

🧬 6️⃣ Import MaarjAM ITS database and train classifier

Import the custom ITS MaarjAM reference sequences and taxonomy, then train a Naive Bayes classifier.

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path MaarjAM/maarjam_database_onlyITS.qiime.fasta \
  --output-path maarjam_ref_seqs.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path MaarjAM/maarjam_database_onlyITS.qiime.txt \
  --output-path maarjam_ref_taxonomy.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads maarjam_ref_seqs.qza \
  --i-reference-taxonomy maarjam_ref_taxonomy.qza \
  --o-classifier MaarjAM_classifier.qza

Because we are using a Naive Bayes classifier, it will try to force every ASV into a classification. The MaarjAM database contains only AMF (arbuscular mycorrhizal fungi) taxonomy, so every ITS sequence, including those from non-AMF organisms, will be forced into an AMF classification. Results can be misleading if they are not carefully curated.

🏷️ 7️⃣ Assign taxonomy to ASVs

Classify your representative sequences using the trained classifier.

qiime feature-classifier classify-sklearn \
  --i-classifier MaarjAM_classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza

qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv

✅ Check taxonomy.qzv to inspect taxonomic assignments.
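
Since the classifier can force non-AMF sequences into AMF taxa, one partial check is to list the assignments with a low confidence score. The sketch below assumes taxonomy.qza has been exported to a TSV with the columns `Feature ID`, `Taxon`, and `Confidence`. `low_confidence_asvs` is a hypothetical helper and the 0.9 cutoff is an arbitrary illustration, not a recommended value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the rows of an exported taxonomy TSV whose
# confidence (assumed to be column 3) is below a chosen threshold.
low_confidence_asvs() {
  local taxonomy_tsv="$1" threshold="${2:-0.9}"
  awk -F'\t' -v t="$threshold" '
    NR > 1 && ($3 + 0) < (t + 0) { print $1 "\t" $2 "\t" $3 }
  ' "$taxonomy_tsv"
}

# Example use (path is an assumption):
#   low_confidence_asvs exported-taxonomy/taxonomy.tsv 0.9
```

A high confidence score does not prove an ASV is AMF, because the classifier only knows AMF references, so treat this as a first filter before manual curation.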

📊 8️⃣ Create taxonomy bar plots

Visualize the relative abundance of AMF (and other possible assignments) across samples and check for errors.

qiime taxa barplot \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file /Dados2/Cris/reads/sample_sheet.txt \
  --o-visualization taxa-bar-plots.qzv

💾 9️⃣ Export outputs for R (microeco)

Export data for further ecological analysis and custom plots in R.

qiime tools export \
  --input-path table.qza \
  --output-path exported-feature-table

biom convert \
  -i exported-feature-table/feature-table.biom \
  -o exported-feature-table/feature-table.tsv \
  --to-tsv
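
To confirm the converted table looks sane, you can total the reads per sample. This is a sketch that assumes the TSV starts with a comment line followed by a header line beginning with `#OTU ID`, which is what `biom convert --to-tsv` normally writes. `reads_per_sample` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the counts in each sample column of a
# feature table TSV produced by "biom convert --to-tsv".
reads_per_sample() {
  awk -F'\t' '
    /^#OTU ID/ { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
    /^#/       { next }                # other comment lines
    { for (i = 2; i <= n; i++) total[i] += $i }
    END { for (i = 2; i <= n; i++) printf "%s\t%d\n", name[i], total[i] }
  ' "$1"
}

# Example use:
#   reads_per_sample exported-feature-table/feature-table.tsv
```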

qiime tools export \
  --input-path rep-seqs.qza \
  --output-path exported-rep-seqs

qiime tools export \
  --input-path taxonomy.qza \
  --output-path exported-taxonomy

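The qiime2meco() function from the R package file2meco creates a microeco microtable object directly from QIIME2 files (Bolyen et al. 2019).

The example below uses the demo files bundled with the package, which come from the QIIME2 pd-mice tutorial. To analyse your own data, point these paths at the artifacts produced in the steps above (table.qza, taxonomy.qza, rep-seqs.qza).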

library(file2meco)

abund_file_path <- system.file("extdata", "dada2_table.qza", package="file2meco")

Path to the sample metadata file (TSV):

sample_file_path <- system.file("extdata", "sample-metadata.tsv", package="file2meco")

taxonomy_file_path <- system.file("extdata", "taxonomy.qza", package="file2meco")

Construct the microtable object, first from the feature table alone, then with the sample metadata and taxonomy added:

qiime2meco(abund_file_path)

qiime2meco(abund_file_path, sample_table = sample_file_path, taxonomy_table = taxonomy_file_path)

To add a phylogenetic tree and representative sequences for further demonstrations, first download the tree.

The file is named ‘tree.qza’; put it in the R working directory.

tree_data <- "tree.qza"

Then download the representative sequences (fasta).

The file is named ‘dada2_rep_set.qza’; put it in the R working directory as well.

rep_data <- "dada2_rep_set.qza"

test1 <- qiime2meco(abund_file_path, sample_table = sample_file_path, taxonomy_table = taxonomy_file_path, phylo_tree = tree_data, rep_fasta = rep_data, auto_tidy = TRUE)

test1

Qiime2 tutorial for Cris