Qiime2 tutorial for Cris

QIIME2 Workflow with custom database


Hello! Today I’ll be your guide through the main Qiime2 workflow, which we use to obtain ASVs.


🧬 Quick Introduction: Qiime2 and DADA2


Qiime2 is an open-source, modular, and extensible bioinformatics platform designed for microbiome and amplicon sequence data analysis. It provides a full pipeline from raw sequencing reads to final taxonomic and diversity analysis, and offers a powerful plugin system with reproducible tracking (“provenance”) of all steps.

Key features of Qiime2:

  • Integrates multiple tools (e.g., DADA2, Deblur, feature classifiers).
  • Tracks all parameters and steps for full reproducibility.
  • Outputs shareable artifacts (.qza) and visualizations (.qzv).
  • Community-supported plugins and standard formats.

DADA2, on the other hand, is an independent algorithm (and also an R package) specialized in high-resolution denoising of amplicon data. It performs:

  • Error correction of raw sequences.
  • Exact sequence variant inference (ASVs), avoiding arbitrary OTU clustering.
  • Chimera removal.

Main differences:

  • Qiime2 is a full pipeline platform; DADA2 is mainly an error-correction and ASV inference tool.
  • In Qiime2, you can use DADA2 as one module (via qiime dada2 denoise-* commands), integrating it into a broader pipeline including demultiplexing, taxonomy assignment, diversity analyses, etc.
  • DADA2 can also be used separately in R, giving more flexibility for custom error models and filtering.


In this workflow, we use DADA2 inside Qiime2 for denoising and ASV generation, and then continue the analysis in Qiime2, followed by export to R (microeco).


📁 1️⃣ Prepare folder and files

We ran this analysis on the LBCM 5.0 server. The files are organized as follows:


/…/Cris/
├── reads/
│ ├── F26_1.fq.gz
│ ├── F26_2.fq.gz
│ ├── …
│ └── sample_sheet.txt
├── MaarjAM/
│ ├── maarjam_database_onlyITS.qiime.fasta
│ └── maarjam_database_onlyITS.qiime.txt


💻 2️⃣ Import reads into Qiime2


qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /Dados2/Cris/reads/sample_sheet.txt \
  --output-path demux-paired-end.qza \
  --input-format PairedEndFastqManifestPhred33V2


🗂️ 📄 Manifest file format

Your sample_sheet.txt file should be a tab-separated text file with three columns, formatted exactly as shown below.

sample-id forward-absolute-filepath reverse-absolute-filepath
F26 /Dados2/Cris/reads/F26_1.fq.gz /Dados2/Cris/reads/F26_2.fq.gz
F27 /Dados2/Cris/reads/F27_1.fq.gz /Dados2/Cris/reads/F27_2.fq.gz
F28 /Dados2/Cris/reads/F28_1.fq.gz /Dados2/Cris/reads/F28_2.fq.gz


Important:

  • Must be tab-separated, not spaces or commas.
  • File paths must be absolute (starting with /).
  • The header line must match exactly as shown.
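
Writing the manifest by hand makes it easy to slip in spaces or a mistyped path. A minimal sketch of generating it instead is shown here, assuming the reads follow the `<sample>_1.fq.gz` / `<sample>_2.fq.gz` naming seen above; the function name `make_manifest` is only an illustration, not part of Qiime2.

```shell
# make_manifest READS_DIR
# Prints a paired-end manifest (tab-separated, absolute paths) to stdout.
make_manifest() {
  local reads_dir fwd rev sample
  # Resolve the folder to an absolute path, as the manifest requires.
  reads_dir="$(cd "$1" && pwd)" || return 1
  # The header must match exactly; \t writes a real tab character.
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
  for fwd in "$reads_dir"/*_1.fq.gz; do
    [ -e "$fwd" ] || continue            # no forward reads found: skip the literal glob
    sample="$(basename "$fwd" _1.fq.gz)"
    rev="$reads_dir/${sample}_2.fq.gz"
    if [ ! -e "$rev" ]; then
      echo "missing reverse read for $sample" >&2
      return 1
    fi
    printf '%s\t%s\t%s\n' "$sample" "$fwd" "$rev"
  done
}

# Usage (path taken from the import command above):
# make_manifest /Dados2/Cris/reads > /Dados2/Cris/reads/sample_sheet.txt
```

Using `printf` with `\t` guarantees real tabs, and the function stops with an error if a sample has no reverse read.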


🔎 3️⃣ Summarize and check quality


qiime demux summarize \
  --i-data demux-paired-end.qza \
  --o-visualization demux-summary.qzv


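✅ Open demux-summary.qzv at https://view.qiime2.org and check the position where read quality starts to drop; this is where you will truncate the reads in the next step (we chose 210 bp for both forward and reverse reads).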


✂️ 4️⃣ Denoise with DADA2


Use DADA2 to denoise, merge, and infer ASVs (Amplicon Sequence Variants).

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 210 \
  --p-trunc-len-r 210 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
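
DADA2 can only merge a read pair if the truncated forward and reverse reads still overlap. As a rough sanity check on the truncation lengths, the sketch below computes the expected overlap. The function names are illustrative, the 300 bp amplicon length in the usage line is a purely hypothetical placeholder (ITS amplicon length varies between taxa), and the 12-nucleotide minimum is DADA2's default minimum overlap for merging.

```shell
# expected_overlap TRUNC_F TRUNC_R AMPLICON_LEN
# Prints how many bases the truncated forward and reverse reads share.
expected_overlap() {
  local trunc_f="$1" trunc_r="$2" amplicon_len="$3"
  echo $(( trunc_f + trunc_r - amplicon_len ))
}

# can_merge TRUNC_F TRUNC_R AMPLICON_LEN [MIN_OVERLAP]
# Succeeds (exit status 0) when the overlap reaches the minimum (default 12).
can_merge() {
  local min_overlap="${4:-12}"
  [ "$(expected_overlap "$1" "$2" "$3")" -ge "$min_overlap" ]
}

# Usage with the truncation lengths above and a hypothetical 300 bp amplicon:
# expected_overlap 210 210 300   # prints 120
```

If many reads are lost at the merging step in the denoising stats, too little overlap for longer amplicons is one possible cause worth checking.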


✅ 5️⃣ Summarize denoising outputs

These commands help you check how many reads were kept and examine your ASV table and sequences.

qiime feature-table summarize \
  --i-table table.qza \
  --o-visualization table.qzv \
  --m-sample-metadata-file /Dados2/Cris/reads/sample_sheet.txt
qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv
qiime metadata tabulate \
  --m-input-file denoising-stats.qza \
  --o-visualization denoising-stats.qzv


  • Pro tip: Open the .qzv files at https://view.qiime2.org and check the interactive plots for quality control.
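
To see at a glance how many reads were kept per sample, the denoising stats can also be summarized on the command line. This is a sketch under assumptions: it expects the stats to have been exported first (for example with `qiime tools export --input-path denoising-stats.qza --output-path exported-stats`), and it assumes the resulting TSV has columns named `input` and `non-chimeric` plus a `#q2:types` line. The function name `retained_reads` is hypothetical.

```shell
# retained_reads STATS_TSV
# Prints "sample-id <TAB> percent of input reads remaining after chimera removal".
retained_reads() {
  local stats_tsv="$1"
  awk -F'\t' '
    /^#/ { next }                        # skip the "#q2:types" directive line
    !header_seen {                       # first remaining line is the header
      for (i = 1; i <= NF; i++) {
        if ($i == "input") in_col = i
        if ($i == "non-chimeric") out_col = i
      }
      if (!in_col || !out_col) {
        print "expected columns not found" > "/dev/stderr"
        exit 1
      }
      header_seen = 1
      next
    }
    $in_col > 0 { printf "%s\t%.1f\n", $1, 100 * $out_col / $in_col }
  ' "$stats_tsv"
}

# Usage (path is an assumption based on the export command mentioned above):
# retained_reads exported-stats/stats.tsv
```

Columns are looked up by name rather than position, so the sketch fails loudly instead of printing wrong numbers if the layout differs.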

🧬 6️⃣ Import MaarjAM ITS database and train classifier


Import the custom ITS MaarjAM reference sequences and taxonomy, then train a Naive Bayes classifier.

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path MaarjAM/maarjam_database_onlyITS.qiime.fasta \
  --output-path maarjam_ref_seqs.qza
qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path MaarjAM/maarjam_database_onlyITS.qiime.txt \
  --output-path maarjam_ref_taxonomy.qza
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads maarjam_ref_seqs.qza \
  --i-reference-taxonomy maarjam_ref_taxonomy.qza \
  --o-classifier MaarjAM_classifier.qza


  • A Naive Bayes classifier only knows the taxa present in its reference database and will try to place every ASV among them. Because the MaarjAM database contains only AMF taxonomy, ITS sequences from non-AMF organisms may still be assigned to AMF taxa. Results can be misleading if they are not carefully curated.


🏷️ 7️⃣ Assign taxonomy to ASVs


Classify your representative sequences using the trained classifier.

qiime feature-classifier classify-sklearn \
  --i-classifier MaarjAM_classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza
qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv


  • ✅ Check taxonomy.qzv to inspect taxonomic assignments.
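
Since the classifier can push non-AMF sequences into AMF taxa, one simple curation aid is to list the assignments with low confidence and review them by hand. The sketch below assumes taxonomy.qza has been exported to a TSV (as done in step 9) with the columns `Feature ID`, `Taxon`, and `Confidence` in that order; the function name and the 0.9 cutoff in the usage line are hypothetical choices, not recommendations from this workflow.

```shell
# low_confidence TAXONOMY_TSV THRESHOLD
# Prints feature ID, taxon, and confidence for assignments below THRESHOLD.
low_confidence() {
  local taxonomy_tsv="$1" threshold="$2"
  awk -F'\t' -v min="$threshold" '
    NR == 1 || /^#/ { next }             # skip the header and any "#q2:types" line
    ($3 + 0) < (min + 0) { print $1 "\t" $2 "\t" $3 }
  ' "$taxonomy_tsv"
}

# Usage (file path assumes the export from step 9; 0.9 is an arbitrary example):
# low_confidence exported-taxonomy/taxonomy.tsv 0.9
```

A high confidence value does not prove that a sequence is AMF; it only means the classifier found the assignment consistent within the reference it was given, so a check against a broader database is still advisable for doubtful ASVs.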


📊 8️⃣ Create taxonomy bar plots


Visualize the relative abundance of AMF (and other possible assignments) across samples and check for errors.

qiime taxa barplot \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file /Dados2/Cris/reads/sample_sheet.txt \
  --o-visualization taxa-bar-plots.qzv


💾 9️⃣ Export outputs for R (microeco)


Export data for further ecological analysis and custom plots in R.

qiime tools export \
  --input-path table.qza \
  --output-path exported-feature-table
biom convert \
  -i exported-feature-table/feature-table.biom \
  -o exported-feature-table/feature-table.tsv \
  --to-tsv
qiime tools export \
  --input-path rep-seqs.qza \
  --output-path exported-rep-seqs
qiime tools export \
  --input-path taxonomy.qza \
  --output-path exported-taxonomy
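
Before moving to R, it can be worth confirming that every ASV in the exported table also has a taxonomy entry. The following is a minimal sketch, assuming the file names produced by the export commands above (`exported-feature-table/feature-table.tsv` and `exported-taxonomy/taxonomy.tsv`) and that comment and header lines in the biom-converted table start with `#`; the function name `missing_taxonomy` is hypothetical.

```shell
# missing_taxonomy TABLE_TSV TAXONOMY_TSV
# Prints feature IDs that appear in the table but not in the taxonomy file.
missing_taxonomy() {
  local table_tsv="$1" taxonomy_tsv="$2"
  awk -F'\t' '
    NR == FNR {                          # first file: taxonomy
      if (FNR > 1 && $0 !~ /^#/) known[$1] = 1
      next
    }
    /^#/ { next }                        # second file: skip biom comment/header lines
    !($1 in known) { print $1 }
  ' "$taxonomy_tsv" "$table_tsv"
}

# Usage:
# missing_taxonomy exported-feature-table/feature-table.tsv exported-taxonomy/taxonomy.tsv
```

An empty result means the two files agree on feature IDs.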


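The qiime2meco() function from the file2meco R package creates a microeco microtable object directly from QIIME2 files (Bolyen et al. 2019). The example below uses the demo files bundled with file2meco; for your own data, point the paths at your own table, taxonomy, and metadata files.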

# Use the data files inside the package, which were downloaded from the pd-mice tutorial
library(file2meco)

abund_file_path <- system.file("extdata", "dada2_table.qza", package="file2meco")

# TSV file of metadata
sample_file_path <- system.file("extdata", "sample-metadata.tsv", package="file2meco")

taxonomy_file_path <- system.file("extdata", "taxonomy.qza", package="file2meco")

# Construct the microtable object
qiime2meco(abund_file_path)

qiime2meco(abund_file_path, sample_table = sample_file_path, taxonomy_table = taxonomy_file_path)

# Add a phylogenetic tree and FASTA for more demonstrations
# Download the tree; the file name is 'tree.qza'; put it into the R working directory
tree_data <- "tree.qza"

# Download the FASTA; the file name is 'dada2_rep_set.qza'; put it into the R working directory
rep_data <- "dada2_rep_set.qza"

test1 <- qiime2meco(abund_file_path, sample_table = sample_file_path, taxonomy_table = taxonomy_file_path, phylo_tree = tree_data, rep_fasta = rep_data, auto_tidy = TRUE)

test1


