Qiime2 tutorial for Cris
QIIME2 Workflow with custom database
Hello! Today I’ll be your guide through the main Qiime2 workflow, which we use to obtain ASVs.
🧬 Quick Introduction: Qiime2 and DADA2
Qiime2 is an open-source, modular, and extensible bioinformatics platform designed for microbiome and amplicon sequence data analysis. It provides a full pipeline from raw sequencing reads to final taxonomic and diversity analysis, and offers a powerful plugin system with reproducible tracking (“provenance”) of all steps.
Key features of Qiime2:
- Integrates multiple tools (e.g., DADA2, Deblur, feature classifiers).
- Tracks all parameters and steps for full reproducibility.
- Outputs shareable artifacts (.qza) and visualizations (.qzv).
- Community-supported plugins and standard formats.
DADA2, on the other hand, is an independent algorithm (and also an R package) specialized in high-resolution denoising of amplicon data. It performs:
- Error correction of raw sequences.
- Exact sequence variant inference (ASVs), avoiding arbitrary OTU clustering.
- Chimera removal.
Main differences:
- Qiime2 is a full pipeline platform; DADA2 is mainly an error-correction and ASV inference tool.
- In Qiime2, you can use DADA2 as one module (via qiime dada2 denoise-* commands), integrating it into a broader pipeline including demultiplexing, taxonomy assignment, diversity analyses, etc.
- DADA2 can also be used separately in R, giving more flexibility for custom error models and filtering.
In this workflow, we use DADA2 inside Qiime2 for denoising and ASV generation, and then continue the analysis in Qiime2, followed by export to R (microeco).
📁 1️⃣ Prepare folder and files
We ran this analysis on the LBCM 5.0 server. The files are organized as follows:
/…/Cris/
├── reads/
│ ├── F26_1.fq.gz
│ ├── F26_2.fq.gz
│ ├── …
│ └── sample_sheet.txt
├── MaarjAM/
│ ├── maarjam_database_onlyITS.qiime.fasta
│ └── maarjam_database_onlyITS.qiime.txt
💻 2️⃣ Import reads into Qiime2
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path /Dados2/Cris/reads/sample_sheet.txt \
--output-path demux-paired-end.qza \
--input-format PairedEndFastqManifestPhred33V2
🗂️ 📄 Manifest file format
Your sample_sheet.txt file should be a tab-separated text file with three columns, formatted exactly as shown below.
| sample-id | forward-absolute-filepath | reverse-absolute-filepath |
|---|---|---|
| F26 | /Dados2/Cris/reads/F26_1.fq.gz | /Dados2/Cris/reads/F26_2.fq.gz |
| F27 | /Dados2/Cris/reads/F27_1.fq.gz | /Dados2/Cris/reads/F27_2.fq.gz |
| F28 | /Dados2/Cris/reads/F28_1.fq.gz | /Dados2/Cris/reads/F28_2.fq.gz |
✅ Important:
- Must be tab-separated, not spaces or commas.
- File paths must be absolute (starting with /).
- The header line must match exactly as shown.
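The rules above are easy to break by hand, so here is a minimal sketch of how the manifest could be generated and checked from the shell. Both make_manifest and check_manifest are hypothetical helpers written for this tutorial, not Qiime2 commands, and the _1.fq.gz / _2.fq.gz naming is assumed from the example folder above.

```shell
# Hypothetical helper (not part of Qiime2): write a manifest for every
# <sample>_1.fq.gz / <sample>_2.fq.gz pair found in a reads directory.
make_manifest() {
  reads_dir=$(cd "$1" && pwd) || return 1   # resolve to an absolute path
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
  for fwd in "$reads_dir"/*_1.fq.gz; do
    [ -e "$fwd" ] || continue               # skip the literal pattern when nothing matches
    rev="${fwd%_1.fq.gz}_2.fq.gz"
    sample=$(basename "$fwd" _1.fq.gz)
    if [ -e "$rev" ]; then
      printf '%s\t%s\t%s\n' "$sample" "$fwd" "$rev"
    else
      echo "warning: no reverse read for $sample, skipping" >&2
    fi
  done
}

# Hypothetical helper: check the header, the tab separation and the
# absolute paths. Returns non-zero if any line breaks one of the rules.
check_manifest() {
  awk -F'\t' '
    NR == 1 {
      if ($0 != "sample-id\tforward-absolute-filepath\treverse-absolute-filepath") {
        print "line 1: unexpected header"; bad = 1
      }
      next
    }
    NF != 3 {
      print "line " NR ": expected 3 tab-separated columns, found " NF
      bad = 1
      next
    }
    $2 !~ /^\// || $3 !~ /^\// {
      print "line " NR ": file paths must be absolute"
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

For example, make_manifest /Dados2/Cris/reads > sample_sheet.txt followed by check_manifest sample_sheet.txt should print nothing and succeed when the file is well formed. The check does not verify that the read files actually exist.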
🔎 3️⃣ Summarize and check quality
qiime demux summarize \
--i-data demux-paired-end.qza \
--o-visualization demux-summary.qzv
✅ Open demux-summary.qzv at https://view.qiime2.org and check where quality drops (we chose ~210 bp).
✂️ 4️⃣ Denoise with DADA2
Use DADA2 to denoise, merge, and infer ASVs (Amplicon Sequence Variants).
qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux-paired-end.qza \
--p-trim-left-f 0 \
--p-trim-left-r 0 \
--p-trunc-len-f 210 \
--p-trunc-len-r 210 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza
✅ 5️⃣ Summarize denoising outputs
These commands help you check how many reads were kept and examine your ASV table and sequences.
qiime feature-table summarize \
--i-table table.qza \
--o-visualization table.qzv \
--m-sample-metadata-file /Dados/Cris/Ampliseq/reads/sample_sheet.txt
qiime feature-table tabulate-seqs \
--i-data rep-seqs.qza \
--o-visualization rep-seqs.qzv
qiime metadata tabulate \
--m-input-file denoising-stats.qza \
--o-visualization denoising-stats.qzv
- Pro tip: Open .qzv files at https://view.qiime2.org and check the interactive plots for quality control.
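If you prefer a quick number over the interactive view, the denoising stats can also be exported with qiime tools export (same --input-path and --output-path flags as in the export step) and summarized in the shell. The sketch below assumes the exported file is a tab-separated table (expected to be named stats.tsv) with input and non-chimeric columns and a second line starting with #q2:types. retained_reads is a hypothetical helper, not a Qiime2 command.

```shell
# Hypothetical helper: print, for each sample, the percentage of input reads
# that survived filtering, denoising, merging and chimera removal.
# Assumes a tab-separated file whose header has "input" and "non-chimeric"
# columns; lines starting with "#" (e.g. "#q2:types") are skipped.
retained_reads() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("input" in col) || !("non-chimeric" in col)) {
        print "missing expected columns" > "/dev/stderr"
        exit 1
      }
      next
    }
    /^#/ { next }
    {
      reads_in  = $(col["input"])
      reads_out = $(col["non-chimeric"])
      pct = (reads_in > 0) ? 100 * reads_out / reads_in : 0
      printf "%s\t%.1f\n", $1, pct
    }
  ' "$1"
}
```

Calling retained_reads on the exported stats file prints one line per sample (sample ID, then percent retained). Samples with a very low percentage are worth a second look at the truncation lengths chosen in step 4.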
🧬 6️⃣ Import MaarjAM ITS database and train classifier
Import the custom ITS MaarjAM reference sequences and taxonomy, then train a Naive Bayes classifier.
qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path MaarjAM/maarjam_database_onlyITS.qiime.fasta \
--output-path maarjam_ref_seqs.qza
qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-format HeaderlessTSVTaxonomyFormat \
--input-path MaarjAM/maarjam_database_onlyITS.qiime.txt \
--output-path maarjam_ref_taxonomy.qza
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads maarjam_ref_seqs.qza \
--i-reference-taxonomy maarjam_ref_taxonomy.qza \
--o-classifier MaarjAM_classifier.qza
- Because we are using a Naive Bayes classifier, it will try to force every ASV into a classification. As the MaarjAM database contains only AMF taxonomy, every ITS sequence will be forced into an AMF classification, even if it does not come from AMF. Results can be misleading if they are not carefully curated.
🏷️ 7️⃣ Assign taxonomy to ASVs
Classify your representative sequences using the trained classifier.
qiime feature-classifier classify-sklearn \
--i-classifier MaarjAM_classifier.qza \
--i-reads rep-seqs.qza \
--o-classification taxonomy.qza
qiime metadata tabulate \
--m-input-file taxonomy.qza \
--o-visualization taxonomy.qzv
- ✅ Check taxonomy.qzv to inspect taxonomic assignments.
📊 8️⃣ Create taxonomy bar plots
Visualize the relative abundance of AMF (and other possible assignments) across samples and check for errors.
qiime taxa barplot \
--i-table table.qza \
--i-taxonomy taxonomy.qza \
--m-metadata-file /Dados/Cris/Ampliseq/reads/sample_sheet.txt \
--o-visualization taxa-bar-plots.qzv
💾 9️⃣ Export outputs for R (microeco)
Export data for further ecological analysis and custom plots in R.
qiime tools export \
--input-path table.qza \
--output-path exported-feature-table
biom convert \
-i exported-feature-table/feature-table.biom \
-o exported-feature-table/feature-table.tsv \
--to-tsv
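Before moving to R, it can help to confirm that the converted table looks sane. The sketch below sums the reads per sample, assuming the usual biom convert TSV layout: a first comment line, then a header line starting with #OTU ID, then one row per ASV. sample_totals is a hypothetical helper, not part of biom or Qiime2.

```shell
# Hypothetical helper: total read count per sample from the converted TSV.
# Assumes a header line starting with "#OTU ID"; other "#" lines are skipped.
sample_totals() {
  awk -F'\t' '
    /^#OTU ID/ { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
    /^#/       { next }
    { for (i = 2; i <= NF; i++) total[i] += $i }
    END { for (i = 2; i <= n; i++) printf "%s\t%d\n", name[i], total[i] }
  ' "$1"
}
```

Running sample_totals exported-feature-table/feature-table.tsv prints one line per sample. The totals should be in line with the read counts reported in table.qzv.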
qiime tools export \
--input-path rep-seqs.qza \
--output-path exported-rep-seqs
qiime tools export \
--input-path taxonomy.qza \
--output-path exported-taxonomy
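Since the classifier was trained only on MaarjAM, the exported taxonomy is a good place to start curating. As a minimal sketch, the helper below lists ASVs whose classifier confidence falls below a cutoff so they can be checked by hand. It assumes the exported file (exported-taxonomy/taxonomy.tsv) has one header line followed by three tab-separated columns: Feature ID, Taxon and Confidence. low_confidence_asvs and its default cutoff of 0.9 are hypothetical choices for illustration, not Qiime2 defaults.

```shell
# Hypothetical helper: list ASVs with classifier confidence below a cutoff.
# Usage: low_confidence_asvs <taxonomy.tsv> [cutoff]
# Assumes columns: Feature ID, Taxon, Confidence (tab-separated, one header line).
low_confidence_asvs() {
  awk -F'\t' -v cutoff="${2:-0.9}" '
    NR == 1 { next }                                   # skip the header
    ($3 + 0) < (cutoff + 0) { print $1 "\t" $2 "\t" $3 }
  ' "$1"
}
```

For example, low_confidence_asvs exported-taxonomy/taxonomy.tsv 0.8 prints the ASVs to inspect first. A high confidence value does not prove that an ASV is really AMF, so comparing suspicious sequences against a broader reference is still advisable.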
The qiime2meco() function in R is designed to create the microtable object using files from QIIME2 (Bolyen et al. 2019).
The example below uses data files shipped inside the package, which were downloaded from the QIIME 2 pd-mice tutorial.
library(file2meco)
abund_file_path <- system.file("extdata", "dada2_table.qza", package="file2meco")
Tsv file of metadata
sample_file_path <- system.file("extdata", "sample-metadata.tsv", package="file2meco")
taxonomy_file_path <- system.file("extdata", "taxonomy.qza", package="file2meco")
Construct microtable object
qiime2meco(abund_file_path)
qiime2meco(abund_file_path, sample_table = sample_file_path, taxonomy_table = taxonomy_file_path)
Add a phylogenetic tree and a FASTA file for further demonstrations.
Download the tree file, named ‘tree.qza’, and put it into the R working directory.
tree_data <- "tree.qza"
Download the FASTA file, named ‘dada2_rep_set.qza’, and put it into the R working directory.
rep_data <- "dada2_rep_set.qza"
test1 <- qiime2meco(abund_file_path, sample_table = sample_file_path, taxonomy_table = taxonomy_file_path, phylo_tree = tree_data, rep_fasta = rep_data, auto_tidy = TRUE)
test1