Paul Campbell
Distinguishing between filaments using molecular tools
Updated: Nov 2, 2022

Molecular tools such as next-generation sequencing and qPCR can help us distinguish between filaments, but how, really, do we do that? How about a real-world example?
I collected the 16S rRNA gene sequences for several different wastewater filaments from the NCBI database. A single 16S rRNA gene (in this case, from S. natans) looks like this:
>Sphaerotilus natans
CAATCCTCGAGAGTGGCGAACGGGTGAGTAATACATCGGAACGTGCCCAGTCGTGGGGGATAACGTAGCGAAACTACGCTAATACCGCATACGACCCGAGGGTGAAAGCGGGGGACTCGCAAGAGCCTCGCGCGATTGGAGCGGCCGATGGCAGATTAGGTAGTTGGTGGGGTAAAGGCCCACCAAGCCTGCGATCTGTAGCTGGTCTGAGAGGCGACCAGCCACACTGGGACTGAGACACGGCCCAGACTCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGCGAAAGCCTGATCCAGCCATACGCGTGCGGGAAGAAGGCTTCGGGTTGTAAACCGCTTTTGTCAGGGAAGAAATACTCCGGGCTAATACCCTGGGGTGATGACGGTACCTGAAGAATAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTCTATAAGACAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTTGTGACTGTAGAGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGTGTAGCAGTGAAATGCGTAGATATGCGGAGGAACACCGATGGCGAAGGCAATCCCCTGGACCTGTACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGTCAACTGGTTGTTGGGAGGGTTTCTTCTCAGTAACGAAGAACGCGTGAAGTTGACCGCCTGGGGAGTACGGCCGCAAGTGAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGTGGATCGATGTGGTTTAATTCGATGCAACGCGAAAAACCTTACCTACCCTTGACATGTCTGAAATCCTGCAGAGATGTGGGAGTGCTCGAAAGAGAATCAGAACACAGGTGCTGCATGGCCGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCATTAGTTGCTACGAAAGGGCACTCTAATGAGACTGCCGGTGACAAACCGAGGAAGGTGGGGATGACGTCAGGTCCTCATGGCCCTTATGGGTAGGGCTACACACGTCATACAATGGCCGGTACAGAGGGCTGCCAACCCGCGAGGGGGAGCCAATCCCAGAAAACCGGTCGTAGTCCGGATCGCAGTCTGCAACTCGACTGCGTGAAGTCGGAATCGCTAGTAATCGCGGATCAGCTTGCCGCGGTGAATACGTTCCCGGGTCTTGTACACACCGCCCGTCACACCATGGGAGCGGGTCTCGCCAGAAGTAGTTAGCCTAACCGC
This gene is about 1500 nucleotides long (meaning that there are approximately 1500 Gs, As, Ts and Cs, combined, in that long string of characters above). The cell uses this gene sequence as a blueprint for making a key component of the protein synthesis machinery: the 16S rRNA, which forms the structural core of the small (30S) subunit of the ribosome.
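To make the "string of Gs, As, Ts and Cs" idea concrete, here is a minimal Python sketch that reads a FASTA record and counts its nucleotides. It is only an illustration: the helper name `parse_fasta` is made up for this example, and the sequence is just the first 60 nucleotides of the S. natans record above, shortened to keep the example readable.

```python
from collections import Counter


def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            name = line[1:]
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped, so collect them piece by piece.
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


# Excerpt only: the first 60 nucleotides of the S. natans record shown above.
fasta_text = """>Sphaerotilus natans
CAATCCTCGAGAGTGGCGAACGGGTGAGTAATACATCGGAACGTGCCCAGTCGTGGGGGA
"""

records = parse_fasta(fasta_text)
sequence = records["Sphaerotilus natans"]
base_counts = Counter(sequence)

print(len(sequence), "nucleotides")
for base in "ACGT":
    print(base, base_counts[base])
```

Running the same function on the full record would give the length of the complete sequence rather than of this excerpt.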
All bacteria have a 16S rRNA gene - their copies just don't look exactly like this one. There are some very conserved regions (where most bacteria have the same sequence). These areas are really useful for microbial community analysis by 16S rRNA gene sequencing. Using some bioinformatics software, I aligned all the sequences that I downloaded from the NCBI database. The figure below shows a small section of this alignment, to highlight a conserved region.

A conserved region of the 16S rRNA genes of some known filaments.
Most of this section of the gene is highly conserved across some very diverse bacteria. Dark blue shows 100% conservation, light blue shows majority conservation, and areas with a white background highlight where one or more bacteria differ from the conserved sequence (also called the consensus sequence).
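The colouring can be thought of as a per-column tally across the aligned sequences. The sketch below is an assumption about how such a classification might work, not the method the alignment software actually uses: the function name `classify_columns`, the "more than half" cutoff for majority conservation, and the toy sequences are all hypothetical.

```python
from collections import Counter


def classify_columns(aligned_seqs):
    """Label each alignment column as 'full', 'majority', or 'variable'."""
    labels = []
    n = len(aligned_seqs)
    # zip(*rows) walks the alignment one column at a time.
    for column in zip(*aligned_seqs):
        top_count = Counter(column).most_common(1)[0][1]
        if top_count == n:
            labels.append("full")       # every sequence agrees (dark blue)
        elif top_count > n / 2:
            labels.append("majority")   # assumed cutoff: more than half agree
        else:
            labels.append("variable")   # no clear majority (white)
    return labels


# Hypothetical toy alignment, NOT real 16S data.
toy_alignment = [
    "ACGTAC",
    "ACGTTC",
    "ACGAGC",
    "ACGTCC",
]

labels = classify_columns(toy_alignment)
print(labels)
```

For this toy alignment, the first three columns and the last one are fully conserved, the fourth has a three-out-of-four majority, and the fifth has no majority at all.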
Next up, a small chunk of a variable region. You can see some weakly conserved sequence structure in light blue, but there's a lot of variability in the areas with a white background. The dashes show where the software added extra padding (gaps) to force other areas into alignment. So, not only are the variable regions variable in sequence, they have different lengths, too.

A variable region of the 16S rRNA genes of some known filaments.
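Because the dashes are padding rather than real nucleotides, stripping them out recovers the true length of each sequence in the region. The sketch below shows the idea on a made-up alignment; the names `ungapped_lengths`, `filament_1` and so on, and the sequences themselves, are hypothetical.

```python
def ungapped_lengths(alignment):
    """Return the number of real nucleotides (non-gap characters) per row."""
    return {name: len(row.replace("-", "")) for name, row in alignment.items()}


# Hypothetical toy alignment, NOT real 16S data. Every row has the same
# aligned width, but a different number of real nucleotides.
toy_variable_region = {
    "filament_1": "ACG--TTAGC",
    "filament_2": "ACGGATTAGC",
    "filament_3": "AC----TAGC",
}

lengths = ungapped_lengths(toy_variable_region)
for name, length in lengths.items():
    print(name, length)
```

All three rows are 10 characters wide once aligned, but they contain 8, 10 and 6 nucleotides respectively.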
These variable regions are really important for the work that we do. The differences in these regions are what allow us to distinguish closely related species. For example, look at the variable regions for the four Thiothrix strains and you will see a difference of only one or two nucleotides between them. That small difference is enough to help us match a DNA sequence to a genus - and often, to a species. Furthermore, we can use these variable regions to design qPCR assays that are incredibly specific, able to detect only a single species of Thiothrix, if that's our goal.
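Counting those differences is straightforward once the sequences are aligned. Here is a minimal sketch that compares every pair of sequences position by position. The four sequences are invented for illustration and are not the real Thiothrix sequences; `count_differences` and the `strain_N` labels are hypothetical names.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Hypothetical toy variable region, NOT real Thiothrix data.
toy_strains = {
    "strain_1": "GATTACAGGT",
    "strain_2": "GATTACAGCT",
    "strain_3": "GATCACAGGT",
    "strain_4": "GATTATAGGT",
}

pairwise = {
    (name_a, name_b): count_differences(toy_strains[name_a], toy_strains[name_b])
    for name_a, name_b in combinations(toy_strains, 2)
}

for (name_a, name_b), diffs in pairwise.items():
    print(f"{name_a} vs {name_b}: {diffs} difference(s)")
```

In this toy example every pair differs at one or two positions, which is the scale of difference described above. A real comparison would also need to decide how to score gaps, which this sketch simply treats as ordinary mismatches.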