From The Labs

Meet Squeegee: A new contamination detection tool for low microbial biomass microbiomes

One of the major challenges in microbiome science has been distinguishing a potential environmental contaminant from a bona fide microbiome signal in low biomass studies – studies of human tissues or fluids that contain little microbial DNA, like breastmilk, placenta or amniotic fluid. For instance, it can be challenging to differentiate the DNA of a microbe that truly lives in the sample from remnant contaminant DNA from a sampling kit, an extraction kit or the environment.

Although microbes inhabit just about every part of the human body, outnumbering human cells by ten to one, some tissues and fluids, such as breastmilk, placenta and amniotic fluid, typically have very few microbes. Image credit: Darryl Leja, National Human Genome Research Institute.

While researchers normally include negative controls from the equipment or environment and use algorithmic tools to identify microorganisms present in the environment, not all datasets come with negative controls. Researchers at Baylor College of Medicine and Rice University developed a new contamination detection tool to establish reproducibility in the identification and analysis of the microbes. Their findings were published in Nature Communications.

Dr. Kjersti Aagaard

“We teamed up with our collaborators at Rice University to develop and test a computational tool we called Squeegee,” said Dr. Kjersti Aagaard, professor of obstetrics and gynecology at Baylor and Texas Children’s Hospital. “The premise of Squeegee is that we can use a computer analysis pipeline to help us detect ‘breadcrumbs’ of contaminants that would be anticipated to be common between the microbiome found in all human (or other mammalian) hosts and the sampling or lab environment.”
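The "breadcrumbs" idea can be illustrated with a toy example. The sketch below is not Squeegee's code; it only assumes, for illustration, that a taxon showing up across several distinct sample types is a candidate contaminant. The function name `candidate_contaminants`, the `min_types` threshold and the taxon labels are all hypothetical.

```python
from collections import defaultdict


def candidate_contaminants(samples, min_types=2):
    """Return taxa detected in at least `min_types` distinct sample types.

    `samples` is a list of (sample_type, set_of_taxa) pairs.
    Illustrative assumption only: a taxon shared across unrelated
    sample types is treated as a possible kit or lab contaminant.
    """
    types_by_taxon = defaultdict(set)
    for sample_type, taxa in samples:
        for taxon in taxa:
            types_by_taxon[taxon].add(sample_type)
    return {
        taxon
        for taxon, types in types_by_taxon.items()
        if len(types) >= min_types
    }


# Hypothetical detections from three low biomass sample types.
samples = [
    ("placenta", {"Taxon A", "Taxon X"}),
    ("breastmilk", {"Taxon B", "Taxon X"}),
    ("amniotic fluid", {"Taxon C", "Taxon X"}),
]

flagged = candidate_contaminants(samples, min_types=3)
print(sorted(flagged))
```

In this toy data only "Taxon X" appears in all three sample types, so it is the only taxon flagged. A real pipeline would need additional evidence before calling something a contaminant.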

Dr. Todd Treangen/Rice University

The Aagaard Lab at Baylor has conducted IRB-approved, NIH-funded research over the last decade, producing rich datasets from a large number of participants. These datasets are particularly low in biomass and include many negative controls. The lab teamed up with researchers at Rice’s Treangen Lab to test Squeegee on real-life datasets from human studies that had contamination controls from different environments and DNA extraction kits. They measured the false-positive rate, the recall and how accurately Squeegee could predict and flag environmental contaminants in the absence of negative controls.

Dr. Michael D. Jochum

“We were able to show that Squeegee was capable of having a high weighted recall and a very low false-positive rate in these ground truth datasets,” said co-author Dr. Michael Jochum, postdoctoral research associate in the Department of Obstetrics and Gynecology at Baylor.
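For readers unfamiliar with these metrics, here is a minimal sketch of how a false-positive rate and a weighted recall could be computed when ground truth from negative controls is available. The definitions are assumptions made for illustration, in particular weighting each true contaminant by its read count in the negative controls; they are not taken from the Squeegee code, and the function name `evaluate` and the data are hypothetical.

```python
def evaluate(predicted, truth, all_taxa, control_reads):
    """Score predicted contaminants against a ground-truth set.

    predicted     -- taxa flagged as contaminants by a tool
    truth         -- taxa actually found in the negative controls
    all_taxa      -- every taxon detected in the study samples
    control_reads -- read count of each true contaminant in the controls
                     (assumed weighting scheme, for illustration only)
    """
    true_pos = predicted & truth
    false_pos = predicted - truth
    negatives = all_taxa - truth

    # Share of non-contaminant taxa that were wrongly flagged.
    fpr = len(false_pos) / len(negatives) if negatives else 0.0

    # Share of true contaminants that were recovered.
    recall = len(true_pos) / len(truth) if truth else 0.0

    # Same as recall, but each contaminant counts by its abundance.
    total = sum(control_reads.get(t, 0) for t in truth)
    recovered = sum(control_reads.get(t, 0) for t in true_pos)
    weighted_recall = recovered / total if total else 0.0

    return {"fpr": fpr, "recall": recall, "weighted_recall": weighted_recall}


# Hypothetical example: five taxa, two of which are true contaminants.
all_taxa = {"A", "B", "C", "D", "E"}
truth = {"A", "B"}
predicted = {"A", "C"}
control_reads = {"A": 90, "B": 10}

scores = evaluate(predicted, truth, all_taxa, control_reads)
print(scores)
```

In this made-up case the tool misses the rare contaminant "B" but catches the abundant one, so the weighted recall (0.9) is higher than the unweighted recall (0.5), while one of three non-contaminants is wrongly flagged.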

According to Jochum, Squeegee improves the overall reliability of metagenomic sequencing analysis results in low biomass studies. The new contamination identification tool is capable of identifying batch effects, flagging them as potential contaminants. Given the focus and expertise of the Aagaard lab in studying these sparse microbial environments, this is a tool that they have added to their toolbox for ongoing and future studies.

“Squeegee is a first-of-its-kind tool for the microbiome science community, and it is freely available for use,” Aagaard said. The source code for Squeegee is publicly available at

Other contributors to this work include Dr. Yunxi Liu and Dr. R.A. Leo Elworth at Rice University.

This work was funded by the National Institutes of Health and the National Science Foundation.


By Homa Shalchi

