As the promise of genomics—early identification of disease risk, targeted treatment and better outcomes—has grown, so too have stores of publicly available genetic data. These data are voluminous and complex; so much so that scientists and health professionals, the very people who should be able to make the most use of them, often cannot. Dr. Levi Waldron is working to change that.

For more than 10 years, the Associate Professor of Biostatistics at CUNY’s Institute for Implementation Science for Population Health has been pioneering mathematical methods to analyze vast public treasure troves of genomics data, generating new insights into human health, disease, and treatment. Using computer-driven algorithms, he is unraveling complicated genomics information to learn how the cell-changing dynamics between genes and the environment lead to illness; why some people but not others get sick or respond to treatment; and whether race, ethnicity or socioeconomic indicators influence health outcomes. What’s more, he is making these algorithms and curated databases free to download so that more scientists and health professionals can harness these data to develop more precise diagnoses and treatments, especially of cancers, and even eliminate health disparities.

Waldron, 45, actually began his career in wood physics. After earning his Ph.D. from the University of Toronto, where he researched the leaching of toxic wood preservatives, he turned his fascination with genomics toward cancer, which he studied as a postdoctoral researcher at the University of Toronto and later at Harvard University’s School of Public Health.

What Waldron found as he pored over large public repositories of genetic information, mostly from tumor samples from hospitals around the country, was an overwhelming amount of data that lacked standardization.

“The data have been contributed by thousands of different scientists in thousands of different ways,” he begins. “They come from different technologies, have inconsistent structures, are annotated differently and are then put into repositories for scientists where they often remain sitting because they’re too difficult for most people to find, let alone use.”

For most scientists to be able to use the “thousands of freely available genomic and metagenomic sequencing profiles,” he continues, “they have to be able to handle terabytes of data, and have access to computational resources and bioinformatics expertise. And, to combine and compare studies, they have to be able to standardize the way basic information is recorded.”

Waldron’s mission became clear: to close the gap between data availability and usability. In one project, the self-described bioinformatician and his team at CUNY have processed almost 10,000 microbiome profiles and released them in a standardized, free, user-friendly software package, published in Nature Methods, that allows scientists to immediately find and analyze data to study a variety of conditions, including mother-to-infant microbial transmission, obesity, acne, and colorectal cancer.

“We’re making data systemically more well behaved in a way that scientists can analyze it, and other software developers can make other new tools from it,” he says.

In another of his current projects, which he began in 2013 during his postdoctoral days at Harvard, Waldron and his team are developing software to simplify the analysis of complex genomic data from a National Cancer Institute study called The Cancer Genome Atlas.

“The study remains challenging for many scientists to use because of its complexity,” he says, referring to the collection of multiple genomic assays—10 to 15 types of genomic data for each of 33 cancer types—sampled from 11,000 patients. “We’re working on a simplified representation of the study.”

In particular, Waldron has been concentrating on understanding the role that genes play in forming subtypes of high-grade serous ovarian carcinoma and colorectal cancers, and in influencing patient outcomes. By identifying genetic subtypes of cancer, Waldron is seeking information not only about the ways in which cells become cancerous, but also why people with apparently similar tumors can respond differently to treatment.

“You can imagine that two people’s disease might have different causes, courses and outcomes, but you don’t know any of that just by looking at cells under a microscope, so we’re studying genomes of those cells to understand more,” he says.

The aim of this increased understanding is to better identify risk factors for particular cancers, develop more accurate prognoses and targeted therapies, and predict outcomes.

“With a better ability to understand more about the nature of these subtypes, we hope to better identify the different causes of disease and improve treatments.”

In addition to publishing profiles of cancer genomic data, Waldron and his team, in collaboration with the New York City Department of Health and Mental Hygiene, have been profiling oral microbiome data from its Health and Nutrition Examination Survey (NYC HANES). By analyzing saliva samples from a racially and ethnically diverse sample of NYC adults, he is seeking greater understanding of diabetes, obesity and inflammatory markers, all indicators of poor health. He is also looking to understand how exposure to cigarette, hookah, e-cigarette, and secondhand smoke affects the normal bacteria of the oral cavity and, by extension, overall health.

“Identifying microbial risk factors, including where risky microbes come from, or what allows them to thrive in some people’s guts, could have an implication for early detection and prevention of disease.”

A dental plaque microbiome visualization from 2016 published in PNAS. The “hedgehog” structure in human dental plaque, hybridized with a set of 10 probes, each labeled with a different fluorophore. Image: Courtesy of Jessica Welch, Marine Biological Laboratory, and Gary G. Borisy, The Forsyth Institute.

What makes this project especially valuable for public health is the insight it offers into how the oral microbiome varies not only with habits such as smoking or mouthwash use but also with race, ethnicity, and socioeconomic factors that could influence health outcomes and disparities.

“One finding is that there is more variation in the oral microbiome with respect to socioeconomic status than individual oral health behaviors,” Waldron says. “Oral microbiome variation is consistent with some known health inequalities with respect to race and ethnicity.”

Such inequalities are present in health as well as healthcare access and research, Waldron says. “In genomic studies of ovarian cancer, for example, African-American women are usually underrepresented, even though they’re overrepresented in prevalence of the disease and poor outcomes.”

He attributes this underrepresentation in part to the tendency of academic research centers to capture study participants from their immediate—often economically privileged—vicinities, which typically do not represent the whole population.

“It happens across so many kinds of diseases,” he says. “It’s a pernicious effect because it affects our understanding of diseases and the development of therapies.”

One way in which Waldron has attempted to make health outcomes research more representative is by combining data from underrepresented populations, despite small sample sizes.

“By combining underrepresented populations across multiple studies in meta-analyses, we tend to get a much better overall picture than you ever could from individual studies,” he says.

Waldron acknowledges that it is too early to see new treatments based on the databases that he and his team have published. Still, they are making headway in identifying early markers of elevated risk for colorectal cancer, and some scientists who have used the database have proposed alternative therapies for conditions such as acne.

“Because our databases are free and open, they leverage the creativity of other scientists to take it in directions I could never have anticipated,” he says. In the end, this is what he wants his work to do.

“My goal is to learn everything we can from genomic research to improve health outcomes.”