Key Publications in Genomic Epidemiology
A selection of important publications in genomic epidemiology, organized by topic.
Surveillance Programs
Papers describing the establishment and evaluation of national and institutional genomic surveillance programs.
Integrating Genomic Data into Public Health Surveillance for Multidrug-Resistant Organisms, Washington, USA
Torres L, Johnson J, Valentine A, et al.
2025.
Abstract
Mitigating antimicrobial resistance (AMR) is a public health priority to preserve antimicrobial treatment options. The Washington State Department of Health in Washington, USA, piloted a process to leverage longitudinal genomic surveillance on the basis of whole-genome sequencing (WGS) and a genomics-first cluster definition to enhance AMR surveillance. Here, we outline the approach to collaborative surveillance and describe the pilot using 6 carbapenemase-producing organism outbreaks of 3 species: Pseudomonas aeruginosa, Acinetobacter baumannii, and Klebsiella pneumoniae. We also highlight how we applied the approach to an emerging outbreak. We found that genomic and epidemiologic data define highly congruent outbreaks. By layering genomic and epidemiologic data, we refined linkage hypotheses and addressed gaps in traditional epidemiologic surveillance. With the accessibility of WGS, public health agencies must leverage new approaches to modernize surveillance for communicable diseases.
Ten recommendations for supporting open pathogen genomic analysis in public health
Black A, MacCannell D, Sibley T, et al.
Nature Medicine. 2020. 26(6). 832-841.
Abstract
Increasingly, public-health agencies are using pathogen genomic sequence data to support surveillance and epidemiological investigations. As access to whole-genome sequencing has grown, greater amounts of molecular data have helped improve the ability to detect and track outbreaks of diseases such as COVID-19, investigate transmission chains and explore large-scale population dynamics, such as the spread of antibiotic resistance. However, the wide adoption of whole-genome sequencing also poses new challenges for public-health agencies that must adapt to support a new set of expertise, which means that the capacity to perform genomic data assembly and analysis has not expanded as widely as the adoption of sequencing itself. In this Perspective, we make recommendations for developing an accessible, unified informatic ecosystem to support pathogen genomic analysis in public-health agencies across income settings. We hope that the creation of this ecosystem will allow agencies to effectively and efficiently share data, workflows and analyses and thereby increase the reproducibility, accessibility and auditability of pathogen genomic analysis while also supporting agency autonomy.
Strengthening pathogen genomic surveillance for health emergencies: insights from the World Health Organization’s regional initiatives
Akande O, Carter L, Abubakar A, et al.
Frontiers in Public Health. 2023. 11.
Abstract
The onset of the COVID-19 pandemic triggered a rapid scale-up in the use of genomic surveillance as a pandemic preparedness and response tool. As a result, the number of countries with in-country SARS-CoV-2 genomic sequencing capability increased by 40% from February 2021 to July 2022. The Global Genomic Surveillance Strategy for Pathogens with Pandemic and Epidemic Potential 2022–2032 was launched by the World Health Organization (WHO) in March 2022 to bring greater coherence to ongoing work to strengthen genomic surveillance. This paper describes how WHO’s tailored regional approaches contribute to expanding and further institutionalizing the use of genomic surveillance to guide pandemic preparedness and response measures as part of a harmonized global undertaking. Challenges to achieving this vision include difficulties obtaining genomic sequencing equipment and supplies, shortages of skilled staff, and obstacles to maximizing the utility of genomic data to inform risk assessment and public health action. WHO is helping to overcome these challenges in collaboration with partners through a harmonized programme of work at the global, regional, and country levels. Through its global headquarters, six regional offices, and 153 country offices, WHO is providing support for country-driven efforts to strengthen genomic surveillance in its 194 Member States, with activities reflecting regional specificities. WHO’s regional offices serve as platforms for those countries in their respective regions to share resources and knowledge, engage stakeholders in ways that reflect national and regional priorities, and develop regionally aligned approaches to implementing and sustaining genomic surveillance within public health systems.
Pathogen genomics in public health laboratories: successes, challenges, and lessons learned from California’s SARS-CoV-2 Whole-Genome Sequencing Initiative, California COVIDNet
Smith E, Libuit K, Kapsak C, et al.
Microbial Genomics. 2023. 9(6). 001027.
Abstract
The capacity for pathogen genomics in public health expanded rapidly during the coronavirus disease 2019 (COVID-19) pandemic, but many public health laboratories did not have the infrastructure in place to handle the vast amount of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequence data generated. The California Department of Public Health, in partnership with Theiagen Genomics, was an early adopter of cloud-based resources for bioinformatics and genomic epidemiology, resulting in the creation of a SARS-CoV-2 genomic surveillance system that combined the efforts of more than 40 sequencing laboratories across government, academia and industry to form California COVIDNet, California’s SARS-CoV-2 Whole-Genome Sequencing Initiative. Open-source bioinformatics workflows, ongoing training sessions for the public health workforce, and automated data transfer to visualization tools all contributed to the success of California COVIDNet. While challenges remain for public health genomic surveillance worldwide, California COVIDNet serves as a framework for a scaled and successful bioinformatics infrastructure that has expanded beyond SARS-CoV-2 to other pathogens of public health importance.
Pathogen genomic surveillance status among lower resource settings in Asia
Getchell M, Wulandari S, de Alwis R, et al.
Nature Microbiology. 2024. 9(10). 2738-2747.
Abstract
Asia remains vulnerable to new and emerging infectious diseases. Understanding how to improve next generation sequencing (NGS) use in pathogen surveillance is an urgent priority for regional health security. Here we developed a pathogen genomic surveillance assessment framework to assess capacity in low-resource settings in South and Southeast Asia. Data collected between June 2022 and March 2023 from 42 institutions in 13 countries showed pathogen genomics capacity exists, but use is limited and under-resourced. All countries had NGS capacity and seven countries had strategic plans integrating pathogen genomics into wider surveillance efforts. Several pathogens were prioritized for human surveillance, but NGS application to environmental and human–animal interface surveillance was limited. Barriers to NGS implementation include reliance on external funding, supply chain challenges, trained personnel shortages and limited quality assurance mechanisms. Coordinated efforts are required to support national planning, address capacity gaps, enhance quality assurance and facilitate data sharing for decision making.
Implementing a national programme of pathogen genomics for public health: the Australian Pathogen Genomics Program (AusPathoGen)
Webb J, Andersson P, Sim E, et al.
The Lancet Microbe. 2025. 6(3).
Sentinel Surveillance System Implementation and Evaluation for SARS-CoV-2 Genomic Data, Washington, USA, 2020–2021
Oltean H, Allen K, Frisbie L, et al.
Emerging Infectious Diseases. 2023. 29(2). 242-251.
Methods
Method papers covering advances in cluster detection, bioinformatics methods, etc.
A comparison of short- and long-read whole-genome sequencing for microbial pathogen epidemiology
Schiffer A, Rahman A, Sutton W, et al.
mSystems. 2025. 10(12). e01426-25.
Threshold-free genomic cluster detection to track transmission pathways in health-care settings: a genomic epidemiology analysis
Hawken S, Yelin R, Lolans K, et al.
The Lancet Microbe. 2022. 3(9). e652-e662.
Parameters for one health genomic surveillance of Escherichia coli from Australia
Watt A, Cummins M, Donato C, et al.
Nature Communications. 2025. 16(1). 17.
Abstract
Genomics is a cornerstone of modern pathogen epidemiology yet demonstrating transmission in a One Health context is challenging, as strains circulate and evolve within and between diverse hosts and environments. To identify phylogenetic linkages and better define relevant measures of genomic relatedness in a One Health context, we collated 5471 Escherichia coli genome sequences from Australia originating from humans (n = 2996), wild animals (n = 870), livestock (n = 649), companion animals (n = 375), environmental sources (n = 292) and food (n = 289) spanning over 36 years. Of the 827 multi-locus sequence types (STs) identified, 10 STs were commonly associated with cross-source genomic clusters, including the highly clonal ST131, pandemic zoonotic lineages such as ST95, and emerging human ExPEC ST1193. Here, we show that assessing genomic relationships at a ≤100 SNP threshold enabled detection of cross-source linkage otherwise obscured when applying typical outbreak-oriented relatedness thresholds (≤20 SNPs) and should be considered in interrogation of One Health genomic datasets.
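The effect of the cut-off can be illustrated with a minimal sketch, assuming single-linkage clustering (any two isolates within the threshold end up in the same cluster) on a precomputed pairwise SNP-distance table. The isolate names and distances are hypothetical, and this is not the study's own pipeline.

```python
from itertools import combinations


def snp_clusters(ids, dist, threshold):
    """Single-linkage clusters: isolates join if any pairwise SNP distance <= threshold.

    ids: isolate names; dist: dict mapping frozenset({a, b}) -> SNP count.
    Returns clusters of size >= 2 as sorted lists; singletons are dropped.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical isolates from different sources with made-up SNP distances
ids = ["human_1", "human_2", "livestock_1", "wild_1"]
pairs = {
    ("human_1", "human_2"): 8,
    ("human_1", "livestock_1"): 64,
    ("human_1", "wild_1"): 310,
    ("human_2", "livestock_1"): 70,
    ("human_2", "wild_1"): 305,
    ("livestock_1", "wild_1"): 290,
}
dist = {frozenset(k): v for k, v in pairs.items()}

print(snp_clusters(ids, dist, 20))   # [['human_1', 'human_2']]
print(snp_clusters(ids, dist, 100))  # [['human_1', 'human_2', 'livestock_1']]
```

Widening the cut-off from 20 to 100 SNPs pulls the livestock isolate into the human cluster, a toy version of the cross-source linkage that only appears at the looser threshold. Single linkage is the simplest choice and can chain distant isolates together through intermediates, so it is usually paired with tree topology or epidemiological review.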
Genomic Epidemiology for Estimating Pathogen Burden in a Population
Porter W, Engelthaler D, Hepp C.
Emerging Infectious Diseases. 2025. 31(Supplement, April).
Challenges and considerations for whole-genome-based antimicrobial resistance plasmid investigations
Beh J, Wick R, Howden B, et al.
Antimicrobial Agents and Chemotherapy. 2025. 69(12). e01097-25.
Infrastructure
Technical infrastructure, platforms, and tools for genomic surveillance.
Galaxy @Sciensano: a comprehensive bioinformatics portal for genomics-based microbial typing, characterization, and outbreak detection
Bogaerts B, Van Braekel J, Van Uffelen A, et al.
BMC Genomics. 2025. 26(1). 20.
Abstract
The influx of whole genome sequencing (WGS) data in the public health and clinical diagnostic sectors has created a need for data analysis methods and bioinformatics expertise, which can be a bottleneck for many laboratories. At Sciensano, the Belgian national public health institute, an intuitive and user-friendly bioinformatics tool portal was implemented using Galaxy, an open-source platform for data analysis and workflow creation. The Galaxy @Sciensano instance is available to both internal and external scientists and offers a wide range of tools provided by the community, complemented by over 50 custom tools and pipelines developed in-house. The tool selection is currently focused primarily on the analysis of WGS data generated using Illumina sequencing for microbial pathogen typing, characterization and outbreak detection, but it also addresses specific use cases for other data types. Our Galaxy instance includes several custom-developed 'push-button' pipelines, which are user-friendly and intuitive stand-alone tools that perform complete characterization of bacterial isolates based on WGS data and generate interactive HTML output reports with key findings. These pipelines include quality control, de novo assembly, sequence typing, antimicrobial resistance prediction and several relevant species-specific assays. They are tailored for pathogens with active genomic surveillance programs and clinical relevance, such as Escherichia coli, Listeria monocytogenes, Salmonella spp. and Mycobacterium tuberculosis. These tools and pipelines utilize internationally recognized databases such as PubMLST, EnteroBase, and the NCBI National Database of Antibiotic Resistant Organisms, which are automatically synchronized on a regular basis to ensure up-to-date results. Many of these pipelines are part of the routine activities of Belgian national reference centers and laboratories, some of which use them under ISO accreditation.
This resource is publicly available for noncommercial use at https://galaxy.sciensano.be/ and can help other laboratories establish reliable, traceable and reproducible bioinformatics analyses for pathogens encountered in public health settings.
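Once allele calls exist, the sequence-typing step in pipelines like these is essentially a table lookup: the allele numbers at a fixed set of housekeeping loci form a profile, and the profile maps to a sequence type (ST) in a scheme database such as PubMLST or EnteroBase. The sketch below assumes allele calling has already been done. The loci are those of the seven-locus E. coli (Achtman) scheme, but the profile table and ST labels are hypothetical stand-ins, not real scheme data, and this is not the Galaxy @Sciensano implementation.

```python
# Seven-locus E. coli (Achtman) scheme loci; allele numbers and ST labels below are hypothetical
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")

# Hypothetical profile table: allele-number tuple (in LOCI order) -> sequence type
PROFILES = {
    (1, 2, 3, 4, 5, 6, 7): "ST-A",
    (1, 2, 3, 4, 5, 6, 8): "ST-B",
}


def assign_st(allele_calls):
    """Map per-locus allele calls to a sequence type by exact profile match."""
    missing = [locus for locus in LOCI if allele_calls.get(locus) is None]
    if missing:
        return "incomplete: missing " + ", ".join(missing)
    profile = tuple(allele_calls[locus] for locus in LOCI)
    # Only an exact match counts; any other combination is reported as novel
    return PROFILES.get(profile, "novel profile")


calls = {"adk": 1, "fumC": 2, "gyrB": 3, "icd": 4, "mdh": 5, "purA": 6, "recA": 8}
print(assign_st(calls))  # ST-B
```

Keeping the profile table outside the code mirrors why such databases are synchronized regularly: new STs appear over time, and a stale table turns known types into "novel profile" results.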
Advantages of Software Containerization in Public Health Infectious Disease Genomic Surveillance
Florek K, Young E, Incekara K, et al.
Emerging Infectious Diseases. 2025. 31(Supplement, April).
Accelerating bioinformatics implementation in public health
Libuit K, Doughty E, Otieno J, et al.
Microbial Genomics. 2023. 9(7). 001051.
Abstract
We have adopted an open bioinformatics ecosystem to address the challenges of bioinformatics implementation in public health laboratories (PHLs). Bioinformatics implementation for public health requires practitioners to undertake standardized bioinformatic analyses and generate reproducible, validated and auditable results. It is essential that data storage and analysis are scalable, portable and secure, and that implementation of bioinformatics fits within the operational constraints of the laboratory. We address these requirements using Terra, a web-based data analysis platform with a graphical user interface connecting users to bioinformatics analyses without the use of code. We have developed bioinformatics workflows for use with Terra that specifically meet the needs of public health practitioners. These Theiagen workflows perform genome assembly, quality control, and characterization, as well as construction of phylogeny for insights into genomic epidemiology. Additionally, these workflows use open-source containerized software and the WDL workflow language to ensure standardization and interoperability with other bioinformatics solutions, whilst being adaptable by the user. They are all open source and publicly available in Dockstore with the version-controlled code available in public GitHub repositories. They have been written to generate outputs in standardized file formats to allow for further downstream analysis and visualization with separate genomic epidemiology software. Testament to this solution meeting the requirements for bioinformatic implementation in public health, Theiagen workflows have collectively been used for over 5 million sample analyses in the last 2 years by over 90 public health laboratories in at least 40 different countries. Continued adoption of technological innovations and development of further workflows will ensure that this ecosystem continues to benefit PHLs.
Visualization Tools
Data visualization tools and platforms for genomic epidemiology, including phylogenetic viewers, outbreak dashboards, and interactive visualization systems.
TreeViewer: Flexible, modular software to visualise and manipulate phylogenetic trees
Bianchini G, Sánchez-Baracaldo P.
Ecology and Evolution. 2024.
Abstract
Phylogenetic trees illustrate evolutionary relationships between taxa or genes. Tree figures are crucial when presenting results and data, and by creating clear and effective plots, researchers can describe many kinds of evolutionary patterns. However, producing tree plots can be a time-consuming task, especially as multiple different programs are often needed to adjust and illustrate all data associated with a tree. We present TreeViewer, a new software to draw phylogenetic trees. TreeViewer is flexible, modular, and user-friendly. Plots are produced as the result of a user-defined pipeline, which can be finely customised and easily applied to different trees. Every feature of the program is documented and easily accessible, either in the online manual or within the program's interface. We show how TreeViewer can be used to produce publication-ready figures, saving time by not requiring additional graphical post-processing tools. TreeViewer is freely available for Windows, macOS, and Linux operating systems and distributed under an AGPLv3 licence from https://treeviewer.org. It has a graphical user interface (GUI), as well as a command-line interface, which is useful to work with very large trees and for automated pipelines. A detailed user manual with examples and tutorials is also available. TreeViewer is mainly aimed at users wishing to produce highly customised, publication-quality tree figures using a single GUI software tool. Compared to other GUI tools, TreeViewer offers a richer feature set and a finer degree of customisation. Compared to command-line-based tools and software libraries, TreeViewer's graphical interface is more accessible. The flexibility of TreeViewer's approach to phylogenetic tree plotting enables the program to produce a wide variety of publication-ready figures. Users are encouraged to create their own custom modules to expand the functionalities of the program. 
This sets the scene for an ever-expanding and ever-adapting software framework that can easily adjust to respond to new challenges.
HAIviz: an interactive dashboard for visualising and integrating healthcare-associated genomic epidemiological data
Permana B, Harris P, Roberts L, et al.
Microbial Genomics. 2024. 10(2). 001200.
Abstract
Existing tools for phylogeographic and epidemiological visualisation primarily provide a macro-geographic view of epidemic and pandemic transmission events but offer little support for detailed investigation of outbreaks in healthcare settings. Here, we present HAIviz, an interactive web-based application designed for integrating and visualising genomic epidemiological information to improve the tracking of healthcare-associated infections (HAIs). HAIviz displays and links the outbreak timeline, building map, phylogenetic tree, patient bed movements, and transmission network on a single interactive dashboard. HAIviz has been developed for bacterial outbreak investigations but can be utilised for general epidemiological investigations focused on built environments for which visualisation to customised maps is required. This paper describes and demonstrates the application of HAIviz for HAI outbreak investigations.
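Linking bed movements to a transmission network usually starts by finding patients who shared a location at the same time. Here is a minimal sketch of that step, assuming movement records are available as (patient, ward, admission, discharge) intervals. The `Stay` record, ward names, and dates are hypothetical, and this is not HAIviz code.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass
class Stay:
    patient: str
    ward: str
    start: datetime
    end: datetime


def ward_overlaps(stays):
    """Return (patient_a, patient_b, ward, overlap_days) for stays that share a ward in time."""
    contacts = []
    for s1, s2 in combinations(stays, 2):
        # Skip a patient's own consecutive stays and stays on different wards
        if s1.patient == s2.patient or s1.ward != s2.ward:
            continue
        start = max(s1.start, s2.start)
        end = min(s1.end, s2.end)
        if start < end:  # strictly positive overlap only
            days = (end - start).total_seconds() / 86400
            a, b = sorted((s1.patient, s2.patient))
            contacts.append((a, b, s1.ward, days))
    return contacts


# Hypothetical bed-movement records
stays = [
    Stay("P1", "ICU-A", datetime(2024, 3, 1), datetime(2024, 3, 5)),
    Stay("P2", "ICU-A", datetime(2024, 3, 4), datetime(2024, 3, 10)),
    Stay("P3", "Ward-7", datetime(2024, 3, 2), datetime(2024, 3, 6)),
    Stay("P2", "Ward-7", datetime(2024, 3, 10), datetime(2024, 3, 12)),
]
print(ward_overlaps(stays))  # [('P1', 'P2', 'ICU-A', 1.0)]
```

Overlaps like these can be drawn as edges between patients and intersected with genomic clusters to prioritise plausible transmission links. A discharge and an admission at the same instant are not counted as an overlap.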
Nextstrain: real-time tracking of pathogen evolution
Hadfield J, Megill C, Bell S, et al.
Bioinformatics. 2018. 34(23). 4121-4123.
Abstract
Understanding the spread and evolution of pathogens is important for effective public health measures and surveillance. Nextstrain consists of a database of viral genomes, a bioinformatics pipeline for phylodynamics analysis, and an interactive visualization platform. Together these present a real-time view into the evolution and spread of a range of viral pathogens of high public health importance. The visualization integrates sequence data with other data types such as geographic information, serology, or host species. Nextstrain compiles our current understanding into a single accessible location, open to health professionals, epidemiologists, virologists and the public alike. All code (predominantly JavaScript and Python) is freely available from github.com/nextstrain and the web-application is available at nextstrain.org.
Microreact: visualizing and sharing data for genomic epidemiology and phylogeography
Argimón S, Abudahab K, Goater R, et al.
Microbial Genomics. 2016. 2(11). e000093.
Abstract
Visualization is frequently used to aid our interpretation of complex datasets. Within microbial genomics, visualizing the relationships between multiple genomes as a tree provides a framework onto which associated data (geographical, temporal, phenotypic and epidemiological) are added to generate hypotheses and to explore the dynamics of the system under investigation. Selected static images are then used within publications to highlight the key findings to a wider audience. However, these images are a very inadequate way of exploring and interpreting the richness of the data. There is, therefore, a need for flexible, interactive software that presents the population genomic outputs and associated data in a user-friendly manner for a wide range of end users, from trained bioinformaticians to front-line epidemiologists and health workers. Here, we present Microreact, a web application for the easy visualization of datasets consisting of any combination of trees, geographical, temporal and associated metadata. Data files can be uploaded to Microreact directly via the web browser or by linking to their location (e.g. from Google Drive/Dropbox or via API), and an integrated visualization via trees, maps, timelines and tables provides interactive querying of the data. The visualization can be shared as a permanent web link among collaborators, or embedded within publications to enable readers to explore and download the data. Microreact can act as an end point for any tool or bioinformatic pipeline that ultimately generates a tree, and provides a simple, yet powerful, visualization method that will aid research and discovery and the open sharing of datasets.
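Because Microreact joins a tree to tabular metadata, a common practical failure is an ID mismatch between tree tips and metadata rows. A minimal pre-upload check follows, assuming the metadata is keyed by an ID that should equal the tip label. The Newick parsing is deliberately simple (unquoted labels only), and the sample names are hypothetical.

```python
import re

# A tip label is the name directly after "(" or ","; internal-node labels follow ")" and are skipped
_TIP = re.compile(r"[(,]\s*([^(),:;\s]+)")


def newick_tips(newick):
    """Extract unquoted tip labels from a Newick string (quoted labels are not handled)."""
    return _TIP.findall(newick)


def check_metadata(newick, metadata_ids):
    """Report IDs present on one side only, so mismatches can be fixed before upload."""
    tips = set(newick_tips(newick))
    ids = set(metadata_ids)
    return {
        "tips_without_metadata": sorted(tips - ids),
        "metadata_without_tip": sorted(ids - tips),
    }


tree = "((iso1:0.01,iso2:0.02)90:0.05,iso3:0.1);"
print(check_metadata(tree, ["iso1", "iso2", "iso4"]))
# {'tips_without_metadata': ['iso3'], 'metadata_without_tip': ['iso4']}
```

A regular expression is enough for simple pipeline output; trees with quoted labels or comments would need a proper Newick parser.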
MicrobeTrace 2.0: The enhanced visualization multitool for molecular epidemiology and bioinformatics
Shankar A, Moscoso E, Cowan D, et al.
Molecular Biology and Evolution. 2026. msaf334.
Abstract
MicrobeTrace is a free, secure, browser-based bioinformatics tool to integrate and visualize epidemiologic, laboratory, and molecular data for outbreak investigations with over 14,000 users from 127 countries. Regular testing, user feedback, and comparison with other bioinformatics tools identified areas for improvement, prompting major architectural and functional upgrades. In MicrobeTrace 2.0 we refactored the codebase using Angular to improve scalability, performance, and usability. We also replaced the D3.js visualization engine with Cytoscape.js for faster, more efficient rendering of large networks. The update adds enhanced visualizations, new analytical tools, and expanded functionality within existing views. It also supports seamless integration with external phylogenetic platforms, such as Nextstrain and UShER (Ultrafast Sample Placement on Existing Trees), enabling users to import phylogenetic trees, visualize them as genetic networks, and securely enrich them with epidemiological and demographic metadata. These enhancements position MicrobeTrace as a next-generation, interoperable tool for genomic epidemiology and data-driven public health response.
Healthcare-Associated Infections
Applications of genomic epidemiology for outbreak detection, investigation, and control in healthcare settings.
Integrating whole-genome sequencing into antimicrobial resistance surveillance: methodologies, challenges, and perspectives
Matsumura Y, Yamamoto M, Gomi R, et al.
Clinical Microbiology Reviews. 2025. 38(4). e00140-22.
Abstract
Antimicrobial resistance (AMR) poses a significant threat to global public health. Surveillance is a fundamental method for controlling AMR and guiding clinical decisions, public health interventions, and policymaking. Whole-genome sequencing (WGS) provides a comprehensive and accurate understanding of AMR mechanisms, gene profiling, and transmission dynamics. Public health authorities, academic scholars, hospitals, and laboratories have increasingly employed WGS-based surveillance for retrospective, real-time, and prospective monitoring of AMR and investigations of outbreaks. WGS-based surveillance has improved the accuracy and effectiveness of disease and AMR surveillance by identifying hidden transmissions and sources missed by conventional methods and by rapidly investigating and deploying infection control interventions. However, WGS analysis involves a complex combination of workflows of next-generation sequencing and bioinformatics data analysis, making it difficult to effectively compare surveillance results. It is crucial to understand the limitations of our existing WGS analyses by implementing rigorous validation practices across different WGS analyses, developing practice guidelines, and establishing appropriate quality assurance measures. These efforts will aid in the development of reliable and robust WGS systems, the harmonization and standardization of surveillance programs, and the development of public data sharing and governance frameworks. Despite these challenges, the expansion of WGS-based AMR surveillance is expected to be driven by technological advances, standardization efforts, and the recognition of its advantages among stakeholders. The integration of genomic data with nongenomic information, as well as interdisciplinary collaborations, will further enhance knowledge regarding AMR and promote the development of countermeasures.
Whole-Genome Sequencing Surveillance and Machine Learning of the Electronic Health Record for Enhanced Healthcare Outbreak Detection
Sundermann A, Chen J, Kumar P, et al.
Clinical Infectious Diseases. 2022. 75(3). 476-482.
Abstract
Most hospitals use traditional infection prevention (IP) methods for outbreak detection. We developed the Enhanced Detection System for Healthcare-Associated Transmission (EDS-HAT), which combines whole-genome sequencing (WGS) surveillance and machine learning (ML) of the electronic health record (EHR) to identify undetected outbreaks and the responsible transmission routes, respectively. We performed WGS surveillance of healthcare-associated bacterial pathogens from November 2016 to November 2018. EHR ML was used to identify the transmission routes for WGS-detected outbreaks, which were investigated by an IP expert. Potential infections prevented were estimated and compared with traditional IP practice during the same period. Of 3165 isolates, there were 2752 unique patient isolates in 99 clusters involving 297 (10.8%) patient isolates identified by WGS; clusters ranged from 2 to 14 patients. At least 1 transmission route was detected for 65.7% of clusters. During the same time, traditional IP investigation prompted WGS for 15 suspected outbreaks involving 133 patients, for which transmission events were identified for 5 (3.8%). If EDS-HAT had been running in real time, 25–63 transmissions could have been prevented. EDS-HAT was found to be cost-saving and more effective than traditional IP practice, with overall savings of $192,408–$692,532. EDS-HAT detected multiple outbreaks not identified using traditional IP methods, correctly identified the transmission routes for most outbreaks, and would save the hospital substantial costs. Traditional IP practice misidentified outbreaks for which transmission did not occur. WGS surveillance combined with EHR ML has the potential to save costs and enhance patient safety.
Transmission of Carbapenem-Resistant Klebsiella pneumoniae in US Hospitals
Luterbach C, Chen L, Komarow L, et al.
Clinical Infectious Diseases. 2023. 76(2). 229-237.
Abstract
BACKGROUND: Carbapenem-resistant Klebsiella pneumoniae (CRKp) is the most prevalent carbapenem-resistant Enterobacterales in the United States. We evaluated CRKp clustering in patients in US hospitals. METHODS: From April 2016 to August 2017, 350 patients with clonal group 258 CRKp were enrolled in the Consortium on Resistance Against Carbapenems in Klebsiella and other Enterobacteriaceae, a prospective, multicenter, cohort study. A maximum likelihood tree was constructed using RAxML. Static clusters shared ≤21 single-nucleotide polymorphisms (SNPs) and a most recent common ancestor. Dynamic clusters incorporated SNP distance, culture timing, and rates of SNP accumulation and transmission using the R program TransCluster. RESULTS: Most patients were admitted from home (n = 150, 43%) or long-term care facilities (n = 115, 33%). Urine (n = 149, 43%) was the most common isolation site. Overall, 55 static and 47 dynamic clusters were identified involving 210 of 350 (60%) and 194 of 350 (55%) patients, respectively. Approximately half of static clusters were identical to dynamic clusters. Static clusters consisted of 33 (60%) intrasystem and 22 (40%) intersystem clusters. Dynamic clusters consisted of 32 (68%) intrasystem and 15 (32%) intersystem clusters and had fewer SNP differences than static clusters (8 vs 9; P = .045; 95% confidence interval [CI]: -4 to 0). Dynamic intersystem clusters contained more patients than dynamic intrasystem clusters (median [interquartile range], 4 [2, 7] vs 2 [2, 2]; P = .007; 95% CI: -3 to 0). CONCLUSIONS: Widespread intrasystem and intersystem transmission of CRKp was identified in hospitalized US patients. Use of different methods for assessing genetic similarity resulted in only minor differences in interpretation.
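Two tallies in this abstract, the share of static clusters identical to a dynamic cluster and the intrasystem/intersystem split, are simple set operations once cluster memberships are known. The sketch below assumes memberships are already computed. It does not reproduce the RAxML tree, the ≤21 SNP plus shared-ancestor rule, or TransCluster, and the patient IDs and health-system labels are hypothetical.

```python
def identical_fraction(static, dynamic):
    """Fraction of static clusters whose membership exactly matches some dynamic cluster."""
    dynamic_sets = {frozenset(c) for c in dynamic}
    matches = sum(1 for c in static if frozenset(c) in dynamic_sets)
    return matches / len(static) if static else 0.0


def system_scope(cluster, system_of):
    """'intrasystem' if every patient belongs to one health system, else 'intersystem'."""
    systems = {system_of[p] for p in cluster}
    return "intrasystem" if len(systems) == 1 else "intersystem"


# Hypothetical cluster memberships and health-system assignments
static = [{"p1", "p2"}, {"p3", "p4", "p5"}, {"p6", "p7"}, {"p8", "p9"}]
dynamic = [{"p1", "p2"}, {"p3", "p4"}, {"p8", "p9"}]
system_of = {"p1": "A", "p2": "A", "p3": "A", "p4": "B", "p5": "B",
             "p6": "C", "p7": "C", "p8": "A", "p9": "B"}

print(identical_fraction(static, dynamic))            # 0.5
print([system_scope(c, system_of) for c in dynamic])  # ['intrasystem', 'intersystem', 'intersystem']
```

Exact-membership matching is strict: a dynamic cluster that differs from a static one by a single patient counts as a mismatch, which is one reason "identical" fractions can look low even when the two methods broadly agree.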
Pathogen genomics in healthcare: overcoming barriers to proactive surveillance
Sundermann A, Rosa R, Harris P, et al.
Antimicrobial Agents and Chemotherapy. 2024. 69(1). e01479-24.
Genome sequencing for prevention of health-care-associated bacterial infections
Hayden M, Sansom S, Snitkin E
Nature Reviews Microbiology. 2025. 1-15.
Abstract
Health-care-associated infections (HAIs) are a global threat. Microbial whole-genome sequencing (WGS) can strengthen HAI prevention strategies by enabling high-resolution detection and tracking of pathogen transmission and predicting clinically relevant phenotypic traits, such as antibiotic resistance and virulence. Although WGS continues to serve as a vital adjunct in outbreak responses, its role has evolved to include prospective pathogen surveillance and epidemiological monitoring. In this Review, we provide an overview of the epidemiology and microbiology of HAIs and highlight actionable insights gained by inclusion of WGS in HAI investigation and surveillance, with an emphasis on high-priority antibiotic-resistant bacterial pathogens. We explore the value of incorporating plasmid analysis into investigations and emphasize the importance of integrating genomic data with clinical and epidemiological metadata to support accurate transmission inferences. Critical methodological decisions are examined, including strategies for sample selection and the determination of an appropriate single nucleotide variant threshold to identify patients linked by transmission. The need for capacity building in low- and middle-income countries, which bear a disproportionate burden of HAIs, is discussed. Together, these considerations underscore the transformative potential of WGS to inform targeted, data-driven interventions and advance global efforts to reduce the burden of HAIs.
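The review discusses how to choose a single nucleotide variant (SNV) threshold for calling transmission links. One common data-driven approach (an assumption here, not necessarily a strategy the authors recommend) is to compare SNV distances between patient pairs with and without an epidemiological link and pick the cut-off that best separates them, for example by maximising Youden's J. All distances below are hypothetical.

```python
from fractions import Fraction


def best_threshold(linked, unlinked):
    """Return (threshold, J) maximising Youden's J = sensitivity + specificity - 1.

    A pair is called 'linked' when its SNV distance is <= threshold.
    linked / unlinked: SNV distances for pairs with / without an epidemiological link.
    Fractions keep the comparison exact; on ties the smaller threshold is kept.
    """
    best_t, best_j = None, Fraction(-1)
    for t in sorted(set(linked) | set(unlinked)):
        sensitivity = Fraction(sum(d <= t for d in linked), len(linked))
        specificity = Fraction(sum(d > t for d in unlinked), len(unlinked))
        j = sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical SNV distances for epidemiologically linked and unlinked patient pairs
linked = [0, 1, 2, 3, 5, 8, 14]
unlinked = [9, 12, 25, 40, 55, 80, 120]
print(best_threshold(linked, unlinked))  # (8, Fraction(6, 7))
```

In practice the "unlinked" set is rarely a clean negative, because some links go undetected, and suitable cut-offs vary by species and sequencing pipeline, so a value chosen this way is a starting point for review rather than a fixed rule.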
Artificial intelligence enhances genomic surveillance in healthcare outbreak investigations
Sundermann A, Chen J, Saul M, et al.
Infection Control & Hospital Epidemiology. 2025. 1-5.
Abstract
Background: Outbreak investigation and control are critical for preventing the spread of infectious diseases in healthcare settings. Traditional methods rely on manual processes, which are time-consuming and limited in scope. Whole genome sequencing (WGS) surveillance improves outbreak detection but still requires extensive manual chart reviews to identify transmission routes. Integrating artificial intelligence (AI) may enhance the efficiency and accuracy of these investigations. Methods: We evaluated an AI tool developed to streamline healthcare outbreak investigations detected by the Enhanced Detection System for Healthcare-associated Transmission (EDS-HAT). For outbreaks detected between November 2021 and November 2023, multiple data elements were extracted from electronic health records (EHR) for all patients. The AI algorithm was applied to identify transmission routes, and its performance was assessed against expert manual reviews. Key measures included additional transmission routes identified and sensitivity. Results: Data from 172 outbreaks involving 476 case patients were analyzed. The AI tool identified 37 transmission routes that were missed by manual review, including procedures and provider routes. The algorithm achieved a sensitivity of 76.0% (95% confidence interval [CI] 71.1%–81.1%) compared to manual review, increasing to 91.7% (95% CI 87.7%–94.7%) after accounting for transmission at other facilities and downstream exposures. Conclusion: The EDS-HAT AI tool significantly improved outbreak investigations by automating the identification of transmission routes, both with concordant findings of manual review as well as finding additional routes of transmission missed by traditional chart review. AI with genomic surveillance has the potential to optimize outbreak detection and investigation to streamline interventions in healthcare settings.
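Sensitivity here is the share of manually identified transmission routes that the AI tool also found, and routes found only by the tool are the "additional" ones. A minimal sketch of that comparison follows, assuming routes can be represented as comparable labels. The Wilson score interval is one standard binomial confidence interval and is an assumption, since the abstract does not say which method the authors used. Route labels are hypothetical.

```python
import math


def compare_routes(reference, predicted, z=1.96):
    """Compare predicted transmission routes against a reference (manual-review) set.

    Returns (sensitivity, (ci_low, ci_high), routes found only by the prediction).
    The interval is a Wilson score interval; z=1.96 gives roughly 95% coverage.
    """
    n = len(reference)
    tp = len(reference & predicted)
    p = tp / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, (centre - half, centre + half), predicted - reference


# Hypothetical routes, written as "outbreak:route"
manual = {"O1:unit-4B", "O1:endoscopy", "O2:unit-6C", "O3:provider-X"}
ai = {"O1:unit-4B", "O2:unit-6C", "O3:provider-X", "O3:cath-lab"}

sens, (low, high), extra = compare_routes(manual, ai)
print(sens)   # 0.75
print(extra)  # {'O3:cath-lab'}
```

With only four reference routes the interval is very wide; the narrow intervals reported above come from evaluating hundreds of case patients.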
Pathogen Specific Papers
Applications of genomic epidemiology to specific pathogens and outbreak investigations.
Real-Time Use of Monkeypox Virus Genomic Surveillance, King County, Washington, USA, 2022–2024
Lau K, Banks M, Bryant K, et al.
Emerging Infectious Diseases. 2025. Volume 31, Supplement (April 2025).
Abstract
Use of Mpox Genomic Surveillance, Washington, USA
Insights on Recurrent and Sequential Clostridioides difficile Infections From Genomic Surveillance in Minnesota, USA, 2019–2021
Evans D, Friedman B, Pung K, et al.
The Journal of Infectious Diseases. 2025. jiaf505.
Abstract
The frequent temporal recurrence of Clostridioides difficile infection (CDI) may be the result of relapse with the same strain or reinfection with a different strain. We used whole-genome sequencing (WGS) to assess the genetic diversity and molecular evolution of strains that caused recurrent or sequential CDI. We analyzed data from active population- and laboratory-based surveillance of CDI in Minnesota, USA. We performed WGS on isolates collected from 306 patients with multiple CDI events during 2019–2021. We identified multi-locus sequence types (MLSTs), nucleotide variants, and putative mobile genetic elements (MGEs) from WGS data to study the genetic similarity and evolution of those C. difficile genomes. Among patients with multiple CDI events in the surveillance period, 198 (64.7%) had multiple infections of the same MLST, including 49.6% of patients with subsequent infections beyond the 8-week limit of the case definition for recurrent CDI. Among 232 temporally defined events of recurrent CDI, 155 (66.8%) involved isolates of the same MLST. There were no statistically significant correlations between accumulated mutations and elapsed time between same-MLST CDI events. Analysis of sequential same-MLST C. difficile genomes showed evidence of gain or loss of putative MGEs in 45.6% of genome pairs. Leveraging the largest CDI genomic dataset to date, our results confirm prior findings that recurrent CDI is a combination of reinfection and/or change in the ascendant strain in mixed infection, and relapse, while expanding knowledge on the evolution of pathogenic C. difficile strains in the human gastrointestinal tract.
Genomic epidemiology of Mycobacterium tuberculosis in Wales
Pacchiarini N, Simkin F, Postans M, et al.
Scientific Reports. 2025. 15(1). 31106.
Abstract
Identification of factors contributing to tuberculosis (TB) transmission can guide targeted measures to reduce morbidity. Varying findings for factors associated with TB genomic clustering exist. We describe Mycobacterium tuberculosis strain diversity, drug-resistance, and ongoing transmission in Wales using single nucleotide polymorphisms (SNP)-based typing to infer lineage and clusters. TB cohort data on isolates from Welsh residents from 2012 to 2022, patient level data from the National TB Surveillance System and SNP-based data, were merged. Descriptive epidemiology and logistic regression modelling were used to identify factors associated with genotypic clustering. 215 cases were included in the cluster analysis (66% male and 46% born outside of the UK); 115/215 belonged to 30 genomic clusters belonging to lineages 2–4. Most clusters corresponded to Lineage 4 and were distributed within South Wales. There were significant differences in the distribution of ethnicity, age group, and deprivation (Welsh Index of Multiple Deprivation, WIMD) in our sample compared to the Welsh population. Resistance to rifampicin and isoniazid and predicted resistance to ethambutol, aminoglycosides, pyrazinamide, and quinolone was low. Factors associated with increased odds of clustering included being UK-born and having pulmonary disease. Due to the identification of the above factors associated with TB genomic clustering, as well as the differences in ethnicity, age group, and WIMD quintile, prevention strategies for TB screening targeted towards these groups may be considered. Future work may evaluate the utility of additional control measures within these populations when the onset case in a genomic cluster has any of these characteristics.
Associations Between Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Variants and Risk of Coronavirus Disease 2019 (COVID-19) Hospitalization Among Confirmed Cases in Washington State: A Retrospective Cohort Study
Paredes M, Lunn S, Famulare M, et al.
Clinical Infectious Diseases: An Official Publication of the Infectious Diseases Society of America. 2022. 75(1). e536-e544.
Abstract
BACKGROUND: The coronavirus disease 2019 (COVID-19) pandemic is dominated by variant viruses; the resulting impact on disease severity remains unclear. Using a retrospective cohort study, we assessed the hospitalization risk following infection with 7 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants. METHODS: Our study includes individuals with positive SARS-CoV-2 reverse transcription polymerase chain reaction (RT-PCR) in the Washington Disease Reporting System with available viral genome data, from 1 December 2020 to 14 January 2022. The analysis was restricted to cases with specimens collected through sentinel surveillance. Using a Cox proportional hazards model with mixed effects, we estimated hazard ratios (HR) for hospitalization risk following infection with a variant, adjusting for age, sex, calendar week, and vaccination. RESULTS: In total, 58 848 cases were sequenced through sentinel surveillance, of which 1705 (2.9%) were hospitalized due to COVID-19. Higher hospitalization risk was found for infections with Gamma (HR 3.20, 95% confidence interval [CI] 2.40-4.26), Beta (HR 2.85, 95% CI 1.56-5.23), Delta (HR 2.28, 95% CI 1.56-3.34), or Alpha (HR 1.64, 95% CI 1.29-2.07) compared to infections with ancestral lineages; Omicron (HR 0.92, 95% CI 0.56-1.52) showed no significant difference in risk. Following Alpha, Gamma, or Delta infection, unvaccinated patients show higher hospitalization risk, while vaccinated patients show no significant difference in risk, both compared to unvaccinated, ancestral lineage cases. Hospitalization risk following Omicron infection is lower with vaccination. CONCLUSIONS: Infection with Alpha, Gamma, or Delta results in a higher hospitalization risk, with vaccination attenuating that risk. Our findings support hospital preparedness, vaccination, and genomic surveillance.
Suggest Additional Publications
Know of an important publication that should be included? We welcome community input to keep this list current and comprehensive.