Title: Improving the Robustness of Natural Language Processing to Dialects and Language Variants
Date: 11/19/2025
Time: 12-2PM EST (9-11AM PST)
Location: https://gatech.zoom.us/j/4263320954?pwd=MGtPdUhKd0RIYWdqNzU4VW5RSk5zdz09 (with a small in-person presence at Stanford University, Gates 415)
William Held
Machine Learning PhD Student
School of Interactive Computing in the College of Computing
Georgia Institute of Technology
Committee:
1. Diyi Yang
2. Mark Riedl
3. Larry Heck
4. Zsolt Kira
5. Percy Liang
Abstract: English, a global language spoken by billions across continents, is rich with variation. Although these variants and dialects have many speakers, most language technologies primarily serve Standard American English speakers, creating systematic barriers for other dialect communities. My research establishes empirical evidence for these disparities through novel controlled experiments and user experience studies spanning multiple English varieties. Building on these findings, I have developed computationally efficient adaptation techniques that improve dialect robustness without requiring task-specific annotations. Finally, I have examined how dialect performance evolves as models scale, using scaling laws to assess whether increased compute alone can close dialect gaps or whether targeted interventions remain necessary. These contributions both advance the theoretical understanding of language variation as a dimension of NLP performance and provide practical machine learning methods for building language technologies that serve English in all its forms.
