Yinghao Li - Machine Learning PhD Student - School of Electrical and Computer Engineering
Date: June 19th
Time: 1:00 PM – 2:30 PM ET
Location: Virtual
Meeting Link: https://gatech.zoom.us/j/92572348837
Committee
1. Dr. Chao Zhang (Advisor; CSE)
2. Dr. Rampi Ramprasad (Co-Advisor; MSE)
3. Dr. Tuo Zhao (ISyE)
4. Dr. Srijan Kumar (CSE)
Abstract
Recent advances in machine learning emphasize the significance of high-quality training data, particularly for supervised models that rely on manually curated labels for tasks such as material property prediction and the enhancement of language models through reinforcement learning from human feedback (RLHF). Despite the success of label-free training in fields like language and image understanding, obtaining accurate, high-quality labeled data remains a challenge because manual labeling is costly and requires domain expertise. My research focuses on automating the annotation process to reduce costs while maintaining label quality, improving model sensitivity to data quality, and utilizing language models in situations with limited labeled data. I have developed weakly supervised learning methods that use simple labeling functions to annotate data automatically, together with augmented hidden Markov models that aggregate the resulting noisy labels to improve named entity recognition (NER). Additionally, I have incorporated uncertainty quantification into pre-trained models to enhance molecular property prediction by identifying mislabeled or anomalous data points, thus boosting model reliability and generalization. My research also explores the reasoning abilities of advanced language models like GPT-4 on novel tasks such as Minesweeper, assessing their ability to generalize to new challenges, and refines reward models in RLHF to better reflect human feedback.