Title: On the Resource Efficiency of Language Models
Date: April 16th, 2025
Time: 1:00 pm - 3:00 pm (ET)
Location: CODA C1308
Meeting URL: https://gatech.zoom.us/j/91275576046
Meeting ID: 912 7557 6046
Rongzhi Zhang
Machine Learning PhD Candidate
School of Computational Science and Engineering
Georgia Institute of Technology
Committee
1. Dr. Chao Zhang (CSE, Georgia Tech) (Advisor)
2. Dr. Tuo Zhao (ISyE, Georgia Tech)
3. Dr. Steve Mussmann (CS, Georgia Tech)
4. Dr. B. Aditya Prakash (CSE, Georgia Tech)
5. Dr. Yelong Shen (Microsoft)
Abstract
Large Language Models (LLMs) have achieved remarkable progress across natural language processing tasks, yet their broad application remains constrained by their substantial resource demands. This thesis addresses these challenges through two complementary thrusts: data efficiency in the post-training stage and model efficiency in the deployment stage. The proposed approaches reduce supervision and memory requirements while preserving, and in some cases enhancing, downstream performance.
Thrust I: Data Efficiency in the Post-Training Stage
In the post-training stage, adapting models to specific tasks or aligning them with human values demands large quantities of high-quality labeled data. To improve data curation efficiency when fine-tuning pre-trained language models, I introduce PRBoost, an interactive weak supervision framework that iteratively discovers labeling rules, mitigating data scarcity and outperforming existing weakly supervised baselines. To improve data utilization efficiency in LLM alignment, I propose DORM, a two-stage approach that dynamically adjusts preference data weights via quality-aware weighting and bilevel optimization, achieving strong alignment results with up to 40× less data than conventional techniques.
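The quality-aware weighting idea can be illustrated with a minimal sketch. Note that the function name, the temperature parameter, and the softmax formulation below are illustrative stand-ins, not DORM's actual method: each preference example gets a scalar quality score, and a temperature-scaled softmax turns those scores into normalized data weights, so higher-quality pairs contribute more to the training loss.

```python
import numpy as np

def quality_weights(scores, temperature=1.0):
    """Map per-example quality scores to normalized data weights
    via a temperature-scaled softmax (illustrative formulation)."""
    s = np.asarray(scores, dtype=float) / temperature
    s -= s.max()            # subtract max for numerical stability
    w = np.exp(s)
    return w / w.sum()

# Toy quality scores for four preference pairs: higher score -> larger weight.
scores = [2.0, 0.5, 1.0, 3.0]
w = quality_weights(scores, temperature=0.5)
```

Lowering the temperature concentrates weight on the highest-quality examples; raising it flattens the distribution toward uniform sampling.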
Thrust II: Model Efficiency in the Deployment Stage
In the deployment stage, the use of LLMs in resource-limited environments is constrained by their enormous parameter counts and memory requirements. To enhance model parameter efficiency, I develop PTLoss, a perturbation-based distillation framework that improves student model performance when distilling from biased teacher models. To enhance model efficiency during inference, I present LoRC, a progressive KV cache compression strategy based on low-rank approximation of KV weight matrices, which achieves substantial GPU memory savings with minimal performance degradation.
Together, these contributions establish a comprehensive framework for resource-efficient language models, enabling more practical application of LLMs across resource-constrained environments.