Title: On the Resource Efficiency of Language Models
Date: April 16th, 2025
Time: 1:00 pm - 3:00 pm (ET)
Location: CODA C1308
Meeting URL: https://gatech.zoom.us/j/91275576046
Meeting ID: 912 7557 6046
Rongzhi Zhang
Machine Learning PhD Candidate
School of Computational Science and Engineering
Georgia Institute of Technology
Committee
1. Dr. Chao Zhang (CSE, Georgia Tech) (Advisor)
2. Dr. Tuo Zhao (ISyE, Georgia Tech)
3. Dr. Steve Mussmann (CS, Georgia Tech)
4. Dr. B. Aditya Prakash (CSE, Georgia Tech)
5. Dr. Yelong Shen (Microsoft)
Abstract
Large Language Models (LLMs) have achieved remarkable progress across natural language processing tasks, yet their broad application remains constrained by their substantial resource demands. This thesis addresses these challenges through two complementary thrusts: data efficiency in the post-training stage and model efficiency in the deployment stage. The proposed approaches reduce supervision and memory requirements while preserving, and in some cases enhancing, downstream performance.
Thrust I: Data Efficiency in the Post-Training Stage
In the post-training stage, adapting models to specific tasks or aligning them with human values demands large quantities of high-quality labeled data. To improve data curation efficiency when fine-tuning pre-trained language models, I introduce PRBoost, an interactive weak supervision framework that iteratively discovers labeling rules, mitigating data scarcity and outperforming existing weakly supervised baselines. To improve data utilization efficiency in LLM alignment, I propose DORM, a two-stage approach that dynamically adjusts preference data weights via quality-aware weighting and bilevel optimization, achieving strong alignment results with up to 40× less data than conventional techniques.
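The quality-aware weighting idea can be illustrated with a minimal sketch. Note that the function name, the temperature parameter, and the softmax formulation below are illustrative stand-ins, not DORM's actual method: each preference example gets a scalar quality score, and a temperature-scaled softmax turns those scores into normalized data weights, so higher-quality pairs contribute more to the training loss.

```python
import numpy as np

def quality_weights(scores, temperature=1.0):
    """Map per-example quality scores to normalized data weights
    via a temperature-scaled softmax (illustrative formulation)."""
    s = np.asarray(scores, dtype=float) / temperature
    s -= s.max()            # subtract max for numerical stability
    w = np.exp(s)
    return w / w.sum()

# Toy quality scores for four preference pairs: higher score -> larger weight.
scores = [2.0, 0.5, 1.0, 3.0]
w = quality_weights(scores, temperature=0.5)
```

Lowering the temperature concentrates weight on the highest-quality examples; raising it flattens the distribution toward uniform sampling.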
Thrust II: Model Efficiency in the Deployment Stage
In the deployment stage, the use of LLMs in resource-limited environments is constrained by their enormous parameter counts and memory requirements. To enhance model parameter efficiency, I develop PTLoss, a perturbation-based distillation framework that improves student model performance when distilling from biased teacher models. To enhance model efficiency during inference, I present LoRC, a progressive KV cache compression strategy based on low-rank approximation of KV weight matrices, which achieves substantial GPU memory savings with minimal performance degradation.
Together, these contributions establish a comprehensive framework for resource-efficient language models, enabling more practical application of LLMs across resource-constrained environments.