Jacob Eisenstein (Georgia Tech) "Two new machine learning approaches for text classification"

Title: Two new machine learning approaches for text classification


Text document classification is one of the best-studied applications of machine learning. Yet this technology is still limited by practical difficulties and invalid underlying assumptions.

First, many people who want text classifiers do not have the time or resources to annotate a dataset. They often employ a heuristic alternative: they create word lists for each label class, and then perform prediction by selecting the class whose list matches the largest number of words in the text. This heuristic is theoretically unjustified, and mistakenly assigns the same importance to every word in the list. I show that list-based classification can be viewed as a (very!) special case of Naive Bayes. Based on this analysis, it is possible to estimate weights for each word without supervision, using the method of moments.
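As a rough illustration of the heuristic described above, the following Python sketch compares plain list matching with a linear scorer that gives each listed word its own weight. The word lists, the weights, and the function names are all hypothetical; the weights are made up by hand and are not estimated with the method of moments discussed in the talk.

```python
from collections import Counter

# Hypothetical word lists, one per label (illustrative, not from the talk).
WORD_LISTS = {
    "positive": {"good", "great", "love"},
    "negative": {"bad", "awful", "hate"},
}


def list_classify(tokens, word_lists=WORD_LISTS):
    """Heuristic: choose the label whose list matches the most tokens."""
    counts = Counter(tokens)
    scores = {
        label: sum(counts[word] for word in words)
        for label, words in word_lists.items()
    }
    return max(scores, key=scores.get), scores


def weighted_classify(tokens, weights):
    """Linear scorer: each listed word contributes its own weight per match."""
    counts = Counter(tokens)
    scores = {
        label: sum(counts[word] * w for word, w in word_weights.items())
        for label, word_weights in weights.items()
    }
    return max(scores, key=scores.get), scores


# Uniform weights: every listed word counts the same, as in list matching.
UNIFORM = {
    label: {word: 1.0 for word in words}
    for label, words in WORD_LISTS.items()
}

# Made-up non-uniform weights (an assumption, not estimated from data).
SKEWED = {
    "positive": {"good": 1.0, "great": 3.0, "love": 2.0},
    "negative": {"bad": 1.0, "awful": 1.0, "hate": 2.0},
}

tokens = "a great movie with a bad script and awful pacing".split()
print(list_classify(tokens))
print(weighted_classify(tokens, UNIFORM))
print(weighted_classify(tokens, SKEWED))
```

With uniform weights the linear scorer makes the same decision as list matching. With the skewed weights the single strong match on "great" outweighs two weak negative matches, so the predicted label changes. This is the kind of distinction that equal word importance cannot express.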

Second, machine learning approaches to text classification nearly always begin with an independent and identically distributed (IID) assumption. Yet words can mean different things to different people, raising the possibility of misunderstandings even in human-human conversation. One potential solution is to relax the IID assumption by personalizing text classifiers to the author. An apparent roadblock is the challenge of obtaining labeled data for each author. I will present a method that sidesteps this requirement by relying on the sociological theory of homophily, which states that people who are socially connected tend to share personal traits. This idea can be formalized by estimating node embeddings for each individual in a social network, and then using these embeddings to drive a social attentional mechanism in a neural ensemble classifier. The resulting system obtains significant improvements on sentiment analysis on Twitter. This project is joint work with Yi Yang.
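The abstract does not spell out the architecture, so the following Python sketch shows only one plausible reading of an embedding-driven attention over an ensemble. It assumes each basis model is a linear softmax classifier and that attention logits are a linear function of the author's node embedding. All names, shapes, and parameter values are hypothetical and hand-picked, not learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def social_attention_predict(text_features, author_embedding,
                             basis_models, attention_rows):
    """Mix the basis models' predictions with author-dependent weights.

    basis_models: K models, each a list of per-class weight vectors.
    attention_rows: K vectors that map the author embedding to logits.
    """
    # Attention over the K basis models depends only on the author.
    attention = softmax([dot(row, author_embedding) for row in attention_rows])
    # Each basis model gives its own class distribution for the text.
    per_model = [
        softmax([dot(w, text_features) for w in model])
        for model in basis_models
    ]
    n_classes = len(per_model[0])
    mixed = [
        sum(a * probs[c] for a, probs in zip(attention, per_model))
        for c in range(n_classes)
    ]
    return mixed, attention


# Toy setup (assumed): one feature, the count of the word "sick".
# Classes are [negative, positive]. Model 0 reads "sick" as praise,
# model 1 reads it as a complaint.
BASIS_MODELS = [
    [[-2.0], [2.0]],
    [[2.0], [-2.0]],
]
ATTENTION_ROWS = [[3.0, 0.0], [0.0, 3.0]]

features = [1.0]
author_a = [1.0, 0.0]  # hypothetical embedding near one community
author_b = [0.0, 1.0]  # hypothetical embedding near another community

probs_a, _ = social_attention_predict(features, author_a, BASIS_MODELS, ATTENTION_ROWS)
probs_b, _ = social_attention_predict(features, author_b, BASIS_MODELS, ATTENTION_ROWS)
print(probs_a, probs_b)
```

The same text receives different sentiment predictions for the two authors because their embeddings route attention to different basis models. No per-author labels are needed at prediction time, only a position in the embedding space.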

IDEaS and ARC DL: Jon Kleinberg "Human Decisions and Machine Predictions"

IDEaS and ARC Distinguished Lecture

As part of ARC10: Celebrating 10 years of the Algorithms and Randomness Center

Monday, October 24 at 10 AM

Jon Kleinberg (Cornell University)

Human Decisions and Machine Predictions

Klaus Advanced Computing Building Room 1116


IEEE DL Seminar by Paris Smaragdis on "Machine Learning Approaches for Speech Enhancement"

Paris Smaragdis, UIUC & Adobe Research

Date: 25 October 2016
Time: 11:45 AM to 01:00 PM

Technology Square Research Building (TSRB) 125

IRIM Seminar by Byron Boots on "Closing the Gap Between Machine Learning and Robotics"

Wednesday, Oct. 5, 2016
Marcus Nano Bldg. • Rooms 1116-1118
12:00–1:00 p.m.

