Research

Affiliations: ML@GT with ISyE, AE, CSE, ECE, IC, IDEAS, IRIM, GTRI, and ARC

As the volume of data around us continues to grow, data and related information will no longer be static and persistent. In most situations, the data available to us is updated asynchronously as new information arrives, and it is critical that the inferences drawn from the data are updated in real time. In traditional or enterprise systems, dynamic data is likely to be transactional. In engineering systems, dynamic data is also prevalent as new data continues to become available (e.g., as a robot explores new environments or more people join a social network). This requires representing such forms of dynamic data, making decisions online, and adapting dynamically to new observations and transactions.
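
As a small illustration of updating an inference as each new observation arrives, the sketch below keeps a running mean and variance without storing or rescanning past data. It uses Welford's method as one possible choice; the class name `RunningStats` and the sample readings are hypothetical and are not taken from any Georgia Tech system.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Mean and variance maintained incrementally (Welford's method)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one new observation into the summary in O(1) time."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; reported as 0.0 until two observations exist."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


# Hypothetical stream of sensor readings arriving one at a time.
stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)
```

After every call to `update`, the current estimate is available immediately, which is the property that matters when data arrives asynchronously.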

Affiliations: ML@GT with ISyE, ECE, BME, IC, CSE, CS, GT-Neuro

Machine learning researchers have long looked to neural systems for inspiration. For example, deep learning methods (inspired by the hierarchical organization of cortex) are part of a broader family of machine learning methods that learn meaningful representations from large-scale unlabeled data. Deep learning algorithms have recently found wide use in domains such as computer vision, automatic speech recognition, natural language processing, audio recognition, and bioinformatics, with state-of-the-art results often significantly outperforming traditional approaches. Researchers are also exploring neurally-inspired algorithms as efficient approaches to many core tasks in machine learning.

At Georgia Tech, a variety of researchers are undertaking extensive and comprehensive studies at the foundational level of neural computation, including deep learning (new theory, algorithms, and computational approaches for large datasets), randomized dimensionality reduction in neural systems, and solving the optimization programs required by machine learning algorithms in novel ultra-efficient neurally-inspired hardware. Georgia Tech will benefit from bringing together all the campus experts in this area, both to undertake a foundational push to unravel the theoretical aspects of neural computation and to apply it to real-world problems within an interdisciplinary context.

Affiliations: ML@GT with CSE, BME, IC, ISyE, and IDEAS

Extraction of trends and patterns from data is a core problem in data analytics. Building on pattern recognition, anomaly detection aims to identify items, events, or observations that do not conform to an expected (or learned) pattern or to other items in a dataset. Typically the anomalous items translate to some kind of problem, such as bank fraud, a structural defect, a medical problem, or errors in text. Methods for anomaly detection range from detecting outliers within a given dataset to determining which observations do not fit into a dataset, either structurally within a sequence or probabilistically over the data.

At Georgia Tech, there has already been work in this space: several researchers participated in the DARPA ADAMS (Anomaly Detection at Massive Scale) effort, and there has been much work on activity recognition and anomaly detection for a variety of healthcare and surveillance applications. A recent workshop organized by Dana Randall and David Bader for IDEAS identified several core research questions in this area, along with domains in which researchers can work together to seek new insights from real-world data.
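
To make the simplest case above concrete (detecting outliers within a given dataset), here is a minimal sketch that flags values far from the median, measured in units of the median absolute deviation (MAD). The function name, the sample transaction amounts, and the cutoff of 3.5 are illustrative assumptions; this is one common heuristic, not the method used in the projects mentioned above.

```python
from statistics import median


def find_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, which is less
    distorted by the outliers themselves than a mean/stdev z-score.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half the values are identical; the score is undefined.
        return []
    return [
        (i, v)
        for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]


# Hypothetical card transactions; the last one is far from the usual range.
amounts = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 18.0, 250.0]
outliers = find_outliers(amounts)  # -> [(7, 250.0)]
```

A rule like this only covers point outliers in a single variable; structural anomalies within a sequence or probabilistic anomalies over the data need models of the sequence or the distribution.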

Affiliations: ML@GT with IC, CSE, BME, ISyE, and IPAT

While amazing strides have been made recently in ML techniques applied to all types of data, one thing remains important: most data is generated by people or for people. This suggests a need for interactive machine learning systems that can take input from people, sometimes in the form of specific annotations and labels for classification tasks (e.g., labeling an object as ‘car’) and sometimes as direct feedback and reinforcement for any kind of recognition task. Such interactive machine learning systems become essential as sensing and computing technologies become ubiquitous in our daily lives (e.g., Fitbit), and personalization of technologies becomes crucial as we go about our lives (e.g., daily activity and dietary patterns). Furthermore, the growth of crowdsourcing technologies (e.g., Amazon’s Mechanical Turk) and human computation introduces new avenues for interactive systems. Systems where users, and people in general, can provide strong (structured, supervised, labeled) input will allow learning systems to refine their inferences actively and online, significantly boosting learning rates and performance outcomes.

At Georgia Tech, there is significant research in Interactive Machine Learning, applied to healthcare domains and to developing autonomous or robotic systems. Interactive techniques are also being leveraged for data mining, large-scale data visualization, and sensemaking.

Affiliations: ML@GT with IPAT, BME, GVU, CBI, CCH, and EI2

Medical and healthcare data analysis is an important focus in IDEAS, and data analytics holds great promise in the medical arena, including modeling of genomic data, healthcare records, models of disease progression, medication and outcomes research, and, most recently, patient behavior data.

At Georgia Tech, there is an active group of researchers, with IPaT leading a relationship with Children’s Hospital of Atlanta, and IBB and BME leading the charge on all aspects of biomedical research with Emory and other local and national partners. ISyE is also an active player in research related to data and healthcare. Machine Learning forms the core of data analytics, and we will work with the appropriate teams across GT to identify strategic partners and to provide foundational technical support for a long-term and sustainable effort in this area. ML will serve as an effective bridge between all Georgia Tech constituents in this area and will provide a strong connection to an industry hungry for meaningful data analytics.

Affiliations: ML@GT with IC, IAC/Econ, IPAT, C21U, CogC

As in the case of healthcare data, education-related data is becoming rich, detailed, and lifelong. One can track an individual’s performance over a lifetime at the level of individual courses and tests. This allows a very detailed analysis of a student’s performance - from early years to lifelong learning efforts - with the analytics providing insights into improving individual performance. A recent study at Georgia State University showcases the value of using data analytics on student records to support timely advising interventions that improve educational outcomes. Such data analytics can be used for assessing students’ performance, developing personalized tracks for students, and delivering much-needed personalized advising.

At Georgia Tech, the analysis of education data is core to our mission, and we already have significant in-house data, with the possible availability of auxiliary data. Under the leadership of C21U, working with the School of Economics (IAC), there are opportunities to work with the Georgia Department of Education to undertake data collection and data analytics at an unprecedented level. Additionally, IPAT is engaging with ACT (American College Test) on both data analytics and student assessment techniques. This will allow access both to large amounts of data and to the assessment criteria used in large-scale testing. IPAT, in collaboration with the Cognitive Computing effort (CogC), has also established connections with IBM to leverage Watson / Cognitive Computing to model and support students’ interactions in both on-campus and online courses. All of this suggests that as online platforms for courseware, assessment, and content (e-textbooks) become pervasive, we have an opportunity for research that integrates newer data sources and is enhanced by our foundational technical thrusts applied to this new form of data.

Affiliations: ML@GT with IPAT, GVU, ISyE, C4NGL, SCL, Physical Internet Center

In manufacturing, logistics, supply chain management, and broader industrial applications, the use of data plays a crucial role. This holds true in particular for the cyber-physical systems context that has recently received prominent attention within the Industrial Internet, Industry 4.0, and Internet of Things (IoT) frameworks. All of these application domains rely heavily on machine learning at various levels of sophistication. For example, Heidelberg’s printing presses automatically learn an optimal mix of colors and other operational parameters as a function of environmental conditions, such as temperature and humidity, by scanning the printed media in real time. In logistics and supply chain applications we often face supply-and-demand-matching questions, where optimal pricing and distribution strategies have to be learned “on the go” while minimizing the so-called “regret”, which captures the cost of suboptimal decisions. Many of the challenging and diverse questions we face today require very sophisticated and fast machine learning methodologies. Such questions include learning customer ordering behavior in order to anticipate order flow, a problem faced by companies such as Amazon and Macy's, as well as autonomous AGV routing at the largest scale, as in the Tuas port in Singapore.

At Georgia Tech, there is a large number of uniquely qualified researchers in these application areas. Combined with the various centers that channel these activities, such as the Manufacturing IRI, the Supply Chain and Logistics Center, and the Center on Next Generation Logistics, Georgia Tech will be at the forefront of translating modern machine learning methodologies into this industry space.
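
The notion of “regret” mentioned above can be made concrete with a short sketch. Under the usual textbook definition, the regret of a sequence of decisions is the gap between the expected reward of always picking the best option and the expected reward of the options actually picked. The pricing options, their expected profits, and the function name below are hypothetical; in a real setting the expected rewards are unknown and must be estimated while deciding.

```python
def cumulative_regret(expected_rewards, chosen_options):
    """Return the running regret after each decision.

    expected_rewards[i] is the (assumed known) expected reward of option i;
    chosen_options is the sequence of option indices actually selected.
    """
    best = max(expected_rewards)
    total = 0.0
    history = []
    for option in chosen_options:
        total += best - expected_rewards[option]  # cost of this decision
        history.append(total)
    return history


# Hypothetical expected profit per sale for three candidate price points.
expected_profit = [1.0, 1.5, 1.2]
# A learner that tries options 0 and 2 once, then settles on the best one.
choices = [0, 2, 1, 1, 1]
regret = cumulative_regret(expected_profit, choices)
```

Here the running regret stops growing once the learner settles on the best option; a good online learning method keeps this curve growing slowly as the number of decisions increases.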

Affiliations: ML@GT with IRIM, GTMI, GTRI, Physical Internet Center

A large amount of the data currently available is collected from sensors around us. These may be mobile sensors, providing the locations and activities of people; road sensors, determining traffic patterns; or sensors located on the space station, capturing all types of data. These new data sources are empowering an even bigger push towards building autonomous systems. Self-driving cars, smart cities, and automated industrial plants are exciting examples. We are witnessing a huge investment in self-driving cars. Smart factories are also in sight, with continuing growth in manufacturing and automation. Smart city infrastructures, while still far away, are garnering more attention as cities need to adapt dynamically for day-to-day operations or respond to crises. In all of these instances, and in many others, the data requirements are far beyond what has been envisioned or planned for, and more of this data is coming. A single autonomous car senses and captures a large amount of data while leveraging the cloud to access additional information from previously captured and processed data (such as street maps and locations). ML is a key ingredient in this data analysis, as it learns from real physical data and supports direct decision making for an autonomous system.

At Georgia Tech, we already have a strong presence in autonomous systems and robotics with the IRIM IRI. We are also a leader in advanced manufacturing, and GT is well positioned with the Manufacturing IRI (GTMI). ML builds a bridge from these IRIs to the IDEAS effort, as Robotics and Intelligent and Autonomous Machines, as well as various new manufacturing paradigms (e.g., smart factories, Industry 4.0, Industrial Internet), have a deeply rooted connection to inference, prediction, decision making, and categorization. Many faculty who are part of the IRIM effort work in both the foundations and applications of ML, and similarly many faculty who interact with GTMI use and advance ML methodology in the realm of engineering and manufacturing innovation. ML@GT would support and enhance the much-needed interaction between these efforts at Georgia Tech.

The College of Engineering at Georgia Tech is host to several high-profile research projects on the theory and practice of integrated sensing and processing. Sensors have become more ubiquitous as they have become less expensive and more reliable. As a result, systems in industrial monitoring, video tracking and surveillance, and remote sensing have become data rich; there is an order of magnitude more data being produced than could ever be processed. Next-generation devices will have built-in low-power computation at the sensor (in-situ processing); these devices directly cull low-dimensional features from the data as it is acquired, greatly reducing the amount of information that needs to be communicated downstream. ML plays a significant role in making these sensors adaptive and robust.

Affiliations: ML@GT with IPAT, CSE, GVU, IDEAS, Social Computing, Policy

The growth and pervasiveness of internet and mobile computing have changed how we communicate with each other. This has allowed for the growth of a new form of research, referred to as computational social science, where data analytics is employed to study social behaviors on the network. Arguably, this ability to capture and analyze data has propelled social science to be on par with the physical sciences. Many research questions are now being considered using these new “microscopes” into how people communicate and interact, including how groups are formed; how collaborations happen; how information is gathered and refined; how content is created, shared, distributed, and commented on; and how identity is manifested. All these fundamental questions are receiving considerable research attention and are already yielding fascinating results. At Georgia Tech, we have significant expertise in this area, with many faculty from computing, sciences, and humanities already engaged in this research and having organized several Atlanta-area workshops in collaboration with Emory. Additionally, Georgia Tech has established the area of Computational Journalism to study how news and information gathering is adapting to technological change. Machine Learning has a core role to play, as social computing data is truly dynamic data, changing rapidly and in every way possible. Being able to discern patterns and predict trends is a challenging research problem. How crowds work to produce content and information remains an important challenge for classification and learning problems. Furthermore, analytics of text, audio, and visual data on social media is heavily dependent on ML. Effectively, social computing data provide new avenues for applying data analytics to understand large-scale social phenomena and are a valuable new thread for ML and IDEAS research.

Affiliations: ML@GT with IC, SCS, SCB, IISP, IDEAS

Information Security and Privacy (ISP) addresses a primary concern in developing systems aimed at data access, analytics, and distribution. Georgia Tech’s newly formed IRI in this area, the Institute for Information Security & Privacy (IISP), is already a leader in research, development, and outreach. IISP will be a valuable partner in all IDEAS efforts as we seek to provide the correct level of protection to data crossing the domains of healthcare, education, logistics, social computing, etc. Machine Learning techniques are, in turn, a valuable tool to support research in Information Security and Privacy. For example, ML approaches are widely used in developing encryption and data hiding technologies. Spamming technologies, which have gone far beyond email to pervade the internet with malicious bots and trolls, require dynamic data analytics to keep ahead of the spread of misinformation and identity theft. Anomaly detection is a crucial element in detecting and thwarting intrusions into a network and in insider threat detection. Interactive ML is needed as we move toward more personalized information that requires specific interactions to protect our data. At Georgia Tech, there are many ML faculty who are interested in working in the area of Information Security and Privacy. Working with IISP, our goal is to engage the leading experts in ISP and ML to develop new tools for supporting ISP with real-time data analytics. Initial opportunities lie in developing ML technologies that support Security Information and Event Management (SIEM) tools, which provide real-time analysis of security alerts. Furthermore, Identity and Access Management (IAM) systems will also benefit from ML, as data analytics can be leveraged to supply identity context, with attributes such as role, entitlements, and organizational structure, to enhance the information necessary to determine risk. Finally, our ongoing work in behavior modeling and anomaly detection from data can provide much-needed support for creating secure applications and services. Overall, at Georgia Tech we expect the ML effort to build a bridge between the IDEAS and IISP IRIs.

Affiliations: ML@GT with SCB, ISyE, Math, CoC, IDEAS, QCF

Financial markets and trading have changed dramatically over the last few years with the advent of algorithmic and high-frequency trading. Billions of stocks, bonds, derivatives, and other financial securities are bought and sold in the financial markets every day, with some of the securities held for only a few milliseconds. Apart from trading, the explosion in the availability of structured and unstructured data has opened up many applications for machine learning in the finance domain. Applications include analyzing and understanding regulatory filings, social media, and the speech of executives. Machine Learning techniques have broad applicability in finance, including the development of trading strategies, fintech, fraud detection, compliance, credit risk, and risk management.

At Georgia Tech, we have significant expertise in financial markets research. The finance group at Scheller has 10 faculty members who, along with 10 PhD students, are dedicated to financial market research. The group has a history of producing impactful research, with numerous publications in the top finance journals and hundreds of presentations at top finance conferences. In addition, the finance group has organized a number of major conferences, including 11 annual international finance conferences, many of them with Nobel laureates as keynote speakers. In 2015 Scheller hosted a major finance conference with more than 300 scholars from academia, industry, and regulators. Apart from the finance group, there are many faculty members from other areas in the Scheller College of Business, the College of Engineering, and the College of Sciences who are engaged in research in financial markets. This is also reflected in these three colleges collaborating on the MS program in Quantitative and Computational Finance, which has graduated more than 600 students over the last fifteen years. Industry demand for finance students with Machine Learning skills remains strong. This thrust will focus on foundational research in this area, followed by applied efforts with corporate sponsors, while training experts in this field.