Fall Capstone Final Presentation Schedule and Information

AI Guide Dog: Predicting Egocentric Movement on a Smartphone

Abhishree Shetty, Aishwarya Jadhav, Jeffery Cao, Aditi Sharma, Ben Sukboontip, Jayant Sravan Tamarapalli, Jingyi Zhang, Urvashi Priyam Kumar

AI Guide Dog aims to democratize egocentric trajectory prediction on a smartphone for the blind community. We propose an end-to-end data pipeline for self-supervised egocentric trajectory prediction research consisting of: (1) a data collection module that ingests signals generally accessible through the default sensors and software stack that ship with popular smartphones; (2) a data annotation module, built on top of the collected data, that requires minimal human tuning; and (3) an egocentric modeling module that provides future directions to the user. We also demo an iOS application that uses the learned models to make real-time predictions. We will release our dataset and models post-publication.

Monday, December 5
TEP 1403
10:10-11:30
Presentation Recording

Komatsu: Intelligent Longwalls

Anveshrithaa Sundareswaran, Ge Huang, Zhiyi Li, Zhouyang Li

Komatsu is a leading manufacturer of mining equipment. Its intelligent longwall systems capture data from automated, connected machinery and turn it into knowledge that supports decisions about machine operation and maintenance. This project applies machine learning to build and deploy pipelines that identify correlations, patterns, and anomalies in longwall machine data, with the goal of improving the machines' operational efficiency and reducing energy consumption.

Monday, December 5
GHC 4405
1:00-1:40
Presentation Recording

SimBot - Navigation and Interaction

Adhokshaja Madhwaraj, Jessica Zhong, Kushagra Mahajan, Malaika Vijay, Sai Vishwas Padigi, Vineeth Reddy Vatti

Embodied task-completion agents are intelligent agents that can perceive, navigate, and manipulate objects in an environment. We developed a bot for the SimBot challenge in conversational embodied task completion, where users can converse with and guide the bot to complete tasks in a 3D environment. The Navigation and Interaction thrust of the SimBot project focuses on scene understanding and action planning to perform actions that satisfy a user’s instruction. Our primary contributions include an Image Segmentation model to identify objects of interest and an Action Planning module capable of generating a logical sequence of actions to execute on the simulator.

Tuesday, December 6
POS 153
8:35-9:15
Presentation Recording

SimBot Dataset Collection: An Embodied Vision-Audio-Navigation Task

Bharani Ujjaini Kempaiah, Linyi Li

Intelligent robotic agents that operate in human spaces must be able to understand and execute natural-language instructions conveyed via speech. Most existing vision-and-language benchmarks provide natural-language instructions as text, so models trained on these datasets also rely heavily on text as an intermediate medium for grounding. In this work, we extend the existing ALFRED benchmark by crowd-sourcing speech annotations for existing ALFRED demonstrations of household tasks. We build web interfaces to qualify crowd workers before collecting speech annotations online. With these speech annotations, we believe the research community can develop models that enable robots to achieve audio-visual grounding, similar to how humans interact in the real world.

Tuesday, December 6
POS 153
9:15-9:55
Presentation Recording

Learn-to-Race: A Multimodal Control Environment for Autonomous Racing

Kevin Chian, Arav Agarwal, Sidharth Kathpal, Yujun Qin, Tanay Gangey

Autonomous racing is a sub-field of autonomous driving that has been studied considerably less than urban driving. To further research in this area, we implement a set of extensions spanning reinforcement learning (RL), computer vision (CV), and robotics interfaces on top of the Learn-to-Race framework, which itself runs on a racing simulator. The project also has a software development component: building interfaces that connect to an actual vehicle using ROS. These research and development thrusts are crucial to designing safe and fast autonomous agents, because failures in real life are exceptionally costly. The safe policies learned by the autonomous racers may generalize to the real world, leading to safer and faster autonomous driving agents.

Tuesday, December 6
POS 153
11:50-1:10
Presentation Recording

EnvPool

Yukun Jiang, Leo Guo, Yufan Song, Ting Luo, Tianyi Sun, Peilin Rao

EnvPool is a C++-based batched environment pool built with pybind11 and a thread pool. It offers high performance (~1M raw FPS on Atari games and ~3M raw FPS on the MuJoCo simulator on a DGX-A100) and compatible APIs (it supports both gym and dm_env, both synchronous and asynchronous execution, and both single-player and multi-player environments).

Wednesday, December 7
TEP 1403
10:10-10:50
Presentation Recording Not Available

Multimodal Question Answering

Kunal Dhawan, Manoj Ghuhan Arivazhagan, Wenxing Deng

Multimodal Question Answering (MQA) is a rapidly growing area of research that aims to build intelligent systems that respond to user queries by reasoning over information from multiple modalities. Such systems try to emulate humans, who also rely on cross-modal reasoning to answer questions. Current MQA approaches suffer from several drawbacks: biased training datasets, the inability to answer simple counting questions, and a tendency to learn surface-level relationships rather than to reason. In this work, we aim to overcome these limitations and propose a new end-to-end MQA system. The major contributions of this work are: 1) curation of an MQA dataset that consists of a diverse set of question types capturing complex interactions and relationships between objects in the images and is designed to avoid inherent biases, 2) an improved feature extraction module that can handle and even generate scene graphs from input images, 3) an instance segmentation module that improves MQA performance on counting questions, and 4) an end-to-end trainable MQA pipeline that outperforms the current state of the art.

Wednesday, December 7
TEP 1403
10:50-11:30
Presentation Recording

ASML YieldStar: Particle / Defect Detection and Classification

Mahalakshumi Visvanathan, Wei-Chieh Chen, Yijia Zhang

YieldStar is an advanced wafer metrology technique provided by ASML that is used to verify the quality of produced wafers. Because YieldStar has complicated and expensive optics, even tiny defects can affect production. Currently, production technicians rely on their own knowledge to catch these defects, a process that takes over 500 hours per year. To improve production efficiency, we propose two techniques: a classification convolutional neural network (CNN) that detects whether a particle is present, and a convolutional autoencoder that localizes the particle and measures its intensity.

Wednesday, December 7
GHC 4405
1:00-1:40
Presentation Recording

Predicting Flatness Error with ASML Wafer Table

Tz-Ruei Liu, I-Tsun Cheng, Xinyan Xie

At ASML, the flatness error of wafer tables is computed manually using a fixed formula. Although exact, the current system takes more than 5 minutes to compute all spec maps for each sample, which is inefficient. In this work, we introduce a machine learning method that predicts spec maps with sufficiently small error while being substantially faster. We use U-Net and its variant SmaAt-UNet and show that both perform well on this task. Equipped with attention and depthwise-separable convolutions, SmaAt-UNet achieves less than 2% error across all spec maps while taking only 30 seconds to compute, 10x faster than the current system.

Wednesday, December 7
GHC 4405
1:40-2:20
Presentation Recording

Self-Driving Database Management System

Kushagra Singh, Lichen Jin, He-Wei Lee

Our project presents the control plane for self-driving database systems -- a framework that orchestrates database tuning operations and manages resources for production database clusters hosted in heterogeneous environments.

Wednesday, December 7
GHC 4405
2:20-3:00
Presentation Recording

Autobatch - Ragged Tensor's Shape Representation and Efficient Computation

Bowen Chen

In this report, we introduce a new shape representation for ragged tensors, a typical input workload in NLP models. We discuss the design of the Ragged Tensor Intermediate Representation (IR), the implementation of a Ragged Tensor API on top of Relax, a graph-level optimization based on the Ragged Tensor IR, and an auto-batching user interface enabled by the Ragged Tensor IR.

Wednesday, December 7
GHC 4405
3:00-3:40
Presentation Recording

Secure NLP Inference

Shreya Sharma

Because cloud computing offers elasticity and availability at affordable rates, a large number of machine learning workloads are migrating to the cloud. In this paradigm, however, sensitive data may be leaked to service providers that are curious or compromised. This project aims to make existing NLP models oblivious, i.e., to enable secure inference on a trained BERT model without revealing any information about the client's input data. Such a service could be used for audits, private contract reviews, and entity recognition.

Thursday, December 8
POS 153
8:35-9:15
Presentation Recording

Data Augmentation for Information Retrieval

Preksha Patel, Ramya Ramanathan, Riddhi Nisar, Sayani Kundu, Vivek Sourabh

In our capstone, we aim to build a data augmentation module that can be easily plugged into the pipelines of information retrieval frameworks. We also aim to fine-tune and improve the re-ranker in FlexNeuART using augmented data. We use the MS MARCO passage re-ranking dataset and implement three categories of data augmentation techniques: rule-based, model-based, and query reformulation. We evaluate our results using the mean reciprocal rank (MRR) metric.

Thursday, December 8
POS 153
9:15-9:55
Presentation Recording
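
For reference, the metric named in the Data Augmentation for Information Retrieval abstract can be sketched as follows: for each query, take 1/rank of the first relevant passage (0 if none appears within the cutoff), then average over all queries. The function name, the input format (a ranked list of passage IDs plus a set of relevant IDs per query), and the default cutoff of 10 are assumptions for illustration. MS MARCO passage ranking results are commonly reported as MRR@10, but this is a minimal sketch, not the team's evaluation code.

```python
from typing import Sequence, Set


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]],
    relevant: Sequence[Set[str]],
    cutoff: int = 10,
) -> float:
    """Average of 1/rank of the first relevant passage per query (0 if none in the top `cutoff`)."""
    if len(rankings) != len(relevant):
        raise ValueError("need one relevance set per ranked list")
    total = 0.0
    for ranked, gold in zip(rankings, relevant):
        # Ranks are 1-based; only the first relevant hit counts.
        for position, passage_id in enumerate(ranked[:cutoff], start=1):
            if passage_id in gold:
                total += 1.0 / position
                break
    return total / len(rankings) if rankings else 0.0


# Toy example with hypothetical passage IDs.
runs = [["p1", "p2"], ["p9", "p8", "p7"], ["p4", "p5"]]
qrels = [{"p1"}, {"p7"}, {"p0"}]
print(mean_reciprocal_rank(runs, qrels))
```

In the toy example, the first relevant passages appear at ranks 1 and 3 for the first two queries and not at all for the third, so MRR = (1 + 1/3 + 0) / 3 = 4/9.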

SimBot Dialog and Language Generation

Shubham Phal, Nikhil Gupta, Prasoon Varshney, Shubham Virmani, Benny Jiang, Xinyue Chen

Our work on the SimBot Challenge asks a basic question: how does language change when it is situated? How do objects in the environment and the behaviors of people around us inform how utterances are interpreted? Our project uses advanced NLP techniques to model a two-way, free-form dialogue between a commander and a multimodal embodied agent that performs a wide range of complex tasks in a simulated environment.

Thursday, December 8
POS 153
11:50-1:10
Presentation Recording

Programmable Storage

Jiuzhi Yu, Sumanth Rao

In the big data era, traditional CPU-centric processing is constrained by the large volume of data transferred between the storage system and memory. Meanwhile, advances in computational storage devices make it possible to offload computation to the storage device, which can speed up applications that process large amounts of data. In this project, we explore offloading RocksDB compaction operations and demonstrate performance gains both for the compaction operation itself and for the whole system.

Monday, December 12
GHC 8102
10:00-10:40
Presentation Recording

Automated Sensemaking for Online Tasks

Shuyu Jiang, Ye Rin Han

Skeema is a Google Chrome extension that lets users manually organize their open tabs and nest them under user-defined tasks. We aim to enable Skeema to automatically categorize users' open tabs into tasks. We generate multiple features from each tab's URL and title and build a multi-layer feed-forward neural network to experiment with different feature combinations. Using the distance matrix generated from the classifier, we cluster tabs with agglomerative clustering. The final clustering model achieves 0.8389 accuracy, which suggests that our feature engineering and model architecture are promising.

Monday, December 12
GHC 8102
10:40-11:20
Presentation Recording
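
The clustering step in the Automated Sensemaking abstract (grouping tabs from a precomputed distance matrix) can be illustrated with a small agglomerative clustering sketch. The abstract does not say which linkage criterion or stopping rule was used; average linkage and a fixed distance threshold are assumptions here, and the function name and toy distances are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def agglomerative_clusters(dist: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Average-linkage agglomerative clustering over a precomputed symmetric distance matrix.

    Starts with one cluster per tab and repeatedly merges the closest pair of
    clusters until no pair is closer than `threshold` (assumed stopping rule).
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Average pairwise distance between members of the two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best >= threshold:
            break
        # i < j, so deleting j after extending i keeps index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical distances: tabs 0 and 1 are similar, tabs 2 and 3 are similar.
d = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
print(agglomerative_clusters(d, threshold=0.5))
```

The threshold controls how many task groups emerge: a small value leaves every tab on its own, while a large one merges everything into a single task.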

Intelligent Text-Conditioned Music Generation

Zhouyao Xie, Nikhil Yadala, Xinyi Chen, Jingxi Liu

Despite recent advances in neural generative models and multimodal machine learning, conditional music generation remains a niche and largely under-explored research area. Inspired by CLIP, which learns to align the image and text modalities through contrastive learning, we propose MusicCLIP, a text-conditioned music generation model with an encoder-decoder architecture. In the encoder, a Transformer-based music encoder and a text encoder are aligned with each other through contrastive learning. In the decoder, a music decoder generates symbolic music from latent embeddings using nucleus sampling.

Monday, December 12
GHC 8102
11:20-12:00
Presentation Recording
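
Nucleus (top-p) sampling, used by the MusicCLIP decoder, keeps only the smallest set of highest-probability tokens whose cumulative probability reaches p, then samples from that set after renormalizing. The sketch below shows one decoding step over a dictionary of token probabilities; the symbolic music tokens ("C4", "REST", and so on) and the function name are hypothetical and do not reflect MusicCLIP's actual vocabulary or code.

```python
import random
from typing import Dict, Optional


def nucleus_sample(probs: Dict[str, float], top_p: float = 0.9,
                   rng: Optional[random.Random] = None) -> str:
    """Sample a token from the smallest set of tokens whose cumulative probability >= top_p."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    rng = rng or random.Random()
    # Sort tokens from most to least likely and grow the nucleus until it covers top_p.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus = []
    cumulative = 0.0
    for token, p in ranked:
        nucleus.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    tokens = [t for t, _ in nucleus]
    weights = [p for _, p in nucleus]  # random.choices treats weights as relative, i.e. renormalizes
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution over symbolic music tokens.
probs = {"C4": 0.5, "E4": 0.3, "G4": 0.15, "REST": 0.05}
rng = random.Random(0)
print([nucleus_sample(probs, top_p=0.75, rng=rng) for _ in range(5)])
```

With top_p = 0.75, the nucleus is {"C4", "E4"} (0.5 + 0.3 ≥ 0.75), so the low-probability tokens "G4" and "REST" are never sampled at this step.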

Accelerated Cloud for Artificial Intelligence - Systems

Hao Yang Lu, Eeshwar Gurushankar Prasad, Shantanu Kamath, Chenda Zhang

ACAI helps ML practitioners focus on model development by providing utilities that reduce the development and deployment overheads that come with cloud infrastructure. We have implemented three novel features into the existing ACAI framework.

Monday, December 12
GHC 8102
1:00-1:40
Presentation Recording

Stargate: Towards DynamoDB Compatibility for Cassandra

Boxuan Li, Xiang Yue, Ziyan Zhang

We aim to develop a middleware that can bring DynamoDB API compatibility to Apache Cassandra so that DynamoDB applications can switch to Apache Cassandra seamlessly.

Monday, December 12
GHC 8102
1:40-2:20
Presentation Recording

End-to-end Model Probing

Yunxuan Xiao, Ashley Wu, Su Park, Divija Nagaraju

We introduce a theoretical and empirical framework for end-to-end model probing, which tests models for their ability to capture certain linguistic phenomena. The probing results also reveal potential correlations between complex semantic tasks and low-level tasks.

Monday, December 12
GHC 8102
2:20-3:00
Presentation Recording

Data Pipeline Wind Tunnel

Shicheng Huang

Data Pipeline Wind Tunnel is an end-to-end benchmarking system that allows stakeholders to evaluate pipeline performance under different configurations and make iterative modifications based on the results.

Monday, December 12
GHC 8102
3:00-3:40
Presentation Recording

Accelerated Cloud for Artificial Intelligence - AutoML

Jiyu Hu, Ruben Mampilli, Ke Sun

ACAI AutoML automates machine learning tuning: it runs machine learning experiments based on a user-specified pipeline and outputs an optimal configuration for the problem. It achieves fully automatic pipeline tuning by combining model selection, data sub-sampling, and hyperparameter tuning. In this project, we add new pipeline execution features to the existing ACAI AutoML implementation, improving both the quality of the output model and the execution time. A comparison with auto-sklearn, a state-of-the-art AutoML platform, shows that ACAI AutoML achieves similar, if not better, pipeline tuning performance and supports a much larger variety of pipelines.

Monday, December 12
GHC 8102
3:40-4:20
Presentation Recording

AI Presentation Coach

Huiyi Zhang, Venny Ayudiani

Presentations are one of the most common ways to convey ideas and share information, and presentation skills can be improved through training and evaluation. AI Presentation Coach is an end-to-end AI-based system that automatically evaluates the effectiveness of a presentation. It extracts features from each constituent of the presentation, uses multimodal machine learning to evaluate the presenter's performance across different aspects of presenting, and provides actionable feedback that helps presenters improve their skills.

Monday, December 12
GHC 8102
4:20-5:00
Presentation Recording

Understanding of Molecular and Evolutionary Mechanisms for Protein Temperature Adaptation

Yuting Deng

Billions of years of evolution have produced millions of diverse species. Adaptation is driven by changes in the molecules whose functions give rise to life. As a driving force of evolution, temperature shapes how protein stability and activity adapt. To demystify the molecular mechanisms underlying protein temperature adaptation, this project investigates protein thermostability through computational methods. Because a well-annotated functional database of proteins is lacking, we curated a large-scale, well-annotated functional database across archaea and bacteria that includes ~9947 ortholog groups annotated with 40 critical function labels under rigorous quality control. In the future, this dataset could be of great value in dissecting the biophysical forces that drive protein evolution, by identifying temperature-associated residues and trends in their properties and interactions.

Tuesday, December 13
GHC 4405
12:00-12:40
Presentation Recording

Information Retrieval: NERQ (Named Entity Recognition for Queries)

Dhruv Arya, Nidhi Dhar, Sarthak Tandon

Named Entity Recognition on web search queries (NERQ) has been noted in the literature as a challenging problem. Limited context, irregular grammar, and a lack of proper casing in queries make it difficult for off-the-shelf NER models to perform well. This paper introduces a new, challenging supervised dataset for NERQ created from the MS MARCO dataset and reports experiments that improve the performance of current state-of-the-art models on this dataset. These models show significant performance gains when fine-tuned with domain-transferred data.

Tuesday, December 13
GHC 4405
12:40-1:20
Presentation Recording

Exploring Trust and Strategies in Agent-Human teaming

Annie Johnson

This work evaluates trust in agent-human teaming in the gaming domain. The capstone project combines a Human-Computer Interaction (HCI) component with an analytic component. We provide a definition of “trust” relevant to the Overcooked game and perform a study in which volunteers play the game and answer survey questions about their experience. The game-play videos recorded during the study were used to train an object recognition model to test the robustness of the video. Self-Organizing Maps were used to cluster the trajectories obtained from the game to identify the factors that influence trust.

Tuesday, December 13
GHC 4405
1:20-2:00
Presentation Recording

AI2F: Artificial Intelligence Integrated Fires

Simon Knapp, Eric Youn, Rebecca Wilson

The current military planning process is slow and deliberate and requires days or even weeks of work by dedicated staff personnel. Our solution addresses this problem for artillery planning by applying reinforcement learning to develop agents that can rapidly generate courses of action and wargame various scenarios. More broadly, the simulation environment we are developing alongside the US Army Engineer Research and Development Center (ERDC) gives researchers a valuable tool for rapidly developing agents for other aspects of military planning. Our experiments explore training methods for agents in this framework and how agents learn to apply indirect fires and select ammunition in it.

Tuesday, December 13
GHC 4405
2:00-2:40
Presentation Recording

Nudge

Hermes Suen

Nudge is a project that aims to create a platform for individuals, institutions, and organizations to invest in real-life behavior change. It builds on the ideas of memetic desire and social contagion and provides monetary rewards to creators of viral TikToks related to the desired behavior change.

Tuesday, December 13
GHC 4405
2:40-3:20
Presentation Recording