Upcoming Talks


Monday, November 12, 2018
Jiaqi Su, Elena Balashova

Abstract
Jiaqi Title: Perceptually-motivated environment-specific speech enhancement Abstract: We introduce a data-driven method to enhance speech recordings made in a specific environment. The method handles denoising, dereverberation, and equalization matching due to recording non-linearities in a unified framework. It relies on a new perceptual loss function that combines adversarial loss with spectrogram features. We show that the method offers an improvement over state-of-the-art baseline methods in both subjective and objective evaluations.

Elena Title: Structure Aware Shape Synthesis Abstract: We propose a new procedure to guide training of a data-driven shape generative model using a structure-aware loss function. Complex 3D shapes can often be summarized using a coarsely defined structure which is consistent and robust across a variety of observations. However, existing synthesis techniques do not account for structure during training, and thus often generate implausible and structurally unrealistic shapes. During training, we enforce structural constraints to ensure consistency and structure across the entire manifold. We propose a novel methodology for training 3D generative models that incorporates structural information into an end-to-end training pipeline.


Monday, November 19, 2018
Nora Willett


Monday, November 26, 2018
Dawei Yang


Monday, December 03, 2018
Zachary Teed


Monday, December 10, 2018
Kaiyu Yang


Previous Talks


Monday, September 17, 2018
None

Abstract
We will assign future talks and briefly discuss what everyone has worked on over the summer.


Monday, September 24, 2018
Attentive Human Action Recognition
Minh Hoai Nguyen, Stony Brook University

Abstract
Enabling computers to recognize human actions in video has the potential to revolutionize many areas that benefit society such as clinical diagnosis, human-computer interaction, and social robotics. Human action recognition, however, is tremendously challenging for computers due to the subtlety of human actions and the complexity of video data. Critical to the success of any human action recognition algorithm is its ability to attend to the relevant information during both training and prediction times.

In the first part of this talk, I will describe a novel approach for training human action classifiers, one that can explicitly factorize human actions from the co-occurring context. Our approach utilizes conjugate samples, which are video clips that are contextually similar to human action samples, but do not contain the actions. Our approach enables the classifier to attend to the relevant information and improve its performance in recognizing human actions under varying context.

In the second part of this talk, I will describe a method for early recognition of human actions, one that can take advantage of multiple cameras. To account for the limited communication bandwidth and processing power, we will learn a camera selection policy so that the system can attend to the most relevant information at each time step. This problem is formulated as a sequential decision process, and the attention policy is learned using reinforcement learning. Experiments on several datasets demonstrate the effectiveness of this approach for early recognition of human actions.

Bio
Minh Hoai Nguyen is an Assistant Professor of Computer Science at Stony Brook University. He received a Bachelor of Software Engineering from the University of New South Wales in 2006 and a Ph.D. in Robotics from Carnegie Mellon University in 2012. His research interests are in computer vision and machine learning. In 2012, Nguyen and his coauthor received the Best Student Paper Award at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).


Monday, October 01, 2018
Accelerating Neural Networks using Box Filters
Linguang Zhang

Abstract
(This is a project at an early stage, and we sincerely solicit feedback.) In a neural network, a large receptive field is typically achieved through a stack of small filters (e.g., 3x3), pooling layers, or dilated convolution. These methods all suffer from a drastic increase in computational cost or memory footprint when the network architecture needs to be adjusted for a larger receptive field. In this project, we explore the possibility of allowing a convolution layer to have an arbitrarily large receptive field at a constant cost, using box filters. The intuition is that any convolution kernel can be approximated using multiple box filters. The result of convolving a box filter with an image can be computed easily using the summed area table, with a running time that does not depend on the size of the filter. This method could potentially be useful for vision applications that require large receptive fields but cannot afford a high computational cost.
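
The summed area table idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the speaker's implementation; the function names (summed_area_table, box_sum, box_filter) and the choice to clip the window at image borders are assumptions made for illustration. It only shows why the cost per output pixel is four table lookups regardless of the box size.

```python
def summed_area_table(image):
    """Return an (H+1) x (W+1) table where sat[y][x] is the sum of
    all pixels in rows 0..y-1 and columns 0..x-1 of the image."""
    height, width = len(image), len(image[0])
    sat = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            # Sum above this row plus the running sum of the current row.
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_sum(sat, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1.
    Always four lookups, whatever the size of the box."""
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


def box_filter(image, radius):
    """Unnormalized box filter of half-width `radius`.
    Assumption: the window is clipped at the image borders."""
    height, width = len(image), len(image[0])
    sat = summed_area_table(image)
    output = []
    for y in range(height):
        top, bottom = max(0, y - radius), min(height, y + radius + 1)
        row = []
        for x in range(width):
            left, right = max(0, x - radius), min(width, x + radius + 1)
            row.append(box_sum(sat, top, left, bottom, right))
        output.append(row)
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
filtered = box_filter(image, radius=1)
```

Building the table takes one pass over the image; after that, enlarging the radius changes only which four entries are read, not how many.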


Monday, October 08, 2018
Weifeng Chen


Monday, October 15, 2018
Yuting Yang


Monday, October 22, 2018
Aishwarya Agrawal, Georgia Tech


Monday, November 05, 2018
Unifying Regression and Classification for Human Pose Estimation
Fangyin Wei

Abstract
State-of-the-art human pose estimation methods are based on the heat map representation. Despite its good performance, the representation has a few inherent issues, such as non-differentiable post-processing and quantization error. The work to be presented in this talk shows that a simple integral operation relates and unifies the heat map representation and joint regression, thus avoiding the above issues. It is differentiable, efficient, and compatible with any heat-map-based method. Its effectiveness is convincingly validated via comprehensive ablation experiments under various settings, specifically on 3D pose estimation, for the first time. This method was used by the top two teams of the COCO 2018 Keypoint Detection Challenge. If time permits, another work on learning disentangled representation will also be presented.
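
As a rough illustration of how an integral operation can turn a heat map into a continuous joint coordinate, the sketch below takes the expected pixel location under a normalized heat map and compares it with the usual argmax. The abstract does not specify the normalization; the softmax used here, and the function names integral_joint_location and argmax_joint_location, are assumptions made for this example only.

```python
import math


def integral_joint_location(heatmap):
    """Expected (x, y) location under a softmax-normalized heat map.
    Assumption: softmax normalization; the result is continuous and
    differentiable with respect to the heat map values."""
    peak = max(max(row) for row in heatmap)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [[math.exp(value - peak) for value in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


def argmax_joint_location(heatmap):
    """Conventional post-processing: the integer pixel with the largest
    response. Non-differentiable and limited to the pixel grid."""
    best_value, best_xy = None, None
    for r, row in enumerate(heatmap):
        for col, value in enumerate(row):
            if best_value is None or value > best_value:
                best_value, best_xy = value, (col, r)
    return best_xy


# A joint lying halfway between two pixels produces two equal peaks.
heatmap = [[0.0, 0.0, 0.0, 0.0],
           [0.0, 5.0, 5.0, 0.0],
           [0.0, 0.0, 0.0, 0.0]]
x, y = integral_joint_location(heatmap)
grid_x, grid_y = argmax_joint_location(heatmap)
```

In this example the argmax has to pick one of the two peak pixels, while the expectation lands between them, which is the quantization issue the abstract refers to.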