The PIXL lunch meets every Monday during the semester at noon in
room 402 of the Computer Science building. To get on the mailing
list to receive announcements, sign up for the "pixl-talks" list at
Monday, September 23, 2019
Hyperparameter Optimization in Black-box Image Processing using Differentiable Proxies
Ethan Tseng and Felix Yu
Today's cameras rely on proprietary black-box image processing units with manually tuned parameters. This work presents a fully automatic approach to optimizing these black-box systems using stochastic first-order optimization.
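To make the idea concrete, here is a toy sketch of proxy-based tuning, not the paper's method: the real work trains a CNN proxy of a hardware image pipeline, whereas this example fits an exact quadratic proxy to a one-parameter black-box quality score and runs first-order gradient ascent through it. The function `black_box_quality` and all constants are invented for illustration.

```python
def black_box_quality(theta):
    """Stand-in for a non-differentiable image pipeline + quality score."""
    return 1.0 - (theta - 0.7) ** 2

# 1. Probe the black box at a few parameter settings.
samples = [(t, black_box_quality(t)) for t in (0.0, 0.5, 1.0)]

# 2. Fit a differentiable proxy q(theta) = a*theta^2 + b*theta + c
#    through the three samples (Lagrange form gives a and b exactly).
(t0, q0), (t1, q1), (t2, q2) = samples
a = (q0 / ((t0 - t1) * (t0 - t2)) + q1 / ((t1 - t0) * (t1 - t2))
     + q2 / ((t2 - t0) * (t2 - t1)))
b = (-q0 * (t1 + t2) / ((t0 - t1) * (t0 - t2))
     - q1 * (t0 + t2) / ((t1 - t0) * (t1 - t2))
     - q2 * (t0 + t1) / ((t2 - t0) * (t2 - t1)))

def proxy_grad(theta):
    """Gradient of the proxy -- available even though the black box is opaque."""
    return 2 * a * theta + b

# 3. First-order optimization (gradient ascent) through the proxy.
theta, lr = 0.0, 0.1
for _ in range(200):
    theta += lr * proxy_grad(theta)

print(round(theta, 3))  # converges to the true optimum 0.7
```

Because the toy black box is itself quadratic, the proxy is exact here; in the real setting the proxy only approximates the pipeline, so tuning alternates between refitting the proxy and optimizing through it.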
Monday, September 30, 2019
Association and Imagination
When you watch the sunset from the banks of the Seine in Paris, you can trivially imagine it along the Monongahela in Pittsburgh. When you hear something, you can easily imagine how someone else would have said it. When you think of an event from the past, you can relive every bit of it in your imagination. Humans have a remarkable ability to associate different concepts and create visual worlds far beyond what a human eye could see, including inferring the state of the unobserved, imagining the unknown, and entertaining diverse possibilities for what lies in the future. These powers require minimal instruction and rely primarily on observation of, and interaction with, a dynamic environment. Simple tasks from daily life that are trivial for humans to imagine remain challenging for machine perception and artificial intelligence. The inability to associate, and the lack of any sense of imagination, substantially restricts the applicability of machines.
In this talk, I will demonstrate how thinking about association at various levels of abstraction can lead to machine imagination. I will present algorithms that enable association between different domains in an unsupervised manner. This ability to associate allows the automatic creation of audio and visual content (images, videos, 4D space-time visualizations of dynamic events) that is also user-controllable and interactive. I will show diverse user applications in audio-visual data retargeting, reconstruction, synthesis, and manipulation. These applications are first steps towards building machines with a powerful audio-visual simulator that will enable them to imagine complex hypothetical situations and model the aspects of their surroundings that are not easily perceived.
Aayush Bansal is a PhD candidate at the Robotics Institute of Carnegie Mellon University. He is a recipient of the Uber Presidential Fellowship (2016-17), the Qualcomm Fellowship (2017-18), and the Snap Fellowship (2019-20). Production houses such as BBC Studios and PBS use his research to create documentaries and short films. National and international media outlets, including NBC, CBS, France TV, and The Journalist, have covered his work extensively.
Monday, October 07, 2019
Multi-Robot Learning for Trash Collection
In this project, we are investigating how we can use reinforcement learning methods to train a team of robots to pick up trash. We run experiments using the Anki Vector as our physical robot platform. We show results of training DQN models in simulation for a variety of environments, such as single-agent, homogeneous multi-agent, and heterogeneous multi-agent. We also present preliminary results showing our models running on the real robots.
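As a rough illustration of the reinforcement-learning setup (not the project's code), the sketch below swaps the DQN for tabular Q-learning on an invented toy environment: a single agent on a 1-D strip of cells, rewarded for reaching the cell containing trash. The environment, rewards, and hyperparameters are all assumptions made for the example.

```python
import random

random.seed(0)

N = 5                 # strip of cells 0..4; trash fixed at cell 4
ACTIONS = (-1, +1)    # move left / move right
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.3

def step(s, a):
    """Move on the strip (walls clamp); reward 1 on reaching the trash."""
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == N - 1 else 0.0)

for episode in range(500):
    s = 0
    while s != N - 1:
        # epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        # standard Q-learning update
        best_next = max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The greedy policy now heads toward the trash from every cell.
policy = [max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(N - 1)]
print(policy)  # [1, 1, 1, 1] -- always move right, toward the trash
```

A DQN replaces the table `Q` with a neural network over observations, which is what makes the approach scale to the Anki Vector's camera inputs and to multi-agent variants.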
Monday, October 14, 2019
Towards a Non-Photorealistic Reality
Artist Alexa Meade paints on the human body and three-dimensional
spaces, creating the illusion that our reality is a two-dimensional
painting. She will discuss her art and potential connections to
non-photorealistic rendering, a branch of computer graphics research
that pursues deliberate abstraction and stylization, as well as a
recent Artist-in-Residence project at Google.
Alexa Meade's groundbreaking work has been exhibited around the world
at the Grand Palais in Paris, the United Nations in New York, the
Smithsonian National Portrait Gallery in Washington, DC, and Shibuya
Crossing in Tokyo. Her solo show on Rodeo Drive in Beverly Hills was
attended by forty thousand people. She has created large-scale
interactive installations at Coachella, Cannes Lions, and Art Basel.
Alexa has been commissioned by BMW, Sony, Adidas, and the San
Francisco Symphony Orchestra. She painted pop star Ariana Grande for
her iconic “God is a Woman” music video, which has about 250 million
views. Alexa has lectured at TED, Apple, and Stanford and accepted an
invitation to the White House under President Obama. She has been
honored with the "Disruptive Innovation Award" by the Tribeca Film
Festival and has been Artist-in-Residence at both Google and the
Perimeter Institute for Theoretical Physics. InStyle has named Alexa
among their "Badass Women."
Monday, October 21, 2019
Learning to Learn More with Less
Understanding how humans and machines learn from few examples remains a fundamental challenge. Humans are remarkably able to grasp a new concept from just a few examples, or learn a new skill from just a few trials. By contrast, state-of-the-art machine learning techniques typically require thousands of training examples and often break down if the training sample set is too small.
In this talk, I will discuss our efforts towards endowing visual learning systems with few-shot learning ability. Our key insight is that the visual world is well structured and highly predictable not only in feature spaces but also in under-explored model and data spaces. Such structures and regularities enable the systems to learn how to learn new tasks rapidly by reusing previous experiences. I will focus on a few topics to demonstrate how to leverage this idea of learning to learn, or meta-learning, to address a broad range of few-shot learning tasks: meta-learning in model space and task-oriented generative modeling. I will also discuss some ongoing work towards building machines that are able to operate in highly dynamic and open environments, making intelligent and independent decisions based on insufficient information.
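One concrete instance of reusing structure across tasks is a prototype-based few-shot classifier: a query is assigned to the class whose few support examples have the nearest mean. The sketch below is a minimal stand-in, not the speaker's method; the 2-D points and class names are made up, and a real system would use learned deep features in place of raw coordinates.

```python
def prototype(points):
    """Mean of a class's few 2-D support examples -- its 'prototype'."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))

def classify(query, support):
    """support: dict mapping class name -> list of few labeled examples."""
    protos = {c: prototype(pts) for c, pts in support.items()}
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    # Nearest prototype wins -- no per-task training at all.
    return min(protos, key=lambda c: dist2(query, protos[c]))

# 2-way, 2-shot task: two classes, two labeled examples each.
support = {"cat": [(0.0, 0.1), (0.2, 0.0)],
           "dog": [(1.0, 0.9), (0.9, 1.1)]}
print(classify((0.1, 0.2), support))  # -> cat
```

The meta-learning step, omitted here, would learn the feature space itself across many such small tasks so that nearest-prototype classification works on novel classes.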
Yuxiong Wang is a postdoctoral fellow in the Robotics Institute at Carnegie Mellon University. He received a Ph.D. in robotics in 2018 from Carnegie Mellon University. His research interests lie in the intersection of computer vision, machine learning, and robotics, with a particular focus on few-shot learning and meta-learning. He has spent time at Facebook AI Research (FAIR).
Monday, November 04, 2019
Reconstructing Dynamic Scenes from Monocular RGB Video
Classical Structure from Motion (SfM) can reconstruct static scenes from a moving camera, but many important applications, such as path planning, virtual reality, and modeling complex motion patterns, involve scenes that include moving objects. In this talk, we show that we can reconstruct dynamic scenes by reasoning about scene rigidity. Our core idea is to embed the pixels of the image into a vector space, where distances between vectors represent the likelihood of belonging to the same rigid object. Using this approach, we can recover dense depth and pixelwise 3D motion fields from monocular video.
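A toy sketch of the embedding idea: pixels whose embedding vectors lie close together are grouped into the same rigid object. Here hand-made 2-D vectors stand in for the per-pixel embeddings a network would predict, and the grouping threshold is an invented constant.

```python
def group_by_embedding(embeddings, thresh=0.5):
    """Greedy single-link grouping: a pixel joins a group if it is within
    thresh of any member's embedding; otherwise it starts a new group."""
    groups = []  # list of lists of pixel indices
    for i, e in enumerate(embeddings):
        for g in groups:
            if any((e[0] - embeddings[j][0]) ** 2
                   + (e[1] - embeddings[j][1]) ** 2 < thresh ** 2
                   for j in g):
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

# Four "pixels": two on the static background, two on a moving car.
emb = [(0.0, 0.0), (0.1, 0.1), (2.0, 2.0), (2.1, 1.9)]
print(group_by_embedding(emb))  # [[0, 1], [2, 3]]
```

Once pixels are partitioned into rigid groups this way, each group can be handed to a rigid SfM-style solver, which is what makes dense depth and 3D motion recoverable per object.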
Monday, November 11, 2019
Monday, November 18, 2019
A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences
Assessment of many audio processing tasks relies on subjective evaluation, which is time-consuming and expensive to conduct. Existing objective metrics often correlate poorly with human judgements. In this work, we construct a differentiable metric by fitting a deep neural network to a newly collected dataset of just-noticeable differences (JND), in which humans annotate whether a pair of audio clips is identical or not. By varying the type of difference, including noise, reverb, and compression artifacts, the learned metric becomes well-calibrated with human judgements. Furthermore, we evaluate this metric as a loss function for directly training denoising neural networks. We find that simply replacing an existing loss with our metric yields significant improvement in denoising, especially in low signal-to-noise ratio cases, as measured by subjective pairwise comparison.
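To illustrate the training recipe at toy scale (this is not the paper's model, which learns a deep network over audio): fit a one-parameter distance d(x, y) = w·|x − y| so that a sigmoid of it predicts the human "different?" label, then check that just-noticeable pairs score low and clearly-different pairs score high. All data and constants below are invented.

```python
import math

# (x, y, label): label 0 = humans call the pair identical (JND regime),
#                label 1 = humans call the pair clearly different.
pairs = [(0.0, 0.05, 0), (0.3, 0.32, 0),
         (0.0, 0.8, 1),  (0.2, 1.0, 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w = 0.0
for _ in range(2000):                    # gradient descent on BCE loss
    grad = 0.0
    for x, y, label in pairs:
        d = abs(x - y)                   # raw 1-D "feature distance"
        p = sigmoid(w * d - 1.0)         # predicted P("different")
        grad += (p - label) * d          # dBCE/dw for this pair
    w -= 0.5 * grad

# The fitted metric w*|x - y| now separates the two regimes.
same_d = w * abs(0.0 - 0.05)
diff_d = w * abs(0.0 - 0.8)
print(same_d < 1.0 < diff_d)  # True
```

Because the fitted metric is differentiable in its inputs as well as in w, the same construction can be dropped in as a training loss, which is how the paper uses it for denoising.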
Monday, November 25, 2019
Monday, December 02, 2019
How Useful is Self-Supervised Pretraining for Visual Tasks?
Despite the progress made in self-supervised pretraining for vision, it is not widely used by practitioners. We investigate what factors determine the utility of self-supervision. To do this, we evaluate various self-supervised algorithms across a comprehensive set of datasets and downstream tasks. We prepare a suite of synthetic data that provides an endless supply of annotated images as well as full control over dataset difficulty. Our experiments offer insight into how the utility of self-supervision changes as the number of available labels grows, and how it varies as a function of the downstream task and the properties of the training data.