
The PIXL lunch meets every Monday during the semester at noon in room 402 of the Computer Science building. To get on the mailing list to receive announcements, sign up for the "pixl-talks" list at

Upcoming Talks

No talks scheduled yet.

Previous Talks

Monday, September 25, 2017
Towards web-scale video understanding
Olga Russakovsky

While deep feature learning has revolutionized techniques for static-image understanding, the same does not quite hold for video processing. Architectures and optimization techniques used for video are largely based on those for static images, potentially underutilizing rich video information. I will describe three recent projects aimed at designing computer vision models for video processing that are able to (1) effectively capture temporal cues, (2) allocate computation to enable large-scale video processing, and (3) learn new concepts in a weakly supervised fashion while embracing inherent ambiguity. I will use these to ground the discussion of proposed future work in temporal video understanding.

Monday, October 02, 2017
Learning to Detect Interest Points in Texture Images
Linguang Zhang

Interest point detection is a fundamental task in computer vision. SIFT has become the gold standard in many applications that require feature detection, but its performance is not quite stable on a texture dataset we built in an earlier project. This is mainly because most existing methods are optimized for natural images rather than texture images. We use an unsupervised neural network to learn a detector that works best for a specific texture. Concretely, we use a ranking network to train a predictor that outputs a response map for the input image. The major advantage of this method is that, being unsupervised, it avoids having to define what an interest point is. We also investigate several approaches to further improve the performance of the detector.

Monday, October 09, 2017
Media Archaeology: Recovering an Archive of the World’s Earliest Voice Mail
Tom Levin, Adam Finkelstein, Szymon Rusinkiewicz

Tom Levin’s recent discovery of the world’s oldest known archive of voice mail (individually recorded French sonorine audio postcards from 1905-1907) is both a media-historical sensation and a media-archaeological challenge. These previously unknown and highly fragile postal missives can be played only on the crude proprietary device on which they were inscribed, and playing them that way would almost certainly result in irreparable damage. In order to listen to these sound carriers without touching them, Levin is collaborating with computer scientists Adam Finkelstein and Szymon Rusinkiewicz to develop an inexpensive non-tactile optical technique for audio capture with off-the-shelf hardware (a desktop scanner) that will allow this important archive to be heard for the first time in over a century. They will describe their planned approach and present some preliminary results.

Monday, October 16, 2017
Understanding and Leveraging Crowdsourced Visual Representations
Kenji Hata

As computer vision models become more sophisticated, we must replace current datasets with new ones that are better suited to capture the types of problems researchers want to tackle in the future. Visual Genome is a dataset that aims to better connect visual concepts to a more structured representation, which is known as a scene graph. In the process of creating Visual Genome, which is the densest publicly available large-scale dataset, we tackle problems in both crowdsourcing and computer vision. In crowdsourcing, we look at how we can speed up current labeling techniques and reduce costs by an order of magnitude using a novel technique. Additionally, we study the quality of work that crowdworkers produce over time, to better understand the effects of collecting datasets over long periods. Finally, the talk turns to a new video captioning task, in which we demonstrate a method that can simultaneously caption a video with multiple sentences while temporally localizing each sentence within the video.

Friday, November 17, 2017
Teaching computers to see and think
Jia Deng

The ability to see and think is essential for AI. In this talk I will present my lab’s recent work that aims to make computers see more and think better. First, I will describe our efforts to advance image understanding from isolated objects to rich relational semantics and from 2D to 3D. Next, I will present our work on using deep learning and vector embeddings to improve automated logical reasoning.

Jia Deng is an Assistant Professor of Computer Science and Engineering at the University of Michigan. His research focus is on computer vision and machine learning. He received his Ph.D. from Princeton University and his B.Eng. from Tsinghua University, both in computer science. He is a recipient of the PAMI Mark Everingham Prize, the Yahoo ACE Award, a Google Faculty Research Award, the ICCV Marr Prize, and the ECCV Best Paper Award.

Monday, November 27, 2017
Can Mean-Curvature Flow be Modified to be Non-singular?
Misha Kazhdan

In this talk, we will consider the question of whether mean-curvature flow can be modified to avoid the formation of singularities. We will analyze the finite-elements discretization and demonstrate why the original flow can result in numerical instability due to division by zero. We will propose a variation on the flow that removes the numerical instability in the discretization and show that this modification results in a simpler expression for both the discretized and continuous formulations. We will demonstrate, empirically, that not only does the modified flow define a stable surface evolution for genus-zero surfaces, but that the evolution converges to a conformal parameterization of the surface onto the sphere.

Monday, December 04, 2017
Yifei Shi

Monday, December 11, 2017
From Perception to Reasoning with Graphs
Justin Johnson

The use of deep neural networks has led to fantastic recent progress on visual perception. Emboldened by this success, we can move beyond perception and start to consider new tasks involving reasoning. To do so we must connect vision with language, a core component of reasoning. However, modern deep learning systems fail to explicitly model the symbolic structures latent in images and text which are necessary for reasoning. We can overcome this challenge with deep learning systems that operate on graphs modeling symbolic structure. I will discuss my work on moving from perception to reasoning, powered by structured graph representations. I will discuss applications in image captioning, visual question answering on the CLEVR dataset, and image retrieval and generation on the Visual Genome dataset.

Justin is a 6th year PhD candidate at Stanford University, advised by Fei-Fei Li. His research interests lie at the intersection of computer vision, natural language processing, and machine learning. During his PhD he has spent time at Google Cloud AI, Facebook AI Research, and Yahoo Research. Prior to that he completed a BS in Mathematics and Computer Science at the California Institute of Technology.