
The PIXL lunch meets every Monday during the semester at noon in room 402 of the Computer Science building. To get on the mailing list to receive announcements, sign up for the "pixl-talks" list at

Upcoming Talks

Monday, October 23, 2017
Kyle Genova

Friday, November 17, 2017
Jia Deng

Monday, November 27, 2017
Huiwen Chang

Monday, December 04, 2017
Yifei Shi

Monday, December 11, 2017
Justin Johnson

Previous Talks

Monday, September 25, 2017
Towards web-scale video understanding
Olga Russakovsky

While deep feature learning has revolutionized techniques for static-image understanding, the same does not quite hold for video processing. Architectures and optimization techniques used for video are largely based on those for static images, potentially underutilizing rich video information. I will describe three recent projects aimed at designing computer vision models for video processing that are able to (1) effectively capture temporal cues, (2) allocate computation to enable large-scale video processing, and (3) learn new concepts in a weakly supervised fashion while embracing inherent ambiguity. I will use these to ground the discussion of proposed future work in temporal video understanding.

Monday, October 02, 2017
Learning to Detect Interest Points in Texture Images
Linguang Zhang

Interest point detection is a fundamental task in computer vision. SIFT has become the gold standard in many applications that require feature detection, but its performance is not stable on a texture dataset we built in an earlier project. This is mainly because most existing methods optimize their performance on natural images rather than texture images. We use an unsupervised neural network to learn a detector that works best for a specific texture. Concretely, we use a ranking network to train a predictor that outputs a response map of the input image. The major advantage of this method is that it circumvents the need to define what an interest point is, which is what makes the approach unsupervised. We also investigate several approaches to further improve the performance of the detector.

Monday, October 09, 2017
Media Archaeology: Recovering an Archive of the World’s Earliest Voice Mail
Tom Levin, Adam Finkelstein, Szymon Rusinkiewicz

Tom Levin’s recent discovery of the world’s oldest known archive of voice mail (individually recorded French sonorine audio postcards from 1905-1907) is both a media-historical sensation and a media-archaeological challenge. These previously unknown and highly fragile postal missives can be played only on the crude proprietary device on which they were inscribed, which would almost certainly result in irreparable damage. In order to listen to these sound carriers without touching them, Levin is collaborating with computer scientists Adam Finkelstein and Szymon Rusinkiewicz to develop an inexpensive non-tactile optical technique for audio capture with off-the-shelf hardware (a desktop scanner) that will allow this important archive to be heard for the first time in over a century. They will describe their planned approach and present some preliminary results.

Monday, October 16, 2017
Understanding and Leveraging Crowdsourced Visual Representations
Kenji Hata

As computer vision models become more sophisticated, we must replace current datasets with new ones that are better suited to capture the types of problems researchers want to tackle in the future. Visual Genome is a dataset that aims to better connect visual concepts to a more structured representation known as a scene graph. In the process of creating Visual Genome, the densest publicly available large-scale dataset of its kind, we tackle problems in both crowdsourcing and computer vision. In crowdsourcing, we look at how a novel technique can speed up current labeling methods and reduce costs by an order of magnitude. Additionally, we study the quality of work that crowdworkers produce over time, to better understand the effects of collecting datasets over long periods. Finally, this talk concludes with a new video captioning task, in which we demonstrate a method that can simultaneously caption a video with multiple sentences while temporally localizing each sentence within the video.