Selected Publications

Would having surface normals simplify the depth estimation of an image? Do visual tasks have a relationship, or are they unrelated? Common sense suggests that visual tasks are interdependent, implying the existence of structure among tasks. However, this structure needs a proper model before it can be made actionable, e.g., used to reduce the supervision required by exploiting task relationships. We therefore ask: which tasks transfer to an arbitrary target task, and how well? Or, how do we learn a set of tasks collectively with less total supervision? These are some of the questions that can be answered by a computational model of the vision task space, as proposed in this paper. We explore the task structure using a sampled dictionary of 2D, 2.5D, 3D, and semantic tasks, modeling their (first- and higher-order) transfer behaviors in a latent space. The product can be viewed as a computational task taxonomy (Taskonomy) and a map of the task space. We study the consequences of this structure, e.g., the emerging task relationships, and exploit them to reduce the demand for supervision. For instance, we show that the total number of labeled datapoints needed to solve a set of 10 tasks can be reduced by roughly two-thirds while keeping performance nearly the same, by using features from multiple proxy tasks. Users can employ a provided Binary Integer Programming solver that leverages the taxonomy to find efficient supervision policies for their own use cases.
In CVPR, 2018
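
For readers curious how such a supervision-policy search can be set up, below is a minimal sketch of a Binary Integer Programming formulation in Python using PuLP. The task names, transfer-quality numbers, and quality threshold are illustrative assumptions, not values from the paper, and this is a toy stand-in rather than the released solver.

    # Minimal sketch (not the paper's released solver): choose which tasks to
    # label directly so that every task is covered either by its own labels or
    # by a strong transfer from a labeled source, minimizing total labeling.
    # Task names, transfer qualities, and the threshold are illustrative.
    import pulp

    tasks = ["depth", "normals", "edges", "segmentation"]
    # transfer_quality[s][t]: hypothetical performance on target t when trained
    # on features from source s (1.0 = as good as full supervision on t).
    transfer_quality = {
        "depth":        {"depth": 1.0, "normals": 0.9, "edges": 0.6, "segmentation": 0.4},
        "normals":      {"depth": 0.9, "normals": 1.0, "edges": 0.5, "segmentation": 0.4},
        "edges":        {"depth": 0.5, "normals": 0.5, "edges": 1.0, "segmentation": 0.3},
        "segmentation": {"depth": 0.4, "normals": 0.4, "edges": 0.3, "segmentation": 1.0},
    }
    min_quality = 0.85  # every target must reach this via some labeled source

    prob = pulp.LpProblem("supervision_policy", pulp.LpMinimize)
    supervise = {t: pulp.LpVariable(f"supervise_{t}", cat="Binary") for t in tasks}

    # Objective: minimize how many tasks receive their own labels.
    prob += pulp.lpSum(supervise.values())

    # Coverage: each target needs at least one labeled source whose transfer
    # to it meets the quality threshold (the task itself always qualifies).
    for t in tasks:
        good_sources = [s for s in tasks if transfer_quality[s][t] >= min_quality]
        prob += pulp.lpSum(supervise[s] for s in good_sources) >= 1

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print("Label these tasks directly:",
          [t for t in tasks if supervise[t].value() == 1])

With these toy numbers, depth and normals cover each other, so only three of the four tasks need direct labels.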

Perception and being active (i.e., having a certain level of motion freedom) are closely tied. Learning active perception and sensorimotor control in the physical world is cumbersome: existing algorithms are too slow to learn efficiently in real time, and robots are fragile and costly. This has given rise to learning in simulation, which in turn raises the question of how to transfer to the real world. In this paper, we study learning perception for active agents in the real world, propose a virtual environment for this purpose, and demonstrate complex learned locomotion abilities. The primary characteristics of the learning environment, which transfer into the trained agents, are: (I) it comes from the real world and reflects its semantic complexity; (II) it has a mechanism to ensure that no further domain adaptation is needed before deploying results in the real world; (III) the agent is embodied and subject to the constraints of space and physics.
In CVPR, 2018
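
As a rough illustration of what training an agent in such an embodied environment looks like, here is a generic Gym-style interaction loop in Python. The environment id, the random placeholder policy, and the classic four-tuple step interface are assumptions for the sketch, not the environment's actual API.

    # Sketch of driving an active agent in a perception-centric simulator,
    # written against the classic Gym interface. "RealWorldPerception-v0" is
    # a hypothetical id; any embodied env exposing visual observations fits.
    import gym

    env = gym.make("RealWorldPerception-v0")  # hypothetical embodied env

    def policy(observation):
        # Placeholder: random actions. In practice this would be a perception
        # network (e.g., a CNN over RGB-D observations) trained with RL.
        return env.action_space.sample()

    for episode in range(10):
        obs = env.reset()
        done, total_reward = False, 0.0
        while not done:
            obs, reward, done, info = env.step(policy(obs))
            total_reward += reward
        print(f"episode {episode}: return = {total_reward:.2f}")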

Recent Publications

Taskonomy: Disentangling Task Transfer Learning. In CVPR, 2018.

Code Dataset Project

Embodied Real-World Active Perception. In CVPR, 2018.

Dataset Project

Joint 2D-3D-Semantic Data for Indoor Scene Understanding. 2017.

PDF Code Dataset Project


Locality Prior

A wiring cost for neural networks.
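
The description is brief, so here is one possible reading sketched in PyTorch: a regularizer that charges each connection weight by the distance between the units it connects, encouraging local wiring. The 1-D unit layout and the penalty form are illustrative assumptions, not necessarily the project's formulation.

    # One possible "wiring cost": penalize each weight of a linear layer by
    # the distance between the 1-D positions of the units it connects, so
    # long-range connections are expensive. Illustrative assumption only.
    import torch

    def wiring_cost(layer: torch.nn.Linear) -> torch.Tensor:
        out_pos = torch.arange(layer.out_features, dtype=torch.float32)
        in_pos = torch.arange(layer.in_features, dtype=torch.float32)
        # distance[i, j] = |position of output unit i - position of input unit j|
        distance = (out_pos[:, None] - in_pos[None, :]).abs()
        return (layer.weight.abs() * distance).sum()

    layer = torch.nn.Linear(64, 64)
    x = torch.randn(8, 64)
    task_loss = layer(x).pow(2).mean()            # stand-in for the real task loss
    loss = task_loss + 1e-4 * wiring_cost(layer)  # locality prior as a penalty
    loss.backward()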

Flame Wars: Automatic Insult Detection

Detecting abusive comments with char-LSTMs.
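
A minimal PyTorch sketch of such a character-level LSTM classifier is below; the vocabulary size, dimensions, and single-logit head are illustrative choices rather than the project's exact architecture.

    # Minimal character-level LSTM classifier for abusive-comment detection.
    # Hyperparameters and the byte-level vocabulary are illustrative.
    import torch
    import torch.nn as nn

    class CharLSTMClassifier(nn.Module):
        def __init__(self, n_chars=256, embed_dim=32, hidden_dim=128):
            super().__init__()
            self.embed = nn.Embedding(n_chars, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, 1)   # logit: insult vs. not

        def forward(self, char_ids):                # char_ids: (batch, seq_len)
            embedded = self.embed(char_ids)
            _, (h_n, _) = self.lstm(embedded)       # final hidden state
            return self.head(h_n[-1]).squeeze(-1)   # (batch,) of logits

    model = CharLSTMClassifier()
    comment = torch.tensor([[ord(c) for c in "you are a total fool"]])
    prob = torch.sigmoid(model(comment))  # probability the comment is abusive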


Summarization of Articles Using LexRank via Intermediate Embedded Representations

Extractive text summarization on the NYT dataset.
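
For reference, here is a compact sketch of LexRank over sentence embeddings in Python: score sentences by their centrality in a cosine-similarity graph via power iteration, then keep the top-ranked ones. The random placeholder embeddings stand in for the project's learned intermediate representations.

    # LexRank sketch: build a cosine-similarity graph between sentences, run
    # PageRank-style power iteration, and return centrality scores. The
    # placeholder embeddings stand in for learned representations.
    import numpy as np

    def lexrank(embeddings: np.ndarray, damping=0.85, n_iter=100):
        normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        sim = normed @ normed.T                 # pairwise cosine similarity
        np.fill_diagonal(sim, 0.0)              # no self-loops
        transition = sim / sim.sum(axis=1, keepdims=True)  # row-stochastic
        n = len(embeddings)
        scores = np.full(n, 1.0 / n)
        for _ in range(n_iter):
            scores = (1 - damping) / n + damping * transition.T @ scores
        return scores

    sentences = ["First sentence.", "Second sentence.", "Third sentence."]
    embeddings = np.random.rand(len(sentences), 64)  # placeholder embeddings
    ranked = np.argsort(-lexrank(embeddings))
    summary = [sentences[i] for i in ranked[:2]]     # top-2 as the summary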


I love teaching and think it's an important part of participating in academia.

I have TA’ed:

  • CS331b: Representation Learning in Computer Vision (Fall 2017)
  • CS103: Mathematical Foundations of Computing (Winter 2015)