Cross-task weakly supervised learning from instructional videos

In this paper we investigate learning visual models for the steps of ordinary tasks using weak supervision via instructional narrations and an ordered list of steps, instead of strong supervision via temporal annotations. At the heart of our approach is the observation that weakly supervised learning may be easier if a model shares components while learning different steps: `pour egg' should be trained jointly with other tasks involving `pour' and `egg'. We formalize this in a component model for recognizing steps and a weakly supervised learning framework that can learn this model under temporal constraints from narration and the list of steps. Past data does not permit a systematic study of sharing, and so we also gather a new dataset, CrossTask, aimed at assessing cross-task sharing. Our experiments demonstrate that sharing across tasks improves performance, especially when done at the component level, and that our component model can parse previously unseen tasks by virtue of its compositionality.
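
To make the idea of component sharing concrete, here is a minimal illustrative sketch, not the paper's actual model or training objective: it assumes each step is described by a set of component words and scores a step on a feature vector by summing the scores of shared per-component linear classifiers. All class and function names (ComponentModel, step_score, update) are hypothetical.

```python
import numpy as np

class ComponentModel:
    def __init__(self, vocab, feat_dim):
        # One linear classifier per component (e.g. the words "pour", "egg"),
        # shared by every step and task that uses that component.
        self.weights = {c: np.zeros(feat_dim) for c in vocab}

    def step_score(self, step_components, x):
        # Score a step (e.g. ["pour", "egg"]) on video features x by
        # summing the scores of its shared components.
        return sum(self.weights[c] @ x for c in step_components)

    def update(self, step_components, x, lr, label):
        # Toy perceptron-style update: every component of the step is
        # trained jointly, so "pour" benefits from all tasks that use it.
        pred = 1.0 if self.step_score(step_components, x) > 0 else -1.0
        if pred != label:
            for c in step_components:
                self.weights[c] += lr * label * x

# Components learned from "pour egg" are reused to score an unseen step
# such as "pour milk", illustrating the compositional, cross-task sharing
# the abstract refers to.
model = ComponentModel(vocab=["pour", "egg", "milk", "whisk"], feat_dim=128)
x = np.random.randn(128)
model.update(["pour", "egg"], x, lr=0.1, label=1.0)
print(model.step_score(["pour", "milk"], x))  # shares the "pour" component
```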