Paper Summary: WSABIE: Scaling Up To Large Vocabulary Image Annotation

Last updated:

Please note: this post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.


The authors propose a ranking loss function called WARP (Weighted Approximate-Rank Pairwise) loss. They use it to jointly learn a) a scoring function that ranks tags (annotations) for a given image and b) embeddings for both images and tags in the same shared vector space.
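A minimal sketch of one WARP sampling step, following my reading of the paper's training procedure. For a training pair (x, y), negative labels are sampled uniformly until one scores within a margin of 1 of the true label. The number of trials N gives a rank estimate floor((Y - 1) / N), and the hinge loss is weighted by L(rank), the sum of 1/j for j = 1..rank. Assumptions: the label scores are passed in precomputed, the gradient step on the embeddings is omitted, and `rank_weight` and `warp_sample` are hypothetical names of my own.

```python
import random


def rank_weight(k):
    """L(k) = sum_{j=1}^{k} 1/j: turns an estimated rank into a loss weight."""
    return sum(1.0 / j for j in range(1, k + 1))


def warp_sample(scores, y, rng=random, margin=1.0):
    """One WARP sampling step for a single (image, true label) pair.

    Assumption: scores[i] holds the already-computed score f_i(x) for every label i.
    Returns (loss, violating_label, trials), or (0.0, None, trials) when no
    margin-violating label is found within Y - 1 trials.
    """
    num_labels = len(scores)
    max_trials = num_labels - 1
    negatives = [i for i in range(num_labels) if i != y]
    for trials in range(1, max_trials + 1):
        # Sample a negative label uniformly (with replacement).
        y_bar = rng.choice(negatives)
        hinge = margin - scores[y] + scores[y_bar]
        if hinge > 0:  # violation: f_ybar(x) > f_y(x) - margin
            est_rank = max_trials // trials  # rank of y is roughly floor((Y - 1) / N)
            return rank_weight(est_rank) * hinge, y_bar, trials
    return 0.0, None, max_trials
```

The fewer trials it takes to find a violating label, the higher the estimated rank of the true label and the larger the weight, so errors near the top of the list cost more. In this sketch all scores are precomputed for simplicity; the point of sampling in a large-vocabulary setting is that only the sampled labels need to be scored.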


  • State-of-the-art precision@1 and precision@10 compared with the baselines on large-scale image annotation tasks.

  • Much faster and consumes much less memory than alternatives.

  • Ensemble models using different types of image features perform even better.


  • Directly optimizes precision@k, i.e. it focuses on getting the top of the ranked label list right.

  • Computes a score for each (image, label) pair: the dot product of the image's embedding and the label's embedding.

  • Supervised approach to learning embeddings.
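To make the scoring concrete, here is a small pure-Python sketch, assuming the paper's linear setup: the image embedding is V x (V is a D x d matrix applied to d-dimensional image features), each label i has its own D-dimensional embedding W_i, and the score is f_i(x) = W_i . (V x). The helper names `embed_image`, `label_scores` and `precision_at_k` are hypothetical, and the toy matrices are made up for illustration.

```python
def embed_image(V, x):
    """Map image features x (length d) into the shared space: V x (V is D x d)."""
    return [sum(v * xj for v, xj in zip(row, x)) for row in V]


def label_scores(V, W, x):
    """Score every label: f_i(x) = W_i . (V x), one D-dim row of W per label."""
    z = embed_image(V, x)
    return [sum(w * zj for w, zj in zip(W_i, z)) for W_i in W]


def precision_at_k(scores, true_labels, k):
    """Fraction of the k highest-scoring labels that are correct."""
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(1 for i in top_k if i in true_labels) / k


# Toy example: 2-dim features, 2-dim embedding space, 3 labels.
V = [[1, 0], [0, 1]]
W = [[1, 0], [0, 1], [1, 1]]
scores = label_scores(V, W, [2, 3])
print(scores)  # [2, 3, 5]
print(precision_at_k(scores, {2, 0}, 2))  # 0.5
```

Ranking labels for an image is then just sorting these scores, which is the list that precision@k is measured on.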


