Please note: this post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.
They created a special ranking loss function (called WARP loss) to learn a) a classifier for ranking tags given an image and b) embeddings for both the images and the tags, in the same shared vector space.
State-of-the-art results for precision @1 and @10 against baselines in image classification tasks.
Much faster and consumes much less memory than alternatives.
Ensemble models using different types of image features perform even better.
Directly optimizes precision @k.
Apparently calculates a score for each (instance, label) pair.
Supervised approach to learning embeddings.
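To make the notes above a bit more concrete for myself, here is a minimal sketch of how a score for an (instance, label) pair and a WARP-style ranking loss could look. It is my own reading, not the authors' code. It assumes a linear map `V` that projects image features into the shared space, a table `W` with one embedding per tag, a dot-product score, a margin of 1, and a rank weighting of 1 + 1/2 + ... + 1/k. All of these names and choices are illustrative assumptions.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def embed_image(V, x):
    """Project image features x into the shared space (assumed linear map V)."""
    return [dot(row, x) for row in V]


def score(V, W, x, label):
    """Score of an (instance, label) pair: dot product in the shared space."""
    return dot(W[label], embed_image(V, x))


def rank_weight(k):
    """Turn an estimated rank into a loss weight: 1 + 1/2 + ... + 1/k (assumed choice)."""
    return sum(1.0 / j for j in range(1, k + 1))


def warp_sample(V, W, x, pos, rng, margin=1.0):
    """Sample negative labels until one violates the margin.

    Returns (negative_label, loss), or (None, 0.0) if no violator is found
    within num_labels - 1 draws.
    """
    num_labels = len(W)
    pos_score = score(V, W, x, pos)
    negatives = [label for label in range(num_labels) if label != pos]
    for trials in range(1, len(negatives) + 1):
        neg = rng.choice(negatives)  # sampled with replacement
        violation = margin - pos_score + score(V, W, x, neg)
        if violation > 0:
            # Fewer draws needed means the positive label is probably ranked lower.
            estimated_rank = (num_labels - 1) // trials
            return neg, rank_weight(estimated_rank) * violation
    return None, 0.0


# Toy data: 2-d image features, 3 tags, identity projection.
V = [[1, 0], [0, 1]]
W = [[1, 0], [0, 1], [-1, 0]]
x = [1, 0]
neg, loss = warp_sample(V, W, x, pos=2, rng=random.Random(0))
```

In a real training loop the returned negative label and loss would drive a gradient step on `V` and `W`; that part is left out here. The appeal of sampling, as far as I understand it, is that the rank of the correct label is estimated from the number of draws instead of scoring every label, which would fit the speed claim above.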