Please note: this post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.
The authors jointly learn a classifier that predicts tags for documents, along with distributed representations for both documents and tags.
The objective is to predict tags given to a document based on its contents.
They build on WSABIE but define a different similarity function for a document-tag pair, adding an extra dimension by replacing matrices with tensors.
Each tensor slice represents a "context", along which the similarity between each word and each tag is computed.
This enables the model to learn a separate representation for each tag under each context, i.e. different representation modalities for the same tag (e.g. "apple" may refer to a technology company or to a fruit, depending on the content).
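To make the tensor idea concrete for myself, here is a minimal NumPy sketch of a per-context bilinear score between each word and a tag. The function names, the tensor shape (n_contexts, d_word, d_tag) and the way scores are aggregated (mean over words, sum over contexts) are my own assumptions for illustration, not necessarily what the paper does.

```python
import numpy as np

def word_tag_context_scores(word_vecs, tag_vec, tensor):
    """Score every word against one tag under every context.

    word_vecs: (n_words, d_word) word embeddings of the document
    tag_vec:   (d_tag,) embedding of one tag
    tensor:    (n_contexts, d_word, d_tag), one bilinear form per slice
    Returns an (n_words, n_contexts) array of word^T tensor[c] tag.
    """
    return np.einsum("wi,cij,j->wc", word_vecs, tensor, tag_vec)

def doc_tag_similarity(word_vecs, tag_vec, tensor):
    # Assumption: average over words, then sum over contexts.
    scores = word_tag_context_scores(word_vecs, tag_vec, tensor)
    return float(scores.mean(axis=0).sum())

# Toy example: two words, two contexts (identity and 2 * identity).
words = np.array([[1.0, 0.0], [0.0, 1.0]])
tag = np.array([3.0, 4.0])
contexts = np.stack([np.eye(2), 2 * np.eye(2)])
print(word_tag_context_scores(words, tag, contexts))
print(doc_tag_similarity(words, tag, contexts))
```

With a single slice this reduces to an ordinary bilinear (matrix) similarity, which is how I read the "replacing matrices with tensors" remark.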
This similarity function is optimized via SGD with negative sampling.
They claim to beat the then state-of-the-art approaches (including WSABIE) on two datasets, measured by Recall@k and MAP.
It's called "recursive" because of the way tag representations are learned (iteratively).
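As a rough illustration of what SGD with negative sampling can look like here, the sketch below does one update with a WSABIE-style margin (hinge) ranking loss and a single uniformly sampled negative tag. The loss, the uniform sampler, the summed-over-contexts score, the choice to update only tag embeddings while the tensor stays constant, and all names and hyperparameters are my assumptions, not the paper's exact procedure.

```python
import numpy as np

def score(doc_vec, tag_vec, tensor):
    # Sum of per-slice bilinear scores (the aggregation is an assumption).
    return np.einsum("i,cij,j->", doc_vec, tensor, tag_vec)

def sgd_step(doc_vec, tag_emb, tensor, pos, rng, lr=0.1, margin=1.0):
    """One hinge-loss update using a single sampled negative tag.

    tag_emb (n_tags, d_tag) is updated in place; the tensor is left
    untouched. Requires at least two tags. Returns the loss before
    the update.
    """
    n_tags = tag_emb.shape[0]
    neg = int(rng.integers(n_tags))
    while neg == pos:  # resample until we get a tag other than the true one
        neg = int(rng.integers(n_tags))
    loss = (margin
            - score(doc_vec, tag_emb[pos], tensor)
            + score(doc_vec, tag_emb[neg], tensor))
    if loss <= 0:
        return 0.0  # margin already satisfied, nothing to update
    # Gradient of the score w.r.t. a tag vector: sum_c tensor[c]^T doc_vec.
    g = np.einsum("i,cij->j", doc_vec, tensor)
    tag_emb[pos] += lr * g  # pull the true tag towards the document
    tag_emb[neg] -= lr * g  # push the sampled negative away
    return float(loss)
```

Repeating this step over (document, true tag) pairs raises the score of observed tags relative to sampled ones until the margin holds.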
They keep the tensor parameters fixed to avoid overfitting.
They report Recall@k with unusually large values for k (50, 100, 150, 200, 250 and 300).
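For reference, this is how I would compute the two metrics mentioned above for a single document, using their standard definitions; MAP is then the mean of the average precision over all documents. The function names are mine, and the ranked list is assumed to contain no duplicate tags.

```python
def recall_at_k(ranked_tags, true_tags, k):
    """Fraction of the true tags that appear in the top-k predictions."""
    true_tags = set(true_tags)
    if not true_tags:
        return 0.0
    hits = len(true_tags.intersection(ranked_tags[:k]))
    return hits / len(true_tags)

def average_precision(ranked_tags, true_tags):
    """Mean of precision@rank taken at the rank of each true tag found."""
    true_tags = set(true_tags)
    if not true_tags:
        return 0.0
    hits = 0
    total = 0.0
    for rank, tag in enumerate(ranked_tags, start=1):
        if tag in true_tags:
            hits += 1
            total += hits / rank
    return total / len(true_tags)

ranked = ["a", "b", "c", "d"]
truth = {"a", "c"}
print(recall_at_k(ranked, truth, 1))     # 0.5
print(recall_at_k(ranked, truth, 3))     # 1.0
print(average_precision(ranked, truth))  # (1/1 + 2/3) / 2, about 0.833
```

Since Recall@k can only go up as k grows, values of k in the hundreds make the numbers look high regardless of how good the top of the ranking is, which is why the choice of k stood out to me.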