Paper Summary: A Simple but Tough-to-beat Baseline for Sentence Embeddings


Please note: this post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.

WHAT

It's an unsupervised method that builds a sentence embedding from the embeddings of the individual words in the sentence.

HOW

  • 1) Compute the weighted average of the word vectors in the sentence, where each word \(w\) is weighted by its SIF (Smooth Inverse Frequency):

$$ \mathrm{SIF}(w)=\frac{a}{a+p(w)} $$

where \(a\) is a hyper-parameter and \(p(w)\) is the estimated frequency (unigram probability) of \(w\) in the corpus.

  • 2) From each sentence embedding obtained in step 1), remove its projection onto the first principal component of the matrix that has all sentence embeddings as columns.
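The two steps above can be sketched in Python with NumPy. This is a minimal sketch under my own assumptions, not the authors' code: sentences are lists of tokens, `word_vectors` maps tokens to NumPy arrays, `word_freq` maps tokens to estimated probabilities \(p(w)\), and the function names and the default \(a = 10^{-3}\) are hypothetical choices for illustration. The first principal component is taken as the first singular vector of the uncentered matrix, which is also a choice of this sketch.

```python
import numpy as np


def sif_weighted_average(sentences, word_vectors, word_freq, a=1e-3):
    """Step 1: SIF-weighted average of word vectors, one row per sentence."""
    dim = len(next(iter(word_vectors.values())))
    emb = np.zeros((len(sentences), dim))
    for i, tokens in enumerate(sentences):
        # Words without a vector are skipped (an assumption of this sketch).
        known = [t for t in tokens if t in word_vectors]
        if not known:
            continue  # no usable words: leave the row as zeros
        # SIF(w) = a / (a + p(w)); words missing from word_freq get p(w) = 0.
        weights = np.array([a / (a + word_freq.get(t, 0.0)) for t in known])
        vecs = np.array([word_vectors[t] for t in known])
        emb[i] = weights @ vecs / len(known)
    return emb


def remove_first_pc(emb):
    """Step 2: remove each row's projection onto the first singular vector."""
    # The post stacks sentences as columns; here they are rows, which only
    # transposes the matrix. No mean-centering is done (a choice of this sketch).
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    u = vt[0]  # unit-norm direction of the first principal component
    return emb - np.outer(emb @ u, u)


# Toy usage with made-up 2-D vectors and frequencies (hypothetical data).
vectors = {"the": np.array([1.0, 0.0]), "cat": np.array([0.0, 1.0]), "sat": np.array([1.0, 1.0])}
freqs = {"the": 0.05, "cat": 0.001, "sat": 0.002}
sentences = [["the", "cat", "sat"], ["the", "cat"], ["cat", "sat"]]
embeddings = remove_first_pc(sif_weighted_average(sentences, vectors, freqs))
```

For intuition about the weights: with \(a = 10^{-3}\), a rare word with \(p(w) = 10^{-3}\) gets weight \(0.5\), while a frequent word with \(p(w) = 0.05\) gets about \(0.02\), so frequent words contribute much less to the average.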

CLAIMS

  • Despite being simple and unsupervised, it outperforms more complex methods that require supervision, such as RNNs and LSTMs, on both unsupervised and supervised tasks.

NOTES

  • In the experiments, GloVe embeddings weighted by TF-IDF also gave good results, sometimes better than all other methods (supervised or otherwise).
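For comparison with the SIF weights, a TF-IDF-weighted average of word vectors can be sketched the same way. The TF-IDF variant here is my assumption, since the note above does not say which one the experiments used: raw term count times \(\log(N/\mathrm{df})\), with IDF computed over the input sentences themselves (each sentence treated as a document) and the result normalized by the sum of the weights. The function name is hypothetical.

```python
import math
from collections import Counter

import numpy as np


def tfidf_weighted_average(sentences, word_vectors):
    """TF-IDF-weighted average of word vectors, one row per sentence."""
    n_docs = len(sentences)
    # Document frequency: in how many sentences each word appears.
    df = Counter(t for tokens in sentences for t in set(tokens))
    dim = len(next(iter(word_vectors.values())))
    emb = np.zeros((n_docs, dim))
    for i, tokens in enumerate(sentences):
        # Term counts, skipping words that have no vector.
        tf = Counter(t for t in tokens if t in word_vectors)
        if not tf:
            continue  # no usable words: leave the row as zeros
        # Assumed variant: tf * log(N / df); words in every sentence get weight 0.
        weights = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
        total = sum(weights.values())
        if total == 0:
            continue  # all weights are zero: leave the row as zeros
        emb[i] = sum(w * word_vectors[t] for t, w in weights.items()) / total
    return emb
```

Unlike SIF, which uses a fixed corpus frequency \(p(w)\) per word, this weighting also depends on how often the word occurs within the sentence.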
