Paper Summary: A Simple but Tough-to-beat Baseline for Sentence Embeddings


Please note: This post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.


It's an unsupervised method that builds a sentence embedding from the embeddings of the individual words in the sentence.


  • 1) Compute the weighted average of the word vectors in the sentence, where the weight of each word \(w\) is its SIF (Smooth Inverse Frequency):

$$ SIF(w)=\frac{a}{a+p(w)} $$

where \(a\) is a hyper-parameter and \(p(w)\) is the estimated word frequency in the corpus.

  • 2) Subtract from each sentence embedding obtained in step 1) its projection onto the first principal component (first singular vector) of the matrix with all sentence embeddings as columns.
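
The two steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the authors' code: the function names (`sif_weights`, `sif_embeddings`), the toy vectors and word probabilities, and the default `a=1e-3` are all assumptions made for the example. Note that the sketch stores sentence embeddings as rows rather than columns, so the first principal direction is the first right singular vector.

```python
import numpy as np

def sif_weights(word_probs, a=1e-3):
    """Step 1 weights: SIF(w) = a / (a + p(w)) for every word w."""
    return {w: a / (a + p) for w, p in word_probs.items()}

def sif_embeddings(sentences, word_vecs, word_probs, a=1e-3):
    """Return one embedding per sentence (rows), following steps 1) and 2)."""
    weights = sif_weights(word_probs, a)
    dim = len(next(iter(word_vecs.values())))
    emb = np.zeros((len(sentences), dim))

    # Step 1: SIF-weighted average of the word vectors in each sentence.
    for i, sent in enumerate(sentences):
        tokens = [t for t in sent if t in word_vecs and t in weights]
        if tokens:  # sentences with no known words stay at zero
            emb[i] = np.mean([weights[t] * word_vecs[t] for t in tokens], axis=0)

    # Step 2: remove the projection onto the first singular vector.
    # Sentences are rows here, so we take the first right singular vector.
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    u = vt[0]
    return emb - np.outer(emb @ u, u)

# Toy data (hypothetical vectors and unigram probabilities).
word_vecs = {
    "the": np.array([1.0, 0.0, 0.0]),
    "cat": np.array([0.0, 1.0, 0.0]),
    "sat": np.array([0.0, 0.0, 1.0]),
    "dog": np.array([0.0, 1.0, 1.0]),
}
word_probs = {"the": 0.05, "cat": 0.001, "sat": 0.002, "dog": 0.001}
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat"]]

weights = sif_weights(word_probs)
emb = sif_embeddings(sentences, word_vecs, word_probs)
```

In this toy setup the frequent word "the" gets a much smaller weight than the rare word "cat", which is the intended effect of the SIF weighting.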


  • It's a simple, unsupervised approach, yet it performs better (in unsupervised and supervised tasks) than more complex methods that need supervision, such as RNNs and LSTMs.


  • In the experiments, TF-IDF weighted GloVe embeddings also had satisfactory results, sometimes better than all other methods (supervised or otherwise).
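
For comparison, a TF-IDF weighted average of word vectors could look roughly like the sketch below. The exact TF-IDF variant is not spelled out in this summary, so the smoothed IDF formula, the normalization by the sum of the weights, the function name `tfidf_embeddings`, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter

import numpy as np

def tfidf_embeddings(sentences, word_vecs):
    """TF-IDF weighted average of word vectors, one row per sentence."""
    n = len(sentences)
    # Document frequency: number of sentences containing each token.
    df = Counter(t for sent in sentences for t in set(sent))
    # Smoothed IDF (an assumed variant): ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    dim = len(next(iter(word_vecs.values())))
    emb = np.zeros((n, dim))
    for i, sent in enumerate(sentences):
        # Term frequency, restricted to tokens that have a vector.
        tf = Counter(t for t in sent if t in word_vecs)
        total = sum(tf[t] * idf[t] for t in tf)
        if total > 0:
            emb[i] = sum(tf[t] * idf[t] * word_vecs[t] for t in tf) / total
    return emb

# Toy data (hypothetical 2-d vectors).
toy_vecs = {
    "the": np.array([1.0, 0.0]),
    "cat": np.array([0.0, 1.0]),
    "dog": np.array([0.0, 1.0]),
}
toy_sentences = [["the", "cat"], ["the", "dog"]]
tfidf_emb = tfidf_embeddings(toy_sentences, toy_vecs)
```

As with SIF, words that occur in many sentences ("the" here) contribute less to the sentence vector than rarer words.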

