Paper Summary: The Natural Language Decathlon


Please note: this post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.


1): They introduce decaNLP, a benchmark in which a single multi-task model is trained on 10 different NLP tasks at the same time.

2): They suggest modelling every NLP problem as a Question-Answer problem.

3): They propose MQAN (Multitask Question Answering Network), a novel BiLSTM-based architecture with attention, to train on the problems modelled as per 2).

The tasks

  • Question Answering

    • You are given a block of text (e.g. a Wikipedia Article) and a question. The task is to find out which part of the given block of text contains the answer to the question.
  • Machine Translation

    • You are given a document in a source language and the name of a target language. The task is to produce another document in the target language that carries the same meaning as the original document.
  • Summarization

    • You are given a document and the task is to produce a short sentence containing a summary of the document. The summary may reuse only content from the original document (extractive) or it may include words that do not appear in it (abstractive).
  • Natural Language Inference (NLI)

    • You are given two short phrases or sentences. The task is to identify whether the first phrase implies the second (this is called entailment), whether the two phrases contradict each other, or whether there is no relation between them.
  • Sentiment Analysis

    • You are given an input text and you must classify the sentiment it expresses as positive, negative or neutral.
  • Semantic Role Labelling (SRL)

    • You are given a short phrase or sentence and the task is to identify the words making up the predicate (i.e. the action or verb) and also agents and recipients, i.e. "who did what to whom".
  • Relation Extraction

    • You are given short phrases or texts and the task is to extract relations such as part-of, causation, location-of, is-a, etc.
  • Goal-oriented Dialogue

    • You are given a knowledge base for the domain and a goal to achieve. The task is to identify the steps towards the goal so that you can interact with users via a conversational interface and achieve that goal in the fewest possible steps.
  • Database Query Generation

    • Given a short phrase and a target structure, the task is to parse the given natural language text into a structured representation such as an SQL query.
  • Pronoun Resolution (also called Anaphora Resolution)

    • Given a series of phrases or sentences, the task is to identify which earlier element of the text each pronoun refers to.


The premise is that models that perform well on multiple tasks at the same time generalize better than models trained for any one individual task.


They model each one of the 10 tasks as a question-answering problem.

For example, to represent relation extraction as a question-answering problem, you formulate a question such as "What is the relation implied between X and Y?" and provide the block of text as the question context.
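
To make the reformulation concrete, here is a minimal sketch of how two of the tasks could be packed into the same (question, context, answer) shape. The class name `QAExample`, the helper functions, the question wording and the sample sentences are my own illustrations, not code or data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAExample:
    """One training example in the shared question-answering format."""
    question: str  # natural-language description of what is being asked
    context: str   # the text the model reads to find or generate the answer
    answer: str    # the target output, always expressed as plain text


def as_relation_extraction(text: str, x: str, y: str, relation: str) -> QAExample:
    # The task identity lives entirely in the question string.
    return QAExample(
        question=f"What is the relation implied between {x} and {y}?",
        context=text,
        answer=relation,
    )


def as_sentiment(text: str, label: str) -> QAExample:
    # Hypothetical question wording; classification labels become text answers.
    return QAExample(
        question="Is this text positive, negative or neutral?",
        context=text,
        answer=label,
    )


examples = [
    as_relation_extraction(
        "The wheel is part of the car.", "the wheel", "the car", "part-of"
    ),
    as_sentiment("I loved this film.", "positive"),
]

for ex in examples:
    print(f"Q: {ex.question}\nC: {ex.context}\nA: {ex.answer}\n")
```

Because every task ends up in the same shape, one model with one input/output interface can be trained on all of them, which is what makes the multi-task setup practical.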

They aggregate the metrics for the individual tasks into a single number they call decaScore.
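
As I understand it from the paper, decaScore is simply the sum of the per-task metrics, each of which lies between 0 and 100, so with 10 tasks the total lies between 0 and 1000. Below is a small sketch of that aggregation. The function name `deca_score`, the task keys and the numbers are made up for illustration and are not results from the paper.

```python
from typing import Mapping


def deca_score(task_metrics: Mapping[str, float]) -> float:
    """Sum per-task metrics, assuming each one is on a 0-100 scale."""
    for task, value in task_metrics.items():
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"metric for {task!r} is outside 0-100: {value}")
    return sum(task_metrics.values())


# Made-up numbers for three tasks, only to show the arithmetic.
scores = {
    "question_answering": 70.0,
    "machine_translation": 20.0,
    "summarization": 25.0,
}

total = deca_score(scores)
print(f"decaScore over {len(scores)} tasks: {total}")
```

One consequence of a plain sum is that tasks whose metrics naturally run high (e.g. accuracy on classification) weigh more in the total than tasks whose metrics run low (e.g. BLEU for translation).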

The model itself is a Bidirectional Long Short-Term Memory (BiLSTM) deep neural network.


  • The multi-task model (MQAN) performs roughly the same as individual models trained for each of the 10 tasks (note that the objective of MQAN is not to increase performance but to improve generalization).

    • However, on one task (Relation Extraction), the multi-task model massively outperforms the model trained individually for that task.
  • The representations learned by the model help it generalize to other tasks it wasn't specifically trained for (e.g. Named Entity Recognition) and to other languages.

  • Their model can also be used for different domains (Transfer Learning) and in cases of zero-shot learning (i.e. when you train your model on some classes but need to predict different classes at test time).


  • They compare their model against other models (Seq2Seq models with different attention strategies).

  • It's very interesting that they managed to represent every possible NLP problem as a question-answering problem. This makes things much simpler w.r.t. modelling and comparing performance across different tasks.

