Paper Summary: Multi-instance multi-label learning for automatic tag recommendation

Last updated: 05 Nov 2017

Please note: this post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.

WHAT

The authors adapt a method used in image classification to text, in order to recommend tags for documents.

It is an example of multi-instance learning because each sample (i.e. a document) is viewed as a bag of instances rather than as a single feature vector; a representation derived from these bags is then used to train an SVM classifier.

Originally, this approach was used to represent images as bags of visual elements (see References).

HOW

The source document is first split into "segments" using the TextTiling algorithm (with sentence boundaries as the initial candidates), so that each document becomes a "bag" of segments. The bags are then clustered with k-medoids, using the Hausdorff distance between bags.
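A minimal sketch of the clustering step, to make the idea concrete. It is not the paper's implementation. It assumes each segment has already been turned into a numeric vector (the coordinates below are made up), it uses the standard maximal Hausdorff distance (the paper may use a variant), and `hausdorff` and `k_medoids` are hypothetical helper names. It needs Python 3.8+ for `math.dist`.

```python
import math

def hausdorff(bag_a, bag_b):
    """Maximal Hausdorff distance between two bags of instance vectors."""
    def directed(src, dst):
        # Farthest that any instance in src is from its nearest instance in dst.
        return max(min(math.dist(s, d) for d in dst) for s in src)
    return max(directed(bag_a, bag_b), directed(bag_b, bag_a))

def k_medoids(bags, k, max_iter=20):
    """Naive k-medoids over bags; returns the indices of the k medoid bags."""
    n = len(bags)
    dist = [[hausdorff(bags[i], bags[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # assumption: deterministic start from the first k bags
    for _ in range(max_iter):
        # Assignment step: attach every bag to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        # Update step: the new medoid minimises total distance within its cluster.
        updated = []
        for m, members in clusters.items():
            if not members:
                # Keep the old medoid if its cluster ended up empty (e.g. duplicate bags).
                updated.append(m)
            else:
                updated.append(min(members, key=lambda c: sum(dist[c][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids

# Toy data: four "documents", each a bag of two 2-d segment vectors (made up).
bags = [
    [(0, 0), (0, 1)],
    [(10, 10), (10, 11)],
    [(0, 0.5), (1, 0)],
    [(11, 10), (10, 10.5)],
]
print(k_medoids(bags, k=2))
```

Because a medoid is always one of the bags themselves, only pairwise distances between bags are needed, which is why this works with a set-to-set distance like Hausdorff. Starting from the first k bags is the crudest possible initialisation and can get stuck in a poor local optimum; a real implementation would pick the starting medoids more carefully.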

Each bag is mapped to a k-dimensional vector, where the i-th element measures how well the bag fits the i-th cluster. This vector is used as the representation of the document, which is then classified using an SVM.
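The mapping can be sketched as below. This assumes that "fit" is measured as the Hausdorff distance from the bag to each cluster's medoid (smaller means a better fit), which is my reading rather than something spelled out above. `bag_to_vector` is a hypothetical helper name and the coordinates are made up. It needs Python 3.8+ for `math.dist`.

```python
import math

def hausdorff(bag_a, bag_b):
    """Maximal Hausdorff distance between two bags of instance vectors."""
    def directed(src, dst):
        return max(min(math.dist(s, d) for d in dst) for s in src)
    return max(directed(bag_a, bag_b), directed(bag_b, bag_a))

def bag_to_vector(bag, medoid_bags):
    """Map one bag to a k-dimensional vector: its distance to each of the k medoids."""
    return [hausdorff(bag, medoid) for medoid in medoid_bags]

# Two medoid bags (assumed to come from an earlier clustering step), so k = 2.
medoid_bags = [
    [(0, 0), (0, 1)],
    [(10, 10), (10, 11)],
]

new_document = [(0, 0.5), (1, 0)]
features = bag_to_vector(new_document, medoid_bags)
print(features)  # one number per cluster; this is what the SVM would be trained on
```

After this step every document is an ordinary fixed-length vector, so any standard single-instance classifier can take over from here.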

CLAIMS

According to the paper, the method performs better than baseline multi-label methods such as Binary Relevance (with SVM), ML-kNN and Label Powerset.
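For context, two of these baselines are problem transformations: Binary Relevance trains one yes/no classifier per tag, while Label Powerset treats every distinct set of tags as a single class. A rough sketch of how the training targets are built in each case; the function names and tag data are made up for illustration, and the classifiers themselves are left out.

```python
def binary_relevance_targets(label_sets):
    """One binary target list per label: 1 if the sample carries that label, else 0."""
    labels = sorted(set().union(*label_sets))
    return {label: [1 if label in s else 0 for s in label_sets] for label in labels}

def label_powerset_targets(label_sets):
    """One multi-class target per sample: each distinct label set gets its own class id."""
    class_ids = {}
    targets = []
    for s in label_sets:
        key = frozenset(s)
        if key not in class_ids:
            class_ids[key] = len(class_ids)  # ids assigned in order of first appearance
        targets.append(class_ids[key])
    return targets, class_ids

# Made-up tag sets for four documents.
tag_sets = [{"python", "ml"}, {"ml"}, {"python", "ml"}, {"web"}]

print(binary_relevance_targets(tag_sets))
print(label_powerset_targets(tag_sets)[0])
```

Binary Relevance ignores any dependence between tags, while Label Powerset can only ever predict tag combinations it saw during training.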

References

Shen et al. 2009: Multi-instance multi-label learning for automatic tag recommendation
- This paper.
Zhou and Zhang 2006 (NIPS): Multi-Instance Multi-Label Learning with Application to Scene Classification
- This is one of the earliest papers on the subject of multi-instance multi-label learning.
Hearst 1997: TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages
- TextTiling is used to segment a document.

Felipe 05 Nov 2017 05 Nov 2017 paper-summary multi-label tags
