# Paper Summary: Multi-instance multi-label learning for automatic tag recommendation

Last updated:

Please note: this post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.

## WHAT

The authors adapt a method originally used for image classification and apply it to text, for automatic tag recommendation.

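It is an example of multi-instance learning because each sample (i.e. a document) is viewed as a bag of instances (feature vectors, one per segment) rather than as a single feature vector; these bags are then used to train an SVM classifier. It is also multi-label, since each document can receive several tags.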
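For reference, the multi-instance multi-label (MIML) setting can be written down roughly as follows. The notation is mine and follows the usual MIML formulation, not necessarily the paper's: each training document is a bag $X_i$ of instance vectors (one per segment) paired with a set of tags $Y_i$, and the goal is to learn a function from bags to tag sets.

```latex
\{(X_i, Y_i)\}_{i=1}^{m}, \qquad
X_i = \{x_{i1}, \dots, x_{i n_i}\} \subseteq \mathcal{X}, \qquad
Y_i \subseteq \mathcal{Y} = \{y_1, \dots, y_L\}, \qquad
f_{\text{MIML}} : 2^{\mathcal{X}} \to 2^{\mathcal{Y}}
```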

Originally, this was used to represent images as bags of visual elements (see references).

## HOW

Each source document is first split into "segments" using the TextTiling algorithm (with sentence boundaries as the initial candidate break points), so that the document becomes a bag of segments. The bags are then clustered with k-medoids, using the Hausdorff distance as the distance between two bags.
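The clustering step can be sketched as follows. This is a minimal illustration, assuming each bag is already a NumPy array with one row per segment (how segments become feature vectors, e.g. term-frequency vectors, is left open here). `hausdorff` computes the standard max-min Hausdorff distance between two bags, and `k_medoids` is a naive alternating k-medoids with a deterministic farthest-first initialisation. Both function names and the initialisation scheme are my own choices, not necessarily what the paper uses.

```python
import numpy as np


def hausdorff(bag_a: np.ndarray, bag_b: np.ndarray) -> float:
    """Max-min Hausdorff distance between two bags of shape (n_instances, n_features)."""
    # Euclidean distance between every instance of A and every instance of B.
    diff = bag_a[:, None, :] - bag_b[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    # Largest nearest-neighbour distance, taken over both directions.
    return float(max(dists.min(axis=1).max(), dists.min(axis=0).max()))


def k_medoids(bags, k, n_iter=100):
    """Cluster bags into k groups; returns (medoid indices, cluster label per bag)."""
    dist = np.array([[hausdorff(a, b) for b in bags] for a in bags])
    # Assumption: deterministic farthest-first initialisation. Start from the most
    # central bag, then repeatedly add the bag farthest from all chosen medoids.
    medoids = [int(dist.sum(axis=1).argmin())]
    while len(medoids) < k:
        medoids.append(int(dist[:, medoids].min(axis=1).argmax()))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        # Assignment step: each bag joins the cluster of its nearest medoid.
        labels = dist[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue
            # Update step: the member with the smallest total distance to the
            # rest of its cluster becomes the new medoid.
            costs = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[costs.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels
```

Note that this builds the full pairwise distance matrix, i.e. O(n²) Hausdorff computations for n documents, so it is only practical for small collections.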

Each bag is then mapped to a k-dimensional vector, where the i-th element measures how well the bag fits the i-th cluster (i.e. how close it is to that cluster's medoid). This vector is used as the representation of the document, which is then classified using an SVM.
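A sketch of the mapping and classification steps, under stated assumptions: element i of the document vector is taken to be the Hausdorff distance from the bag to the i-th medoid bag, and the multi-label problem is handled by training one binary SVM per tag. To keep the example dependency-light, a linear SVM trained by batch subgradient descent on the L2-regularised hinge loss stands in for whatever SVM implementation and kernel the authors actually used. All function names (`bag_to_vector`, `train_per_tag`, etc.) are hypothetical.

```python
import numpy as np


def hausdorff(bag_a, bag_b):
    """Max-min Hausdorff distance between two bags (rows are instances)."""
    diff = bag_a[:, None, :] - bag_b[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    return float(max(dists.min(axis=1).max(), dists.min(axis=0).max()))


def bag_to_vector(bag, medoid_bags):
    """Assumption: element i is the bag's Hausdorff distance to medoid bag i."""
    return np.array([hausdorff(bag, m) for m in medoid_bags])


def train_linear_svm(X, y, lam=0.01, epochs=500, lr=0.1):
    """Batch subgradient descent on the regularised hinge loss; y is in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples inside or on the wrong side of the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def train_per_tag(X, Y, **kwargs):
    """Assumption: one binary SVM per tag, i.e. per column of the 0/1 matrix Y."""
    return [train_linear_svm(X, np.where(Y[:, j] == 1, 1.0, -1.0), **kwargs)
            for j in range(Y.shape[1])]


def predict_tags(models, X):
    """Return a 0/1 matrix: tag j is predicted when its SVM score is positive."""
    scores = np.column_stack([X @ w + b for w, b in models])
    return (scores > 0).astype(int)
```

For example, with two medoid bags and four toy documents, each document becomes a 2-dimensional distance vector and each tag gets its own classifier. Because these distance features can be large, standardising the columns of `X` before training usually makes the step size easier to choose.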

## CLAIMS

The authors report that the method performs better than baseline multi-label methods such as Binary Relevance (with SVM), ML-kNN and Label Powerset.