# Paper Summary: Multi-Label Classification on Tree- and DAG-Structured Hierarchies

Last updated:Please note This post is mainly intended for my

personal use. It is not peer-reviewed work and should not be taken as such.

## WHAT

The authors propose a method for learning from multi-label data where the labels form an *is-a* hierarchy.

## WHY

Other approaches for structured learning in a multi-label context focus on trees only, not DAGs (which are a generalization of trees).

In the few cases they do (e.g. Barutcuoglu and Toryanskaya 2006) they focus on algorithm adaptation methods (rather than problem adaptation methods) which are restricted to a single technique.

## HOW

First they reduce the dimensionality of the label space using a technique called Kernel Density Estimation (KDE) that produces a lower-rank matrix where the depencies in the original matrix are kept. This happens because they use Kernel PCA instead of PCA.

Then they frame the label assignment problem as an optimization problem, with the added constraint that if a label \(l\) is given to an instance, all \( l_p \ | \ l_p \text{ is a parent of } \ l \) must also be given.

They make a few simplifying assumptions and conclude that this problem can be reduced to a well-known problem called the *fractional knapsack problem* ^{1} which in this case can be solved by an algorithm called CSSA (Condensing Sort and Select Algorithm)

## CLAIMS

The state-of-the art in DAG-structured multi-label classification

**was**a method called CLUS-HMC by Vens et al., 2008The method proposed by the authors support both tree-based and DAG-based label hierarchies and outperform CLUS-HMC and Hierarchical-SVM on multiple datasets.

## NOTES

They use an interesting metric called

**AUPRC**(Area under the Precision-Recall Curve) that's analogous to AUC but for precision-recall curves.- This is very interesting for Information Retrieval problems.

1: The Fractional Knapsack Problem is a simplified version of the Knapsack Problem and it can be solved in polynomial time. The original problem is NP-HARD.

### References

Bi and Kwok 2011: Multi-label Classification on Tree- and DAG-Structured Hierarchies

Barutcuoglu and Troyanskaya 2006: Hierarchical multi-label Prediction of Gene Function

Vens et al., 2008: Decision trees for hierarchical multilabel classification