Extreme Multi-Label (XML) problems, and in particular XML completion (the task of predicting the missing labels of an entity), have attracted significant attention in the past few years. Most XML completion problems can naturally leverage a label hierarchy, which can be represented as a tree that encodes the relations between the different labels. In this paper, we propose a new algorithm, HECTOR (Hierarchical Extreme Completion for Text based on TransfORmer), to solve XML completion problems more effectively. HECTOR operates by directly predicting paths in the label tree rather than individual labels, thus taking advantage of the information encoded in the hierarchy. Because these paths are sequential, HECTOR can leverage the Transformer architecture to outperform state-of-the-art XML completion methods. We compare HECTOR with several state-of-the-art XML completion methods on various completion problems, and in particular on label refinement, i.e., the scenario where only the coarse labels (the first few top levels of a taxonomy) are observed. Empirical results on three real-world datasets show that our method significantly outperforms the state of the art, with HECTOR frequently outperforming previous techniques by more than 10% according to multiple metrics.
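To make the idea of predicting paths rather than individual labels more concrete, the following minimal sketch shows how a label set can be rewritten as root-to-label paths in a taxonomy tree, and how the label refinement setting keeps only the coarse top levels of each path. It is an illustration under stated assumptions, not the authors' implementation: the toy taxonomy and the helpers `path_to_root`, `labels_to_paths`, and `coarse_prefix` are hypothetical names introduced here.

```python
# Toy taxonomy stored as a child -> parent map.
# All label names are illustrative assumptions, not taken from the paper's datasets.
PARENT = {
    "science": None,
    "physics": "science",
    "optics": "physics",
    "biology": "science",
    "genetics": "biology",
}


def path_to_root(label, parent=PARENT):
    """Return the path from the root down to `label`, inclusive."""
    path = []
    node = label
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]  # reverse so the root comes first


def labels_to_paths(labels, parent=PARENT):
    """Encode a label set as a sorted list of root-to-label paths."""
    return sorted(path_to_root(label, parent) for label in labels)


def coarse_prefix(path, depth):
    """Keep only the top `depth` levels of a path (the observed coarse labels)."""
    return path[:depth]


# Target sequences a path-based model would be trained to produce.
full_paths = labels_to_paths({"optics", "genetics"})

# Label refinement input: only the first two taxonomy levels are observed.
observed = [coarse_prefix(p, 2) for p in full_paths]
```

Because each path is an ordered sequence of labels, a sequence model such as a Transformer can in principle be trained to extend an observed prefix like `["science", "physics"]` toward its missing deeper labels.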

Research Paper:

Title: Follow the Path: Hierarchy-Aware Extreme Multi-Label Completion for Semantic Text Tagging
Authors: Natalia Ostapuk, Julien Audiffren, Ljiljana Dolamic, Alain Mermoud, Philippe Cudré-Mauroux
Source: Proceedings of the ACM on Web Conference 2024