ACL2024
Label Augmentation for Zero-Shot Hierarchical Text Classification
Lorenzo Paletto, Valerio Basile, Roberto Esposito
Abstract
Hierarchical Text Classification poses the difficult challenge of classifying documents into multiple labels organized in a hierarchy. The vast majority of works addressing this problem rely on supervised methods, which are difficult to implement due to the scarcity of labeled data in many real-world applications. This paper focuses on strict Zero-Shot Classification, the setting in which the system lacks both labeled instances and training data. We propose a novel approach that uses a Large Language Model to augment the deepest layer of the label hierarchy in order to enhance its specificity. We achieve this by generating semantically relevant labels as children connected to the existing branches, creating a deeper taxonomy that better overlaps with the input texts. We leverage the enriched hierarchy to perform Zero-Shot Hierarchical Classification using the Upward score Propagation technique. We test our method on four public datasets, obtaining new state-of-the-art results on three of them. We introduce two cosine similarity-based metrics to quantify the density and granularity of a label taxonomy, and we show a strong correlation between the metric values and the classification performance of our method on the datasets.

an industrial framework, of manually annotating data samples. Moreover, the structure of a hierarchy can change over time and gain or lose classes, which can potentially result in additional costs to reorganize existing data and retrain models. For these reasons, researchers have turned their attention to Few-Shot (Snell et al., 2017) and Zero-Shot Classification (ZSC) (Song and Roth, 2014) settings, in which only a few or no annotated documents are given at training time. In this paper we focus on strict ZSC: a highly constrained scenario where not even unlabeled training instances are given to the system.
In the context of text classification, the standard approach to a strict ZSC problem is to transform it into a textual entailment task solved by an LLM such as BART-MNLI (Yin et al., 2019; Williams et al., 2017). In this approach, the LLM is asked to determine whether a premise sentence (the text to be classified) semantically entails a hypothesis sentence (the class to be predicted). Even without fine-tuning, these models are able to classify documents into unseen classes with a high degree of success. Another approach to the ZSC task is to use label embeddings as prototypes or centroids for a 1-Nearest Neighbor classification problem (Snell et al., 2017; Liu et al., 2023). Input texts are vectorized in the same embedding space as the labels, and the corresponding class is determined via a distance or similarity metric such as cosine similarity. While this is a natural approach, it has been criticized (Bongiovanni et al., 2023; Rondinelli et al., 2022) on the basis that it looks for similarities between single words or short phrases and long, complex texts. Furthermore, documents with a very high level of detail may be difficult for models to classify accurately. For instance, a product review categorized under "strollers" might solely discuss the instability expe-

The main contributions of our paper are:
1. the introduction of a novel technique to augment a label hierarchy;
2. the extension of the UP technique to support improvements in the classification of the leaves of a given hierarchy;
3. the definition of a metric measuring cluster density, which correlates with how well the newly proposed method works;
4. an assessment of the given technique, comparing the newly introduced method with the state-of-the-art.
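As a minimal illustration of the embedding-based approach to ZSC discussed above, the following sketch treats each label as a prototype and assigns a document to the label with the highest cosine similarity. The bag-of-words `embed` function is only a toy stand-in for a real sentence encoder, and the names `embed`, `cosine`, `classify`, and the example labels are hypothetical choices made for illustration, not part of any cited system.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(text: str, labels: list[str]) -> str:
    """1-Nearest Neighbor over label prototypes: pick the most similar label."""
    doc = embed(text)
    return max(labels, key=lambda label: cosine(doc, embed(label)))


# Hypothetical label set and input text, for illustration only.
labels = ["baby strollers", "car seats", "baby toys"]
prediction = classify("the car seats were easy to install", labels)
```

With this toy encoder, a document that shares no words with a label scores zero against it, which mirrors the criticism that short labels can match long, detailed texts poorly.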
The rest of the paper is organized as follows: in Section 2 we review relevant related work. In Section 3 we summarize the UP procedure and present our novel method and two metrics to examine taxonomy structures. We then discuss the experiments we performed and their results in Sections 4 and 5. Finally, we draw brief conclusions in Section 6.

2 Related Works

Many past works have studied the problem of HTC (Silla and Freitas, 2011) and have found solutions based on Machine Learning models such as Decision Trees (Vens et al., 2008) or Support Vector Machines (Dekel et al., 2004). Since the advent of Transformers (Vaswani et al., 2017) more

our work is strongly directed towards improving them.

3 Method

In this section, we provide an overview of the UP method as outlined in Bongiovanni et al. (2023). Subsequently, we present our proposed label augmentation method, designed to enhance a given label hierarchy by adding an additional level of labels. Afterwards, we illustrate the UP application to the novel hierarchy g