ACL2022
So Different Yet So Alike! Constrained Unsupervised Text Style Transfer
Abhinav Ramesh Kashyap, Devamanyu Hazarika, Min-Yen Kan, Roger Zimmermann, Soujanya Poria
Abstract
Transferring text from one domain to another has seen tremendous progress in the recent past. However, these methods do not aim to explicitly maintain constraints such as similar text length and descriptiveness between the source and the translated text. To this end, we introduce two complementary cooperative losses to the generative adversarial network family. Here, both the generator and the critic reduce the contrastive and/or the classification loss, aiming to satisfy the constraints. These losses allow lexical, syntactic, and domain-specific consistencies to persist across domains. We demonstrate the effectiveness of our method over multiple benchmark datasets, both with single and multi-attribute transfers. The complementary cooperative losses also improve text quality across datasets as judged by current automated generation metrics and human evaluation.
... is no explicit guarantee that the source and the transferred sentence will have a similar length or remain descriptive. Figure 1 shows one such example, where a sentence from the books domain is translated to the movie domain. While the translated sentence "Loved the Movie" is an accepted text style transferred sentence because it retains much of the content and has the target attribute, it does not have the same length, does not have a personal pronoun ("I"), and does not have a domain-appropriate proper noun. Comparatively, the higher-fidelity transfer "I absolutely enjoyed Spielberg's direction" maintains constraints of identity and has the personal pronoun, along with a domain-appropriate proper noun.
Enforcing such constraints of identity can help maintain the brand identity when product descriptions are mapped from one commercial product to another. They can also help in data augmentation for downstream domain adaptation NLP applications.
Such constraints of identity are explored extensively in the computer vision task of cross-domain image generation. Taigman et al. (2017) translate human faces to an emoji while maintaining the identity of a face, but these issues are relatively unexplored in NLP.
In this work, we map text between two domains with a focus on maintaining constraints of identity between them. Current methods in text style transfer aim to maintain the "content" and transfer the "attribute". They neither aim to nor have mechanisms for explicitly enforcing such constraints ...
... produce smooth latent spaces, making it easy to sample and generate text with desired properties. Inspired by Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), ARAEs (Zhao et al., 2018b) are one such class of generative latent variable models that has been widely adopted in unsupervised text generation (Huang et al., 2020) and topic modeling (Hu et al., 2020), among others. The general framework consists of a deterministic auto-encoder with an encoder $enc_\theta : X \to Z$ that encodes text $x \in X$ into a latent representation $z \sim P_z$, and a decoder $dec_\phi : Z \to X$ that decodes (generates) text conditioned on the latent representations. ARAE regularizes the latent space utilizing a GAN-like setup. A sample $s$ is first drawn from a simple prior, such as a Gaussian $\mathcal{N}(0,1)$, and a generator $g_\psi : \mathcal{N}(0,1) \to Z$ maps it to a realistic distribution. A critic $C_\xi : Z \to \mathbb{R}$ distinguishes between real and generated samples. The generator is trained to fool the critic, while the critic is trained to distinguish the real from the generated text. This results in a min-max optimization which implicitly minimizes the JS-Divergence between the two distributions $P_z$ (real latent codes) and $P_{\tilde{z}}$ (generated latent codes).
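The quantity that the critic and the generator compete over can be estimated from samples. The following is a minimal sketch, not the authors' implementation: it assumes a one-dimensional latent space, and the linear `generator` and `critic` functions, their parameters, and the placeholder distribution for the real codes are all hypothetical stand-ins for $g_\psi$, $C_\xi$, and the encoder outputs.

```python
import random


def generator(s, weight=2.0, bias=1.0):
    """Hypothetical stand-in for g_psi: maps a noise sample s to a latent code."""
    return weight * s + bias


def critic(z, slope=0.5):
    """Hypothetical stand-in for C_xi: maps a latent code to a real-valued score."""
    return slope * z


def critic_objective(real_codes, generated_codes, critic_fn):
    """Monte Carlo estimate of E[C(z)] over real codes minus E[C(z~)] over generated codes."""
    real_mean = sum(critic_fn(z) for z in real_codes) / len(real_codes)
    fake_mean = sum(critic_fn(z) for z in generated_codes) / len(generated_codes)
    return real_mean - fake_mean


rng = random.Random(0)

# "Real" latent codes: drawn from N(3, 1) purely as a placeholder for
# encoder outputs enc_theta(x); a real model would encode text here.
real_codes = [rng.gauss(3.0, 1.0) for _ in range(1000)]

# Generated codes: noise s ~ N(0, 1) pushed through the generator.
generated_codes = [generator(rng.gauss(0.0, 1.0)) for _ in range(1000)]

# The critic would be updated to increase this gap, the generator to decrease it.
gap = critic_objective(real_codes, generated_codes, critic)
print(f"estimated critic objective: {gap:.3f}")
```

In this toy setting the gap shrinks toward zero as the generated codes come to resemble the real ones, which is the behavior the min-max training is meant to encourage.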
$$\min_{\psi} \max_{\xi} \; \mathbb{E}_{z \sim P_z}[C_\xi(z)] - \mathbb{E}_{\tilde{z} \sim P_{\tilde{z}}}[C_\xi(\tilde{z})]$$
The training involves (a) reducing the auto-encoder loss, which tries to reconstruct the input, encourages copying behavior, and maintains semantics similar to the original text (Eq. 1), (b) optimizing the critic to distinguish between real and fake samples (Eq. 2), and (c) training the encoder to fool the critic (Eq. 3).
L_ae(z; θ ...
2.2 Architecture
Base Model (DCT-ARAE): The main idea of the base model architecture (Figure 2a) is to replace the noise sampling mechanism with an encoder that encodes text from $T$. Instead of sampling $s$ from a noise distribution like $\mathcal{N}(0,1)$ and passing it through a generator $g_\psi$, we replace it with an encoder $enc_\psi$ ...