AAAI 2025

LLM Attributor: Interactive Visual Attribution for LLM Generation

Seongmin Lee, Zijie J. Wang, Aishwarya Chakravarthy, Alec Helbling, Shengyun Peng, Mansi Phute, Duen Horng (Polo) Chau, Minsuk Kahng

Cited by 15


Abstract

Figure 1: LLM ATTRIBUTOR enables LLM developers to visualize the training data attribution of their models in computational notebooks. In this example, our user Megan is surprised that an LLM fine-tuned on a disaster dataset occasionally generates dry weather as the cause of the 2023 Hawaii wildfires, while often yielding directed-energy weapons as in a conspiracy theory. (A) Tokens being attributed, which are interactively selected by Megan, are displayed side-by-side for visual comparison. (B) Training data points with the highest attribution scores are presented as a list by default, and can be interactively expanded to the full source text, revealing that the data point most responsible for generating directed-energy weapons is an X post that spreads the conspiracy theory. (C) Keyword Summary shows important words in the displayed training data. (D) Score Distribution over the entire training data is visualized as a histogram, enabling both high-level comparison over the entire data and low-level analysis focusing on individual data points. Below, the training data points with the lowest attribution scores are visualized in the same way.
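The views described in the caption (the highest- and lowest-scoring training data points, the keyword summary, and the score histogram) can all be derived from one attribution score per training data point. The following is a minimal sketch under stated assumptions, not the LLM Attributor API: the function names `top_and_bottom`, `histogram`, and `keyword_summary` are hypothetical, the scores are assumed to be precomputed, and the keyword step uses a plain word-frequency count as a stand-in for whatever weighting the tool actually applies.

```python
from collections import Counter

# Hypothetical helpers for illustration only; not part of LLM Attributor.

def top_and_bottom(scores, k=3):
    """Return indices of the k highest- and k lowest-scoring training points."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k], order[::-1][:k]

def histogram(scores, num_bins=5):
    """Count scores in equal-width bins spanning [min(scores), max(scores)]."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / num_bins or 1.0  # fall back to 1.0 if all scores are equal
    counts = [0] * num_bins
    for s in scores:
        # Clamp so that the maximum score lands in the last bin.
        idx = min(int((s - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts

def keyword_summary(texts, n=3,
                    stopwords=frozenset({"the", "a", "of", "in", "and", "to", "is"})):
    """Return the n most frequent non-stopword tokens (assumed frequency count)."""
    words = Counter(
        w for t in texts for w in t.lower().split() if w not in stopwords
    )
    return [w for w, _ in words.most_common(n)]
```

Given a list of scores aligned with the training texts, `top_and_bottom` would supply the indices for the two lists of data points, `histogram` the bar heights for the score distribution, and `keyword_summary` the words shown for whichever texts are currently displayed.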