WWW2026

Trireme: A Tripartite Regulation Scheme for Diffusion Models

Hengtong Zhang, Chen Ye, Hongzhi Wang

Abstract

Large-scale diffusion models have demonstrated remarkable success across a variety of domains. These models not only perform exceptionally well on their primary tasks but also adapt well to downstream applications through the 'pre-train & fine-tune' paradigm. However, the potential misuse of diffusion models to generate unsafe content has raised significant concerns about their governance and regulation, necessitating robust strategies for preventing unsafe outputs. Despite the urgent demand for mitigation techniques, a significant challenge persists: once a model is distributed for local deployment or fine-tuning, the model provider and third-party regulators relinquish control over the model's behavior. To address this challenge, we propose Trireme, a tripartite interactive regulatory scheme that enforces unsafe output prevention for diffusion models. Trireme involves a third-party regulator that monitors the results of intermediate steps throughout both the fine-tuning and sampling phases of the diffusion model. To regulate model behavior, Trireme introduces regulatory signatures and adds regulatory layers to vanilla diffusion models. These two components are trained to be tightly coupled, allowing the regulator to control the sampling process of diffusion models using different types of signatures. Through extensive experiments, we demonstrate that the proposed regulatory scheme surpasses existing approaches in preventing the generation of inappropriate content. Moreover, it remains effective even when confronted with multiple adversarial attacks.