CVPR 2025

IterIS: Iterative Inference-Solving Alignment for LoRA Merging

Hongxu Chen, Zhen Wang, Runshi Li, Bowei Zhu, Long Chen

Abstract

[Figure 1 panels: (a) Multi-concept customization: a cat LoRA and a barn LoRA merged into a cat+barn adapter, with the example prompt "a [V2] cat standing by a [V1] barn". (b) Multi-Style Caption: LoRA NEG and LoRA POS merged into a NEG+POS adapter, e.g. Caption (NEG): "a group of stupid people playing baseball on a field"; Caption (POS): "a good team of baseball players standing around home plate during a game". (c) Multiple NLP Tasks Integration: LoRA A and LoRA B merged into an A+B adapter, so that a single LLM answers both Question Type A and Question Type B.]

Figure 1. Overview of the application of our method (IterIS) across multiple domains. Our general method is adaptable for merging LoRAs in various contexts. IterIS can be applied to (a) text-to-image diffusion models for multi-concept customization, (b) vision-language models for multi-style caption generation, and (c) large language models for multiple NLP tasks integration.