ICLR2026

Falcon: Fast Proximal Linearization of Normalized Cuts for Unsupervised Image Segmentation

Xiao Zhang, Xiangyu Han, Xiwen Lai, Yao Sun, Pei Zhang, Xia Liu, Konrad Kording

Abstract

Current zero-shot unsupervised segmentation methods based on normalized cuts (NCut) face three key limitations. First, they rely on recursive bipartitions with repeated eigen-decompositions, making them prohibitively expensive at scale. Second, each split requires spectral relaxation followed by rounding, introducing layers of approximation where the final partition may diverge from the true NCut objective. Third, recursive bipartitioning offers no principled assurance of producing a stable K-way segmentation, and existing heuristics lack convergence guarantees. We propose Falcon, a proximal-gradient solver that directly optimizes the discrete K-way NCut objective without spectral relaxation. We prove linear convergence under the Kurdyka-Łojasiewicz (KL) property. Falcon computes closed-form gradient scores weighted by cluster volumes and performs row-wise one-hot proximal updates stabilized by inertia. A monotone backtracking scheme adaptively tunes the proximal parameter, ensuring non-decreasing NCut values. This design preserves discrete feasibility, removes repeated eigen-decomposition, and guarantees convergence. Across six benchmarks, Falcon outperforms the strongest official baseline (DiffCut) by wide margins, e.g., +13.2 mIoU on VOC, +27.7 on COCO-Object, and +3.1 on Cityscapes, while remaining competitive on Pascal Context. It also runs up to an order of magnitude faster than recursive NCut and scales more favorably in memory at high resolution, making it practical for larger token grids. By pairing pretrained foundation models with a principled NCut solver, Falcon sets a new state of the art across six benchmarks and achieves the best performance on 17 of 18 benchmark-encoder pairs, underscoring both its robustness and its generality in bridging the gap between unsupervised and supervised segmentation. 1 Published as a conference paper at ICLR 2026 Despite this progress, most high-performing segmentation systems rely on dense pixel-level annotations, which are expensive and labor-intensive to obtain. This has motivated growing interest in unsupervised and zero-shot segmentation, where the goal is to discover semantically meaningful regions without task-specific mask supervision, often for categories not seen during training Tian et al. ( 2024 ); Couairon et al. (2025); Sick et al. (2024). Existing approaches roughly fall into two groups. One line relies on representation learning and clustering objectives, such as STEGO Hamilton et al. (2022), IIC Ji et al. (2019), and PiCIE Cho et al. (2021), which learn or refine patch-level groupings through self-supervision. Another line combines pretrained visual features with classical grouping principles such as graph partitioning and normalized cuts (NCut), yielding strong training-free segmentation pipelines Wang et al. (2023b;a); Couairon et al. (2025). In this paper, we focus on the training-free zero-shot setting: a frozen pretrained encoder is used only at inference time, without mask annotations, task-specific finetuning, or adaptive self-training. This setting is particularly attractive because it separates representation learning from mask generation and enables segmentation to be performed directly from general-purpose foundation features. Among training-free methods, graph-based segmentation has recently shown strong performance. TokenCut Wang et al. (2023b) and MaskCut Wang et al. (2023a) recursively apply normalized cut to token affinities extracted from self-supervised Vision Transformers, while DiffCut Couairon et al. (2025) further improves connectivity by incorporating diffusion priors. These methods demonstrate that strong pretrained features combined with graph partitioning can already produce competitive unsupervised masks. However, current NCut-based training-free pipelines still face three key limitations. First, they typically rely on recursive bipartitions, where each split requires a new eigen-decomposition; this makes inference increasingly expensive as the token graph grows. Second, they optimize a spectrally relaxed problem and then round the solution back to discrete labels, introducing layers of approximation such that the final partition may deviate from the original discrete K-way NCut objective. Third, recursive bipartitioning provides no principled assurance of producing a stable K-way segmentation, and the resulting heuristics generally come without convergence guarantees. Together, these issues create a gap between the elegant combinatorial objective of NCut and the practical procedures used in modern zero-shot segmentation systems. To address this gap, we propose Falcon, a fast proximal linearization method for directly optimizing the discrete K-way NCut objective. Rather than recursively splitting the graph through spectral bipartitions, Falcon maintains a global K-way one-hot assignment throughout optimization and updates it via closed-form row-wise proximal steps derived from the exact gradient of the objective. A mo