WWW2026

EdgeGen: Efficient LLM-Empowered Model Generation with Quantization-Aware NAS

Yingqi Peng, Wenhao Zhou, Kaijie Gong, Yang Liu, Yi Gao, Wei Dong

Abstract

The rapid evolution of the Web of Things (WoT) has created new opportunities for connectivity and standardization across heterogeneous devices, enabling the development of increasingly complex systems. However, edge devices deployed in resource-constrained scenarios face significant challenges: they require lightweight, efficient models that achieve high accuracy while operating within strict memory constraints. Neural Architecture Search (NAS), a typical approach to model generation, has proven effective in automating the search for optimal architectures. However, existing NAS methods suffer from two critical limitations: (1) they do not incorporate quantization into the search space, and so can overlook larger models that would perform better after quantization; and (2) current model evaluation methods struggle to provide accurate assessments within a short time. To address these challenges, we propose EdgeGen, a novel NAS framework that integrates multiple quantization methods into the search process, enabling the discovery of larger models that satisfy the memory constraints only after quantization. EdgeGen employs a multi-beam Monte Carlo Tree Search (MCTS) algorithm and a constraint validator to efficiently explore the expanded search space, which spans a large number of both original and quantized models. Furthermore, EdgeGen evaluates model performance with a GNN-based performance predictor, which provides rapid and precise predictions. Across multiple benchmarks, EdgeGen consistently outperforms state-of-the-art NAS methods. Code is available at: https://doi.org/10.5281/zenodo.18323232.
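
To make the motivation for limitation (1) concrete, the following minimal sketch illustrates how a larger model can fit a memory budget only after quantization. It is not EdgeGen's constraint validator: the names `Candidate`, `weight_bytes`, and `feasible_variants`, the three precisions, and the parameter counts are all hypothetical, and the check counts weight storage only, ignoring activations and runtime overhead.

```python
from dataclasses import dataclass

# Hypothetical precisions in bits per weight; the abstract does not
# specify which quantization methods EdgeGen integrates.
BITS = {"fp32": 32, "fp16": 16, "int8": 8}


@dataclass(frozen=True)
class Candidate:
    """An illustrative candidate architecture, described only by its size."""
    name: str
    num_params: int


def weight_bytes(num_params: int, bits: int) -> int:
    """Bytes needed to store the weights, rounded up to a whole byte."""
    return (num_params * bits + 7) // 8


def feasible_variants(cand: Candidate, budget_bytes: int) -> list[str]:
    """Return the precisions under which the weights fit the budget."""
    return [
        precision
        for precision, bits in BITS.items()
        if weight_bytes(cand.num_params, bits) <= budget_bytes
    ]


budget = 256 * 1024  # an assumed 256 KiB weight budget
small = Candidate("small", 60_000)
large = Candidate("large", 200_000)

print(small.name, feasible_variants(small, budget))
print(large.name, feasible_variants(large, budget))
```

Under these assumed numbers, the small model fits at every precision, while the large model fits only as an 8-bit variant. A search space restricted to unquantized models would therefore never consider the large candidate.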