NeurIPS 2021

L2ight: Enabling On-Chip Learning for Optical Neural Networks via Efficient in-situ Subspace Optimization

Jiaqi Gu, Hanqing Zhu, Chenghao Feng, Zixuan Jiang, Ray T. Chen, David Z. Pan

Abstract

Silicon-photonics-based optical neural networks (ONNs) are a promising hardware platform that could represent a paradigm shift in efficient AI, owing to their CMOS compatibility, flexibility, ultra-low execution latency, and high energy efficiency. In-situ training on online programmable photonic chips is appealing but still faces challenges in on-chip implementability, scalability, and efficiency. In this work, we propose L2ight, a closed-loop ONN on-chip learning framework that enables scalable ONN mapping and efficient in-situ learning. L2ight adopts a three-stage learning flow that first calibrates the complicated photonic circuit states under challenging physical constraints, then performs photonic core mapping via combined analytical solving and zeroth-order optimization. A subspace learning procedure with multi-level sparsity is integrated into L2ight to enable in-situ gradient evaluation and fast adaptation, unleashing the power of optics for real on-chip intelligence. Extensive experiments demonstrate that L2ight outperforms prior ONN training protocols with three orders of magnitude higher scalability and over 30× better efficiency when benchmarked on various models and learning tasks. This synergistic framework is the first scalable on-chip learning solution that pushes this emerging field from intractable to scalable and further to efficient for next-generation self-learnable photonic neural chips. From a co-design perspective, L2ight also provides essential insights for hardware-restricted unitary subspace optimization and efficient sparse training. We open-source our framework at link.

However, robustness and trainability are still critical issues for photonic AI engines [57, 21, 59].
Due to the analog computing nature of ONNs, photonic DNN models inevitably suffer from performance degradation or even complete malfunction [57, 59] in the presence of manufacturing errors, non-ideal device controls, and undesired circuit noise, as shown in Figure 1(b). Although non-ideal effects can be simulated and accounted for during software training [57, 21] to improve noise tolerance, the variation simulation is physically inaccurate (especially with unknown process variations) and prohibitively expensive, as shown in Figure 1(c).
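
To make the sensitivity to non-ideal device control concrete, the following minimal sketch perturbs the phases of a single idealized Mach-Zehnder interferometer (MZI) and measures how far the realized 2×2 transfer matrix drifts from the intended one. The device model (two ideal 50:50 couplers, one internal and one external phase shifter), the Gaussian phase noise, and the names `mzi`, `frobenius_error`, and `mean_error` are illustrative assumptions, not the noise model or code used in this work.

```python
import cmath
import math
import random


def matmul2(a, b):
    """Multiply two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mzi(theta, phi):
    """Transfer matrix of an idealized MZI (assumed model):
    coupler * internal phase(theta) * coupler * external phase(phi)."""
    s = 1 / math.sqrt(2)
    coupler = [[s, 1j * s], [1j * s, s]]            # ideal 50:50 coupler
    inner = [[cmath.exp(1j * theta), 0], [0, 1]]    # internal phase shifter
    outer = [[cmath.exp(1j * phi), 0], [0, 1]]      # external phase shifter
    return matmul2(coupler, matmul2(inner, matmul2(coupler, outer)))


def frobenius_error(a, b):
    """Frobenius norm of the difference between two 2x2 matrices."""
    return math.sqrt(sum(abs(a[i][j] - b[i][j]) ** 2
                         for i in range(2) for j in range(2)))


def mean_error(theta, phi, sigma, trials=2000, seed=0):
    """Average deviation from the ideal matrix under Gaussian phase noise
    with standard deviation sigma (radians) on both phase shifters."""
    rng = random.Random(seed)
    ideal = mzi(theta, phi)
    total = 0.0
    for _ in range(trials):
        noisy = mzi(theta + rng.gauss(0, sigma), phi + rng.gauss(0, sigma))
        total += frobenius_error(ideal, noisy)
    return total / trials


if __name__ == "__main__":
    for sigma in (0.0, 0.01, 0.05, 0.1):
        print(f"sigma={sigma:.2f} rad  mean error={mean_error(0.7, 0.3, sigma):.4f}")
```

In this toy setting the deviation grows with the phase-noise level. In a deep mesh of such devices the errors of many cascaded MZIs compound, which is one intuition for why a model trained purely in software can degrade once it is deployed on a noisy chip.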