ISSTA 2024

An Empirical Study on Kubernetes Operator Bugs

Qingxin Xu, Yu Gao, Jun Wei


Abstract

Kubernetes is the leading cluster management platform. Within Kubernetes, an operator is an application-specific program that uses the Kubernetes API to automate operational tasks for an application deployed on a Kubernetes cluster. Users declare a desired state for the managed cluster, specifying their configuration preferences, and the operator is responsible for reconciling the cluster's actual state with that desired state. However, the complex, dynamic, and distributed nature of the overall system can introduce operator bugs that lead to severe consequences, e.g., outages and undesired cluster states. In this paper, we conduct the first comprehensive study of 210 operator bugs from 36 Kubernetes operators. For each studied bug, we investigate its root cause, manifestation, impact, and fix. Our study reveals many findings that can guide the detection and testing of operator bugs, as well as the development of more reliable operators.

CCS Concepts

• General and reference → Empirical studies; • Computer systems organization → Distributed architectures; Reliability.
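For readers unfamiliar with the reconciliation model described in the abstract, the following is a minimal sketch of the idea in plain Python. It is not taken from the paper or from any Kubernetes library: the `State` class, the `reconcile` and `apply` functions, and the `create_pod` / `delete_pod` action names are all hypothetical, and the cluster state is reduced to a single replica count held in memory.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical, simplified cluster state: just a replica count."""
    replicas: int


def reconcile(desired: State, actual: State) -> list[str]:
    """Return the actions needed to move `actual` toward `desired`."""
    diff = desired.replicas - actual.replicas
    if diff > 0:
        return ["create_pod"] * diff
    if diff < 0:
        return ["delete_pod"] * (-diff)
    return []  # already converged; nothing to do


def apply(actual: State, actions: list[str]) -> State:
    """Apply actions to an in-memory state (a stand-in for real API calls)."""
    replicas = actual.replicas
    for action in actions:
        replicas += 1 if action == "create_pod" else -1
    return State(replicas)


# The user declares a desired state; the operator observes the actual one.
desired = State(replicas=3)
actual = State(replicas=1)

# One reconciliation pass: compute the difference, then act on it.
actions = reconcile(desired, actual)
actual = apply(actual, actions)
```

A real operator runs this comparison repeatedly against a live, distributed cluster, where observations can be stale and actions can fail partway. That gap between the toy loop and the real system is the kind of setting in which the abstract says operator bugs arise.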