ACL 2024

SocialBench: Sociality Evaluation of Role-Playing Conversational Agents

Hongzhan Chen, Hehong Chen, Ming Yan, Wenshen Xu, Gao Xing, Weizhou Shen, Xiaojun Quan, Chenliang Li, Ji Zhang, Fei Huang

Abstract

Large language models (LLMs) have advanced the development of various AI conversational agents, including role-playing agents that mimic diverse characters and human behaviors. While prior research has predominantly focused on enhancing the conversational capability, role-specific knowledge, and style of these agents, there has been a noticeable gap in assessing their social intelligence. In this paper, we introduce SocialBench, the first benchmark designed to systematically evaluate the sociality of role-playing agents at both the individual and group levels of social interaction. SocialBench is constructed from various sources and covers 500 characters, over 6,000 question prompts, and 30,800 multi-turn role-playing utterances. We conduct comprehensive evaluations on this benchmark using mainstream LLMs. We find that agents excelling at the individual level do not necessarily demonstrate proficiency at the group level. Experimental results on SocialBench confirm its significance as a testbed for assessing the social interaction of role-playing agents. The benchmark is publicly accessible at https://github.com/X-PLUG/RoleInteract.