ACL2024
Measuring Bargaining Abilities of LLMs: A Benchmark and A Buyer-Enhancement Method
Tian Xia, Zhiwei He, Tong Ren, Yibo Miao, Zhuosheng Zhang, Yang Yang, Rui Wang
Abstract
Bargaining is an important and unique part of negotiation between humans. As LLM-driven agents learn to negotiate and act like real humans, how to evaluate agents' bargaining abilities remains an open problem. For the first time, we formally describe the Bargaining task as an asymmetric incomplete information game, defining the gains of the Buyer and Seller in multiple bargaining processes. This allows us to quantitatively assess an agent's performance in the Bargaining task. We collect a real product price dataset, AmazonHistoryPrice, and evaluate various LLM agents' bargaining abilities. We find that playing the Buyer is much harder than playing the Seller, and that increasing model size cannot effectively improve the Buyer's performance. To address this challenge, we propose a novel approach called OG-Narrator that integrates a deterministic Offer Generator to control the price range of the Buyer's offers, and an LLM Narrator to create natural language sentences for the generated offers. Experimental results show that OG-Narrator improves the Buyer's deal rates from 26.67% to 88.88% and brings a tenfold increase in profits on all baselines, even a model that has not been aligned.
[...] unsuccessful negotiations or unreasonable bargaining could cause losses for users and unpredictable behaviors of agents in a virtual community. It is imperative to develop agents who can effectively perform price bargaining tasks to help users negotiate prices without losses and even help create a prosperous community of autonomous agents.
However, an unanswered question remains: whether the existing zero-shot capabilities (Kojima et al., 2023) of Large Language Models (LLMs) are sufficiently robust to support AI agents acting as buyers or sellers, engaging in reasonable, efficient, and high-yield bargaining with other LLMs or humans [...]
[...] mirrors the human consumers' distribution of online shopping in the real world, as seen in Figure 2.
Prices. Website records for each item include the historical lowest and highest prices, as well as the current price and corresponding dates. Product prices span a wide range, from 0 to 4500 USD, as illustrated in Figure 2. The price history for some products dates back to 2009.
Additional Context. Additionally, we have gathered descriptions, feature introductions, and pictures for the respective items (Figure 8). This supplementary multi-modal information can provide AI agents with both textual and visual context.
3 A Benchmark for Bargaining Task
In this section, we first elaborate on the detailed definitions of the Bargaining task. Second, we show the whole bargaining process. Third, we describe the metrics of the Bargaining task to measure the bargaining ability of an agent in consideration of the two different kinds of scenarios.
3.1 Task Definition
Agent Bargaining Task. The task involves two agents, the Buyer and the Seller. Each aims to optimize its profit in every single session. Rational decision-making agents, whether Buyer or Seller, should not accept transactions resulting in negative profit. So, the Buyer would like a deal price lower than his budget, and the Seller prefers a deal price higher than the cost. However, the Buyer is unaware of the Seller's cost, and vice versa.
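
The rationality constraint above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the function names are hypothetical, the profit formulas (budget minus deal price for the Buyer, deal price minus cost for the Seller) are inferred from the description above, and treating a zero-profit deal as acceptable is an assumption based on the wording "negative profit".

```python
def buyer_profit(budget: float, deal_price: float) -> float:
    """Buyer's profit (assumed form): positive when the deal price is below the budget."""
    return budget - deal_price


def seller_profit(cost: float, deal_price: float) -> float:
    """Seller's profit (assumed form): positive when the deal price is above the cost."""
    return deal_price - cost


def buyer_accepts(budget: float, offer: float) -> bool:
    """A rational Buyer rejects any offer that would give negative profit."""
    return buyer_profit(budget, offer) >= 0


def seller_accepts(cost: float, offer: float) -> bool:
    """A rational Seller rejects any offer that would give negative profit."""
    return seller_profit(cost, offer) >= 0


if __name__ == "__main__":
    budget, cost = 100.0, 60.0  # hypothetical private values B and C
    for offer in (50.0, 80.0, 120.0):
        print(offer, buyer_accepts(budget, offer), seller_accepts(cost, offer))
```

With these hypothetical values, only offers between the cost and the budget are acceptable to both sides. Each check takes only that side's own private value as input, since neither agent observes the other's.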
Therefore, agents should predict the counterpart's private information based on the dialogue and combine it with their own information to decide the next move in each turn.
Bargaining Process. Our bargaining process is a variant of the Rubinstein bargaining model (Rubinstein, 1982). To formally articulate the Bargaining problem between agents, we define the relevant concepts in Table 5 and the variables in Table 1. Algorithm 1 gives brief pseudocode for the process, and Figure 1 illustrates it more vividly.
[...] scenarios, B ≤ C, according to Equation (2), one side's utility u > 0 if and only if P < 0, which is inconsistent and counter-intuitive.
Metrics. Normalized profit P′ satisfies the constraints of Rubinstein's model and can be compared across the two types of scenarios,
$$P'_b = \frac{B - D}{|B - C|}, \qquad P'_s = \frac{D - C}{|B - C|}. \tag{3}$$
Supposing D exists, when B > C, normalized profit P′ is positively correlated with profit P, and when B < C, P′ is negatively correlated with profit P. To prevent division-by-zero errors, in the case of B = C, we set B = C − σ (σ is a small offset). The sum of the Buyer's profits and the Seller's profits is definite in all scenarios,