Published as a conference paper at ICLR 2025

An Online Learning Theory of Trading-Volume Maximization

Tommaso Cesari, Roberto Colomboni

Abstract

We explore brokerage between traders in an online learning framework. At any round t, two traders meet to exchange an asset, provided the exchange is mutually beneficial. The broker proposes a trading price, and each trader tries to sell their asset or buy the asset from the other party, depending on whether the price is higher or lower than their private valuation. A trade happens if one trader is willing to sell and the other is willing to buy at the proposed price. Previous work provided guidance to a broker aiming to increase the traders' total earnings by maximizing the gain from trade, defined as the sum of the traders' net utilities after each interaction. This classical notion of reward can be highly unfair to traders with small profit margins, and far from the real-life utility of the broker. For these reasons, we investigate how the broker should behave to maximize the trading volume, i.e., the total number of trades. We model the traders' valuations as an i.i.d. process with an unknown distribution. If the traders' valuations are revealed after each interaction (full feedback) and the cumulative distribution function (cdf) of the traders' valuations is continuous, we provide an algorithm achieving logarithmic regret and show its optimality up to constants. If only their willingness to sell or buy at the proposed price is revealed after each interaction (2-bit feedback), we provide an algorithm achieving poly-logarithmic regret when the traders' valuation cdf is Lipschitz and show its near-optimality. We complement our results by analyzing the implications of dropping the regularity assumptions on the unknown cdf of the traders' valuations. If we drop the continuity assumption, the regret rate degrades to Θ(√T) in the full-feedback case, where T is the time horizon. If we drop the Lipschitz assumption, learning becomes impossible in the 2-bit feedback case.
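The round-by-round mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's formal protocol: the helper names `brokerage_round` and `RoundOutcome` are hypothetical, and the convention that a trader whose valuation equals the price is willing to take either side is an assumption. The sketch contrasts the two rewards (a 0/1 trade indicator, whose sum is the trading volume, versus the gain from trade) and the two feedback models (both valuations versus one bit per trader).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RoundOutcome:
    traded: bool                        # did a trade happen at the posted price?
    gain_from_trade: float              # classical reward: |v - w| if traded, else 0
    full_feedback: Tuple[float, float]  # both valuations revealed
    two_bit_feedback: Tuple[int, int]   # per trader: 1 if willing to buy at p


def brokerage_round(p: float, v: float, w: float) -> RoundOutcome:
    """Hypothetical helper: one round with posted price p and private
    valuations v, w of the two traders."""
    # Assumed tie convention: a trader with valuation x is willing to buy
    # when p <= x and willing to sell when p >= x.
    # A trade happens when one trader sells and the other buys, i.e., when
    # p lies between the two valuations: min(v, w) <= p <= max(v, w).
    traded = min(v, w) <= p <= max(v, w)
    # Gain from trade: the sum of the two traders' net utilities,
    # (p - low) + (high - p), which equals the gap between the valuations.
    gain = (max(v, w) - min(v, w)) if traded else 0.0
    # 2-bit feedback reveals only which side each trader takes at price p.
    bits = (int(p <= v), int(p <= w))
    return RoundOutcome(traded, gain, (v, w), bits)


# Trading volume over several rounds: the number of rounds with a trade.
rounds = [(0.5, 0.2, 0.8), (0.9, 0.2, 0.8), (0.3, 0.4, 0.1)]
volume = sum(brokerage_round(p, v, w).traded for p, v, w in rounds)
```

Away from ties (p different from both valuations), a trade happens exactly when the two bits differ, so under this convention the 2-bit feedback also reveals whether a trade took place.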
1.1 MOTIVATIONS FOR CHOOSING TRADING VOLUME AS REWARD

Previous works have focused entirely on scenarios where brokers aim at maximizing the so-called cumulative gain from trade, i.e., the sum of the traders' net utilities over the entire sequence of interactions with the broker. This classical approach has two pitfalls.

Traders' Perspective. Gain-from-trade maximization can cause unfairness in settings where the majority of traders make a living off small margins (e.g., in micro trading or high-frequency trading) and only a handful of high-payoff trades have the potential to occur. In these cases, gain-from-trade maximization can sacrifice the majority of the population in favor of a small minority of traders who are lucky enough to be paired with counterparts willing to be greatly underpaid for the good on sale. In contrast, trading-volume maximization gives the same dignity to all traders, granting everybody the same opportunity to trade, independently of their buying power. For a striking concrete example of this pitfall, see Section 3.

Broker's Perspective. From the broker's perspective, too, maximizing the gain from trade can be counterproductive: brokers typically earn only when trades occur, so every missed exchange is lost revenue. For example, in settings where traders pay a small fee for each trade, the broker's revenue grows with the number of trades, so its ultimate goal is to maximize trading volume. Another example where maximizing trading volume is superior to maximizing the gain from trade is the one discussed in the Traders' Perspective paragraph (and in Section 3). There, a gain-from-trade maximizing broker would risk alienating the vast majority of the population, who would realistically end up leaving a broker that gives them no trading opportunities, hurting the broker's bottom line.
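To make the Traders' Perspective tangible, here is a small hypothetical illustration (not the example of Section 3). It assumes a made-up discrete valuation distribution in which most traders value the asset at 0 or 0.1 and a few at 1, two i.i.d. traders per round, and a trade whenever the posted price p lies between their valuations. The sketch computes, exactly, by enumerating all valuation pairs, the expected trading volume and expected gain from trade for two candidate prices. The names `VALUATIONS`, `trade_happens`, and `expected_rewards` are illustrative assumptions.

```python
from itertools import product

# Hypothetical valuation distribution (value -> probability): most traders
# have small margins (values 0 or 0.1); a rare few value the asset at 1.
VALUATIONS = {0.0: 0.45, 0.1: 0.45, 1.0: 0.10}


def trade_happens(p, v, w):
    """A trade occurs when the posted price lies between the two valuations."""
    return min(v, w) <= p <= max(v, w)


def expected_rewards(p, dist):
    """Exact expected trading volume and gain from trade at price p for two
    i.i.d. traders drawn from the discrete distribution `dist`."""
    volume, gain = 0.0, 0.0
    for (v, pv), (w, pw) in product(dist.items(), repeat=2):
        if trade_happens(p, v, w):
            volume += pv * pw                      # probability of a trade
            gain += pv * pw * (max(v, w) - min(v, w))  # weighted net utility
    return volume, gain


for p in (0.05, 0.5):
    vol, gft = expected_rewards(p, VALUATIONS)
    print(f"price={p:.2f}  volume={vol:.4f}  gain_from_trade={gft:.4f}")
```

Working the numbers by hand: at p = 0.05 the expected volume is 0.495 and the expected gain from trade is 0.1305, while at p = 0.5 the expected gain from trade rises to 0.171 but the expected volume falls to 0.18, because only pairs involving a rare high-valuation trader can still trade. Between these two prices, a gain-from-trade maximizer would pick the one that shuts out most small-margin traders.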
For these reasons, in this work we aim to provide strategies that maximize the trading volume, i.e., the number of trades in the sequence of interactions between the broker and the traders.

SETTING

In what follows, for any two real numbers a, b, we denote their minimum by a ∧ b and their maximum by a ∨ b. We now describe the brokerage online learning protocol.