CCS2023

Compact Frequency Estimators in Adversarial Environments

Sam A. Markelon, Mia Filic, Thomas Shrimpton

4 citations

Abstract

We develop an asymptotic theory of adversarial estimators ('A-estimators'). They generalize maximum-likelihood-type estimators ('M-estimators') as their average objective is maximized by some parameters and minimized by others. This class subsumes the continuous-updating Generalized Method of Moments, Generative Adversarial Networks and more recent proposals in machine learning and econometrics. In these examples, researchers state which aspects of the problem may in principle be used for estimation, and an adversary learns how to emphasize them optimally. We derive the convergence rates of A-estimators under pointwise and partial identification, and the normality of functionals of their parameters. Unknown functions may be approximated via sieves such as deep neural networks, for which we provide simplified low-level conditions. As a corollary, we obtain the normality of neural-net M-estimators, overcoming technical issues previously identified by the literature. Our theory yields novel results about a variety of A-estimators, providing intuition and formal justification for their success in recent applications. * 2 X + P(x ∈ X ) for any X ⊂ X . In the case of A0a), we set X = X and use Lemma B.2 to obtain sup ), which follows by applying the Lipschitzness of m in θ and the tower-property of akin to the proof of 4.1. Assumption A3 can be verified for the Euclidean λ via a Taylor expansion, Off-Policy Reinforcement Learning Next, we will use our theory to the extend the known results about SBEED, the off-policy RL algorithm of Dai et al. [2018] introduced in Section 2.3. Theorem 3.2 makes it easy to obtain the convergence rates of the corresponding A-estimator: value reminder around λ = λ θ * , yielding for some λ θ on a path between λ θ * and λ: where we used ∇ λ θ * →λ-λ θ * E[l(θ, λ θ * , Y )] ≡ 0 to get rid of the first-order term. The last line follows from the boundedness (in absolute value) of f ′′ * (•) and w(•) on their compact support. Hence A6i) is satisfied. A7i) follows from: where the last step used the Lipschitzness of f ′ * and again the boundedness of w(•). Assumption A7 ii) is satisfied since × * is Donsker by assumption. Finally, we verify A6 ii). Applying the mean value theorem twice, we get that for some θ, θ ′ on a path between θ * and θ, which is dominated by E[l(θ, Y )] = D f (P θ P θ * ) via the last assumption stated in the proposition. Therefore A6ii) is satisfied, and Theorem 2.4 applies. D.4 Proof of Proposition 4.6 Proof. Note that We verify the conditions of Theorem 3.4, for v * = V -1 ζ, such that its conclusion becomes Hence assumption A4 follows from boundedness and Lipschitzness of d(X, •), m(X, •). A5 holds by assumption. To verify A6i), notice that: where the last equality used the fact that . Towards Assumption A6ii), note that we can apply a first-order Taylor expansion with meanvalue reminder twice, which yields for some θ, θ on a path between θ * and θ: Given the identification assumption 2.5, we can use a second-order Taylor expansion with mean-value reminder to show that θθ * 2 2 ≍ E[l(θ, Y )l(θ * , Y )], which then yields A6ii). Similarly, we can show A7ii) via a mean-value reminder: 51 We similarly can establish A7i) via D.5 Proof of Proposition 4.4 Proof. A0 holds by assumption, and condition A1 is implied by Lipschitzness of V θ , P θ in θ. Continuity of V θ , P θ , compactness of Θ and A0 further imply that there is some 0 A3 can be established analogously, hence the conclusions of Theorem 3.2 hold. D.6 Proof of Proposition 4.7 Proof. Note that λ θ * (x) = θ(x)θ * (x) follows from the first order conditions of the adversary. Condition A0 is satisfied by assumption, and A1 follows from Lipschitzness of m(Y, •) and boundedness. Lipschitzness of m(θ, λ(x)) in λ(x) and boundedness imply that l(θ, λ, Y ) = m(Y, λ)θ(x)λ(x)λ(x) 2 /2 is Lipschitz in λ(x). This implies V[l(θ, λ, Y )l(θ, λ θ * , Y )] ≺ E[(λ(x)λ θ * (x)) 2 ] and together with λ θ * (x) = θ(x)θ * (x), it yields: V[l(θ * , λ θ * * , Y )l(θ, λ θ * , Y )] ≺ E[(θ(x)θ * (x)) 2 ] Both bounds imply A3 and A2 respectively. The result follows because the loss E[l(θ, Y )] is proportional to the squared L2 norm of θθ * .