ICML2024

On a Combinatorial Problem Arising in Machine Teaching

Joakim Sunde, Brigt Arve Toppe Håvardstun, Jan Kratochvíl, Jan Arne Telle

5 citations

Abstract

We study a model of machine teaching where the teacher mapping is constructed from a size function on both concepts and examples. The main question in machine teaching is the minimum number of examples needed for any concept, the so-called teaching dimension. A recent paper [7] conjectured that the worst case for this model, as a function of the size of the concept class, occurs when the consistency matrix contains the binary representations of numbers from zero and up. In this paper we prove their conjecture. The result can be seen as a generalization of a theorem resolving the edge isoperimetry problem for hypercubes [12] , and our proof is based on a lemma of [10] . Various models of machine teaching have been proposed, e.g. the classical teaching dimension model [9] , the optimal teacher model [2], recursive teaching [22] , preference-based teaching [8] , no-clash teaching [5], and probabilistic teaching [6] . In [16] a model focusing on teaching size is introduced, and in [7] an algorithm called Greedy constructing the teacher mapping in this model is given. Greedy assumes two total orderings ≺• C on C and ≺• X on X, with ≺• X extended to ≺• W on subsets of labelled examples W = 2 X×0,1 by shortlex ordering. In the Greedy algorithm the teacher defines its mapping iteratively: go through W in the order of ≺• W , and for a given witness w is not yet defined, then set T (c) = w and continue with next witness (if no such c exists then drop this w). To compare the teaching dimension achievable by Greedy to that of other models, the authors of [7] argued as follows to show that if a large witness is used then this is because |C| is large: If Greedy assigns T (c) = w for some w = (x 1 , b 1 )...(x q , b q ) then we may ask, why was c not assigned to a smaller witness? Assuming there are |X| = n examples, then any subset Q ⊆ X of size q -1 when labelled consistent with c has already been tried by Greedy, and hence some other concept must already have been assigned to any such Q, and all these concepts are distinct. This means we must have taught n q-1 = k other concepts already. But then we have already taught at least k + 1 concepts and we can again ask why were any of these not taught by a smaller witness of size q -2? It must be that any such witnesses (labelled to be consistent with some concept among the k+1 we already have) must have been used to teach other, again distinct, concepts. Note that, to verify how many distinct witnesses exist, corresponding to new concepts, that are labelled consistently with one of these k + 1 concepts, one must sum up the number of distinct rows when projecting on q -2 columns, for all choices of these columns. Note that the number of distinct rows, i.e witnesses and hence number of concepts, when projecting on q -2 columns, for all choices of these columns, depends on the matrix M you do the projection on. As the authors of [7] wanted a lower bound on number of concepts, they needed to find the matrix M minimizing the sum of unique rows after doing the projection. Continuing to argue like this, in order to achieve a lower bound on the size of C when Greedy uses a witness of size q, the authors of [7] arrive at the following combinatorial question. What is the binary matrix M on k distinct rows and n columns that would give the smallest sum when projecting on q columns? They conjectured that this was achieved by the matrix H n,k consisting of the k rows corresponding to the binary representations of the numbers between zero and k -1, with leading 0s to give them length n. In this paper we prove this conjecture (note that through personal communication we were made aware of this conjecture months before its publication on arxiv). When q = n -1 this minimization question is equivalent to asking for the induced subgraph on k vertices of the hypercube of dimension n having the maximum number of edges, since the sum mentioned above for a matrix M is