NeurIPS2020

Coded Sequential Matrix Multiplication For Straggler Mitigation

M. Nikhil Krishnan, Seyederfan Hosseini, Ashish Khisti

被引用 8 次

摘要

In this work, we consider a sequence of <inline-formula> <tex-math notation="LaTeX">JJ </tex-math></inline-formula> matrix multiplication jobs which needs to be distributed by a master across multiple worker nodes. For <inline-formula> <tex-math notation="LaTeX">i{1,2,,J}i\in \{1,2,\ldots,J\} </tex-math></inline-formula>, job-<inline-formula> <tex-math notation="LaTeX">ii </tex-math></inline-formula> begins in round-<inline-formula> <tex-math notation="LaTeX">ii </tex-math></inline-formula> and has to be completed by round-<inline-formula> <tex-math notation="LaTeX">(i+T)(i+T) </tex-math></inline-formula>. In order to provide resiliency against slow workers (stragglers), previous works focus on coding across workers, which is the special case of <inline-formula> <tex-math notation="LaTeX">T=0T=0 </tex-math></inline-formula>. We propose here two schemes with <inline-formula> <tex-math notation="LaTeX">T>0T > 0 </tex-math></inline-formula>, which allow for coding across workers as well as the dimension of time. Our first scheme is a modification of the polynomial coding scheme introduced by Yu <italic>et al.</italic> and places no assumptions on the straggler model. Exploitation of the temporal dimension helps the scheme handle a larger set of straggler patterns than the polynomial coding scheme, for a given computational load per worker per round. The second scheme assumes a particular straggler model to further improve performance (in terms of encoding/decoding complexity). We develop theoretical results establishing (i) optimality of our proposed schemes for certain classes of straggler patterns and (ii) improved performance for the case of i.i.d. stragglers. These are further validated by experiments, where we implement our schemes to train neural networks.