Summary
This paper presents an analysis of factors affecting system performance in the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge. In particular, overall word error rate (WER) of the solver systems is analyzed as a function of room, distance between talker and microphone, and microphone type. We also analyze speech activity detection performance of the solver systems and investigate its relationship to WER. The primary goal of the paper is to provide insight into the factors affecting system performance in the ASpIRE evaluation set across many systems given annotations and metadata that are not available to the solvers. This analysis will inform the design of future challenges and provide insight into the efficacy of current solutions addressing noisy reverberant speech in mismatched conditions.