Summary
Multi-pitch estimation of co-channel speech is especially challenging when the underlying pitch tracks are close in pitch value (e.g., when pitch tracks cross). Building on our previous work, we demonstrate the utility of a two-dimensional (2-D) analysis method of speech for this problem by exploiting its joint representation of pitch and pitch-derivative information from distinct speakers. Specifically, we propose a novel multi-pitch estimation method consisting of 1) a data-driven classifier for pitch candidate selection, 2) local pitch and pitch-derivative estimation by k-means clustering, and 3) a Kalman filtering mechanism for pitch tracking and assignment. We evaluate our method on a database of all-voiced speech mixtures and illustrate its capability to estimate pitch tracks in cases where pitch tracks are separate and when they are close in pitch value (e.g., at crossings).