Latent topic modeling for audio corpus summarization
August 27, 2011
Conference Paper
    Author:
  
Published in: INTERSPEECH 2011, 27-31 August 2011, pp. 913-916.
      
  
    R&D Area:
  
            
  
    Summary
This work presents techniques for automatically summarizing the topical content of an audio corpus. Probabilistic latent semantic analysis (PLSA) is used to learn a set of latent topics in an unsupervised fashion. These latent topics are ranked by their relative importance in the corpus, and a summary of each topic is generated from signature words that aptly describe its content. The paper describes techniques for producing a high-quality summarization. An example summarization of conversational data from the Fisher corpus is presented and evaluated to demonstrate the effectiveness of the approach.
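
The pipeline described in the summary (learn latent topics with PLSA, rank them, then pick signature words for each) can be illustrated with a minimal sketch in Python. This is not the authors' implementation. The toy word counts, the function names fit_plsa and summarize, the ranking by estimated topic prior P(z), and the signature-word score P(w|z) * log(P(w|z) / P(w)) are all assumptions chosen for illustration; the paper's actual ranking and scoring measures may differ, and real input would come from speech recognition output rather than hand-written counts.

```python
import math
import random

def normalize(row):
    """Scale a list of non-negative numbers so it sums to 1."""
    total = sum(row)
    if total <= 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]

def fit_plsa(counts, n_topics, n_iters=100, seed=0):
    """Fit PLSA with plain EM. counts[d][w] is the count of word w in document d.

    Returns (p_z_d, p_w_z): P(z|d) per document and P(w|z) per topic.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialization (assumed; any positive start works).
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    for _ in range(n_iters):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior over topics for this (document, word) pair.
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_z_d[d][z] += n * post[z]
                    new_w_z[z][w] += n * post[z]
        # M-step: renormalize expected counts into distributions.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z

def summarize(counts, vocab, p_z_d, p_w_z, n_signature=3):
    """Rank topics by estimated P(z) and list top-scoring words per topic.

    Both the ranking criterion and the word score are illustrative assumptions.
    """
    doc_len = [sum(row) for row in counts]
    total = sum(doc_len)
    n_topics = len(p_w_z)
    # P(z) = sum_d P(z|d) P(d), with P(d) proportional to document length.
    p_z = [sum(p_z_d[d][z] * doc_len[d] for d in range(len(counts))) / total
           for z in range(n_topics)]
    # Corpus-wide unigram distribution P(w).
    p_w = [sum(row[w] for row in counts) / total for w in range(len(vocab))]
    summary = []
    for z in sorted(range(n_topics), key=lambda t: p_z[t], reverse=True):
        def score(w, z=z):
            p = p_w_z[z][w]
            if p <= 0 or p_w[w] <= 0:
                return 0.0
            return p * math.log(p / p_w[w])
        top = sorted(range(len(vocab)), key=score, reverse=True)[:n_signature]
        summary.append((z, p_z[z], [vocab[w] for w in top]))
    return summary

# Toy corpus (hypothetical): three sports-like and two cooking-like documents.
vocab = ["game", "team", "score", "food", "cook", "recipe"]
counts = [
    [5, 3, 2, 0, 0, 0],
    [4, 4, 1, 0, 0, 0],
    [3, 2, 4, 0, 0, 0],
    [0, 0, 0, 4, 3, 2],
    [0, 0, 0, 2, 5, 3],
]
p_z_d, p_w_z = fit_plsa(counts, n_topics=2)
summary = summarize(counts, vocab, p_z_d, p_w_z)
for topic, weight, words in summary:
    print(f"topic {topic}: P(z)={weight:.2f} words={words}")
```

EM for PLSA converges only to a local optimum, so the topics found depend on the seed and the number of topics; on a real corpus the counts would also normally be stop-word filtered before fitting.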