Language identification using Gaussian mixture model tokenization

January 1, 2002 | Conference Paper

Author:

Pedro A. Torres-Carrasquillo

…

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. I, 13-17 May 2002, pp. I-757 - I-760.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Phone tokenization followed by n-gram language modeling has consistently provided good results for the task of language identification. In this paper, this technique is generalized by using Gaussian mixture models as the basis for tokenizing. Performance results are presented for a system employing a GMM tokenizer in conjunction with multiple language processing and score combination techniques. On the 1996 CallFriend LID evaluation set, a 12-way closed set error rate of 17% was obtained.
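
The summary names the two stages (GMM tokenization, then n-gram language modeling) without giving details, so the following is only a minimal sketch of the general idea, not the paper's system. It assumes that each feature frame is tokenized as the index of its highest-scoring mixture component, and it uses an add-one-smoothed bigram model as a stand-in for the n-gram stage. The GMM parameters, the one-dimensional "frames", and the language names "A" and "B" are all hypothetical toy values.

```python
import math
from collections import Counter

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def tokenize(frames, weights, means, variances):
    """Map each frame to the index of its highest-scoring mixture component
    (assumed tokenization rule; the summary does not specify it)."""
    tokens = []
    for x in frames:
        scores = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        tokens.append(max(range(len(scores)), key=scores.__getitem__))
    return tokens

def train_bigram(tokens, vocab_size):
    """Return add-one-smoothed bigram log-probabilities, indexed [prev][next]."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prev_counts = Counter(tokens[:-1])
    return [
        [
            math.log((pair_counts[(a, b)] + 1) / (prev_counts[a] + vocab_size))
            for b in range(vocab_size)
        ]
        for a in range(vocab_size)
    ]

def score(tokens, bigram_logp):
    """Average bigram log-likelihood of a token sequence under one model."""
    pairs = list(zip(tokens, tokens[1:]))
    return sum(bigram_logp[a][b] for a, b in pairs) / max(len(pairs), 1)

# Hypothetical two-component GMM over 1-D features (fixed, not trained here).
weights = [0.5, 0.5]
means = [[0.0], [4.0]]
variances = [[1.0], [1.0]]

# Toy training frames for two made-up languages.
lang_a_frames = [[0.1], [3.9], [0.2], [4.1], [-0.1], [4.0]]  # components alternate
lang_b_frames = [[0.1], [0.2], [-0.3], [4.2], [3.8], [4.1]]  # components come in runs

# One bigram model per language, trained on that language's token stream.
models = {
    "A": train_bigram(tokenize(lang_a_frames, weights, means, variances), 2),
    "B": train_bigram(tokenize(lang_b_frames, weights, means, variances), 2),
}

# Identify the language of a short test utterance by the best-scoring model.
test_tokens = tokenize([[0.0], [4.0], [0.3], [3.7]], weights, means, variances)
best = max(models, key=lambda name: score(test_tokens, models[name]))
```

In this toy setup the test utterance alternates between the two components, so the bigram model trained on the alternating language scores it higher. A real system would train the GMM on acoustic features and, as the summary notes, combine scores from multiple language processing stages.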
