Publications

Refine Results

(Filters Applied) Clear All

A multi-threaded fast convolver for dynamically parallel image filtering

Author:
Published in:
J. Parallel Distrib. Comput, Vol. 63, No. 3, March 2003, pp. 360-372.

Summary

2D convolution is a staple of digital image processing. The advent of large format imagers makes it possible to literally ''pave'' with silicon the focal plane of an optical sensor, which results in very large images that can require a significant amount computation to process. Filtering of large images via 2D convolutions is often complicated by a variety of effects (e.g., non-uniformities found in wide field of view instruments) which must be compensated for in the filtering process by changing the filter across the image. This paper describes a fast (FFT based) method for convolving images with slowly varying filters. A parallel version of the method is implemented using a multi-threaded approach, which allows more efficient load balancing and a simpler software architecture. The method has been implemented within a high level interpreted language (IDL), while also exploiting open standards vector libraries (VSIPL) and open standards parallel directives (OpenMP). The parallel approach and software architecture are generally applicable to a variety of algorithms and has the advantage of enabling users to obtain the convenience of an easy operating environment while also delivering high performance using a fully portable code.
READ LESS

Summary

2D convolution is a staple of digital image processing. The advent of large format imagers makes it possible to literally ''pave'' with silicon the focal plane of an optical sensor, which results in very large images that can require a significant amount computation to process. Filtering of large images via...

READ MORE

Phonetic speaker recognition with support vector machines

Published in:
Adv. in Neural Information Processing Systems 16, 2003 Conf., 8-13 December 2003, p. 1377-1384.

Summary

A recent area of significant progress in speaker recognition is the use of high level features-idiolect, phonetic relations, prosody, discourse structure, etc. A speaker not only has a distinctive acoustic sound but uses language in a characteristic manner. Large corpora of speech data available in recent years allow experimentation with long term statistics of phone patterns, word patterns, etc. of an individual. We propose the use of support vector machines and term frequency analysis of phone sequences to model a given speaker. To this end, we explore techniques for text categorization applied to the problem. We derive a new kernel based upon a linearization of likelihood ratio scoring. We introduce a new phone-based SVM speaker recognition approach that halves the error rate of conventional phone-based approaches.
READ LESS

Summary

A recent area of significant progress in speaker recognition is the use of high level features-idiolect, phonetic relations, prosody, discourse structure, etc. A speaker not only has a distinctive acoustic sound but uses language in a characteristic manner. Large corpora of speech data available in recent years allow experimentation with...

READ MORE

Modeling prosodic dynamics for speaker recognition

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 4, 6-10 April 2003, pp. IV-788 - IV-791.

Summary

Most current state-of-the-art automatic speaker recognition systems extract speaker-dependent features by looking at short-term spectral information. This approach ignores long-term information that can convey supra-segmental information, such as prosodics and speaking style. We propose two approaches that use the fundamental frequency and energy trajectories to capture long-term information. The first approach uses bigram models to model the dynamics of the fundamental frequency and energy trajectories for each speaker. The second approach uses the fundamental frequency trajectories of a pre-defined set of works as the speaker templates and then, using dynamic time warping, computes the distance between templates and the works from the test message. The results presented in this work are on Switchboard 1 using the NIS extended date evaluation design. We show that these approaches can achieve an equal error rate of 3.7% which is a 77% relative improvement over a system based on short-term pitch and energy features alone.
READ LESS

Summary

Most current state-of-the-art automatic speaker recognition systems extract speaker-dependent features by looking at short-term spectral information. This approach ignores long-term information that can convey supra-segmental information, such as prosodics and speaking style. We propose two approaches that use the fundamental frequency and energy trajectories to capture long-term information. The first...

READ MORE

Cluster detection in databases : the adaptive matched filter algorithm and implementation

Published in:
Data Mining and Knowledge Discovery, Vol. 7, No. 1, January 2003, pp. 57-79.

Summary

Matched filter techniques are a staple of modern signal and image processing. They provide a firm foundation (both theoretical and empirical) for detecting and classifying patterns in statistically described backgrounds. Application of these methods to databases has become increasingly common in certain fields (e.g. astronomy). This paper describes an algorithm (based on statistical signal processing methods), a software architecture (based on a hybrid layered approach) and a parallelization scheme (based on a client/server model) for finding clusters in large astronomical databases. The method has proved successful in identifying clusters in real and simulated data. The implementation is flexible and readily executed in parallel on a network of workstations.
READ LESS

Summary

Matched filter techniques are a staple of modern signal and image processing. They provide a firm foundation (both theoretical and empirical) for detecting and classifying patterns in statistically described backgrounds. Application of these methods to databases has become increasingly common in certain fields (e.g. astronomy). This paper describes an algorithm...

READ MORE

The effect of identifying vulnerabilities and patching software on the utility of network intrusion detection

Published in:
Proc. 5th Int. Symp. on Recent Advances in Intrusion Detection, RAID 2002, 16-18 October 2002, pp. 307-326.

Summary

Vulnerability scanning and installing software patches for known vulnerabilities greatly affects the utility of network-based intrusion detection systems that use signatures to detect system compromises. A detailed timeline analysis of important remote-to-local vulnerabilities demonstrates (1) vulnerabilities in widely-used server software are discovered infrequently (at most 6 times a year) and (2) Software patches to prevent vulnerabilities from being exploited are available before or simultaneously with signatures. Signature-based intrusion detection systems will thus never detect successful system compromises on small secure sites when patches are installed as soon as they are available. Network intrusion detection systems may detect successful system compromises on large sites where it is impractical to eliminate all known vulnerabilities. On such sites, information from vulnerability scanning can be used to prioritize the large numbers of extraneous alerts caused by failed attacks and normal background MIC. On one class B network with roughly 10 web servers, this approach successfully filtered out 95% of all remote-to-local alerts.
READ LESS

Summary

Vulnerability scanning and installing software patches for known vulnerabilities greatly affects the utility of network-based intrusion detection systems that use signatures to detect system compromises. A detailed timeline analysis of important remote-to-local vulnerabilities demonstrates (1) vulnerabilities in widely-used server software are discovered infrequently (at most 6 times a year) and...

READ MORE

2-D processing of speech with application to pitch estimation

Published in:
7th Int. Conf. on Spoken Language Processing, ICSLP 2002, 16-20 September 2002.

Summary

In this paper, we introduce a new approach to two-dimensional (2-D) processing of the one-dimensional (1-D) speech signal in the time-frequency plane. Specifically, we obtain the shortspace 2-D Fourier transform magnitude of a narrowband spectrogram of the signal and show that this 2-D transformation maps harmonically-related signal components to a concentrated entity in the new 2-D plane. We refer to this series of operations as the "grating compression transform" (GCT), consistent with sine-wave grating patterns in the spectrogram reduced to smeared impulses. The GCT forms the basis of a speech pitch estimator that uses the radial distance to the largest peak in the GCT plane. Using an average magnitude difference between pitch-contour estimates, the GCT-based pitch estimator is shown to compare favorably to a sine-wave-based pitch estimator for all-voiced speech in additive white noise. An extension to a basis for two-speaker pitch estimation is also proposed.
READ LESS

Summary

In this paper, we introduce a new approach to two-dimensional (2-D) processing of the one-dimensional (1-D) speech signal in the time-frequency plane. Specifically, we obtain the shortspace 2-D Fourier transform magnitude of a narrowband spectrogram of the signal and show that this 2-D transformation maps harmonically-related signal components to a...

READ MORE

Approaches to language identification using Gaussian mixture models and shifted delta cepstral features

Published in:
Proc. Int. Conf. on Spoken Language Processing, INTERSPEECH, 16-20 September 2002, pp. 33-36, 82-92.

Summary

Published results indicate that automatic language identification (LID) systems that rely on multiple-language phone recognition and n-gram language modeling produce the best performance in formal LID evaluations. By contrast, Gaussian mixture model (GMM) systems, which measure acoustic characteristics, are far more efficient computationally but have tended to provide inferior levels of performance. This paper describes two GMM-based approaches to language identification that use shifted delta cepstra (SDC) feature vectors to achieve LID performance comparable to that of the best phone-based systems. The approaches include both acoustic scoring and a recently developed GMM tokenization system that is based on a variation of phonetic recognition and language modeling. System performance is evaluated on both the CallFriend and OGI corpora.
READ LESS

Summary

Published results indicate that automatic language identification (LID) systems that rely on multiple-language phone recognition and n-gram language modeling produce the best performance in formal LID evaluations. By contrast, Gaussian mixture model (GMM) systems, which measure acoustic characteristics, are far more efficient computationally but have tended to provide inferior levels...

READ MORE

300x faster Matlab using MatlabMPI

Author:
Published in:
https://arxiv.org/abs/astro-ph/0207389

Summary

The true costs of high performance computing are currently dominated by software. Addressing these costs requires shifting to high productivity languages such as Matlab. MatlabMPI is a Matlab implementation of the Message Passing Interface (MPI) standard and allows any Matlab program to exploit multiple processors. MatlabMPI currently implements the basic six functions that are the core of the MPI point-to-point communications standard. The key technical innovation of MatlabMPI is that it implements the widely used MPI "look and feel" on top of standard Matlab file I/O, resulting in an extremely compact (~250 lines of code) and "pure" implementation which runs anywhere Matlab runs, and on any heterogeneous combination of computers. The performance has been tested on both shared and distributedmemory parallel computers (e.g. Sun, SGI, HP, IBM and Linux). MatlabMPI can match the bandwidth of C based MPI at large message sizes. A test image filtering application using MatlabMPI achieved a speedup of ~300 using 304 CPUs and ~15% of the theoretical peak (450 Gigaflops) on an IBM SP2 at the Maui High Performance Computing Center. In addition, this entire parallel benchmark application was implemented in 70 software-lines-of-code (SLOC) yielding 0.85 Gigaflop/SLOC or 4.4 CPUs/SLOC, which are the highest values of these software price performance metrics ever achieved for any application. The MatlabMPI software will be made available for download.
READ LESS

Summary

The true costs of high performance computing are currently dominated by software. Addressing these costs requires shifting to high productivity languages such as Matlab. MatlabMPI is a Matlab implementation of the Message Passing Interface (MPI) standard and allows any Matlab program to exploit multiple processors. MatlabMPI currently implements the basic...

READ MORE

An overview of automatic speaker recognition technology

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. IV, 13-17 May 2002, pp. IV-4072 - IV-4075.

Summary

In this paper we provide a brief overview of the area of speaker recognition, describing applications, underlying techniques and some indications, of performance. Following this overview we will discuss some of the strengths and weaknesses of current speaker recognition technologies and outline some potential future trends in research, development and applications conducting other speech interactions (background verification). As speaker and speech recognition system merge and speech recognition accuracy improves, the distinction between text- independent and -dependent applications will decrease. Of the two basic tasks, text-dependent speaker verification is currently
READ LESS

Summary

In this paper we provide a brief overview of the area of speaker recognition, describing applications, underlying techniques and some indications, of performance. Following this overview we will discuss some of the strengths and weaknesses of current speaker recognition technologies and outline some potential future trends in research, development and...

READ MORE

Speaker verification using text-constrained Gaussian mixture models

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. I, 13-17 May 2002, pp. I-677 - I-680.

Summary

In this paper we present an approach to close the gap between text-dependent and text-independent speaker verification performance. Text-constrained GMM-UBM systems are created using word segmentations produced by a LVCSR system on conversational speech allowing the system to focus on speaker differences over a constrained set of acoustic units. Results on the 2001 NiST extended data task show this approach can be used to produce an equal error rate of < 1%.
READ LESS

Summary

In this paper we present an approach to close the gap between text-dependent and text-independent speaker verification performance. Text-constrained GMM-UBM systems are created using word segmentations produced by a LVCSR system on conversational speech allowing the system to focus on speaker differences over a constrained set of acoustic units. Results...

READ MORE