Publications

Wafer-scale 3D integration of InGaAs image sensors with Si readout circuits

Summary

In this work, we modified our wafer-scale 3D integration technique, originally developed for Si, to hybridize InP-based image sensor arrays with Si readout circuits. InGaAs image arrays based on the InGaAs layer grown on InP substrates were fabricated in the same processing line as silicon-on-insulator (SOI) readout circuits. The finished 150-mm-diameter InP wafer was then directly bonded to the SOI wafer and interconnected to the Si readout circuits by 3D vias. A 1024 x 1024 diode array with 8-µm pixel size is demonstrated. This work shows the wafer-scale 3D integration of a compound semiconductor with Si.

Unmanned aircraft collision avoidance using partially observable Markov decision processes

Published in:
MIT Lincoln Laboratory Report ATC-356

Summary

Before unmanned aircraft can fly safely in civil airspace, robust airborne collision avoidance systems must be developed. Instead of hand-crafting a collision avoidance algorithm for every combination of sensor and aircraft configuration, this project investigates the automatic generation of collision avoidance logic given models of aircraft dynamics, sensor performance, and intruder behavior. By formulating the problem of collision avoidance as a partially-observable Markov decision process (POMDP), a generic POMDP solver can be used to generate avoidance strategies that optimize a cost function that balances flight-plan deviation with collision. Experimental results demonstrate the suitability of such an approach using three different sensor modalities and two aircraft performance models.
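
As a rough illustration of the POMDP formulation mentioned above, the sketch below shows the standard discrete belief update that a POMDP-based controller relies on. The two states, two actions, alert observations, and every probability are hypothetical placeholders chosen for illustration; they are not the aircraft, sensor, or intruder models used in the report, and the observation model is assumed not to depend on the action.

```python
# Toy belief update for a discrete POMDP.
# All states, actions, observations, and probabilities are hypothetical.
STATES = ["clear", "conflict"]

# TRANSITION[action][s][s2] = P(s2 | s, action)
TRANSITION = {
    "stay_on_plan": {
        "clear": {"clear": 0.9, "conflict": 0.1},
        "conflict": {"clear": 0.2, "conflict": 0.8},
    },
    "maneuver": {
        "clear": {"clear": 0.95, "conflict": 0.05},
        "conflict": {"clear": 0.7, "conflict": 0.3},
    },
}

# OBSERVATION[s2][o] = P(o | s2); assumed independent of the action here.
OBSERVATION = {
    "clear": {"no_alert": 0.8, "alert": 0.2},
    "conflict": {"no_alert": 0.3, "alert": 0.7},
}


def update_belief(belief, action, observation):
    """Return the posterior belief after taking `action` and seeing `observation`."""
    unnormalized = {}
    for s2 in STATES:
        # Prediction step: propagate the prior belief through the dynamics.
        predicted = sum(TRANSITION[action][s][s2] * belief[s] for s in STATES)
        # Correction step: weight by the likelihood of the observation.
        unnormalized[s2] = OBSERVATION[s2][observation] * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


belief = {"clear": 0.5, "conflict": 0.5}
posterior = update_belief(belief, "stay_on_plan", "alert")
print(posterior)
```

A solver would pair such a belief with a cost function that trades off flight-plan deviation against collision risk; that part is not sketched here.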

Redeployment of the New York TDWR - technical analysis of candidate sites and alternative wind shear sensors

Summary

The John F. Kennedy International Airport (JFK) and LaGuardia Airport (LGA) are protected from wind shear exposure by the New York Terminal Doppler Weather Radar (TDWR), which is currently located at Floyd Bennett Field, New York. Because of a September 1999 agreement between the Department of the Interior and the Department of Transportation, this location is required to be vacated no later than January 2023. Therefore, a study based on model simulations of wind shear detection probability was conducted to support future siting selection and alternative technologies. A total of 18 candidate sites were selected for analysis, including leaving the radar where it is. (The FAA will explore the feasibility of the latter alternative; it is included in this study only for technical analysis.) The 18 sites are: six candidate sites that were identified in the initial New York TDWR site-survey studies in the 1990s (one of which is the current TDWR site), a site on Staten Island, two Manhattan skyscrapers, the current location of the WCBS Doppler weather radar in Twombly Landing, New Jersey, and eight local airports including JFK and LGA themselves. Results clearly show that for a single TDWR system, all six previously surveyed sites are suitable for future housing of the TDWR. Unfortunately, land acquisition of these sites will be at least as challenging as it was in the 1990s due to further urban development and likely negative reaction from neighboring residents. Evaluation results of the on-airport siting of the TDWR (either at JFK or at LGA) indicate that this option is feasible if data from the Newark TDWR are simultaneously used. This on-airport option would require software modification such as integration of data from the two radar systems and implementation of "overhead" feature detection. The radars on the Manhattan skyscrapers are not an acceptable alternative due to severe ground clutter. The Staten Island site and most other candidate airports are also not acceptable due to distance and/or beam blockage. The existing Airport Surveillance Radar (ASR-9) Weather Systems Processor (WSP) at JFK and the Brookhaven (OKX) Weather Surveillance Radar 1988-Doppler (WSR-88D, commonly known as NEXRAD) on Long Island cannot provide sufficient wind shear protection mainly due to limited wind shear detection capability and/or distance.

2-D processing of speech for multi-pitch analysis.

Published in:
INTERSPEECH 2009, 6-10 September 2009.

Summary

This paper introduces a two-dimensional (2-D) processing approach for the analysis of multi-pitch speech sounds. Our framework invokes the short-space 2-D Fourier transform magnitude of a narrowband spectrogram, mapping harmonically related signal components to multiple concentrated entities in a new 2-D space. First, localized time-frequency regions of the spectrogram are analyzed to extract pitch candidates. These candidates are then combined across multiple regions for obtaining separate pitch estimates of each speech-signal component at a single point in time. We refer to this as multi-region analysis (MRA). By explicitly accounting for pitch dynamics within localized time segments, this separability is distinct from that which can be obtained using short-time autocorrelation methods typically employed in state-of-the-art multi-pitch tracking algorithms. We illustrate the feasibility of MRA for multi-pitch estimation on mixtures of synthetic and real speech.
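
To make the idea of a 2-D transform over a localized spectrogram region concrete, the sketch below computes a naive 2-D DFT magnitude of a small synthetic patch. The patch, its size, the harmonic spacing, and the function names are all hypothetical; no analysis window is applied and the pitch is held constant, so this is only a toy illustration of harmonic structure concentrating into a peak, not the paper's MRA method.

```python
import cmath
import math


def dft2_magnitude(patch):
    """Naive 2-D DFT magnitude of a small real-valued patch (rows = frequency bins)."""
    n_rows, n_cols = len(patch), len(patch[0])
    mags = [[0.0] * n_cols for _ in range(n_rows)]
    for u in range(n_rows):
        for v in range(n_cols):
            acc = 0j
            for r in range(n_rows):
                for c in range(n_cols):
                    phase = -2.0 * math.pi * (u * r / n_rows + v * c / n_cols)
                    acc += patch[r][c] * cmath.exp(1j * phase)
            mags[u][v] = abs(acc)
    return mags


def strongest_non_dc_peak(mags):
    """Return (u, v) of the largest magnitude with 1 <= u <= n_rows // 2."""
    n_rows = len(mags)
    best, best_uv = -1.0, None
    for u in range(1, n_rows // 2 + 1):
        for v, m in enumerate(mags[u]):
            if m > best:
                best, best_uv = m, (u, v)
    return best_uv


# Hypothetical 16 x 16 patch: harmonic lines every 4 frequency bins,
# constant over time (a single source with steady pitch).
N, SPACING = 16, 4
patch = [[1.0 + math.cos(2.0 * math.pi * r / SPACING) for _ in range(N)]
         for r in range(N)]

peak_u, peak_v = strongest_non_dc_peak(dft2_magnitude(patch))
estimated_spacing = N / peak_u  # harmonic spacing in frequency bins
print(peak_u, peak_v, estimated_spacing)
```

With a changing pitch the harmonic lines would tilt within the patch and the peak would move off the vertical axis; this toy case does not cover that, nor mixtures of several speakers.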

A comparison of query-by-example methods for spoken term detection

Published in:
INTERSPEECH 2009, 6-10 September 2009.

Summary

In this paper we examine an alternative interface for phonetic search, namely query-by-example, that avoids OOV issues associated with both standard word-based and phonetic search methods. We develop three methods that compare query lattices derived from example audio against a standard n-gram-based phonetic index and we analyze factors affecting the performance of these systems. We show that the best systems under this paradigm are able to achieve 77% precision when retrieving utterances from conversational telephone speech and returning 10 results from a single query (performance that is better than a similar dictionary-based approach), suggesting significant utility for applications requiring high precision. We also show that these systems can be further improved using relevance feedback: by incorporating four additional queries the precision of the best system can be improved by 13.7% relative. Our systems perform well despite high phone recognition error rates (> 40%) and make use of no pronunciation or letter-to-sound resources.
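
The two figures quoted above can be related with a short calculation. The sketch below computes precision over the top 10 results for a made-up result list and then applies a relative gain to a baseline. It assumes the 13.7% relative improvement applies to the 77% baseline, which the abstract does not state explicitly. The function names and the result list are hypothetical.

```python
def precision_at_k(relevance_flags, k=10):
    """Fraction of the top-k returned results that are relevant."""
    top = relevance_flags[:k]
    if not top:
        raise ValueError("no results returned")
    return sum(top) / len(top)


def apply_relative_gain(baseline, relative_gain):
    """A relative gain scales the baseline rather than adding percentage points."""
    return baseline * (1.0 + relative_gain)


# Hypothetical relevance judgments for 10 returned utterances (not from the paper).
hypothetical_results = [True, True, False, True, True, True, False, True, True, True]
p10 = precision_at_k(hypothetical_results)

# Assumption: the 13.7% relative gain is measured against the 77% baseline.
improved = apply_relative_gain(0.77, 0.137)
print(p10, round(improved, 4))
```

Under that assumption the improved precision would be about 87.5%, rather than the 90.7% that an absolute gain of 13.7 points would give.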

A framework for discriminative SVM/GMM systems for language recognition

Published in:
INTERSPEECH 2009, 6-10 September 2009.

Summary

Language recognition with support vector machines and shifted-delta cepstral features has been an excellent performer in NIST-sponsored language evaluation for many years. A novel improvement of this method has been the introduction of hybrid SVM/GMM systems. These systems use GMM supervectors as an SVM expansion for classification. In prior work, methods for scoring SVM/GMM systems have been introduced based upon either standard SVM scoring or GMM scoring with a pushed model. Although prior work showed experimentally that GMM scoring yielded better results, no framework was available to explain the connection between SVM scoring and GMM scoring. In this paper, we show that there are interesting connections between SVM scoring and GMM scoring. We provide a framework both theoretically and experimentally that connects the two scoring techniques. This connection should provide the basis for further research in SVM discriminative training for GMM models.

Discriminative N-gram selection for dialect recognition

Summary

Dialect recognition is a challenging and multifaceted problem. Distinguishing between dialects can rely upon many tiers of interpretation of speech data - e.g., prosodic, phonetic, spectral, and word. High-accuracy automatic methods for dialect recognition typically rely upon either phonetic or spectral characteristics of the input. A challenge with spectral systems, such as those based on shifted-delta cepstral coefficients, is that they achieve good performance but do not provide insight into distinctive dialect features. In this work, a novel method based upon discriminative training and phone N-grams is proposed. This approach achieves excellent classification performance, fuses well with other systems, and has interpretable dialect characteristics in the phonetic tier. The method is demonstrated on data from the LDC and prior NIST language recognition evaluations. The method is also combined with spectral methods to demonstrate state-of-the-art performance in dialect recognition.

Large-scale analysis of formant frequency estimation variability in conversational telephone speech

Published in:
INTERSPEECH 2009, 6-10 September 2009.

Summary

We quantify how the telephone channel and regional dialect influence formant estimates extracted from Wavesurfer in spontaneous conversational speech from over 3,600 native American English speakers. To the best of our knowledge, this is the largest-scale study on this topic. We found that F1 estimates are higher in cellular channels than in landline channels, while F2 in general shows the opposite trend. We also characterized vowel shift trends in the northern states of the U.S.A. and compared them with the Northern city chain shift (NCCS). Our analysis is useful in forensic applications where it is important to distinguish between speaker, dialect, and channel characteristics.

The MIT Lincoln Laboratory 2008 speaker recognition system

Summary

In recent years, methods for modeling and mitigating variational nuisances have been introduced and refined. A primary emphasis in this year's NIST 2008 Speaker Recognition Evaluation (SRE) was to greatly expand the use of auxiliary microphones. This introduced additional channel variation, which has been a historical challenge to speaker verification systems. In this paper we present the MIT Lincoln Laboratory Speaker Recognition system applied to the task in the NIST 2008 SRE. Our approach during the evaluation was two-fold: 1) utilize recent advances in variational nuisance modeling (latent factor analysis and nuisance attribute projection) to allow our spectral speaker verification systems to better compensate for the channel variation introduced, and 2) fuse systems targeting the different linguistic tiers of information, high and low. The performance of the system is presented when applied on a NIST 2008 SRE task. Post-evaluation analysis is conducted on the sub-task when interview microphones are present.

Time-varying autoregressive tests for multiscale speech analysis

Published in:
INTERSPEECH 2009, 10th Annual Conf. of the International Speech Communication Association, pp. 2839-2842.

Summary

In this paper we develop hypothesis tests for speech waveform nonstationarity based on time-varying autoregressive models, and demonstrate their efficacy in speech analysis tasks at both segmental and sub-segmental scales. Key to the successful synthesis of these ideas is our employment of a generalized likelihood ratio testing framework tailored to autoregressive coefficient evolutions suitable for speech. After evaluating our framework on speech-like synthetic signals, we present preliminary results for two distinct analysis tasks using speech waveform data. At the segmental level, we develop an adaptive short-time segmentation scheme and evaluate it on whispered speech recordings, while at the sub-segmental level, we address the problem of detecting the glottal flow closed phase. Results show that our hypothesis testing framework can reliably detect changes in the vocal tract parameters across multiple scales, thereby underscoring its broad applicability to speech analysis.