Previous Work
 

 

Audio-Visual Speech Recognition

Speech recognition is a field of artificial intelligence where the goal is to get a computer to understand human speech. Audio-Visual Speech Recognition (AVSR) is a field of speech recognition where we attempt to make use of visual information from the speaker to help recognize their speech. Visual information includes lip shapes, the position of the eyeballs, etc.

Integration of Audio and Visual Information: My first project here involved the implementation of Coupled Hidden Markov Models (CHMMs) for audio and visual integration. If you want to learn more about CHMMs and the results of this project, please refer to the following paper:

A. Subramanya, E. K. Patterson, S. Gurbuz, J. N. Gowdy, "Audio-Visual Speech Integration using Coupled Hidden Markov Models for Continuous Speech Recognition", IEEE International Conference on Acoustics, Speech and Signal Processing, Hong Kong, 2003.

Facial Feature Tracking: Facial feature tracking involves locating the position of facial features such as the eyes and lips in a given frame. The difficulty is that the algorithm needs to be robust to changes in lighting conditions and independent of skin tone, facial hair, etc.

We have developed an algorithm to track eyes in real time. The images are captured using a webcam.

A. Subramanya, R. Kumaran, J. Gowdy, "Real time Eye tracking for Human Computer Interfaces", IEEE International Conference on Multimedia and Expo (ICME), Baltimore, 2003.

Graphical Models for Speech Recognition: In this project we developed a Dynamic Bayesian Network (DBN) to integrate information from multiple audio and visual streams. Results from our implementation using the Clemson University Audio Visual Experiments (CUAVE) database indicated an absolute improvement of about 4% in word accuracy at an SNR of -4 dB. For more information, refer to:

A. Subramanya, J. Gowdy, C. Bartels, J. Bilmes, "DBN Based Multi-Stream Models for Audio-Visual Speech Recognition", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Montreal, 2004.

 

 

Microphone Array Calibration

Applications that use microphone arrays, such as acoustic localization and beamforming, require the locations of the microphones to be known. Microphone array calibration deals with the problem of automatically determining the positions of the microphones in the array. Classical multidimensional scaling (MDS) is a simple, global, non-iterative technique for determining the locations of microphones given their inter-point distances, even when not all such distances are available. In this project we used MDS to automatically determine the positions of the microphones in an array. Simulations and experiments demonstrated that the accuracy of the algorithm in practical scenarios is on the order of 1 cm. More information may be obtained from the following papers:

A. Subramanya, S. T. Birchfield, "Extension and Evaluation of MDS for Geometric Microphone Array Calibration", European Signal Processing Conference (EUSIPCO), Vienna, Austria, 2004.

S. T. Birchfield and A. Subramanya, "Microphone Array Position Calibration by Basis-Point Classical Multidimensional Scaling", IEEE Transactions on Speech and Audio Processing (accepted, 2004).
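
To give a feel for the core idea, here is a minimal sketch of textbook classical MDS in Python with NumPy. It is not the implementation from the papers above: it assumes that every inter-microphone distance is available and noise-free, and the function names (classical_mds, pairwise_distances) and the four-microphone layout are made up for illustration. The positions come back only up to a rotation, translation and reflection, since distances alone cannot fix the coordinate frame.

```python
import numpy as np


def classical_mds(distances, dim=3):
    """Recover point coordinates (up to a rigid transform) from a full distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances to obtain the Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # The Gram matrix is symmetric, so eigh applies; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dim]
    # Clip tiny negative eigenvalues caused by round-off before taking the square root.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


def pairwise_distances(points):
    """Euclidean distance between every pair of rows in `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)


# Hypothetical layout: four microphones, coordinates in metres (illustrative only).
mics = np.array([
    [0.0, 0.0, 0.0],
    [0.3, 0.0, 0.0],
    [0.0, 0.4, 0.0],
    [0.0, 0.0, 0.5],
])
measured = pairwise_distances(mics)
estimate = classical_mds(measured, dim=3)
```

The rows of estimate will generally not equal the rows of mics, but the distances between the estimated positions should match the measured ones, which is the sense in which the geometry has been recovered.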

 

Here is a list of some of the projects that I have been associated with during the early years of my graduate studies. (This list is about four years old, i.e. outdated. I promise to update it as and when I have more time on my hands. For now, look at my publications for the most recent information on my work.)

1. Audio-Visual Speech Recognition

2. Microphone Array Calibration

3. Other Projects