Kooperativer Bibliotheksverbund

Berlin Brandenburg

and
and

Your email was sent successfully. Check your inbox.

An error occurred while sending the email. Please try again.

Proceed reservation?

Export
Filter
  • Drude, Lukas  (10)
Type of Medium
Language
Year
Journal
  • 1
    Language: English
    In: Computer Speech & Language, November 2017, Vol.46, pp.374-385
    Description: Acoustic beamforming can greatly improve the performance of Automatic Speech Recognition(ASR) and speech enhancement systems when multiple channels are available. We recently proposed a way to support the model-based Generalized Eigenvalue beamforming operation with a powerful neural network for spectral mask estimation. The enhancement system has a number of desirable properties. In particular, neither assumptions need to be made about the nature of the acoustic transfer function (e.g., being anechonic), nor does the array configuration need to be known. While the system has been originally developed to enhance speech in noisy environments, we show in this article that it is also effective in suppressing reverberation, thus leading to a generic trainable multi-channel speech enhancement system for robust speech processing. To support this claim, we consider two distinct datasets: The CHiME 3challenge, which features challenging real-world noise distortions, and the challenge, which focuses on distortions caused by reverberation. We evaluate the system both with respect to a speech enhancement and a recognition task. For the first task we propose a new way to cope with the distortions introduced by the Generalized Eigenvalue beamformer by renormalizing the target energy for each frequency bin, and measure its effectiveness in terms of the PESQ score. For the latter we feed the enhanced signal to a strong DNN back-end and achieve state-of-the-art ASR results on both datasets. We further experiment with different network architectures for spectral mask estimation: One small feed-forward network with only one hidden layer, one Convolutional Neural Network and one bi-directional Long Short-Term Memory network, showing that even a small network is capable of delivering significant performance improvements.
    Keywords: Robust Speech Recognition ; Acoustic Beamforming ; Multi-Channel Speech Enhancement ; Deep Neural Network ; Engineering ; Computer Science
    ISSN: 0885-2308
    E-ISSN: 1095-8363
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 2
    Description: This report describes the computation of gradients by algorithmic differentiation for statistically optimum beamforming operations. Especially the derivation of complex-valued functions is a key component of this approach. Therefore the real-valued algorithmic differentiation is extended via the complex-valued chain rule. In addition to the basic mathematic operations the derivative of the eigenvalue problem with complex-valued eigenvectors is one of the key results of this report. The potential of this approach is shown with experimental results on the CHiME-3 challenge database. There, the beamforming task is used as a front-end for an ASR system. With the developed derivatives a joint optimization of a speech enhancement and speech recognition system w.r.t. the recognition optimization criterion is possible. Comment: Technical Report
    Keywords: Computer Science - Numerical Analysis ; Computer Science - Computational Engineering, Finance, And Science
    Source: Cornell University
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 3
    Language: English
    In: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), December 2015, pp.444-451
    Description: We present a new beamformer front-end for Automatic Speech Recognition and apply it to the 3rd-CHiME Speech Separation and Recognition Challenge. Without any further modification of the back-end, we achieve a 53% relative reduction of the word error rate over the best baseline enhancement system for the relevant test data set. Our approach leverages the power of a bi-directional Long Short-Term Memory network to robustly estimate soft masks for a subsequent beamforming step. The utilized Generalized Eigenvalue beamforming operation with an optional Blind Analytic Normalization does not rely on a Direction-of-Arrival estimate and can cope with multi-path sound propagation, while at the same time only introducing very limited speech distortions. Our quite simple setup exploits the possibilities provided by simulated training data while still being able to generalize well to the fairly different real data. Finally, combining our front-end with data augmentation and another language model nearly yields a 64 % reduction of the word error rate on the real data test set.
    Keywords: Speech ; Training ; Speech Recognition ; Array Signal Processing ; Estimation ; Artificial Neural Networks ; Robust Speech Recognition ; Beamforming ; Feature Enhancement ; Neural Networks
    Source: IEEE Conference Publications
    Source: IEEE Xplore
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 4
    Language: English
    In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2016, pp.196-200
    Description: We present a neural network based approach to acoustic beamforming. The network is used to estimate spectral masks from which the Cross-Power Spectral Density matrices of speech and noise are estimated, which in turn are used to compute the beamformer coefficients. The network training is independent of the number and the geometric configuration of the microphones. We further show that it is possible to train the network on clean speech only, avoiding the need for stereo data with separated speech and noise. Two types of networks are evaluated. One small feed-forward network with only one hidden layer and one more elaborated bi-directional Long Short-Term Memory network. We compare our system with different parametric approaches to mask estimation and using different beamforming algorithms. We show that our system yields superior results, both in terms of perceptual speech quality and with respect to speech recognition error rate. The results for the simple feed-forward network are especially encouraging considering its low computational requirements.
    Keywords: Speech ; Estimation ; Training ; Array Signal Processing ; Acoustics ; Signal to Noise Ratio ; Neural Networks ; Robust Speech Recognition ; Acoustic Beam-Forming ; Feature Enhancement ; Deep Neural Network ; Engineering
    ISSN: 15206149
    E-ISSN: 2379-190X
    Source: IEEE Conference Publications
    Source: IEEE Xplore
    Source: IEEE Journals & Magazines 
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 5
    Language: English
    In: 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC), September 2018, pp.466-470
    Description: Signal dereverberation using the weighted prediction error (WPE) method has been proven to be an effective means to raise the accuracy of far-field speech recognition. But in its original formulation, WPE requires multiple iterations over a sufficiently long utterance, rendering it unsuitable for online low-latency applications. Recently, two methods have been proposed to overcome this limitation. One utilizes a neural network to estimate the power spectral density (PSD) of the target signal and works in a block-online fashion. The other method relies on a rather simple PSD estimation which smoothes the observed PSD and utilizes a recursive formulation which enables it to work on a frame-by-frame basis. In this paper, we integrate a deep neural network (DNN) based estimator into the recursive frame-online formulation. We evaluate the performance of the recursive system with different PSD estimators in comparison to the block-online and offline variant on two distinct corpora. The REVERB challenge data, where the signal is mainly deteriorated by reverberation, and a database which combines WSJ and VoiceHome to also consider (directed) noise sources. The results show that although smoothing works surprisingly well, the more sophisticated DNN based estimator shows promising improvements and shortens the performance gap between online and offline processing.
    Keywords: Reverberation ; Estimation ; Conferences ; Neural Networks ; Indexes ; Microphones ; Speech Recognition ; Online Speech Enhancement ; Dereverberation
    Source: IEEE Conference Publications
    Source: IEEE Xplore
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 6
    Language: English
    In: 2017 IEEE 19th International Workshop on Multimedia Signal Processing (MMSP), October 2017, pp.1-6
    Description: Multi-channel speech enhancement algorithms rely on a synchronous sampling of the microphone signals. This, however, cannot always be guaranteed, especially if the sensors are distributed in an environment. To avoid performance degradation the sampling rate offset needs to be estimated and compensated for. In this contribution we extend the recently proposed coherence drift based method in two important directions. First, the increasing phase shift in the short-time Fourier transform domain is estimated from the coherence drift in a Matched Filter-like fashion, where intermediate estimates are weighted by their instantaneous SNR. Second, an observed bias is removed by iterating between offset estimation and compensation by resampling a couple of times. The effectiveness of the proposed method is demonstrated by speech recognition results on the output of a beamformer with and without sampling rate offset compensation between the input channels. We compare MVDR and maximum-SNR beamformers in reverberant environments and further show that both benefit from a novel phase normalization, which we also propose in this contribution.
    Keywords: Coherence ; Delays ; Array Signal Processing ; Estimation ; Signal to Noise Ratio ; Synchronization ; Acoustics ; Engineering
    E-ISSN: 2473-3628
    Source: IEEE Conference Publications
    Source: IEEE Xplore
    Source: IEEE Journals & Magazines 
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 7
    Language: English
    In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017, pp.171-175
    Description: In this paper we show how a neural network for spectral mask estimation for an acoustic beamformer can be optimized by algorithmic differentiation. Using the beamformer output SNR as the objective function to maximize, the gradient is propagated through the beamformer all the way to the neural network which provides the clean speech and noise masks from which the beamformer coefficients are estimated by eigenvalue decomposition. A key theoretical result is the derivative of an eigenvalue problem involving complex-valued eigenvectors. Experimental results on the CHiME-3 challenge database demonstrate the effectiveness of the approach. The tools developed in this paper are a key component for an end-to-end optimization of speech enhancement and speech recognition.
    Keywords: Speech ; Artificial Neural Networks ; Array Signal Processing ; Acoustics ; Linear Programming ; Eigenvalues and Eigenfunctions ; Signal to Noise Ratio ; Acoustic Beamforming ; Deep Neural Network ; Complex-Valued Algorithmic Differentiation ; Generalized Eigenvalue Problem ; Engineering
    ISSN: 15206149
    E-ISSN: 2379-190X
    Source: IEEE Conference Publications
    Source: IEEE Xplore
    Source: IEEE Journals & Magazines 
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 8
    Language: English
    In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017, pp.5325-5329
    Description: This paper presents an end-to-end training approach for a beamformer-supported multi-channel ASR system. A neural network which estimates masks for a statistically optimum beamformer is jointly trained with a network for acoustic modeling. To update its parameters, we propagate the gradients from the acoustic model all the way through feature extraction and the complex valued beamforming operation. Besides avoiding a mismatch between the front-end and the back-end, this approach also eliminates the need for stereo data, i.e., the parallel availability of clean and noisy versions of the signals. Instead, it can be trained with real noisy multi-channel data only. Also, relying on the signal statistics for beamforming, the approach makes no assumptions on the configuration of the microphone array. We further observe a performance gain through joint training in terms of word error rate in an evaluation of the system on the CHiME 4 dataset.
    Keywords: Acoustics ; Array Signal Processing ; Training ; Neural Networks ; Computational Modeling ; Microphones ; Noise Measurement ; Robust Asr ; Multi-Channel Asr ; Acoustic Beamforming ; Complex Backpropagation ; Engineering
    ISSN: 15206149
    E-ISSN: 2379-190X
    Source: IEEE Conference Publications
    Source: IEEE Xplore
    Source: IEEE Journals & Magazines 
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 9
    Description: We present an unsupervised training approach for a neural network-based mask estimator in an acoustic beamforming application. The network is trained to maximize a likelihood criterion derived from a spatial mixture model of the observations. It is trained from scratch without requiring any parallel data consisting of degraded input and clean training targets. Thus, training can be carried out on real recordings of noisy speech rather than simulated ones. In contrast to previous work on unsupervised training of neural mask estimators, our approach avoids the need for a possibly pre-trained teacher model entirely. We demonstrate the effectiveness of our approach by speech recognition experiments on two different datasets: one mainly deteriorated by noise (CHiME 4) and one by reverberation (REVERB). The results show that the performance of the proposed system is on par with a supervised system using oracle target masks for training and with a system trained using a model-based teacher. Comment: Correction to Eq. 11: Hermite symbol was on the wrong variable. Replaces y with the normalized version
    Keywords: Computer Science - Sound ; Computer Science - Machine Learning ; Statistics - Machine Learning
    Source: Cornell University
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 10
    Language: English
    In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2019, pp.6655-6659
    Description: Signal dereverberation using the Weighted Prediction Error (WPE) method has been proven to be an effective means to raise the accuracy of far-field speech recognition. First proposed as an iterative algorithm, follow-up works have reformulated it as a recursive least squares algorithm and therefore enabled its use in online applications. For this algorithm, the estimation of the power spectral density (PSD) of the anechoic signal plays an important role and strongly influences its performance. Recently, we showed that using a neural network PSD estimator leads to improved performance for online automatic speech recognition. This, however, comes at a price. To train the network, we require parallel data, i.e., utterances simultaneously available in clean and reverberated form. Here we propose to overcome this limitation by training the network jointly with the acoustic model of the speech recognizer. To be specific, the gradients computed from the cross-entropy loss between the target senone sequence and the acoustic model network output is backpropagated through the complex-valued dereverberation filter estimation to the neural network for PSD estimation. Evaluation on two databases demonstrates improved performance for on-line processing scenarios while imposing fewer requirements on the available training data and thus widening the range of applications.
    Keywords: Dereverberation ; Speech Enhancement ; Joint Optimization ; Robust Asr ; Engineering
    ISSN: 15206149
    E-ISSN: 2379-190X
    Source: IEEE Conference Publications
    Source: IEEE Xplore
    Source: IEEE Journals & Magazines 
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
Close ⊗
This website uses cookies and the analysis tool Matomo. Further information can be found on the KOBV privacy pages