The experts below are selected from a list of 1,662 experts worldwide, ranked by the ideXlab platform.
Tomoki Toda - One of the best experts on this subject based on the ideXlab platform.
-
Efficient Shallow WaveNet Vocoder Using Multiple Samples Output Based on Laplacian Distribution and Linear Prediction
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020
Co-Authors: Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda
Abstract: This paper presents an efficient implementation scheme for a shallow WaveNet vocoder with multiple-sample (segment) output, based on a Laplacian distribution and linear prediction. In our previous work, we proposed a shallow WaveNet vocoder architecture that uses only 9 dilated convolutional layers while remaining capable of generating high-quality speech, thanks to the use of a Laplacian distribution for modeling speech samples. However, there is still considerable room to improve computational efficiency, for example by inferring segment output and using a more compact structure. In this work, we address this by proposing a simple segment-output modeling scheme, easily extended to other neural vocoders, in which the Laplacian distribution parameters of multiple samples are estimated simultaneously. Further, to preserve the dependencies among samples within a segment, we also propose using linear prediction (LP) to compute the distribution parameters, where data-driven LP coefficients are estimated by the WaveNet vocoder along with the locations and scales. Finally, a shallower WaveNet vocoder with 6 layers is deployed. The experimental results demonstrate that the proposed LP-based Laplacian distribution alleviates the quality degradation caused by segment generation.
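The LP-based location computation described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function names, the free-running history update, and the toy numbers are assumptions.

```python
import numpy as np

def laplacian_nll(x, loc, scale):
    # Negative log-likelihood of Laplace(loc, scale): log(2b) + |x - mu| / b
    return np.log(2.0 * scale) + np.abs(x - loc) / scale

def segment_locations(context, lp_coefs, res_locs):
    """Compute the Laplacian location of each sample in a segment as an
    LP prediction from the preceding samples plus a network-predicted
    residual location. `context` holds the samples before the segment,
    oldest first; `res_locs` holds one residual location per segment sample."""
    order = len(lp_coefs)
    history = list(context)
    locs = []
    for mu_res in res_locs:
        recent = history[-order:][::-1]  # most recent sample first
        pred = sum(a * s for a, s in zip(lp_coefs, recent))
        locs.append(pred + mu_res)
        history.append(locs[-1])  # free-running; training would append the true sample
    return locs
```

During training, `laplacian_nll` would be evaluated at the true waveform samples using these locations together with the predicted scales.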
-
Investigation of Shallow WaveNet Vocoder with Laplacian Distribution Output
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019
Co-Authors: Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda
Abstract: In this paper, we investigate a shallow architecture and a Laplacian distribution output for a WaveNet vocoder trained with limited training data. The shallower WaveNet architecture is proposed both to suit use cases with limited data and to reduce computation time. To further improve the modeling of the WaveNet vocoder, a Laplacian distribution output is proposed. The Laplacian distribution is inherently sparse, with a higher peak and fatter tails than the Gaussian, which may make it better suited to speech signal modeling. The experimental results demonstrate that: 1) the proposed shallow variant of the WaveNet architecture gives performance comparable to the deep one with a softmax output while reducing computation time by 73%; and 2) the Laplacian distribution output consistently improves speech quality across various amounts of limited training data, with the two highest mean opinion scores reaching 4.22.
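As a concrete illustration of a Laplacian output layer for a vocoder, the loss and sampling steps might look like the sketch below. The log-scale parameterization and inverse-CDF sampler are standard choices, but they are assumptions here; the paper does not specify its parameterization.

```python
import numpy as np

def laplacian_loss(x, loc, log_scale):
    """Negative log-likelihood under Laplace(loc, b = exp(log_scale)).
    Predicting the log of the scale keeps b strictly positive."""
    b = np.exp(log_scale)
    return np.log(2.0 * b) + np.abs(x - loc) / b

def laplacian_sample(loc, log_scale, rng):
    """Inverse-CDF sampling: x = mu - b * sign(u) * log(1 - 2|u|), u ~ U(-1/2, 1/2)."""
    b = np.exp(log_scale)
    u = rng.uniform(-0.5, 0.5, size=np.shape(loc))
    return loc - b * np.sign(u) * np.log1p(-2.0 * np.abs(u))
```

At synthesis time, the network would emit `loc` and `log_scale` per sample and `laplacian_sample` would draw the waveform value.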
Patrick Lumban Tobing - One of the best experts on this subject based on the ideXlab platform.
-
Efficient Shallow WaveNet Vocoder Using Multiple Samples Output Based on Laplacian Distribution and Linear Prediction
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020. Co-Authors: Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda.
-
Investigation of Shallow WaveNet Vocoder with Laplacian Distribution Output
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019. Co-Authors: Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda.
Tomoki Hayashi - One of the best experts on this subject based on the ideXlab platform.
-
Efficient Shallow WaveNet Vocoder Using Multiple Samples Output Based on Laplacian Distribution and Linear Prediction
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020. Co-Authors: Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda.
-
Investigation of Shallow WaveNet Vocoder with Laplacian Distribution Output
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019. Co-Authors: Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda.
Kazuhiro Kobayashi - One of the best experts on this subject based on the ideXlab platform.
-
Efficient Shallow WaveNet Vocoder Using Multiple Samples Output Based on Laplacian Distribution and Linear Prediction
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020. Co-Authors: Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda.
Shanqjang Ruan - One of the best experts on this subject based on the ideXlab platform.
-
Scene Analysis for Object Detection in Advanced Surveillance Systems Using Laplacian Distribution Model
Systems, Man and Cybernetics, 2011
Co-Authors: Fanchieh Cheng, Shihchia Huang, Shanqjang Ruan
Abstract: In this paper, we propose a novel background subtraction approach for accurately detecting moving objects. Our method comprises three modules: a block alarm module, a background modeling module, and an object extraction module. The block alarm module efficiently checks each block for the presence of either a moving object or background information, using the temporal differences of pixels under a Laplacian distribution model; this allows the subsequent background modeling module to process only those blocks found to contain background pixels. Next, the background modeling module generates a high-quality adaptive background model using a unique two-stage training procedure and a novel mechanism for recognizing changes in illumination. As the final step, the object extraction module computes the binary object detection mask by applying a suitable threshold value, obtained through our proposed threshold training procedure. The performance of our method was analyzed by quantitative and qualitative evaluation. The overall results show that our method attains a substantially higher degree of efficacy, outperforming other state-of-the-art methods in Similarity and F1 accuracy rates by up to 35.50% and 26.09%, respectively.
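The temporal-differencing test behind such a block alarm module can be sketched as below. The Laplacian fit uses the standard maximum-likelihood estimates (median for the location, mean absolute deviation for the scale), while the decision rule, the k = 3 threshold, and the 10% outlier fraction are illustrative assumptions, not the paper's actual values.

```python
import numpy as np

def fit_laplacian(diffs):
    # MLE for Laplace: location = median, scale = mean absolute deviation
    mu = np.median(diffs)
    b = np.mean(np.abs(diffs - mu))
    return mu, b

def block_alarm(prev_block, curr_block, scale, k=3.0, frac=0.10):
    """Flag a block as containing a moving object when the fraction of
    temporal-difference pixels that are improbably large under the
    Laplacian noise model exceeds `frac`."""
    d = curr_block.astype(float) - prev_block.astype(float)
    outlier_ratio = np.mean(np.abs(d) > k * scale)
    return outlier_ratio > frac
```

The scale would typically be fit once on differences from a motion-free training period, then reused frame to frame.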
-
Advanced Background Subtraction Approach Using Laplacian Distribution Model
International Conference on Multimedia and Expo (ICME), 2010
Co-Authors: Fanchieh Cheng, Shihchia Huang, Shanqjang Ruan
Abstract: In this paper, we propose a novel background subtraction approach for accurately detecting moving objects. Our method comprises three modules: a block alarm module, a background modeling module, and an object extraction module. The block alarm module efficiently checks each block for the presence of either a moving object or background information, using the temporal differences of pixels under a Laplacian distribution model; this allows the subsequent background modeling module to process only those blocks found to contain background pixels. In the background modeling module, a unique two-stage background training procedure is performed, with Rough Training followed by Precise Training, to generate a high-quality adaptive background model. As the final step, the object extraction module computes the binary object detection mask by applying a suitable threshold value, obtained through our proposed threshold training procedure, to achieve accurate and complete detection of moving objects. The overall results demonstrate that our method attains a substantially higher degree of efficacy, outperforming other state-of-the-art methods in Similarity and F1 accuracy rates by up to 57.17% and 48.48%, respectively.
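A minimal sketch of the kind of adaptive update such a background modeling module might perform is shown below: the current frame is blended into the model only where no motion was detected. The running-average rule and the learning rate are assumptions for illustration, not the paper's two-stage Rough/Precise procedure itself.

```python
import numpy as np

def update_background(bg, frame, motion_mask, alpha=0.05):
    """Blend `frame` into the background model `bg` only at pixels where
    `motion_mask` is 0 (no detected motion); moving regions are left
    untouched so foreground objects do not corrupt the model."""
    out = bg.astype(float).copy()
    still = motion_mask == 0
    out[still] = (1.0 - alpha) * out[still] + alpha * frame[still]
    return out
```

A binary detection mask could then be obtained by thresholding `|frame - bg|` with a trained threshold, as the abstract describes.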