PCVC Speech Dataset

The '''PCVC (Persian Consonant Vowel Combination) Speech Dataset''' is a [[Modern Persian]] [[speech corpus]] for [[speech recognition]] and [[speaker recognition]]. The dataset contains sound samples of [[Modern Persian]] [[consonant]] and [[vowel]] phoneme combinations recorded from different speakers. Every sound sample contains exactly one consonant followed by one vowel, so the dataset is effectively labeled at the phoneme level. It covers 23 Persian consonants and 6 vowels, and the sound samples are all possible consonant-vowel combinations (138 samples for each speaker). The sample rate of all speech samples is 48,000 Hz, meaning that each second of audio holds 48,000 amplitude values. On average, 0.5 seconds of each sample is speech and the rest is silence; each sample ends with silence.<ref>{{Cite journal|last1=Malekzadeh|first1=Saber|last2=Gholizadeh|first2=Mohammad Hossein|last3=Razavi|first3=Seyyed Naser|date=|title=Persian phonemes recognition using PPNet|url=https://arxiv.org/abs/1812.08600|journal=Journal of Signal Processing Systems|volume=|year=2018|arxiv=1812.08600|doi=10.13140/RG.2.2.34836.96647|s2cid=214612057 }}</ref><ref>Malekzadeh, S., Gholizadeh, M.H. and Razavi, S.N., 2018. Persian Vowel recognition with MFCC and ANN on PCVC speech dataset. ''arXiv preprint arXiv:1812.06953''.</ref> All sound samples are denoised with an "Adaptive noise reduction" algorithm.<ref>{{cite web |title=PCVC Kaggle page |url=https://www.kaggle.com/sabermalek/pcvcspeech/home }}</ref>
Compared to the Farsdat speech dataset<ref>Bijankhan, M., Sheikhzadegan, J., Roohani, M. R., Samareh, Y., Lucas, C., & Tebyani, M. (1994). FARSDAT-The Speech Database of Farsi Spoken Language. The Proceedings of the Australian Conference on Speech Science and Technology (Vol. 2, pp. 826–831).</ref> and the Persian speech corpus,<ref>Halabi, Nawar (2016). Modern Standard Persian Phonetics for Speech Synthesis. University of Southampton, School of Electronics and Computer Science.</ref> it is easier to use because it is distributed as .mat data files.<ref>{{cite web |title= Access and change variables directly in MAT-files, without loading into memory. |url=https://uk.mathworks.com/help/matlab/ref/matfile.html }}</ref> Its samples are also separated at the phoneme level, and all of them are denoised.
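Because the data ships as .mat files, it can also be read outside MATLAB. The following is a minimal Python sketch, assuming SciPy is installed and that the files use a MAT-file version that <code>scipy.io.loadmat</code> supports (it does not read v7.3 files). The helper names are hypothetical, and the variable names stored inside the PCVC files are not stated here, so the sketch lists whatever variables it finds rather than assuming a name.

```python
from typing import Any, Dict


def user_variables(mat_contents: Dict[str, Any]) -> Dict[str, Any]:
    """Drop the metadata entries (such as ``__header__``) that loadmat adds."""
    return {
        name: value
        for name, value in mat_contents.items()
        if not name.startswith("__")
    }


def load_mat_variables(path: str) -> Dict[str, Any]:
    """Read a MAT-file and return only the variables stored in it."""
    # Imported here so that user_variables() stays usable without SciPy.
    from scipy.io import loadmat

    return user_variables(loadmat(path))
```

Calling <code>load_mat_variables("pcvc_speaker.mat")</code>, where the file name is a placeholder rather than one taken from the dataset, would return a dictionary mapping each stored variable name to a NumPy array whose <code>shape</code> can then be inspected.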
 
==Contents==
The corpus is downloadable from its Kaggle web page, and contains the following:
* .mat data files that store the sound samples in a 23 × 6 × 30000 matrix, where 23 is the number of consonants, 6 is the number of vowels, and 30000 is the length of each sound sample.
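The arithmetic implied by this layout can be sketched in Python. The constants come from the description above; the function names, the zero-based indexing (MATLAB itself is one-based) and the row-major ordering of consonant-vowel pairs are assumptions made for illustration only, not properties documented for the dataset.

```python
from typing import Tuple

SAMPLE_RATE_HZ = 48_000    # stated sample rate of every recording
N_CONSONANTS = 23          # first matrix dimension
N_VOWELS = 6               # second matrix dimension
SAMPLES_PER_CLIP = 30_000  # third matrix dimension


def clips_per_speaker() -> int:
    """Number of consonant-vowel combinations recorded per speaker."""
    return N_CONSONANTS * N_VOWELS


def clip_duration_seconds() -> float:
    """Length of one clip in seconds at the stated sample rate."""
    return SAMPLES_PER_CLIP / SAMPLE_RATE_HZ


def flat_index(consonant: int, vowel: int) -> int:
    """Map a zero-based (consonant, vowel) pair to a position in 0..137."""
    if not (0 <= consonant < N_CONSONANTS and 0 <= vowel < N_VOWELS):
        raise IndexError("consonant or vowel index out of range")
    return consonant * N_VOWELS + vowel


def pair_from_flat(index: int) -> Tuple[int, int]:
    """Inverse of flat_index: recover the (consonant, vowel) pair."""
    if not 0 <= index < clips_per_speaker():
        raise IndexError("flat index out of range")
    return divmod(index, N_VOWELS)
```

Under these assumptions each speaker contributes 23 × 6 = 138 clips, and each clip lasts 30000 / 48000 = 0.625 seconds, which is consistent with roughly 0.5 seconds of speech followed by trailing silence.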
 
==See also==
 
[[Category:Datasets in machine learning]]
[[Category:Artificial intelligence]]
[[Category:Speech recognition]]
[[Category:Speaker recognition]]