PCVC Speech Dataset


Latest revision as of 17:15, 25 December 2022, by Conan (bot edit: removing Category:Artificial intelligence because the page is already in Category:Applications of artificial intelligence). Compared with the revision as of 19:46, 31 March 2018, by Sabemalek; 43 intermediate revisions by 11 users not shown.
The '''PCVC (Persian Consonant Vowel Combination) Speech Dataset''' is a [[Modern Persian]] [[speech corpus]] for [[speech recognition]] and [[speaker recognition]]. The dataset contains recordings of Modern Persian [[consonant]]-[[vowel]] phoneme combinations from different speakers. Every recording contains exactly one consonant and one vowel, so the data is effectively labeled at the phoneme level. The dataset covers 23 Persian consonants and 6 vowels, and the recordings span all possible consonant-vowel combinations (23 × 6 = 138 recordings per speaker). All recordings are sampled at 48,000 Hz, that is, 48,000 samples per second. Each recording starts with the consonant, continues with the vowel, and ends with silence. In each 2-second recording, about 0.5 seconds on average is speech and the rest is silence.<ref>{{Cite journal |last1=Malekzadeh |first1=Saber |last2=Gholizadeh |first2=Mohammad Hossein |last3=Razavi |first3=Seyyed Naser |title=Persian phonemes recognition using PPNet |journal=Journal of Signal Processing Systems |year=2018 |arxiv=1812.08600 |doi=10.13140/RG.2.2.34836.96647 |s2cid=214612057 }}</ref><ref>Malekzadeh, S., Gholizadeh, M.H. and Razavi, S.N., 2018. Persian Vowel recognition with MFCC and ANN on PCVC speech dataset. ''arXiv preprint arXiv:1812.06953''.</ref> All recordings are denoised with an "adaptive noise reduction" algorithm.<ref>{{cite web |title=PCVC Kaggle page |url=https://www.kaggle.com/sabermalek/pcvcspeech/home }}</ref>

Compared to the Farsdat speech dataset<ref>Bijankhan, M., Sheikhzadegan, J., Roohani, M. R., Samareh, Y., Lucas, C., & Tebyani, M. (1994). FARSDAT-The Speech Database of Farsi Spoken Language. The Proceedings of the Australian Conference on Speech Science and Technology (Vol. 2, pp. 826–831).</ref> and the Persian speech corpus,<ref>Halabi, Nawar (2016). Modern Standard Persian Phonetics for Speech Synthesis. University of Southampton, School of Electronics and Computer Science.</ref> it is easier to use because it is distributed as .mat data files.<ref>{{cite web |title=Access and change variables directly in MAT-files, without loading into memory. |url=https://uk.mathworks.com/help/matlab/ref/matfile.html }}</ref> It is also organized around phoneme-level separation, and all samples are denoised.

==Contents==
The corpus is downloadable from its Kaggle web page and contains the following:
* .mat data files holding the sound samples in a 23 × 6 × 30000 matrix, where 23 is the number of consonants, 6 is the number of vowels, and 30000 is the length of each 2 s sound sample.
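The matrix layout described above can be illustrated with a small index-arithmetic sketch. It is not taken from the dataset's own code: the function names are hypothetical, and the consonant-major ordering of a flattened index is an assumption made only for illustration. The constants (23, 6, 48000, 30000) come from the text. The sketch also computes durations directly; note that 30000 samples at 48,000 Hz correspond to 0.625 s rather than 2 s, a difference the text does not explain.

```python
from typing import Tuple

# Layout constants as stated in the article text.
N_CONSONANTS = 23
N_VOWELS = 6
SAMPLE_RATE_HZ = 48_000
SAMPLES_PER_CLIP = 30_000  # length of the last matrix axis, per the article


def clips_per_speaker() -> int:
    """Number of consonant-vowel combinations recorded per speaker."""
    return N_CONSONANTS * N_VOWELS


def pair_to_flat(consonant_index: int, vowel_index: int) -> int:
    """Map (consonant, vowel) indices to a flat clip index.

    Assumes consonant-major ordering (hypothetical convention).
    """
    if not 0 <= consonant_index < N_CONSONANTS:
        raise IndexError("consonant_index out of range")
    if not 0 <= vowel_index < N_VOWELS:
        raise IndexError("vowel_index out of range")
    return consonant_index * N_VOWELS + vowel_index


def flat_to_pair(flat_index: int) -> Tuple[int, int]:
    """Inverse of pair_to_flat: flat clip index -> (consonant, vowel)."""
    if not 0 <= flat_index < clips_per_speaker():
        raise IndexError("flat_index out of range")
    return divmod(flat_index, N_VOWELS)


def duration_seconds(n_samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration of a clip holding n_samples at the given sample rate."""
    return n_samples / sample_rate_hz


if __name__ == "__main__":
    print(clips_per_speaker())                 # combinations per speaker
    print(flat_to_pair(137))                   # last (consonant, vowel) pair
    print(duration_seconds(SAMPLES_PER_CLIP))  # seconds for a 30000-sample clip
```

Reading the actual .mat files (for example with SciPy's `scipy.io.loadmat`) is left out because the variable names stored inside the files are not given here.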
==See also==

==External links==
* [https://www.kaggle.com/sabermalek/pcvcspeech/home The Kaggle page of PCVC speech dataset]
* [https://www.researchgate.net/publication/322298311_Full_Persian_Vowel_recognition_with_MFCC_and_ANN_on_PCVC_speech_dataset PCVC Paper on ResearchGate]

{{Corpus linguistics}}

[[:Category:Datasets in machine learning]]
[[:Category:Speech recognition]]
[[:Category:Speaker recognition]]
[[Category:Persian language]]
[[Category:Speech synthesis]]