Speech Dataset Collection and Annotation

We provide a worldwide collection of diverse, scalable, and meticulously transcribed speech datasets, ideal for training machines to accurately recognize and understand many different languages.
How it Works
Algorithm Development
Data Demand Generation
Dataset Definition/Design
Trial and Improvement
Mass Production
Quality Control
Data Package Delivery
Data Collection and Annotation
We offer many kinds of audio classification datasets, for example speech command datasets and common voice datasets. Beyond these, we have North American voice datasets, African sound datasets, Asian audio datasets, and European voice datasets.
Environments
Indoor
Studio
Outdoor
In-car
Devices
Mobile (iOS/Android)
Computer (Desktop/Laptop)
Pro (Hi-Fi recorder/Mic Array)
Speakers
Language: Chinese/English/French/German…
Gender balanced: 1:1
Age: Children/Senior
Education Background
Machine annotation
Human transcription/validation
Rounds of QA by human and machine
Voice Dataset Annotation
Accuracy between 95% and 98%
Surfing Tech applies its own algorithm during speech dataset annotation to ensure high efficiency and accuracy. After three rounds of quality inspection, we achieve an accuracy rate above 95%, which makes the audio datasets more valuable for speech emotion recognition, semantic understanding, and human-computer interaction.
Speech Data Portfolio
Speech Dataset
Over 30,000 hours of collected voice data
Multi-region audio datasets: Australia, North America, South America, Europe, Asia
Speech Data Age Range
Children's Mandarin speech emotion dataset: ages 3-12
Adult audio datasets; senior Mandarin speech: 800 speakers
Accent
We have speech emotion recognition datasets in various regional dialects and languages, such as Hakka, English dialects, Hindi, etc.
Central China: 1,000 speakers
Audio Dataset Environment
Indoor voice datasets, customer service voice datasets
Work environment speech datasets
Language
English voice datasets: North American, Australian, Singaporean
Multilingual voice data: Swahili speech data
Russian speech datasets, French voice datasets
Singaporean English, Kazakh speech datasets