Dataset

Mandarin-China Children Speech Dataset

  • Name: Mandarin-China Children Speech Dataset
  • Producer: Surfingtech
  • IPR Owner: Surfingtech
  • Language: Mandarin-China
  • Type: Reading
  • Hours: 1,105 hours of recorded speech
  • Speaker Number: 10,060 speakers
  • Gender Distribution: 1:1 ratio of male to female speakers
  • Scene: Quiet room
  • Recording Device: Mainstream smartphones
  • Parameters: 16 kHz, 16-bit, mono, WAV (.wav)
  • Date: 2017
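
The listed parameters (16 kHz, 16-bit, mono, .wav) can be checked against a file before it is used. The following is a minimal sketch using Python's standard wave module. The function names and the constants are illustrative and are not part of the dataset or any Surfingtech tooling, and it assumes the files are plain PCM WAV, which the listing does not state explicitly.

```python
import wave

# Expected values taken from the "Parameters" field above.
EXPECTED_RATE_HZ = 16_000         # 16 kHz sample rate
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16-bit samples = 2 bytes
EXPECTED_CHANNELS = 1             # mono


def matches_listed_format(path):
    """Return True if the WAV file at `path` is 16 kHz, 16-bit, mono."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def duration_seconds(path):
    """Return the duration of the WAV file at `path` in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Calling matches_listed_format on each file and summing duration_seconds over the corpus gives a quick sanity check of a local copy against the format and hours listed above.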