English General multi-modal audio/video dataset

High-Quality Multi-Modal Face Dataset to Empower Face Recognition Models

Professional-Grade Dataset with Multi-Angle Face Images, Videos and Audio

Drawing on years of experience in image data processing, Surfing Tech provides AI practitioners with high-quality datasets for training and testing facial recognition models. The dataset contains 300 subjects, each reciting 300 sentences. The recording setup uses one lapel microphone for audio, one smartphone for audio, and a second smartphone for video; the recordings are synchronized manually.
The smartphone audio is sampled at 44.1 kHz with 16-bit depth. The video is recorded at 1440×1080 resolution and 60 FPS. The lapel microphone audio is also sampled at 44.1 kHz with 16-bit depth. The three recordings are manually aligned into synchronized audio-visual data.
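As a rough illustration of how these specifications might be checked on delivery, the Python sketch below reads a WAV header and compares it with the stated 44.1 kHz, 16-bit values. It also computes how many audio samples span one 60 FPS video frame, which can be a handy unit when checking alignment. The WAV container and all function names here are assumptions made for illustration; the listing does not state the delivery file format.

```python
import io
import wave

EXPECTED_RATE_HZ = 44_100   # sampling rate stated in the listing
EXPECTED_SAMPLE_WIDTH = 2   # 16-bit depth = 2 bytes per sample
VIDEO_FPS = 60              # video frame rate stated in the listing


def check_wav_spec(source):
    """Return (ok, details) for a WAV path or binary file-like object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        duration_s = wav.getnframes() / rate
    ok = rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH
    return ok, {"rate_hz": rate, "bits": width * 8, "duration_s": duration_s}


def audio_samples_per_video_frame(rate_hz=EXPECTED_RATE_HZ, fps=VIDEO_FPS):
    """Number of audio samples that span one video frame."""
    return rate_hz / fps


def make_silent_wav(seconds=1, rate_hz=EXPECTED_RATE_HZ):
    """Build an in-memory mono 16-bit WAV of silence (demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(EXPECTED_SAMPLE_WIDTH)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * (rate_hz * seconds))
    buffer.seek(0)
    return buffer
```

At the stated rates, one video frame corresponds to 44100 / 60 = 735 audio samples, so an alignment offset can be expressed in whole frames or in samples as needed.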
Capturing speech and facial expressions across a range of emotions, this dataset enables building robust multi-angle face recognition models with improved adaptability.
We welcome AI developers to use our face dataset solutions for expedited iterations and upgrades of facial recognition capabilities.

Data Name	English General multi-modal audio/video dataset
Producer	Surfingtech
IPR Ownership	Surfingtech
Quantity	100 ID
Ethnicity	English Speaker
Background	Indoor
Capture Devices	Smartphone, lapel microphone
Details (each person)	Each subject speaks 300 sentences. Three devices are used: one smartphone recording video, one smartphone recording audio, and one lapel microphone recording audio. The recordings are aligned manually.