English General Multi-Modal Audio/Video Dataset
High-Quality Multi-Modal Face Dataset to Empower Face Recognition Models
Professional-Grade Dataset with Multi-Angle Face Images, Videos and Audio
Leveraging years of expertise in image data processing, Surfing Tech provides AI practitioners with high-quality datasets tailored for training and testing a variety of facial recognition tasks. The dataset contains 300 subjects, each reciting 300 sentences. The recording setup consists of one lapel microphone for audio, one smartphone for audio, and a second smartphone for video; the recordings are synchronized manually.
The audio captured by the smartphone has a sampling rate of 44.1 kHz at 16-bit depth. The video is recorded at a resolution of 1440×1080 at 60 FPS. The audio from the lapel microphone is also sampled at 44.1 kHz at 16-bit depth. The three streams are manually aligned to produce synchronized audio-visual data.
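As a rough illustration of how these specifications might be checked on receipt, the sketch below reads a WAV file with Python's standard `wave` module and compares it with the stated audio parameters. It assumes the audio is delivered as uncompressed PCM WAV, which this description does not state. The function names and the example file name are hypothetical.

```python
import wave

# Values taken from the specification above.
EXPECTED_SAMPLE_RATE_HZ = 44_100   # 44.1 kHz
EXPECTED_SAMPLE_WIDTH_BYTES = 2    # 16-bit depth
VIDEO_FPS = 60


def check_wav_spec(path):
    """Return (ok, details) for a PCM WAV file checked against the audio spec.

    Assumption: the delivery format is PCM WAV; adapt the reader if the
    audio ships in another container.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.getnframes()
    details = {
        "sample_rate_hz": rate,
        "bit_depth": width * 8,
        "duration_s": frames / rate if rate else 0.0,
    }
    ok = rate == EXPECTED_SAMPLE_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH_BYTES
    return ok, details


def audio_samples_per_video_frame(sample_rate_hz=EXPECTED_SAMPLE_RATE_HZ,
                                  fps=VIDEO_FPS):
    """Number of audio samples spanned by one video frame."""
    return sample_rate_hz / fps
```

For example, `check_wav_spec("subject_001_lapel.wav")` (a hypothetical file name) would return `False` in its first element for a file resampled to 16 kHz. At the stated rates, one video frame spans 44,100 / 60 = 735 audio samples, which is a convenient unit when spot-checking the alignment between the audio and video streams.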
Capturing speech and facial expressions across a range of emotions, this dataset enables building robust multi-angle face recognition models with improved adaptability.
We welcome AI developers to use our face dataset solutions for expedited iterations and upgrades of facial recognition capabilities.