Domain: Speech & Audio AI
Acoustic Model
Intermediate
Maps acoustic features of an audio signal to linguistic units such as phonemes, characters, or subword tokens.
Forced Alignment
Intermediate
Aligns a known transcript to its audio, producing start and end timestamps for each word or phoneme.
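One common way to compute such an alignment is a monotonic Viterbi search over per-frame token scores. The sketch below is a minimal illustration under stated assumptions, not any particular toolkit's method: `forced_align` is a hypothetical helper, `log_probs` stands in for the frame-level output of some acoustic model, and every token is assumed to cover at least one frame (no blank or silence symbol).

```python
import math

def forced_align(log_probs, tokens):
    """Align a known token sequence to frames with a monotonic Viterbi search.

    log_probs[t][k] is the log-probability of token id k at frame t
    (assumed acoustic-model output). Returns (token, start_frame, end_frame)
    tuples with inclusive frame indices.
    """
    T, N = len(log_probs), len(tokens)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per token")
    NEG = -math.inf
    # score[t][j]: best log-prob of frames 0..t with frame t assigned to tokens[j]
    score = [[NEG] * N for _ in range(T)]
    # back[t][j]: 1 if the best path advanced from token j-1 at frame t, else 0
    back = [[0] * N for _ in range(T)]
    score[0][0] = log_probs[0][tokens[0]]
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else NEG
            if move > stay:
                best, back[t][j] = move, 1
            else:
                best, back[t][j] = stay, 0
            if best > NEG:
                score[t][j] = best + log_probs[t][tokens[j]]
    if score[T - 1][N - 1] == NEG:
        raise ValueError("no valid alignment")
    # Backtrace from the last frame, which must belong to the last token.
    path = [0] * T
    j = N - 1
    for t in range(T - 1, -1, -1):
        path[t] = j
        j -= back[t][j]
    # Collapse the frame-level path into one segment per token.
    segments, start = [], 0
    for t in range(1, T + 1):
        if t == T or path[t] != path[t - 1]:
            segments.append((tokens[path[start]], start, t - 1))
            start = t
    return segments

# Toy input: 5 frames, 3 token ids, transcript [0, 1, 2].
probs = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.80, 0.10],
    [0.10, 0.10, 0.80],
    [0.05, 0.05, 0.90],
]
demo_log_probs = [[math.log(p) for p in row] for row in probs]
demo_segments = forced_align(demo_log_probs, [0, 1, 2])
print(demo_segments)  # [(0, 0, 1), (1, 2, 2), (2, 3, 4)]
```

Multiplying the frame indices by the frame hop duration converts the segments into timestamps.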
Neural Vocoder
Intermediate
Generates audio waveforms from acoustic features, typically mel spectrograms, using a neural network.
Prosody
Intermediate
The rhythm, stress, and intonation of speech, reflected in timing (duration, pauses), pitch, and loudness.
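Pitch is one prosodic feature that can be estimated directly from a waveform. The following is a minimal sketch that uses plain autocorrelation on a synthetic tone; `estimate_f0` and its default search range of 50 to 400 Hz are illustrative assumptions, and production pitch trackers add voicing decisions and smoothing that are omitted here.

```python
import math

def estimate_f0(signal, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate fundamental frequency (Hz) from the autocorrelation peak."""
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = int(sample_rate / fmin)  # longest period considered
    if len(signal) <= max_lag:
        raise ValueError("signal too short for the requested fmin")
    best_lag, best_corr = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        corr = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag

# Synthetic 200 Hz tone, 50 ms long, sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
f0 = estimate_f0(tone, sample_rate)
print(round(f0, 1))
```

Running the estimator on successive short frames yields a pitch contour, which together with segment durations gives a basic description of an utterance's prosody.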
Speaker Diarization
Intermediate
Partitioning audio into segments by speaker ("who spoke when"), typically without knowing the speakers' identities in advance.
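Diarization output is commonly expressed as speaker-labelled time segments. As a small sketch of that final step only, the hypothetical helper below collapses per-frame speaker labels into segments; the labels themselves are assumed to come from an upstream embedding and clustering stage that is not shown, and the 0.5 s frame length is an arbitrary choice for the example.

```python
from itertools import groupby

def labels_to_segments(frame_labels, frame_sec=0.5):
    """Merge runs of identical frame labels into (speaker, start_s, end_s)."""
    segments = []
    start = 0
    for speaker, run in groupby(frame_labels):
        n = len(list(run))
        segments.append((speaker, start * frame_sec, (start + n) * frame_sec))
        start += n
    return segments

# Assumed frame-level output of a diarization model, one label per 0.5 s frame.
demo_labels = ["A", "A", "B", "B", "B", "A"]
demo_turns = labels_to_segments(demo_labels)
print(demo_turns)  # [('A', 0.0, 1.0), ('B', 1.0, 2.5), ('A', 2.5, 3.0)]
```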
Speech Recognition
Intermediate
Converting audio speech into text, often using encoder-decoder or transducer architectures.
Speech Synthesis
Intermediate
Generating human-like speech from text.
Text-to-Speech
Intermediate
Generating speech audio from text, with control over prosody, speaker identity, and style.
Voice Conversion
Intermediate
Changing the speaker characteristics of an utterance, such as timbre and perceived identity, while preserving its linguistic content.
Wake Word Detection
Intermediate
Detects a predefined trigger phrase in a continuous audio stream, usually with a small always-on model.
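A streaming detector usually has to turn noisy per-frame scores into discrete trigger events. The sketch below shows one simple way to do that, assuming a hypothetical upstream model that emits a wake word probability per frame: scores are smoothed with a moving average, a trigger fires when the average crosses a threshold, and the detector re-arms only after the average falls back below it. The function name, window size, and threshold are illustrative assumptions.

```python
from collections import deque

def detect_wake_word(scores, window=3, threshold=0.7):
    """Return frame indices where the smoothed score first crosses the threshold."""
    recent = deque(maxlen=window)  # most recent `window` scores
    triggers = []
    armed = True
    for i, score in enumerate(scores):
        recent.append(score)
        average = sum(recent) / len(recent)
        if len(recent) == window and average >= threshold:
            if armed:
                triggers.append(i)
                armed = False  # suppress repeats until the score drops again
        else:
            armed = True
    return triggers

# Assumed per-frame posteriors: one isolated spike, then a sustained detection.
demo_scores = [0.1, 0.2, 0.9, 0.1, 0.8, 0.9, 0.95, 0.9, 0.2, 0.1, 0.1]
demo_triggers = detect_wake_word(demo_scores)
print(demo_triggers)  # [6]
```

Smoothing trades a little latency for fewer false triggers: the isolated spike at frame 2 is ignored, while the sustained run of high scores fires once.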