Browse: Speech & Audio AI

Acoustic Model Intermediate

Maps acoustic features of an audio signal to linguistic units such as phonemes or characters.

Forced Alignment Intermediate

Aligns a known transcript to its audio, producing timestamps for each word or phoneme.

Neural Vocoder Intermediate

Generates audio waveforms from spectrograms.

Prosody Intermediate

The rhythm, stress, and intonation of speech, conveyed through timing, pitch, and loudness.

Speaker Diarization Intermediate

Partitioning an audio recording into segments by speaker, answering "who spoke when."

Speech Recognition Intermediate

Converting audio speech into text, often using encoder-decoder or transducer architectures.

Speech Synthesis Intermediate

Generating human-like speech from text.

Text-to-Speech Intermediate

Generating speech audio from text, with control over prosody, speaker identity, and style.

Voice Conversion Intermediate

Transforming speech so it sounds like a different speaker while preserving the linguistic content.

Wake Word Detection Intermediate

Detects trigger phrases in audio streams.
