Search: text to audio — AI Glossary

Text-to-Speech Intermediate

Generating speech audio from text, with control over prosody, speaker identity, and style.

Speech & Audio AI

Wake Word Detection Intermediate

Detects trigger phrases in audio streams.

Speech & Audio AI

Neural Vocoder Intermediate

Generates audio waveforms from spectrograms.

Speech & Audio AI

Speech Recognition Intermediate

Converting spoken audio into text, often using encoder-decoder or transducer architectures.

Speech & Audio AI

Speaker Diarization Intermediate

Partitioning an audio recording into segments by speaker, answering "who spoke when".

Speech & Audio AI

Acoustic Model Intermediate

Maps audio signals to linguistic units.

Speech & Audio AI

Forced Alignment Intermediate

Aligns transcripts with audio timestamps.

Speech & Audio AI

Speech Synthesis Intermediate

Generating human-like speech from text.

Speech & Audio AI

Multimodal Model Intermediate

Models that process or generate multiple modalities, enabling vision-language tasks, speech, video understanding, etc.

Foundations & Theory

Tokenization Intermediate

Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.

Foundations & Theory
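As a rough illustration of subword tokenization, the sketch below splits a word by greedy longest-match against a small vocabulary. The vocabulary, the `<unk>` fallback symbol, and the function name `tokenize` are assumptions made for this example; real subword tokenizers learn their vocabulary from data and differ in the details.

```python
def tokenize(word, vocab):
    """Greedy longest-match subword split (toy example, not a trained tokenizer)."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry covers this character.
            tokens.append("<unk>")
            i += 1
    return tokens


# Hypothetical vocabulary chosen only for illustration.
vocab = {"token", "iz", "ation", "t", "o"}
print(tokenize("tokenization", vocab))
```

A larger vocabulary yields fewer, longer tokens per word, while a smaller one falls back to shorter pieces; that is the size-versus-coverage trade-off the definition mentions.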

CLIP Intermediate

Joint vision-language model aligning images and text.

Computer Vision

Exteroception Advanced

External sensing of surroundings (vision, audio, lidar).

Robotics & Embodied AI

Language Model Intermediate

A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.

Large Language Models

Large Language Model Intermediate

A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.

Large Language Models

Multimodal Fusion Intermediate

Combining signals from multiple modalities.

Computer Vision

Transformer Intermediate

Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.

Transformers & LLMs

Autoregressive Model Intermediate

Generates sequences one token at a time, conditioning on past tokens.

Foundations & Theory

Prompt Intermediate

The text (and possibly other modalities) given to an LLM to condition its output behavior.

Prompting & Instructions

Attention Intermediate

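Mechanism that computes context-aware weighted mixtures of representations; it parallelizes well across positions and captures long-range dependencies, though in its standard form the cost grows quadratically with sequence length.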

Transformers & LLMs
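To make the "weighted mixture" idea concrete, here is a minimal sketch of scaled dot-product attention in plain Python. The function names and the list-of-lists representation are assumptions for illustration; production implementations use batched tensor operations.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """For each query, return a softmax-weighted average of the values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Mix the value vectors using the attention weights.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs


# Identical keys give uniform weights, so the output is the mean of the values.
out = attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [4.0, 6.0]])
print(out)
```

Every query is scored against every key, which is where the quadratic cost in sequence length comes from.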

Next-Token Prediction Intermediate

Training objective where the model predicts the next token given previous tokens (causal modeling).

Foundations & Theory

Data Labeling Intermediate

Human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.

Foundations & Theory

Data Augmentation Intermediate

Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.

Foundations & Theory

Adversarial Example Intermediate

Inputs crafted to cause model errors or unsafe behavior, often imperceptible in vision or subtle in text.

Foundations & Theory

Prompt Injection Intermediate

Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.

Foundations & Theory

NLP Intermediate

AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.

Foundations & Theory

Causal Mask Intermediate

Prevents attention to future tokens during training/inference.

AI Economics & Strategy
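A causal mask can be sketched as a lower-triangular table of allowed positions that is applied before the softmax. The helper names below (`causal_mask`, `masked_softmax`) are hypothetical and chosen only for this example.

```python
import math


def causal_mask(n):
    """Row i may attend to column j only when j <= i (no future tokens)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, allowed):
    """Softmax where disallowed positions get -inf and thus zero weight."""
    masked = [s if ok else float("-inf") for s, ok in zip(scores, allowed)]
    m = max(masked)  # finite, because each position may attend to itself
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


mask = causal_mask(3)
# The first token can only see itself, regardless of the raw scores.
print(masked_softmax([0.5, 2.0, -1.0], mask[0]))
# The second token splits its weight over the first two positions.
print(masked_softmax([0.0, 0.0, 5.0], mask[1]))
```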

Key-Value Cache Intermediate

Stores past attention states to speed up autoregressive decoding.

AI Economics & Strategy
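The sketch below shows the caching idea for a single attention head: each decoding step appends one new key and value, then attends over everything stored so far instead of recomputing earlier keys and values. The class `KVCache` and the functions `attend` and `decode_step` are assumptions made for illustration, not any particular library's API.

```python
import math


class KVCache:
    """Holds the keys and values of all previously decoded positions."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e / total * v[j] for e, v in zip(exps, values))
            for j in range(len(values[0]))]


def decode_step(cache, query, key, value):
    """Store the new position's key/value, then attend over the whole cache."""
    cache.append(key, value)
    return attend(query, cache.keys, cache.values)


cache = KVCache()
steps = [
    ([1.0, 0.0], [1.0, 0.0], [1.0, 2.0]),  # (query, key, value) for token 1
    ([0.0, 1.0], [0.0, 1.0], [3.0, 4.0]),  # (query, key, value) for token 2
]
outputs = [decode_step(cache, q, k, v) for q, k, v in steps]
print(outputs)
```

Because only past positions are ever in the cache, each step sees exactly what a causally masked full recomputation would see, at the price of memory that grows with sequence length.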

Generative Model Advanced

Models that learn to generate samples resembling training data.

Diffusion & Generative Models

Cross-Attention Intermediate

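Attention in which queries come from one sequence and keys and values from another, for example text tokens attending to image features or a decoder attending to encoder outputs.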

Computer Vision

Legal AI Intermediate

AI supporting legal research, drafting, and analysis.

AI in Law
