Results for "text to audio"

30 results

Text-to-Speech Intermediate

Generating speech audio from text, with control over prosody, speaker identity, and style.

Speech & Audio AI
Wake Word Detection Intermediate

Detects trigger phrases in audio streams.

Speech & Audio AI
Neural Vocoder Intermediate

Generates audio waveforms from spectrograms.

Speech & Audio AI
Speech Recognition Intermediate

Converting audio speech into text, often using encoder-decoder or transducer architectures.

Speech & Audio AI
Speaker Diarization Intermediate

Partitioning an audio recording by speaker to determine who spoke when.

Speech & Audio AI
Acoustic Model Intermediate

Maps audio signals to linguistic units.

Speech & Audio AI
Forced Alignment Intermediate

Aligns transcripts with audio timestamps.

Speech & Audio AI
Speech Synthesis Intermediate

Generating human-like speech from text.

Speech & Audio AI
Tokenization Intermediate

Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.

Foundations & Theory
CLIP Intermediate

Joint vision-language model aligning images and text.

Computer Vision
Multimodal Model Intermediate

Models that process or generate multiple modalities, enabling vision-language tasks, speech, video understanding, etc.

Foundations & Theory
Language Model Intermediate

A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.

Large Language Models
Large Language Model Intermediate

A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.

Large Language Models
Exteroception Advanced

External sensing of surroundings (vision, audio, lidar).

Robotics & Embodied AI
Multimodal Fusion Intermediate

Combining signals from multiple modalities.

Computer Vision
Transformer Intermediate

Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.

Transformers & LLMs
Autoregressive Model Intermediate

Generates sequences one token at a time, conditioning on past tokens.

Foundations & Theory
Prompt Intermediate

The text (and possibly other modalities) given to an LLM to condition its output behavior.

Prompting & Instructions
Attention Intermediate

Mechanism that computes context-aware mixtures of representations; parallelizes across sequence positions and captures long-range dependencies.

Transformers & LLMs
Next-Token Prediction Intermediate

Training objective where the model predicts the next token given previous tokens (causal modeling).

Foundations & Theory
Data Labeling Intermediate

Human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.

Foundations & Theory
Data Augmentation Intermediate

Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.

Foundations & Theory
Adversarial Example Intermediate

Inputs crafted to cause model errors or unsafe behavior, often imperceptible in vision or subtle in text.

Foundations & Theory
Prompt Injection Intermediate

Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.

Foundations & Theory
NLP Intermediate

AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.

Foundations & Theory
Causal Mask Intermediate

Prevents attention to future tokens during training/inference.

AI Economics & Strategy
Key-Value Cache Intermediate

Stores past attention states to speed up autoregressive decoding.

AI Economics & Strategy
Generative Model Advanced

Models that learn to generate samples resembling training data.

Diffusion & Generative Models
Cross-Attention Intermediate

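Attention in which queries from one sequence attend to keys and values from another, such as a decoder attending to encoder outputs or text attending to image features.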

Computer Vision
Legal AI Intermediate

AI supporting legal research, drafting, and analysis.

AI in Law