Results for "text+image+audio"

51 results

Text-to-Speech Intermediate

Generating speech audio from text, with control over prosody, speaker identity, and style.

Speech & Audio AI
Wake Word Detection Intermediate

Detects trigger phrases in audio streams.

Speech & Audio AI
Neural Vocoder Intermediate

Generates audio waveforms from spectrograms.

Speech & Audio AI
Speech Recognition Intermediate

Converting audio speech into text, often using encoder-decoder or transducer architectures.

Speech & Audio AI
CLIP Intermediate

Joint vision-language model aligning images and text.

Computer Vision
Speaker Diarization Intermediate

Partitioning an audio recording into segments by speaker, answering "who spoke when."

Speech & Audio AI
Acoustic Model Intermediate

Maps audio signals to linguistic units.

Speech & Audio AI
Forced Alignment Intermediate

Aligns transcripts with audio timestamps.

Speech & Audio AI
Vision Transformer Intermediate

Transformer applied to image patches.

Computer Vision
Speech Synthesis Intermediate

Generating human-like speech from text.

Speech & Audio AI
Segmentation Intermediate

Assigning a label to each pixel (semantic segmentation) or to each object instance (instance segmentation) to delineate object boundaries.

Computer Vision
Image Classification Intermediate

Assigning category labels to images.

Computer Vision
Tokenization Intermediate

Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.

Foundations & Theory
Multimodal Model Intermediate

Models that process or generate multiple modalities (such as text, images, audio, and video), enabling vision-language tasks, speech processing, and video understanding.

Foundations & Theory
Semantic Segmentation Intermediate

Pixel-wise classification of image regions.

Computer Vision
Language Model Intermediate

A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.

Large Language Models
Large Language Model Intermediate

A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.

Large Language Models
Exteroception Advanced

External sensing of surroundings (vision, audio, lidar).

Robotics & Embodied AI
Multimodal Fusion Intermediate

Combining signals from multiple modalities.

Computer Vision
Transformer Intermediate

Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.

Transformers & LLMs
3D Reconstruction Intermediate

Recovering 3D structure from images.

Computer Vision
Data Labeling Intermediate

Human or automated process of assigning target labels to data; label quality, consistency, and clear guidelines matter heavily.

Foundations & Theory
Data Augmentation Intermediate

Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.

Foundations & Theory
Generative Model Advanced

Models that learn to generate samples resembling training data.

Diffusion & Generative Models
Cross-Attention Intermediate

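Attention in which queries come from one sequence and keys and values from another; commonly used to relate different modalities, such as text and images.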

Computer Vision
Autoregressive Model Intermediate

Generates sequences one token at a time, conditioning on past tokens.

Foundations & Theory
Prompt Intermediate

The text (and possibly other modalities) given to an LLM to condition its output behavior.

Prompting & Instructions
Embedding Intermediate

A continuous vector encoding of an item (word, image, user) such that semantic similarity corresponds to geometric closeness.

Machine Learning
Convolutional Neural Network Intermediate

Networks using convolution operations with weight sharing and locality, effective for images and signals.

Neural Networks, Computer Vision
Benchmark Intermediate

A dataset and metric suite for comparing models; results can be gamed or misaligned with real-world goals.

Evaluation & Benchmarking
