Results for "text+image+audio"

17 results

Text-to-Speech Intermediate

Generating speech audio from text, with control over prosody, speaker identity, and style.

Speech & Audio AI
Speech Recognition Intermediate

Converting audio speech into text, often using encoder-decoder or transducer architectures.

Speech & Audio AI
Embedding Intermediate

A continuous vector encoding of an item (word, image, user) such that semantic similarity corresponds to geometric closeness.

Machine Learning
Semantic Segmentation Intermediate

Assigning a class label to every pixel in an image.

Computer Vision
Vision Transformer Intermediate

A Transformer architecture applied to an image split into patches, with each patch treated as a token.

Computer Vision
Tokenization Intermediate

Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.

Foundations & Theory
Prompt Intermediate

The text (and possibly other modalities) given to an LLM to condition its output behavior.

Prompting & Instructions
Adversarial Example Intermediate

Inputs crafted to cause model errors or unsafe behavior, often through perturbations that are imperceptible in images or subtle in text.

Foundations & Theory
CLIP Intermediate

A joint vision-language model trained contrastively to align images and text in a shared embedding space.

Computer Vision
Speech Synthesis Intermediate

Generating human-like speech from text.

Speech & Audio AI
Acoustic Model Intermediate

A model that maps audio signals to linguistic units such as phonemes or characters.

Speech & Audio AI
Forced Alignment Intermediate

Aligning a known transcript with its audio to obtain timestamps for each word or phoneme.

Speech & Audio AI
Wake Word Detection Intermediate

Detects trigger phrases in audio streams.

Speech & Audio AI
Speaker Diarization Intermediate

Partitioning an audio recording into segments by speaker, answering "who spoke when."

Speech & Audio AI
Neural Vocoder Intermediate

A neural network that generates audio waveforms from acoustic features such as mel spectrograms.

Speech & Audio AI
Exteroception Advanced

External sensing of surroundings (vision, audio, lidar).

Robotics & Embodied AI
Image Classification Intermediate

Assigning category labels to images.

Computer Vision