Results for "contrastive vision-language"
Joint vision-language model that aligns images and text in a shared embedding space, typically trained contrastively on matched image-caption pairs.
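Such contrastive alignment can be sketched as a symmetric InfoNCE-style loss over a batch of matched image/text embeddings; the function name and temperature value below are illustrative, not from any particular library.

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched
    image/text embedding pairs (CLIP-style sketch)."""
    # L2-normalise so the similarity matrix holds cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (B, B) similarities
    labels = np.arange(len(logits))             # matched pairs sit on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # cross-entropy in both directions: image->text and text->image
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
B, D = 4, 8
loss = clip_style_loss(rng.normal(size=(B, D)), rng.normal(size=(B, D)))
```

Minimising this loss pulls each image toward its own caption and away from the other captions in the batch.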
Models that process or generate multiple modalities, enabling vision-language tasks, speech, video understanding, etc.
AI focused on interpreting images/video: classification, detection, segmentation, tracking, and 3D understanding.
Probabilistic energy-based neural network with hidden variables.
A Boltzmann Machine restricted to a bipartite graph: connections run only between visible and hidden units, never within a layer, so each layer is conditionally independent given the other.
A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.
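The "assigns probabilities to sequences" idea can be shown at its simplest with a count-based bigram model, a minimal sketch of next-token prediction (the `<s>`/`</s>` boundary tokens are a common convention, assumed here).

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count-based bigram language model: estimates P(next | current)
    from token-pair counts in the training corpus."""
    counts = defaultdict(Counter)
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            counts[a][b] += 1
    # normalise counts into conditional probabilities
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

model = train_bigram(["the cat sat", "the dog sat"])
# model["the"] splits probability between "cat" and "dog"
```

Modern LLMs replace the count table with a neural network, but the training signal is the same: predict the next token given the context.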
External sensing of surroundings (vision, audio, lidar).
A branch of ML using multi-layer neural networks to learn hierarchical representations, often excelling in vision, speech, and language.
Transformer applied to image patches.
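The patch step can be sketched in a few lines: split the image into non-overlapping patches and flatten each one into a token vector (patch size 4 here is arbitrary).

```python
import numpy as np

def patchify(image, patch=4):
    """Split an (H, W, C) image into non-overlapping flattened patches,
    the token sequence a Vision Transformer operates on."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0
    x = image.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)              # group each patch's pixels
    return x.reshape(-1, patch * patch * C)     # (num_patches, patch_dim)

img = np.arange(8 * 8 * 3).reshape(8, 8, 3)
tokens = patchify(img, patch=4)                 # 4 patches of 4*4*3 = 48 values
```

Each row is then linearly projected and fed to a standard Transformer encoder, with a position embedding added per patch.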
Controlling robots via language.
AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
Identifying and localizing objects in images, often with confidence scores and bounding boxes.
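Detections are matched to ground truth by intersection-over-union (IoU) of their boxes; a minimal sketch with `(x1, y1, x2, y2)` corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes, the
    standard overlap score for matching detections to ground truth."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)   # zero if boxes don't overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

score = iou((0, 0, 10, 10), (5, 5, 15, 15))     # 25 / 175
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.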
Assigning labels per pixel (semantic) or per instance (instance segmentation) to map object boundaries.
A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
Reusing knowledge from a source task/domain to improve learning on a target task/domain, typically via pretrained models.
Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
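Two of the listed transforms, flips and noise, can be sketched for images as follows (the 0.05 noise scale and 50% flip rate are arbitrary illustrative choices):

```python
import numpy as np

def augment(image, rng):
    """Apply random label-preserving transforms -- a horizontal flip
    and Gaussian pixel noise -- to one image in [0, 1]."""
    out = image.astype(float)
    if rng.random() < 0.5:
        out = out[:, ::-1]                              # horizontal flip
    out = out + rng.normal(scale=0.05, size=out.shape)  # pixel noise
    return np.clip(out, 0.0, 1.0)                       # stay in valid range

rng = np.random.default_rng(0)
batch = [augment(np.full((8, 8), 0.5), rng) for _ in range(4)]
```

Because the label is unchanged, each augmented copy is a free extra training example.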
Probabilistic graphical model for structured prediction.
AI limited to specific domains.
The set of tokens a model can represent; impacts efficiency, multilinguality, and handling of rare strings.
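A toy whitespace tokenizer makes the idea concrete: the vocabulary is the fixed set of strings the model can represent, and anything outside it falls back to an unknown token (the `<pad>`/`<unk>` special tokens are a common convention, assumed here).

```python
def build_vocab(texts, specials=("<pad>", "<unk>")):
    """Build a token-to-id map from whitespace-split training text."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for text in texts:
        for tok in text.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(text, vocab):
    """Map text to ids; out-of-vocabulary tokens become <unk>."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.split()]

v = build_vocab(["to be or not to be"])
ids = encode("to be awesome", v)    # unseen "awesome" maps to the <unk> id
```

Real systems use subword schemes (BPE, WordPiece) instead of whole words, which shrinks the `<unk>` problem and changes how rare strings and other languages are spelled out in tokens.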
Predicts masked tokens in a sequence, enabling bidirectional context; often used for embeddings rather than generation.
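Constructing the masked training input can be sketched as follows (the 15% default rate follows common practice; the `[MASK]` string is the usual placeholder):

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """BERT-style masking sketch: hide a fraction of tokens; the model
    is trained to predict the originals from context on both sides."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets.append(tok)       # only masked positions contribute to loss
        else:
            masked.append(tok)
            targets.append(None)
    return masked, targets

masked, targets = mask_tokens("the cat sat on the mat".split(), mask_rate=0.5)
```

Because unmasked positions see tokens on both sides, the learned representations are bidirectional, which is why such encoders suit embeddings better than left-to-right generation.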
The text (and possibly other modalities) given to an LLM to condition its output behavior.
Crafting prompts to elicit desired behavior, often using role, structure, constraints, and examples.
Converting audio speech into text, often using encoder-decoder or transducer architectures.
Networks using convolution operations with weight sharing and locality, effective for images and signals.
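Weight sharing and locality are visible in a minimal valid-mode 2D convolution (implemented, as in most deep-learning libraries, as cross-correlation):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: one shared kernel slides over
    the image, so every output uses the same few local weights."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # each output pixel depends only on a local kh x kw window
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

edge = conv2d(np.eye(5), np.array([[1.0, -1.0]]))   # horizontal difference filter
```

The same kernel applied everywhere means far fewer parameters than a dense layer and built-in translation equivariance.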
Inputs crafted to cause model errors or unsafe behavior, often imperceptible in vision or subtle in text.
Assigning category labels to images.
Pixel-level separation of individual object instances.
Pixel-wise classification of image regions.
Pixel motion estimation between frames.
Simultaneous Localization and Mapping for robotics.