Results for "images"
Identifying and localizing objects in images, often with confidence scores and bounding rectangles.
Joint vision-language model aligning images and text.
AI applied to X-rays, CT, MRI, ultrasound, and pathology slides.
Networks using convolution operations with weight sharing and locality, effective for images and signals.
Models that process or generate multiple modalities, enabling vision-language tasks, speech, video understanding, etc.
AI focused on interpreting images/video: classification, detection, segmentation, tracking, and 3D understanding.
Assigning category labels to images.
Recovering 3D structure from images.
A measurable property or attribute used as model input (raw or engineered), such as age, pixel intensity, or token ID.
A continuous vector encoding of an item (word, image, user) such that semantic similarity corresponds to geometric closeness.
Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.
The text (and possibly other modalities) given to an LLM to condition its output behavior.
Diffusion performed in latent space for efficiency.
Transformer applied to image patches.
Combining signals from multiple modalities.
Automated assistance in identifying disease indicators.
Attention between different modalities.
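
Example for the "Transformer applied to image patches" entry above: a minimal sketch of the patch-splitting step, assuming a grayscale image stored as a list of rows and non-overlapping square patches. The function name `image_to_patches` and the `patch_size` parameter are hypothetical, chosen for illustration only. A real model would go on to project each flattened patch to an embedding and add position information, which is not shown here.

```python
def image_to_patches(image, patch_size):
    """Split a 2D grayscale image (list of equal-length rows) into
    flattened, non-overlapping square patches, in row-major order.

    Hypothetical helper for illustration; not taken from any library.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    height = len(image)
    width = len(image[0]) if height else 0
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    # Walk the top-left corner of each patch across the image grid.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten the patch_size x patch_size block into one vector.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A 4x4 toy image with pixel values 0..15, split into four 2x2 patches.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = image_to_patches(image, 2)
```

Each element of `patches` is one token-like vector. The sequence of these vectors is what the self-attention layers would operate on after projection.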