Domain: Computer Vision
Recovering 3D structure from images.
Joint vision-language model aligning images and text.
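Field of AI focused on interpreting images and video: classification, detection, segmentation, tracking, and 3D understanding.
Networks using convolution operations with shared weights and local receptive fields, effective for grid-structured data such as images and audio signals.
Attention in which queries from one modality attend to keys and values from another, e.g. text tokens attending to image features.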
Assigning category labels to images.
Pixel-level separation of individual object instances.
Combining signals from multiple modalities.
Identifying and localizing objects in images, typically as bounding boxes with class labels and confidence scores.
Estimating per-pixel apparent motion between consecutive video frames.
Assigning labels per pixel (semantic segmentation) or per object instance (instance segmentation) to delineate object boundaries.
Pixel-wise classification of image regions.
Simultaneous Localization and Mapping: building a map of an unknown environment while tracking the agent's position within it, common in robotics.
Transformer applied to an image split into fixed-size patches, each treated as a token.
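
As a rough illustration of the last entry, the sketch below shows one way an image could be cut into non-overlapping patches and flattened into a token sequence. It assumes NumPy, an (H, W, C) array layout, and dimensions divisible by the patch size. The names image_to_patches and patch_size are hypothetical, and the learned linear projection and position embeddings that a full model would add are left out.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping square patches.

    Hypothetical helper for illustration; returns an array of shape
    (num_patches, patch_size * patch_size * C), one row per patch,
    ordered left to right, then top to bottom.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # Split each spatial axis into (patch index, offset within patch).
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    # Bring the two patch indices to the front: (rows, cols, P, P, C).
    patches = patches.transpose(0, 2, 1, 3, 4)
    # Flatten every patch into a single token vector.
    return patches.reshape(rows * cols, patch_size * patch_size * c)


# Toy 4x4 single-channel image whose pixel value is 4 * row + col.
image = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
tokens = image_to_patches(image, patch_size=2)
print(tokens.shape)  # (4, 4): four patches, four values each
print(tokens[0])     # top-left patch: [0. 1. 4. 5.]
```

In a full model each row of tokens would typically be passed through a learned linear layer before entering the Transformer; that step is omitted here to keep the sketch minimal.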