Results for "vision"
Computer Vision
Intermediate
AI focused on interpreting images/video: classification, detection, segmentation, tracking, and 3D understanding.
Computer vision is like giving a computer the ability to see and understand pictures and videos, similar to how humans do. Just as you can recognize your friend's face in a crowd or identify a cat in a photo, computer vision allows machines to perform these tasks. It uses special algorithms and m...
Models that process or generate multiple modalities, enabling vision-language tasks, speech, video understanding, etc.
External sensing of surroundings (vision, audio, lidar).
Identifying and localizing objects in images, often with confidence scores and bounding rectangles.
A branch of ML using multi-layer neural networks to learn hierarchical representations, often excelling in vision, speech, and language.
Transformer applied to image patches.
Assigning labels per pixel (semantic) or per instance (instance segmentation) to map object boundaries.
Reusing knowledge from a source task/domain to improve learning on a target task/domain, typically via pretrained models.
Networks using convolution operations with weight sharing and locality, effective for images and signals.
Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
Inputs crafted to cause model errors or unsafe behavior, often imperceptible in vision or subtle in text.
Probabilistic graphical model for structured prediction.
Assigning category labels to images.
Pixel-level separation of individual object instances.
Pixel-wise classification of image regions.
Joint vision-language model aligning images and text.
Pixel motion estimation between frames.
Simultaneous Localization and Mapping for robotics.
Recovering 3D structure from images.
Monte Carlo method for state estimation.
Devices measuring physical quantities (vision, lidar, force, IMU, etc.).
Software pipeline converting raw sensor data into structured representations.
Artificial sensor data generated in simulation.
Perceived actions an environment allows.
Interpreting human gestures.
AI limited to specific domains.