Results for "text+image+audio"
AI focused on interpreting images/video: classification, detection, segmentation, tracking, and 3D understanding.
Identifying and localizing objects in images, often with confidence scores and bounding rectangles.
Early architecture using learned gates for skip connections.
Allows the model to attend to information from different representation subspaces simultaneously.
Routes inputs to subsets of parameters for scalable capacity.
Autoencoder using probabilistic latent variables and KL regularization.
Two-network setup in which a generator is trained to fool a discriminator.
Pixel-level separation of individual object instances.
Pixel motion estimation between frames.
Decomposes a matrix into orthogonal components; used in embeddings and compression.
Software pipeline converting raw sensor data into structured representations.
AI applied to X-rays, CT, MRI, ultrasound, pathology slides.
Automated assistance in identifying disease indicators.
Mechanism that computes context-aware weighted mixtures of representations; parallelizes well across sequence positions and captures long-range dependencies.
Training objective where the model predicts the next token given previous tokens (causal modeling).
Inputs crafted to cause model errors or unsafe behavior, often imperceptible in vision or subtle in text.
Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.
AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
Prevents each position from attending to future tokens during training and inference.
Stores the keys and values computed for past tokens to speed up autoregressive decoding.
AI supporting legal research, drafting, and analysis.
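Example (illustrative only): the attention and future-token masking entries above can be sketched in a few lines of plain Python. This is a minimal single-head sketch, not a reference implementation. The function names `softmax` and `causal_attention` and the toy input `x` are hypothetical, and the sketch assumes queries, keys, and values are the same small vectors, with no learned projections.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(queries, keys, values):
    """Scaled dot-product attention where position i sees only positions 0..i."""
    d = len(queries[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        # Score only the visible prefix; future positions are never scored,
        # which is equivalent to masking them out.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        w = softmax(scores)
        # Output is the attention-weighted mixture of the visible values.
        out = [
            sum(w[j] * values[j][k] for j in range(i + 1))
            for k in range(len(values[0]))
        ]
        outputs.append(out)
        # Pad with zeros so every row has one weight per position.
        weights.append(w + [0.0] * (len(keys) - i - 1))
    return outputs, weights

# Toy input (an assumption for illustration): three 2-dimensional token vectors.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_attention(x, x, x)
```

In this sketch the first position can attend only to itself, so its weight row is all on position 0, and every row of `weights` sums to 1 with zero weight on later positions. A cache of past keys and values would avoid recomputing `keys[j]` and `values[j]` at each decoding step; that optimization is left out here for brevity.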