Central system to store model versions, metadata, approvals, and deployment state.
Logging hyperparameters, code versions, data snapshots, and results to reproduce and compare experiments.
Ability to replicate results given the same code and data; harder with distributed training and nondeterministic operations.
Time from request to response; critical for real-time inference and UX.
How many requests or tokens can be processed per unit time; affects scalability and cost.
Hardware resources used for training/inference; constrained by memory bandwidth, FLOPs, and parallelism.
Reducing numeric precision of weights/activations to speed inference and reduce memory with acceptable accuracy loss.
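As a rough sketch of what this looks like in practice, here is a minimal post-training affine quantization of a weight tensor to int8 in NumPy; the helper names and the int8 range are illustrative assumptions, not a specific library's API:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Affine-quantize a float tensor to int8; returns (q, scale, zero_point)."""
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0 if w_max > w_min else 1.0
    zero_point = round(-w_min / scale) - 128               # maps w_min close to -128
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover an approximate float tensor from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s, z = quantize_int8(w)
print(np.abs(w - dequantize(q, s, z)).max())               # error on the order of the scale
```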
Removing weights or neurons to shrink models and improve efficiency; can be structured or unstructured.
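A minimal sketch of unstructured magnitude pruning in NumPy follows; the default sparsity level and the helper name are illustrative assumptions:

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    threshold = np.quantile(np.abs(w), sparsity)            # cutoff below which weights are dropped
    return np.where(np.abs(w) < threshold, 0.0, w)

w = np.random.randn(8, 8)
print((magnitude_prune(w, sparsity=0.75) == 0).mean())      # roughly the requested sparsity
```

Structured pruning would instead remove whole neurons, channels, or attention heads, which maps more directly onto dense hardware kernels than scattered zeros do.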
Converts logits to probabilities by exponentiation and normalization; common in classification and LMs.
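A numerically stable NumPy version is a few lines; the max-subtraction is the usual stability trick, not something the definition above requires:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Exponentiate and normalize; subtracting the max avoids overflow."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

print(softmax(np.array([2.0, 1.0, 0.1])))   # approx [0.659, 0.242, 0.099], sums to 1
```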
A dataset + metric suite for comparing models; can be gamed or misaligned with real-world goals.
System for running consistent evaluations across tasks, versions, prompts, and model settings.
Stress-testing models for failures, vulnerabilities, policy violations, and harmful behaviors before release.
Methods to protect model/data during inference (e.g., trusted execution environments) from operators/attackers.
Models that process or generate multiple modalities, enabling vision-language tasks, speech, video understanding, etc.
Identifying and localizing objects in images, often with confidence scores and bounding boxes.
Assigning labels per pixel (semantic) or per instance (instance segmentation) to map object boundaries.
Error due to sensitivity to fluctuations in the training dataset.
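For context, this is the variance term in the standard bias-variance decomposition of expected squared error (a textbook identity, stated here as background):

\[ \mathbb{E}\big[(y - \hat{f}(x))^2\big] = \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2 + \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big] + \sigma^2 \]

where the first term is squared bias, the second is variance, and \( \sigma^2 \) is irreducible noise.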
AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
Measures the mismatch between true and predicted probability distributions; the standard loss for classification and language modeling.
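Written out for discrete distributions, with p the true distribution and q the prediction (standard definition):

\[ H(p, q) = -\sum_{x} p(x) \log q(x) \]

It decomposes as \( H(p, q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q) \), which connects it to the divergence defined next.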
Converting audio speech into text, often using encoder-decoder or transducer architectures.
Measures how one probability distribution diverges from another.
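For discrete distributions P and Q (standard definition; it is asymmetric and nonnegative, and zero only when the two match):

\[ D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} \]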
Generating speech audio from text, with control over prosody, speaker identity, and style.
Updating beliefs about parameters using observed evidence and prior distributions.
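Concretely, this is Bayes' rule applied to parameters \( \theta \) and data \( D \):

\[ p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} \]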
Estimating parameters by maximizing likelihood of observed data.
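As a formula, with the log form being the usual computational convenience:

\[ \hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \sum_{i=1}^{n} \log p(x_i \mid \theta) \]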
Bayesian parameter estimation using the mode of the posterior distribution.
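Equivalently, this adds the log-prior to the maximum-likelihood objective:

\[ \hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \Big[ \log p(D \mid \theta) + \log p(\theta) \Big] \]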
Optimization problems where any local minimum is global.
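The underlying property is convexity of the objective (and of the feasible set): for a convex function,

\[ f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y) \quad \text{for all } x, y \text{ and } \lambda \in [0, 1], \]

which is what rules out local minima that are not global.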
A point where the gradient is zero but which is neither a maximum nor a minimum; common in deep nets.
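A standard two-dimensional example (illustrative, not from the source): \( f(x, y) = x^2 - y^2 \) has \( \nabla f = (2x, -2y) = 0 \) at the origin, yet the origin is a minimum along \( x \) and a maximum along \( y \), so it is a saddle point.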
The shape of the loss function over parameter space.
A wide, flat basin in the loss landscape, often correlated with better generalization.
Limiting gradient magnitude to prevent exploding gradients.
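A minimal sketch of clipping by global norm in NumPy; the max_norm value and the helper name are illustrative assumptions:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale all gradients so their combined L2 norm does not exceed max_norm."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads]

grads = [np.random.randn(3, 3) * 10, np.random.randn(5) * 10]
clipped = clip_by_global_norm(grads, max_norm=1.0)
print(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))   # <= 1.0
```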