Reproducibility: The ability to replicate results given the same code and data; harder under distributed training and nondeterministic ops.
Latency: Time from request to response; critical for real-time inference and user experience.
Throughput: How many requests or tokens can be processed per unit time; affects scalability and cost.
Compute: Hardware resources used for training and inference; constrained by memory bandwidth, FLOPs, and parallelism.
Quantization: Reducing the numeric precision of weights and activations to speed up inference and cut memory use, with acceptable accuracy loss.
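A minimal NumPy sketch of symmetric int8 quantization (an illustrative scheme, not any particular library's implementation):

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization: one scale maps floats onto [-127, 127].
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate floats; error per element is at most ~scale/2.
    return q.astype(np.float32) * scale

w = np.array([0.1, -0.5, 0.9, -1.2], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

Real deployments add per-channel scales, zero points for asymmetric ranges, and calibration data, but the round-and-rescale core is the same.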
Softmax: Converts logits to probabilities by exponentiation and normalization; common in classification and language models.
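A minimal, numerically stable softmax sketch in NumPy:

```python
import numpy as np

def softmax(logits):
    # Subtract the max logit before exponentiating so exp() cannot overflow;
    # this shift does not change the normalized result.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
# probs sums to 1, and the largest logit receives the largest probability.
```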
Evaluation harness: A system for running consistent evaluations across tasks, versions, prompts, and model settings.
Red teaming: Stress-testing models for failures, vulnerabilities, policy violations, and harmful behaviors before release.
Secure inference: Methods to protect the model and data during inference (e.g., trusted execution environments) from operators or attackers.
Multimodal models: Models that process or generate multiple modalities, enabling vision-language tasks, speech, and video understanding.
Object detection: Identifying and localizing objects in images, often with confidence scores and bounding boxes.
Variance: Error due to sensitivity to fluctuations in the training dataset.
Text-to-speech (TTS): Generating speech audio from text, with control over prosody, speaker identity, and style.
Cross-entropy: Measures the divergence between the true and predicted probability distributions.
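This divergence between true and predicted distributions is typically computed as the cross-entropy loss; a minimal sketch, with an illustrative `eps` guard against taking the log of zero:

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    # H(p, q) = -sum_i p_i * log(q_i); eps avoids log(0) for zero predictions.
    return -np.sum(p * np.log(q + eps))

p = np.array([1.0, 0.0, 0.0])   # one-hot true label
q = np.array([0.7, 0.2, 0.1])   # predicted distribution
loss = cross_entropy(p, q)      # reduces to -log(0.7) for a one-hot target
```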
KL divergence: Measures how one probability distribution diverges from another.
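A minimal KL-divergence sketch, illustrating its asymmetry (the `eps` smoothing is an illustrative guard, not part of the definition):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # D_KL(p || q) = sum_i p_i * log(p_i / q_i); nonnegative and asymmetric.
    p = p + eps
    q = q + eps
    return np.sum(p * np.log(p / q))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
# d_pq != d_qp in general: KL is not a true distance metric.
```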
Bayesian inference: Updating beliefs about parameters using observed evidence and prior distributions.
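A minimal sketch of a conjugate Bayesian update, using the Beta-Bernoulli pair where the posterior has a closed form (the coin-flip data here is illustrative):

```python
# Prior Beta(a, b) over a coin's heads probability; after observing flips,
# the posterior is Beta(a + heads, b + tails) by conjugacy.
def update_beta(a, b, flips):
    heads = sum(flips)
    tails = len(flips) - heads
    return a + heads, b + tails

a, b = 1, 1                    # Beta(1, 1): uniform prior over the bias
a, b = update_beta(a, b, [1, 1, 0, 1])
posterior_mean = a / (a + b)   # (1 + 3) / (2 + 4) = 2/3
```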
Maximum likelihood estimation (MLE): Estimating parameters by maximizing the likelihood of the observed data.
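For i.i.d. Gaussian data with known variance, the maximum-likelihood estimate of the mean is simply the sample average; a minimal sketch with illustrative data:

```python
import numpy as np

data = np.array([2.1, 1.9, 2.0, 2.2, 1.8])
mu_mle = data.mean()   # the sample mean maximizes the Gaussian log-likelihood

def log_likelihood(mu, x, sigma=1.0):
    # sum_i log N(x_i; mu, sigma^2)
    return (-0.5 * np.sum((x - mu) ** 2) / sigma**2
            - len(x) * np.log(sigma * np.sqrt(2 * np.pi)))

# Nearby candidates score no higher than the sample mean.
```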
Maximum a posteriori (MAP) estimation: Bayesian parameter estimation using the mode of the posterior distribution.
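A minimal sketch using the Beta-Bernoulli model, where the posterior mode has a closed form (the prior and counts below are illustrative):

```python
# Posterior over a coin's bias under a Beta(a, b) prior after h heads and
# t tails is Beta(a + h, b + t); its mode (the MAP estimate) is
# (a + h - 1) / (a + b + h + t - 2), valid when a + h > 1 and b + t > 1.
def map_estimate(a, b, heads, tails):
    return (a + heads - 1) / (a + b + heads + tails - 2)

# With a Beta(2, 2) prior and 8 heads / 2 tails:
theta_map = map_estimate(2, 2, 8, 2)   # (2 + 8 - 1) / (2 + 2 + 8 + 2 - 2)
```

Note how the prior pulls the estimate below the raw frequency 8/10: the MAP estimate blends the data with the prior's pseudo-counts.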
Saddle point: A point where the gradient is zero but that is neither a maximum nor a minimum; common in deep networks.
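The classic example is f(x, y) = x^2 - y^2, which has zero gradient at the origin while curving up along x and down along y:

```python
import numpy as np

def f(x, y):
    # Saddle-shaped surface: a minimum along x, a maximum along y.
    return x**2 - y**2

def grad(x, y):
    return np.array([2 * x, -2 * y])

origin_grad = grad(0.0, 0.0)   # zero gradient at the origin
along_x = f(0.1, 0.0)          # above f(0, 0): not a maximum
along_y = f(0.0, 0.1)          # below f(0, 0): not a minimum
```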
Loss landscape: The shape of the loss function over parameter space.
Flat minimum: A wide basin in the loss landscape, often correlated with better generalization.
Gradient clipping: Limiting gradient magnitude to prevent exploding gradients.
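A minimal sketch of clipping by global L2 norm, one common variant (clipping each component by value is another):

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    # If the gradient's L2 norm exceeds max_norm, rescale it to max_norm;
    # the direction is preserved, only the magnitude shrinks.
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

g = np.array([30.0, 40.0])      # norm 50: would blow up an update step
clipped = clip_by_norm(g, 1.0)  # rescaled to unit norm, same direction
```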
Hessian: The matrix of second derivatives describing the local curvature of the loss.
Residual connection: Allows gradients to bypass layers, enabling very deep networks.
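A minimal sketch of the identity-plus-transform pattern (the toy layer is illustrative):

```python
import numpy as np

def residual_block(x, layer):
    # Output is x + layer(x): the identity path carries the signal (and its
    # gradient) through even when layer contributes little.
    return x + layer(x)

# A toy "layer" that zeroes its input still leaves the signal intact,
# because the skip path passes x through unchanged.
x = np.array([1.0, 2.0, 3.0])
out = residual_block(x, lambda v: np.zeros_like(v))
```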
Expressivity: The range of functions a model can represent.
Emergent abilities: Capabilities that appear only beyond certain model sizes.
Policy gradient: Optimizing policies directly via gradient ascent on expected reward.
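A minimal REINFORCE sketch on a two-armed bandit with a softmax policy (rewards, learning rate, and step count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Two-armed bandit: arm 1 pays reward 1.0, arm 0 pays 0.0.
# For a softmax policy, grad of log pi(a) w.r.t. the logits is
# one_hot(a) - pi, so the REINFORCE update is lr * reward * (one_hot - pi).
theta = np.zeros(2)
lr = 0.1
for _ in range(500):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    reward = 1.0 if a == 1 else 0.0
    one_hot = np.eye(2)[a]
    theta += lr * reward * (one_hot - pi)

learned = softmax(theta)  # probability mass shifts toward the rewarding arm
```

Practical policy-gradient methods add baselines to reduce variance and batch many trajectories per update, but the score-function update above is the core.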
Memory-augmented agents: Extending agents with long-term memory stores.
Tool use: Models trained to decide when to call tools.
Multi-agent systems: Multiple agents interacting cooperatively or competitively.