Results for "compute-data-performance"
Models that define an energy landscape rather than explicit probabilities.
Learns the score (∇ log p(x)) for generative sampling.
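Once a score function is available, samples can be drawn with Langevin dynamics. A minimal sketch, using the analytic score of a standard normal, score(x) = -x, in place of a learned score network (all names and step sizes here are illustrative assumptions):

```python
import numpy as np

# Hypothetical sketch: Langevin sampling with the analytic score of a
# standard normal, score(x) = -x. A learned score model would replace it.
def score(x):
    return -x

def langevin_sample(n_steps=500, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=1000) * 5.0  # initialize far from the target
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        # Langevin update: drift along the score plus injected noise
        x = x + 0.5 * step * score(x) + np.sqrt(step) * noise
    return x

samples = langevin_sample()
```

After enough steps the samples approximate the target distribution regardless of initialization.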
Convolutional neural networks applied to time-series data, typically using one-dimensional (often causal) convolutions rather than the two-dimensional filters used for images.
Two-network setup where generator fools a discriminator.
A probability distribution encoding belief about unknown quantities before observing data.
Attention computed between different modalities, where queries come from one modality and keys/values from another (e.g., text attending to image features).
Software pipeline converting raw sensor data into structured representations.
Running model predictions over large datasets on a recurring schedule, rather than serving individual predictions on demand.
Models estimating recidivism risk.
Inferring the parameters of a physical system (e.g., masses or friction coefficients) from observed data.
Reinforcement learning that updates the policy using only data collected by the current policy, discarding experience from earlier policies.
Methods that learn training procedures or initializations so models can adapt quickly to new tasks with little data.
One complete traversal of the training dataset during training.
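The relationship between epochs and gradient steps can be sketched as follows (the function names are illustrative, not a standard API):

```python
import math

# Sketch: one epoch = one full pass over the dataset; with batch size B,
# an epoch over N examples takes ceil(N / B) steps.
def steps_per_epoch(n_examples, batch_size):
    return math.ceil(n_examples / batch_size)

def train(data, n_epochs, batch_size):
    steps = 0
    for epoch in range(n_epochs):
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # ... forward pass, backward pass, parameter update ...
            steps += 1
    return steps

total_steps = train(list(range(1000)), n_epochs=3, batch_size=32)
```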
A parameterized function composed of interconnected units organized in layers with nonlinear activations.
A datastore optimized for similarity search over embeddings, enabling semantic retrieval at scale.
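The core retrieval operation can be sketched as brute-force cosine similarity; production vector databases replace the exhaustive scan with approximate indexes such as HNSW or IVF (the function and data below are illustrative):

```python
import numpy as np

# Minimal sketch of similarity search over embeddings: normalize, take
# dot products, return the indices of the k most similar vectors.
def cosine_top_k(query, vectors, k=3):
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(0)
db = rng.normal(size=(100, 32))                  # 100 stored embeddings
query = db[42] + 0.01 * rng.normal(size=32)      # near-duplicate of item 42
top = cosine_top_k(query, db, k=3)
```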
Networks with recurrent connections for sequences; largely supplanted by Transformers for many tasks.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
Ability to replicate results given same code/data; harder in distributed training and nondeterministic ops.
Model-generated content that is fluent but unsupported by evidence or incorrect; mitigated by grounding and verification.
Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.
Models that process or generate multiple modalities, enabling vision-language tasks, speech, video understanding, etc.
Updating beliefs about parameters using observed evidence and prior distributions.
Bayesian parameter estimation using the mode of the posterior distribution.
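For a Bernoulli likelihood with a Beta(a, b) prior, the posterior after k heads in n flips is Beta(a + k, b + n - k), and the MAP estimate is its mode. A worked sketch (prior hyperparameters chosen for illustration):

```python
# MAP estimate for a coin's heads probability: the mode of the
# Beta(a + k, b + n - k) posterior, (a + k - 1) / (a + b + n - 2).
def map_bernoulli(k, n, a=2.0, b=2.0):
    return (a + k - 1) / (a + b + n - 2)

# With a Beta(2, 2) prior and 7 heads in 10 flips:
estimate = map_bernoulli(7, 10)
```

With a flat Beta(1, 1) prior the MAP estimate reduces to the maximum-likelihood estimate k / n.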
Reusing the same parameters across different parts of a model, as with convolutional filters applied at every spatial position.
A single attention mechanism within multi-head attention.
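One head computes scaled dot-product attention; multi-head attention runs several such heads in parallel on learned projections and concatenates their outputs. A minimal NumPy sketch (shapes and names are illustrative):

```python
import numpy as np

# One attention head: softmax(Q K^T / sqrt(d)) V.
def attention_head(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                            # (n_q, n_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row softmax
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 16)) for _ in range(3))
out = attention_head(Q, K, V)   # one output vector per query, shape (5, 16)
```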
Probabilistic energy-based neural network with hidden variables.
Probabilistic graphical model for structured prediction.
Probabilistic model for sequential data with latent states.
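The likelihood of an observation sequence under such a model is computed by the forward algorithm, summing over latent state paths. A sketch with a toy two-state, two-symbol model (all probabilities below are hypothetical):

```python
import numpy as np

# Forward algorithm: T is the state-transition matrix, E the emission
# matrix, pi the initial state distribution.
def forward_likelihood(obs, pi, T, E):
    alpha = pi * E[:, obs[0]]          # joint prob. of state and first symbol
    for o in obs[1:]:
        alpha = (alpha @ T) * E[:, o]  # propagate, then weight by emission
    return alpha.sum()

pi = np.array([0.6, 0.4])
T = np.array([[0.7, 0.3], [0.4, 0.6]])
E = np.array([[0.9, 0.1], [0.2, 0.8]])
p = forward_likelihood([0, 1, 0], pi, T, E)
```

The cost is linear in sequence length, versus exponential for naive enumeration of state paths.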
Autoencoder using probabilistic latent variables and KL regularization.
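For a diagonal-Gaussian encoder, the KL regularizer against a standard normal prior has a closed form, summed over latent dimensions. A minimal sketch (the helper name is illustrative):

```python
import numpy as np

# KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions:
# 0.5 * sum( exp(logvar) + mu^2 - 1 - logvar ).
def kl_to_standard_normal(mu, logvar):
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

# The KL term vanishes exactly when the posterior matches the prior:
zero_kl = kl_to_standard_normal(np.zeros(8), np.zeros(8))
```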