Positional encoding: Injects sequence order into Transformers, since attention alone is permutation-invariant.
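The sinusoidal scheme from the original Transformer paper is one common way to inject order; a minimal NumPy sketch (the sequence length and model width below are arbitrary illustration values):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Even dimensions use sin, odd dimensions use cos, with geometrically
    spaced frequencies so each position gets a unique pattern."""
    positions = np.arange(seq_len)[:, None]           # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]          # (1, d_model/2)
    angles = positions / (10000 ** (dims / d_model))  # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(seq_len=16, d_model=8)
print(pe.shape)  # (16, 8)
```

The resulting matrix is simply added to the token embeddings before the first attention layer.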
Next-token prediction: Training objective where the model predicts the next token given the previous tokens (causal language modeling).
Prompt: The text (and possibly other modalities) given to an LLM to condition its output behavior.
Instruction tuning: Fine-tuning on (prompt, response) pairs to align a model with instruction-following behavior.
Algorithmic bias: Systematic differences in model outcomes across groups; arises from data, labels, and deployment context.
Interpretability: Techniques for understanding model decisions, globally or locally; important in high-stakes and regulated settings.
Curriculum learning: Ordering training samples from easier to harder to improve convergence or generalization.
Encryption in transit and at rest: Protecting data during network transfer and while stored; essential for ML pipelines handling sensitive data.
Beam search: Decoding algorithm that keeps the top-k partial sequences at each step; can improve likelihood but reduce diversity.
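A minimal sketch of beam search over a hypothetical toy model whose next-token distribution ignores the prefix (a real language model would condition on it):

```python
import math

# Hypothetical toy model: a fixed next-token log-probability table over a
# three-token vocabulary (a real model would condition on the prefix).
def next_token_logprobs(prefix):
    return {"a": math.log(0.5), "b": math.log(0.3), "c": math.log(0.2)}

def beam_search(steps, beam_width):
    beams = [([], 0.0)]  # (partial sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = [
            (seq + [tok], score + lp)
            for seq, score in beams
            for tok, lp in next_token_logprobs(seq).items()
        ]
        # Keep only the top-k highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

best_seq, best_score = beam_search(steps=3, beam_width=2)[0]
print(best_seq)  # ['a', 'a', 'a'] under this toy distribution
```

Because only `beam_width` hypotheses survive each step, the search is greedy at the sequence level: it favors high-likelihood continuations, which is exactly why its outputs tend to be less diverse than sampled ones.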
Sampling: Stochastic decoding strategies that trade determinism for diversity; key knobs include temperature and nucleus (top-p) sampling.
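Temperature and nucleus (top-p) truncation compose naturally in a single sampling step; a minimal NumPy sketch (the function name and example logits are illustrative):

```python
import numpy as np

def sample_with_temperature_and_top_p(logits, temperature=1.0, top_p=1.0, rng=None):
    """Scale logits by temperature, truncate to the nucleus (smallest set of
    tokens whose cumulative probability reaches top_p), renormalize, sample."""
    rng = rng or np.random.default_rng()
    probs = np.exp(np.asarray(logits, dtype=float) / temperature)
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]          # tokens, most probable first
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, top_p) + 1  # first prefix covering top_p mass
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()    # renormalize over the nucleus
    return int(rng.choice(keep, p=kept))

logits = np.array([2.0, 1.0, 0.1, -1.0])
token = sample_with_temperature_and_top_p(logits, temperature=0.7, top_p=0.9)
```

Lower temperatures sharpen the distribution toward greedy decoding; smaller `top_p` values cut off the low-probability tail that produces most degenerate samples.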
Benchmark: A dataset plus metric suite for comparing models; can be gamed or misaligned with real-world goals.
Confidential computing: Methods to protect models and data during inference (e.g., trusted execution environments) from operators or attackers.
Human-in-the-loop: System design in which humans validate or guide model outputs, especially for high-stakes decisions.
Memory: Mechanisms for retaining context across turns or sessions: scratchpads, vector memories, structured stores.
Structured output / function calling: Constraining model outputs to a schema used to call external APIs or tools safely and deterministically.
Segmentation: Assigning labels per pixel (semantic) or per object instance (instance segmentation) to delineate object boundaries.
Entropy: A measure of the randomness or uncertainty in a probability distribution.
KL divergence: An asymmetric measure of how one probability distribution diverges from another.
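Both quantities reduce to one-line NumPy expressions; a minimal sketch in bits (base-2 logarithms), with the usual conventions for zero probabilities:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; 0 * log(0) is treated as 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())

def kl_divergence(p, q):
    """D_KL(p || q) in bits; asymmetric, requires q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
print(entropy(uniform))                # 2.0 bits, maximal for 4 outcomes
print(kl_divergence(skewed, uniform))  # > 0, and != kl_divergence(uniform, skewed)
```

The asymmetry in the last line is the reason KL divergence is not a true distance metric.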
Fisher information: Measures how much information an observable random variable carries about unknown parameters.
Maximum likelihood estimation: Estimating parameters by maximizing the likelihood of the observed data.
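For i.i.d. Gaussian data the maximum-likelihood estimates have a closed form, the sample mean and the biased sample variance; a sketch on synthetic data (the true parameters 3.0 and 2.0 are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=10_000)  # synthetic observations

mu_hat = data.mean()                         # MLE of the mean
sigma2_hat = ((data - mu_hat) ** 2).mean()   # MLE of the variance (divides by n, not n-1)

def log_likelihood(mu, sigma2):
    """Gaussian log-likelihood of the full dataset."""
    return float(-0.5 * (np.log(2 * np.pi * sigma2) + (data - mu) ** 2 / sigma2).sum())

# The closed-form estimates score at least as high as nearby parameter values.
assert log_likelihood(mu_hat, sigma2_hat) >= log_likelihood(mu_hat + 0.1, sigma2_hat)
```

With enough samples, `mu_hat` and `sigma2_hat` recover the generating parameters up to sampling noise.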
Non-convex optimization: Optimization over objectives with multiple local minima and saddle points; typical of neural networks.
Inductive bias: Built-in assumptions that guide learning efficiency and generalization.
Bellman equation: The fundamental recursive relationship defining optimal value functions.
Markov decision process (MDP): Formal framework for sequential decision-making under uncertainty.
Exploration vs. exploitation: Balancing trying new actions against exploiting known rewards.
Value function: Expected cumulative reward from a state or state-action pair.
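The Bellman optimality backup V(s) = max_a [R(s,a) + γ Σ_s' P(s'|s,a) V(s')] can be iterated to a fixed point (value iteration); a sketch on a hypothetical two-state, two-action MDP whose transition and reward tables are made up for illustration:

```python
import numpy as np

# Hypothetical MDP: P[s][a] = list of (next_state, probability); R[s][a] = reward.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)],           1: [(1, 0.8), (0, 0.2)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}
gamma = 0.9  # discount factor

V = np.zeros(2)
for _ in range(500):  # value iteration: apply the Bellman backup until convergence
    V = np.array([
        max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
            for a in (0, 1))
        for s in (0, 1)
    ])
print(V)  # converged optimal state values
```

Because the backup is a γ-contraction, the iterates converge to the unique optimal value function regardless of the initial guess.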
Memory-augmented agents: Extending agents with long-term memory stores.
Q-value: Expected return of taking a given action in a state and following the policy thereafter.
Emergent coordination: Multi-agent coordination arising without explicit programming.
Right to explanation: A legal or policy requirement to explain AI decisions.