Results for "trial-and-error"
Early signals disproportionately influence outcomes.
Sudden jump to superintelligence.
Stored compute or algorithms enabling rapid jumps.
Signals indicating dangerous behavior.
Tendency of capable agents to seek control and resources as a means to their goals.
An agent's level of intelligence and its final goals can vary independently.
Subgoals (e.g., acquiring resources, self-preservation) that are useful regardless of the final objective.
Updating a pretrained model’s weights on task-specific data to improve performance or adapt style/behavior.
Learning from data by constructing “pseudo-labels” (e.g., next-token prediction, masked modeling) without manual annotation.
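This definition matches the usual description of self-supervised learning. A minimal sketch, assuming a toy whitespace "tokenizer" and a made-up sentence: the labels are manufactured from the raw text itself by pairing each prefix with the token that follows it.

```python
# Toy next-token "pseudo-labels" from raw text (text and tokenizer assumed).
text = "to be or not to be"
tokens = text.split()

# Each training pair is (context so far, next token) -- no manual labels needed.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for ctx, target in pairs:
    print(ctx, "->", target)
```

The same idea underlies masked modeling: hide part of the input and predict it from the rest.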
Methods that learn training procedures or initializations so models can adapt quickly to new tasks with little data.
Minimizing average loss on training data; can overfit when data is limited or biased.
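A minimal sketch of the "minimize average loss on training data" idea, with an assumed 1-D linear model and fabricated data; grid search stands in for a real optimizer.

```python
# Empirical risk minimization sketch: pick the parameter w minimizing
# average squared loss on the (assumed) training set.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]  # roughly y = 2x

def empirical_risk(w):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Coarse grid search over w in [0, 4).
best_w = min((w / 100 for w in range(0, 400)), key=empirical_risk)
```

The caveat in the definition shows up here too: with three noisy points, the minimizer tracks the sample, not the underlying process.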
Techniques that discourage overly complex solutions to improve generalization (reduce overfitting).
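One concrete instance of such a penalty is an L2 (ridge-style) term added to the training loss. A sketch under assumed toy data:

```python
# L2 regularization sketch: the penalty term lam * w**2 discourages
# large weights (data and model are assumed toys).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # exactly y = 2x

def loss(w, lam):
    data_term = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return data_term + lam * w ** 2  # L2 penalty

grid = [w / 100 for w in range(0, 400)]
w_unreg = min(grid, key=lambda w: loss(w, 0.0))
w_reg = min(grid, key=lambda w: loss(w, 1.0))
# The penalty shrinks the fitted weight toward zero.
```
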
A robust evaluation technique that trains/evaluates across multiple splits to estimate performance variability.
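This reads like k-fold cross-validation. A sketch with assumed data and a deliberately trivial "model" that predicts the mean of its training folds:

```python
# k-fold cross-validation sketch: every point is held out exactly once.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
k = 3
folds = [data[i::k] for i in range(k)]  # round-robin split into k folds

scores = []
for i in range(k):
    val = folds[i]                                        # held-out fold
    train = [x for j, f in enumerate(folds) if j != i for x in f]
    pred = sum(train) / len(train)                        # "train" the mean model
    mse = sum((x - pred) ** 2 for x in val) / len(val)    # evaluate on held-out data
    scores.append(mse)

avg_score = sum(scores) / len(scores)
```

The spread of `scores` across folds is what gives the variability estimate the definition mentions.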
One complete traversal of the training dataset during training.
Halting training when validation performance stops improving to reduce overfitting.
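A sketch of the stopping rule itself, with a fabricated validation-loss sequence and an assumed patience of 2:

```python
# Early stopping: halt once validation loss has not improved for
# `patience` consecutive checks (loss values below are illustrative).
val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74]
patience = 2

best, best_epoch, wait = float("inf"), 0, 0
stopped_at = len(val_losses)
for epoch, loss in enumerate(val_losses):
    if loss < best:
        best, best_epoch, wait = loss, epoch, 0  # new best: reset counter
    else:
        wait += 1
        if wait >= patience:
            stopped_at = epoch                   # give up waiting
            break
```

In practice one restores the weights saved at `best_epoch`, not the final ones.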
Stepwise reasoning patterns that can improve multi-step tasks; often handled implicitly or summarized for safety/privacy.
Local surrogate explanation method approximating model behavior near a specific input.
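A minimal sketch of the local-surrogate idea (in the spirit of LIME, though all details here are assumed): perturb around one input, query the black box, and fit a simple linear model to the responses.

```python
import random

def black_box(x):
    return x ** 3  # stand-in for an opaque model

rng = random.Random(0)
x0 = 2.0
xs = [x0 + rng.gauss(0, 0.1) for _ in range(200)]  # samples near x0
ys = [black_box(x) for x in xs]

# Least-squares slope of the local linear surrogate.
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs
)
# Near x0 = 2 the derivative of x**3 is 12, so the surrogate slope
# should land close to 12 -- a faithful *local* explanation only.
```
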
Samples from the smallest set of tokens whose probabilities sum to p, adapting set size by context.
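This is the standard description of top-p (nucleus) sampling. A sketch with an assumed four-token distribution:

```python
import random

def top_p_sample(probs, p, rng):
    # Sort tokens by probability, descending.
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for tok, pr in items:
        nucleus.append((tok, pr))
        total += pr
        if total >= p:  # smallest set whose mass reaches p
            break
    toks, weights = zip(*nucleus)
    return rng.choices(toks, weights=weights)[0]

rng = random.Random(0)
probs = {"the": 0.5, "a": 0.3, "zebra": 0.15, "qux": 0.05}
# With p = 0.8 the nucleus is {"the", "a"}; the tail never gets sampled.
samples = {top_p_sample(probs, 0.8, rng) for _ in range(200)}
```

Note the adaptivity the definition points to: a peaked distribution yields a tiny nucleus, a flat one a large nucleus, for the same p.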
Allows gradients to bypass layers, enabling very deep networks.
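This describes residual (skip) connections. A pure-Python toy, with the learned transformation replaced by an assumed stand-in:

```python
def f(x):
    return 0.1 * x  # stand-in for a learned layer transformation

def residual_block(x):
    return x + f(x)  # identity shortcut plus the transformation

# Stacking many blocks stays well-behaved: the identity path carries the
# signal (and, in training, the gradient) even when f contributes little.
y = 1.0
for _ in range(10):
    y = residual_block(y)
```
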
Hidden behavior activated by specific triggers, causing targeted mispredictions or undesired outputs.
Encodes token position explicitly, often via sinusoids.
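A sketch of the sinusoidal variant, with an assumed embedding width of 4: even dimensions use sine, odd dimensions cosine, at geometrically spaced frequencies.

```python
import math

def positional_encoding(pos, d_model=4):
    pe = []
    for i in range(d_model):
        freq = 1.0 / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(pos * freq) if i % 2 == 0 else math.cos(pos * freq))
    return pe

# Position 0 encodes to alternating sin(0)/cos(0) values.
pe0 = positional_encoding(0)
```

Each position gets a distinct, fixed pattern, so the model can tell token order apart without learned position parameters.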
Routes inputs to subsets of parameters for scalable capacity.
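A toy sketch of the routing idea (the gating rule and "experts" below are assumed, far simpler than a learned gate): each input activates only one expert's parameters.

```python
# Two stand-in "experts"; a real mixture-of-experts layer would route
# among many parameterized subnetworks via a learned gate.
experts = {"neg": lambda x: -x, "pos": lambda x: x * 2}

def route(x):
    return "pos" if x >= 0 else "neg"  # stand-in for a learned gate

def moe(x):
    return experts[route(x)](x)  # only the selected expert runs

out = [moe(x) for x in (-3, 4)]
```

Capacity scales with the number of experts while per-input compute stays roughly constant, since only the routed subset runs.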
Recovering training data from gradients.
Learning from data generated by a different policy.
Transformer applied to image patches.
Persistent directional movement over time.
Identifying abrupt changes in the process generating a data stream.
Number of linearly independent rows or columns.
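The rank definition can be made operational with Gaussian elimination: count the pivots that survive. A pure-Python sketch (the numerical tolerance is assumed):

```python
def matrix_rank(rows, tol=1e-9):
    m = [row[:] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(rank, len(m)) if abs(m[r][col]) > tol), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate the column below the pivot.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Third row is the sum of the first two, so only two rows are independent.
r = matrix_rank([[1, 2, 3], [4, 5, 6], [5, 7, 9]])
```
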
Sensitivity of a function to input perturbations.
Direction of steepest ascent of a function.
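The "steepest ascent" property can be seen numerically: estimate the gradient with finite differences and step along it. Function and step size below are assumed for illustration.

```python
def f(x, y):
    return -(x - 1) ** 2 - (y + 2) ** 2  # single maximum at (1, -2)

def grad(fn, x, y, h=1e-6):
    # Central finite differences approximate the partial derivatives.
    gx = (fn(x + h, y) - fn(x - h, y)) / (2 * h)
    gy = (fn(x, y + h) - fn(x, y - h)) / (2 * h)
    return gx, gy

# Repeated steps along the gradient climb toward the maximum.
x, y = 0.0, 0.0
for _ in range(100):
    gx, gy = grad(f, x, y)
    x, y = x + 0.1 * gx, y + 0.1 * gy
```

Gradient descent is the same procedure with the sign flipped, stepping against the gradient to minimize.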