Results for "scaling effects"
Scaling law optimizing compute vs data.
Increasing model capacity via compute.
Increasing performance via more data.
Empirical laws linking model size, data, compute to performance.
Models the effects of interventions, written do(X=x).
Dynamic resource allocation.
The degree to which predicted probabilities match true frequencies (e.g., predictions made with 0.8 confidence are correct about 80% of the time).
Techniques that stabilize and speed training by normalizing activations; LayerNorm is common in Transformers.
Framework for reasoning about cause-effect relationships beyond correlation, often using structural assumptions and experiments.
Controlled experiment comparing variants by random assignment to estimate causal effects of changes.
Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
Stochastic generation strategies that trade determinism for diversity; key knobs include temperature and nucleus sampling.
Scales logits by dividing them by a temperature before sampling; higher values increase randomness and diversity, lower values make output more deterministic.
Variability introduced by minibatch sampling during SGD.
Methods for breaking goals into steps; can be classical (A*, STRIPS) or LLM-driven with tool calls.
Capabilities that appear only beyond certain model sizes.
Directed acyclic graph encoding causal relationships.
Formal model linking causal mechanisms and variables.
What would have happened under different conditions.
Probability of treatment assignment given covariates.
Cost to run models in production.
Cost of model training.
Minimum relative to nearby points.
Predicted probabilities do not reflect the true likelihood of being correct.
Methods such as Adam that adjust learning rates dynamically during training.
Control that remains stable under model uncertainty.
Motion modeled with forces and mass taken into account.
Mathematical representation of friction forces.
Systems where failure causes physical harm.
Mechanics of price formation.
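
As an illustration of the entry above on scaling logits before sampling, here is a minimal sketch in plain Python. The function name `softmax_with_temperature` and the example logits are hypothetical and chosen only for illustration. The sketch assumes the common convention of dividing each logit by the temperature before applying softmax.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Return sampling probabilities for `logits` divided by `temperature`.

    Hypothetical helper for illustration; assumes temperature > 0.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative logits, not taken from any real model.
logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Under these assumptions, a lower temperature concentrates probability on the highest logit, while a higher temperature flattens the distribution toward uniform.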