Results for "step size"
Diffusion model trained to remove noise step by step.
Choosing the step size along a given search direction (e.g. the negative gradient).
Popular optimizer combining momentum and per-parameter adaptive step sizes via first/second moment estimates.
Stepwise reasoning patterns that can improve multi-step tasks; often handled implicitly or summarized for safety/privacy.
Controls the amount of noise added at each diffusion step.
Controls the size of parameter updates; too high diverges, too low trains slowly or gets stuck.
Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
Breaking documents into pieces for retrieval; chunk size/overlap strongly affect RAG quality.
Samples from the smallest set of tokens whose probabilities sum to p, adapting set size by context.
Empirical laws linking model size, data, and compute to performance.
Number of samples per gradient update; impacts compute efficiency, generalization, and stability.
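
As a rough illustration of the entry above on the size of parameter updates (too high diverges, too low trains slowly), here is a minimal Python sketch of fixed-step gradient descent. The toy objective f(x) = x**2, the function name `gradient_descent`, and the three step sizes are assumptions chosen for illustration; they do not come from any of the results.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize the toy objective f(x) = x**2 with a fixed step size.

    Returns the final value of x after `steps` updates.
    """
    x = x0
    for _ in range(steps):
        grad = 2.0 * x       # derivative of x**2
        x = x - lr * grad    # parameter update scaled by the step size
    return x


# Hypothetical step sizes: very small, moderate, and too large.
for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr}: final x = {gradient_descent(lr)}")
```

On this objective each update multiplies x by (1 - 2 * lr), so the iterate shrinks toward the minimum at 0 only when that factor has magnitude below 1. A step size of 1.1 gives a factor of -1.2 and the iterate grows, while 0.001 gives 0.998 and barely moves in 20 steps.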