Results for "step size"
Choosing the step size to take along the gradient direction, e.g., via a line search.
Diffusion model trained to remove noise step by step.
Number of samples per gradient update; impacts compute efficiency, generalization, and stability.
Empirical laws linking model size, data, compute to performance.
Adjusting learning rate over training to improve convergence.
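As a minimal sketch of such a schedule, here is cosine annealing from a base rate down to a floor; the function name and the specific rates are illustrative, not from the source.

```python
import math

def cosine_lr(step, total_steps, base_lr=0.1, min_lr=0.0):
    """Cosine-annealed learning rate: decays smoothly from base_lr to min_lr."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# The rate starts at base_lr and decreases monotonically to min_lr.
rates = [cosine_lr(s, 100) for s in range(101)]
```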
Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
Breaking documents into pieces for retrieval; chunk size/overlap strongly affect RAG quality.
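A minimal character-level chunker with overlap might look like the sketch below; real pipelines often split on tokens or sentence boundaries instead, and the parameter values here are illustrative.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks; consecutive chunks
    share `overlap` characters so context is not cut mid-passage."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```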
Samples from the smallest set of tokens whose probabilities sum to p, adapting set size by context.
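The selection step can be sketched as follows, assuming a plain probability list rather than model logits (the function name is illustrative):

```python
import random

def top_p_sample(probs, p=0.9, rng=random):
    """Nucleus (top-p) sampling: keep the smallest set of tokens whose
    cumulative probability reaches p, renormalize, then sample."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break
    weights = [probs[i] / total for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

Note how the kept set shrinks when one token dominates and grows when the distribution is flat, which is the sense in which the cutoff adapts to context.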
Stepwise reasoning patterns that can improve multi-step tasks; often handled implicitly or summarized for safety/privacy.
Controls amount of noise added at each diffusion step.
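One common choice is a linear variance schedule (as in DDPM-style models); the sketch below is an assumption about the schedule shape, with illustrative endpoint values.

```python
def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_t, one per diffusion step."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]
```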
Generative model that learns to reverse a gradual noise process.
Diffusion performed in latent space for efficiency.
Monte Carlo method for state estimation.
Iterative method that updates parameters in the direction of negative gradient to minimize loss.
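The update rule can be sketched in a few lines; the example minimizes a simple quadratic whose gradient is known in closed form.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to minimize a function."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3); minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```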
Matrix of second derivatives describing local curvature of loss.
Optimization when the objective or constraints involve randomness, e.g., minimizing an expected loss estimated from noisy samples.
Controls the size of parameter updates; too high diverges, too low trains slowly or gets stuck.
The set of tokens a model can represent; impacts efficiency, multilinguality, and handling of rare strings.
Capabilities that appear only beyond certain model sizes.
Improving model performance by scaling up the amount of training data.
Popular optimizer combining momentum and per-parameter adaptive step sizes via first/second moment estimates.
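A scalar sketch of the update, assuming the standard exponential moving averages with bias correction (hyperparameter values are the commonly cited defaults, shown for illustration):

```python
import math

def adam(grad, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Adam: per-parameter step sizes from first/second moment estimates."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first moment (momentum)
        v = beta2 * v + (1 - beta2) * g * g    # second moment (scale)
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Same toy quadratic as before: minimum of (x - 3)^2 at x = 3.
x_min = adam(lambda x: 2 * (x - 3), x0=0.0)
```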
Architecture that retrieves relevant documents (e.g., from a vector DB) and conditions generation on them to reduce hallucinations.
Fine-tuning on (prompt, response) pairs to align a model with instruction-following behaviors.
Search algorithm for generation that keeps top-k partial sequences; can improve likelihood but reduce diversity.
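The pruning step can be sketched as below, assuming the caller supplies a function mapping a partial sequence to next-token log-probabilities (the interface is illustrative):

```python
import math

def beam_search(step_logprobs, beam_width=2, length=3):
    """Keep only the top-`beam_width` partial sequences at every step.

    `step_logprobs(seq)` returns a {token: log_prob} dict for the next token.
    """
    beams = [((), 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            for tok, lp in step_logprobs(seq).items():
                candidates.append((seq + (tok,), score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams
```

With a context-free toy distribution the highest-scoring beam is simply the repeated most-likely token, which illustrates the diversity loss the snippet mentions.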
Samples from the k highest-probability tokens to limit unlikely outputs.
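In contrast to top-p, the kept set here has a fixed size k; a minimal sketch over a plain probability list (the function name is illustrative):

```python
import random

def top_k_sample(probs, k=3, rng=random):
    """Sample from the k highest-probability tokens, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = order[:k]
    total = sum(probs[i] for i in kept)
    weights = [probs[i] / total for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```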
Coordinating tools, models, and steps (retrieval, calls, validation) to deliver reliable end-to-end behavior.
Prevents attention to future tokens during training/inference.
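The mask itself is just a lower-triangular pattern; a framework-free sketch (real implementations typically build it as a tensor of -inf biases added to attention scores):

```python
def causal_mask(n):
    """n x n boolean mask: True where attention is allowed,
    i.e. a query position may only attend to key positions <= itself."""
    return [[k <= q for k in range(n)] for q in range(n)]
```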
Balancing learning new behaviors vs exploiting known rewards.
Models time evolution via hidden states.
Breaking tasks into sub-steps.