Results for "response time"
Samples from the k highest-probability tokens to limit unlikely outputs.
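A minimal sketch of that sampling step, assuming a 1-D array of logits; the function name and the NumPy-based implementation are illustrative, not from the original entry.

```python
import numpy as np

def top_k_sample(logits, k, rng=None):
    """Sample a token id from the k highest-probability entries of `logits`."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    top_idx = np.argpartition(logits, -k)[-k:]      # indices of the k largest logits
    top_logits = logits[top_idx] - logits[top_idx].max()
    probs = np.exp(top_logits)
    probs /= probs.sum()                            # softmax restricted to the top k
    return int(rng.choice(top_idx, p=probs))
```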
Constraining model outputs to a schema so external APIs/tools can be called safely and reliably.
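A sketch of how such a schema might be checked after generation, assuming the model emits JSON with `name` and `arguments` fields; the `get_weather` tool and its required arguments are hypothetical.

```python
import json

# Hypothetical tool schema: the tool name and required argument types are
# illustrative, not from the original entry.
TOOL_SCHEMA = {"name": "get_weather", "required": {"city": str, "unit": str}}

def parse_tool_call(raw_output):
    """Parse a model's JSON output and validate it against the schema before calling the tool."""
    call = json.loads(raw_output)
    if call.get("name") != TOOL_SCHEMA["name"]:
        raise ValueError(f"unknown tool: {call.get('name')}")
    args = call.get("arguments", {})
    for field, expected_type in TOOL_SCHEMA["required"].items():
        if not isinstance(args.get(field), expected_type):
            raise ValueError(f"missing or mistyped field: {field}")
    return call["name"], args
```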
Methods for breaking goals into steps; can be classical (A*, STRIPS) or LLM-driven with tool calls.
Adjusting the learning rate over training to improve convergence.
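A sketch of one common schedule (linear warmup followed by cosine decay); the hyperparameter defaults are illustrative, not from the original entry.

```python
import math

def cosine_lr(step, total_steps, base_lr=3e-4, warmup_steps=500, min_lr=0.0):
    """Learning rate at `step`: linear warmup, then cosine decay to `min_lr`."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```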
Using the same parameters across different parts of a model.
Prevents attention to future tokens during training/inference.
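A sketch of the mask itself, assuming attention scores of shape (seq_len, seq_len) with queries as rows; masked positions are set to -inf so they vanish after softmax.

```python
import numpy as np

def apply_causal_mask(scores):
    """Block attention to future tokens: position i may attend only to j <= i."""
    seq_len = scores.shape[-1]
    allowed = np.tril(np.ones((seq_len, seq_len), dtype=bool))  # lower-triangular mask
    return np.where(allowed, scores, -np.inf)                   # -inf -> 0 after softmax
```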
Attention mechanisms that reduce quadratic complexity.
Routes inputs to subsets of parameters for scalable capacity.
Formal framework for sequential decision-making under uncertainty.
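A sketch of value iteration on a small tabular instance of this framework; the array shapes (`P[a, s, s']` for transitions, `R[s, a]` for expected rewards) and the discount/tolerance values are assumptions.

```python
import numpy as np

def value_iteration(P, R, gamma=0.99, tol=1e-6):
    """Compute optimal state values and a greedy policy for a tabular MDP."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s')
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)   # optimal values and greedy policy
        V = V_new
```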
Models evaluating and improving their own outputs.
Expected cumulative reward from a state or state-action pair.
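A Monte Carlo view of the same quantity: the discounted return of one rollout, averaged over many rollouts from a state, estimates that state's value. The discount factor here is an assumed default.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma^t * r_t over one trajectory, accumulated backwards."""
    G = 0.0
    for r in reversed(rewards):
        G = r + gamma * G
    return G
```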
Probabilistic model for sequential data with latent states.
Optimizing policies directly via gradient ascent on expected reward.
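A sketch of one REINFORCE-style update for a linear-softmax policy; the parameter shape `(state_dim, n_actions)` and the learning rate are assumptions, and the gradient uses grad log pi(a|s) = outer(s, one_hot(a) - pi(.|s)).

```python
import numpy as np

def reinforce_update(theta, states, actions, returns, lr=0.01):
    """One gradient-ascent step on expected return for pi(a|s) = softmax(s @ theta)."""
    grad = np.zeros_like(theta)
    for s, a, G in zip(states, actions, returns):
        logits = s @ theta
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        one_hot = np.zeros_like(probs)
        one_hot[a] = 1.0
        grad += np.outer(s, one_hot - probs) * G   # grad of log pi(a|s), scaled by the return
    return theta + lr * grad / len(actions)        # ascend the policy-gradient estimate
```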
Diffusion model trained to remove noise step by step.
Models trained to decide when to call tools.
Controls amount of noise added at each diffusion step.
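A sketch of a linear beta schedule and the cumulative products used for closed-form noising; the endpoint values follow common DDPM-style defaults and are not from the original entry.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_t and their cumulative products (alpha-bar)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)   # lets you jump straight from x_0 to x_t
    return betas, alpha_bars
```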
Generative model that learns to reverse a gradual noise process.
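A sketch of the forward (noising) half of that process in closed form, assuming `alpha_bar_t` comes from a schedule like the one above; training the reverse (denoising) model is out of scope here.

```python
import numpy as np

def forward_noise(x0, alpha_bar_t, rng=None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return x_t, eps   # eps is the target a denoiser would learn to predict
```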
Estimating per-pixel motion between consecutive frames.
Generates audio waveforms from spectrograms.
Predicting future values from past observations.
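A toy sketch: fit an AR(p) model by least squares and predict one step ahead; the lag order and the least-squares approach are illustrative choices, not from the original entry.

```python
import numpy as np

def ar_forecast(series, p=3):
    """Fit an AR(p) model to a 1-D series and return the one-step-ahead prediction."""
    series = np.asarray(series, dtype=np.float64)
    y = series[p:]
    # Column i holds the values lagged by i + 1 steps.
    X = np.column_stack([series[p - i - 1 : len(series) - i - 1] for i in range(p)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return series[-1:-p - 1:-1] @ coef
```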
Model execution path in production.
Running a new model alongside the production one, without user impact.
Running predictions on large datasets periodically.
Centralized repository for curated features.
A shift in the distribution of model outputs over time.
Interleaving reasoning and tool use.
Using production outcomes to improve models.
Organizational uptake of AI technologies.
Differences between training and inference conditions.
Startup latency for services.