Results for "sequence modeling"
Samples from the k highest-probability tokens to limit unlikely outputs.
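This truncated-sampling scheme (commonly called top-k sampling) can be sketched in NumPy; the five-token logit vector below is illustrative, not from any real model:

```python
import numpy as np

def top_k_sample(logits, k, rng):
    """Sample a token id from only the k highest-probability entries."""
    logits = np.asarray(logits, dtype=float)
    # Indices of the k largest logits (order within the kept set doesn't matter).
    top = np.argpartition(logits, -k)[-k:]
    # Softmax restricted to the kept logits.
    z = logits[top] - logits[top].max()
    probs = np.exp(z) / np.exp(z).sum()
    return top[rng.choice(len(top), p=probs)]

rng = np.random.default_rng(0)
logits = [2.0, 1.0, 0.5, -3.0, -5.0]
token = top_k_sample(logits, k=3, rng=rng)
# With k=3, the two lowest-probability tokens (ids 3 and 4) can never be drawn.
```

The key point is that truncation happens before renormalization, so all probability mass is redistributed over the k survivors.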
Samples from the smallest set of tokens whose probabilities sum to p, adapting set size by context.
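This adaptive variant (commonly called nucleus or top-p sampling) keeps the smallest prefix of the sorted distribution whose mass reaches p; a minimal sketch with an illustrative logit vector:

```python
import numpy as np

def top_p_sample(logits, p, rng):
    """Sample from the smallest set of tokens whose probability mass reaches p."""
    logits = np.asarray(logits, dtype=float)
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    order = np.argsort(probs)[::-1]           # most probable first
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1      # smallest prefix with mass >= p
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()    # renormalize over the nucleus
    return keep[rng.choice(len(keep), p=kept)]

rng = np.random.default_rng(0)
# A sharply peaked distribution: token 0 alone exceeds p, so the nucleus is {0}.
token = top_p_sample([10.0, 0.0, 0.0, 0.0], p=0.9, rng=rng)
```

Unlike a fixed k, the kept set shrinks when the model is confident and grows when it is uncertain.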
Raw, unnormalized model outputs before conversion to probabilities (typically via softmax); often manipulated during decoding and calibration.
Coordinating tools, models, and steps (retrieval, calls, validation) to deliver reliable end-to-end behavior.
Stores past attention states to speed up autoregressive decoding.
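A toy sketch of this caching idea (a KV cache): each decode step appends one key/value entry instead of recomputing the whole prefix. The identity "projections" are stand-ins; a real model would use learned matrices:

```python
import numpy as np

def attend(q, K, V):
    """Single-query attention over all cached keys/values."""
    scores = K @ q / np.sqrt(q.size)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

d = 4
rng = np.random.default_rng(0)
K_cache, V_cache = [], []
for step in range(5):
    x = rng.normal(size=d)   # stand-in for the new token's hidden state
    K_cache.append(x)        # in a real model: x @ W_k
    V_cache.append(x)        # in a real model: x @ W_v
    # Only the new query attends over the growing cache; earlier keys/values
    # are reused rather than recomputed each step.
    out = attend(x, np.stack(K_cache), np.stack(V_cache))
```

The saving is that per-step work grows linearly in the prefix length instead of recomputing all keys and values from scratch.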
Encodes positional information via rotation in embedding space.
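The rotation described here (rotary position embeddings, RoPE) rotates consecutive coordinate pairs by position-dependent angles; a minimal sketch, with the conventional base of 10000:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate consecutive (even, odd) pairs of x by a position-dependent angle."""
    half = x.size // 2
    freqs = base ** (-np.arange(half) / half)   # one frequency per pair
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
# Because rotations compose, the q·k dot product after RoPE depends only on
# the relative offset between the two positions, not their absolute values.
```

This relative-offset property is what makes the encoding useful inside attention scores.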
Techniques to handle longer documents without quadratic cost.
Attention mechanisms that reduce quadratic complexity.
Continuous cycle of observation, reasoning, action, and feedback.
Separates planning from execution in agent architectures.
Sequential data indexed by time.
Monte Carlo method for state estimation.
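A minimal one-dimensional sketch of this method (a bootstrap particle filter) under assumed Gaussian motion and observation noise; all constants below are illustrative:

```python
import numpy as np

def particle_filter_step(particles, obs, obs_std, proc_std, rng):
    """One predict-update-resample cycle for a 1D random-walk state."""
    # Predict: propagate each particle through the motion model.
    particles = particles + rng.normal(0.0, proc_std, size=particles.size)
    # Update: weight particles by the Gaussian likelihood of the observation.
    w = np.exp(-0.5 * ((obs - particles) / obs_std) ** 2)
    w /= w.sum()
    # Resample: draw a new particle set proportional to the weights.
    idx = rng.choice(particles.size, size=particles.size, p=w)
    return particles[idx]

rng = np.random.default_rng(0)
true_state = 2.0
particles = rng.uniform(-10.0, 10.0, size=2000)
for _ in range(10):
    obs = true_state + rng.normal(0.0, 0.5)
    particles = particle_filter_step(particles, obs,
                                     obs_std=0.5, proc_std=0.05, rng=rng)
estimate = particles.mean()   # posterior mean estimate of the state
```

After a handful of observations the particle cloud concentrates near the true state, which is the sense in which the method "estimates state".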
Model execution path in production.
Control without feedback after execution begins.
Optimizing continuous action sequences.
Computing collision-free trajectories.
Fabrication of cases or statutes by LLMs.
Predicting protein 3D structure from sequence.
A system that perceives state, selects actions, and pursues goals, often combining LLM reasoning with tools and memory.
Learning structure from unlabeled data, such as discovering groups, compressing representations, or modeling data distributions.
The degree to which predicted probabilities match true frequencies (e.g., predictions made with confidence 0.8 should be correct ~80% of the time).
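A common summary of this match is expected calibration error (ECE): the bin-weighted gap between average confidence and empirical accuracy. A minimal sketch on perfectly calibrated toy data:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Average |confidence - accuracy| gap over equal-width confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap   # weight by fraction of samples in bin
    return ece

# Toy data: all predictions made at confidence 0.8, and exactly 80% correct,
# so the calibration gap is zero.
conf = np.full(10, 0.8)
hit = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
ece = expected_calibration_error(conf, hit)
```

A miscalibrated model (say, 0.9 confidence but 60% accuracy) would instead produce a large positive ECE.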
Iterative method that updates parameters in the direction of negative gradient to minimize loss.
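The update rule can be sketched in a few lines; the quadratic loss and learning rate below are illustrative:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to minimize a scalar loss."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)   # negative-gradient direction
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3); the minimum is x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Each step here contracts the distance to the minimum by a constant factor (0.8 with this learning rate), so convergence is geometric.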
A parameterized function composed of interconnected units organized in layers with nonlinear activations.
Framework for reasoning about cause-effect relationships beyond correlation, often using structural assumptions and experiments.
Tracking where data came from and how it was transformed; key for debugging and compliance.
Updating beliefs about parameters using observed evidence and prior distributions.
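The simplest worked instance is the conjugate Beta-Binomial model, where the update is just parameter addition; a minimal sketch with an illustrative coin-flip example:

```python
def beta_binomial_update(a, b, heads, tails):
    """Posterior Beta parameters after observing coin flips under a Beta prior."""
    return a + heads, b + tails

# Start from a uniform Beta(1, 1) prior, then observe 7 heads and 3 tails.
a, b = beta_binomial_update(1.0, 1.0, heads=7, tails=3)
posterior_mean = a / (a + b)   # (1 + 7) / (1 + 7 + 1 + 3) = 8/12
```

The prior acts as pseudo-counts: stronger priors need more evidence to move the posterior.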
Generating speech audio from text, with control over prosody, speaker identity, and style.
Using the same parameters across different parts of a model (e.g., tied input and output embeddings).
Allows the model to attend to information from different representation subspaces simultaneously.
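A minimal NumPy sketch of this mechanism (multi-head attention): the model dimension is split across heads, each head attends in its own smaller subspace, and the head outputs are concatenated. Random projections stand in for learned weights:

```python
import numpy as np

def multi_head_attention(x, n_heads, rng):
    """Split the model dimension into heads, attend per head, concatenate."""
    seq, d = x.shape
    assert d % n_heads == 0
    hd = d // n_heads
    heads = []
    for _ in range(n_heads):
        # Each head has its own projections into a smaller subspace.
        Wq, Wk, Wv = (rng.normal(size=(d, hd)) / np.sqrt(d) for _ in range(3))
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.T / np.sqrt(hd)                      # scaled dot-product
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)                  # row-wise softmax
        heads.append(w @ v)
    return np.concatenate(heads, axis=-1)                   # back to (seq, d)

rng = np.random.default_rng(0)
y = multi_head_attention(rng.normal(size=(5, 8)), n_heads=2, rng=rng)
```

A full transformer layer would add an output projection after the concatenation; it is omitted here to keep the sketch focused on the per-head split.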
Routes inputs to subsets of parameters for scalable capacity.
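A toy sketch of this routing (a mixture-of-experts layer): a gating network scores the experts, and each input runs only its top-k of them. The scaling "experts" and random gate weights are purely illustrative:

```python
import numpy as np

def moe_forward(x, gate_W, experts, k=1):
    """Route each input to its top-k experts and mix their outputs."""
    logits = x @ gate_W                          # (batch, n_experts) gate scores
    top = np.argsort(logits, axis=-1)[:, -k:]    # top-k expert ids per input
    out = np.zeros_like(x)
    for i, row in enumerate(x):
        sel = top[i]
        z = logits[i, sel] - logits[i, sel].max()
        w = np.exp(z) / np.exp(z).sum()          # softmax over selected experts
        for weight, e in zip(w, sel):
            out[i] += weight * experts[e](row)   # only k experts run per input
    return out

rng = np.random.default_rng(0)
n_experts, d = 4, 3
gate_W = rng.normal(size=(d, n_experts))
# Toy experts: each just scales its input by a different factor.
experts = [lambda v, s=s: s * v for s in (1.0, 2.0, 3.0, 4.0)]
y = moe_forward(rng.normal(size=(5, d)), gate_W, experts, k=1)
```

Capacity scales with the number of experts while per-input compute scales only with k, which is the point of the technique.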