Results for "sequences"
Search algorithm for generation that keeps the top-k highest-scoring partial sequences at each decoding step; can improve likelihood but tends to reduce diversity.
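A minimal sketch of the idea: keep the k best partial sequences per step. The `step_logprobs` callable standing in for a real model, and the toy bigram table, are assumptions for illustration only.

```python
import math

def beam_search(step_logprobs, beam_width, length):
    """Keep the top-`beam_width` partial sequences at each step.

    `step_logprobs(seq)` returns {next_token: log_prob}, a stand-in for a
    real model's conditional distribution (an assumption of this sketch).
    """
    beams = [((), 0.0)]  # (partial sequence, cumulative log-probability)
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            for tok, lp in step_logprobs(seq).items():
                candidates.append((seq + (tok,), score + lp))
        # Prune: keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Toy bigram-style model: next-token distribution depends only on the last token.
table = {
    None: {"a": math.log(0.6), "b": math.log(0.4)},
    "a":  {"a": math.log(0.1), "b": math.log(0.9)},
    "b":  {"a": math.log(0.7), "b": math.log(0.3)},
}
model = lambda seq: table[seq[-1] if seq else None]

best = beam_search(model, beam_width=2, length=3)
```

With `beam_width=2`, the search recovers the globally most likely length-3 sequence for this toy model, which greedy decoding (beam width 1) can miss.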
Mismatch between training and inference conditions, e.g. a model trained on ground-truth prefixes but conditioned on its own (possibly erroneous) outputs at generation time.
Networks with recurrent connections for sequences; largely supplanted by Transformers for many tasks.
Generates sequences one token at a time, conditioning on past tokens.
A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.
Methods for breaking goals into steps; can be classical (A*, STRIPS) or LLM-driven with tool calls.
Mechanism that computes context-aware mixtures of representations; scales well and captures long-range dependencies.
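The "context-aware mixture" can be sketched as scaled dot-product self-attention: each output vector is a similarity-weighted average of all inputs. Learned query/key/value projections are omitted here for brevity, which is an assumption of the sketch, not how full models work.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]

def self_attention(x):
    """Scaled dot-product self-attention over a list of vectors.

    Each output is a weighted mixture of all inputs, with weights from
    query-key similarity; in this sketch queries, keys, and values are
    the inputs themselves (no learned projections).
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, x)) for j in range(d)])
    return out

x = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(x)
```

Because every position attends to every other in one step, distance in the sequence does not attenuate the interaction, which is why attention captures long-range dependencies well.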
Injects sequence order into Transformers, since attention alone is permutation-invariant.
An RNN variant using gates to mitigate vanishing gradients and capture longer context.
Training objective where the model predicts the next token given previous tokens (causal modeling).
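The objective is the mean negative log-likelihood of each true next token. A minimal sketch, assuming per-step logits are already computed; real implementations batch this, shift targets, and mask padding.

```python
import math

def next_token_loss(logits_per_step, targets):
    """Causal LM loss: mean negative log-likelihood of the true next token.

    `logits_per_step[t]` are the model's logits given tokens before step t,
    and `targets[t]` is the token to predict at step t (sketch only).
    """
    total = 0.0
    for logits, target in zip(logits_per_step, targets):
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))  # log-sum-exp
        total -= logits[target] - log_z  # -log p(target | prefix)
    return total / len(targets)

# Two steps over a 2-token vocabulary: the model is confident and correct.
loss = next_token_loss([[2.0, 0.0], [0.0, 2.0]], [0, 1])
```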
Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.
Divides logits by a scalar before sampling; higher values increase randomness/diversity, lower values make output more deterministic.
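The scaling can be sketched as dividing logits by the temperature before the softmax; the specific logit values below are illustrative only.

```python
import math

def softmax_with_temperature(logits, temperature):
    # T > 1 flattens the distribution (more random sampling);
    # T < 1 sharpens it toward the argmax (more deterministic).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.5)  # sharper, near-greedy
hot = softmax_with_temperature(logits, 2.0)   # flatter, more diverse
```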
Caches the key and value tensors of past tokens so they are not recomputed at each step, speeding up autoregressive decoding.
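A toy single-head sketch of the caching idea (not a real framework API): each step appends its key/value once, and every later query attends only over the cached lists instead of reprocessing the whole prefix.

```python
import math

def attend(q, keys, values):
    # Scaled dot-product attention for one query over cached keys/values.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    w = [x / z for x in w]
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]

class KVCache:
    """Append each step's key/value once; reuse them for every later query."""

    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Only the new token's k/v are computed per step; past ones are reused.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)

cache = KVCache()
out1 = cache.step([1.0, 0.0], [1.0, 0.0], [1.0, 2.0])
out2 = cache.step([0.0, 1.0], [0.0, 1.0], [3.0, 4.0])
```

With a single cached entry the attention weight is 1, so the first output is exactly the first value vector; later steps mix the cached values by query-key similarity.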
Encodes token position explicitly, often via sinusoids.
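A sketch of the sinusoidal scheme from the original Transformer: even dimensions use sine, odd dimensions cosine, with wavelengths in a geometric progression controlled by the 10000 base.

```python
import math

def sinusoidal_encoding(position, d_model):
    # pe[2i]   = sin(pos / 10000**(2i / d_model))
    # pe[2i+1] = cos(pos / 10000**(2i / d_model))
    enc = []
    for i in range(d_model):
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc

pe0 = sinusoidal_encoding(0, 8)  # position 0: sins are 0, cosines are 1
```

These vectors are added to (or concatenated with) token embeddings so that attention, which is otherwise permutation-invariant, can distinguish positions.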
Techniques to handle longer documents without quadratic cost.
Attention mechanisms that reduce quadratic complexity.
Graphs containing multiple node or edge types with different semantics.
Recovering 3D structure from images.
Optimizing continuous action sequences.
Future trajectories simulated by a learned model rather than collected from the real environment.
Deep learning system for protein structure prediction.