Results for "long-term dependencies"
Extending agents with long-term memory stores.
An RNN variant using gates to mitigate vanishing gradients and capture longer context.
Persistent directional movement over time.
Mechanism that computes context-aware mixtures of representations; scales well and captures long-range dependencies.
Maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
Networks with recurrent connections for sequences; largely supplanted by Transformers for many tasks.
Mechanisms for retaining context across turns/sessions: scratchpads, vector memories, structured stores.
Compromising AI systems via libraries, models, or datasets.
A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.
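A minimal sketch of the idea: a toy bigram model estimates the probability of the next token from counts over a corpus. The `train_bigram` helper and the sample sentence are illustrative, not from any particular library.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Toy bigram language model: estimate P(next | current) from counts."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    # Normalize each row of counts into a conditional distribution.
    return {cur: {t: c / sum(row.values()) for t, c in row.items()}
            for cur, row in counts.items()}

probs = train_bigram("the cat sat on the mat".split())
print(probs["the"])  # {'cat': 0.5, 'mat': 0.5}
```

Real models replace the count table with a neural network, but the training signal is the same next-token prediction.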
Balancing exploration of unfamiliar actions against exploitation of known rewards.
Determining who spoke when in an audio recording.
Predicting future values from past observations.
Number of steps considered in planning.
Competitive advantage from proprietary models/data.
Attention where queries/keys/values come from the same sequence, enabling token-to-token interactions.
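A minimal NumPy sketch of single-head self-attention, assuming hypothetical projection matrices `Wq`, `Wk`, `Wv`: every token's query scores against every token's key, and the softmax weights mix the value vectors.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one sequence x
    of shape (seq_len, d_model); Q, K, V all come from x."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # token-to-token interactions
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax over key positions
    return w @ v                                     # context-aware mixture of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # 4 tokens, model dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```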
Generates sequences one token at a time, conditioning on past tokens.
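The decoding loop itself is simple; this sketch uses a hypothetical lookup table in place of a trained model to show how each step conditions on the tokens generated so far.

```python
def generate(next_token_fn, start, max_new=5):
    """Autoregressive generation: each new token is chosen
    conditioned on the full sequence produced so far."""
    seq = [start]
    for _ in range(max_new):
        seq.append(next_token_fn(seq))
    return seq

# Hypothetical deterministic next-token rule standing in for a model.
table = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}
out = generate(lambda seq: table.get(seq[-1], "<eos>"), "the", max_new=4)
print(out)  # ['the', 'cat', 'sat', 'on', 'the']
```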
Coordinating tools, models, and steps (retrieval, calls, validation) to deliver reliable end-to-end behavior.
Quantifies shared information between random variables.
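For discrete variables the definition reduces to a sum over the joint table; a small sketch, assuming the joint distribution is given as a NumPy array:

```python
import numpy as np

def mutual_information(pxy):
    """I(X;Y) = sum_{x,y} p(x,y) * log[ p(x,y) / (p(x) p(y)) ], in nats."""
    px = pxy.sum(axis=1, keepdims=True)              # marginal of X
    py = pxy.sum(axis=0, keepdims=True)              # marginal of Y
    mask = pxy > 0                                   # skip zero-probability cells
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

# Independent bits share no information: I = 0.
indep = np.outer([0.5, 0.5], [0.5, 0.5])
# Perfectly correlated bits share one bit: I = ln 2 nats.
corr = np.array([[0.5, 0.0], [0.0, 0.5]])
print(mutual_information(indep), mutual_information(corr))
```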
Models that define an energy landscape rather than explicit probabilities.
Probabilistic graphical model for structured prediction.
Graphical model expressing factorization of a probability distribution.
Transformer applied to image patches.
Sequential data indexed by time.
CNNs applied to time series.
Techniques that discourage overly complex solutions to improve generalization (reduce overfitting).
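One concrete instance is the L2 penalty in ridge regression, sketched here in closed form: the term `lam * I` shrinks the weights toward zero, trading a little bias for lower variance.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: minimize ||Xw - y||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
y = rng.normal(size=20)
w_plain = ridge_fit(X, y, lam=0.0)       # ordinary least squares
w_reg = ridge_fit(X, y, lam=10.0)        # penalized fit
print(np.linalg.norm(w_reg) < np.linalg.norm(w_plain))  # True: penalty shrinks weights
```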
Error due to sensitivity to fluctuations in the training dataset.
Systematic error introduced by simplifying assumptions in a learning algorithm.
Caches the keys and values of already-processed tokens so autoregressive decoding does not recompute them at every step.
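A minimal sketch of the pattern, with illustrative names (`KVCache`, `step`): the cache appends one key/value pair per decoding step, so the new token's query attends over all previous tokens without recomputing their projections.

```python
import numpy as np

class KVCache:
    """Append-only cache of past keys and values for autoregressive decoding."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Cache this step's key/value instead of recomputing past ones.
        self.keys.append(k)
        self.values.append(v)
        K = np.stack(self.keys)                  # (t, d): grows one row per step
        V = np.stack(self.values)
        scores = K @ q / np.sqrt(len(q))
        w = np.exp(scores - scores.max())
        w /= w.sum()                             # softmax over cached positions
        return w @ V                             # attention output for new token

cache = KVCache()
rng = np.random.default_rng(0)
for _ in range(3):                               # three decoding steps
    q, k, v = rng.normal(size=(3, 8))
    out = cache.step(q, k, v)
print(len(cache.keys), out.shape)  # 3 (8,)
```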
Attention variants (e.g., sparse, linear, or low-rank) that reduce full attention's quadratic cost in sequence length.
Autoencoder using probabilistic latent variables and KL regularization.