Results for "long context"
Flat minimum: a wide basin in the loss landscape, often correlated with better generalization.
Hessian: matrix of second derivatives describing the local curvature of the loss.
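A minimal numerical sketch of the idea: approximating the Hessian of a toy quadratic loss by finite differences (the function `f` and step size `h` are illustrative assumptions, not from any particular library).

```python
def f(w):
    # Toy quadratic loss: curvature 2 along w[0], curvature 6 along w[1].
    return w[0] ** 2 + 3 * w[1] ** 2

def hessian(f, w, h=1e-4):
    """Approximate the matrix of second derivatives of f at point w."""
    n = len(w)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            wpp = list(w); wpp[i] += h; wpp[j] += h
            wpm = list(w); wpm[i] += h; wpm[j] -= h
            wmp = list(w); wmp[i] -= h; wmp[j] += h
            wmm = list(w); wmm[i] -= h; wmm[j] -= h
            # Central-difference formula; for i == j this reduces to the
            # usual second-difference with step 2h.
            H[i][j] = (f(wpp) - f(wpm) - f(wmp) + f(wmm)) / (4 * h * h)
    return H

H = hessian(f, [0.5, -1.0])  # approximately [[2, 0], [0, 6]]
```

The diagonal entries recover the curvature along each axis; the zero off-diagonals reflect the separable loss.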
Universal approximation theorem: neural networks can approximate any continuous function under certain conditions.
Highway network: early architecture using learned gates for skip connections.
Rotary position embedding (RoPE): encodes positional information via rotation in embedding space.
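A minimal pure-Python sketch of the rotation: each consecutive pair of embedding dimensions is rotated by a position-dependent angle, so dot products between rotated vectors depend only on relative position (the vectors and base constant here are illustrative assumptions).

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate consecutive pairs of x by an angle proportional to pos."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)   # lower pairs rotate faster
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s,
                x[i] * s + x[i + 1] * c]
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

q = [0.3, -0.7, 1.1, 0.2]
k = [0.5, 0.4, -0.9, 0.8]
# Attention scores depend only on relative offset:
# positions (3, 7) and (10, 14) both have offset 4.
s1 = dot(rope(q, 3), rope(k, 7))
s2 = dot(rope(q, 10), rope(k, 14))
```

The equality of `s1` and `s2` is the property that makes the encoding relative rather than absolute.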
Scaling laws: empirical laws linking model size, data, and compute to performance.
Router: component of a mixture-of-experts model that chooses which experts process each token.
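A minimal sketch of one common routing scheme, top-k softmax gating (the logits and expert count are illustrative assumptions):

```python
import math

def route(logits, k=2):
    """Select the top-k experts and softmax over their logits only."""
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    z = [math.exp(logits[e]) for e in top]
    total = sum(z)
    return [(e, w / total) for e, w in zip(top, z)]

# A token's affinity logits for 4 hypothetical experts.
gates = route([1.2, -0.3, 2.0, 0.1], k=2)
# Experts 2 and 0 are selected; their weights sum to 1.
```

Each token's output is then the gate-weighted sum of the selected experts' outputs, so only k of the experts run per token.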
State space: all possible configurations an agent may encounter.
Policy: strategy mapping states to actions.
Watermarking: embedding signals to prove model ownership.
Function calling: models trained to decide when to call tools.
Graph neural networks: networks that operate on graph-structured data by propagating information along edges.
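A minimal sketch of the propagation step, stripped of learned weights: each node averages its neighbors' features with its own (the path graph and scalar features are illustrative assumptions).

```python
def propagate(adj, feats):
    """One round of message passing: average neighbor features."""
    out = []
    for node, nbrs in enumerate(adj):
        msgs = [feats[j] for j in nbrs] + [feats[node]]  # include self
        out.append(sum(msgs) / len(msgs))
    return out

# Path graph 0 - 1 - 2 with scalar node features.
adj = [[1], [0, 2], [1]]
x = [1.0, 0.0, 0.0]
x = propagate(adj, x)   # node 0's signal spreads to its neighbor, node 1
```

Stacking such rounds (with learned transforms between them) lets information travel k hops after k layers, which is the core GNN mechanism.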
Supply chain attack: compromising AI systems via libraries, models, or datasets.
Bayesian network: graphical model expressing the factorization of a probability distribution.
Semantic segmentation: pixel-wise classification of image regions.
Training pipeline: end-to-end process for model training.
ReAct: interleaving reasoning and tool use.
Chinchilla scaling law: prescribes the compute-optimal balance between model size and training data.
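A minimal back-of-the-envelope sketch of the trade-off, assuming the common approximation that training compute c ≈ 6 · N · D (parameters times tokens) and the Chinchilla paper's rough rule of thumb of about 20 training tokens per parameter:

```python
def compute_optimal(c, tokens_per_param=20.0):
    """Split a FLOP budget c between parameters N and tokens D.

    Assumes c ~= 6 * N * D, so at a fixed token:parameter ratio both
    N and D grow as sqrt(c). The 20:1 ratio is an assumption taken
    from the Chinchilla rule of thumb, not a universal constant.
    """
    n = (c / (6.0 * tokens_per_param)) ** 0.5  # parameter count
    d = tokens_per_param * n                    # training tokens
    return n, d

n, d = compute_optimal(5.8e23)  # roughly 70B params, 1.4T tokens
```

Doubling the compute budget therefore scales both model size and data by about sqrt(2), rather than putting all the extra compute into a larger model.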
Inference cost: cost to run models in production.
Commoditization: declining differentiation among models.
Eigenvector: vector whose direction remains unchanged under a linear transformation.
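A minimal concrete check of the definition, using a 2x2 symmetric matrix chosen for illustration:

```python
def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a, 0.0) for a in ([row[i] * v[i] for i in range(len(v))] for row in A)]

A = [[2.0, 1.0],
     [1.0, 2.0]]
v = [1.0, 1.0]          # its direction is unchanged by A
Av = matvec(A, v)       # equals [3.0, 3.0] = 3 * v, so the eigenvalue is 3
```

Applying `A` only stretches `v` by a scalar (the eigenvalue); a non-eigenvector such as `[1.0, 0.0]` would change direction under the same map.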
Rank: number of linearly independent rows or columns of a matrix.
Lipschitz constant: bounds the sensitivity of a function to input perturbations.
Likelihood: probability of the data given the parameters.
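A minimal worked example with coin flips (the counts are illustrative): the likelihood treats the observed data as fixed and varies the parameter.

```python
def likelihood(p, heads, tails):
    """Probability of the observed flips given bias parameter p."""
    return p ** heads * (1 - p) ** tails

# Observed: 7 heads, 3 tails. As a function of p, the likelihood
# peaks at the empirical frequency p = 0.7.
best = likelihood(0.7, 7, 3)
```

Maximizing this quantity over `p` is maximum likelihood estimation; here the maximizer is exactly heads / (heads + tails).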
Local minimum: a minimum relative to nearby points.
Global minimum: the lowest possible loss.
Outer alignment: correctly specifying goals.
Adaptive optimizers: methods like Adam that adjust learning rates dynamically.
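A minimal single-parameter sketch of the Adam update rule, showing how running moment estimates rescale the raw gradient (the learning rate, toy objective, and step count are illustrative assumptions):

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter."""
    m = b1 * m + (1 - b1) * grad           # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 (gradient 2x) starting from x = 3.
x, m, v = 3.0, 0.0, 0.0
for t in range(1, 201):
    x, m, v = adam_step(x, 2 * x, m, v, t)
```

Because the update divides by the root of the second-moment estimate, the effective step size adapts per parameter instead of being fixed by `lr` alone.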
Inner alignment: ensuring the learned behavior matches the intended objective.
Scalable oversight: using limited human feedback to guide large models.