Results for "matrix dimension"
A measure of a model class’s expressive capacity based on its ability to shatter datasets.
A table summarizing classification outcomes, foundational for metrics like precision, recall, specificity.
Matrix of second derivatives describing the local curvature of the loss.
Decomposes a matrix into orthogonal components; used in embeddings and compression.
Maximum number of linearly independent rows (equivalently, columns) of a matrix.
Matrix of first-order derivatives for vector-valued functions.
Optimization using curvature information; often expensive at scale.
Optimal state estimator for linear dynamic systems with Gaussian noise.
Sensitivity of a function to input perturbations.
Matrix of curvature information.
Optimal control for linear systems with quadratic cost.
A measurable property or attribute used as model input (raw or engineered), such as age, pixel intensity, or token ID.
Injects sequence order into Transformers, since attention alone is insensitive to token order (permutation-equivariant).
A hypothesis class is PAC-learnable if an algorithm can, with high probability, learn an approximately correct hypothesis from finitely many samples.
A theoretical framework analyzing what classes of functions can be learned, how efficiently, and with what guarantees.
Encodes positional information via rotation in embedding space.
Models whose weights are publicly available.
Internal representation of the agent itself.
A narrow minimum often associated with poorer generalization.
Mathematical foundation for ML involving vector spaces, matrices, and linear transformations.
Nonzero vector that a linear transformation only scales, so its direction is unchanged (or reversed, for a negative eigenvalue).
Of predicted positives, the fraction that are truly positive; sensitive to false positives.
Of actual positives, the fraction correctly identified; sensitive to false negatives.
Of actual negatives, the fraction correctly identified; sensitive to false positives.
Attention where queries/keys/values come from the same sequence, enabling token-to-token interactions.
A broader capability to infer internal system state from telemetry, crucial for AI services and agents.
Removing weights or neurons to shrink models and improve efficiency; can be structured or unstructured.
Raw model outputs before converting to probabilities; manipulated during decoding and calibration.
A point where the gradient is zero but which is neither a local maximum nor a local minimum; common in deep nets.
A wide basin often correlated with better generalization.
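
The precision, recall, and specificity entries above all derive from the four counts in a binary classification outcome table. As a minimal sketch, assuming those counts are already available as integers, the three ratios could be computed as follows. The function name `classification_metrics` and the example counts are hypothetical, chosen only for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, and specificity from confusion-matrix counts.

    tp, fp, tn, fn are the counts of true positives, false positives,
    true negatives, and false negatives (hypothetical parameter names).
    A ratio with a zero denominator is reported as None.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        # Of predicted positives, the fraction that are truly positive.
        "precision": ratio(tp, tp + fp),
        # Of actual positives, the fraction correctly identified.
        "recall": ratio(tp, tp + fn),
        # Of actual negatives, the fraction correctly identified.
        "specificity": ratio(tn, tn + fp),
    }


# Illustrative counts only; not taken from any real dataset.
metrics = classification_metrics(tp=8, fp=2, tn=85, fn=5)
for name, value in metrics.items():
    print(name, value)
```

Returning None for an undefined ratio is one possible design choice; raising an error or returning NaN would be equally reasonable, depending on how the result is consumed.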