Results for "partial derivatives"
Matrix of second-order partial derivatives describing the local curvature of the loss.
Matrix of first-order partial derivatives of a vector-valued function.
Search algorithm for generation that keeps top-k partial sequences; can improve likelihood but reduce diversity.
A narrow minimum often associated with poorer generalization.
Vector of partial derivatives pointing in the direction of steepest ascent of a function.
Matrix of curvature information.
Gradients shrink as they are backpropagated through layers, slowing learning in early layers; mitigated by ReLU activations, residual connections, and normalization.
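
Several of the entries above describe objects built from partial derivatives: the direction of steepest ascent, the matrix of first-order derivatives, and the matrix of second derivatives. A minimal sketch of how the three relate, using central finite differences in plain Python, might look like the following. The helper names (`gradient`, `jacobian`, `hessian`), the toy functions `loss` and `field`, and the step sizes are assumptions made for illustration and do not come from the listing itself; an automatic differentiation library would normally replace these approximations.

```python
# Illustrative sketch: approximate partial derivatives with central differences.
# All names and step sizes here are hypothetical choices for this example.

def gradient(f, x, h=1e-5):
    """Approximate the gradient of a scalar function f at point x (list of floats)."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        # Central difference for the partial derivative with respect to x[i].
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def jacobian(F, x, h=1e-5):
    """Approximate the Jacobian of a vector-valued F at x.

    Row i holds the partial derivatives of output i with respect to each input.
    """
    m = len(F(x))
    return [gradient(lambda p, i=i: F(p)[i], x, h) for i in range(m)]


def hessian(f, x, h=1e-4):
    """Approximate the Hessian of a scalar f at x as the Jacobian of its gradient."""
    return jacobian(lambda p: gradient(f, p, h), x, h)


def loss(p):
    # Toy scalar function: x^2 * y + 3 * y^2
    x, y = p
    return x * x * y + 3.0 * y * y


def field(p):
    # Toy vector-valued function: (x * y, x + y^2)
    x, y = p
    return [x * y, x + y * y]


point = [1.0, 2.0]
g = gradient(loss, point)   # analytic gradient: [2xy, x^2 + 6y] = [4, 13]
J = jacobian(field, point)  # analytic Jacobian: [[y, x], [1, 2y]] = [[2, 1], [1, 4]]
H = hessian(loss, point)    # analytic Hessian: [[2y, 2x], [2x, 6]] = [[4, 2], [2, 6]]
```

The Hessian is computed here as the Jacobian of the gradient, which mirrors the definitions: second derivatives are first derivatives of first derivatives. Nested finite differences lose precision quickly, so the results should only be expected to match the analytic values to a few decimal places.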