Results for "performance"
Latency: Time from request to response; critical for real-time inference and user experience.
Throughput: How many requests or tokens can be processed per unit time; affects scalability and cost.
Parameter-efficient fine-tuning (PEFT): Techniques that fine-tune small additional components rather than all weights, reducing compute and storage.
Quantization: Reducing the numeric precision of weights and activations to speed up inference and reduce memory, with acceptable accuracy loss.
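The quantization entry above can be sketched in a few lines. A hedged NumPy example of symmetric per-tensor int8 quantization; the function names are illustrative, not from any particular library:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127]."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.01, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Round-trip error is bounded by half the quantization step (scale / 2).
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

The accuracy loss mentioned in the definition is exactly this rounding error, which grows with the dynamic range of the tensor.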
Agent memory: Mechanisms for retaining context across turns and sessions, such as scratchpads, vector memories, and structured stores.
Pruning: Removing weights or neurons to shrink models and improve efficiency; can be structured or unstructured.
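A minimal sketch of the unstructured variant, magnitude pruning, assuming NumPy; `magnitude_prune` is a hypothetical helper, not a library call:

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Unstructured magnitude pruning: zero the smallest-|w| fraction of weights."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # Threshold at the k-th smallest magnitude; keep only strictly larger weights.
    threshold = np.sort(np.abs(w).ravel())[k - 1]
    mask = np.abs(w) > threshold
    return w * mask

w = np.array([[0.9, -0.05], [0.01, -0.8]])
pruned = magnitude_prune(w, 0.5)
# The two smallest-magnitude weights are zeroed; the larger two survive.
```

Structured pruning instead removes whole rows, columns, or heads, which trades some accuracy for speedups on dense hardware.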
PAC learnability: A concept class is PAC-learnable if a learner can, with high probability, output an approximately correct hypothesis from a finite number of samples.
Backdoor attack: Hidden behavior activated by specific triggers, causing targeted mispredictions or undesired outputs.
Information gain: Reduction in uncertainty achieved by observing a variable; used in decision trees and active learning.
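Information gain can be computed directly from label counts. A small NumPy sketch, with illustrative helper names, contrasting a perfectly informative split against a useless one:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, split_mask):
    """Entropy reduction from splitting `labels` by a boolean feature."""
    n = len(labels)
    left, right = labels[split_mask], labels[~split_mask]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

y = np.array([0, 0, 1, 1])
perfect = np.array([True, True, False, False])   # separates the classes exactly
useless = np.array([True, False, True, False])   # uncorrelated with y
assert abs(information_gain(y, perfect) - 1.0) < 1e-12  # full bit gained
assert abs(information_gain(y, useless)) < 1e-12        # nothing gained
```

Decision-tree learners such as ID3 pick the split with the highest gain at each node.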
Natural language processing (NLP): AI subfield concerned with understanding and generating human language, including syntax, semantics, and pragmatics.
Kullback-Leibler (KL) divergence: Measures how one probability distribution diverges from a reference distribution; asymmetric, so not a true distance metric.
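A hedged NumPy sketch of the discrete KL divergence, which also demonstrates its asymmetry:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0  # terms with p(x) = 0 contribute nothing
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = [0.5, 0.5]
q = [0.9, 0.1]
assert abs(kl_divergence(p, p)) < 1e-12          # zero iff distributions match
assert kl_divergence(p, q) != kl_divergence(q, p)  # asymmetric in general
```

This asymmetry is why the direction of the divergence matters in practice, e.g. forward vs. reverse KL in variational inference.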
Speech recognition (ASR): Converting audio speech into text, often using encoder-decoder or transducer architectures.
Mutual information: Quantifies the information shared between random variables.
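For discrete variables, mutual information follows directly from the joint probability table. A small NumPy sketch, assuming the table sums to 1:

```python
import numpy as np

def mutual_information(joint):
    """I(X; Y) in nats, computed from a joint probability table p(x, y)."""
    joint = np.asarray(joint, float)
    px = joint.sum(axis=1, keepdims=True)  # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)  # marginal p(y)
    mask = joint > 0
    return np.sum(joint[mask] * np.log(joint[mask] / (px * py)[mask]))

independent = np.array([[0.25, 0.25], [0.25, 0.25]])  # X and Y independent
copied = np.array([[0.5, 0.0], [0.0, 0.5]])           # Y is a copy of X
assert abs(mutual_information(independent)) < 1e-12
assert abs(mutual_information(copied) - np.log(2)) < 1e-12  # one full bit
```

Independence gives zero shared information; a perfect copy of a fair bit gives ln 2 nats (one bit).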
Computational learning theory: A theoretical framework analyzing which classes of functions can be learned, how efficiently, and with what guarantees.
Sharp minimum: A narrow minimum of the loss landscape, often associated with poorer generalization.
Flat minimum: A wide basin of the loss landscape, often correlated with better generalization.
Learning rate schedule: Adjusting the learning rate over training to improve convergence.
Learning rate warmup: Gradually increasing the learning rate at the start of training to avoid divergence.
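The two entries above are usually combined in practice. A sketch of linear warmup followed by cosine decay; all hyperparameter values here are illustrative:

```python
import math

def lr_schedule(step, base_lr=3e-4, warmup_steps=100, total_steps=1000):
    """Linear warmup to base_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine anneal over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# The rate climbs during warmup, peaks at the end of warmup, then decays.
assert lr_schedule(0) < lr_schedule(50) < lr_schedule(99)
assert lr_schedule(99) == 3e-4
assert lr_schedule(999) < 1e-5
```

Warmup avoids divergence from large early updates; the decay phase helps settle into a good minimum.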
Bottleneck layer: A narrow hidden layer that forces compact representations.
Depth vs. width: Tradeoffs between many layers and many neurons per layer.
Multi-head attention: Lets the model attend to information from different representation subspaces simultaneously.
Rotary position embedding (RoPE): Encodes positional information via rotations in embedding space.
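A minimal NumPy sketch of the rotation idea: consecutive dimension pairs are rotated by position-dependent angles, which leaves vector norms unchanged. The frequency scheme follows the common inverse-power convention; treat the details as illustrative:

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Rotate consecutive dimension pairs of x by position-dependent angles."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)      # one frequency per pair
    angles = positions[:, None] * freqs[None, :]   # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin           # 2-D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

x = np.random.default_rng(0).normal(size=(4, 8))
y = rope(x, np.arange(4.0))
# Rotations are norm-preserving, so token magnitudes are untouched.
assert np.allclose(np.linalg.norm(y, axis=-1), np.linalg.norm(x, axis=-1))
```

Because the rotation angle grows linearly with position, the inner product of two rotated vectors depends only on their positional offset, which is what makes the resulting attention scores relative.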
Efficient attention: Techniques for handling longer documents without quadratic cost.
Mixture of experts (MoE): Routes inputs to subsets of parameters for scalable capacity.
Memory-augmented agents: Extending agents with long-term memory stores.
Gating network: Chooses which experts process each token.
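The per-token selection step can be sketched with plain NumPy; `top_k_route` is a hypothetical helper showing top-k selection followed by a softmax over the chosen experts:

```python
import numpy as np

def top_k_route(logits, k=2):
    """Pick the k highest-scoring experts per token; softmax their logits."""
    idx = np.argsort(logits, axis=-1)[:, -k:]        # (tokens, k) expert ids
    chosen = np.take_along_axis(logits, idx, axis=-1)
    # Numerically stable softmax restricted to the chosen experts.
    weights = np.exp(chosen - chosen.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return idx, weights

logits = np.array([[0.1, 2.0, -1.0, 0.5]])  # one token, four experts
idx, w = top_k_route(logits, k=2)
assert set(idx[0]) == {1, 3}          # experts 1 and 3 score highest
assert abs(w.sum() - 1.0) < 1e-12     # mixture weights sum to 1
```

Only the selected experts run on that token, which is how MoE layers scale parameter count without a proportional compute increase.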
Multi-agent systems: Multiple agents interacting cooperatively or competitively.
Self-reflection: Models evaluating and improving their own outputs.
Model risk management: A framework for identifying, measuring, and mitigating model risks.
Model registry: Central catalog of deployed and experimental models.