Results for "compute-data-performance"
Learning a function from input-output pairs (labeled data), with the goal of accurately predicting outputs for unseen inputs.
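A minimal sketch of this setup, assuming a toy one-dimensional regression problem fit with NumPy's least squares (the data and the linear model are made-up illustrations):

```python
import numpy as np

# Toy labeled data: inputs x, outputs y = 2x + 1 (noise-free, for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Learn a linear function from the input-output pairs via least squares
slope, intercept = np.polyfit(x, y, deg=1)

# Predict the output for an unseen input
y_unseen = slope * 10.0 + intercept
print(round(float(slope), 6), round(float(intercept), 6), round(float(y_unseen), 6))
```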
Reusing knowledge from a source task/domain to improve learning on a target task/domain, typically via pretrained models.
Techniques that discourage overly complex solutions to improve generalization (reduce overfitting).
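One common instance of this idea is an L2 (ridge) penalty added to the training loss; the sketch below, with made-up data and weights, shows how the penalty raises the loss of a large-weight solution:

```python
import numpy as np

def ridge_loss(w, X, y, lam):
    """Squared-error loss plus an L2 penalty that discourages large weights."""
    residual = X @ w - y
    return np.mean(residual ** 2) + lam * np.sum(w ** 2)

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 1.0])
w_large = np.array([5.0, -4.0])   # an overly complex (large-weight) solution
w_small = np.array([1.0, 1.0])    # a simple solution that fits the data

# The penalty term makes the large-weight solution strictly worse
print(ridge_loss(w_large, X, y, lam=0.1) > ridge_loss(w_small, X, y, lam=0.1))
```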
Networks using convolution operations with weight sharing and locality, effective for images and signals.
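Weight sharing and locality can be made concrete with a naive 2-D convolution loop (strictly, cross-correlation, as in most DL frameworks); the edge-detector kernel and image below are toy examples:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation: the SAME kernel weights are reused
    (weight sharing) over each local patch (locality)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector applied to an image with a step edge
image = np.array([[0, 0, 1, 1]] * 3, dtype=float)
kernel = np.array([[-1.0, 1.0]])
edges = conv2d(image, kernel)
print(edges[0])  # the edge lights up at the step: [0. 1. 0.]
```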
Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
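A greedy longest-match-first segmenter (a much-simplified WordPiece-style scheme) illustrates how subword units cover text; the vocabulary here is a made-up toy, not a real tokenizer's:

```python
def subword_tokenize(word, vocab):
    """Greedy longest-match-first subword segmentation (toy sketch)."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append("<unk>")  # no known subword covers this position
            i += 1
    return tokens

vocab = {"token", "iz", "ation", "er", "t"}
print(subword_tokenize("tokenization", vocab))  # -> ['token', 'iz', 'ation']
```

A small vocabulary of reusable pieces ("iz", "ation", "er") covers many words it has never seen whole, which is the vocabulary-size/coverage trade-off the entry describes.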
Central system to store model versions, metadata, approvals, and deployment state.
A theoretical framework analyzing what classes of functions can be learned, how efficiently, and with what guarantees.
Techniques to handle longer documents without the quadratic (in sequence length) cost of full self-attention.
A routing (gating) network that chooses which experts process each token in a mixture-of-experts model.
The execution path a model request follows in production.
Central catalog of deployed and experimental models.
Maintaining a model's alignment with intended behavior as data and deployment conditions change.
Centralized AI expertise group.
High-fidelity virtual model of a physical system.
Predicting case success probabilities.
A conceptual framework describing error as the sum of systematic error (bias) and sensitivity to data (variance).
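The decomposition can be checked numerically: for estimates computed over many simulated datasets, mean squared error equals squared bias plus (population) variance exactly. The shrinkage estimator below is a made-up example chosen so the bias term is nonzero:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0          # true parameter
sigma = 1.0          # noise level
n, trials = 5, 2000

# A deliberately biased shrinkage estimator: 0.5 * sample mean
estimates = np.array([0.5 * rng.normal(theta, sigma, n).mean()
                      for _ in range(trials)])

mse = np.mean((estimates - theta) ** 2)
bias_sq = (estimates.mean() - theta) ** 2
variance = estimates.var()           # population variance over trials
# MSE = bias^2 + variance, as an algebraic identity over the same samples
print(abs(mse - (bias_sq + variance)) < 1e-9)
```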
Protecting data during network transfer and while stored; essential for ML pipelines handling sensitive data.
Training one model on multiple tasks simultaneously to improve generalization through shared structure.
Configuration choices not learned directly (or not typically learned) that govern training or architecture.
A scalar measure optimized during training, typically expected loss over data, sometimes with regularization terms.
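A sketch of such a scalar objective, assuming logistic loss with labels in {-1, +1} and an optional L2 term (the data here is a made-up two-point example):

```python
import numpy as np

def objective(w, X, y, lam=0.0):
    """Scalar training objective: mean logistic loss over the data,
    optionally plus an L2 regularization term."""
    logits = X @ w
    per_example = np.log1p(np.exp(-y * logits))   # y in {-1, +1}
    return per_example.mean() + lam * np.dot(w, w)

X = np.array([[1.0], [-1.0]])
y = np.array([1.0, -1.0])
print(objective(np.array([0.0]), X, y))  # log(2) at w = 0
```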
Nonlinear functions enabling networks to approximate complex mappings; ReLU variants dominate modern DL.
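Two of the dominant variants in a few lines of NumPy (the 0.01 leak slope is a common default, not a fixed standard):

```python
import numpy as np

def relu(x):
    """Rectified linear unit: passes positives, zeroes negatives."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but lets a small gradient through for negative inputs."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))        # [0. 0. 3.]
print(leaky_relu(x))
```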
Randomly zeroing activations during training to reduce co-adaptation and overfitting.
Mechanism that computes context-aware mixtures of representations; scales well and captures long-range dependencies.
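The mixture idea can be shown with a minimal NumPy sketch of scaled dot-product attention (single head, no masking; the shapes below are arbitrary demo choices):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each output row is a convex,
    context-dependent mixture of the value vectors."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))   # 4 query positions, dim 8
K = rng.standard_normal((6, 8))   # 6 key/value positions
V = rng.standard_normal((6, 8))
out, weights = attention(Q, K, V)
print(out.shape, weights.shape)   # (4, 8) (4, 6)
```

Every query can attend to every key regardless of distance, which is why the mechanism captures long-range dependencies.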
Performing a task by providing a small number of examples inside the prompt, without any weight updates.
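A sketch of how such a prompt is typically assembled; the "Input:/Output:" format is one common convention, not a fixed standard:

```python
def few_shot_prompt(examples, query):
    """Build an in-context prompt: a few demonstrations, then the query."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

prompt = few_shot_prompt([("2 + 2", "4"), ("3 + 5", "8")], "7 + 1")
print(prompt)
```

The model is expected to continue the pattern after the final "Output:"; no parameters change.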
Fine-tuning on (prompt, response) pairs to align a model with instruction-following behaviors.
Ordering training samples from easier to harder to improve convergence or generalization.
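In its simplest form this is a sort by a difficulty score; the sketch below uses sentence length as a stand-in for difficulty (a made-up proxy for illustration):

```python
def curriculum_order(samples, difficulty):
    """Order training samples from easiest to hardest by a scoring function."""
    return sorted(samples, key=difficulty)

samples = ["a long and fairly hard sentence", "short one", "hi"]
ordered = curriculum_order(samples, difficulty=len)
print(ordered)  # -> ['hi', 'short one', 'a long and fairly hard sentence']
```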
Time from request to response; critical for real-time inference and UX.
How many requests or tokens can be processed per unit time; affects scalability and cost.
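The two preceding metrics can be measured together with a simple timing harness; the sleeping stand-in "model" below is a made-up placeholder:

```python
import time

def measure(fn, requests):
    """Per-request latency and overall throughput for a batch of requests."""
    latencies = []
    start = time.perf_counter()
    for req in requests:
        t0 = time.perf_counter()
        fn(req)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    throughput = len(latencies) / elapsed   # requests per second
    return latencies, throughput

# A stand-in "model" that just sleeps briefly
latencies, throughput = measure(lambda r: time.sleep(0.001), range(20))
print(f"p50 latency: {sorted(latencies)[len(latencies) // 2] * 1e3:.2f} ms")
print(f"throughput: {throughput:.0f} req/s")
```

Note the tension the two entries describe: batching requests usually raises throughput at the cost of per-request latency.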
Mechanisms for retaining context across turns/sessions: scratchpads, vector memories, structured stores.
A hypothesis class is PAC-learnable if a learner can, with high probability, output an approximately correct hypothesis from a finite number of samples.