Results for "model-based"
A change over time in the relationship between inputs and outputs, requiring monitoring and model updates.
A measurable property or attribute used as model input (raw or engineered), such as age, pixel intensity, or token ID.
The learned numeric values of a model adjusted during training to minimize a loss function.
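A minimal sketch of this idea (toy data and learning rate are invented): two parameters of a 1-D linear model are adjusted by gradient descent to minimize a mean-squared-error loss.

```python
# Fit parameters w, b of y = w*x + b by gradient descent on MSE.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # toy points on y = 2x + 1

w, b = 0.0, 0.0   # parameters, adjusted during training
lr = 0.1          # learning rate (hypothetical choice)
for _ in range(500):
    # Gradients of L = mean((w*x + b - y)^2) with respect to w and b.
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    gb = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w -= lr * gw
    b -= lr * gb

print(round(w, 2), round(b, 2))  # converges toward w ≈ 2, b ≈ 1
```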
When a model fits noise/idiosyncrasies of training data and performs poorly on unseen data.
When a model cannot capture underlying structure, performing poorly on both training and test data.
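The two failure modes above can be contrasted with an intentionally silly pair of models (all data invented): a lookup table that memorizes training targets, and a constant predictor that ignores the input.

```python
# Data roughly on y = 2x with small noise.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0)]
test  = [(5, 10.1), (6, 11.8)]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Overfit: memorize training targets exactly; fall back to 0 on unseen inputs.
table = dict(train)
overfit = lambda x: table.get(x, 0.0)

# Underfit: always predict the training mean, ignoring x entirely.
mean_y = sum(y for _, y in train) / len(train)
underfit = lambda x: mean_y

print(mse(overfit, train), mse(overfit, test))    # zero train error, huge test error
print(mse(underfit, train), mse(underfit, test))  # poor on both
```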
How well a model performs on new data drawn from the same (or similar) distribution as training.
The set of tokens a model can represent; impacts efficiency, multilinguality, and handling of rare strings.
Training objective where the model predicts the next token given previous tokens (causal modeling).
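The objective can be sketched in a few lines (vocabulary, logits, and target token are invented): the model's scores are normalized with a softmax, and the loss is the negative log-probability of the true next token.

```python
import math

vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 0.5, 1.0, -1.0]  # model scores for the next token
target = "the"                  # actual next token in the training text

# Softmax over logits, then negative log-likelihood of the target token.
exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]
loss = -math.log(probs[vocab.index(target)])
print(f"{loss:.3f}")  # lower loss means the target got higher probability
```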
Maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
A high-priority instruction layer setting overarching behavior constraints for a chat model.
Model-generated content that is fluent but unsupported by evidence or incorrect; mitigated by grounding and verification.
Updating a pretrained model’s weights on task-specific data to improve performance or adapt style/behavior.
Fine-tuning on (prompt, response) pairs to align a model with instruction-following behaviors.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
Systematic differences in model outcomes across groups; arises from data, labels, and deployment context.
Techniques to understand model decisions (global or local), important in high-stakes and regulated settings.
Local surrogate explanation method approximating model behavior near a specific input.
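The local-surrogate idea can be illustrated with an invented 1-D "black box" f(x) = x²: sample perturbations near the input of interest, weight them by proximity, and fit a weighted linear model whose slope explains local behavior. This is a sketch of the general technique, not any particular library's implementation.

```python
import math, random

f = lambda x: x * x   # the black box being explained (assumption)
x0 = 3.0              # the input whose prediction we explain
random.seed(0)

xs = [x0 + random.gauss(0, 0.5) for _ in range(200)]
ys = [f(x) for x in xs]
ws = [math.exp(-((x - x0) ** 2) / 0.5) for x in xs]  # proximity kernel

# Closed-form weighted least squares for the surrogate's slope.
sw = sum(ws)
mx = sum(w * x for w, x in zip(ws, xs)) / sw
my = sum(w * y for w, y in zip(ws, ys)) / sw
slope = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys)) \
      / sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
print(round(slope, 1))  # close to the local derivative f'(3) = 6
```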
Systematic review of model/data processes to ensure performance, fairness, security, and policy compliance.
Observing model inputs/outputs, latency, cost, and quality over time to catch regressions and drift.
Training a smaller “student” model to mimic a larger “teacher,” often improving efficiency while retaining performance.
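A minimal sketch of the distillation loss (logits and temperature invented): the student is pushed toward the teacher's temperature-softened output distribution via a KL divergence.

```python
import math

def softmax(logits, T):
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.0, 1.5, 0.5]
T = 2.0  # temperature: higher T softens both distributions

p = softmax(teacher_logits, T)  # soft targets from the teacher
q = softmax(student_logits, T)  # student predictions
kd_loss = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))  # KL(p || q)
print(f"{kd_loss:.4f}")  # minimizing this pulls the student toward the teacher
```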
Raw model output scores before conversion to probabilities; adjusted during decoding (e.g., temperature scaling) and calibration.
System for running consistent evaluations across tasks, versions, prompts, and model settings.
Inputs crafted to cause model errors or unsafe behavior, often imperceptible in vision or subtle in text.
Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.
Methods to protect the model and data during inference from operators or attackers, e.g., trusted execution environments.
System design where humans validate or guide model outputs, especially for high-stakes decisions.
Constraining model outputs into a schema used to call external APIs/tools safely and deterministically.
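A hedged sketch of the validation step (schema, tool name, and model output are all invented): parse the model's JSON, check it against an expected schema, and only then dispatch the tool call.

```python
import json

SCHEMA = {"tool": str, "city": str, "days": int}  # expected fields and types

def parse_tool_call(raw: str) -> dict:
    """Parse model output; reject anything not matching the schema."""
    call = json.loads(raw)  # raises on malformed JSON
    for field, ftype in SCHEMA.items():
        if not isinstance(call.get(field), ftype):
            raise ValueError(f"bad or missing field: {field}")
    return call

raw_output = '{"tool": "get_forecast", "city": "Oslo", "days": 3}'
call = parse_tool_call(raw_output)
print(call["tool"], call["city"], call["days"])  # now safe to dispatch
```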
A hypothesis class is PAC-learnable if, with high probability, a learner can output an approximately correct hypothesis from a finite sample.
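The PAC guarantee is often stated as a sample-complexity bound; a standard form for a finite hypothesis class H with a consistent learner is:

```latex
% With probability at least 1 - \delta, a consistent learner's hypothesis
% has true error at most \epsilon once the sample size m satisfies:
m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
```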
Measures a model’s ability to fit random noise; used to bound generalization error.
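The "fit random noise" intuition has a standard formal statement: the empirical Rademacher complexity of a function class F on a sample S = (x_1, ..., x_m), with i.i.d. uniform random signs σ_i ∈ {−1, +1}, is

```latex
\hat{\mathfrak{R}}_S(F) = \mathbb{E}_{\sigma}\left[\sup_{f \in F} \frac{1}{m}\sum_{i=1}^{m} \sigma_i f(x_i)\right]
```

i.e., the expected best correlation any f ∈ F can achieve with random labels.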