Results for "supervised finetune"
Supervised learning: Learning a function from input-output pairs (labeled data), optimizing performance on predicting outputs for unseen inputs.
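A minimal sketch of the idea above: fit a line to labeled (x, y) pairs by gradient descent, then predict an unseen input. The data and hyperparameters are illustrative, not from any particular source.

```python
# Labeled pairs generated from y = 2x + 1.
pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = 0.0, 0.0   # parameters of the model y ≈ w*x + b
lr = 0.05         # learning rate (arbitrary choice)

for _ in range(2000):
    grad_w = grad_b = 0.0
    for x, y in pairs:
        err = (w * x + b) - y              # prediction error on one example
        grad_w += 2 * err * x / len(pairs)
        grad_b += 2 * err / len(pairs)
    w -= lr * grad_w                       # gradient step on squared error
    b -= lr * grad_b

print(round(w * 4.0 + b, 2))  # prediction for unseen input x = 4 → 9.0
```

Because the loss is convex and the data is exactly linear, the parameters converge to w ≈ 2, b ≈ 1.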
Semi-supervised learning: Training with a small labeled dataset plus a larger unlabeled dataset, leveraging assumptions like smoothness/cluster structure.
Self-supervised learning: Learning from data by constructing “pseudo-labels” (e.g., next-token prediction, masked modeling) without manual annotation.
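The next-token-prediction case can be sketched in a few lines: each training target is simply the token that follows, so no human annotation is required.

```python
# Build (context, next-token) pairs from a raw token sequence.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
# e.g. (["the"], "cat"), (["the", "cat"], "sat"), ...
```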
Data labeling: The human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.
Machine learning: A subfield of AI where models learn patterns from data to make predictions or decisions, improving with experience rather than explicit rule-coding.
Unsupervised learning: Learning structure from unlabeled data, such as discovering groups, compressing representations, or modeling data distributions.
Reinforcement learning: A learning paradigm where an agent interacts with an environment and learns to choose actions that maximize cumulative reward.
Dataset: A structured collection of examples used to train/evaluate models; quality, bias, and coverage often dominate outcomes.
Loss function: A function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.
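Two standard examples of such functions, written out plainly (regression and classification cases):

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average squared prediction error for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(p_true_class):
    """Negative log-likelihood of the correct class; 0 when the model
    assigns probability 1 to the right answer, larger when less sure."""
    return -math.log(p_true_class)

print(mse([1.0, 2.0], [1.0, 3.0]))   # 0.5
print(cross_entropy(1.0))            # 0.0 (perfectly confident and correct)
```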
Confusion matrix: A table summarizing classification outcomes, foundational for metrics like precision, recall, specificity.
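A sketch of the binary case, with the metrics derived from the four cells (labels here are made up):

```python
def confusion(y_true, y_pred):
    """Return (TP, FP, FN, TN) counts for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
tp, fp, fn, tn = confusion(y_true, y_pred)

precision   = tp / (tp + fp)   # 2/3: of predicted positives, how many were real
recall      = tp / (tp + fn)   # 2/3: of real positives, how many were found
specificity = tn / (tn + fp)   # 1/2: of real negatives, how many were found
```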
Supervised fine-tuning (SFT): Fine-tuning on (prompt, response) pairs to align a model with instruction-following behaviors.
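One common (assumed) convention for turning a (prompt, response) pair into a training example: concatenate the token IDs and mask the prompt positions so the loss is computed only on the response. The token IDs and the -100 sentinel here are illustrative.

```python
IGNORE = -100  # conventional "ignore this position in the loss" sentinel

def build_sft_example(prompt_ids, response_ids):
    """Concatenate prompt and response; mask prompt positions in the labels."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE] * len(prompt_ids) + response_ids
    return input_ids, labels

inp, lab = build_sft_example([5, 8, 2], [7, 9])
# inp = [5, 8, 2, 7, 9]
# lab = [-100, -100, -100, 7, 9]  → loss only on the response tokens
```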
Alignment: Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
Safety filtering: Automated detection/prevention of disallowed outputs (toxicity, self-harm, illegal instructions, etc.).
Inter-annotator agreement: A measure of consistency across labelers; low agreement indicates ambiguous tasks or poor guidelines.
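A common such measure is Cohen's kappa, which corrects raw agreement for agreement expected by chance. A small sketch for two annotators with binary labels (the label lists are made up):

```python
def cohens_kappa(a, b):
    """kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n       # observed agreement
    p1 = (sum(a) / n) * (sum(b) / n)                 # both say 1 by chance
    p0 = (1 - sum(a) / n) * (1 - sum(b) / n)         # both say 0 by chance
    pe = p1 + p0
    return (po - pe) / (1 - pe)

print(cohens_kappa([1, 1, 0, 0], [1, 1, 0, 0]))  # perfect agreement → 1.0
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # partial agreement → 0.5
```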
Active learning: Selecting the most informative samples to label (e.g., uncertainty sampling) to reduce labeling cost.
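Uncertainty sampling in miniature: ask the model for probabilities on the unlabeled pool, then send the examples nearest 0.5 (where the model is least sure) to the labelers. The probabilities below are hypothetical.

```python
def most_uncertain(probs, k):
    """Indices of the k unlabeled examples closest to p = 0.5."""
    idx = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return idx[:k]

probs = [0.95, 0.48, 0.10, 0.55, 0.99]   # hypothetical model confidences
print(most_uncertain(probs, 2))          # [1, 3]
```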
Image classification: Assigning category labels to images.
Bias: Systematic error introduced by simplifying assumptions in a learning algorithm.
Self-critique: Asking a model to review and improve its own output.
Overgeneralization: Applying learned patterns to cases where they do not hold.
Model collapse: Quality degradation that occurs when a model is trained on its own generated outputs.
Dynamics model: Predicts the next state given the current state and action.
Imitation learning: Learning policies from expert demonstrations.
Behavioral cloning: Learning a state-to-action mapping directly from demonstrations, treated as supervised learning.
World model: A learned model of environment dynamics.
Inverse reinforcement learning: Inferring goals (a reward function) from observed behavior.
E-discovery: AI-assisted review of legal documents.
Fraud detection: Identifying suspicious transactions.
AlphaFold: A deep learning system for protein structure prediction.
Narrow AI: AI limited to specific domains.
Fine-tuning: Updating a pretrained model’s weights on task-specific data to improve performance or adapt style/behavior.