Learning a function from input-output pairs (labeled data), optimized to predict outputs accurately for unseen inputs.
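A minimal sketch of the supervised-learning idea above: fit a linear model to labeled pairs by ordinary least squares, then predict on an unseen input. Function names (`fit_line`) and data are illustrative, not from any library.

```python
# Illustrative sketch: learn y = w*x + b from labeled (x, y) pairs
# via closed-form least squares, then predict for an unseen input.

def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - w * mx
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]          # generated by y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))    # -> 2.0 1.0
print(round(w * 10 + b, 3))        # prediction for unseen x=10 -> 21.0
```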
Reusing knowledge from a source task/domain to improve learning on a target task/domain, typically via pretrained models.
Methods that learn training procedures or initializations so models can adapt quickly to new tasks with little data.
Techniques that discourage overly complex solutions to improve generalization (reduce overfitting).
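A minimal sketch of one such technique, L2 (ridge) regularization, on a 1-D linear model without intercept: the penalty term shrinks the weight toward zero, trading training fit for less complex solutions. The closed form and data are illustrative only.

```python
# Illustrative sketch of L2 (ridge) regularization in one dimension:
# minimizing sum((y - w*x)^2) + lam * w^2 has the closed-form solution below.

def ridge_weight(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                  # unpenalized fit would be w = 2
w_unreg = ridge_weight(xs, ys, 0.0)
w_reg = ridge_weight(xs, ys, 7.0)
print(w_unreg, w_reg)                 # penalized weight is smaller in magnitude
```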
One complete pass over the entire training dataset.
A parameterized function composed of interconnected units organized in layers with nonlinear activations.
Networks using convolution operations with weight sharing and locality, effective for images and signals.
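A minimal sketch of the weight sharing and locality described above: a single small filter slides across a 1-D signal, reusing the same weights at every position (valid "convolution" as used in deep learning, technically cross-correlation). Filter and signal are made up for illustration.

```python
# Illustrative 1-D convolution: one shared kernel applied at every position.

def conv1d(signal, kernel):
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

edge_detector = [-1.0, 1.0]             # responds to upward steps
signal = [0.0, 0.0, 1.0, 1.0, 0.0]
print(conv1d(signal, edge_detector))    # -> [0.0, 1.0, 0.0, -1.0]
```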
Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
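A minimal sketch of subword tokenization by greedy longest-match against a fixed vocabulary, similar in spirit to WordPiece inference. The vocabulary, the `<unk>` fallback, and the function name are assumptions for illustration.

```python
# Illustrative greedy longest-match subword tokenizer.

def tokenize(word, vocab):
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest substring first
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append("<unk>")          # no vocabulary piece matched
            i += 1
    return tokens

vocab = {"un", "break", "able", "b", "r"}
print(tokenize("unbreakable", vocab))       # -> ['un', 'break', 'able']
```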
Networks with recurrent connections for sequences; largely supplanted by Transformers for many tasks.
Letting an LLM call external functions/APIs to fetch data, compute, or take actions, improving reliability.
A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
A datastore optimized for similarity search over embeddings, enabling semantic retrieval at scale.
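A minimal sketch of what such a datastore does at its core: rank stored embeddings by cosine similarity to a query vector. Production systems use approximate indexes (e.g., HNSW, IVF) for scale; this brute-force version, with made-up document keys and vectors, is illustrative only.

```python
# Illustrative brute-force semantic retrieval by cosine similarity.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

store = {
    "doc_cat": [0.9, 0.1, 0.0],
    "doc_dog": [0.8, 0.2, 0.1],
    "doc_car": [0.0, 0.1, 0.9],
}

def search(query, store, k=1):
    ranked = sorted(store, key=lambda key: cosine(query, store[key]), reverse=True)
    return ranked[:k]

print(search([1.0, 0.0, 0.0], store))   # -> ['doc_cat']
```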
Model-generated content that is fluent but unsupported by evidence or incorrect; mitigated by grounding and verification.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
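A minimal sketch of the reward-model step described above: for a preferred and a rejected response, the Bradley-Terry loss `-log(sigmoid(r_chosen - r_rejected))` pushes the reward model to score the preferred response higher. Scalar rewards stand in for model outputs; everything here is illustrative.

```python
# Illustrative Bradley-Terry preference loss used to train reward models.
import math

def preference_loss(r_chosen, r_rejected):
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

well_ordered = preference_loss(2.0, -1.0)   # chosen scored higher: small loss
mis_ordered = preference_loss(-1.0, 2.0)    # chosen scored lower: large loss
print(well_ordered < mis_ordered)           # -> True
```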
Practices for operationalizing ML: versioning, CI/CD, monitoring, retraining, and reliable production management.
Central system to store model versions, metadata, approvals, and deployment state.
Ability to replicate results given the same code and data; harder with distributed training and nondeterministic ops.
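A minimal sketch of the simplest piece of this: fixing the RNG seed makes a stochastic procedure repeatable. Distributed training adds nondeterminism (op ordering, floating-point reduction order) that seeding alone does not remove. The function is illustrative.

```python
# Illustrative reproducibility via a seeded random generator.
import random

def noisy_sum(seed):
    rng = random.Random(seed)        # isolated, seeded RNG instance
    return sum(rng.random() for _ in range(5))

run1 = noisy_sum(42)
run2 = noisy_sum(42)
print(run1 == run2)                  # -> True: same seed, same result
```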
Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.
Observing model inputs/outputs, latency, cost, and quality over time to catch regressions and drift.
Models that process or generate multiple modalities, enabling vision-language tasks, speech, video understanding, etc.
A theoretical framework analyzing what classes of functions can be learned, how efficiently, and with what guarantees.
Updating beliefs about parameters using observed evidence and prior distributions.
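A minimal worked example of this updating with a conjugate pair: a Beta prior over a coin's heads probability, where observing data turns Beta(a, b) into Beta(a + heads, b + tails). The numbers are purely illustrative.

```python
# Illustrative Bayesian update: Beta prior + Binomial likelihood.

def beta_update(a, b, heads, tails):
    return a + heads, b + tails

def beta_mean(a, b):
    return a / (a + b)

a, b = 1.0, 1.0                          # Beta(1, 1): uniform prior
a, b = beta_update(a, b, heads=7, tails=3)
print(a, b)                              # posterior is Beta(8, 4)
print(beta_mean(a, b))                   # posterior mean 8/12 ~ 0.667
```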
Bayesian parameter estimation using the mode of the posterior distribution.
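A minimal sketch of MAP estimation for a Beta posterior: for Beta(a, b) with a, b > 1, the posterior mode is (a-1)/(a+b-2), which differs from the posterior mean a/(a+b). The Beta(8, 4) posterior here (e.g., a uniform prior plus 7 heads and 3 tails) is an assumed example.

```python
# Illustrative MAP estimate: the mode of a Beta(a, b) posterior.

def beta_mode(a, b):
    assert a > 1 and b > 1, "mode formula needs an interior maximum"
    return (a - 1) / (a + b - 2)

a, b = 8.0, 4.0
print(beta_mode(a, b))      # MAP estimate -> 0.7
print(a / (a + b))          # posterior mean ~ 0.667, a different summary
```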
Using the same parameters across different parts of a model.
A single attention mechanism within multi-head attention.
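A minimal sketch of what one such head computes: scaled dot-product attention over a tiny sequence, with queries, keys, and values given directly (a real head would produce them via learned projections). All values are made up for illustration.

```python
# Illustrative scaled dot-product attention for a single head.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0], [20.0]]
Q = [[5.0, 0.0]]                     # strongly matches the first key
result = attention(Q, K, V)
print(result)                        # output pulled toward V[0] = [10.0]
```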
Techniques (e.g., sparse, sliding-window, or linear attention) for handling longer documents without quadratic attention cost.
A gating network that chooses which experts process each token in a mixture-of-experts layer.
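A minimal sketch of this routing: per-token router logits are turned into a top-k choice of experts, and the token is dispatched only to those experts. The expert functions and logits are made up for illustration.

```python
# Illustrative top-k expert routing for a mixture-of-experts layer.

def top_k_route(logits, k=2):
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return ranked[:k]

experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
token = 10.0
router_logits = [0.1, 2.5, -1.0]            # expert 1 strongly preferred
chosen = top_k_route(router_logits, k=2)
print(chosen)                               # -> [1, 0]
outputs = [experts[i](token) for i in chosen]
print(outputs)                              # -> [20.0, 11.0]
```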
Capabilities that appear only beyond certain model sizes.
Central catalog of deployed and experimental models.
Probabilistic energy-based neural network with hidden variables.