Results for "tokens set"
Causal model: models the effects of interventions, written do(X = x).
KV cache: stores past attention key/value states to speed up autoregressive decoding.
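A minimal NumPy sketch of the caching idea, assuming a single toy attention head; the projection matrices, dimensions, and the use of the raw input as the query are all illustrative simplifications, not a real implementation:

```python
# Minimal sketch of a KV cache: keys/values for past tokens are computed
# once, appended to a cache, and reused at every later decoding step
# instead of being recomputed.
import numpy as np

rng = np.random.default_rng(0)
d = 4                                   # toy head dimension
W_k, W_v = rng.normal(size=(d, d)), rng.normal(size=(d, d))

cache_k, cache_v = [], []               # grows by one entry per token

def decode_step(x):
    """Append this token's key/value to the cache, then attend over all
    cached keys/values (softmax over dot-product scores)."""
    cache_k.append(x @ W_k)
    cache_v.append(x @ W_v)
    K, V = np.stack(cache_k), np.stack(cache_v)
    scores = K @ x                      # (t,) attention logits
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                  # attention output for this step

for t in range(3):                      # simulate 3 decoding steps
    out = decode_step(rng.normal(size=d))
print(len(cache_k))                     # cache holds one entry per token
```

Each step does O(t) work against the cache rather than reprocessing the whole prefix through the projection matrices.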
Transformer: an architecture based on self-attention and feedforward layers; the foundation of modern LLMs and many multimodal models.
Vision Transformer (ViT): a Transformer applied to image patches.
Inference cost: the cost of running models in production.
Meta-learning: methods that learn training procedures or initializations so models can adapt quickly to new tasks with little data.
Feature: a measurable property or attribute used as model input (raw or engineered), such as age, pixel intensity, or token ID.
Embedding: a continuous vector encoding of an item (word, image, user) such that semantic similarity corresponds to geometric closeness.
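A toy illustration of "semantic similarity as geometric closeness" using cosine similarity; the 3-dimensional vectors are made up by hand (real embeddings have hundreds of dimensions):

```python
# Cosine similarity between embedding vectors as a proxy for semantic
# similarity: related items should score higher than unrelated ones.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

emb = {  # hypothetical hand-made 3-d embeddings for illustration
    "cat": np.array([0.9, 0.8, 0.1]),
    "dog": np.array([0.8, 0.9, 0.2]),
    "car": np.array([0.1, 0.2, 0.9]),
}
# "cat" is geometrically closer to "dog" than to "car":
assert cosine(emb["cat"], emb["dog"]) > cosine(emb["cat"], emb["car"])
```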
Overfitting: when a model fits the noise or idiosyncrasies of its training data and performs poorly on unseen data.
Generalization: how well a model performs on new data drawn from the same (or a similar) distribution as its training data.
Momentum: uses an exponential moving average of gradients to speed convergence and reduce oscillation.
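A sketch of the update rule on a one-dimensional quadratic; the learning rate and momentum coefficient are typical illustrative values:

```python
# Gradient descent with momentum on f(x) = x^2: the velocity v is an
# exponential moving average of past gradients.
def grad(x):
    return 2 * x          # f'(x) for f(x) = x^2

x, v = 5.0, 0.0
lr, beta = 0.1, 0.9       # illustrative hyperparameters
for _ in range(100):
    v = beta * v + (1 - beta) * grad(x)
    x = x - lr * v
print(x)                  # converges toward the minimum at 0
```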
Adam: a popular optimizer combining momentum with per-parameter adaptive step sizes via first- and second-moment estimates.
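A one-variable sketch of the Adam update, again on f(x) = x^2, using the commonly cited default betas; the learning rate and step count are illustrative:

```python
# Adam sketch: first moment m (EMA of gradients) and second moment s
# (EMA of squared gradients) yield a per-parameter adaptive step size.
import math

def adam_step(x, m, s, t, grad, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad
    s = b2 * s + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)           # bias correction for zero init
    s_hat = s / (1 - b2 ** t)
    x = x - lr * m_hat / (math.sqrt(s_hat) + eps)
    return x, m, s

x, m, s = 5.0, 0.0, 0.0
for t in range(1, 501):
    x, m, s = adam_step(x, m, s, t, grad=2 * x)   # minimizing x^2
print(x)                                # approaches the minimum at 0
```

Note how the step size is roughly lr early on (the ratio m_hat / sqrt(s_hat) is near ±1), then shrinks as gradients vanish.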
Early stopping: halting training when validation performance stops improving, to reduce overfitting.
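A sketch of the patience-based variant; the validation-loss curve here is synthetic, chosen so the loss improves and then rises as overfitting sets in:

```python
# Early stopping: stop when validation loss has not improved for
# `patience` consecutive epochs, and keep the best epoch seen.
def train_with_early_stopping(val_losses, patience=2):
    best, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break                   # stop training early
    return best_epoch, best

# Loss improves until epoch 2, then rises (overfitting):
epoch, loss = train_with_early_stopping([1.0, 0.7, 0.5, 0.6, 0.8, 0.9])
print(epoch, loss)  # best model was at epoch 2 with loss 0.5
```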
Weight initialization: methods for setting starting weights so that signal and gradient scales are preserved across layers.
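A sketch of Xavier (Glorot) normal initialization, which draws weights with variance 2 / (fan_in + fan_out); the layer sizes and depth are illustrative:

```python
# Xavier initialization keeps activation variance roughly constant as a
# signal passes through stacked linear layers (no explosion or vanishing).
import numpy as np

rng = np.random.default_rng(0)

def xavier(fan_in, fan_out):
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

x = rng.normal(size=(1000, 256))        # batch of unit-variance inputs
for _ in range(5):                      # pass through 5 linear layers
    x = x @ xavier(256, 256)
print(x.var())                          # stays the same order as 1.0
```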
Convolutional neural network (CNN): a network built from convolution operations with weight sharing and locality, effective for images and other signals.
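A 1-D sketch of the weight sharing and locality at the heart of convolution; the signal and kernel values are made up (and, as in most deep-learning libraries, this is really sliding cross-correlation):

```python
# One small kernel (shared weights) slides over the signal, so every
# position is processed by the same few parameters.
import numpy as np

def conv1d(signal, kernel):
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel
                     for i in range(len(signal) - k + 1)])

signal = np.array([0., 0., 1., 1., 1., 0., 0.])
edge_kernel = np.array([1., -1.])       # responds only at edges
print(conv1d(signal, edge_kernel))      # nonzero exactly at the edges
```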
Few-shot prompting: achieving task performance by providing a small number of examples inside the prompt, without weight updates.
Grounding: constraining outputs to retrieved or provided sources, often with citations, to improve factual reliability.
MLOps: practices for operationalizing ML: versioning, CI/CD, monitoring, retraining, and reliable production management.
Parameter-efficient fine-tuning (PEFT): fine-tuning small additional components rather than all weights, to reduce compute and storage.
Pruning: removing weights or neurons to shrink models and improve efficiency; can be structured or unstructured.
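A sketch of the unstructured variant, magnitude pruning, on a hand-made weight matrix; structured pruning would instead drop whole rows or neurons:

```python
# Unstructured magnitude pruning: zero out the fraction of weights with
# the smallest absolute values.
import numpy as np

def prune_by_magnitude(w, sparsity):
    """Zero the `sparsity` fraction of entries with smallest |w|."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

w = np.array([[0.01, -0.8, 0.02],
              [0.9, -0.03, 0.5]])
pruned = prune_by_magnitude(w, sparsity=0.5)
print(pruned)   # the three smallest-magnitude weights are now zero
```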
Evaluation harness: a system for running consistent evaluations across tasks, versions, prompts, and model settings.
Model extraction: reconstructing a model or its capabilities via API queries or leaked artifacts.
Function calling: constraining model outputs to a schema used to call external APIs or tools safely and deterministically.
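A sketch of the safety half of the idea: validating a model's proposed tool call against a declared schema before executing anything. The tool name, fields, and schema format here are hypothetical:

```python
# Schema-constrained tool calls: only calls whose tool name and argument
# types match a declared schema are accepted.
ALLOWED_TOOLS = {
    "get_weather": {"city": str, "units": str},   # hypothetical tool
}

def validate_call(call):
    """Accept only calls that match the declared schema exactly."""
    schema = ALLOWED_TOOLS.get(call.get("tool"))
    if schema is None:
        return False                    # unknown tool: reject
    args = call.get("args", {})
    return (set(args) == set(schema)
            and all(isinstance(args[k], t) for k, t in schema.items()))

ok = validate_call({"tool": "get_weather",
                    "args": {"city": "Oslo", "units": "metric"}})
bad = validate_call({"tool": "rm -rf", "args": {}})
print(ok, bad)  # True False
```

In production the schema is usually expressed in a standard format such as JSON Schema rather than Python types.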
Maximum likelihood estimation (MLE): estimating parameters by maximizing the likelihood of the observed data.
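The classic worked example: for Bernoulli data (coin flips), the likelihood-maximizing estimate of p has a closed form, the observed fraction of heads. The data here are made up:

```python
# Bernoulli MLE: argmax_p  prod_i p^x_i (1-p)^(1-x_i)  =  mean of the data.
data = [1, 1, 0, 1, 0, 1, 1, 0]        # 5 heads in 8 flips

def bernoulli_mle(xs):
    return sum(xs) / len(xs)

p_hat = bernoulli_mle(data)
print(p_hat)  # 0.625
```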
Universal approximation theorem: neural networks can approximate any continuous function under certain conditions.
Weight sharing: using the same parameters across different parts of a model.
Inductive bias: built-in assumptions that guide learning efficiency and generalization.
State space: the set of all possible configurations an agent may encounter.
Gradient inversion: recovering training data from shared or leaked gradients.
Hidden Markov model (HMM): a probabilistic model for sequential data with latent (hidden) states.
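A sketch of the forward algorithm for a 2-state HMM, which computes the probability of an observation sequence by marginalizing over the hidden states; the transition and emission probabilities are made up for illustration:

```python
# Forward algorithm: alpha[i] accumulates P(observations so far, state=i),
# updated by one matrix-vector product per time step.
import numpy as np

pi = np.array([0.6, 0.4])              # initial state distribution
A = np.array([[0.7, 0.3],              # A[i, j] = P(next=j | current=i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],              # B[i, o] = P(obs=o | state=i)
              [0.2, 0.8]])

def forward(obs):
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())          # P(observation sequence)

p = forward([0, 1, 0])
print(p)
```

A quick sanity check on such a model: the probabilities of all possible observation sequences of a fixed length must sum to 1.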