Results for "learning rate"
Learning Rate
Intermediate
Controls the size of parameter updates; too high diverges, too low trains slowly or gets stuck.
Think of the learning rate as the size of your steps when walking toward a destination. If you take giant steps, you might overshoot and miss your goal; if you take tiny steps, you might take forever to get there. In machine learning, the learning rate controls how large a change we make to...
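To make the step-size analogy concrete, here is a minimal sketch that runs plain gradient descent on the toy function f(x) = x**2. The function, the starting point, the step count, and the three learning-rate values are all assumptions chosen for illustration; real training loops use an optimizer from a framework and a far more complex loss.

```python
def gradient_descent(learning_rate, steps=50, start=5.0):
    """Minimize f(x) = x**2 with plain gradient descent and return the final x.

    The toy objective, `steps`, and `start` are illustrative assumptions.
    """
    x = start
    for _ in range(steps):
        grad = 2.0 * x                 # derivative of x**2 at the current x
        x = x - learning_rate * grad   # the learning rate scales each update
    return x


if __name__ == "__main__":
    # Too small, reasonable, and too large step sizes for this toy problem.
    for lr in (0.001, 0.1, 1.1):
        print(f"lr={lr}: x after 50 steps = {gradient_descent(lr):.4g}")
```

On this toy problem the small rate barely moves away from the starting point, the moderate rate ends very close to the minimum at 0, and the large rate overshoots on every step so that x grows in magnitude instead of shrinking.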
A robust evaluation technique that trains/evaluates across multiple splits to estimate performance variability.
A table summarizing classification outcomes, foundational for metrics like precision, recall, specificity.
Scalar summary of ROC; measures ranking ability, not calibration.
A proper scoring rule measuring squared error of predicted probabilities for binary outcomes.
Penalizes confident wrong predictions heavily; standard for classification and language modeling.
Average of squared residuals; common regression objective.
One complete traversal of the training dataset during training.
Halting training when validation performance stops improving to reduce overfitting.
A parameterized function composed of interconnected units organized in layers with nonlinear activations.
Nonlinear functions enabling networks to approximate complex mappings; ReLU variants dominate modern DL.
Methods to set starting weights to preserve signal/gradient scales across layers.
Randomly zeroing activations during training to reduce co-adaptation and overfitting.
Networks using convolution operations with weight sharing and locality, effective for images and signals.
Networks with recurrent connections for sequences; largely supplanted by Transformers for many tasks.
Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
An RNN variant using gates to mitigate vanishing gradients and capture longer context.
A datastore optimized for similarity search over embeddings, enabling semantic retrieval at scale.
Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.
Fine-tuning on (prompt, response) pairs to align a model with instruction-following behaviors.
A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
Automated detection/prevention of disallowed outputs (toxicity, self-harm, illegal instructions, etc.).
Techniques to understand model decisions (global or local), important in high-stakes and regulated settings.
Studying internal mechanisms or input influence on outputs (e.g., saliency maps, SHAP, attention analysis).
Local surrogate explanation method approximating model behavior near a specific input.
Framework for reasoning about cause-effect relationships beyond correlation, often using structural assumptions and experiments.
Protecting data during network transfer and while stored; essential for ML pipelines handling sensitive data.
A hidden variable influences both cause and effect, biasing naive estimates of causal impact.
Processes and controls for data quality, access, lineage, retention, and compliance across the AI lifecycle.
Artificially created data used to train/test models; helpful for privacy and coverage, risky if unrealistic.
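Three of the entries in the list above describe formulas: the squared error of predicted probabilities for binary outcomes, the loss that penalizes confident wrong predictions heavily, and the average of squared residuals. The following is a rough sketch of those three quantities in plain Python. The function names and the `eps` clipping value are assumptions made for this example, not taken from any particular library.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def brier_score(labels, probs):
    """Squared error between 0/1 labels and predicted probabilities."""
    return mean_squared_error(labels, probs)


def log_loss(labels, probs, eps=1e-12):
    """Binary cross-entropy; `eps` clips probabilities to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


if __name__ == "__main__":
    labels = [1, 1]
    mildly_wrong = [0.4, 0.4]
    confidently_wrong = [0.01, 0.01]
    for name, probs in (("mild", mildly_wrong), ("confident", confidently_wrong)):
        print(name, brier_score(labels, probs), log_loss(labels, probs))
```

The Brier score is bounded by 1 for a single prediction, while the log loss grows without bound as a wrong prediction becomes more confident, which is why the two can rank models differently.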