Results for "step optimization"
Average of squared residuals; common regression objective.
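The objective above is straightforward to compute; a minimal sketch in pure Python (function name is illustrative):

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of squared residuals (t - p)^2."""
    assert len(y_true) == len(y_pred), "inputs must be the same length"
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# residuals are 0.0, -0.5, 1.0 -> squared: 0.0, 0.25, 1.0 -> mean: 1.25 / 3
print(mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
```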
One complete pass over the entire training dataset.
A gradient method using random minibatches for efficient training on large datasets.
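The method above can be sketched on a toy one-parameter linear model; this is an illustrative implementation, not a production optimizer, and all names here are hypothetical:

```python
import random

def sgd_step(w, batch, lr):
    """One SGD update for the model y_hat = w * x with squared-error loss.
    The gradient of mean((w*x - y)^2) over the batch is 2 * mean(x * (w*x - y))."""
    g = sum(2 * x * (w * x - y) for x, y in batch) / len(batch)
    return w - lr * g

def train(data, w=0.0, lr=0.1, epochs=50, batch_size=2, seed=0):
    """Shuffle each epoch, then update on random minibatches."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            w = sgd_step(w, data[i:i + batch_size], lr)
    return w

data = [(x, 3.0 * x) for x in [-2.0, -1.0, 1.0, 2.0]]
print(train(data))  # converges near the true slope 3.0
```

The key efficiency point: each update touches only `batch_size` examples, so the cost per step is independent of the dataset size.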
A parameterized function composed of interconnected units organized in layers with nonlinear activations.
Nonlinear functions enabling networks to approximate complex mappings; ReLU variants dominate modern DL.
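Two of the ReLU variants mentioned above are one-liners; a minimal sketch (without a nonlinearity between them, stacked linear layers collapse to a single linear map):

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: passes a small slope alpha for negative inputs,
    avoiding fully dead (zero-gradient) units."""
    return x if x > 0 else alpha * x
```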
Gradients grow too large, causing divergence; mitigated by clipping, normalization, careful init.
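The clipping mitigation mentioned above is commonly done by global L2 norm; a minimal sketch over a flat gradient vector:

```python
import math

def clip_by_global_norm(grads, max_norm):
    """If the L2 norm of the gradient vector exceeds max_norm,
    rescale all components so the norm equals max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

print(clip_by_global_norm([3.0, 4.0], 1.0))  # norm 5.0 -> scaled to norm 1.0
```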
An RNN variant using gates to mitigate vanishing gradients and capture longer context.
Constraining outputs to retrieved or provided sources, often with citation, to improve factual reliability.
A preference-based training method optimizing policies directly from pairwise comparisons without explicit RL loops.
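The description above matches Direct Preference Optimization (DPO); a per-pair loss sketch, assuming per-sequence log-probabilities from the policy and a frozen reference model (argument names are hypothetical):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * [(logp_c - ref_c) - (logp_r - ref_r)]).
    Minimized by raising the policy's margin for the chosen response
    relative to the reference model."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# when the policy matches the reference, the margin is 0 and the loss is log 2
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
```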
Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
Ordering training samples from easier to harder to improve convergence or generalization.
Training across many devices/silos without centralizing raw data; aggregates updates, not data.
Practices for operationalizing ML: versioning, CI/CD, monitoring, retraining, and reliable production management.
Removing weights or neurons to shrink models and improve efficiency; can be structured or unstructured.
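A common unstructured variant of the technique above is magnitude pruning: zero out the smallest-magnitude fraction of weights. A minimal sketch over a flat weight list:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with smallest |w|."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

# the two smallest-magnitude weights (-0.01 and 0.1) are zeroed
print(magnitude_prune([0.5, -0.01, 2.0, 0.1], 0.5))
```

Structured pruning removes whole units (neurons, channels, heads) instead of individual weights, trading some accuracy for hardware-friendly speedups.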
Converts logits to probabilities by exponentiation and normalization; common in classification and LMs.
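The function above is easy to state directly; a minimal numerically stable sketch (subtracting the max before exponentiating avoids overflow without changing the result):

```python
import math

def softmax(logits):
    """Map logits to a probability distribution: exp(x_i - max) / sum."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

print(softmax([0.0, 0.0]))  # equal logits -> uniform: [0.5, 0.5]
```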
Inputs crafted to cause model errors or unsafe behavior, often imperceptible in vision or subtle in text.
Maliciously inserting or altering training data to implant backdoors or degrade performance.
Reconstructing a model or its capabilities via API queries or leaked artifacts.
AI focused on interpreting images/video: classification, detection, segmentation, tracking, and 3D understanding.
Assigning labels per pixel (semantic) or per instance (instance segmentation) to map object boundaries.
Measures how much information an observable random variable carries about unknown parameters.
Estimating parameters by maximizing likelihood of observed data.
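For a Bernoulli model the estimator above has a closed form: setting the derivative of the log-likelihood sum(y*log p + (1-y)*log(1-p)) to zero gives the sample mean. A minimal sketch:

```python
def bernoulli_mle(samples):
    """MLE of p for Bernoulli(p): the sample mean maximizes the likelihood."""
    return sum(samples) / len(samples)

print(bernoulli_mle([1, 1, 0, 1]))  # 3 successes out of 4 -> 0.75
```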
A narrow minimum often associated with poorer generalization.
A wide basin often correlated with better generalization.
Gradually increasing learning rate at training start to avoid divergence.
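The schedule above is most often a linear ramp; a minimal sketch (function and argument names are illustrative):

```python
def warmup_lr(step, base_lr, warmup_steps):
    """Linearly increase the learning rate from 0 to base_lr
    over the first warmup_steps steps, then hold it constant."""
    if step >= warmup_steps:
        return base_lr
    return base_lr * step / warmup_steps

print(warmup_lr(50, 1e-3, 100))  # halfway through warmup -> half of base_lr
```

In practice this is composed with a decay schedule (cosine, inverse square root) that takes over after warmup ends.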
Attention variants that reduce the quadratic cost in sequence length, e.g., via sparsity, low-rank, or kernel approximations.
Reconstructing private training examples from shared gradient updates; a known privacy risk in federated learning.
Inferring sensitive attributes of training records from a model's outputs or parameters.
Simultaneous Localization and Mapping: jointly estimating a robot's pose and a map of an unknown environment from sensor data.