Results for "state-action value"
Embodied AI: AI systems that perceive and act in the physical world through sensors and actuators.
Trajectory optimization: optimizing continuous action sequences.
Imagined rollouts: predicted future trajectories.
Shared autonomy: control shared between human and agent.
Precision: of the predicted positives, the fraction that are truly positive; sensitive to false positives.
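A minimal sketch of this computation (function and variable names here are illustrative, not from any particular library):

```python
def precision(y_true, y_pred):
    """Fraction of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    # A single extra false positive lowers this ratio directly,
    # which is why precision is "sensitive to false positives".
    return tp / (tp + fp) if (tp + fp) else 0.0

# 2 of the 3 predicted positives are correct -> precision 2/3.
print(precision([1, 0, 1, 1, 0], [1, 1, 1, 0, 0]))
```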
AUC (area under the ROC curve): scalar summary of the ROC curve; measures ranking ability, not calibration.
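The "ranking ability" reading has a direct pairwise form: AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative. A small sketch (illustrative names, O(n^2) for clarity):

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect ranking -> AUC 1.0; the absolute score values (calibration)
# never enter the computation, only their order.
print(auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))
```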
Bias: systematic error introduced by simplifying assumptions in a learning algorithm.
Flat minimum: a wide basin in the loss landscape, often correlated with better generalization.
Maximum likelihood estimation (MLE): estimating parameters by maximizing the likelihood of the observed data.
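For a Gaussian, the maximization has a closed form: the sample mean and the 1/n (biased) sample variance. A small sketch, with illustrative names:

```python
def gaussian_mle(data):
    """Closed-form MLE for a Gaussian: sample mean and biased variance."""
    n = len(data)
    mu = sum(data) / n
    # The MLE divides by n, not n - 1 (the unbiased estimator).
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, var

mu, var = gaussian_mle([2.0, 4.0, 6.0])
print(mu, var)  # 4.0 and 8/3
```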
Learning rate warmup: gradually increasing the learning rate at the start of training to avoid divergence.
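A common variant is a linear ramp from near zero up to the base rate; a minimal sketch (the defaults below are illustrative, not from any particular recipe):

```python
def warmup_lr(step, base_lr=1e-3, warmup_steps=1000):
    """Linearly ramp the learning rate from ~0 to base_lr over warmup_steps."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr  # after warmup, hand off to the main schedule

# Tiny at step 0, halfway through the ramp at step 499, full rate after.
print(warmup_lr(0), warmup_lr(499), warmup_lr(2000))
```

In practice this would be composed with a decay schedule (e.g. cosine) that takes over once the ramp finishes.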
Attention head: a single attention mechanism within multi-head attention.
Moat: competitive advantage from proprietary models or data.
Causal model: models the effects of interventions (do(X=x)).
Rank: the number of linearly independent rows or columns of a matrix.
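One way to make "linearly independent rows" concrete is Gaussian elimination: reduce the matrix and count the nonzero pivot rows. A sketch in plain Python (illustrative, no numerical-library dependencies):

```python
def matrix_rank(rows, eps=1e-10):
    """Rank via Gaussian elimination: count nonzero pivot rows."""
    m = [list(r) for r in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank, col = 0, 0
    while rank < n_rows and col < n_cols:
        # Find a row with a usable pivot in this column.
        pivot = next((r for r in range(rank, n_rows) if abs(m[r][col]) > eps), None)
        if pivot is None:
            col += 1
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
        col += 1
    return rank

# The third row is the sum of the first two, so the rank is 2, not 3.
print(matrix_rank([[1, 0, 1], [0, 1, 1], [1, 1, 2]]))
```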
Random variable: a variable whose values depend on chance.
Probability distribution: describes the likelihoods of a random variable's outcomes.
Variance: a measure of spread around the mean.
Correlation coefficient: normalized covariance.
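"Normalized covariance" means the covariance divided by the product of the two standard deviations, which forces the result into [-1, 1]. A minimal sketch with illustrative names:

```python
import math

def pearson_corr(xs, ys):
    """Correlation = covariance / (std of x * std of y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# A perfect linear relationship gives correlation 1 (up to floating point).
print(pearson_corr([1, 2, 3], [2, 4, 6]))
```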
Importance sampling: sampling from an easier distribution and reweighting the samples.
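A concrete sketch of the reweighting: estimate the mean of a target Gaussian p = N(3, 1) while drawing from an easier proposal q = N(0, 2), weighting each draw by the density ratio p(x)/q(x). The distributions and sample count are illustrative choices:

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def importance_estimate(f, n=100_000, seed=0):
    """Monte Carlo estimate of E_p[f(x)] using samples from q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 2.0)  # sample from the easy proposal q
        w = normal_pdf(x, 3.0, 1.0) / normal_pdf(x, 0.0, 2.0)  # importance weight
        total += w * f(x)
    return total / n

# Should land close to the target mean, 3.
print(importance_estimate(lambda x: x))
```

Note the proposal's tails are heavier than the target's here; with a too-narrow proposal the weights blow up and the estimate becomes unstable.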
Local minimum: a minimum relative to nearby points.
Global minimum: the lowest possible loss.
Line search: choosing the step size along the gradient direction.
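One standard instance is backtracking line search: start with a full step and shrink it until the Armijo sufficient-decrease condition holds. A sketch for a 1-D objective (the constants beta and c are conventional illustrative values):

```python
def backtracking_line_search(f, grad, x, beta=0.5, c=1e-4):
    """Shrink the step size t until the Armijo condition is satisfied."""
    g = grad(x)
    t = 1.0
    # Require f to decrease by at least c * t * ||g||^2 along -g.
    while f(x - t * g) > f(x) - c * t * g * g:
        t *= beta
    return t

f = lambda x: (x - 2.0) ** 2       # minimum at x = 2
grad = lambda x: 2.0 * (x - 2.0)
t = backtracking_line_search(f, grad, x=0.0)
print(t, 0.0 - t * grad(0.0))  # accepted step size and the resulting point
```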
Alpha: returns above a benchmark.
Risk modeling: quantifying financial risk.
Convolutional neural network (CNN): networks using convolution operations with weight sharing and locality; effective for images and signals.
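The weight sharing and locality are easiest to see in one dimension: a single small kernel slides over every position of the input. A sketch (cross-correlation form, as deep-learning layers typically compute it; names illustrative):

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution: the same kernel weights are applied at
    every position (weight sharing), each output seeing only a small
    local window of the input (locality)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A difference kernel responds only at the step edge in the signal.
print(conv1d([0, 0, 1, 1, 1], [-1, 1]))  # [0, 1, 0, 0]
```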
Chain of thought: stepwise reasoning patterns that can improve multi-step tasks; often handled implicitly or summarized for safety/privacy.
Recurrent neural network (RNN): networks with recurrent connections for sequences; largely supplanted by Transformers for many tasks.
Model registry: a central system to store model versions, metadata, approvals, and deployment state.
Transformer: architecture based on self-attention and feedforward layers; the foundation of modern LLMs and many multimodal models.
Message passing: GNN framework where nodes iteratively exchange and aggregate messages from neighbors.
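The exchange-and-aggregate step can be sketched without any learned weights: each node averages its neighbors' features and mixes the result with its own state. A real GNN layer would replace the fixed averaging below with trained transformations; all names and the update rule here are illustrative:

```python
def message_passing_step(features, adjacency):
    """One round: aggregate neighbor features (mean), then update each node."""
    new_features = {}
    for node, neighbors in adjacency.items():
        msgs = [features[n] for n in neighbors]          # exchange
        agg = sum(msgs) / len(msgs) if msgs else 0.0     # aggregate
        new_features[node] = 0.5 * (features[node] + agg)  # update
    return new_features

# Star graph: after one round, information from b and c has reached a
# and vice versa, pulling all node states together.
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
features = {"a": 0.0, "b": 1.0, "c": 1.0}
print(message_passing_step(features, adjacency))
```

Stacking k such rounds lets each node see information from its k-hop neighborhood, which is the "iteratively" in the definition above.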