Search: Shapley values

X-Risk Advanced

Existential risk from AI systems.

AI Safety & Alignment

Alignment Tax Advanced

Tradeoff between safety and performance.

AI Safety & Alignment

AI Boxing Advanced

Isolating AI systems.

AI Safety & Alignment

Tripwire Advanced

Signals indicating dangerous behavior.

AI Safety & Alignment

Power-Seeking Behavior Advanced

Tendency to acquire control or resources.

AI Safety & Alignment

Orthogonality Thesis Advanced

Intelligence and goals are independent.

AI Safety & Alignment

Alignment Research Intermediate

Research aimed at ensuring AI remains safe.

Governance & Ethics

Cooperative AI Intermediate

Designing AI to cooperate with humans and each other.

Governance & Ethics

Differential Privacy Intermediate

A formal privacy framework ensuring outputs do not reveal much about any single individual’s data contribution.

Security & Privacy

Rademacher Complexity Intermediate

Measures a model’s ability to fit random noise; used to bound generalization error.

AI Economics & Strategy

On-Policy Learning Intermediate

Learning only from data generated by the current policy.

AI Economics & Strategy

Existential Risk Advanced

Risk threatening humanity’s survival.

AI Safety & Alignment

State Estimation Advanced

Inferring the agent’s internal state from noisy sensor data.

Robotics & Embodied AI
