Results for "modality alignment"
Cross-modal attention: attention between features of different modalities, letting one modality query another.
Multimodal fusion: combining signals from multiple modalities into a single representation.
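A minimal sketch tying the two entries above together: cross-modal attention used as a fusion mechanism, assuming PyTorch. The class name `CrossModalFusion` and the tensor names `text_feats`/`audio_feats` are illustrative placeholders, not from any particular library.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """One modality (here text) attends over another (here audio),
    and the attended signal is merged back into the text stream."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, audio_feats):
        # Queries come from text; keys and values come from audio, so
        # each text token gathers the audio evidence most relevant to it.
        attended, _ = self.attn(text_feats, audio_feats, audio_feats)
        # Residual add keeps the original text signal alongside the fused one.
        return self.norm(text_feats + attended)

# Toy usage: batch of 2, 10 text tokens, 50 audio frames, width 256.
fusion = CrossModalFusion()
out = fusion(torch.randn(2, 10, 256), torch.randn(2, 50, 256))
print(out.shape)  # torch.Size([2, 10, 256])
```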
Robustness: maintaining aligned behavior under new or shifted conditions.
Forced alignment: aligns transcripts with audio timestamps.
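A simplified sketch of the core idea behind forced alignment: finding a monotonic minimum-cost path between audio frames and transcript tokens. Real forced aligners derive per-frame costs from an acoustic model; here the cost matrix is assumed given, and plain dynamic programming (a restricted DTW) stands in for the full pipeline.

```python
import numpy as np

def align_frames_to_tokens(cost: np.ndarray):
    """Given a (frames x tokens) cost matrix, return the minimum-cost
    monotonic assignment of each audio frame to a transcript token."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Each frame either stays on the current token or advances.
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i - 1, j - 1])
    # Backtrack from the last frame/token pair to recover the path.
    path, i, j = [], n, m
    while i > 0:
        path.append((i - 1, j - 1))  # (frame index, token index)
        if acc[i - 1, j - 1] <= acc[i - 1, j]:
            j -= 1
        i -= 1
    return path[::-1]

# Toy usage: 6 audio frames aligned against a 3-token transcript.
rng = np.random.default_rng(0)
print(align_frames_to_tokens(rng.random((6, 3))))
```

Token boundaries in the returned path, multiplied by the frame duration, give the timestamps.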
Inner alignment: ensuring the learned behavior matches the intended training objective.
AI alignment: ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
Outer alignment: correctly specifying the goals the system is trained on.
Alignment tax: the tradeoff between safety and performance.
Deceptive alignment: a model appears aligned during training but behaves differently at deployment.
AI safety: research aimed at ensuring AI systems remain safe and do not cause harm.
Intent alignment: ensuring AI systems pursue the goals humans intend.
Power-seeking: the tendency of an agent to gain control or resources.
Reinforcement learning from human feedback (RLHF): uses human preference data to train a reward model, then optimizes the policy against it.
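A minimal sketch of the reward-model step in RLHF, assuming PyTorch: the Bradley-Terry objective pushes the reward of the human-preferred response above the rejected one. The inputs are scalar rewards that a hypothetical reward model assigned to each response in a preference pair.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_rewards: torch.Tensor,
                      rejected_rewards: torch.Tensor) -> torch.Tensor:
    """-log sigmoid(r_chosen - r_rejected), averaged over the batch."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage: scalar rewards for 4 preference pairs.
chosen = torch.tensor([1.2, 0.3, 0.9, 2.0])
rejected = torch.tensor([0.4, 0.5, -0.1, 1.0])
print(reward_model_loss(chosen, rejected))
```

The trained reward model then scores policy samples, and the policy is optimized against those scores, commonly with PPO plus a KL penalty to a reference model.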
Direct preference optimization (DPO): a preference-based training method that optimizes the policy directly from pairwise comparisons, without an explicit RL loop.
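A minimal sketch of the DPO objective, assuming PyTorch. The inputs are summed per-token log-probabilities of the chosen and rejected responses under the current policy and a frozen reference policy; the variable names are illustrative, not a specific library's API.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    # Implicit rewards are log-probability ratios against the reference.
    chosen = policy_chosen_logp - ref_chosen_logp
    rejected = policy_rejected_logp - ref_rejected_logp
    # Bradley-Terry on the implicit rewards: prefer chosen over rejected.
    return -F.logsigmoid(beta * (chosen - rejected)).mean()

# Toy usage with made-up sequence log-probs for a batch of 3 pairs.
t = torch.tensor
print(dpo_loss(t([-12.0, -8.0, -10.0]), t([-13.0, -9.0, -9.5]),
               t([-12.5, -8.2, -10.1]), t([-12.8, -9.1, -9.4])))
```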
CLIP-style contrastive models: joint vision-language models that align image and text embeddings in a shared space.
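A minimal sketch of the symmetric contrastive objective behind CLIP-style image-text alignment, assuming PyTorch and already-computed, L2-normalized embeddings for a batch of matched image-text pairs.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature: float = 0.07):
    # Cosine-similarity logits between every image and every text.
    logits = img_emb @ txt_emb.t() / temperature
    # Matched pairs sit on the diagonal; score both directions
    # (image-to-text and text-to-image) as classification and average.
    targets = torch.arange(img_emb.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: batch of 4 paired image/text embeddings of width 32.
img = F.normalize(torch.randn(4, 32), dim=-1)
txt = F.normalize(torch.randn(4, 32), dim=-1)
print(contrastive_loss(img, txt))
```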
Specification gaming: a model exploits flaws in a poorly specified objective.
Reward hacking: maximizing the reward signal without fulfilling the real goal.
Pre-deployment review: evaluating a system for safety before it is released.
AI existential risk: existential risk posed by advanced AI systems.
Hard takeoff: a sudden, rapid jump to superintelligence.
AI boxing: isolating an AI system to limit its influence on the outside world.
Warning signs: signals indicating dangerous or misaligned behavior.
Corrigibility: ensuring an AI system accepts correction and allows itself to be shut down.
Orthogonality thesis: intelligence and final goals are independent; any level of intelligence can pair with nearly any goal.
Instrumental convergence: certain subgoals (e.g., acquiring resources, self-preservation) are useful regardless of the final objective.
Cooperative AI: designing AI systems to cooperate with humans and with each other.
Preference learning: inferring human preferences and aligning behavior with them.
Existential risk: a risk that threatens humanity’s survival or permanently curtails its potential.