Collective behavior without central control.
AI limited to specific domains.
Internal representation of the agent itself.
Tradeoff between safety and performance.
Ensuring an AI system allows itself to be shut down or corrected.
The thesis that an agent's intelligence and its goals can vary independently.
Research aimed at ensuring AI systems remain safe.
Inferring and aligning with human preferences.
A system that perceives state, selects actions, and pursues goals—often combining LLM reasoning with tools and memory.
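A minimal sketch of that perceive-decide-act loop, with a toy environment and a hand-written policy standing in for LLM reasoning, tools, and memory; all names here are illustrative, not a real framework:

```python
# Toy stand-ins for an environment, memory, and policy; a real agent would call
# an LLM with tools and longer-term memory inside decide().

def perceive(state):
    """Read the current environment state (here just a counter)."""
    return {"remaining": state["remaining"]}

def decide(observation, memory):
    """Select an action based on the observation and remembered context."""
    memory.append(observation)
    return "work" if observation["remaining"] > 0 else "stop"

def act(action, state):
    """Apply the chosen action to the environment."""
    if action == "work":
        state["remaining"] -= 1
    return state

state, memory = {"remaining": 3}, []
while True:
    observation = perceive(state)
    action = decide(observation, memory)
    if action == "stop":
        break
    state = act(action, state)

print("steps taken:", len(memory) - 1)
```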
Risk threatening humanity’s survival.
Training objective where the model predicts the next token given the previous tokens (causal language modeling).
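A toy illustration of that objective, assuming a stand-in model that returns a uniform distribution over a small vocabulary; the loss is the average negative log-likelihood of each token given its prefix:

```python
import numpy as np

def toy_model(context_ids, vocab_size):
    """Stand-in for a language model: returns a uniform distribution over the vocab."""
    return np.full(vocab_size, 1.0 / vocab_size)

def next_token_loss(token_ids, vocab_size):
    """Average negative log-likelihood of each token given the tokens before it."""
    losses = []
    for t in range(1, len(token_ids)):
        probs = toy_model(token_ids[:t], vocab_size)  # predict position t from positions < t
        losses.append(-np.log(probs[token_ids[t]]))
    return float(np.mean(losses))

print(next_token_loss([3, 1, 4, 1, 5], vocab_size=8))  # ~2.08, i.e. log(8), for the uniform stub
```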
Crafting prompts to elicit desired behavior, often using role, structure, constraints, and examples.
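An illustrative template showing the role / constraints / examples / structure pattern; the classification task, labels, and ticket texts are invented for the example:

```python
# Role, constraints, few-shot examples, and a structured slot for the input;
# the task, labels, and ticket texts are invented.
TEMPLATE = (
    "You are a support-ticket classifier.\n"                         # role
    "Return exactly one label from: billing, bug, feature.\n"        # constraint
    "Answer with the label only.\n\n"
    'Ticket: "I was charged twice this month."\nLabel: billing\n\n'  # example 1
    'Ticket: "The export button does nothing."\nLabel: bug\n\n'      # example 2
    'Ticket: "{ticket_text}"\nLabel:'                                 # slot to fill
)

print(TEMPLATE.format(ticket_text="Please add dark mode."))
```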
A high-priority instruction layer setting overarching behavior constraints for a chat model.
A datastore optimized for similarity search over embeddings, enabling semantic retrieval at scale.
Breaking documents into pieces for retrieval; chunk size/overlap strongly affect RAG quality.
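A toy sketch combining the two entries above: overlapping character chunks, a hash-based stand-in for an embedding model, and brute-force cosine-similarity retrieval (the lookup a vector database accelerates). Chunk size and overlap are the knobs the entry refers to:

```python
import numpy as np

def chunk(text, size=40, overlap=10):
    """Fixed-size character chunks with overlap; production systems often chunk by tokens."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text, dim=64):
    """Pseudo-embedding seeded from the string hash, standing in for a real embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.normal(size=dim)
    return vec / np.linalg.norm(vec)

def top_k(query, chunks, k=2):
    """Brute-force cosine-similarity search; this lookup is what a vector database accelerates."""
    q = embed(query)
    scored = sorted(((float(q @ embed(c)), c) for c in chunks), reverse=True)
    return scored[:k]

doc = "Refunds are processed within five business days after approval by the billing team."
for score, piece in top_k("how long do refunds take", chunk(doc)):
    print(f"{score:+.3f}  {piece!r}")
```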
Reinforcement learning from human feedback: uses preference data to train a reward model and then optimizes the policy against it.
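A sketch of the reward-modeling step, assuming scalar rewards for a chosen and a rejected response and a pairwise (Bradley-Terry style) loss; exact formulations vary across implementations:

```python
import numpy as np

def pairwise_loss(r_chosen, r_rejected):
    """-log(sigmoid(r_chosen - r_rejected)); small when the preferred response scores higher."""
    return float(-np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected)))))

print(pairwise_loss(2.0, 0.5))  # low loss: the preference is respected
print(pairwise_loss(0.5, 2.0))  # high loss: the preference is violated
```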
Model-generated content that is fluent but unsupported by evidence or incorrect; mitigated by grounding and verification.
Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
Rules and controls around generation (filters, validators, structured outputs) to reduce unsafe or invalid behavior.
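A minimal output-guardrail sketch: require the reply to be JSON with a whitelisted action field and reject anything else. The schema and the sample replies are assumptions for illustration:

```python
import json

ALLOWED_ACTIONS = {"search", "summarize", "escalate"}  # whitelist (assumption)

def validate(raw_reply):
    """Return the parsed payload if it passes all checks, otherwise None."""
    try:
        payload = json.loads(raw_reply)
    except json.JSONDecodeError:
        return None  # not valid JSON
    if payload.get("action") not in ALLOWED_ACTIONS:
        return None  # disallowed or missing action
    return payload

print(validate('{"action": "summarize", "target": "doc-42"}'))  # accepted
print(validate('{"action": "delete_everything"}'))              # rejected -> None
print(validate("sure, here you go!"))                           # not JSON -> None
```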
Automated detection/prevention of disallowed outputs (toxicity, self-harm, illegal instructions, etc.).
Systematic differences in model outcomes across groups; arises from data, labels, and deployment context.
Standardized documentation describing intended use, performance, limitations, data, and ethical considerations.
Structured dataset documentation covering collection, composition, recommended uses, biases, and maintenance.
Central system to store model versions, metadata, approvals, and deployment state.
Systematic review of model/data processes to ensure performance, fairness, security, and policy compliance.
The ability to infer a system's internal state from telemetry; broader than monitoring alone and crucial for AI services and agents.
Observing model inputs/outputs, latency, cost, and quality over time to catch regressions and drift.
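A toy monitoring check in this spirit: compare a recent window of a quality metric against a baseline and flag a regression; the scores and the alert threshold are invented:

```python
# Invented score streams and threshold, purely for illustration.
baseline_scores = [0.91, 0.90, 0.92, 0.89, 0.91]  # historical quality metric
recent_scores = [0.84, 0.82, 0.85, 0.83, 0.81]    # latest window

baseline = sum(baseline_scores) / len(baseline_scores)
recent = sum(recent_scores) / len(recent_scores)

if baseline - recent > 0.05:  # alert threshold (assumption)
    print(f"quality regression detected: {baseline:.2f} -> {recent:.2f}")
```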
How many requests or tokens can be processed per unit time; affects scalability and cost.
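Back-of-envelope throughput arithmetic under assumed latency, concurrency, and token counts; real figures depend on the model, hardware, and workload:

```python
# All numbers are made up; real figures depend on the model, hardware, and workload.
latency_s = 0.8            # average time per request
concurrency = 16           # requests served in parallel
tokens_per_request = 600   # prompt + completion tokens

requests_per_s = concurrency / latency_s
tokens_per_s = requests_per_s * tokens_per_request
print(f"{requests_per_s:.1f} req/s, {tokens_per_s:.0f} tokens/s")  # 20.0 req/s, 12000 tokens/s
```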
Hidden behavior activated by specific triggers, causing targeted mispredictions or undesired outputs.
Reconstructing a model or its capabilities via API queries or leaked artifacts.