LSTM/GRU: An RNN variant using gates to mitigate vanishing gradients and capture longer-range context.
Positional encoding: Injects sequence order into Transformers, since attention alone is permutation-invariant.
Large language model (LLM): A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
Masked language modeling: Predicts masked tokens in a sequence, enabling bidirectional context; often used for embeddings rather than generation.
Context window: Maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
Prompt: The text (and possibly other modalities) given to an LLM to condition its output behavior.
Tool use (function calling): Letting an LLM call external functions/APIs to fetch data, compute, or take actions, improving reliability.
Retrieval-augmented generation (RAG): Architecture that retrieves relevant documents (e.g., from a vector DB) and conditions generation on them to reduce hallucinations.
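The retrieve-then-generate shape can be sketched with a toy bag-of-words retriever; the embedding, corpus, and prompt template here are all illustrative stand-ins, not a real RAG stack:

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; a real system uses a learned encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Beam search keeps the top-k partial sequences during decoding.",
    "A vector database stores embeddings for similarity search.",
]
context = retrieve("how does a vector database work", docs, k=1)
# The generator is then conditioned on the retrieved context:
prompt = f"Answer using this context:\n{context[0]}\n\nQuestion: ..."
```

The prompt is what would be sent to the LLM; grounding the answer in retrieved text is what reduces hallucination.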
Vector database: A datastore optimized for similarity search over embeddings, enabling semantic retrieval at scale.
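At its core this is nearest-neighbor search over vectors. A minimal brute-force sketch (production systems use approximate indexes such as HNSW or IVF; the class and IDs below are hypothetical):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class ToyVectorStore:
    # Brute-force similarity search over stored (id, embedding) pairs.
    def __init__(self):
        self.items = []

    def add(self, item_id, vector):
        self.items.append((item_id, vector))

    def search(self, query, k=2):
        # Return the ids of the k vectors most similar to the query.
        ranked = sorted(self.items, key=lambda it: cosine(query, it[1]), reverse=True)
        return [item_id for item_id, _ in ranked[:k]]

store = ToyVectorStore()
store.add("doc-a", [1.0, 0.0])
store.add("doc-b", [0.0, 1.0])
store.add("doc-c", [0.9, 0.1])
store.search([1.0, 0.05], k=2)  # "doc-a" and "doc-c" are nearest
```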
Reward model: Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
A/B test: Controlled experiment comparing variants by random assignment to estimate causal effects of changes.
Curriculum learning: Ordering training samples from easier to harder to improve convergence or generalization.
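The ordering step itself is just a sort by a difficulty score. A minimal sketch, using token count as a hypothetical difficulty proxy (real curricula may use loss, length, or human labels):

```python
def curriculum_order(samples, difficulty):
    # Sort training samples from easiest to hardest by a difficulty score.
    return sorted(samples, key=difficulty)

samples = ["a cat", "the quick brown fox jumps", "dogs bark"]
# Assumed proxy: more tokens = harder. Python's sort is stable, so ties
# keep their original relative order.
ordered = curriculum_order(samples, difficulty=lambda s: len(s.split()))
# ordered -> ["a cat", "dogs bark", "the quick brown fox jumps"]
```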
Synthetic data: Artificially created data used to train/test models; helpful for privacy and coverage, risky if unrealistic.
Model registry: Central system to store model versions, metadata, approvals, and deployment state.
Beam search: Search algorithm for generation that keeps the top-k partial sequences at each step; can improve likelihood but reduce diversity.
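A toy sketch of the keep-top-k loop over precomputed per-step token distributions (a real decoder would get each step's distribution from the model, conditioned on the partial sequence):

```python
import math

def beam_search(step_logprobs, beam_width=2):
    # step_logprobs: one dict per step mapping token -> log-probability.
    # After every step, keep only the beam_width highest-scoring sequences.
    beams = [((), 0.0)]  # (partial sequence, cumulative log-prob)
    for dist in step_logprobs:
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in beams
            for tok, lp in dist.items()
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

steps = [
    {"the": math.log(0.6), "a": math.log(0.4)},
    {"cat": math.log(0.7), "dog": math.log(0.3)},
]
best_seq, best_score = beam_search(steps, beam_width=2)[0]
# best_seq -> ("the", "cat"), the highest-likelihood path
```

With beam_width=1 this reduces to greedy decoding; wider beams explore more alternatives at higher cost.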
Logits: Raw model outputs before conversion to probabilities; manipulated during decoding and calibration.
Model extraction: Reconstructing a model or its capabilities via API queries or leaked artifacts.
Structured output: Constraining model outputs into a schema used to call external APIs/tools safely and deterministically.
Planning: Methods for breaking goals into steps; can be classical (A*, STRIPS) or LLM-driven with tool calls.
Multimodal models: Models that process or generate multiple modalities, enabling vision-language tasks, speech, video understanding, etc.
Natural language processing (NLP): AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
Gradient clipping: Limiting gradient magnitude to prevent exploding gradients.
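A common variant clips by the global L2 norm: if the combined norm of all gradients exceeds a threshold, every gradient is rescaled by the same factor so the norm equals the threshold. A sketch over a flat list of gradient values:

```python
import math

def clip_by_global_norm(grads, max_norm):
    # Rescale all gradients uniformly if their combined L2 norm
    # exceeds max_norm; otherwise leave them unchanged.
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grads]
    return grads

clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
# norm was 5.0, so each gradient is scaled by 1/5 -> [0.6, 0.8]
```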
Inductive bias: Built-in assumptions guiding learning efficiency and generalization.
Causal mask: Prevents attention to future tokens during training/inference.
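The mask is just a lower-triangular boolean matrix: position i may attend only to positions j <= i. A minimal sketch:

```python
def causal_mask(seq_len):
    # mask[i][j] is True when position i may attend to position j (j <= i);
    # masked-out (future) positions are typically set to -inf before softmax.
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

mask = causal_mask(3)
# [[True, False, False],
#  [True, True,  False],
#  [True, True,  True ]]
```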
Attention head: A single attention mechanism within multi-head attention.
Absolute positional encoding: Encodes token position explicitly, often via sinusoids.
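The sinusoidal variant (from the original Transformer paper) pairs sines and cosines at geometrically increasing wavelengths. A sketch for a single position:

```python
import math

def sinusoidal_position(pos, d_model):
    # Even dimensions use sin, odd dimensions cos, with wavelengths
    # growing geometrically from 2*pi to 10000*2*pi across dimensions.
    enc = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc

pe0 = sinusoidal_position(0, 4)
# position 0 -> [0.0, 1.0, 0.0, 1.0], since sin(0)=0 and cos(0)=1
```

The vector is typically added to (or concatenated with) the token embedding before the first attention layer.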
Mixture of experts (MoE): Routes inputs to subsets of parameters for scalable capacity.
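The routing step can be sketched as top-k gating: only the k experts with the highest gate scores run, and their outputs are combined with renormalized weights. The scalar "experts" below are hypothetical stand-ins for real subnetworks:

```python
def route(gate_scores, experts, x, top_k=1):
    # Send input x only to the top_k experts with the highest gate scores,
    # weighting their outputs by the renormalized scores.
    ranked = sorted(enumerate(gate_scores), key=lambda p: p[1], reverse=True)[:top_k]
    total = sum(score for _, score in ranked)
    return sum((score / total) * experts[i](x) for i, score in ranked)

experts = [lambda x: x + 1, lambda x: x * 10]
out = route([0.2, 0.8], experts, 3.0, top_k=1)
# Only expert 1 (score 0.8) runs: 3.0 * 10 = 30.0
```

Because only k of the experts execute per input, total parameter count can grow without a proportional increase in per-token compute.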
Policy: Strategy mapping states to actions.
Q-value: Expected return of taking an action in a state.
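Q-values are typically learned by bootstrapped updates. A minimal tabular Q-learning step (the toy states, actions, and hyperparameters are illustrative):

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    # One tabular Q-learning step: move Q(s, a) toward the bootstrapped
    # target r + gamma * max_a' Q(s', a').
    target = reward + gamma * max(q[next_state].values())
    q[state][action] += alpha * (target - q[state][action])

q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 1.0, "right": 0.0}}
q_update(q, "s0", "right", reward=1.0, next_state="s1")
# q["s0"]["right"] -> 0.5 * (1.0 + 0.9 * 1.0) = 0.95
```

A greedy policy then just picks the action with the highest Q-value in each state, tying this entry back to the policy definition above.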
Agent memory: Extending agents with long-term memory stores.