Results for "architecture tradeoff"
Exploration vs. exploitation: balancing trying new actions to learn vs. exploiting known rewards.
Alignment tax: the tradeoff between a model's safety and its raw performance.
Bias-variance tradeoff: a conceptual framework describing expected error as the combination of systematic error (bias), sensitivity to the training data (variance), and irreducible noise.
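Under squared error, the decomposition described above can be written out explicitly (here \(\hat{f}\) is the learned predictor, \(f\) the true function, and \(\sigma^2\) the noise in the labels):

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Note the error is the sum of the *squared* bias, the variance, and a noise floor that no model can remove.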
Cognitive architecture: system-level design for general intelligence.
Regularization: techniques that discourage overly complex solutions to improve generalization (i.e., reduce overfitting).
Cross-validation: a robust evaluation technique that trains and evaluates across multiple data splits to estimate performance and its variability.
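The multi-split train/evaluate procedure above (k-fold cross-validation) can be sketched in a few lines of NumPy; the `fit` and `score` callables here are placeholders for any model-fitting and scoring routine:

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices, then split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(fit, score, X, y, k=5):
    """Train on k-1 folds, score on the held-out fold; return all k scores."""
    folds = k_fold_indices(len(X), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train_idx], y[train_idx])
        scores.append(score(model, X[test_idx], y[test_idx]))
    # mean(scores) estimates performance; std(scores) estimates variability
    return np.array(scores)
```

For example, `fit` could be `lambda X, y: np.polyfit(X, y, 1)` and `score` a mean-squared-error function; the returned array then holds one held-out MSE per fold.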
Bias: systematic error introduced by simplifying assumptions in a learning algorithm.
Variance: error due to sensitivity to fluctuations in the training dataset.
Model: a parameterized mapping from inputs to outputs; comprises the architecture plus its learned parameters.
Highway network: an early deep architecture using learned gates to control skip connections.
Transformer: an architecture based on self-attention and feedforward layers; the foundation of modern LLMs and many multimodal models.
Hyperparameters: configuration choices that are not learned directly (or not typically learned) and that govern training or architecture.
Large language model (LLM): a high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
Masked language model: predicts masked tokens in a sequence, enabling bidirectional context; often used for embeddings (e.g., BERT) rather than generation.
Context window: the maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
Retrieval-augmented generation (RAG): an architecture that retrieves relevant documents (e.g., from a vector database) and conditions generation on them to reduce hallucinations.
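The retrieve-then-condition pattern above can be sketched end to end with toy stand-ins. Everything here is an assumption for illustration: `embed` is a bag-of-words counter standing in for a real embedding model, and the actual LLM call is omitted; `rag_prompt` just builds the context-conditioned prompt an LLM would consume:

```python
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def rag_prompt(query, docs, k=2):
    """Assemble the retrieval-conditioned prompt an LLM would receive."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Grounding the model in retrieved text is what reduces hallucinations: the prompt instructs the model to answer from the supplied context rather than from parametric memory alone.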
Depth vs. width: the tradeoff between many layers and many neurons per layer.
Vision Transformer (ViT): a Transformer applied to sequences of image patches.
Router (gating network): in a mixture-of-experts model, chooses which experts process each token.
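The per-token expert selection above can be sketched as a top-k softmax gate in NumPy. This is a minimal sketch under simplifying assumptions (no capacity limits or load-balancing loss, which production mixture-of-experts routers add):

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def route(tokens, w_gate, k=2):
    """Pick the top-k experts per token by gate score.

    tokens: (n, d) token representations
    w_gate: (d, n_experts) learned gating weights
    Returns (indices, weights): which experts handle each token, and the
    renormalized mixing weights for combining their outputs.
    """
    scores = softmax(tokens @ w_gate)                  # (n, n_experts)
    top = np.argsort(scores, axis=-1)[:, ::-1][:, :k]  # top-k expert ids per token
    w = np.take_along_axis(scores, top, axis=-1)
    w = w / w.sum(axis=-1, keepdims=True)              # renormalize over chosen experts
    return top, w
```

Because each token activates only k of the experts, compute grows with k rather than with the total expert count.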
Closed-source models: models accessible only via service APIs, with weights not publicly released.
Underfitting: when a model cannot capture the underlying structure, performing poorly on both training and test data.
Dropout: randomly zeroing activations during training to reduce co-adaptation and overfitting.
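The random zeroing of activations described above fits in a few lines; this sketch uses the common "inverted dropout" form, where survivors are rescaled at training time so no correction is needed at inference:

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each activation with probability p during
    training, scaling survivors by 1/(1-p) so the expected value of each
    activation is unchanged. At inference time, pass x through untouched."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)
```

Disabling it at inference (`training=False`) is essential: the noise is a training-time regularizer, not part of the learned function.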
LSTM (long short-term memory): an RNN variant using gates to mitigate vanishing gradients and capture longer-range context.
Self-attention: attention where queries, keys, and values all come from the same sequence, enabling token-to-token interactions.
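The same-sequence query/key/value interaction above is scaled dot-product attention; a single-head NumPy sketch (no masking or multi-head splitting, which real implementations add):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a sequence x of shape (seq_len, d_model).
    Queries, keys, and values are all projections of the same sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])    # (seq_len, seq_len) similarities
    scores -= scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)   # softmax over keys
    return attn @ v                            # each token mixes all tokens' values
```

The (seq_len, seq_len) score matrix is why attention cost grows quadratically with sequence length, which is the pressure behind the context-window limits noted earlier.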
Prompt engineering: crafting prompts to elicit desired behavior, often using roles, structure, constraints, and examples.
System prompt: a high-priority instruction layer setting overarching behavior constraints for a chat model.
Tool use (function calling): letting an LLM call external functions or APIs to fetch data, compute, or take actions, improving reliability.
Vector database: a datastore optimized for similarity search over embeddings, enabling semantic retrieval at scale.
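The similarity-search contract described above can be shown with a brute-force NumPy baseline; real vector databases swap this linear scan for approximate indexes (e.g., HNSW) to reach scale, but return the same kind of result:

```python
import numpy as np

def top_k_similar(query, index, k=3):
    """Brute-force cosine similarity search over an (n, d) embedding index.
    Returns the row ids of the k most similar embeddings, best first."""
    q = query / np.linalg.norm(query)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    sims = m @ q                       # cosine similarity of every row vs. query
    return np.argsort(sims)[::-1][:k]  # ids sorted by descending similarity
```

The returned ids would map back to stored documents or records, which is what makes semantic retrieval (and RAG on top of it) possible.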
Evaluation harness: a system for running consistent evaluations across tasks, versions, prompts, and model settings.
Multimodal models: models that process or generate multiple modalities, enabling vision-language tasks, speech, and video understanding.