Model watermarking: embedding signals into a model or its outputs to prove ownership.
Restricted Boltzmann Machine (RBM): a simplified Boltzmann Machine with a bipartite structure; visible and hidden units connect only across layers, never within a layer.
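A minimal sketch of that bipartite structure, assuming the standard Bernoulli-Bernoulli formulation: because units interact only through the cross-layer weights, one block-Gibbs step samples each whole layer in a single vectorized pass. All sizes and initializations here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Bipartite parameters: visible units connect only to hidden units.
n_visible, n_hidden = 6, 4
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # cross-layer weights
b_v = np.zeros(n_visible)                              # visible biases
b_h = np.zeros(n_hidden)                               # hidden biases

def gibbs_step(v):
    """One block-Gibbs step: sample all hidden units given the visibles,
    then all visible units given the hiddens."""
    p_h = sigmoid(v @ W + b_h)                          # P(h=1 | v)
    h = (rng.random(n_hidden) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b_v)                        # P(v=1 | h)
    v_new = (rng.random(n_visible) < p_v).astype(float)
    return v_new, h

v = rng.integers(0, 2, n_visible).astype(float)
v, h = gibbs_step(v)  # each layer is conditionally independent given the other
```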
Conditional Random Field (CRF): a probabilistic graphical model for structured prediction, e.g. jointly labeling every token in a sequence.
Particle filter: a Monte Carlo method for state estimation that represents the belief over states with a set of weighted samples.
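A toy bootstrap particle filter, assuming a hypothetical 1-D random-walk state observed through Gaussian noise; the predict / weight / resample loop below is the core of the method, with all model parameters invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy model: 1-D random-walk state, noisy direct observation.
n_particles, n_steps = 500, 20
process_std, obs_std = 0.5, 1.0

true_x = 0.0
particles = rng.normal(0.0, 1.0, n_particles)   # samples representing the belief
weights = np.full(n_particles, 1.0 / n_particles)

for t in range(n_steps):
    true_x += rng.normal(0.0, process_std)       # simulate the real system
    z = true_x + rng.normal(0.0, obs_std)        # noisy measurement

    particles += rng.normal(0.0, process_std, n_particles)     # predict: propagate dynamics
    weights = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)  # update: measurement likelihood
    weights /= weights.sum()

    idx = rng.choice(n_particles, n_particles, p=weights)      # resample to avoid degeneracy
    particles = particles[idx]
    weights = np.full(n_particles, 1.0 / n_particles)

    estimate = particles.mean()                  # posterior mean as the state estimate

print(f"true={true_x:.2f} estimate={estimate:.2f}")
```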
Catastrophic forgetting: loss of previously learned knowledge when a model is trained on new tasks.
Knowledge distillation: training a smaller “student” model to mimic a larger “teacher,” often improving efficiency while retaining most of the teacher's performance.
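A sketch of the standard soft-target objective in PyTorch: the student matches the teacher's temperature-softened distribution while also fitting the hard labels. The temperature T and mixing weight alpha are illustrative hyperparameters, not prescribed values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target loss: KL between temperature-softened distributions,
    blended with ordinary cross-entropy on the hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),  # student log-probs
        F.softmax(teacher_logits / T, dim=-1),      # teacher probs (the target)
        reduction="batchmean",
    ) * (T * T)  # T^2 rescales gradients to balance the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Hypothetical batch: 8 examples, 10 classes.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```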
Recurrent Neural Networks (RNNs): networks with recurrent connections for processing sequences; largely supplanted by Transformers for many tasks.
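A minimal vanilla RNN forward pass in NumPy, with layer sizes assumed for illustration. The sequential dependence of each hidden state on the previous one is what makes training hard to parallelize, one reason Transformers displaced RNNs.

```python
import numpy as np

rng = np.random.default_rng(2)

# Vanilla RNN cell: the same weights are reused at every time step,
# and the hidden state carries context along the sequence.
d_in, d_hidden = 3, 5
W_xh = rng.normal(scale=0.1, size=(d_in, d_hidden))
W_hh = rng.normal(scale=0.1, size=(d_hidden, d_hidden))  # recurrent connection
b_h = np.zeros(d_hidden)

def rnn_forward(xs):
    h = np.zeros(d_hidden)
    states = []
    for x in xs:                       # sequential: step t depends on step t-1
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)

seq = rng.normal(size=(7, d_in))       # a length-7 input sequence
hidden_states = rnn_forward(seq)
```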
Autoregressive generation: producing a sequence one token at a time, with each new token conditioned on all previously generated tokens.
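A sketch of the decoding loop, with a stand-in next_token_logits function in place of a real model. Greedy argmax decoding is shown; sampling strategies slot in at the same point.

```python
import numpy as np

VOCAB = 100

def next_token_logits(tokens):
    """Stand-in for a real model: any function mapping a prefix to logits works here."""
    rng = np.random.default_rng(hash(tuple(tokens)) % (2**32))
    return rng.normal(size=VOCAB)

def generate(prompt_tokens, max_new=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        logits = next_token_logits(tokens)      # condition on everything so far
        tokens.append(int(np.argmax(logits)))   # greedy: most likely next token
    return tokens

print(generate([1, 2, 3]))
```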
Large language model (LLM): a high-capacity language model trained on massive text corpora, exhibiting broad generalization and emergent behaviors.
Context window: the maximum number of tokens the model can attend to in one forward pass; it constrains long-document reasoning.
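One common mitigation is simply truncating to the most recent tokens; a minimal sketch, with the window size and output reservation chosen arbitrarily for illustration.

```python
def fit_to_context(tokens, max_context=4096, reserve_for_output=256):
    """Keep the most recent tokens so prompt + generation fits in the window.
    max_context and reserve_for_output are illustrative values."""
    budget = max_context - reserve_for_output
    return tokens[-budget:] if len(tokens) > budget else tokens

prompt = list(range(10_000))         # a document longer than the window
print(len(fit_to_context(prompt)))   # -> 3840
```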
Natural language processing (NLP): the AI subfield concerned with understanding and generating human language, including syntax, semantics, and pragmatics.
Memory-augmented agents: extending agents with long-term memory stores they can write to and retrieve from across interactions.
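A toy version of such a store: embed each memory, then retrieve the nearest entries by cosine similarity. The embed function below is a random stand-in so the sketch runs end to end; a real agent would use a learned embedding model.

```python
import numpy as np

class VectorMemory:
    """Toy long-term store: embed, append, retrieve nearest by cosine similarity."""
    def __init__(self, dim=16):
        self.dim = dim
        self.texts, self.vecs = [], []

    def embed(self, text):
        # Stand-in embedding: deterministic random unit vector per string.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.normal(size=self.dim)
        return v / np.linalg.norm(v)

    def add(self, text):
        self.texts.append(text)
        self.vecs.append(self.embed(text))

    def recall(self, query, k=2):
        sims = np.stack(self.vecs) @ self.embed(query)  # cosine sim (unit vectors)
        top = np.argsort(sims)[::-1][:k]
        return [self.texts[i] for i in top]

mem = VectorMemory()
mem.add("user prefers metric units")
mem.add("project deadline is Friday")
print(mem.recall("what units should I use?"))
```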
Automatic speech recognition (ASR): converting spoken audio into text, often using encoder-decoder or transducer architectures.
Prompt leaking: coaxing a model into revealing its system prompt or other hidden instructions.
Tool calling: models trained to decide when to call external tools and with what arguments.
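A sketch of the dispatch side, assuming the model emits either plain text or a JSON tool call; the registry, function, and schema here are all hypothetical.

```python
import json

# Hypothetical tool registry; in practice the model is shown these signatures
# and trained (or prompted) to emit a structured call when one is needed.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub implementation

TOOLS = {"get_weather": get_weather}

def handle_model_output(output: str):
    """If the model emitted a tool call, execute it; otherwise return the text."""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        return output                       # plain answer, no tool needed
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# The model decided a tool was needed and emitted structured JSON:
print(handle_model_output('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
# The model answered directly:
print(handle_model_output("The capital of Norway is Oslo."))
```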
Reward hacking: maximizing the reward signal without fulfilling the designer's intended goal.
Role prompting: assigning a role or identity to the model to shape its responses.
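For example, in the common chat-message format the role is typically assigned via a system message; the schema below is shown for illustration only.

```python
# A system message assigning a persona before the user's request.
messages = [
    {"role": "system", "content": "You are a meticulous senior Python code reviewer."},
    {"role": "user", "content": "Review this function for edge cases: def div(a, b): return a / b"},
]
```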
Task decomposition: breaking a complex task into smaller sub-steps that are solved individually.
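A sketch of a plan-then-execute loop, with a canned llm stub standing in for a real model call: the model first lists sub-steps, then each step is solved with the accumulated context.

```python
def llm(prompt: str) -> str:
    """Canned stand-in for a real model call so the sketch runs end to end."""
    if prompt.startswith("List the sub-steps"):
        return "gather the data\nanalyze the data\nwrite a summary"
    return f"[result of: {prompt.splitlines()[-1]}]"

def decompose_and_solve(task: str) -> str:
    plan = llm(f"List the sub-steps needed to: {task}")           # 1) plan
    results = []
    for step in plan.splitlines():                                # 2) execute each sub-step
        results.append(llm(f"Context: {results}\nDo this step: {step}"))
    return llm(f"Combine into a final answer: {results}")         # 3) synthesize

print(decompose_and_solve("produce a quarterly sales report"))
```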
Scratchpad: a temporary reasoning space, often hidden from the end user, where the model works through intermediate steps.
Prompt sensitivity: small changes to a prompt can cause disproportionately large changes in the output.
Language-conditioned robot control: controlling robots through natural-language instructions.
AlphaFold: a deep learning system for protein structure prediction.