Results for "attention weights"
A single attention mechanism within multi-head attention.
Attention mechanisms that reduce quadratic complexity.
Mechanism that computes context-aware mixtures of representations; scales well and captures long-range dependencies.
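A minimal NumPy sketch of the context-aware mixture this entry describes (scaled dot-product attention; all names and shapes here are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Illustrative sketch: output rows are attention-weighted mixtures of V."""
    d_k = Q.shape[-1]
    # Query-key similarity scores, scaled to keep the softmax well-behaved.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over keys yields the attention weights (each row sums to 1).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: context-aware mixture of the value vectors.
    return weights @ V, weights

Q = np.random.randn(4, 8)  # 4 query positions, dimension 8
K = np.random.randn(4, 8)
V = np.random.randn(4, 8)
out, w = scaled_dot_product_attention(Q, K, V)
```

Every output position mixes information from every input position, which is how long-range dependencies are captured in a single step.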
Attention where queries/keys/values come from the same sequence, enabling token-to-token interactions.
GNN using attention to weight neighbor contributions dynamically.
Attention where queries come from one modality and keys/values from another (e.g., text tokens attending to image features).
Architecture based on self-attention and feedforward layers; foundation of modern LLMs and many multimodal models.
Models whose weights are publicly available.
Allows the model to attend to information from different representation subspaces simultaneously.
Prevents attention to future tokens during training/inference.
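This masking can be sketched in NumPy: disallowed (future) positions get a score of negative infinity, so the softmax assigns them exactly zero weight (a minimal sketch; names are illustrative):

```python
import numpy as np

def causal_mask(seq_len):
    # Lower-triangular mask: position i may attend only to positions <= i.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_softmax(scores, mask):
    # Masked-out entries become -inf, so exp(-inf) = 0 attention weight.
    scores = np.where(mask, scores, -np.inf)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.random.randn(5, 5)
w = masked_softmax(scores, causal_mask(5))
```

The strictly upper-triangular part of `w` is all zeros: no token attends to its future.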
Methods to set starting weights to preserve signal/gradient scales across layers.
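Two standard schemes can be sketched as follows (a minimal sketch; the seed and layer sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(fan_in, fan_out):
    # Glorot/Xavier: variance ~ 2 / (fan_in + fan_out); keeps activation
    # and gradient scales roughly constant for tanh-like units.
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    # He/Kaiming: variance ~ 2 / fan_in; tuned for ReLU activations.
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W = he_init(512, 512)
```

The scale depends on layer width precisely so that signal variance neither explodes nor vanishes as depth grows.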
Removing weights or neurons to shrink models and improve efficiency; can be structured or unstructured.
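Unstructured magnitude pruning, the simplest variant, can be sketched like this (illustrative names; structured pruning would instead remove whole rows, columns, or neurons):

```python
import numpy as np

def magnitude_prune(W, sparsity=0.5):
    """Zero out the smallest-|magnitude| weights until `sparsity`
    fraction of entries are removed (unstructured pruning sketch)."""
    k = int(W.size * sparsity)
    if k == 0:
        return W.copy()
    threshold = np.sort(np.abs(W), axis=None)[k - 1]
    return np.where(np.abs(W) <= threshold, 0.0, W)

W = np.random.randn(8, 8)
Wp = magnitude_prune(W, sparsity=0.75)
```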
Techniques to handle longer documents without quadratic cost.
Using delimiter markers (e.g., special tokens or tags) to separate context segments so the model treats them as distinct.
Techniques that fine-tune small additional components rather than all weights to reduce compute and storage.
Reducing numeric precision of weights/activations to speed inference and reduce memory with acceptable accuracy loss.
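Symmetric per-tensor int8 quantization, one of the simplest schemes, can be sketched as (names are illustrative; production schemes are often per-channel and calibrated):

```python
import numpy as np

def quantize_int8(W):
    """Map float weights into int8 with a single shared scale factor."""
    scale = np.abs(W).max() / 127.0
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Approximate reconstruction; error is bounded by scale / 2 per entry.
    return q.astype(np.float32) * scale

W = np.random.randn(64, 64).astype(np.float32)
q, scale = quantize_int8(W)
W_hat = dequantize(q, scale)
```

Storage drops 4x (int8 vs. float32) while the worst-case per-weight error stays within half a quantization step.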
Using same parameters across different parts of a model.
Models accessible only via service APIs.
Updating a pretrained model’s weights on task-specific data to improve performance or adapt style/behavior.
Injects sequence order into Transformers, since attention alone is permutation-invariant.
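The sinusoidal scheme from the original Transformer is one way to inject order; a minimal sketch (sizes are illustrative):

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Even dims use sin, odd dims use cos, with geometrically spaced
    frequencies, so each position gets a unique pattern."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe  # added to (or concatenated with) token embeddings

pe = sinusoidal_positions(16, 32)
```

Learned position embeddings and rotary embeddings (RoPE) are common alternatives serving the same purpose.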
Studying internal mechanisms or input influence on outputs (e.g., saliency maps, SHAP, attention analysis).
Caches past key/value projections so autoregressive decoding only computes attention inputs for the newest token.
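The caching pattern can be sketched in NumPy: at each decoding step, only the new token is projected, and its key/value rows are appended to the cache (a minimal single-head sketch; names are illustrative):

```python
import numpy as np

def attend(q, K, V):
    # One query attending over all cached keys/values.
    scores = q @ K.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

d = 8
W_k, W_v = np.random.randn(d, d), np.random.randn(d, d)
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))

for step in range(5):
    x = np.random.randn(d)                  # current token's hidden state
    # Project only the NEW token; past projections are reused from the cache.
    K_cache = np.vstack([K_cache, x @ W_k])
    V_cache = np.vstack([V_cache, x @ W_v])
    out = attend(x, K_cache, V_cache)
```

Without the cache, every step would re-project all previous tokens, making decoding quadratic in practice rather than linear per step.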
Transformer applied to image patches.
Reusing knowledge from a source task/domain to improve learning on a target task/domain, typically via pretrained models.
A parameterized mapping from inputs to outputs; includes architecture + learned parameters.
The learned numeric values of a model adjusted during training to minimize a loss function.
Controls the size of parameter updates; too high causes divergence, too low trains slowly or gets stuck.
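The divergence/slowness trade-off shows up even on a one-dimensional quadratic (a toy sketch; the function and rates are illustrative):

```python
def gradient_descent(lr, steps=50, w=0.0):
    # Minimize f(w) = (w - 3)^2; the gradient is 2 * (w - 3).
    # Each step multiplies the error (w - 3) by the factor (1 - 2 * lr),
    # so |1 - 2 * lr| > 1 means the iterates diverge.
    for _ in range(steps):
        w -= lr * 2 * (w - 3)
    return w

good = gradient_descent(lr=0.1)    # converges close to the minimum at 3
slow = gradient_descent(lr=0.001)  # moving, but still far from 3 after 50 steps
bad = gradient_descent(lr=1.1)     # overshoots more each step and blows up
```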
A parameterized function composed of interconnected units organized in layers with nonlinear activations.
Gradients grow too large, causing divergence; mitigated by gradient clipping, normalization, and careful initialization.
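Clipping by global norm, the most common mitigation, can be sketched as follows (illustrative names; frameworks provide built-in equivalents):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale all gradient tensors together so their combined L2 norm
    does not exceed max_norm; direction is preserved, magnitude capped."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total <= max_norm:
        return grads
    scale = max_norm / total
    return [g * scale for g in grads]

grads = [np.full((3, 3), 10.0), np.full((3,), 10.0)]  # huge gradients
clipped = clip_by_global_norm(grads, max_norm=1.0)
```

Clipping the global norm (rather than each tensor separately) keeps the relative scale of the per-layer gradients intact.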
Logging hyperparameters, code versions, data snapshots, and results to reproduce and compare experiments.