Representation Learning
Intermediate

Automatically learning useful internal features (latent variables) that capture salient structure for downstream tasks.
Representation learning is like teaching a computer to understand the essence of data without someone explaining every detail. Imagine trying to recognize different animals in pictures. Instead of manually pointing out features like fur color or size, a representation learning model can automatically discover, from the images themselves, which features distinguish one animal from another.
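As a concrete (if simplistic) illustration, PCA can be viewed as the linear special case of representation learning: it finds a compact representation of raw data without any labels or hand-designed features. A minimal NumPy sketch:

```python
import numpy as np

# Toy example: 200 points in 5-D that actually live near a 2-D subspace.
# PCA (via SVD) recovers a compact 2-D representation with no supervision.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))          # hidden 2-D structure
mixing = rng.normal(size=(2, 5))            # embed it into 5-D
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

Xc = X - X.mean(axis=0)                     # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T                           # learned 2-D representation

# The top two components explain almost all the variance in this toy data.
explained = (S[:2] ** 2).sum() / (S ** 2).sum()
print(round(explained, 3))
```

Modern deep models learn nonlinear representations the same way in spirit: find coordinates in which the data's structure is simple.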
Related terms

Weight Initialization: Methods to set starting weights to preserve signal/gradient scales across layers.
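A minimal NumPy sketch of why this matters, using He initialization (one common scheme for ReLU networks): with the right scale, activations neither explode nor vanish across many layers.

```python
import numpy as np

# He (Kaiming) initialization: weights ~ N(0, 2/fan_in) roughly preserve
# activation variance through a linear layer followed by ReLU.
rng = np.random.default_rng(0)

def he_layer(x, fan_in, fan_out, rng):
    W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    return np.maximum(x @ W, 0.0)           # linear layer + ReLU

h = rng.normal(size=(1024, 512))            # inputs with unit variance
for _ in range(10):                          # 10 stacked layers
    h = he_layer(h, 512, 512, rng)

# The activation scale stays on the order of the input's scale.
print(round(float(h.std()), 2))
```

With a naive scale such as N(0, 1) per weight, the same 10-layer stack would blow up by orders of magnitude.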
Activation Functions: Nonlinear functions enabling networks to approximate complex mappings; ReLU variants dominate modern deep learning.
Dropout: Randomly zeroing activations during training to reduce co-adaptation and overfitting.
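A minimal sketch of "inverted" dropout, the variant most frameworks use: survivors are rescaled by 1/(1-p) at training time so the expected activation is unchanged, and inference needs no correction.

```python
import numpy as np

# Inverted dropout: zero each activation with probability p during training
# and rescale survivors by 1/(1-p); at inference the layer is the identity.
def dropout(a, p, rng, training=True):
    if not training or p == 0.0:
        return a
    mask = rng.random(a.shape) >= p          # keep with probability 1 - p
    return a * mask / (1.0 - p)

rng = np.random.default_rng(0)
acts = np.ones((1000, 100))
out = dropout(acts, p=0.5, rng=rng)

print(round(float(out.mean()), 2))           # expectation preserved, ~1.0
```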
Convolutional Neural Networks (CNNs): Networks using convolution operations with weight sharing and locality, effective for images and signals.
Vector Database: A datastore optimized for similarity search over embeddings, enabling semantic retrieval at scale.
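The core operation inside a vector database can be sketched as brute-force cosine similarity over a matrix of stored embeddings; production systems layer approximate indexes (e.g., HNSW or IVF) on top to make this scale.

```python
import numpy as np

# Brute-force semantic retrieval: rank stored embeddings by cosine
# similarity to a query embedding and return the top-k matches.
def top_k(query, store, k=3):
    store_n = store / np.linalg.norm(store, axis=1, keepdims=True)
    q_n = query / np.linalg.norm(query)
    sims = store_n @ q_n                     # cosine similarity per row
    idx = np.argsort(-sims)[:k]              # indices of the k best matches
    return idx, sims[idx]

rng = np.random.default_rng(0)
store = rng.normal(size=(1000, 64))          # 1000 stored embeddings
query = store[42] + 0.01 * rng.normal(size=64)   # query near item 42

idx, sims = top_k(query, store)
print(int(idx[0]))                           # item 42 is the closest match
```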
Recurrent Neural Networks (RNNs): Networks with recurrent connections for sequences; largely supplanted by Transformers for many tasks.
Instruction Tuning (Supervised Fine-Tuning): Fine-tuning on (prompt, response) pairs to align a model with instruction-following behaviors.
Long Short-Term Memory (LSTM): An RNN variant using gates to mitigate vanishing gradients and capture longer context.
AI Alignment: Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
Transformer: An architecture based on self-attention and feedforward layers; the foundation of modern LLMs and many multimodal models.
Content Moderation (Guardrails): Automated detection/prevention of disallowed outputs (toxicity, self-harm, illegal instructions, etc.).
Large Language Model (LLM): A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
Explainability: Techniques to understand model decisions (global or local), important in high-stakes and regulated settings.
Interpretability: Studying internal mechanisms or input influence on outputs (e.g., saliency maps, SHAP, attention analysis).
LIME: A local surrogate explanation method approximating model behavior near a specific input.
Causal Inference: A framework for reasoning about cause-effect relationships beyond correlation, often using structural assumptions and experiments.
Encryption in Transit and at Rest: Protecting data during network transfer and while stored; essential for ML pipelines handling sensitive data.
Confounder: A hidden variable that influences both cause and effect, biasing naive estimates of causal impact.
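Confounding can be demonstrated in a few lines of NumPy: a hidden variable z drives both x and y while x has no causal effect on y at all, yet a naive regression of y on x finds a strong slope. Adjusting for z removes the bias.

```python
import numpy as np

# Simulated confounding: z causes both x and y; x does NOT cause y.
rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)                       # hidden confounder
x = z + 0.5 * rng.normal(size=n)             # "treatment", caused by z
y = 2.0 * z + 0.5 * rng.normal(size=n)       # outcome, caused only by z

naive = np.polyfit(x, y, 1)[0]               # biased: picks up z's effect
resid_x = x - np.polyfit(z, x, 1)[0] * z     # remove z's influence from x
resid_y = y - np.polyfit(z, y, 1)[0] * z     # remove z's influence from y
adjusted = np.polyfit(resid_x, resid_y, 1)[0]

print(round(naive, 2), round(adjusted, 2))   # large slope vs. near zero
```

Residualizing on z here is the regression-adjustment view of "controlling for" the confounder.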
Data Governance: Processes and controls for data quality, access, lineage, retention, and compliance across the AI lifecycle.
Synthetic Data: Artificially created data used to train/test models; helpful for privacy and coverage, risky if unrealistic.
Model Registry: A central system to store model versions, metadata, approvals, and deployment state.
Model Audit: Systematic review of model/data processes to ensure performance, fairness, security, and policy compliance.
Experiment Tracking: Logging hyperparameters, code versions, data snapshots, and results to reproduce and compare experiments.
Reproducibility: The ability to replicate results given the same code and data; harder with distributed training and nondeterministic ops.
Latency: Time from request to response; critical for real-time inference and UX.
Throughput: How many requests or tokens can be processed per unit time; affects scalability and cost.
Compute: Hardware resources used for training/inference; constrained by memory bandwidth, FLOPs, and parallelism.
Quantization: Reducing the numeric precision of weights/activations to speed inference and reduce memory, with acceptable accuracy loss.
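A minimal sketch of one common scheme, symmetric int8 post-training quantization: map float weights onto integers in [-127, 127] with a single scale factor, then dequantize. The round trip loses a little precision but cuts storage 4x versus float32.

```python
import numpy as np

# Symmetric int8 quantization: one scale per tensor, zero-point fixed at 0.
def quantize(w):
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
q, scale = quantize(w)
w_hat = q.astype(np.float32) * scale         # dequantized approximation

# Rounding error is bounded by half a quantization step.
max_err = float(np.abs(w - w_hat).max())
print(max_err <= scale / 2 + 1e-6)
```

Real deployments often refine this with per-channel scales, asymmetric zero-points, or quantization-aware training.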
Softmax: Converts logits to probabilities by exponentiation and normalization; common in classification and language models.
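A minimal numerically stable sketch: subtracting the maximum logit before exponentiating changes nothing mathematically but prevents overflow for large logits.

```python
import numpy as np

# Stable softmax: shift by the max logit, exponentiate, normalize.
def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs.round(3))                        # sums to 1; largest logit wins
```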
Pruning: Removing weights or neurons to shrink models and improve efficiency; can be structured or unstructured.
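A minimal sketch of unstructured magnitude pruning, one simple criterion: zero out the fraction of weights with the smallest absolute values. Structured pruning would instead remove whole rows or columns (neurons or channels).

```python
import numpy as np

# Magnitude pruning: keep only weights above the sparsity-quantile threshold.
def prune(w, sparsity=0.9):
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= threshold, w, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))
w_pruned = prune(w, sparsity=0.9)

print(round(float((w_pruned == 0).mean()), 2))   # ~0.90 of weights removed
```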