Results for "accessible weights"
Open-weight models: Models whose weights are publicly available for download and local use.
API-only models: Models accessible only via service APIs; the weights themselves are not released.
Weight initialization: Methods for setting starting weights so that signal and gradient scales are preserved across layers (e.g., Xavier/Glorot or He initialization).
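As an illustration, a minimal pure-Python sketch of He initialization, which scales the standard deviation by the layer's fan-in (function names here are ours, not from any particular library):

```python
import math
import random

def he_std(fan_in: int) -> float:
    """Standard deviation for He (Kaiming) initialization: sqrt(2 / fan_in)."""
    return math.sqrt(2.0 / fan_in)

def init_layer(fan_in: int, fan_out: int, seed: int = 0) -> list[list[float]]:
    """Draw a fan_out x fan_in weight matrix from N(0, he_std(fan_in)^2)."""
    rng = random.Random(seed)
    std = he_std(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

# For a layer with 512 inputs, He init uses std = sqrt(2/512) = 0.0625.
```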
Pruning: Removing weights or neurons to shrink a model and improve efficiency; can be structured (whole channels, filters, or heads) or unstructured (individual weights).
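A toy sketch of unstructured magnitude pruning, the simplest variant: zero out the smallest-magnitude weights (ties at the threshold may prune slightly more than requested; illustrative only):

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the smallest-magnitude weights until roughly `sparsity` of them are zero."""
    k = int(len(weights) * sparsity)          # number of weights to remove
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

# Prune half of a small weight vector: the three smallest by magnitude become zero.
pruned = magnitude_prune([0.9, -0.05, 0.4, 0.01, -0.7, 0.2], sparsity=0.5)
```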
Model cards: Standardized documentation describing a model's intended use, performance, limitations, training data, and ethical considerations.
Parameter-efficient fine-tuning (PEFT): Techniques that fine-tune small additional components (e.g., adapters or low-rank updates) rather than all weights, reducing compute and storage.
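The savings are easy to see by counting trainable parameters. A back-of-the-envelope comparison for a LoRA-style low-rank update (dimensions chosen for illustration):

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """All weights of a d_out x d_in layer are trainable."""
    return d_in * d_out

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A LoRA-style update W + B @ A trains only A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank

full = full_finetune_params(4096, 4096)   # 16,777,216 trainable weights
lora = lora_params(4096, 4096, rank=8)    # 65,536 trainable weights, 256x fewer
```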
Quantization: Reducing the numeric precision of weights and activations (e.g., float32 to int8) to speed up inference and cut memory use, at an acceptable cost in accuracy.
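A minimal sketch of symmetric int8 quantization, assuming a single per-tensor scale (real schemes add per-channel scales, zero points, and calibration):

```python
def quantize_int8(xs: list[float]) -> tuple[list[int], float]:
    """Symmetric int8 quantization: q = round(x / scale), scale = max|x| / 127."""
    scale = max(abs(x) for x in xs) / 127.0
    return [round(x / scale) for x in xs], scale

def dequantize(qs: list[int], scale: float) -> list[float]:
    return [q * scale for q in qs]

qs, scale = quantize_int8([0.5, -1.27, 0.0, 0.91])
xs_hat = dequantize(qs, scale)
# Each reconstructed value lies within one quantization step (scale) of the original.
```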
Weight sharing: Using the same parameters in different parts of a model, as with convolutional filters or tied input-embedding and output-projection matrices.
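In code, tying often just means two names referring to one parameter object, so an update through either name affects both roles. A trivial illustration:

```python
# Tying: the output projection reuses the embedding matrix (same object, not a copy).
embedding = [[0.1, 0.2], [0.3, 0.4]]   # vocab x dim
output_projection = embedding           # shared parameters

embedding[0][0] = 0.9
# The change is visible through both names: one set of weights, two roles.
```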
Fine-tuning: Updating a pretrained model's weights on task-specific data to improve performance or adapt style and behavior.
Transfer learning: Reusing knowledge from a source task or domain to improve learning on a target task or domain, typically by starting from a pretrained model.
Model: A parameterized mapping from inputs to outputs; comprises an architecture plus learned parameters.
Weights: The learned numeric values of a model, adjusted during training to minimize a loss function.
Learning rate: Controls the size of parameter updates; too high and training diverges, too low and it trains slowly or gets stuck in poor regions.
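Both failure modes can be seen on the simplest possible objective, f(x) = x^2 (a toy sketch; the step counts and rates are arbitrary):

```python
def descend(lr: float, steps: int = 20, x0: float = 1.0) -> float:
    """Minimize f(x) = x^2 by gradient descent: x <- x - lr * f'(x), with f'(x) = 2x."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x

small = descend(lr=0.1)   # each step shrinks x by 0.8x: converges toward 0
large = descend(lr=1.5)   # each step maps x to -2x: |x| doubles and diverges
```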
Neural network: A parameterized function composed of interconnected units organized in layers with nonlinear activations.
Exploding gradients: Gradients grow too large and cause training to diverge; mitigated by gradient clipping, normalization, and careful initialization.
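Clipping by global norm is the most common mitigation: if the gradient's L2 norm exceeds a threshold, rescale it to that threshold. A minimal sketch:

```python
import math

def clip_by_norm(grad: list[float], max_norm: float) -> list[float]:
    """Rescale the gradient vector if its L2 norm exceeds max_norm; else leave it alone."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    return [g * max_norm / norm for g in grad]

clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)   # norm 5.0, rescaled to norm 1.0
```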
Experiment tracking: Logging hyperparameters, code versions, data snapshots, and results so that experiments can be reproduced and compared.
Causal masking: Prevents attention to future tokens, so each position attends only to itself and earlier positions, during both training and inference.
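Concretely, the mask is lower-triangular, and disallowed positions are set to -inf so they receive zero weight after the softmax. A minimal sketch:

```python
NEG_INF = float("-inf")

def causal_mask(n: int) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def mask_scores(scores: list[list[float]], mask: list[list[bool]]) -> list[list[float]]:
    """Set disallowed (future) positions to -inf before the softmax."""
    return [[s if ok else NEG_INF for s, ok in zip(row, row_mask)]
            for row, row_mask in zip(scores, mask)]

mask = causal_mask(3)
# [[True, False, False],
#  [True, True,  False],
#  [True, True,  True ]]
```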
Watermarking: Embedding detectable signals in a model or its outputs to prove ownership.
Restricted Boltzmann Machine (RBM): A simplified Boltzmann Machine with a bipartite structure: visible and hidden units connect across layers but not within their own layer.
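The bipartite restriction shows up directly in the energy function, which couples visible units only to hidden units. A small illustrative computation (variable names and values are ours):

```python
def rbm_energy(v, h, W, a, b):
    """Energy of an RBM configuration: E(v, h) = -a.v - b.h - v^T W h.
    W[i][j] couples visible unit i to hidden unit j; there are no
    visible-visible or hidden-hidden terms (the bipartite restriction)."""
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return (-sum(ai * vi for ai, vi in zip(a, v))
            - sum(bj * hj for bj, hj in zip(b, h))
            - interaction)

# Two binary visible units and two binary hidden units.
E = rbm_energy(v=[1, 0], h=[0, 1],
               W=[[0.5, -1.0], [0.2, 0.3]],
               a=[0.1, 0.1], b=[-0.2, 0.4])
```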
Conditional Random Field (CRF): A probabilistic graphical model for structured prediction, such as labeling sequences.
Particle filter: A sequential Monte Carlo method for state estimation in dynamical systems.
Catastrophic forgetting: Loss of previously learned knowledge when a model is trained on new tasks.
Delimiters: Using explicit markers to isolate segments of context, e.g., separating instructions from user-supplied text in a prompt.
Knowledge distillation: Training a smaller "student" model to mimic a larger "teacher," often improving efficiency while retaining much of the performance.
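A key ingredient is the temperature-softened softmax: raising the temperature flattens the teacher's output distribution so the student can learn from the relative probabilities of wrong classes, not just the top label. A minimal sketch (logits chosen for illustration):

```python
import math

def softmax_t(logits: list[float], temperature: float) -> list[float]:
    """Softmax with temperature; T > 1 softens the distribution, T = 1 recovers it."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

teacher_logits = [4.0, 1.0, 0.0]
hard = softmax_t(teacher_logits, temperature=1.0)   # sharply peaked on class 0
soft = softmax_t(teacher_logits, temperature=4.0)   # softened targets for the student
```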