Results for "efficiency"
The set of tokens a model can represent; impacts efficiency, multilinguality, and handling of rare strings.
Removing weights or neurons to shrink models and improve efficiency; can be structured or unstructured.
Number of samples per gradient update; impacts compute efficiency, generalization, and stability.
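A minimal sketch of how batch size enters training: each update averages gradients over `batch_size` samples. The data, linear model, and learning rate below are illustrative toy choices.

```python
import numpy as np

# Mini-batch SGD on a toy linear-regression problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
batch_size = 32          # samples per gradient update
lr = 0.1

for epoch in range(50):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        err = X[b] @ w - y[b]
        g = X[b].T @ err / len(b)   # gradient averaged over the mini-batch
        w -= lr * g

print(np.round(w, 3))  # close to true_w
```

Larger batches give lower-variance gradients per step but fewer updates per epoch; this compute/stability tradeoff is the point of the definition above.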
Built-in assumptions guiding learning efficiency and generalization.
Training one model on multiple tasks simultaneously to improve generalization through shared structure.
Iterative method that updates parameters in the direction of the negative gradient to minimize loss.
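A minimal sketch of the update rule, on a toy quadratic loss f(w) = (w - 3)^2 whose gradient is 2(w - 3). The learning rate and starting point are illustrative choices.

```python
def grad(w):
    # Gradient of f(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)

w = 0.0
lr = 0.1
for _ in range(100):
    w -= lr * grad(w)  # step in the direction of the negative gradient

print(round(w, 4))  # -> 3.0, the minimizer
```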
Maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
Achieving task performance by providing a small number of examples inside the prompt without weight updates.
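A minimal sketch of few-shot prompting: the task examples live inside the prompt itself, and no weights are updated. The sentiment examples and template are illustrative, not tied to any particular model API.

```python
# Build a few-shot classification prompt from labeled examples.
examples = [
    ("great movie, loved it", "positive"),
    ("waste of two hours", "negative"),
]
query = "surprisingly touching ending"

prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"  # model completes the final label

print(prompt)
```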
Selecting the most informative samples to label (e.g., uncertainty sampling) to reduce labeling cost.
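A minimal sketch of uncertainty sampling: from a pool of unlabeled points, select the ones whose predicted class probabilities have the highest entropy. The probability matrix is hand-made for illustration.

```python
import numpy as np

probs = np.array([
    [0.98, 0.02],   # confident -> low entropy
    [0.55, 0.45],   # uncertain -> high entropy
    [0.70, 0.30],
    [0.51, 0.49],   # most uncertain
])

entropy = -np.sum(probs * np.log(probs), axis=1)
k = 2
to_label = np.argsort(entropy)[-k:]   # indices of the k most uncertain samples

print(sorted(to_label.tolist()))  # -> [1, 3]
```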
Hardware resources used for training/inference; constrained by memory bandwidth, FLOPs, and parallelism.
PEFT method injecting trainable low-rank matrices into layers, enabling efficient fine-tuning.
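A minimal sketch of a LoRA-style layer: the frozen weight W is augmented with a trainable low-rank product B @ A of rank r, scaled by alpha / r. Shapes and values are illustrative.

```python
import numpy as np

d_out, d_in, r = 8, 8, 2
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01     # trainable down-projection
B = np.zeros((d_out, r))                  # trainable up-projection (init 0)
alpha = 4.0

def forward(x):
    # Base path plus low-rank adapter path; because B is zero at init,
    # the adapted model matches the pretrained one exactly.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(forward(x), W @ x)     # no change before training

# Trainable parameters: r * (d_in + d_out) = 32, vs 64 in W itself.
print(A.size + B.size)  # -> 32
```

Only A and B are trained, which is where the efficiency comes from: the parameter count scales with r, not with d_in * d_out.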
Reducing numeric precision of weights/activations to speed inference and reduce memory with acceptable accuracy loss.
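A minimal sketch of symmetric int8 post-training quantization of a weight tensor: scale by max|w| / 127, round to int8 for storage, and dequantize for use.

```python
import numpy as np

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)

scale = np.abs(w).max() / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)  # stored as int8
w_hat = q.astype(np.float32) * scale                          # dequantized

# Round-to-nearest bounds the per-element error by half a step.
max_err = np.abs(w - w_hat).max()
print(max_err <= scale / 2 + 1e-7)  # -> True
```

The int8 tensor takes a quarter of the float32 memory; the accuracy cost is the bounded rounding error checked above.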
Converting spoken audio into text, often using encoder-decoder or transducer architectures.
A hypothesis class is PAC-learnable if a learner can, with high probability, output an approximately correct hypothesis from a finite number of samples.
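The definition has a standard quantitative form: for a finite hypothesis class $\mathcal{H}$ in the realizable case, a sample size of

```latex
m \;\ge\; \frac{1}{\epsilon}\left(\ln |\mathcal{H}| + \ln \frac{1}{\delta}\right)
```

suffices for the learner to output, with probability at least $1 - \delta$, a hypothesis with error at most $\epsilon$.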
A theoretical framework analyzing what classes of functions can be learned, how efficiently, and with what guarantees.
Measures how much information an observable random variable carries about unknown parameters.
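For a single scalar parameter this is the Fisher information, which (under standard regularity conditions) has two equivalent forms:

```latex
\mathcal{I}(\theta)
\;=\; \mathbb{E}_{x \sim p(x;\theta)}\!\left[\left(\frac{\partial}{\partial \theta} \log p(x;\theta)\right)^{2}\right]
\;=\; -\,\mathbb{E}_{x \sim p(x;\theta)}\!\left[\frac{\partial^{2}}{\partial \theta^{2}} \log p(x;\theta)\right]
```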
Early architecture using learned gates for skip connections.
A narrow hidden layer forcing compact representations.
Using same parameters across different parts of a model.
Routes inputs to subsets of parameters for scalable capacity.
Chooses which experts process each token.
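A minimal sketch of top-k gating: a router scores every expert, keeps the top k, and mixes their outputs with renormalized softmax weights. The experts here are toy linear maps.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 4, 4, 2

experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # toy experts
W_gate = rng.normal(size=(n_experts, d))                        # router weights

def moe(x):
    scores = W_gate @ x
    top = np.argsort(scores)[-k:]                # indices of top-k experts
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                                 # renormalized gate weights
    # Only the k selected experts run; the rest are skipped entirely,
    # which is what decouples parameter count from per-token compute.
    return sum(wi * (experts[i] @ x) for wi, i in zip(w, top))

y = moe(rng.normal(size=d))
print(y.shape)  # -> (4,)
```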
All possible configurations an agent may encounter.
Balancing learning new behaviors vs exploiting known rewards.
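A minimal sketch of this tradeoff via epsilon-greedy on a toy bandit: with probability eps the agent tries a random arm (explore), otherwise it plays the arm with the best running estimate (exploit). The arm reward means are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.8]          # unknown to the agent
eps = 0.1
counts = np.zeros(3)
values = np.zeros(3)                  # running mean reward per arm

for _ in range(2000):
    if rng.random() < eps:
        a = int(rng.integers(3))      # explore: random arm
    else:
        a = int(np.argmax(values))    # exploit: current best estimate
    r = float(rng.random() < true_means[a])   # Bernoulli reward
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]  # incremental mean update

print(int(np.argmax(counts)))  # the truly best arm should dominate play
```

With eps = 0 the agent can lock onto a bad early arm forever; with eps = 1 it never exploits what it has learned. The intermediate value trades the two off.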
Separates planning from execution in agent architectures.
Models trained to decide when to call tools.
Extending agents with long-term memory stores.
Controls amount of noise added at each diffusion step.
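A minimal sketch of a DDPM-style linear schedule: beta_t sets the noise added at step t, and the cumulative product of alpha_t = 1 - beta_t gives the closed-form noising of a clean sample at any step. The endpoints 1e-4 and 0.02 are a common illustrative choice.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # noise added per step
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)        # running product of alphas up to t

# Closed form: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
x0 = np.ones(4)
t = 500
noise = np.random.default_rng(0).normal(size=4)
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

print(float(alpha_bar[-1]))  # near 0: by the last step the signal is fully noised
```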
Diffusion performed in latent space for efficiency.
Exact likelihood generative models using invertible transforms.
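A minimal sketch of the change-of-variables rule behind such models: an invertible affine map z = (x - mu) / sigma sends data to a standard normal base distribution, and the exact log-likelihood adds the log |det Jacobian| of the transform. Here mu and sigma are fixed toy parameters, not learned.

```python
import numpy as np

mu, sigma = 1.0, 2.0

def log_prob(x):
    z = (x - mu) / sigma                              # forward transform
    log_base = -0.5 * (z**2 + np.log(2 * np.pi))      # standard normal log-density
    log_det = -np.log(sigma)                          # log |dz/dx|
    return log_base + log_det                         # exact, not a bound

# Sanity check: this matches the density of N(mu, sigma^2) computed directly.
x = 0.3
direct = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
print(bool(np.isclose(log_prob(x), direct)))  # -> True
```

Stacking many such invertible layers (with tractable Jacobians) yields an expressive model whose likelihood is still exact, unlike VAE or diffusion bounds.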
Centralized repository for curated features.