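Stores the attention keys and values computed for earlier tokens, so each autoregressive decoding step only computes them for the newest token instead of reprocessing the whole prefix.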
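This technique is commonly called a key-value (KV) cache. The sketch below uses a single attention head on tiny 2-dimensional vectors. It shows that reusing cached keys and values gives the same outputs as recomputing them for the full prefix at every step. The weights `W_Q`, `W_K`, `W_V` and all helper names are hypothetical toy values, not taken from any real model or library.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Vector) -> Vector:
    # Multiply matrix m (given as a list of rows) by vector v.
    return [dot(row, v) for row in m]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    # Scaled dot-product attention of one query over all given positions.
    scale = math.sqrt(len(q))
    scores = [dot(q, k) / scale for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[i] for w, v in zip(weights, values)) / total
        for i in range(len(values[0]))
    ]


# Hypothetical toy projection weights for a single attention head.
W_Q: Matrix = [[1.0, 0.0], [0.0, 1.0]]
W_K: Matrix = [[0.5, 0.1], [0.2, 0.9]]
W_V: Matrix = [[1.0, 2.0], [0.0, 1.0]]


class KVCache:
    """Keeps the key and value vectors of every token processed so far."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def append(self, k: Vector, v: Vector) -> None:
        self.keys.append(k)
        self.values.append(v)


def decode_step_cached(x: Vector, cache: KVCache) -> Vector:
    # Only the newest token is projected; earlier keys/values come from the cache.
    cache.append(matvec(W_K, x), matvec(W_V, x))
    return attend(matvec(W_Q, x), cache.keys, cache.values)


def decode_step_uncached(prefix: List[Vector]) -> Vector:
    # Without a cache, keys and values are recomputed for the whole prefix each step.
    keys = [matvec(W_K, x) for x in prefix]
    values = [matvec(W_V, x) for x in prefix]
    return attend(matvec(W_Q, prefix[-1]), keys, values)


inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cache = KVCache()
cached = [decode_step_cached(x, cache) for x in inputs]
uncached = [decode_step_uncached(inputs[: t + 1]) for t in range(len(inputs))]
```

Both paths produce the same outputs. The cached path spends memory (one key and one value vector per processed token) to avoid recomputing the prefix projections at every step.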
Generates a sequence one token at a time, conditioning each new token on all of the tokens that precede it.
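A minimal sketch of the generation loop follows. It assumes greedy selection of the highest-scoring token; sampling from the scores is an equally common alternative. `toy_model`, `EOS`, and `VOCAB_SIZE` are hypothetical stand-ins for a real model and tokenizer. The key point is that each chosen token is appended to the context and fed back in on the next step.

```python
from typing import Callable, Dict, List

EOS = 0  # hypothetical end-of-sequence token id
VOCAB_SIZE = 5


def toy_model(prefix: List[int]) -> Dict[int, float]:
    # Hypothetical stand-in for a language model: returns a score per token id,
    # computed from the entire prefix (here, its sum and its length).
    if len(prefix) >= 4:
        preferred = EOS
    else:
        preferred = sum(prefix) % 4 + 1
    return {t: (1.0 if t == preferred else 0.0) for t in range(VOCAB_SIZE)}


def generate(
    model: Callable[[List[int]], Dict[int, float]],
    prompt: List[int],
    max_new_tokens: int,
    eos: int,
) -> List[int]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = model(tokens)  # condition on every token so far
        next_token = max(scores, key=scores.__getitem__)  # greedy choice
        tokens.append(next_token)  # the new token becomes part of the context
        if next_token == eos:
            break
    return tokens


print(generate(toy_model, [1], max_new_tokens=10, eos=EOS))  # [1, 2, 4, 4, 0]
```

Generation stops either when the end-of-sequence token is produced or when `max_new_tokens` is reached, whichever comes first.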