Key-Value caching
What is Key-Value caching?
Key-value (KV) caching is an optimization technique that speeds up inference in Large Language Models (LLMs) by reusing previously computed states, specifically the attention keys and values of tokens the model has already processed. In simple terms, it’s a way for the model to “remember” previous calculations instead of recomputing them for every new token it generates.
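To make the idea concrete, here is a minimal sketch of a single attention head decoding one token at a time, with and without a cache. It is illustrative only: the names `KVCache`, `decode_with_cache`, and `decode_without_cache` are hypothetical, the weights are plain NumPy matrices, and real models add multiple heads, layers, and batching.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attend(q, K, V):
    # Scaled dot-product attention for one query over all positions seen so far.
    scores = K @ q / np.sqrt(q.shape[-1])  # shape (t,)
    return softmax(scores) @ V             # shape (d,)

class KVCache:
    """Hypothetical cache holding the keys and values of earlier tokens."""
    def __init__(self, d_model):
        self.K = np.empty((0, d_model))
        self.V = np.empty((0, d_model))

    def append(self, k, v):
        # Store the new token's key and value as one extra row each.
        self.K = np.vstack([self.K, k])
        self.V = np.vstack([self.V, v])

def decode_with_cache(X, Wq, Wk, Wv):
    # Each step projects only the newest token; older keys/values are reused.
    cache = KVCache(Wk.shape[1])
    outputs = []
    for x in X:
        cache.append(x @ Wk, x @ Wv)
        outputs.append(attend(x @ Wq, cache.K, cache.V))
    return np.stack(outputs)

def decode_without_cache(X, Wq, Wk, Wv):
    # Each step re-projects every token seen so far: redundant work.
    outputs = []
    for t in range(1, len(X) + 1):
        K = X[:t] @ Wk
        V = X[:t] @ Wv
        outputs.append(attend(X[t - 1] @ Wq, K, V))
    return np.stack(outputs)
```

Both functions produce the same outputs. The difference is that the cached version computes one key and one value projection per step, while the uncached version recomputes them for every earlier token at every step.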