KV Cache
KV Cache stands for Key-Value Cache. It is a widely used technique for optimizing the inference performance of large models: it trades memory for compute, speeding up autoregressive decoding without changing the computed results. During decoding, the key and value tensors of tokens that have already been processed are cached and reused, so each step only needs to compute the projections for the newest token instead of recomputing attention inputs for the whole sequence. KV Cache is an important engineering technique for optimizing Transformer inference, and all major inference frameworks implement and encapsulate it. For example, the generate function of the transformers library manages past_key_values internally, so users do not need to pass it in manually, and it is enabled by default (use_cache=True in the model's config.json).
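The sketch below illustrates how the cache is threaded through a manual decoding loop using the transformers past_key_values / use_cache interface; generate normally does this for you. The choice of "gpt2" as the model and the greedy decoding loop are illustrative assumptions, not part of the original text.

```python
# Minimal sketch of KV cache reuse during autoregressive decoding,
# assuming GPT-2 as an illustrative causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative choice; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

input_ids = tokenizer("KV Cache trades memory for", return_tensors="pt").input_ids

past_key_values = None  # per-layer key/value tensors for all previously processed tokens
generated = input_ids

with torch.no_grad():
    for _ in range(10):
        # Once the cache is populated, only the newest token is fed to the model;
        # keys/values of earlier tokens are read from past_key_values instead of recomputed.
        step_input = generated if past_key_values is None else generated[:, -1:]
        outputs = model(step_input, past_key_values=past_key_values, use_cache=True)
        past_key_values = outputs.past_key_values
        next_token = outputs.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_token], dim=-1)

print(tokenizer.decode(generated[0]))
```

In practice, calling model.generate(input_ids) gives the same behavior, since caching is enabled by default; the manual loop only makes the space-for-time trade visible.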