Open-source single-GPU reproductions of Cartridges and STILL for neural KV-cache compaction [P]
Neural KV-cache compaction — using learned compression rather than heuristic eviction — is one of the more credible paths to running long-context LLMs without bleeding GPU memory. Cartridges and STILL are two recent...