Why won’t Llama 13B fit on my 4090?

I recently gave a talk that a bunch of people liked. Maybe I’ll write a proper blog post at some point, but in the meantime, if you’d like to learn about the different kinds of memory overhead in neural networks and how to ballpark them, I bet you’ll enjoy this talk.
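To give a taste of the kind of ballparking the talk covers, here’s a quick back-of-the-envelope sketch (my own numbers, not taken from the talk) showing why the title isn’t a trick question: at half precision, the weights of a 13B-parameter model alone already exceed a 4090’s 24 GB of VRAM, before counting the KV cache or activations.

```python
# Rough sketch: does Llama 13B fit on an RTX 4090 at fp16?
n_params = 13e9          # Llama 13B parameter count
bytes_per_param = 2      # fp16/bf16: 2 bytes per weight
gpu_vram_gb = 24         # RTX 4090 VRAM

weights_gb = n_params * bytes_per_param / 1e9
print(f"weights: {weights_gb:.0f} GB vs {gpu_vram_gb} GB of VRAM")
# 26 GB of weights alone > 24 GB of VRAM, and that's before
# the KV cache, activations, or any framework overhead.
```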