Dense matrices are the default in deep learning, but most of those parameters are redundant. Block-diagonal structure offers a principled middle ground between dense and sparse, with real hardware advantages.