I have a use case for OpenFHE where I need plaintext-matrix A with ciphertext-vector v multiplication. I implemented this via the baby-step/giant-step method as A\cdot v = \sum_{k=0}^{n_2-1} \mathrm{Rot} \left( \sum_{j=0}^{n_1-1} \mathrm{Rot}\left(\mathrm{diag}(A, kn_1 + j), -kn_1\right) \cdot \mathrm{Rot} (v,j), kn_1 \right)
where n is the size of the matrix and n_1, n_2 factorize n, i.e. n = n_1\cdot n_2. The multiplication itself works really well, but when generating the required rotation keys I run out of memory very quickly, especially when I try to meet security requirements and the circuit gets deeper. Is this operation just very memory-intensive, or is there a smarter way of doing it?
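For concreteness, here is a plaintext numpy sketch of the formula above (the names `rot`, `diag`, and `bsgs_matvec` are my own; `rot` stands in for an HE slot rotation, using the left-rotation convention):

```python
import numpy as np

def rot(x, r):
    # Rot(x, r): cyclic left rotation by r slots (negative r rotates right),
    # mimicking a homomorphic slot rotation.
    return np.roll(x, -r)

def diag(A, d):
    # d-th generalized diagonal: diag(A, d)[i] = A[i, (i + d) mod n]
    n = A.shape[0]
    return np.array([A[i, (i + d) % n] for i in range(n)])

def bsgs_matvec(A, v, n1, n2):
    # Baby-step/giant-step evaluation of A @ v using only rotations
    # by j (baby steps) and by k*n1 (giant steps).
    n = len(v)
    assert n1 * n2 == n
    baby = [rot(v, j) for j in range(n1)]        # n1 - 1 nontrivial rotations
    acc = np.zeros(n)
    for k in range(n2):
        inner = np.zeros(n)
        for j in range(n1):
            inner += rot(diag(A, k * n1 + j), -k * n1) * baby[j]
        acc += rot(inner, k * n1)                # n2 - 1 nontrivial rotations
    return acc

rng = np.random.default_rng(0)
n1, n2 = 4, 2
n = n1 * n2
A = rng.standard_normal((n, n))
v = rng.standard_normal(n)
assert np.allclose(bsgs_matvec(A, v, n1, n2), A @ v)
```

This only checks the algebra; in the encrypted setting each `rot` on `v` or the inner sums consumes one of the rotation keys discussed below.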

This issue is expected. For the setting you described, you would need n_1 + n_2 - 2 rotation keys: n_1 - 1 keys for the baby-step rotations by j = 1, \dots, n_1 - 1 and n_2 - 1 keys for the giant-step rotations by kn_1, k = 1, \dots, n_2 - 1.
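Since the key count is n_1 + n_2 - 2, it is minimized by choosing the factorization closest to n_1 = n_2 = \sqrt{n}. A quick sketch of that accounting (helper names are illustrative, not an OpenFHE API):

```python
def num_rotation_keys(n1, n2):
    # Baby steps j = 1..n1-1 and giant steps k*n1, k = 1..n2-1,
    # each need their own rotation key; rotation by 0 is free.
    return (n1 - 1) + (n2 - 1)

def best_factorization(n):
    # Choose n1 * n2 = n minimizing the key count; the optimum
    # sits as close to sqrt(n) as the divisors of n allow.
    pairs = [(d, n // d) for d in range(1, n + 1) if n % d == 0]
    return min(pairs, key=lambda p: num_rotation_keys(*p))

# For n = 64 the best split is n1 = n2 = 8, needing 14 keys
# instead of the 63 keys of the plain diagonal method.
print(best_factorization(64), num_rotation_keys(8, 8))
```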

Theoretically speaking, you can reduce the number of rotation keys further by applying the baby-step/giant-step decomposition hierarchically: the inner loop can itself be split into two loops, and so on. The implementation gets more involved, though.
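The key-saving idea behind the hierarchical splitting can be seen on the rotations alone: writing a rotation amount d in mixed radix over level sizes [m_1, m_2, \dots] lets you compose any rotation from a small set of per-level strides. A plaintext numpy sketch (my own helper names; `rot` stands in for an HE slot rotation):

```python
import numpy as np

def rot(x, r):
    # cyclic left rotation by r slots, standing in for an HE rotation
    return np.roll(x, -r)

def rot_mixed_radix(x, d, dims):
    # Rotate by d using only per-level strides: writing d in mixed
    # radix over dims = [m1, m2, ...], the rotation is composed from
    # rotations by c1*1, c2*m1, c3*m1*m2, ..., so only
    # sum(m_i - 1) distinct rotation keys are needed instead of n - 1.
    out = x
    stride = 1
    for m in dims:
        c = (d // stride) % m   # mixed-radix digit for this level
        out = rot(out, c * stride)
        stride *= m
    return out

v = np.arange(16)
# 16 = 4 * 4: keys for rotations {1, 2, 3, 4, 8, 12} suffice (6 keys)
assert all(np.array_equal(rot_mixed_radix(v, d, [4, 4]), rot(v, d))
           for d in range(16))
```

Each extra level trades fewer stored keys for one more key switch per rotation, which is exactly why the deeper hierarchy is slower but lighter on memory.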

At the other extreme, you could use a single rotation key (rotation by 1) and pay with more expensive key switching, applying this key i times to rotate by i. Intermediate options are also possible, where the number of rotation keys stays small and the key-switching cost is reduced compared to the extreme option.
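One way to see the whole spectrum is to count keys versus worst-case key switches when rotations are composed in mixed radix over level sizes `dims`: per level you can either store one key and apply it digit-many times, or store a key per digit value and apply one switch per level. A sketch of this accounting (illustrative only, not an OpenFHE API):

```python
def tradeoff(dims):
    # Returns (keys, max key switches) for two strategies of rotating
    # by any d in [0, n), n = product(dims), via mixed-radix digits:
    #  (a) one key per level, applied up to m_i - 1 times each;
    #  (b) one key per digit value, one switch per level.
    one_key_per_level = (len(dims), sum(m - 1 for m in dims))
    key_per_digit = (sum(m - 1 for m in dims), len(dims))
    return one_key_per_level, key_per_digit

n = 64
for dims in ([n], [8, 8], [2] * 6):
    print(dims, tradeoff(dims))
# dims=[64]   covers the extremes: 1 key / up to 63 switches,
#             or 63 keys / 1 switch
# dims=[8,8]  is an intermediate point: 2 keys / up to 14 switches,
#             or 14 keys / 2 switches
# dims=[2]*6  meets in the middle: 6 keys / up to 6 switches
```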

Another (more practical) option is to decompose A\cdot v into matrix blocks of a smaller size and apply the linear transformation to each block. You could either store one block per ciphertext or pack multiple blocks into one ciphertext (more efficient packing, but more keys would be needed).
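With one block per ciphertext, only rotations inside a length-b block are ever needed, so at most b - 1 rotation keys suffice regardless of n. A plaintext numpy sketch of the block decomposition (helper names are my own; `diag_matvec` is the plain diagonal method applied per block):

```python
import numpy as np

def rot(x, r):
    # cyclic left rotation by r slots, standing in for an HE rotation
    return np.roll(x, -r)

def diag_matvec(A, v):
    # Plain diagonal method on one block: A @ v as a sum of
    # elementwise products of generalized diagonals with rotated v.
    n = len(v)
    return sum(np.array([A[i, (i + d) % n] for i in range(n)]) * rot(v, d)
               for d in range(n))

def block_matvec(A, v, b):
    # Decompose the n x n product into (n/b)^2 products of b x b
    # blocks; only rotations within a length-b block are needed,
    # i.e. at most b - 1 rotation keys (one block per ciphertext here).
    n = len(v)
    assert n % b == 0
    out = np.zeros(n)
    for i in range(0, n, b):
        for j in range(0, n, b):
            out[i:i+b] += diag_matvec(A[i:i+b, j:j+b], v[j:j+b])
    return out

rng = np.random.default_rng(0)
n, b = 8, 4
A = rng.standard_normal((n, n))
v = rng.standard_normal(n)
assert np.allclose(block_matvec(A, v, b), A @ v)
```

Baby-step/giant-step can of course be applied per block as well, shrinking the key count further to roughly 2\sqrt{b} - 2.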