Hello,
I'm wondering about the following. Imagine that we have some vector v \in \mathbb{C}^n which we encode to the plaintext m \in \mathcal{R}. Let us also assume that n^2 \leq N/2, where N is the ring degree defining \mathcal{R}. What would be the best approach to generate all the rotations of v inside a CKKS ciphertext, starting from ct = Enc(m), using OpenFHE? For example, if v = (1,2,3), then we want the resulting ciphertext to satisfy Decode(Dec(ct)) = (1,2,3,2,3,1,3,1,2,0,\dots,0).
I see two options:

Use EvalRot. We can start with ct and use masks to produce the corresponding output (as suggested in EvalRot behaviour). In our example, we would first multiply by mask = (0,1,1,0,\dots,0), obtaining ct' such that Decode(Dec(ct')) = (0,2,3,0,\dots,0). Then we can rotate with EvalRot to get a new ct' such that Decode(Dec(ct')) = (0,0,0,2,3,0,\dots,0). Doing the same with mask = (1,0,0,\dots,0), we get ct'' such that Decode(Dec(ct'')) = (0,0,0,0,0,1,0,\dots,0). Finally, adding all the ciphertexts, we obtain the first rotation: Decode(Dec(ct + ct' + ct'')) = (1,2,3,2,3,1,0,\dots,0). I feel that this approach will introduce too much error, depending on the number of rotations, and it also requires generating a lot of rotation keys.
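To make the bookkeeping concrete, here is a plaintext mock-up of this first approach: each homomorphic operation is replaced by its effect on the decoded slot vector (`mask` stands in for multiplication by a 0/1 plaintext, `rot` for EvalRot, `add` for ciphertext addition). The toy sizes `n = 3, slots = 16` are assumptions for illustration.

```python
# Plaintext mock-up of approach 1: each homomorphic op is replaced by its
# effect on the decoded slot vector (mask ~ multiply by a 0/1 plaintext,
# rot ~ EvalRot, add ~ ciphertext addition).

def rot(x, k):
    """Cyclic left rotation by k slots (negative k rotates right)."""
    k %= len(x)
    return x[k:] + x[:k]

def mask(x, keep):
    """Zero every slot whose index is not in `keep`."""
    keep = set(keep)
    return [x[i] if i in keep else 0 for i in range(len(x))]

def add(a, b):
    return [s + t for s, t in zip(a, b)]

n, slots = 3, 16                     # toy sizes; slots plays the role of N/2
ct = [1, 2, 3] + [0] * (slots - n)   # decoded Enc(v)

out = ct[:]                          # rotation 0 already occupies slots 0..n-1
for j in range(1, n):
    tail = mask(ct, range(j, n))     # v[j..n-1], still in slots j..n-1
    head = mask(ct, range(j))        # v[0..j-1], still in slots 0..j-1
    # shift both pieces right so the j-th rotation fills slots j*n..j*n+n-1
    out = add(out, rot(tail, -(j * n - j)))
    out = add(out, rot(head, -(j * n + n - j)))

print(out)   # [1, 2, 3, 2, 3, 1, 3, 1, 2, 0, 0, 0, 0, 0, 0, 0]
```

Note that every rotation index beyond the first costs two masked rotations here, which is exactly why the rotation-key count and the noise grow with n as described above.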

Use EvalLinearTransform. If we generate a list that includes all rotations of a vector of size n, build the corresponding permutation matrix, and plug it into EvalLinearTransformPrecompute, we could produce the vector "in one go". Using the example above, we would generate the list (1,2,3,5,6,4,9,7,8,10,11,\dots,N/2), transform it into the corresponding permutation matrix for EvalLinearTransformPrecompute, and then multiply it by ct using EvalLinearTransform. The main problem with this idea is that the matrix can become huge depending on N. It also requires that ct contain n copies of the vector v, which can be generated using the approach of option 1.
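As a plaintext sanity check of this second approach, the sketch below builds the permutation matrix that maps n contiguous copies of v to the concatenation of its n cyclic rotations, and applies it as an ordinary matrix-vector product (the job EvalLinearTransform would do homomorphically). The toy size `slots = 12` is an assumption for illustration; the explicit matrix `P` is precisely the storage problem mentioned above.

```python
# Plaintext mock-up of approach 2: a permutation matrix sends n contiguous
# copies of v to the concatenation of its n cyclic rotations.

n, slots = 3, 12                 # toy sizes; slots stands in for N/2

# Output slot j*n + i must hold v[(i + j) % n]; read it from the j-th copy
# of v in the input, i.e. from input slot j*n + (i + j) % n.
perm = [j * n + (i + j) % n for j in range(n) for i in range(n)]
perm += list(range(n * n, slots))          # identity on the unused slots

# Explicit permutation matrix: row r has a single 1 in column perm[r].
P = [[1 if c == perm[r] else 0 for c in range(slots)] for r in range(slots)]

v = [1, 2, 3]
copies = (v * n) + [0] * (slots - n * n)   # n contiguous copies of v

result = [sum(P[r][c] * copies[c] for c in range(slots)) for r in range(slots)]
print(result)   # [1, 2, 3, 2, 3, 1, 3, 1, 2, 0, 0, 0]
```

Shifting `perm` to 1-based indexing reproduces the list (1,2,3,5,6,4,9,7,8,10,\dots) from the example.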
I feel that there should be some “HEfriendly” way to produce such a vector but for the moment these are the only options I could think of.
Thank you in advance!
There are multiple ways to do it. Here is one approach that requires n-1 rotation keys, roughly log N regular rotations, and n-1 fast rotations.
1. Pad the vector to the next power of two n'.
2. Clone the vector up to the full packing. You can do it with logarithmic complexity, i.e., using log N - log n' - 1 rotations.
3. Then generate n-1 fast rotations (using hoisted automorphisms). You will have n ciphertexts.
4. Mask the ciphertexts and add them up.

The third step can be improved to use a smaller number of keys using the baby-step giant-step approach (at the expense of adding regular, slow rotations).
This approach should work quite well if n is relatively small. For larger n, I would use the permutation matrix + diagonalization approach, similar to EvalLinearTransform, working with blocks of size n'.
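The four steps above can be walked through on plaintext slot vectors (toy sizes are assumptions; in OpenFHE the step-3 rotations would be fast rotations sharing one precomputation on ct). One detail this sketch makes visible: because n < n', each block needs two masked rotations, and the rotations land in blocks of size n' with a zero pad slot, rather than in the contiguous layout of the original question; a final compacting pass, or simply skipping the pad slots when reading out, would be needed to match that layout exactly.

```python
# Plaintext walk-through of the four steps, with every homomorphic op
# replaced by its effect on the decoded slot vector (toy sizes).

def rot(x, k):
    """Cyclic left rotation by k slots (negative k rotates right)."""
    k %= len(x)
    return x[k:] + x[:k]

def add(a, b): return [s + t for s, t in zip(a, b)]
def mul(a, b): return [s * t for s, t in zip(a, b)]

slots = 16                     # stands in for the N/2 CKKS slots
v = [1, 2, 3]
n = len(v)

# Step 1: pad to the next power of two n', so the cloning strides below
# stay powers of two.
np2 = 1
while np2 < n:
    np2 *= 2                   # n' = 4 here
ct = v + [0] * (slots - n)

# Step 2: clone up to full packing; each pass doubles the number of copies,
# so log(slots) - log(n') rotations suffice.
span = np2
while span < slots:
    ct = add(ct, rot(ct, -span))
    span *= 2
# ct is now [1,2,3,0, 1,2,3,0, 1,2,3,0, 1,2,3,0]

# Steps 3+4: the j-th cyclic rotation of v goes into block j. Since n < n',
# each block needs two masked rotations: one for the unwrapped part, one
# for the part that wraps around.
out = [0] * slots
for j in range(n):
    main = [1 if j * np2 <= i < j * np2 + (n - j) else 0 for i in range(slots)]
    wrap = [1 if j * np2 + (n - j) <= i < j * np2 + n else 0 for i in range(slots)]
    out = add(out, mul(rot(ct, j), main))
    out = add(out, mul(rot(ct, j - n), wrap))

print(out)   # [1, 2, 3, 0, 2, 3, 1, 0, 3, 1, 2, 0, 0, 0, 0, 0]
```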
These are some high-level guidelines for possible options.
Thanks a lot, Yuriy, for the reply.
I have implemented both ideas and they work very well. I have a few follow-up questions:

For the second approach, I have used EvalLinearTransform to compute the vector-matrix multiplication. From your comment, I deduce there must be a better way to perform the matrix-vector multiplication; can you point me towards the correct way to do this in OpenFHE?

Do you have an idea of the optimal threshold for n to switch from one approach to the other?

For very large n, having to produce the permutation matrix can be a problem since it is of size n' \times n'. Is there a workaround so we do not need to store this matrix?
I will just provide high-level guidelines, as more detailed answers would require some analysis.
- EvalLinearTransform is optimized for homomorphic encoding/decoding, which is used in CKKS bootstrapping. This typically uses one full plaintext per slot. This may be suboptimal when n' is small. It is better to pack multiple matrix-vector products (in a SIMD fashion) in this case, i.e., write your own version of EvalLinearTransform fitted for your needs.
- In general, you can break down a large matrix-vector product into square matrix blocks of some small dimension, then pack multiple matrix-block-vector-block multiplications into single ciphertext products. The size of the block is a tunable parameter.
- When dealing with small blocks, you want to support cyclic rotations within a block (which typically requires two rotations instead of one per unit operation).
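The "two rotations per unit operation" point can be made concrete on plaintext slot vectors: a cyclic rotation inside every size-b block decomposes into two masked global rotations, one for the slots that stay inside their block and one for the slots that wrap around. A minimal sketch (function names are my own):

```python
def rot(x, k):
    """Global cyclic left rotation by k slots (negative k rotates right)."""
    k %= len(x)
    return x[k:] + x[:k]

def block_cyclic_rot(x, b, k):
    """Rotate every contiguous size-b block of x left by k (0 <= k < b),
    using exactly two global rotations plus 0/1 masks, mirroring the
    homomorphic cost of a within-block rotation."""
    left = rot(x, k)        # correct for slots that do not wrap in their block
    right = rot(x, k - b)   # correct for the k slots per block that do wrap
    return [left[i] if i % b < b - k else right[i] for i in range(len(x))]

print(block_cyclic_rot([1, 2, 3, 4, 5, 6, 7, 8], 4, 1))
# [2, 3, 4, 1, 6, 7, 8, 5]
```

Homomorphically, the slot-wise selection in the last line would be two plaintext masks and an addition.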
I will leave the rest up to you.
Perfect, thanks again for the help, Yuriy.