Thanks for your response. But what I meant was that in the optimized case, where the rotations are pulled out of the sums

\sum_{k=0}^{n_2-1} \mathrm{Rot} ( \sum_{j=0}^{n_1-1} diag(M,kn_1 + j)'\cdot \mathrm{Rot}(v,j),kn_1 )

the diag functions appears with a with an apostrophe. This indeed needs to be another map than the original non primed diagonal function since if you check it for n=4 ( with n_1 = n_2 = 2) you get

diag(M,0) = [a,f,k,p]

diag(M,1) = [b,g,l,m]

diag(M,2) = [c,h,i,n]

diag(M,3) = [d,e,j,o]

trying to multiply that matrix with a vector v=[v_0,v_1,v_2,v_3] you get wrong terms in the sum where k=1 using the naive diag map:

\sum_{j=0}^{n_1-1} diag(M,kn_1 + j)\cdot \mathrm{Rot}(v,j) \overset{k=1}{=}
diag(M,2) \cdot \mathrm{Rot}(v,0) + diag(M,3) \cdot \mathrm{Rot}(v,1)

= [c,h,i,n] \cdot [v_0,v_1,v_2,v_3] + [d,e,j,o] \cdot [v_1,v_2,v_3,v_0] = [cv_0 + dv_1, \dots]

where we can directly see that v_0 should never appear in a term together with c if we want to recover matrix multiplication. So my question is, what is the correct diag(\dots)' map?

Greetings, and thanks in advance!