With a classical Transformer, we transform each word in a sentence into a d-dimensional embedding vector, so we get an n x d matrix X for a sentence with n words. But in the CKKS scheme each vector is transformed into polynomial form, and we get a sequence of polynomials for this sentence. How do I calculate the attention scores for this polynomial sequence?
You should not view the encoded values as polynomials, but rather as vectors.
Assuming that d < num_slots, you can think of X as n d-dimensional vectors. When you use CKKS operations such as adding or multiplying two encoded vectors (whether these vectors are plaintexts or ciphertexts), the result is an encoded vector of the point-wise addition or multiplication.
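For instance, here is a minimal sketch with the TenSEAL library (one CKKS implementation; the encryption parameters below are illustrative, not a recommendation), showing that + and * act slot-wise on encrypted vectors:

```python
# Minimal CKKS sketch with TenSEAL; parameters are illustrative only.
import tenseal as ts

# CKKS context; num_slots = poly_modulus_degree / 2 = 4096 here.
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40

# Two rows of X, each packed into the slots of one ciphertext.
x1 = ts.ckks_vector(context, [1.0, 2.0, 3.0])
x2 = ts.ckks_vector(context, [4.0, 5.0, 6.0])

print((x1 + x2).decrypt())  # ~[5.0, 7.0, 9.0]   (point-wise addition)
print((x1 * x2).decrypt())  # ~[4.0, 10.0, 18.0] (point-wise multiplication)
```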
In the attention layer, you need to compute the matrix products of X with the query, key, and value weight matrices. For the attention scores, you need to multiply the query and key results together and scale by 1/sqrt(d_k). Lastly, you need to compute the softmax function, which can be done by polynomial approximation. All of this can be done in CKKS; see the sketch below.
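As a concrete illustration, this sketch (again TenSEAL; the weight matrices W_q and W_k, the dimension d_k, and the degree-3 Taylor approximation of exp are assumptions made up for the example) computes one raw attention score between two encrypted token embeddings:

```python
# Sketch of one attention score in CKKS with TenSEAL.
# W_q, W_k, d_k, and the exp approximation are illustrative assumptions.
import math
import tenseal as ts

context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=16384,
    coeff_mod_bit_sizes=[60, 40, 40, 40, 40, 40, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()  # rotations are needed for matmul and dot

d_k = 4
# Plaintext weight matrices (d x d_k); in practice these come from the model.
W_q = [[0.1] * d_k for _ in range(d_k)]
W_k = [[0.2] * d_k for _ in range(d_k)]

# Two encrypted token embeddings (rows of X).
x_i = ts.ckks_vector(context, [0.5, -0.3, 0.8, 0.1])
x_j = ts.ckks_vector(context, [0.2, 0.7, -0.1, 0.4])

# q_i = x_i W_q and k_j = x_j W_k: encrypted-vector x plain-matrix products.
q_i = x_i.matmul(W_q)
k_j = x_j.matmul(W_k)

# Raw score s_ij = (q_i . k_j) / sqrt(d_k).
s_ij = q_i.dot(k_j) * (1.0 / math.sqrt(d_k))

# Softmax numerator exp(s_ij), replaced by the degree-3 Taylor polynomial
# 1 + x + x^2/2 + x^3/6, which is only accurate for small |x|.
exp_s_ij = s_ij.polyval([1.0, 1.0, 0.5, 1.0 / 6.0])
print(exp_s_ij.decrypt())
```

Note what is missing: normalizing by the sum of the exponentials requires a division, which CKKS does not provide natively, so it is usually computed with an iterative inverse approximation (e.g., Goldschmidt or Newton's method), and real implementations replace the raw Taylor series with a minimax or Chebyshev approximation over the expected input range.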
You can have a look at this paper, which evaluates a simple transformer in CKKS.
It means I was wrong. I have encoded each row of the embedding matrix into a polynomial.