Pre-training Transformer model on encrypted data

Salay · May 17, 2024, 12:26pm

With a classical Transformer, we map each word in a sentence to a d-dimensional embedding vector, so a sentence with n words gives an n×d matrix X. In the CKKS scheme, however, each vector is encoded as a polynomial, so the sentence becomes a sequence of polynomials. How do I calculate the attention scores for this sequence of polynomials?

Caesar · May 17, 2024, 7:25pm

You should not view the encoded values as polynomials, but rather as vectors.

Assuming that d < num_slots, you can think of X as n d-dimensional vectors, one per word. When you use CKKS operations such as adding or multiplying two encoded vectors (whether these vectors are plaintexts or ciphertexts), the result is an encoded vector holding the point-wise sum or product.
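To make the slot-wise picture concrete, here is a minimal plaintext sketch in Python. It involves no encryption and is not the OpenFHE API; `NUM_SLOTS`, `encode`, `add`, `mul`, `rotate`, and `inner_product` are hypothetical names that only mimic how packed CKKS vectors behave. It also shows why a dot product takes more than one multiplication: the products land in separate slots and have to be summed with rotations.

```python
# Plaintext simulation of CKKS slot semantics (no encryption involved).
# All names here are illustrative assumptions, not a real library API.
NUM_SLOTS = 8  # assumed small power of two; real parameter sets have far more slots


def encode(vec, num_slots=NUM_SLOTS):
    """Pad a d-dimensional vector with zeros so it fills every slot."""
    assert len(vec) <= num_slots, "vector does not fit into the slots"
    return list(vec) + [0.0] * (num_slots - len(vec))


def add(a, b):
    """Slot-wise addition, the analogue of adding two encoded vectors."""
    return [x + y for x, y in zip(a, b)]


def mul(a, b):
    """Slot-wise multiplication, the analogue of multiplying two encoded vectors."""
    return [x * y for x, y in zip(a, b)]


def rotate(a, k):
    """Cyclic left rotation of the slots by k positions."""
    k %= len(a)
    return a[k:] + a[:k]


def inner_product(a, b):
    """Slot-wise product, then rotate-and-add with steps 1, 2, 4, ...

    Assumes the slot count is a power of two. Afterwards every slot
    holds the full dot product.
    """
    acc = mul(a, b)
    step = 1
    while step < len(acc):
        acc = add(acc, rotate(acc, step))
        step *= 2
    return acc


x1 = encode([1.0, 2.0, 3.0])
x2 = encode([4.0, 5.0, 6.0])
print(add(x1, x2)[:3])           # point-wise sum of the first three slots
print(mul(x1, x2)[:3])           # point-wise product of the first three slots
print(inner_product(x1, x2)[0])  # dot product of the two vectors
```

In an actual CKKS library the same pattern applies to ciphertexts, with two differences worth keeping in mind: results are approximate rather than exact, and rotations need the corresponding rotation keys to be generated beforehand.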

In the attention layer, you need to compute the matrix products of X with the query, key, and value weight matrices. For the attention scores, you multiply the query result by the transposed key result and scale by 1/sqrt(d_K). Lastly, you need to compute the softmax function, which can be done by polynomial approximation. All of this can be done in CKKS.
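The steps above can be sketched in plain Python as a reference for what the encrypted computation has to reproduce. This is only a plaintext model under some assumptions of mine: the exponential is replaced by a truncated Taylor polynomial (one possible choice, not necessarily the approximation used in the paper), the final normalization uses an ordinary division even though under CKKS the reciprocal would need its own approximation, and all function names are hypothetical.

```python
import math


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def poly_exp(x, degree=8):
    """Truncated Taylor polynomial for exp(x); accurate only for small |x|."""
    term, total = 1.0, 1.0
    for k in range(1, degree + 1):
        term *= x / k
        total += term
    return total


def approx_softmax(row):
    """Softmax with exp replaced by a polynomial.

    The division is exact here; under CKKS the reciprocal of the sum
    would itself have to be approximated.
    """
    exps = [poly_exp(v) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(X, Wq, Wk, Wv):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(K[0])
    scale = 1.0 / math.sqrt(d_k)
    scores = [[s * scale for s in row] for row in matmul(Q, transpose(K))]
    weights = [approx_softmax(row) for row in scores]
    return matmul(weights, V), weights


# Toy input: n = 2 words, d = 2 dimensions, identity weight matrices.
X = [[0.5, 0.1], [0.2, 0.4]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
output, weights = attention(X, I2, I2, I2)
print(weights)
print(output)
```

Everything before the softmax is additions and multiplications, which map directly onto CKKS operations. The polynomial is only accurate on a bounded input range, so in practice the scores have to be kept within the range the approximation was designed for.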

You can have a look at this paper which evaluated a simple transformer in CKKS.

Salay · May 18, 2024, 2:51pm

So I had it wrong: I encoded each row of the embedding matrix as a polynomial.
