I’m trying to understand the DCRT representation. The reference cited in the documentation ( Homomorphic Evaluation of the AES Circuit) has a concise summary in section A.3. but it’s a couple of levels beyond me. Does any one know of a good reference to look at that takes a bit more time to explain it?

@benreynwar I am not sure if there is a more elaborate reference out there that explains DCRT representation in a better way. I will try to give a high-level explanation that might be helpful.

Essentially, DCRT representation is a mathematical trick used to perform polynomial arithmetic in a more efficient way, compared with classic or school-book methods.

In FHE, the basic mathematical structure used is a polynomial ring (denoted by R_q) of dimension n = phi(m), where m is the index of the m-th cyclotomic polynomial, phi is Euler’s totient function, and q is the coefficient modulus of the ring. n, m, and q are all positive integers. You can view this ring as the set of polynomials of degree less than n with integer coefficients in {-q/2, …, q/2}. In this ring, we need to add, subtract, and multiply polynomials efficiently. DCRT helps us in doing that as follows.

In RLWE-based FHE schemes (BGV, BFV and CKKS), the coefficient modulus q is a very large integer and can be a few hundreds of bits in size. That means we would need to use multi-precision arithmetic to perform polynomial arithmetic in R_q. Multi-precision arithmetic is known to be inefficient due to its serial nature. DCRT helps us convert these multi-precision arithmetic operations into native arithmetic (that is, arithmetic that can be done using processor-register-size operations), this can be done as follows:

A polynomial in R_q can be viewed as an n-vector of multi-precision integers, each coefficient is of size q at most. In the trade, this representation is usually known as the human-readable form of polynomials. Let’s denote this representation by ZX.

Now, the coefficient modulus q is usually set as a composite integer, which is the product of (t+1) small primes (p_0, p_1, …, p_t). Small here can be chosen arbitrarily, but it is close to the processor register size. Nowadays, the typical sizes of these primes are 32, 40, 50, or 60 bits). This allows us to factor ZX into smaller polynomials (known as towers, residues or limbs), thanks to the Chinese Remainder Theorem (CRT). As a result, ZX is converted into a matrix of dimensions (t+1) x n. The coefficients of this matrix are native integers (that is of a size that is close to the processor register size). This representation is known as a coefficient, CRT or Residual Number System (RNS) representation. If we have two polynomials A and B in CRT representation, we can add or subtract them via coefficient-wise modular addition or subtraction, that is, C_ij = A_ij (+ or -) B_ij mod p_i. You might have noticed how embarrassingly parallel these coefficient-wise matrix operations are. Moreover, we only need native processor operations to perform these modular arithmetic operations.

What remains now is polynomial multiplication in R_q. While we can do multiplication in CRT representation, it is extremely inefficient (computational complexity is quadratic in n of q-size operations). We can do better if we represent CRT polynomials in what is known as evaluation, Double CRT (DCRT), or Number-Theoretic Transform (NTT) representation.

To calculate DCRT from CRT, we evaluate each tower (i) in the CRT matrix at the powers of m-th roots of unity in the finite field GF(p_i), there are standard number theory methods to find such roots, and they are guaranteed to exist if p_i meets some conditions. This evaluation can be done efficiently via NTT (computational complexity is nlog(n) p_i-size or native operations). The inverse NTT (INTT) can be used to compute CRT from DCRT. The DCRT matrix is of the same size as the CRT matrix, i.e., (t+1) x n. Given two DCRT polynomials A and B, the product C can be calculated via coefficient-wise modular multiplication, that is, C_ij = A_ij * B_ij mod p_i. The computational cost of multiplying polynomials in R_q is now 2 NTTs, (t+1) x n coefficient-size native modular multiplications and 1 INTT, that is (3nlog(n) + (t+1) x n) which is more efficient than the quadratic complexity of school-book multiplication methods. It should be noted that we can also add or subtract polynomials in DCRT representation via coefficient-wise modular addition or subtraction (similar to what was described above in CRT representation).

Note that conversion back from DCRT to CRT may not always be needed, and we can do further operations in DCRT representation.