I know that many questions have been posted about this topic. What I don’t understand is how this paper can get such small times. The paper was referenced in this post.
According to the authors:
Blockquote
Our implementation of the less-than function in the HElib library is up to 3 times faster than the prior work based on BGV/BFV. It allows to compare a pair of 64-bit integers in 11 milliseconds[…]on an average laptop without multithreading
So, 11 milliseconds to compare two 64-bit integers on an average laptop.
The main idea of their solution is to compute the lower-than (LT) function using univariate interpolation (Z is the difference between the two numbers):
\alpha_{p-1}Z^{p-1} + Z \sum_{i=0, even}^{p-3}\alpha_{i+1}Z^i
where \alpha_i=\sum_{a=1}^{\frac{p-1}{2}}a^{p-1-i}.
As the paper states, you need at least \sqrt{\left(p-3\right)} non-scalar multiplication. With p=65537 we cannot compute the LT function over two encrypted 64-bit integers. Even so, if we set p=65537 in OpenFHE, using BGV, the circuit will require a depth of 256. This type of circuit does not run on my laptop even for a whole day
The authors claim that they perform experiments in Helib. So, my question is: what am I missing? Is encrypted comparison really that efficient?
It looks to me that all you need is to evaluate a polynomial of degree p-1 over encrypted input. This should be done efficiently in \mathcal{O}(\sqrt{p}) non-scalar multiplications as the paper says. Note that these multiplications are not sequential, thus the multiplicative depth is not \sqrt{p}, but much less than that using efficient algorithms such as Paterson-Stockmeyer.
Another thing to note is that the paper reported amortized time, that is, they use SIMD packing to compare a batch of pair of numbers (at no additional cost) to increase the throughput. Amortized time for 1 comparison is the total time of computation divided by the batch size.
If your (p) is large, you might want to consider the option where the client sends not only Z but also certain powers of it Z^i to reduce the evaluation complexity.
Thank you for your quick reply. Now I think I understand what I didn’t understand. When I implemented equality testing over encrypted data I used the classical trick 1-\left(X-Y\right)^{p-1}\ mod\ p. To compute this polynomial I used a circuit of depth log\left(p\right) so when p=65517 the depth of the circuit was 17. For example, to compute X^{65536} I computed X^2, X^4, X^8,\dots,X^{65536}.
What I don’t understand is how to compute that sum using the same strategy. I can compute Z^{p-1} with a log\left(p\right) depth circuit but as far as I know, that sum cannot be computed with a circuit of similar depth. When I implemented that sum I used something like this (not the same code but you get an idea of how I calculated the sum):
Hello, thank you for the suggestion that client sends Z^i beforehand to speedup computation. However, if Z is not known in program, we won’t have precomputed Z^i, so I won’t be able to use this.
I want to work with p in the range of 10^9. However, I find that using even p=65537 with the method above, time to compare 2 encrypted integers with the univariate circuit that uses the Paterson Stockmeyer algorithm is 20 minutes. Can you suggest a faster algorithm or any partial redundant way that can be used to compare 2 unknown Ciphertexts? It will be helpful for my project where I am using BGV to encrypt the inputs of an application.
Another reason why this paper has very fast runtime is that the values being compared are actually the small digits of big numbers. For example, when comparing 64-bit integers, 4-bit digits are what are actually compared. Comparing only the digits makes it possible to choose a small plaintext modulus, e.g., 17, which leads to a lower degree interpolation polynomial (since the degree is determined by the plaintext modulus).
Decomposing 64-bit integers into 4-bit digits needs additional overhead. This part is negligible in this paper as it assumes that the integers can be decomposed before encryption. Based on this assumption, the digits can be extracted almost for free via a linear mapping (Lemma 2 in their paper, which also uses Frobenius automorphism to evaluate).
However, considering that in many cases we encrypt the integers themselves instead of digits, the above decomposition is actually not free. Directly comparing two encrypted integers is still possible, as in this paper. But more time is needed for decomposition, leading to a longer total runtime (see their 2).