I know that many questions have been posted about this topic. What I don’t understand is how this paper can get such small times. The paper was referenced in this post.

According to the authors:

Blockquote
Our implementation of the less-than function in the HElib library is up to 3 times faster than the prior work based on BGV/BFV. It allows to compare a pair of 64-bit integers in 11 milliseconds[…]on an average laptop without multithreading

So, 11 milliseconds to compare two 64-bit integers on an average laptop.

The main idea of their solution is to compute the lower-than (LT) function using univariate interpolation (Z is the difference between the two numbers):

\alpha_{p-1}Z^{p-1} + Z \sum_{i=0, even}^{p-3}\alpha_{i+1}Z^i

where \alpha_i=\sum_{a=1}^{\frac{p-1}{2}}a^{p-1-i}.

As the paper states, you need at least \sqrt{\left(p-3\right)} non-scalar multiplication. With p=65537 we cannot compute the LT function over two encrypted 64-bit integers. Even so, if we set p=65537 in OpenFHE, using BGV, the circuit will require a depth of 256. This type of circuit does not run on my laptop even for a whole day

The authors claim that they perform experiments in Helib. So, my question is: what am I missing? Is encrypted comparison really that efficient?

It looks to me that all you need is to evaluate a polynomial of degree p-1 over encrypted input. This should be done efficiently in \mathcal{O}(\sqrt{p}) non-scalar multiplications as the paper says. Note that these multiplications are not sequential, thus the multiplicative depth is not \sqrt{p}, but much less than that using efficient algorithms such as Paterson-Stockmeyer.

Another thing to note is that the paper reported amortized time, that is, they use SIMD packing to compare a batch of pair of numbers (at no additional cost) to increase the throughput. Amortized time for 1 comparison is the total time of computation divided by the batch size.

If your (p) is large, you might want to consider the option where the client sends not only Z but also certain powers of it Z^i to reduce the evaluation complexity.

Thank you for your quick reply. Now I think I understand what I didn’t understand. When I implemented equality testing over encrypted data I used the classical trick 1-\left(X-Y\right)^{p-1}\ mod\ p. To compute this polynomial I used a circuit of depth log\left(p\right) so when p=65517 the depth of the circuit was 17. For example, to compute X^{65536} I computed X^2, X^4, X^8,\dots,X^{65536}.

What I don’t understand is how to compute that sum using the same strategy. I can compute Z^{p-1} with a log\left(p\right) depth circuit but as far as I know, that sum cannot be computed with a circuit of similar depth. When I implemented that sum I used something like this (not the same code but you get an idea of how I calculated the sum):

Hello, thank you for the suggestion that client sends Z^i beforehand to speedup computation. However, if Z is not known in program, we won’t have precomputed Z^i, so I won’t be able to use this.

I want to work with p in the range of 10^9. However, I find that using even p=65537 with the method above, time to compare 2 encrypted integers with the univariate circuit that uses the Paterson Stockmeyer algorithm is 20 minutes. Can you suggest a faster algorithm or any partial redundant way that can be used to compare 2 unknown Ciphertexts? It will be helpful for my project where I am using BGV to encrypt the inputs of an application.