@jtovartr There’s a bit of nuance here that might go over your head (especially as a newbie). Say we have the following dataset:
```python
import numpy as np

X = np.array([
    [ 0,  1,  2,  3],
    [ 4,  5,  6,  7],
    [ 8,  9, 10, 11],
    [12, 13, 14, 15],
    [16, 17, 18, 19],
])
```
Two things to keep in mind:
- Oftentimes, we have more rows than columns in the dataset.
- Encryption is slow.
So, instead of encrypting row-by-row, we encrypt column-wise: each column of `X` is packed into a single ciphertext. Think of it as a transpose followed by an encryption, giving something like the following:
```
encX = [
    enc([0, 4,  8, 12, 16, 0, 0, ..., 0]),
    enc([1, 5,  9, 13, 17, 0, 0, ..., 0]),
    enc([2, 6, 10, 14, 18, 0, 0, ..., 0]),
    enc([3, 7, 11, 15, 19, 0, 0, ..., 0]),
]
```
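If it helps to see the transpose-and-pad step concretely, here is a minimal plaintext sketch in plain Python. `pack_columns` and the toy `slot_count=8` are my own illustrative choices (hypothetical, not from any HE library); in practice the slot count comes from your encryption parameters, and the actual `enc` call is left out entirely.

```python
from typing import List


def pack_columns(X: List[List[int]], slot_count: int) -> List[List[int]]:
    """Transpose X and zero-pad each column out to `slot_count` slots.

    Each returned vector is what would be handed to a (hypothetical) `enc`
    call; no encryption happens here.
    """
    n_rows = len(X)
    if n_rows > slot_count:
        raise ValueError("column has more entries than available slots")
    n_cols = len(X[0])
    packed = []
    for j in range(n_cols):
        column = [row[j] for row in X]                      # j-th feature across all rows
        packed.append(column + [0] * (slot_count - n_rows))  # pad unused slots with zeros
    return packed


X = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
    [16, 17, 18, 19],
]

# Toy slot count so the output is readable; real parameter sets give thousands of slots.
packed = pack_columns(X, slot_count=8)
for vec in packed:
    print(vec)
# [0, 4, 8, 12, 16, 0, 0, 0]
# [1, 5, 9, 13, 17, 0, 0, 0]
# [2, 6, 10, 14, 18, 0, 0, 0]
# [3, 7, 11, 15, 19, 0, 0, 0]

print(len(packed), "column ciphertexts vs", len(X), "row ciphertexts")
# 4 column ciphertexts vs 5 row ciphertexts
```

This ties back to the two points above: column packing needs 4 encryptions here versus 5 for row-by-row, and the gap widens as rows increasingly outnumber columns.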
In general, the number of ciphertexts equals the number of features (columns). You might ask, “Where did those extra 0’s come from?” Each ciphertext has a fixed number of slots, set by the encryption parameters, and those parameters have to be large enough for the scheme to be secure. Any slots your data doesn't fill are padded with zeros, and that padding is part of why these ciphertexts are so large. Depending on your security parameters, a single ciphertext can hold up to 131,072 values. Whether that overhead is acceptable depends on your use case.
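To put rough numbers on that overhead, here is a back-of-the-envelope sketch. It assumes an RLWE-style scheme where a fresh ciphertext is two polynomials of degree `poly_degree`, with each coefficient stored as `num_limbs` 64-bit RNS words. The `num_limbs=10` value is hypothetical, and treating the slot count as equal to the polynomial degree is an assumption (it holds for BFV/BGV-style batching; CKKS gives half as many slots). Serialization overhead and compression are ignored, so treat the result as an order-of-magnitude estimate, not your actual parameters.

```python
def slot_utilization(n_rows: int, slot_count: int) -> float:
    """Fraction of one column ciphertext's slots that hold real data."""
    return n_rows / slot_count


def approx_ciphertext_bytes(poly_degree: int, num_limbs: int, word_bytes: int = 8) -> int:
    """Rough size of a fresh two-polynomial RLWE ciphertext.

    Assumes each of the 2 polynomials has `poly_degree` coefficients and each
    coefficient is stored as `num_limbs` RNS words of `word_bytes` bytes.
    Ignores serialization headers and any compression.
    """
    return 2 * poly_degree * num_limbs * word_bytes


n_rows, n_cols = 5, 4   # shape of X from the example above
poly_degree = 131_072   # assumed: slots == polynomial degree (BFV/BGV-style batching)
num_limbs = 10          # hypothetical modulus-chain length

per_ct = approx_ciphertext_bytes(poly_degree, num_limbs)
total = n_cols * per_ct

print(f"slots used per ciphertext: {slot_utilization(n_rows, poly_degree):.6%}")
print(f"per ciphertext: {per_ct / 2**20:.1f} MiB")       # 20.0 MiB
print(f"all {n_cols} columns: {total / 2**20:.1f} MiB")  # 80.0 MiB
```

Under these assumptions that is roughly 80 MiB to hold 20 integers; the ratio only improves once you have enough rows to actually fill the slots.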
HTH!