**By Hemanth Meka **- Senior Associate, Big Data at DataMetica

**Introduction:**

FPE is a special type of encryption. While generating the cipher text there is a lot of interest in preserving the type and length so that the cipher text also looks like original text. For example consider phone number, because the phone number is PII field it cannot be visible to everyone. The easy way to protect it is to encrypt it and replace the original phone number by the cipher string. After encrypting, the information about the type of data is lost. FPE generates a ciphertext which can also be numeric and also be of the same length as the phone number. That is no one can make out that the original field is made of numbers. Also, it can fool the intruder into thinking that the FPE cipher is the original phone number.

**Challenges using traditional encryption:**

Below image shows how encryption typically works.

Plain text, secret key, public algorithm and the ciphertext are the key components of symmetric encryption. As the example shows the original value of the SSN (783-43-3616) looks like random string after encryption. Looking at the cipher, no one can really guess that the original message is SSN number. At times this could be beneficial but the systems receiving the cipher message have to be aware of the changes and they will have to accommodate changes to adjust to the new type of data they would be receiving.

How convenient would it be if we could encrypt a number and get another number or encrypt a string and get another string of the same length. This would make it very easy for the data model to stay intact even after adding encryption to some columns. Systems receiving the cipher fields need not even be aware that encryption has been applied. We might even succeed at convincing someone into thinking that they are looking at the actual data.

**Impact on downstream systems:**

Introducing encryption to the existing systems might have certain effects on the downstream systems. These are a few of them.

- If the downstream systems do not handle the change in the encrypted column, it might break the processes in the queue.
- Applying encryption on numeric fields can result in a string. If not handled this could break at any downstream step
- If the process requires the data to decrypt to continue processing the above-mentioned changes have to be handled again.

**How FPE solves the issue:**

FPE has the security features of encryption while preserving the length and format. FPE works with Message Space for encryption or decryption. A message space for a given range and possible values for each position contains all possible permutations of sequences. For example, a message space for a string of a length of 3 where each position has a possible values set of {0,1} contains all values ranging from 000 to 111. FPE uses this message space to map input to another output using the key. When the key has changed the output for the same key is changed. Decryption uses a similar process in reverse. Below example explains how FPE works.

So an FPE generates an output for every input within the same range. The same can be applicable for alphanumeric inputs as well.

**Security:**

FPE uses block ciphers like AES to produce the cipher text. A good FPE generates a pseudo random permutation for an input in a constant time. An FPE implementation is considered secure if the output can be distinguishable from a truly random permutation and the underlying cipher algorithm defines how secure the FPE implementation is because if the cipher can be broken so can FPE implemented using it.

**Trade-Offs with FPE:**

Though FPE has its own benefits and strong points, it also has some trade-offs to be considered before considering it.

- Though not serious points of concern, FPE does work on the output of AES and hence might need slightly more resources like RAM and CPU cycles.
- Work done in the area of FPE is not so vast or abundant compared to the work around the standard encryption schemes. As a result, current implementations of FPE might consume more time.

**Risks:**

- FPE needs a good amount of distinct values for performing FPE. If not, it becomes easy to figure out the mapping given enough time.
- FPE gives away some info about the data such as the length or data type of the original data. Since this drawback is more related to the requirement than FPE itself, FPE should only be implemented after proper evaluation of the requirement.

**FPE algorithms:**

**FPE from prefix cipher:**

This is one of the easiest implementations. Here a pseudo-random function is used to generate random weights to each integer and sort it by this weight and an integer with same input rank can be taken from the weight sorted integers as output. This works as good as the block cipher used as a pseudorandom function. Though easy to implement, this method can prove to be difficult to use for inputs of larger range as it would take a longer time to precompute the weights and sort them.

**FPE from cycle walking:**

To create an FPE algorithm by repeatedly applying the block cipher until the output falls in the range of allowed values. For finite domains, the process would certainly terminate. This implementation that it doesn’t have to pre-map an input to an output. But for smaller domains, more and more repetitions might be required before the output falls in the range of the domain.

**FPE from Fiestal Network:**

A Fiestal network breaks the input into equal halves of left and right side bits. So for an output of 24 its the least significant bits from the output of block cipher can be used. To make the desired output fall in the same domain as the input the process is cycled enough times so that we get desired output in the same domain.

**The FFSEM/FFX mode of AES:**

Here we use AES for the rounding function with a fiestal network where a single key is used and in each round, the key is slightly modified.

**Datametica’s use case for FPE:**

Datametica’s home brewed masking tool ‘Data Anonymizer’ is developed to help our clients to mask the sensitive data and share the non-secure data with Datametica team for development and testing purposes. The tool uses multiple algorithms to mask the data required for multiple use cases. FPE is a really useful algorithm which can replace sensitive information like customer names and still keep the statistics of the data intact which helps the development team to understand the vitals of the data.

Since multiple runs would be required to implement the business logic and to test it, every run of the data has to be made anonymous. Unlike most masking algorithms which can’t replace a customer name with the same pseudo name or text across multiple times, FPE works well in this case. FPE generates output by generating deterministic pseudo random values. This helps it to map the same random value for a customer name every time, without storing the mapping. Allowing the business logic to still be applied after masking the data while preserving characteristic statistics of the data. Ie a user Ram will always be replaced by saying a token Shyam. This allows for business logic to be applied on tokenized data like calculating total orders to a store by the user Shyam. Shyam is a representing token of the user Ram and any developer or others who can view the data need not aware of the fact that Shyam is not the original name of the user and still use the masked data for development purposes. This saves a lot of effort that goes into writing scripts to mask the data and to further keep the replacements consistent across multiple runs and also saves time making changes to the data model if encryption were to be used. Needless to say, this approach is way better than generating sample data for development and testing purposes.