What is PII Redaction? How does it work?
Updated over 4 months ago

Personally Identifiable Information (PII) is any information that can be used to identify an individual. We can intercept this information in a prompt and "tokenize" it. Tokenizing means replacing each piece of PII with a placeholder tag (such as [NAME-1]) before the prompt is sent to the LLM, so the LLM never receives the real information. When the response comes back, we replace each tag with the original data. This lets you get responses from an LLM without it ever seeing the personal information. However, if the LLM needed the original value as context to answer well, the quality of its responses will suffer.

Limitations

Currently we only support English text. In other languages, the most sensitive entities (credit card numbers, bank details, etc.) will likely still be detected, but you should test this for your use case. We also plan to support redacting only subsets of these PII entities to reduce the loss of prompt context.

How it works

The user enters a prompt:

Hi, my name is Troy Montoya, and my credit card number is 1234-5432-3456-1234. Is it a valid Visa card?

We send this to the LLM vendor:

Hi, my name is [NAME-1], and my credit card number is [CREDIT-CARD-1]. Is it a valid Visa card?

The LLM vendor responds with:

Hi [NAME-1]. No. [CREDIT-CARD-1] is not valid.

And we return this to the user:

Hi Troy Montoya. No. 1234-5432-3456-1234 is not valid.
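The round trip above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the service's actual implementation: detection is assumed to have already happened (the hypothetical `detections` list of `(label, value)` pairs stands in for the detector), and tags are numbered per entity type to mirror the example rather than being random.

```python
import re
from typing import Dict, List, Tuple


def tokenize(prompt: str, detections: List[Tuple[str, str]]) -> Tuple[str, Dict[str, str]]:
    """Replace each detected PII value with a numbered tag; return the new prompt and tag map.

    `detections` is a hypothetical stand-in for the output of a PII detector.
    """
    mapping: Dict[str, str] = {}
    counters: Dict[str, int] = {}
    for label, value in detections:
        if value in mapping.values():
            continue  # this value already has a tag; the earlier replace() covered it
        counters[label] = counters.get(label, 0) + 1
        tag = f"[{label}-{counters[label]}]"
        mapping[tag] = value
        prompt = prompt.replace(value, tag)
    return prompt, mapping


def detokenize(response: str, mapping: Dict[str, str]) -> str:
    """Swap every tag in the LLM response back to its original value in a single pass."""
    if not mapping:
        return response
    pattern = re.compile("|".join(re.escape(tag) for tag in mapping))
    return pattern.sub(lambda m: mapping[m.group(0)], response)


prompt = ("Hi, my name is Troy Montoya, and my credit card number is "
          "1234-5432-3456-1234. Is it a valid Visa card?")
redacted, tags = tokenize(prompt, [("NAME", "Troy Montoya"),
                                   ("CREDIT-CARD", "1234-5432-3456-1234")])
print(redacted)
# Hi, my name is [NAME-1], and my credit card number is [CREDIT-CARD-1]. Is it a valid Visa card?
print(detokenize("Hi [NAME-1]. No. [CREDIT-CARD-1] is not valid.", tags))
# Hi Troy Montoya. No. 1234-5432-3456-1234 is not valid.
```

Restoring all tags in one regex pass means a value that has just been put back can never be rewritten by a later substitution.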

PII Entities We Redact

Some PII entity types are universal (not specific to individual countries), such as email addresses and credit card numbers.

ADDRESS

A physical address, such as "100 Main Street, Anytown, USA" or "Suite #12, Building 123". An address can include information such as the street, building, location, city, state, country, county, zip code, precinct, and neighborhood.

AGE

An individual's age, including the quantity and unit of time. For example, in the phrase "I am 40 years old," we recognize "40 years" as an age.

AWS_ACCESS_KEY

A unique identifier that's associated with a secret access key; you use the access key ID and secret access key to sign programmatic AWS requests cryptographically.

AWS_SECRET_KEY

A unique identifier that's associated with an access key. You use the access key ID and secret access key to sign programmatic AWS requests cryptographically.

CREDIT_DEBIT_CVV

A three-digit card verification code (CVV) that is present on VISA, MasterCard, and Discover credit and debit cards. For American Express credit or debit cards, the CVV is a four-digit numeric code.

CREDIT_DEBIT_EXPIRY

The expiration date for a credit or debit card. This number is usually four digits long and is often formatted as month/year or MM/YY. We recognize expiration dates such as 01/21, 01/2021, and Jan 2021.
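As an illustration of how the three expiry formats listed above might be normalized, here is a small Python sketch. The regular expressions and the `parse_expiry` helper are hypothetical, and it assumes two-digit years belong to the 2000s.

```python
import re
from typing import Optional, Tuple

MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# MM/YY or MM/YYYY, e.g. 01/21 or 01/2021
NUMERIC = re.compile(r"\b(0[1-9]|1[0-2])/(\d{4}|\d{2})\b")
# Month name (abbreviated or full) followed by a four-digit year, e.g. Jan 2021
NAMED = re.compile(
    r"\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?\s+(\d{4})\b",
    re.IGNORECASE)


def parse_expiry(text: str) -> Optional[Tuple[int, int]]:
    """Return (month, four-digit year) for the first expiry-like date in text, or None."""
    m = NUMERIC.search(text)
    if m:
        month, year = int(m.group(1)), m.group(2)
        full_year = int(year) if len(year) == 4 else 2000 + int(year)  # assumes 20xx
        return month, full_year
    m = NAMED.search(text)
    if m:
        return MONTHS[m.group(1).lower()], int(m.group(2))
    return None


print(parse_expiry("exp 01/21"))            # (1, 2021)
print(parse_expiry("expires 01/2021"))      # (1, 2021)
print(parse_expiry("valid thru Jan 2021"))  # (1, 2021)
```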

CREDIT_DEBIT_NUMBER

The number for a credit or debit card. These numbers can vary from 13 to 16 digits in length. We also recognize credit or debit card numbers when only the last four digits are present.
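A first-pass candidate match for the 13-to-16-digit range described above might look like the following Python sketch. The pattern and helper are hypothetical; the sketch does not handle the last-four-digits case, and a real detector would likely add further checks (card numbers carry a Luhn check digit, for example).

```python
import re

# 13 to 16 digits, optionally separated by single spaces or hyphens
# (e.g. 1234-5432-3456-1234). \b keeps a match from starting or ending mid-number.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def find_card_numbers(text: str) -> list:
    """Return candidate card numbers with separators removed."""
    return [re.sub(r"[ -]", "", m.group(0)) for m in CARD_CANDIDATE.finditer(text)]


print(find_card_numbers("my card is 1234-5432-3456-1234, order #12345"))
# ['1234543234561234']
```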

DATE_TIME

A date can include a year, month, day, day of week, or time of day. For example, we recognize "January 19, 2020" or "11 am" as dates. We also recognize partial dates, date ranges, date intervals, and decades, such as "the 1990s".

DRIVER_ID

The number assigned to a driver's license, which is an official document permitting an individual to operate one or more motorized vehicles on a public road. A driver's license number consists of alphanumeric characters.

EMAIL

An email address, such as name@example.com.

INTERNATIONAL_BANK_ACCOUNT_NUMBER

An International Bank Account Number has specific formats in each country. See www.iban.com/structure.
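Although each country fixes its own IBAN length and layout, every IBAN carries two check digits validated with the same ISO 7064 mod-97 rule. A minimal Python sketch of that check follows; it does not verify country-specific lengths, and `iban_is_valid` is a hypothetical name.

```python
def iban_is_valid(iban: str) -> bool:
    """Check an IBAN's mod-97 check digits; country-specific length rules are not enforced."""
    s = iban.replace(" ", "").upper()
    if not (15 <= len(s) <= 34) or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]  # move country code and check digits to the end
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)  # digits stay, A=10 ... Z=35
    return int(numeric) % 97 == 1


print(iban_is_valid("GB82 WEST 1234 5698 7654 32"))  # True
print(iban_is_valid("GB83 WEST 1234 5698 7654 32"))  # False
```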

IP_ADDRESS

An IPv4 address, such as 198.51.100.0.
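One way to confirm that a dotted-quad candidate really is an IPv4 address is to hand it to Python's standard `ipaddress` module, which rejects out-of-range octets. The candidate regex and `find_ipv4` helper below are an illustrative sketch, not the detector this service uses.

```python
import ipaddress
import re

# Four dot-separated groups of one to three digits; range checking is left to ipaddress.
CANDIDATE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def find_ipv4(text: str) -> list:
    """Return the substrings of text that parse as valid IPv4 addresses."""
    found = []
    for m in CANDIDATE.finditer(text):
        try:
            found.append(str(ipaddress.IPv4Address(m.group(0))))
        except ipaddress.AddressValueError:
            pass  # looks like an address but is not one, e.g. 999.1.1.1
    return found


print(find_ipv4("Traffic from 198.51.100.0 and 999.1.1.1"))  # ['198.51.100.0']
```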

LICENSE_PLATE

A license plate for a vehicle is issued by the state or country where the vehicle is registered. The format for passenger vehicles is typically five to eight characters, consisting of uppercase letters and numbers. The format varies depending on the location of the issuing state or country.

MAC_ADDRESS

A media access control (MAC) address is a unique identifier assigned to a network interface controller (NIC).
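The common printed form of a MAC address is six pairs of hexadecimal digits separated by colons or hyphens. A hypothetical Python pattern for that form follows (other notations, such as the dotted 0000.0000.0000 style, are not covered).

```python
import re

# Six hex pairs; the backreference \1 forces the same separator (":" or "-") throughout.
MAC = re.compile(r"\b[0-9A-Fa-f]{2}([:-])(?:[0-9A-Fa-f]{2}\1){4}[0-9A-Fa-f]{2}\b")


def find_macs(text: str) -> list:
    """Return colon- or hyphen-separated MAC addresses found in text."""
    return [m.group(0) for m in MAC.finditer(text)]


print(find_macs("eth0 00:1A:2B:3C:4D:5E up"))  # ['00:1A:2B:3C:4D:5E']
```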

NAME

An individual's name. This entity type does not include titles, such as Dr., Mr., Mrs., or Miss. We do not apply this entity type to names that are part of organizations or addresses. For example, we recognize the "John Doe Organization" as an organization, and "Jane Doe Street" as an address.

PASSWORD

An alphanumeric string that is used as a password, such as "*very20special#pass*".

PHONE

A phone number. This entity type also includes fax and pager numbers.

PIN

A four-digit personal identification number (PIN) with which you can access your bank account.

SWIFT_CODE

A SWIFT code is a standard format of Bank Identifier Code (BIC) used to specify a particular bank or branch. Banks use these codes for money transfers such as international wire transfers.

SWIFT codes consist of eight or 11 characters. The 11-character codes refer to specific branches, while eight-character codes (or 11-character codes ending in 'XXX') refer to the head or primary office.
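The structure described above can be checked with a short Python sketch. It assumes the usual BIC layout (a four-letter bank code, a two-letter country code, a two-character location code, and an optional three-character branch code); `describe_bic` and the sample codes are hypothetical.

```python
import re
from typing import Optional

# Bank code (4 letters), country code (2 letters), location (2 alphanumerics),
# optional branch code (3 alphanumerics).
BIC = re.compile(r"[A-Z]{4}[A-Z]{2}[A-Z0-9]{2}([A-Z0-9]{3})?")


def describe_bic(code: str) -> Optional[str]:
    """Classify a SWIFT/BIC code as a head office or branch code, or None if malformed."""
    m = BIC.fullmatch(code.strip().upper())
    if m is None:
        return None
    branch = m.group(1)
    # 8-character codes, and 11-character codes ending in XXX, mean the primary office.
    return "head office" if branch in (None, "XXX") else "branch"


print(describe_bic("ABCDGB2L"))     # head office
print(describe_bic("ABCDGB2LXXX"))  # head office
print(describe_bic("ABCDGB2L123"))  # branch
```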

URL

A web address, such as www.example.com.

USERNAME

A username that identifies an account, such as a login name, screen name, nickname, or handle.

VEHICLE_IDENTIFICATION_NUMBER

A Vehicle Identification Number (VIN) uniquely identifies a vehicle. VIN content and format are defined in the ISO 3779 specification. Each country has specific codes and formats for VINs.
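A format-only check for the 17-character VIN is easy to sketch in Python. It assumes the ISO 3779 character set, in which the letters I, O, and Q are not used, and it does not validate country-specific content such as the North American check digit. The pattern and helper names are hypothetical.

```python
import re

# 17 characters: digits and capital letters except I, O and Q.
VIN = re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b")


def find_vins(text: str) -> list:
    """Return 17-character strings that fit the VIN character set."""
    return VIN.findall(text)


print(find_vins("VIN: 1M8GDM9AXKP042788"))  # ['1M8GDM9AXKP042788']
```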

Country-specific PII entity types

Some PII entity types are country-specific, such as passport numbers and other government-issued ID numbers.

CA_HEALTH_NUMBER

A Canadian Health Service Number is a 10-digit unique identifier, required for individuals to access healthcare benefits.

CA_SOCIAL_INSURANCE_NUMBER

A Canadian Social Insurance Number (SIN) is a nine-digit unique identifier, required for individuals to access government programs and benefits.

The SIN is formatted as three groups of three digits, such as 123-456-789. A SIN can be validated through a simple check-digit process called the Luhn algorithm.
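The Luhn check mentioned above (the same algorithm used for credit and debit card numbers) takes only a few lines. Here is a minimal Python sketch; `sin_is_valid` is a hypothetical helper that also enforces the nine-digit length.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn check; separators are ignored."""
    digits = [int(c) for c in number if "0" <= c <= "9"]
    if not digits:
        return False
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def sin_is_valid(sin: str) -> bool:
    """A SIN must have exactly nine digits and pass the Luhn check."""
    digits = [c for c in sin if "0" <= c <= "9"]
    return len(digits) == 9 and luhn_is_valid(sin)


print(sin_is_valid("046 454 286"))  # True
print(sin_is_valid("123-456-789"))  # False
```

The placeholder 123-456-789 from the format description fails the check, which is expected for an illustrative number.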

IN_AADHAAR

An Indian Aadhaar is a 12-digit unique identification number issued by the Indian government to the residents of India. The Aadhaar format has a space or hyphen after the fourth and eighth digits.

IN_NREGA

An Indian National Rural Employment Guarantee Act (NREGA) number consists of two letters followed by 14 numbers.

IN_PERMANENT_ACCOUNT_NUMBER

An Indian Permanent Account Number (PAN) is a 10-character unique alphanumeric identifier issued by the Income Tax Department.

IN_VOTER_NUMBER

An Indian Voter ID consists of three letters followed by seven numbers.
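The four Indian formats above translate fairly directly into regular expressions. In this hypothetical Python sketch, the Aadhaar, NREGA, and voter ID patterns follow the descriptions above, while the PAN layout (five letters, four digits, one letter) is an added assumption, since this article only describes the PAN as a 10-character alphanumeric value.

```python
import re

INDIA_PATTERNS = {
    # 12 digits with a space or hyphen after the 4th and 8th digits
    "IN_AADHAAR": re.compile(r"\b\d{4}[ -]\d{4}[ -]\d{4}\b"),
    # two letters followed by 14 digits
    "IN_NREGA": re.compile(r"\b[A-Z]{2}\d{14}\b"),
    # assumed layout: five letters, four digits, one letter
    "IN_PERMANENT_ACCOUNT_NUMBER": re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),
    # three letters followed by seven digits
    "IN_VOTER_NUMBER": re.compile(r"\b[A-Z]{3}\d{7}\b"),
}


def find_india_ids(text: str) -> list:
    """Return (entity type, matched text) pairs for every pattern that matches."""
    return [(label, m.group(0))
            for label, pattern in INDIA_PATTERNS.items()
            for m in pattern.finditer(text)]


print(find_india_ids("PAN ABCDE1234F"))  # [('IN_PERMANENT_ACCOUNT_NUMBER', 'ABCDE1234F')]
```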

UK_NATIONAL_HEALTH_SERVICE_NUMBER

A UK National Health Service Number is a 10-digit or 17-digit number, such as 485 777 3456. The current system formats the 10-digit number with spaces after the third and sixth digits. The final digit is an error-detecting checksum.

The 17-digit number format has spaces after the 10th and 13th digits.
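For the 10-digit format, the check digit follows the NHS Modulus 11 scheme: weight the first nine digits from 10 down to 2, sum the products, and subtract the remainder mod 11 from 11 (a result of 11 becomes 0, and a result of 10 means the number is invalid). A minimal Python sketch with a hypothetical helper name; it does not cover the 17-digit format.

```python
def nhs_number_is_valid(number: str) -> bool:
    """Validate a 10-digit NHS number with the Modulus 11 check digit; spaces are ignored."""
    digits = [int(c) for c in number if "0" <= c <= "9"]
    if len(digits) != 10:
        return False
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))  # weights 10..2
    check = 11 - total % 11
    if check == 11:
        check = 0
    return check != 10 and check == digits[9]


print(nhs_number_is_valid("943 476 5919"))  # True
print(nhs_number_is_valid("943 476 5918"))  # False
```

The sample 485 777 3456 does not pass this check (its first nine digits give a check digit of 7), so treat it as a format illustration only.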

UK_NATIONAL_INSURANCE_NUMBER

A UK National Insurance Number (NINO) provides individuals with access to National Insurance (social security) benefits. It is also used for some purposes in the UK tax system.

The number is nine characters long and starts with two letters, followed by six numbers and one letter. A NINO can be formatted with a space or a dash after the two letters and after the second, fourth, and sixth digits.
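The layout above maps onto a simple pattern. This hypothetical Python sketch accepts the optional spaces or dashes; it does not enforce the official restrictions on which letter prefixes and suffixes are issued.

```python
import re

# Two letters, three pairs of digits, one letter; each boundary may carry a space or dash.
NINO = re.compile(r"\b[A-Z]{2}[ -]?\d{2}[ -]?\d{2}[ -]?\d{2}[ -]?[A-Z]\b")


def find_ninos(text: str) -> list:
    """Return NINO-shaped strings found in text."""
    return NINO.findall(text)


print(find_ninos("NINO: QQ 12 34 56 C"))  # ['QQ 12 34 56 C']
print(find_ninos("QQ123456C"))            # ['QQ123456C']
```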

UK_UNIQUE_TAXPAYER_REFERENCE_NUMBER

A UK Unique Taxpayer Reference (UTR) is a 10-digit number that identifies a taxpayer or a business.

BANK_ACCOUNT_NUMBER

A US bank account number, which is typically 10 to 12 digits long. We also recognize bank account numbers when only the last four digits are present.

BANK_ROUTING

A US bank account routing number. These are typically nine digits long, but we also recognize routing numbers when only the last four digits are present.

PASSPORT_NUMBER

A US passport number. Passport numbers range from six to nine alphanumeric characters.

US_INDIVIDUAL_TAX_IDENTIFICATION_NUMBER

A US Individual Taxpayer Identification Number (ITIN) is a nine-digit number that starts with a "9" and contains a "7" or "8" as the fourth digit. An ITIN can be formatted with a space or a dash after the third and fifth digits.
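Here is a hypothetical Python pattern for the ITIN rules above, assuming the common 9XX-XX-XXXX grouping with optional spaces or dashes.

```python
import re

# Starts with 9; fourth digit is 7 or 8; optional space or dash after digits 3 and 5.
ITIN = re.compile(r"\b9\d{2}[ -]?[78]\d[ -]?\d{4}\b")


def find_itins(text: str) -> list:
    """Return ITIN-shaped strings found in text."""
    return ITIN.findall(text)


print(find_itins("ITIN 912-78-1234, SSN 123-45-6789"))  # ['912-78-1234']
```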

SSN

A US Social Security Number (SSN) is a nine-digit number that is issued to US citizens, permanent residents, and temporary working residents. We also recognize Social Security Numbers when only the last four digits are present.
