PII Redaction
The PII Redaction model lets you minimize sensitive information about individuals by automatically identifying and removing it from your transcript.
Personally Identifiable Information (PII) is any information that can be used to identify a person, such as a name, email address, or phone number.
When you enable the PII Redaction model, your transcript will look like this:
- With hash substitution: Hi, my name is ####!
- With entity_name substitution: Hi, my name is [PERSON_NAME]!
You can also create redacted audio files, which replace sensitive information with a beeping sound.
Supported languages
PII Redaction is available in multiple languages. See Supported languages.
Redacted properties
PII Redaction only redacts words in the text property. Properties from other features may still include PII, such as entities from Entity Detection or summary from Summarization.
Quickstart
Enable PII Redaction by setting redact_pii to true in the transcription config.
Use redact_pii_policies to specify the information you want to redact. For the full list of policies, see PII policies.
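As a rough illustration of the two settings above, the following sketch assembles a request body using the keys listed under API reference. It only builds the body and does not send it. The helper name build_redaction_request and the audio_url field are assumptions made for this example; they are not defined on this page.

```python
import json


def build_redaction_request(audio_url, policies, substitution=None):
    """Build a request body that turns on PII Redaction.

    Hypothetical helper for illustration only. `audio_url` is an assumed
    field name; the redact_* keys come from the request table on this page.
    """
    if not policies:
        raise ValueError("pass at least one PII policy to redact")
    if substitution not in (None, "hash", "entity_name"):
        raise ValueError("substitution must be 'hash' or 'entity_name'")

    body = {
        "audio_url": audio_url,  # assumption: not documented in this section
        "redact_pii": True,
        "redact_pii_policies": list(policies),
    }
    # Only include the substitution method when the caller chose one,
    # so no default is implied here.
    if substitution is not None:
        body["redact_pii_sub"] = substitution
    return body


if __name__ == "__main__":
    body = build_redaction_request(
        "https://example.com/call.mp3",  # placeholder URL
        ["person_name", "phone_number"],
        substitution="entity_name",
    )
    print(json.dumps(body, indent=2))
```

Leaving redact_pii_sub out when no substitution is chosen avoids guessing at a default that this page does not state.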
Example output
Create redacted audio files
In addition to redacting sensitive information from the transcription text, you can also generate a version of the original audio file with the PII “beeped” out.
To create a redacted version of the audio file, use the set_redact_pii() method on the TranscriptionConfig with redact_audio set to True.
Use get_redacted_audio_url() on the transcript to get the URL of the redacted audio file.
Supported languages
You can only create redacted audio files for transcriptions in English and Spanish.
Maximum audio file size
You can only create redacted versions of audio files if the original file is smaller than 1 GB.
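The two limits above (language and file size) can be checked before a request is submitted. The sketch below is one way to do that locally. It assumes that 1 GB means 10**9 bytes and that English and Spanish are identified by the codes en and es; neither detail is stated on this page, and the function name is hypothetical.

```python
# Assumption: "1 GB" is read as 10**9 bytes; the page does not say which unit applies.
MAX_REDACTED_AUDIO_BYTES = 1_000_000_000

# Assumption: language codes for English and Spanish.
REDACTED_AUDIO_LANGUAGES = {"en", "es"}


def redacted_audio_problems(language_code, size_bytes):
    """Return a list of reasons why redacted audio cannot be created.

    An empty list means both documented limits are satisfied.
    """
    problems = []

    # Reduce regional variants such as "en_us" or "es-MX" to their base code.
    base = language_code.lower().replace("-", "_").split("_")[0]
    if base not in REDACTED_AUDIO_LANGUAGES:
        problems.append(f"language '{language_code}' is not English or Spanish")

    # The original file must be smaller than the limit, so equal size fails.
    if size_bytes >= MAX_REDACTED_AUDIO_BYTES:
        problems.append(f"file is {size_bytes} bytes, not smaller than 1 GB")

    return problems


if __name__ == "__main__":
    print(redacted_audio_problems("en_us", 25_000_000))
    print(redacted_audio_problems("fr", 2_000_000_000))
```

For a local file, os.path.getsize(path) gives the size in bytes to pass as size_bytes.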
Example output
API reference
Request
Key | Type | Description |
---|---|---|
redact_pii | boolean | Enable PII Redaction. |
redact_pii_policies | array | PII policies for what information to redact. |
redact_pii_sub | string | Method used to substitute PII in the transcript. Can be entity_name or hash. |
redact_pii_audio | boolean | Create a redacted version of the audio file. |
redact_pii_audio_quality | string | Quality of the redacted PII audio file. Can be mp3 or wav. |
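The request table can be mirrored as a small client-side check that catches wrong types or unsupported values before a request is sent. This is a minimal sketch: the types and allowed values are taken from the table above, while the REQUEST_SCHEMA structure and the function name are hypothetical.

```python
# Each key maps to (expected Python type, allowed values or None).
# Types and allowed values follow the request table on this page.
REQUEST_SCHEMA = {
    "redact_pii": (bool, None),
    "redact_pii_policies": (list, None),
    "redact_pii_sub": (str, {"entity_name", "hash"}),
    "redact_pii_audio": (bool, None),
    "redact_pii_audio_quality": (str, {"mp3", "wav"}),
}


def validate_redaction_options(options):
    """Return a list of error strings for the PII-related keys in `options`."""
    errors = []
    for key, value in options.items():
        if key not in REQUEST_SCHEMA:
            continue  # not a PII Redaction option; leave it to other checks
        expected_type, allowed = REQUEST_SCHEMA[key]
        if not isinstance(value, expected_type):
            errors.append(
                f"{key}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        elif allowed is not None and value not in allowed:
            errors.append(f"{key}: must be one of {sorted(allowed)}")
    return errors


if __name__ == "__main__":
    options = {
        "redact_pii": True,
        "redact_pii_policies": ["person_name"],
        "redact_pii_audio": True,
        "redact_pii_audio_quality": "flac",  # not an allowed value
    }
    for error in validate_redaction_options(options):
        print(error)
```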
Response
Key | Type | Description |
---|---|---|
text | string | Transcript with redacted PII. |
The response also includes the request parameters used to generate the transcript.
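Because the response carries both the redacted text and the request parameters, a client can separate the two after parsing. The sketch below shows one way to do that. The sample dictionary is hand-written for illustration and is not real API output, and the function name is hypothetical.

```python
# Request keys documented on this page that may be echoed in the response.
REDACTION_KEYS = (
    "redact_pii",
    "redact_pii_policies",
    "redact_pii_sub",
    "redact_pii_audio",
    "redact_pii_audio_quality",
)


def split_response(response):
    """Return (redacted_text, redaction_settings) from a parsed response dict."""
    text = response["text"]
    settings = {key: response[key] for key in REDACTION_KEYS if key in response}
    return text, settings


if __name__ == "__main__":
    # Illustrative sample only; a real response contains more fields.
    sample = {
        "text": "Hi, my name is [PERSON_NAME]!",
        "redact_pii": True,
        "redact_pii_policies": ["person_name"],
        "redact_pii_sub": "entity_name",
    }
    text, settings = split_response(sample)
    print(text)
    print(settings)
```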
PII policies
Policy name | Description | Example |
---|---|---|
account_number | Customer account or membership identification number | Policy No. 10042992; Member ID: HZ-5235-001 |
banking_information | Banking information, including account and routing numbers | |
blood_type | Blood type | O-, AB positive |
credit_card_cvv | Credit card verification code | CVV: 080 |
credit_card_expiration | Expiration date of a credit card | |
credit_card_number | Credit card number | |
date | Specific calendar date | December 18 |
date_of_birth | Date of birth | Date of Birth: March 7, 1961 |
drivers_license | Driver’s license number | DL# 356933-540 |
drug | Medications, vitamins, or supplements | Advil, Acetaminophen, Panadol |
email_address | Email address | support@assemblyai.com |
event | Name of an event or holiday | Olympics, Yom Kippur |
gender_sexuality | Terms indicating gender identity or sexual orientation, including slang terms | female; bisexual; trans |
healthcare_number | Healthcare numbers and health plan beneficiary numbers | Policy No.: 5584-486-674-YM |
injury | Bodily injury | I broke my arm, I have a sprained wrist |
ip_address | Internet IP address, including IPv4 and IPv6 formats | 192.168.0.1 |
language | Name of a natural language | Spanish, French |
location | Any location reference, including mailing address, postal code, city, state, province, country, or coordinates | Lake Victoria, 145 Windsor St., 90210 |
medical_condition | Name of a medical condition, disease, syndrome, deficit, or disorder | chronic fatigue syndrome, arrhythmia, depression |
medical_process | Medical process, including treatments, procedures, and tests | heart surgery, CT scan |
money_amount | Name and/or amount of currency | 15 pesos, $94.50 |
nationality | Terms indicating nationality, ethnicity, or race | American, Asian, Caucasian |
number_sequence | Numerical PII (including alphanumeric strings) that doesn’t fall under other categories | |
occupation | Job title or profession | professor, actors, engineer, CPA |
organization | Name of an organization | CNN, McDonalds, University of Alaska, Northwest General Hospital |
passport_number | Passport numbers, issued by any country | PA4568332; NU3C6L86S12 |
password | Account passwords, PINs, access keys, or verification answers | 27%alfalfa, temp1234, My mother’s maiden name is Smith |
person_age | Number associated with an age | 27, 75 |
person_name | Name of a person | Bob, Doug Jones, Dr. Kay Martinez, MD |
phone_number | Telephone or fax number | |
political_affiliation | Terms referring to a political party, movement, or ideology | Republican, Liberal |
religion | Terms indicating religious affiliation | Hindu, Catholic |
url | Internet addresses | https://www.assemblyai.com/ |
us_social_security_number | Social Security Number or equivalent | |
username | Usernames, login names, or handles | @AssemblyAI |
vehicle_id | Vehicle identification numbers (VINs), vehicle serial numbers, and license plate numbers | 5FNRL38918B111818; BIF7547 |
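Policy names are plain strings, so a typo such as person_nam is easy to miss. The sketch below checks names against the table above and suggests close matches using Python's standard difflib module. The policy set is copied from the table; the function name is hypothetical, and the check is a client-side convenience, not part of the API.

```python
import difflib

# Policy names copied from the PII policies table on this page.
PII_POLICIES = frozenset("""
account_number banking_information blood_type credit_card_cvv
credit_card_expiration credit_card_number date date_of_birth
drivers_license drug email_address event gender_sexuality
healthcare_number injury ip_address language location
medical_condition medical_process money_amount nationality
number_sequence occupation organization passport_number password
person_age person_name phone_number political_affiliation religion
url us_social_security_number username vehicle_id
""".split())


def check_policies(names):
    """Return {unknown_name: [close matches]} for names not in the table."""
    unknown = {}
    for name in names:
        if name not in PII_POLICIES:
            # Sort so the candidate order is stable between runs.
            unknown[name] = difflib.get_close_matches(
                name, sorted(PII_POLICIES), n=3
            )
    return unknown


if __name__ == "__main__":
    for name, suggestions in check_policies(["person_nam", "url"]).items():
        print(f"unknown policy '{name}', did you mean: {suggestions}")
```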