The PII Redaction model lets you minimize sensitive information about individuals by automatically identifying and removing it from your transcript.
hash substitution: Hi, my name is ####!
entity_name substitution: Hi, my name is [PERSON_NAME]!
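The difference between the two substitution styles can be illustrated locally. The sketch below is a toy illustration only, not how the model works: the `redact` helper and the hand-written span are hypothetical, and the choice of one `#` per redacted character is an assumption made to reproduce the example above.

```python
def redact(text, spans, substitution="hash"):
    """Replace each (start, end, policy) span in text using the given substitution style."""
    if substitution not in ("hash", "entity_name"):
        raise ValueError(f"unknown substitution: {substitution!r}")
    # Work from right to left so earlier offsets stay valid after each replacement.
    for start, end, policy in sorted(spans, reverse=True):
        if substitution == "hash":
            # Assumption: one '#' per redacted character.
            replacement = "#" * (end - start)
        else:
            replacement = f"[{policy.upper()}]"
        text = text[:start] + replacement + text[end:]
    return text


sentence = "Hi, my name is Anna!"
start = sentence.index("Anna")
# Hypothetical span; the real service detects entities itself.
spans = [(start, start + len("Anna"), "person_name")]

print(redact(sentence, spans, "hash"))         # Hi, my name is ####!
print(redact(sentence, spans, "entity_name"))  # Hi, my name is [PERSON_NAME]!
```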
PII Redaction only redacts the text property. Properties from other features may still include PII, such as entities from Entity Detection or summary from Summarization.
To enable PII Redaction, set redact_pii to true in the transcription config.
Use redact_pii_policies to specify the information you want to redact. For the full list of policies, see PII policies.
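As a minimal sketch, the transcription config described above could be assembled as a plain JSON request body. The `build_transcript_request` helper, the `audio_url` field, and the example URL are assumptions for illustration; only `redact_pii`, `redact_pii_policies`, and `redact_pii_sub` come from this page.

```python
import json


def build_transcript_request(audio_url, policies, substitution="hash"):
    """Build a request body that enables PII Redaction for the given policies."""
    if not policies:
        raise ValueError("list at least one policy to redact")
    if substitution not in ("hash", "entity_name"):
        raise ValueError(f"unknown substitution: {substitution!r}")
    return {
        "audio_url": audio_url,  # assumption: field name for the audio source
        "redact_pii": True,
        "redact_pii_policies": list(policies),
        "redact_pii_sub": substitution,
    }


payload = build_transcript_request(
    "https://example.com/call.mp3",  # placeholder URL
    ["person_name", "phone_number"],
    substitution="entity_name",
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary would then be sent as the JSON body of the transcription request with whichever HTTP client you already use.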
To create a redacted version of the audio file, use the set_redact_pii() method on the TranscriptionConfig with redact_audio set to True.
Use get_redacted_audio_url() on the transcript to get the URL to the redacted audio file.
Key | Type | Description |
---|---|---|
redact_pii | boolean | Enable PII Redaction. |
redact_pii_policies | array | PII policies for what information to redact. |
redact_pii_sub | string | Method used to substitute PII in the transcript. Can be entity_name or hash. |
redact_pii_audio | boolean | Create a redacted version of the audio file. |
redact_pii_audio_quality | string | Quality of the redacted PII audio file. Can be mp3 or wav. |
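The audio-related keys in the table can be layered onto an existing request body. The following is a small sketch under the assumption that the request is a plain dictionary; `add_audio_redaction` is a hypothetical helper, and its `mp3` default is this sketch's choice rather than a documented API default.

```python
def add_audio_redaction(request, quality="mp3"):
    """Return a copy of a request body that also asks for a redacted audio file."""
    if quality not in ("mp3", "wav"):
        raise ValueError(f"quality must be 'mp3' or 'wav', got {quality!r}")
    # Copy instead of mutating so the caller's original dict is left untouched.
    return {
        **request,
        "redact_pii_audio": True,
        "redact_pii_audio_quality": quality,
    }


request = {"redact_pii": True, "redact_pii_policies": ["person_name"]}
audio_request = add_audio_redaction(request, quality="wav")
print(audio_request)
```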
Key | Type | Description |
---|---|---|
text | string | Transcript with redacted PII. |
Policy name | Description | Example |
---|---|---|
account_number | Customer account or membership identification number | Policy No. 10042992; Member ID: HZ-5235-001 |
banking_information | Banking information, including account and routing numbers | |
blood_type | Blood type | O-, AB positive |
credit_card_cvv | Credit card verification code | CVV: 080 |
credit_card_expiration | Expiration date of a credit card | |
credit_card_number | Credit card number | |
date | Specific calendar date | December 18 |
date_of_birth | Date of birth | Date of Birth: March 7, 1961 |
drivers_license | Driver’s license number | DL# 356933-540 |
drug | Medications, vitamins, or supplements | Advil, Acetaminophen, Panadol |
email_address | Email address | support@assemblyai.com |
event | Name of an event or holiday | Olympics, Yom Kippur |
gender_sexuality | Terms indicating gender identity or sexual orientation, including slang terms | female; bisexual; trans |
healthcare_number | Healthcare numbers and health plan beneficiary numbers | Policy No.: 5584-486-674-YM |
injury | Bodily injury | I broke my arm, I have a sprained wrist |
ip_address | Internet IP address, including IPv4 and IPv6 formats | 192.168.0.1 |
language | Name of a natural language | Spanish, French |
location | Any location reference, including mailing address, postal code, city, state, province, country, or coordinates | Lake Victoria, 145 Windsor St., 90210 |
medical_condition | Name of a medical condition, disease, syndrome, deficit, or disorder | chronic fatigue syndrome, arrhythmia, depression |
medical_process | Medical process, including treatments, procedures, and tests | heart surgery, CT scan |
money_amount | Name and/or amount of currency | 15 pesos, $94.50 |
nationality | Terms indicating nationality, ethnicity, or race | American, Asian, Caucasian |
number_sequence | Numerical PII (including alphanumeric strings) that doesn’t fall under other categories | |
occupation | Job title or profession | professor, actors, engineer, CPA |
organization | Name of an organization | CNN, McDonalds, University of Alaska, Northwest General Hospital |
passport_number | Passport numbers, issued by any country | PA4568332; NU3C6L86S12 |
password | Account passwords, PINs, access keys, or verification answers | 27%alfalfa, temp1234, My mother’s maiden name is Smith |
person_age | Number associated with an age | 27, 75 |
person_name | Name of a person | Bob, Doug Jones, Dr. Kay Martinez, MD |
phone_number | Telephone or fax number | |
political_affiliation | Terms referring to a political party, movement, or ideology | Republican, Liberal |
religion | Terms indicating religious affiliation | Hindu, Catholic |
url | Internet addresses | https://www.assemblyai.com/ |
us_social_security_number | Social Security Number or equivalent | |
username | Usernames, login names, or handles | @AssemblyAI |
vehicle_id | Vehicle identification numbers (VINs), vehicle serial numbers, and license plate numbers | 5FNRL38918B111818; BIF7547 |
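Because policy names are plain strings, a typo is easy to miss. The sketch below checks a list of names against the table above before a request is sent; `KNOWN_POLICIES` is copied from that table, while `check_policies` and its use of `difflib` for suggestions are hypothetical conveniences, not part of the API.

```python
from difflib import get_close_matches

# Policy names copied from the table above.
KNOWN_POLICIES = tuple(sorted("""
account_number banking_information blood_type credit_card_cvv
credit_card_expiration credit_card_number date date_of_birth drivers_license
drug email_address event gender_sexuality healthcare_number injury ip_address
language location medical_condition medical_process money_amount nationality
number_sequence occupation organization passport_number password person_age
person_name phone_number political_affiliation religion url
us_social_security_number username vehicle_id
""".split()))


def check_policies(policies):
    """Map each unrecognized policy name to a list with its closest known name, if any."""
    return {
        name: get_close_matches(name, KNOWN_POLICIES, n=1)
        for name in policies
        if name not in KNOWN_POLICIES
    }


problems = check_policies(["person_name", "person_nam", "ssn"])
for name, suggestions in problems.items():
    hint = f" (did you mean {suggestions[0]}?)" if suggestions else ""
    print(f"unknown policy: {name}{hint}")
```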
Why is the PII not redacted in my transcription?
Make sure the policies for the information you want to redact are specified in the redact_pii_policies parameter. If you’re still experiencing issues, please reach out to our support team for assistance.
Why is my webhook not being sent?
Make sure the webhook_url parameter is included with a valid URL that can be reached by AssemblyAI’s API. If you’re using custom authentication headers, ensure that the webhook_auth_header_name and webhook_auth_header_value parameters are included and are correct. If you’re still having issues, please contact our support team for assistance.
Why does my redacted audio file sound worse than the original?
To get a higher-quality redacted audio file, set redact_pii_audio_quality to wav.
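Several of the checks in the answers above can be run locally before contacting support. The sketch below assumes the request is a plain dictionary that uses the parameter names from this page; `check_request` is a hypothetical helper, and it cannot tell whether a webhook URL is actually reachable from AssemblyAI’s API.

```python
from urllib.parse import urlparse


def check_request(request):
    """Return a list of likely configuration problems in a request body."""
    problems = []
    if request.get("redact_pii") is not True:
        problems.append("redact_pii is not set to true")
    if not request.get("redact_pii_policies"):
        problems.append("redact_pii_policies is missing or empty")
    if "webhook_url" in request:
        parsed = urlparse(request["webhook_url"])
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("webhook_url is not an absolute http(s) URL")
    # The two auth header parameters only make sense as a pair.
    has_name = bool(request.get("webhook_auth_header_name"))
    has_value = bool(request.get("webhook_auth_header_value"))
    if has_name != has_value:
        problems.append(
            "webhook_auth_header_name and webhook_auth_header_value must be set together"
        )
    return problems


bad_request = {
    "redact_pii": True,
    "redact_pii_policies": [],
    "webhook_url": "example.com/hook",  # missing scheme
    "webhook_auth_header_name": "X-Signature",  # placeholder header name
}
for problem in check_request(bad_request):
    print(problem)
```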