- With hash substitution: Hi, my name is ####!
- With entity_name substitution: Hi, my name is [PERSON_NAME]!
Supported languages
PII Redaction is available in multiple languages. See Supported languages.
Redacted properties
PII Redaction only redacts words in the text property. Properties from other features may still include PII, such as entities from Entity Detection or summary from Summarization.
Quickstart
Enable PII Redaction by setting redact_pii to true in the transcription config.
Use redact_pii_policies to specify the information you want to redact. For the full list of policies, see PII policies.
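As a minimal sketch of the settings above, the following builds (but does not send) a request body using the request keys listed in the API reference on this page. The endpoint URL, the authorization header, the audio_url key, and the helper name build_redaction_request are assumptions for illustration, so check them against the current API reference before relying on them.

```python
import json
import urllib.request

# Assumed endpoint; it is not stated on this page.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def build_redaction_request(api_key, audio_url, policies, substitution):
    """Build (without sending) a transcription request with PII Redaction enabled."""
    if not policies:
        raise ValueError("specify at least one PII policy")
    if substitution not in ("hash", "entity_name"):
        raise ValueError("substitution must be 'hash' or 'entity_name'")
    body = {
        "audio_url": audio_url,  # assumed key for the audio location
        "redact_pii": True,
        "redact_pii_policies": list(policies),
        "redact_pii_sub": substitution,
    }
    return urllib.request.Request(
        TRANSCRIPT_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


# Hypothetical values; nothing is sent over the network here.
request = build_redaction_request(
    api_key="YOUR_API_KEY",
    audio_url="https://example.com/audio.mp3",
    policies=["person_name", "phone_number"],
    substitution="entity_name",
)
payload = json.loads(request.data)
print(json.dumps(payload, indent=2))
```

Passing the resulting object to urllib.request.urlopen would submit it; that step is left out so the sketch can be run without credentials.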
Example output
Create redacted audio files
In addition to redacting sensitive information from the transcription text, you can also generate a version of the original audio file with the PII “beeped” out. To create a redacted version of the audio file, use the set_redact_pii() method on the TranscriptionConfig with redact_audio set to True.
Use get_redacted_audio_url() on the transcript to get the URL of the redacted audio file.
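For readers calling the API directly rather than through the SDK methods named above, here is a rough sketch of the equivalent request keys, taken from the API reference table on this page. The helper name redacted_audio_options is hypothetical, and the sketch only assembles the options; it does not submit a transcription or fetch the redacted file.

```python
import json

# Values listed for redact_pii_audio_quality in the API reference on this page.
ALLOWED_QUALITIES = ("mp3", "wav")


def redacted_audio_options(policies, quality="mp3"):
    """Return request keys that enable PII Redaction plus a redacted audio file."""
    if not policies:
        raise ValueError("specify at least one PII policy")
    if quality not in ALLOWED_QUALITIES:
        raise ValueError(f"quality must be one of {ALLOWED_QUALITIES}")
    return {
        "redact_pii": True,
        "redact_pii_policies": list(policies),
        "redact_pii_audio": True,
        "redact_pii_audio_quality": quality,
    }


# Request a lossless redacted file for a single policy.
options = redacted_audio_options(["person_name"], quality="wav")
print(json.dumps(options, indent=2))
```

The quality argument defaults to mp3 here because this page describes MP3 as the format returned by default.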
Supported languages
You can only create redacted audio files for transcriptions in English and Spanish.
Maximum audio file size
You can only create redacted versions of audio files if the original file is smaller than 1 GB.
Example output
API reference
Request
Key | Type | Description |
---|---|---|
redact_pii | boolean | Enable PII Redaction. |
redact_pii_policies | array | PII policies for what information to redact. |
redact_pii_sub | string | Method used to substitute PII in the transcript. Can be entity_name or hash. |
redact_pii_audio | boolean | Create a redacted version of the audio file. |
redact_pii_audio_quality | string | Quality of the redacted PII audio file. Can be mp3 or wav. |
Response
Key | Type | Description |
---|---|---|
text | string | Transcript with redacted PII. |
PII policies
Policy name | Description | Example |
---|---|---|
account_number | Customer account or membership identification number | Policy No. 10042992; Member ID: HZ-5235-001 |
banking_information | Banking information, including account and routing numbers | |
blood_type | Blood type | O-, AB positive |
credit_card_cvv | Credit card verification code | CVV: 080 |
credit_card_expiration | Expiration date of a credit card | |
credit_card_number | Credit card number | |
date | Specific calendar date | December 18 |
date_of_birth | Date of birth | Date of Birth: March 7, 1961 |
drivers_license | Driver’s license number | DL# 356933-540 |
drug | Medications, vitamins, or supplements | Advil, Acetaminophen, Panadol |
email_address | Email address | support@assemblyai.com |
event | Name of an event or holiday | Olympics, Yom Kippur |
gender_sexuality | Terms indicating gender identity or sexual orientation, including slang terms | female; bisexual; trans |
healthcare_number | Healthcare numbers and health plan beneficiary numbers | Policy No.: 5584-486-674-YM |
injury | Bodily injury | I broke my arm, I have a sprained wrist |
ip_address | Internet IP address, including IPv4 and IPv6 formats | 192.168.0.1 |
language | Name of a natural language | Spanish, French |
location | Any location reference, including mailing address, postal code, city, state, province, country, or coordinates | Lake Victoria, 145 Windsor St., 90210 |
medical_condition | Name of a medical condition, disease, syndrome, deficit, or disorder | chronic fatigue syndrome, arrhythmia, depression |
medical_process | Medical process, including treatments, procedures, and tests | heart surgery, CT scan |
money_amount | Name and/or amount of currency | 15 pesos, $94.50 |
nationality | Terms indicating nationality, ethnicity, or race | American, Asian, Caucasian |
number_sequence | Numerical PII (including alphanumeric strings) that doesn’t fall under other categories | |
occupation | Job title or profession | professor, actors, engineer, CPA |
organization | Name of an organization | CNN, McDonalds, University of Alaska, Northwest General Hospital |
passport_number | Passport numbers, issued by any country | PA4568332; NU3C6L86S12 |
password | Account passwords, PINs, access keys, or verification answers | 27%alfalfa, temp1234, My mother’s maiden name is Smith |
person_age | Number associated with an age | 27, 75 |
person_name | Name of a person | Bob, Doug Jones, Dr. Kay Martinez, MD |
phone_number | Telephone or fax number | |
political_affiliation | Terms referring to a political party, movement, or ideology | Republican, Liberal |
religion | Terms indicating religious affiliation | Hindu, Catholic |
url | Internet addresses | https://www.assemblyai.com/ |
us_social_security_number | Social Security Number or equivalent | |
username | Usernames, login names, or handles | @AssemblyAI |
vehicle_id | Vehicle identification numbers (VINs), vehicle serial numbers, and license plate numbers | 5FNRL38918B111818; BIF7547 |
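Because policy names are plain strings, a typo is easy to miss. The sketch below checks requested names against the policy names in the table above before a request is built. The names KNOWN_POLICIES and unknown_policies are hypothetical helpers, and the set reflects only what this page lists, so it may lag behind the service.

```python
# Policy names copied from the PII policies table on this page.
KNOWN_POLICIES = frozenset({
    "account_number", "banking_information", "blood_type", "credit_card_cvv",
    "credit_card_expiration", "credit_card_number", "date", "date_of_birth",
    "drivers_license", "drug", "email_address", "event", "gender_sexuality",
    "healthcare_number", "injury", "ip_address", "language", "location",
    "medical_condition", "medical_process", "money_amount", "nationality",
    "number_sequence", "occupation", "organization", "passport_number",
    "password", "person_age", "person_name", "phone_number",
    "political_affiliation", "religion", "url", "us_social_security_number",
    "username", "vehicle_id",
})


def unknown_policies(requested):
    """Return the requested policy names that are not in the table, sorted."""
    return sorted(set(requested) - KNOWN_POLICIES)


# "phone_numbers" is a deliberate typo for "phone_number".
bad = unknown_policies(["person_name", "phone_numbers"])
if bad:
    print("Unrecognized policies:", ", ".join(bad))
```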
Troubleshooting
Why is the PII not redacted in my transcription?
Make sure that at least one PII policy has been specified in your request, using the redact_pii_policies parameter. If you’re still experiencing issues, please reach out to our support team for assistance.
Why is my webhook not being sent?
There could be several reasons why your webhook isn’t being sent, such as a misconfigured URL, an unreachable endpoint, or an issue with the authentication headers. Double-check your request and ensure that the webhook_url parameter is included with a valid URL that can be reached by AssemblyAI’s API. If you’re using custom authentication headers, ensure that the webhook_auth_header_name and webhook_auth_header_value parameters are included and are correct. If you’re still having issues, please contact our support team for assistance.
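Some of the webhook checks described above can be run locally before a request is sent. The following is a minimal sketch, assuming the request parameters are held in a dict; the function name webhook_problems is hypothetical. It can only catch a missing or malformed URL and a half-configured auth header, not whether the endpoint is actually reachable from AssemblyAI’s servers.

```python
from urllib.parse import urlparse


def webhook_problems(params):
    """Return a list of likely configuration problems in the webhook settings."""
    problems = []

    url = params.get("webhook_url")
    if not url:
        problems.append("webhook_url is missing")
    else:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("webhook_url is not a valid http(s) URL")

    # Custom auth needs both the header name and the header value.
    name = params.get("webhook_auth_header_name")
    value = params.get("webhook_auth_header_value")
    if bool(name) != bool(value):
        problems.append(
            "webhook_auth_header_name and webhook_auth_header_value "
            "should both be set"
        )
    return problems


# Hypothetical request parameters with a missing header value.
issues = webhook_problems({
    "webhook_url": "https://example.com/hook",
    "webhook_auth_header_name": "X-Webhook-Secret",
})
for issue in issues:
    print(issue)
```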
Why does my redacted audio file sound worse than the original?
By default, the API returns redacted audio files in MP3 format, a lossy format. Lossy formats remove audio information to reduce file size, which may cause a reduction in quality. The difference may be particularly noticeable if the submitted audio is in a lossless file format. To retain as much quality as possible, you can instead return your redacted audio files in a lossless format by setting redact_pii_audio_quality to wav.