PII in AI Pipelines: The Hidden Compliance Risk in Every LLM Call
In 2024, a financial services company's AI assistant started receiving Social Security numbers. Users, trusting the conversational interface, entered them unprompted while asking for account help. The data flowed to a third-party LLM API. The compliance team found out six months later.
This is not an unusual story.
The PII problem in AI pipelines
Traditional applications have well-defined input fields. Name goes here, email goes there. The application knows what it receives. Conversational AI interfaces are different: users type whatever they're thinking. When they're thinking about a problem involving their health, finances, or identity, they include that information in free text, and that text is forwarded to the model as-is unless something inspects it first.
What Autrace's PII filter detects
The filter combines pattern matching with NER-based detection across 12 built-in entity types: credit card numbers (Luhn-validated), SSN and equivalent national IDs, email addresses, phone numbers, IP addresses, passport numbers, dates of birth, bank account and routing numbers, medical record numbers, driver's license numbers, named persons, and physical addresses.
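To make "Luhn-validated" concrete, here is a minimal sketch of pattern matching followed by a checksum test. It is not Autrace's implementation; the regular expression and the names `CARD_CANDIDATE`, `luhn_valid`, and `find_credit_cards` are assumptions made for illustration.

```python
import re

# Assumed candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_credit_cards(text: str) -> list[dict]:
    """Return type/offset/length for each card-like span that passes Luhn."""
    entities = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            entities.append(
                {"type": "CREDIT_CARD", "offset": m.start(), "length": len(m.group())}
            )
    return entities
```

The checksum step is what keeps arbitrary long digit runs, such as order numbers, from being flagged as cards as often as a bare regex would. Entity types with no checksum, such as named persons, are where NER-based detection comes in.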
Redaction in the request path
# Input message
"My card 4532 0151 1283 0366 was charged twice"
# After PII filtering (redact mode)
"My card [CREDIT_CARD] was charged twice"
# Audit log entry (no raw PII stored)
{
  "pii_detected": true,
  "entities": [{"type": "CREDIT_CARD", "offset": 8, "length": 19}],
  "action": "redacted",
  "message_hash": "sha256:a3f9..."
}
Block vs. redact
REDACT: Replace the PII with a typed placeholder before forwarding to the model. The model sees "[CREDIT_CARD]" - not the actual number. Conversation flow is preserved; sensitive data never leaves your perimeter.
BLOCK: Reject the request with HTTP 400. Use this for entity types where the risk of any exposure is too high.
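The two modes can be sketched as a small gateway function. This is an illustration under stated assumptions, not Autrace's API: the `POLICY` mapping, the `apply_policy` function, the return shape, and the `"blocked"` and `"none"` action values are hypothetical. Only the `"redacted"` action and the audit fields come from the example log entry above.

```python
import hashlib

# Hypothetical per-entity policy; types not listed default to "redact".
POLICY = {"CREDIT_CARD": "redact", "SSN": "block"}


def apply_policy(message: str, entities: list[dict]) -> dict:
    """Block or redact a message and build an audit entry without raw PII."""
    audit = {
        "pii_detected": bool(entities),
        "entities": [
            {"type": e["type"], "offset": e["offset"], "length": e["length"]}
            for e in entities
        ],
        "message_hash": "sha256:" + hashlib.sha256(message.encode("utf-8")).hexdigest(),
    }

    # BLOCK: any entity whose policy is "block" rejects the whole request.
    if any(POLICY.get(e["type"], "redact") == "block" for e in entities):
        audit["action"] = "blocked"
        return {"status": 400, "forward": None, "audit": audit}

    # REDACT: replace spans from right to left so earlier offsets stay valid.
    redacted = message
    for e in sorted(entities, key=lambda e: e["offset"], reverse=True):
        start, end = e["offset"], e["offset"] + e["length"]
        redacted = redacted[:start] + f"[{e['type']}]" + redacted[end:]
    audit["action"] = "redacted" if entities else "none"
    return {"status": 200, "forward": redacted, "audit": audit}
```

Called with the credit card message and its detected entity, this sketch returns the redacted text to forward along with an audit entry that contains only the entity type, position, and a hash. A message containing an entity mapped to "block" returns status 400 and nothing to forward.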
GDPR implications
Under GDPR Article 5(1)(c), the data minimisation principle, personal data must be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed." Sending raw PII to a third-party LLM API means that data is processed by the API provider. Unless you have a DPA with the provider covering that specific data category, you're likely out of compliance. Redaction at the gateway means detected PII is replaced before the request leaves your systems, so it never reaches the provider.