Privacy Classification

SafeClaw provides multi-level PII classification with pluggable backends, a policy engine for routing decisions, and compliance checks for HIPAA, PCI-DSS, and GDPR.

Sensitivity Levels

pub enum SensitivityLevel {
    Public,              // No PII detected
    Normal,              // General data
    Sensitive,           // Email, phone, address
    HighlySensitive,     // Credit card, SSN, API key
    Critical,            // Medical records, passwords
}
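
The levels read as an ascending scale, so a natural way to combine several findings is to take the highest one. The sketch below assumes the enum derives `Ord` in declaration order, which the source does not state. `overall_level` is a hypothetical helper, not part of SafeClaw.

```rust
// Assumption: variant order encodes severity, so deriving `Ord` gives
// Public < Normal < Sensitive < HighlySensitive < Critical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SensitivityLevel {
    Public,
    Normal,
    Sensitive,
    HighlySensitive,
    Critical,
}

/// Hypothetical helper: returns the highest level among `levels`,
/// or `Public` when the slice is empty (nothing was detected).
pub fn overall_level(levels: &[SensitivityLevel]) -> SensitivityLevel {
    levels
        .iter()
        .copied()
        .max()
        .unwrap_or(SensitivityLevel::Public)
}
```

With this ordering, a text containing both an email (Sensitive) and a credit card number (HighlySensitive) would be reported as HighlySensitive overall.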

Classifier

The primary classifier uses regex-based pattern matching:

pub struct Classifier {
    inner: a3s_common::privacy::RegexClassifier,
}

pub struct ClassificationResult {
    pub level: SensitivityLevel,
    pub matches: Vec<Match>,
    pub requires_tee: bool,
}

pub struct Match {
    pub rule_name: String,
    pub level: SensitivityLevel,
    pub start: usize,
    pub end: usize,
    pub redacted: String,
}
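
The `start`, `end`, and `redacted` fields carry enough information to build a redacted copy of the input. The sketch below shows one way to do that, assuming `start` and `end` are byte offsets with `end` exclusive and that spans do not overlap. `Span`, `redact`, and `span_for` are hypothetical names used for illustration only.

```rust
/// Hypothetical, trimmed-down stand-in for `Match`: only the span and the
/// replacement label are needed for redaction.
pub struct Span {
    pub start: usize, // assumed: byte offset, inclusive
    pub end: usize,   // assumed: byte offset, exclusive
    pub redacted: String,
}

/// Replaces every span in `text` with its `redacted` label.
/// Spans are applied right to left so earlier offsets stay valid.
/// Assumes the spans do not overlap.
pub fn redact(text: &str, spans: &[Span]) -> String {
    let mut ordered: Vec<&Span> = spans.iter().collect();
    ordered.sort_by(|a, b| b.start.cmp(&a.start));

    let mut out = text.to_string();
    for s in ordered {
        // Skip spans that fall outside the text or split a UTF-8 character.
        let in_bounds = s.start <= s.end && s.end <= out.len();
        if in_bounds && out.is_char_boundary(s.start) && out.is_char_boundary(s.end) {
            out.replace_range(s.start..s.end, &s.redacted);
        }
    }
    out
}

/// Example helper: builds a span by locating `needle` in `text`.
pub fn span_for(text: &str, needle: &str, label: &str) -> Option<Span> {
    let start = text.find(needle)?;
    Some(Span {
        start,
        end: start + needle.len(),
        redacted: label.to_string(),
    })
}
```

Applying the replacements from the end of the string backwards avoids recomputing offsets after each substitution changes the string length.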

Usage

curl -X POST http://localhost:18790/api/v1/privacy/classify \
  -H "Content-Type: application/json" \
  -d '{"text": "Call me at 555-0123 or email john@example.com"}'

Response:

{
  "level": "Sensitive",
  "matches": [
    {
      "rule_name": "phone",
      "level": "Sensitive",
      "start": 11,
      "end": 19,
      "redacted": "[PHONE]"
    },
    {
      "rule_name": "email",
      "level": "Sensitive",
      "start": 29,
      "end": 45,
      "redacted": "[EMAIL]"
    }
  ],
  "requires_tee": false
}

Pluggable Backends

#[async_trait]
pub trait ClassifierBackend: Send + Sync {
    async fn classify(&self, text: &str) -> Vec<PiiMatch>;
    fn confidence_floor(&self) -> f64;
    fn name(&self) -> &str;
}

pub struct CompositeClassifier {
    backends: Vec<Box<dyn ClassifierBackend>>,
}

pub struct PiiMatch {
    pub rule_name: String,
    pub level: SensitivityLevel,
    pub start: usize,
    pub end: usize,
    pub confidence: f64,
    pub backend: String,
}

The CompositeClassifier chains multiple backends in order:

Regex — Fast pattern matching (phone, email, SSN, credit card, etc.)
Semantic — Context-aware analysis (detects PII in natural language)
LLM — (extensible) LLM-based classification for ambiguous cases
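
The chaining described above can be sketched with standard-library Rust only. This is a simplified, synchronous illustration: the real trait is async, `PiiMatch` here omits `level`, and two behaviours are assumptions rather than documented facts, namely that matches below a backend's `confidence_floor` are dropped and that a span already reported by an earlier backend is not added again. `Backend`, `Composite`, and `KeywordBackend` are hypothetical names.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct PiiMatch {
    pub rule_name: String,
    pub start: usize,
    pub end: usize,
    pub confidence: f64,
    pub backend: String,
}

/// Synchronous simplification of `ClassifierBackend`.
pub trait Backend {
    fn classify(&self, text: &str) -> Vec<PiiMatch>;
    fn confidence_floor(&self) -> f64;
    fn name(&self) -> &str;
}

pub struct Composite {
    pub backends: Vec<Box<dyn Backend>>,
}

impl Composite {
    /// Runs each backend in order. Keeps a match only if it meets that
    /// backend's confidence floor (assumed semantics) and no earlier
    /// backend already reported the same span (assumed merge rule).
    pub fn classify(&self, text: &str) -> Vec<PiiMatch> {
        let mut merged: Vec<PiiMatch> = Vec::new();
        for backend in &self.backends {
            let floor = backend.confidence_floor();
            for m in backend.classify(text) {
                let duplicate = merged
                    .iter()
                    .any(|k| k.start == m.start && k.end == m.end);
                if m.confidence >= floor && !duplicate {
                    merged.push(m);
                }
            }
        }
        merged
    }
}

/// Toy backend for the example: reports every occurrence of a fixed keyword.
pub struct KeywordBackend {
    pub label: &'static str,
    pub keyword: &'static str,
    pub confidence: f64,
    pub floor: f64,
}

impl Backend for KeywordBackend {
    fn classify(&self, text: &str) -> Vec<PiiMatch> {
        text.match_indices(self.keyword)
            .map(|(start, hit)| PiiMatch {
                rule_name: self.keyword.to_string(),
                start,
                end: start + hit.len(),
                confidence: self.confidence,
                backend: self.label.to_string(),
            })
            .collect()
    }
    fn confidence_floor(&self) -> f64 {
        self.floor
    }
    fn name(&self) -> &str {
        self.label
    }
}
```

Putting the cheapest backend first means later, more expensive backends only add findings the earlier ones missed.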

Semantic Analyzer

Context-aware PII detection beyond simple regex:

curl -X POST http://localhost:18790/api/v1/privacy/analyze \
  -H "Content-Type: application/json" \
  -d '{"text": "My name is John and I live at 123 Main Street"}'

The semantic analyzer detects PII that regex alone would miss, such as names in context, addresses without standard formatting, and implicit personal information.

Policy Engine

Routes messages based on sensitivity classification:

pub enum PolicyDecision {
    ProcessLocal,           // No TEE required
    ProcessInTee,           // Route to TEE
    Reject,                 // Block entirely
    RequireConfirmation,    // Ask user first
}

pub struct DataPolicy {
    pub name: String,
    pub tee_threshold: SensitivityLevel,
    pub allow_highly_sensitive: bool,
    pub type_rules: HashMap<String, PolicyDecision>,
}

pub struct PolicyEngine {
    policies: HashMap<String, DataPolicy>,
    default_policy: DataPolicy,
}

Policy Evaluation

impl PolicyEngine {
    pub fn evaluate(
        &self,
        level: SensitivityLevel,
        data_type: Option<&str>,
        policy_name: Option<&str>,
    ) -> PolicyDecision;

    pub fn requires_tee(&self, level: SensitivityLevel) -> bool;
}

Default behavior:

Public / Normal → ProcessLocal
Sensitive → ProcessLocal (with taint tracking)
HighlySensitive → ProcessInTee
Critical → ProcessInTee (or Reject if TEE unavailable)
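
The default mapping above can be written as a single `match`. This is a minimal sketch, not the engine's actual code: `default_decision` and its `tee_available` parameter are hypothetical, since the source does not say how the engine learns whether a TEE is reachable, and named policies and `type_rules` overrides are not modeled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SensitivityLevel {
    Public,
    Normal,
    Sensitive,
    HighlySensitive,
    Critical,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PolicyDecision {
    ProcessLocal,
    ProcessInTee,
    Reject,
    RequireConfirmation,
}

/// Sketch of the default behavior. `tee_available` is a hypothetical input.
pub fn default_decision(level: SensitivityLevel, tee_available: bool) -> PolicyDecision {
    match level {
        SensitivityLevel::Public | SensitivityLevel::Normal => PolicyDecision::ProcessLocal,
        // Stays local; taint tracking happens elsewhere and is not modeled here.
        SensitivityLevel::Sensitive => PolicyDecision::ProcessLocal,
        SensitivityLevel::HighlySensitive => PolicyDecision::ProcessInTee,
        // Critical data never falls back to local processing.
        SensitivityLevel::Critical if tee_available => PolicyDecision::ProcessInTee,
        SensitivityLevel::Critical => PolicyDecision::Reject,
    }
}
```

`RequireConfirmation` does not appear in the default mapping; it would presumably be produced only by an explicit rule in a `DataPolicy`.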

Cumulative Risk Tracking

Tracks PII exposure across conversation turns:

pub struct CumulativeRiskDecision {
    // Per-session PII accumulation tracking
}

A single message might be Sensitive, but if a conversation accumulates multiple PII types (name + address + phone + email), the cumulative risk escalates to HighlySensitive or Critical, triggering TEE routing.
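
To make the escalation concrete, here is a minimal sketch of a per-session tracker. `SessionRisk` and `observe` are hypothetical names, and the thresholds (three distinct PII types escalate to HighlySensitive, four or more to Critical) are illustrative assumptions; the source does not specify the actual rule.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SensitivityLevel {
    Public,
    Normal,
    Sensitive,
    HighlySensitive,
    Critical,
}

/// Hypothetical per-session tracker of the distinct PII types seen so far.
#[derive(Default)]
pub struct SessionRisk {
    seen_types: BTreeSet<String>,
}

impl SessionRisk {
    /// Records the PII types found in one message and returns the level to
    /// act on: the higher of the message's own level and the level implied
    /// by the number of distinct types accumulated in the session.
    pub fn observe(
        &mut self,
        message_level: SensitivityLevel,
        pii_types: &[&str],
    ) -> SensitivityLevel {
        for t in pii_types {
            self.seen_types.insert(t.to_string());
        }
        // Illustrative thresholds, not taken from the source.
        let cumulative = match self.seen_types.len() {
            0..=2 => SensitivityLevel::Public,
            3 => SensitivityLevel::HighlySensitive,
            _ => SensitivityLevel::Critical,
        };
        message_level.max(cumulative)
    }
}
```

Because the set of seen types persists across calls, a later message with no new PII still inherits the escalated session level.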

Configuration

[privacy]
default_sensitivity = "Normal"
enable_semantic_analysis = true
enable_compliance_checks = true

[[privacy.classification_rules]]
name = "custom-api-key"
pattern = "sk-[a-zA-Z0-9]{48}"
level = "HighlySensitive"
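
The custom rule's pattern matches the literal prefix `sk-` followed by exactly 48 ASCII letters or digits, anywhere in the text. The sketch below reproduces that check with the standard library only, as a way to see what the rule would and would not flag. `find_api_key` is a hypothetical helper; SafeClaw itself presumably evaluates the pattern with a regex engine.

```rust
/// Returns the byte range of the first `sk-` followed by 48 ASCII
/// alphanumerics in `text`, mirroring the pattern `sk-[a-zA-Z0-9]{48}`.
pub fn find_api_key(text: &str) -> Option<(usize, usize)> {
    const PREFIX: &str = "sk-";
    const BODY_LEN: usize = 48;

    for (start, _) in text.match_indices(PREFIX) {
        let body_start = start + PREFIX.len();
        let body = &text.as_bytes()[body_start..];
        // The pattern is unanchored, so extra characters after the
        // 48-character body do not prevent a match.
        if body.len() >= BODY_LEN
            && body[..BODY_LEN].iter().all(|b| b.is_ascii_alphanumeric())
        {
            return Some((start, body_start + BODY_LEN));
        }
    }
    None
}
```

Keys shorter than 48 characters after the prefix, or containing characters such as `_` or `-` in the body, would not be flagged by this rule.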
