helix_ir.pii¶
helix_ir.pii — PII detection and classification.
- helix_ir.pii.detect_pii(schema: Schema, sample_values: dict[str, list[Any]] | None = None, locale: str = 'in', layers: list[str] | None = None, confidence_threshold: float = 0.8) Schema[source]¶
Annotate schema fields with PII classes.
- Parameters:
schema – The schema to annotate.
sample_values – Dict mapping path strings to sample values for regex matching.
locale – Locale for PII patterns (‘in’, ‘us’, ‘eu’, ‘all’).
layers – Detection layers to use. Default: [‘name’, ‘regex’].
confidence_threshold – Minimum fraction of matched values to assign PII class.
- Returns:
A new Schema with pii_class annotations on relevant fields.