helix_ir.pii

helix_ir.pii — PII detection and classification.

helix_ir.pii.detect_pii(schema: Schema, sample_values: dict[str, list[Any]] | None = None, locale: str = 'in', layers: list[str] | None = None, confidence_threshold: float = 0.8) Schema[source]

Annotate schema fields with PII classes.

Parameters:
  • schema – The schema to annotate.

  • sample_values – Dict mapping path strings to sample values for regex matching.

  • locale – Locale for PII patterns (‘in’, ‘us’, ‘eu’, ‘all’).

  • layers – Detection layers to use. Default: [‘name’, ‘regex’].

  • confidence_threshold – Minimum fraction of matched values to assign PII class.

Returns:

A new Schema with pii_class annotations on relevant fields.

helix_ir.pii.detect_pii_from_field_name(field_name: str) str | None[source]

Return a PII class label for a field name, or None if not detected.

Uses the last component of a dotted path for matching.