Execution & Healing
The Execution Engine is responsible for generating and applying remediations to fix identified data quality issues.
Execution Modes
ADQA operates in three distinct modes to balance autonomy with safety:
1. Advisory Mode
- Behavior: Profiles and detects issues, but only proposes actions.
- Output: An
ActionPlancontaining a list of recommended fixes. - Use Case: Initial analysis or low-trust environments.
2. Human-in-the-loop (HIL)
- Behavior: Proposes an
ActionPlanand waits for external approval. - Output: Executes only the actions that have been explicitly approved.
- Use Case: Critical datasets where every change must be verified.
3. Automatic Mode
- Behavior: Fully autonomous. Detects and applies remediations immediately.
- Output: A new, healed DataFrame.
- Use Case: CI/CD pipelines, data pre-processing for ML.
Common Remediations
- DROP: Removes constant columns or redundant features.
- DEDUPE: Removes duplicate rows.
- IMPUTE: Fills missing values using mean, median, or constant values.
- CLIP: Winsorizes outliers to a specific percentile.
- MASK: Anonymizes PII data.
- TRANSFORM: Applies mathematical transformations (e.g., Log scale for skewed data).
Example: Automatic Healing
from adqa import ADQA, ADQAConfig, ExecutionMode
config = ADQAConfig(execution_mode=ExecutionMode.AUTOMATIC)
agent = ADQA.from_path("dirty_data.csv", config=config)
# This will detect issues and apply fixes in one call
result = agent.analyze()
# result.dataframe now contains the healed data
healed_df = result.dataframe