Profiling

The Profiling Engine is the first stage of ADQA's analysis pipeline. It examines the dataset to characterize its structure and content.

Profiling Dimensions

ADQA profiles data across four main categories:

1. Structural Profiling

Captures basic properties of columns:

- Null counts and ratios.
- Uniqueness and cardinality.
- Memory usage.
- Data types (physical and logical).
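
As a rough illustration of these structural metrics, the following standard-library sketch profiles a single column held as a Python list. It is not ADQA's implementation: the function name `structural_profile`, the output keys, and the definition of uniqueness as distinct values divided by non-null values are all assumptions made for the example. Logical type inference is not attempted here.

```python
import sys
from typing import Any


def structural_profile(values: list[Any]) -> dict[str, Any]:
    """Hypothetical helper: structural metrics for one column of hashable values."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    null_count = total - len(non_null)
    distinct = len(set(non_null))
    return {
        "null_count": null_count,
        "null_ratio": null_count / total if total else 0.0,
        "cardinality": distinct,
        # Assumed definition: share of non-null values that are distinct.
        "uniqueness": distinct / len(non_null) if non_null else 0.0,
        # Shallow per-object estimate; container overhead is not included.
        "memory_bytes": sum(sys.getsizeof(v) for v in values),
        # Physical types only, taken from the Python runtime type names.
        "physical_types": sorted({type(v).__name__ for v in non_null}),
    }


profile = structural_profile(["a", "b", None, "a"])
```

For the sample column, one of four values is null, so the null ratio is 0.25, and the two distinct values among three non-null entries give a cardinality of 2.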

2. Behavioral Profiling

Analyzes the distribution and patterns of values:

- Range (min, max).
- Central tendency (mean, median).
- Dispersion (std, variance).
- Outlier ratios.
- Skewness and kurtosis.
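
The behavioral metrics can be sketched for a numeric column using only the standard library. This is an illustration under stated assumptions, not ADQA's code: `behavioral_profile` is a hypothetical name, outliers are flagged with the common 1.5 × IQR rule (the source does not say which rule ADQA uses), and dispersion, skewness, and kurtosis use population moments, with kurtosis reported as excess kurtosis.

```python
import statistics


def behavioral_profile(values: list[float]) -> dict[str, float]:
    """Hypothetical helper: distribution metrics for a numeric column (needs >= 2 values)."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)  # population variance
    std = variance ** 0.5

    # Assumed outlier rule: values beyond 1.5 * IQR from the quartiles.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = sum(1 for v in values if v < low or v > high)

    if std > 0:
        skewness = sum(((v - mean) / std) ** 3 for v in values) / n
        kurtosis = sum(((v - mean) / std) ** 4 for v in values) / n - 3.0
    else:
        skewness = kurtosis = 0.0  # constant column: moments are undefined

    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "std": std,
        "variance": variance,
        "outlier_ratio": outliers / n,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


stats = behavioral_profile([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

In the sample data the value 100 lies far above the upper fence, so it is the single outlier and pulls the skewness positive.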

3. Semantic Profiling

Identifies the meaning of the data:

- Uses a hybrid approach: regex patterns + ML classifiers.
- Common tags: EMAIL, PHONE, ADDRESS, SSN, CREDIT_CARD, etc.
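
The regex half of the hybrid approach might look like the sketch below. Everything in it is an assumption for illustration: the patterns are deliberately simplistic and are not the ones ADQA ships, `tag_column` and its `threshold` parameter are hypothetical, and the ML classifier half is not shown.

```python
import re
from typing import Any, Optional

# Illustrative patterns only; real-world formats need stricter validation.
# Order matters: on a tie, the earlier tag wins (SSN before the looser PHONE).
PATTERNS = {
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{6,}\d"),
}


def tag_column(values: list[Any], threshold: float = 0.8) -> Optional[str]:
    """Hypothetical helper: return the best-matching tag, or None below the threshold."""
    non_null = [str(v) for v in values if v is not None]
    if not non_null:
        return None
    best_tag, best_ratio = None, 0.0
    for tag, pattern in PATTERNS.items():
        # Share of non-null values that match the pattern in full.
        ratio = sum(1 for v in non_null if pattern.fullmatch(v)) / len(non_null)
        if ratio > best_ratio:
            best_tag, best_ratio = tag, ratio
    return best_tag if best_ratio >= threshold else None


email_tag = tag_column(["a@example.com", "b@example.org", None])
ssn_tag = tag_column(["123-45-6789", "987-65-4321"])
phone_tag = tag_column(["+1 555-123-4567", "555 123 4567"])
no_tag = tag_column(["hello", "world"])
```

Requiring a minimum match ratio rather than a single hit keeps one stray value from tagging a whole column. A classifier could then handle the columns that the patterns leave untagged.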

4. Relational Profiling

Identifies relationships between columns:

- Correlation matrix (Pearson/Spearman).
- Functional dependencies (planned).
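
To make the two correlation options concrete, here is a minimal standard-library sketch of a correlation matrix. The function names are hypothetical and this is not ADQA's implementation. Spearman is computed in the usual way, as Pearson correlation applied to average ranks.

```python
from typing import Callable


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; NaN if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


def correlation_matrix(
    columns: dict[str, list[float]],
    method: Callable[[list[float], list[float]], float] = pearson,
) -> dict[str, dict[str, float]]:
    """Pairwise correlations between all named columns."""
    names = list(columns)
    return {a: {b: method(columns[a], columns[b]) for b in names} for a in names}


data = {"x": [1, 2, 3, 4, 5], "y": [1, 4, 9, 16, 25]}
pearson_matrix = correlation_matrix(data)
spearman_matrix = correlation_matrix(data, method=spearman)
```

Here `y` is a monotonic but non-linear function of `x`, so the Spearman coefficient is 1 (up to floating-point rounding) while the Pearson coefficient is slightly lower. This is the usual reason to offer both measures.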

ML-Enhanced Profiling

When ml_enabled is set to True, ADQA uses pre-trained models to:

- Predict logical types with higher accuracy.
- Detect complex semantic violations.
- Amplify anomaly signals.