Metadata-Version: 2.4
Name: cleancore-python
Version: 0.1.2
Summary: A dependency-free audit trail system for Python data transformations.
Home-page: https://github.com/yourusername/provena
Author: Your Name
Author-email: Sidra Saqlain <sidrasaqlain11@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/Sidra-009/cleancore-python-library
Project-URL: Source, https://github.com/Sidra-009/cleancore-python-library
Project-URL: Issues, https://github.com/Sidra-009/cleancore-python-library/issues
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: System :: Logging
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# CleanCore Python

A lightweight, dependency-free audit trail system for Python data pipelines.

CleanCore automatically creates immutable, row-level audit logs for every data transformation. It helps with debugging, compliance, and understanding how your data changes across cleaning steps.

---

## Features

- Automatic row-level audit logs
- Zero external dependencies (pure Python)
- JSON-based audit output for compliance and record keeping
- Works with lists, dictionaries, and CSV-style data
- Simple decorator-based API

---

## Installation

```bash
pip install cleancore-python
Quick Start
Audit a Single Function
python
Copy code
from cleancore import audit_trail, ProvenaLogger, generate_terminal_report

@audit_trail(rule_id="GDPR_EMAIL_MASKING")
def clean_emails(data):
    result = []
    for row in data:
        new_row = row.copy()
        if '@' in new_row.get('email', ''):
            new_row['email'] = '***@***.***'
        result.append(new_row)
    return result

logger = ProvenaLogger("Single_Transformation")

data = [
    {'id': 1, 'email': 'test@example.com'},
    {'id': 2, 'email': 'user'}
]

cleaned = clean_emails(data, provena_logger=logger)

print(generate_terminal_report(logger))
Pipeline Usage
python
Copy code
from cleancore import audit_pipeline, audit_trail
import csv

def load_data(path):
    with open(path) as f:
        return list(csv.DictReader(f))

@audit_trail(rule_id="STANDARDIZE_NAMES")
def standardize_names(data):
    return data

@audit_trail(rule_id="FILL_MISSING_VALUES")
def fill_missing(data):
    return data

with audit_pipeline("Customer_Onboarding_Pipeline") as logger:
    data = load_data("customers.csv")
    data = standardize_names(data, provena_logger=logger)
    data = fill_missing(data, provena_logger=logger)

logger.export_json("customer_pipeline_audit.json")
Output
CleanCore generates a human-readable terminal report and a machine-readable JSON audit log containing:

Transformation name

Rule ID

Rows before and after

Number of changed rows

Sample value changes

Execution timestamps

API Overview
audit_trail – Decorator for auditing functions

ProvenaLogger – Collects audit events

audit_pipeline – Context manager for multi-step pipelines

generate_terminal_report() – Prints terminal summary

export_json() – Saves audit log to file

Source Code
GitHub Repository
https://github.com/Sidra-009/cleancore-python-library

Issues, feature requests, and pull requests are welcome.

License
MIT License. See the LICENSE file for details.
