Metadata-Version: 2.1
Name: pydata-checks
Version: 0.0.82
Summary: Data quality checks that don't suck.
Author-email: Ivan Zhang <ivanzhangofficial@gmail.com>
License: MIT License
        
        Copyright (c) [2023] [Ivan Zhang]
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Project-URL: Homepage, https://github.com/SuperiorityComplex/data-checks
Project-URL: Bug Tracker, https://github.com/SuperiorityComplex/data-checks/issues
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: APScheduler
Requires-Dist: certifi
Requires-Dist: charset-normalizer
Requires-Dist: idna
Requires-Dist: pytz
Requires-Dist: requests
Requires-Dist: six
Requires-Dist: SQLAlchemy
Requires-Dist: typing-extensions
Requires-Dist: tzlocal
Requires-Dist: urllib3
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: python-dateutil

# Data Checks
![License](https://img.shields.io/badge/license-MIT-blue.svg) ![Python](https://img.shields.io/badge/python-3.7-blue.svg) 

**Create, schedule, and deploy data quality checks.**

## Overview
Existing data observability solutions are painfully static. **data_checks** provides a dynamic data observability framework that lets you reuse existing Python code, write new Python code, or both, to define data quality checks that can then be easily scheduled and monitored. Inspired by Python's [unittest](https://docs.python.org/3/library/unittest.html), data_checks lets you write data quality checks as easily as you would write unit tests for your code.
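
To make the unittest analogy concrete, here is a toy sketch of prefix-based rule discovery. It is not the data_checks implementation: `TinyCheck`, `OrdersCheck`, and the sample data are hypothetical, and the only detail borrowed from this README is that rule methods start with `rule_`.

```python
# Toy illustration only: this is NOT the data_checks implementation.
# It mimics how unittest-style frameworks discover methods by name prefix.


class TinyCheck:
    """Hypothetical base class that runs every method starting with `rule_`."""

    def run_all(self):
        results = {}
        # dir() returns names in sorted order, so the run order is deterministic.
        for name in dir(self):
            if not name.startswith("rule_"):
                continue
            try:
                getattr(self, name)()
                results[name] = "passed"
            except AssertionError as exc:
                results[name] = f"failed: {exc}"
        return results


class OrdersCheck(TinyCheck):
    # Hypothetical in-memory data standing in for a real table.
    orders = [{"id": 1, "total": 20.0}, {"id": 2, "total": -5.0}]

    def rule_ids_are_unique(self):
        ids = [row["id"] for row in self.orders]
        assert len(ids) == len(set(ids)), "duplicate order ids"

    def rule_totals_are_non_negative(self):
        bad = [row["id"] for row in self.orders if row["total"] < 0]
        assert not bad, f"negative totals for ids {bad}"


results = OrdersCheck().run_all()
for rule_name, outcome in results.items():
    print(rule_name, "->", outcome)
```

Each rule is an ordinary method, so it can call any Python code you already have for loading or validating data.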


## Quickstart
### 1) Installation
Install the latest version of data_checks using pip:
```bash
pip install pydata-checks
```
### 2) Start a Data Check project
Initialize a new data_checks project by using the `init` command from your project directory (`/Users/USERNAME/Desktop/PROJECT_NAME`):
```bash
python -m data_checks.init
```
This will start a series of prompts that will guide you through the process of initializing a new data_checks project. For example:
```bash
$ python -m data_checks.init
Enter the relative file path of the directory where suites will be stored: my_first_data_checks_project/suites
Directory '/Users/USERNAME/Desktop/PROJECT_NAME/my_first_data_checks_project/suites' does not exist.
Would you like to create it? [y/n]: y
Enter the relative file path of the directory where checks will be stored: my_first_data_checks_project/checks
Directory '/Users/USERNAME/Desktop/PROJECT_NAME/my_first_data_checks_project/checks' does not exist.
Would you like to create it? [y/n]: y
Enter the default CRON schedule: * * * * *
Enter the database URL: database_url
Enter the alerting endpoint URL:
check_settings.py generated.
my_first_data_check.py generated.
```
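
The schedule entered above, `* * * * *`, is a standard five-field cron expression that fires every minute, which is likely more frequent than most checks need. The sketch below only labels the five conventional fields so you can sanity-check an expression before entering it. `describe_cron` is a hypothetical helper, not part of data_checks, and whether data_checks accepts anything beyond the five-field form is not stated here.

```python
# Standard five-field cron layout; the field names are the conventional ones.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict:
    """Map each field of a five-field cron expression to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))


every_minute = describe_cron("* * * * *")
daily_at_six = describe_cron("0 6 * * *")  # 06:00 every day
print(daily_at_six)
```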


This will create a new directory with the following structure:
```
PROJECT_NAME
├── my_first_data_checks_project
│   ├── __init__.py
│   ├── checks
│   │   ├── __init__.py
│   │   └── my_first_data_check.py
│   └── suites
│       └── __init__.py
└── check_settings.py
```
### 3) Set the `CHECK_SETTINGS_MODULE` environment variable to point to the `check_settings.py` file
```bash
export CHECK_SETTINGS_MODULE=check_settings
```
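
If a later command cannot find your settings, it can help to confirm that the variable names an importable module. The sketch below assumes the value is a dotted module name importable from the current directory, which is how such settings variables are commonly handled. How data_checks resolves it internally is not shown in this README, and `load_settings_module` is a hypothetical helper.

```python
import importlib
import os
import sys


def load_settings_module(env_var: str = "CHECK_SETTINGS_MODULE"):
    """Import the module named by env_var, with a readable error if it fails."""
    module_name = os.environ.get(env_var)
    if not module_name:
        raise RuntimeError(f"{env_var} is not set")
    # Make modules in the current working directory importable.
    if os.getcwd() not in sys.path:
        sys.path.insert(0, os.getcwd())
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise RuntimeError(f"cannot import {module_name!r}: {exc}") from exc


if __name__ == "__main__":
    try:
        settings = load_settings_module()
        print("Loaded settings from", getattr(settings, "__file__", settings))
    except RuntimeError as err:
        print("Settings problem:", err)
```

Run it from the directory that contains `check_settings.py`, after the `export` above.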

### 4) Run the autogenerated data check
```bash
python -m data_checks.do.run_check MyFirstDataCheck
```

Output:
```bash
[1/1 checks] MyFirstDataCheck
	[1/2 Rules] rule_my_first_failed_rule
This rule failed
DataCheckException(severity=1.0, exception=This rule failed, metadata={'rule': 'rule_my_first_failed_rule', 'params': {'args': (), 'kwargs': {}}})
	[2/2 Rules] rule_my_first_successful_rule
		rule_my_first_successful_rule took 0.0 seconds
```

### 5) Modify the autogenerated data check
Open the `my_first_data_checks_project/checks/my_first_data_check.py` file and customize the data check to your liking. For instance, you can make `rule_my_first_failed_rule` always pass by removing the exception:
```python
from data_checks.classes.data_check import DataCheck


class MyFirstDataCheck(DataCheck):
    ...

    def rule_my_first_failed_rule(self):
        # This rule will now succeed
        assert True, "This rule now succeeds"

    ...
```
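
A rule that always passes is only a placeholder. Since a rule body is ordinary Python, a more useful rule asserts something about your data. The sketch below is independent of data_checks and runs on its own: `fetch_rows` is a hypothetical stand-in for a query you already have, and the function body is the kind of logic you would move into a `rule_` method, assuming a failed `assert` is reported as a rule failure.

```python
# Hypothetical stand-in for a real query; replace with your own data access.
def fetch_rows():
    return [
        {"user_id": 1, "email": "a@example.com"},
        {"user_id": 2, "email": None},
        {"user_id": 3, "email": "c@example.com"},
        {"user_id": 4, "email": "d@example.com"},
    ]


def null_fraction(rows, column):
    """Return the fraction of rows whose `column` value is None."""
    if not rows:
        return 0.0
    return sum(row[column] is None for row in rows) / len(rows)


def rule_email_mostly_present(max_null_fraction=0.3):
    # Body of a would-be rule method: fail when too many emails are missing.
    fraction = null_fraction(fetch_rows(), "email")
    assert fraction <= max_null_fraction, (
        f"{fraction:.0%} of emails are null (limit {max_null_fraction:.0%})"
    )


# 1 of 4 rows has a null email (25%), which is under the 30% limit.
rule_email_mostly_present()
```

None of this is required for the quickstart; the always-passing rule is enough to continue.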

Rerun the data check:
```bash
python -m data_checks.do.run_check MyFirstDataCheck
```

Output:
```bash
[1/1 checks] MyFirstDataCheck
	[1/2 Rules] rule_my_first_successful_rule
		rule_my_first_successful_rule took 9.5367431640625e-07 seconds
	[2/2 Rules] rule_my_first_failed_rule
		rule_my_first_failed_rule took 9.5367431640625e-07 seconds
```

:tada: Congrats! :tada: You've created and executed your first data check! See the [documentation](https://github.com/SuperiorityComplex/data_checks/wiki) for more information on writing more advanced checks and suites, and on other features like scheduling and alerting.
