Metadata-Version: 2.4
Name: SQLplusPython
Version: 0.1.0
Summary: Utils for SQL+Python - reliably work with SQL in Python apps 
Author-email: D Hansen <vj3gt0o0l@mozmail.com>
License: MIT
Project-URL: Repository, https://github.com/dennisdeh/SQLplusPython
Keywords: sqlalchemy,mysql,pandas,database,orm
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.12
Description-Content-Type: text/markdown
Requires-Dist: numpy==2.3.4
Requires-Dist: pandas==2.3.3
Requires-Dist: SQLAlchemy==2.0.44
Requires-Dist: mysql-connector-python==9.4.0
Requires-Dist: python-dotenv==1.1.1
Requires-Dist: pytz==2025.2
Requires-Dist: scikit-learn==1.7.2
Requires-Dist: scipy==1.16.2
Requires-Dist: six==1.17.0
Requires-Dist: typing_extensions==4.15.0
Requires-Dist: tzdata==2025.2
Requires-Dist: joblib==1.5.2
Requires-Dist: threadpoolctl==3.6.0
Requires-Dist: python-dateutil==2.9.0.post0
Requires-Dist: greenlet==3.2.4

# SQL+Python
A collection of utils to effortlessly and reliably interact between Python and your SQL database — purpose‑built for time‑series workflows.
This package streamlines reading, writing, and maintaining SQL tables using familiar Python structures like pandas DataFrames, dictionaries, and other Python objects. It’s user-friendly, strict about data consistency, and tuned for time-series data pipelines.
Intended for MariaDB/MySQL via SQLAlchemy - pre-configured docker environment included.

### Design Principles
- Predictable schemas: deterministic dtype mapping and optional auto-alter
- Strong safeguards: explicit checks, clear assertions, and helpful errors
- Pandas-friendly: minimal friction between DataFrame types and SQL
- Maintainability: composable helpers for upload, fetch, and schema operations

## Key Features
- Focus on time-series data
    - Append-only patterns with optional “update latest” logic
    - Automatic handling of date/symbol index columns
    - Fast retrieval by symbol and date with optional indexing
    - Utilities to query by symbol(s), fetch latest dates, and union columns across tables

- Works with your data structures
    - Upload pandas DataFrames with automatic dtype mapping
    - Store/retrieve dictionaries as rows
    - Persist arbitrary Python objects via pickling (e.g., models, configs)
    - Minimal boilerplate for table creation and updates
    - Smart dtype defaults for SQL schema generation

- Safety and consistency checks
    - NaN/Inf coercion and validation for key columns
    - Duplicate/consistency guards when updating recent rows
    - Table introspection: add missing columns automatically (optional)
    - Environment validation and connection checks

### Typical Use Cases
- Maintain historical time-series data on a symbol-level (prices, indicators, metrics)
- Incrementally update tables from pandas pipelines
- Keep per-symbol metadata and snapshots
- Store models or transforms in SQL as versioned pickles

## Installation
- Python: 3.12+
- Installed packages: sqlalchemy, pandas etc.
- Optional: Docker to use the docker compose environment that is set up to work wiht the utils - batteries included

Create/activate your virtualenv, then install your project’s dependencies as usual with pip inside the virtualenv.
Run `docker compose up` in the project root to start the MariaDB container with the default environment variables.

### Environment Variables
- Standard DB credentials (e.g., user, password, host, port, database) loaded from your .env
- A default setup for Docker is included
- Built-in validation for host configuration; safe fallbacks if misconfigured


## Quick Start
The main functions are listed below.

- `upload_df()`: DataFrame uploads
    - Auto-creates tables with sensible column types (numeric as DOUBLE, categoricals as TEXT, datetimes as DATETIME) using `create_table()`
    - Optional table alteration to add newly appearing DataFrame columns
    - Optional enforcement of non-null date/symbol keys with configurable behavior
    - When uploading time-series data and using “update latest,” only the newest overlapping row(s) are replaced; older history remains intact

- `upload_dict()`: Dictionary uploads
    - Keys become columns; can define a symbol column for idempotent updates

- `upload_object()`: Object uploads
    - Serialize Python objects via pickle and store them in LONGBLOB columns

- Retrieval helpers
    - `get_symbol_data()` or `get_df_symbols_data()`: Get time-series for a single symbol or multiple symbols
    - `get_all_symbol_data()`: Fetch all tables for a symbol in one go
    - `get_existing_rows()`: Inspect availability (tables exist, rows > 0), and compute the union of columns across tables



