Metadata-Version: 2.1
Name: postgres-backup-plugin
Version: 1.0.2
Summary: Framework-agnostic PostgreSQL backup library with filtering support
Author-email: Võ Thiên Hòa <vothienhoa@gmail.com>
Maintainer-email: Võ Thiên Hòa <vothienhoa@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/vohoa/postgres_backup_plugin
Project-URL: Documentation, https://github.com/vohoa/postgres_backup_plugin#readme
Project-URL: Repository, https://github.com/vohoa/postgres_backup_plugin
Project-URL: Source Code, https://github.com/vohoa/postgres_backup_plugin
Project-URL: Bug Tracker, https://github.com/vohoa/postgres_backup_plugin/issues
Project-URL: Changelog, https://github.com/vohoa/postgres_backup_plugin/blob/main/CHANGELOG.md
Keywords: postgresql,postgres,backup,database,sql,copy,django,filter,export,s3,restore,dump
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Topic :: Database
Classifier: Topic :: Database :: Database Engines/Servers
Classifier: Topic :: System :: Archiving :: Backup
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Classifier: Framework :: Django
Classifier: Framework :: Django :: 3.0
Classifier: Framework :: Django :: 4.0
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: psycopg2-binary>=2.8.0
Provides-Extra: all
Requires-Dist: boto3>=1.20.0; extra == "all"
Requires-Dist: django>=3.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=3.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=4.0.0; extra == "dev"
Requires-Dist: mypy>=0.990; extra == "dev"
Requires-Dist: isort>=5.10.0; extra == "dev"
Provides-Extra: django
Requires-Dist: django>=3.0; extra == "django"
Provides-Extra: s3
Requires-Dist: boto3>=1.20.0; extra == "s3"

# PostgreSQL Backup Plugin

Framework-agnostic PostgreSQL backup library with advanced filtering support.

## Features

- ✅ **Direct Streaming**: No temp files, minimal RAM usage (works with TB-sized tables)
- ✅ **COPY Format**: Fast backup and restore using PostgreSQL COPY
- ✅ **Flexible Filtering**: Filter data per table with reusable filter classes
- ✅ **Multiple Exporters**: Local file, S3, or custom destinations
- ✅ **Framework-Agnostic**: Works with Django, Flask, FastAPI, or pure Python
- ✅ **Production Ready**: Error handling, logging, validation
- ✅ **Multi-Schema Support**: Backup from any schema, not just 'public'
- ✅ **Automatic SQL Cleaning**: Remove schema prefixes and psql meta-commands for clean restores

## Installation

```bash
# Basic installation
pip install psycopg2-binary

# For S3 support
pip install psycopg2-binary boto3

# For Django integration
pip install psycopg2-binary django
```

## Quick Start

### Basic Usage (Pure Python)

```python
from postgres_backup_plugin import PostgresBackupEngine, DatabaseConfig

# Configure database
db_config = DatabaseConfig(
    host='localhost',
    port=5432,
    user='postgres',
    password='secret',
    database='mydb'
)

# Create backup engine
engine = PostgresBackupEngine(db_config)

# Create backup
result = engine.backup('/tmp/backup.sql')

if result.success:
    print(f"Backup created: {result.file_path}")
    print(f"Size: {result.size_bytes} bytes")
    print(f"Tables: {result.tables_count}")
    print(f"Duration: {result.duration_seconds}s")
else:
    print(f"Backup failed: {result.error_message}")
```

### Django Integration

```python
from postgres_backup_plugin import PostgresBackupEngine

# Automatically use Django database settings
engine = PostgresBackupEngine.from_django_settings(
    excluded_tables=['django_session', 'django_admin_log']
)

result = engine.backup('/tmp/backup.sql')
```

### With Filtering

```python
from postgres_backup_plugin import (
    PostgresBackupEngine,
    DateRangeFilter,
    ForeignKeyFilter,
    StatusFilter
)

engine = PostgresBackupEngine.from_django_settings()

# Define filters per table
filters = {
    # Filter orders by customer IDs
    'orders': ForeignKeyFilter('customer_id', [123, 456, 789]),

    # Filter invoices by date range
    'invoices': DateRangeFilter('created_at', '2024-01-01', '2024-12-31'),

    # Filter users by status
    'users': StatusFilter('status', allowed_statuses=['active', 'pending']),

    # Raw SQL also works
    'products': 'SELECT * FROM products WHERE price > 100'
}

result = engine.backup(
    output_path='/tmp/filtered_backup.sql',
    filters=filters,
    schema_name='my_backup',
    metadata={'purpose': 'customer_migration', 'ticket': 'PROJ-123'}
)
```

### With S3 Export

```python
from postgres_backup_plugin import PostgresBackupEngine, S3Exporter

engine = PostgresBackupEngine.from_django_settings()

# Create backup
result = engine.backup('/tmp/backup.sql')

if result.success:
    # Export to S3
    exporter = S3Exporter(
        bucket='my-backups',
        prefix='postgres/deliveries/',
        delete_local=True  # Delete local file after upload
    )

    s3_url = exporter.export(result.file_path, metadata=result.metadata)
    print(f"Uploaded to: {s3_url}")
```

## Advanced Usage

### Multi-Schema Backup

Backup from any schema, not just the default 'public' schema:

```python
from postgres_backup_plugin import PostgresBackupEngine, DatabaseConfig

engine = PostgresBackupEngine.from_django_settings()

# Backup from 'public' schema (default)
result = engine.backup('/tmp/backup_public.sql')

# Backup from a specific schema
result = engine.backup(
    output_path='/tmp/backup_production.sql',
    source_schema='production'  # Backup from 'production' schema
)

# Backup from one schema and restore to another
result = engine.backup(
    output_path='/tmp/backup_migration.sql',
    source_schema='old_schema',      # Backup FROM this schema
    schema_name='new_schema'         # Restore TO this schema
)
```

**Use Cases:**
- Multi-tenant databases with separate schemas per tenant
- Development/staging/production schemas in same database
- Schema-based data isolation
- Cross-schema migrations

### Automatic SQL Cleaning

By default, the plugin automatically cleans SQL output to make it portable and easy to restore:

```python
from postgres_backup_plugin import BackupConfig, PostgresBackupEngine

# SQL cleaning is enabled by default
backup_config = BackupConfig(
    clean_output=True,  # Default: True
    target_schema='my_target_schema'  # Optional: specify target schema for restore
)

engine = PostgresBackupEngine(db_config, backup_config)

result = engine.backup('/tmp/clean_backup.sql')
```

**What gets cleaned:**
- ✅ Schema prefixes (e.g., `public.users` → `users`)
- ✅ Psql meta-commands (`\restrict`, `\unrestrict`, `\connect`, etc.)
- ✅ Unnecessary SET commands
- ✅ Empty lines and comments (optional)
- ⚠️ **COPY data blocks are preserved** - no data loss!

**Benefits:**
- Clean, portable SQL files
- Easy to restore to any schema
- No psql-specific commands
- Smaller file size

**Disable cleaning if needed:**
```python
backup_config = BackupConfig(
    clean_output=False  # Keep raw SQL output
)
```

### Custom Filters

Create reusable filter classes:

```python
from postgres_backup_plugin.filters import FilterQuery

class AgeRangeFilter(FilterQuery):
    def __init__(self, min_age, max_age):
        self.min_age = min_age
        self.max_age = max_age

    def build(self, table_name, **params):
        return f"""
            SELECT * FROM {table_name}
            WHERE age BETWEEN {self.min_age} AND {self.max_age}
        """

# Use it
filters = {
    'users': AgeRangeFilter(18, 65)
}
```

### Composite Filters

Combine multiple filters:

```python
from postgres_backup_plugin.filters import CompositeFilter

date_filter = DateRangeFilter('created_at', '2024-01-01', '2024-12-31')
status_filter = StatusFilter('status', ['active'])

# Combine with AND
composite = CompositeFilter(date_filter, status_filter, operator='AND')

filters = {
    'orders': composite
}
```

### Validate Filters Before Backup

```python
engine = PostgresBackupEngine.from_django_settings()

filters = {
    'orders': 'SELECT * FROM orders WHERE invalid_column = 123'
}

try:
    engine.validate_filters(filters)
except FilterValidationError as e:
    print(f"Invalid filter: {e}")
```

### Estimate Backup Size

```python
engine = PostgresBackupEngine.from_django_settings()

estimates = engine.estimate_size(filters)

for table, row_count in estimates.items():
    print(f"{table}: {row_count} rows")
```

## Configuration

### Database Config

```python
from postgres_backup_plugin import DatabaseConfig

db_config = DatabaseConfig(
    host='localhost',
    port=5432,
    user='postgres',
    password='secret',
    database='mydb'
)
```

### Backup Config

```python
from postgres_backup_plugin import BackupConfig

backup_config = BackupConfig(
    # Table exclusion
    excluded_tables=['temp_table', 'cache_table'],

    # Performance
    buffer_size=8192,
    timeout=3600,  # seconds

    # SQL Cleaning (NEW!)
    clean_output=True,              # Enable automatic SQL cleaning (default: True)
    target_schema='my_schema',      # Target schema for cleaned output (optional)

    # Encoding
    encoding='utf-8',

    # Restore optimization
    disable_triggers=True,  # Faster restore
    disable_fsync=True,     # Faster restore

    # Output options
    include_header=True,
    verbose_logging=True,

    # COPY format options
    copy_delimiter='\\t',           # Tab delimiter
    copy_null_string='\\N'          # NULL representation
)

engine = PostgresBackupEngine(db_config, backup_config)
```

**Configuration Details:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `excluded_tables` | List[str] | `[]` | Tables to skip during backup |
| `clean_output` | bool | `True` | Enable automatic SQL cleaning |
| `target_schema` | Optional[str] | `None` | Target schema for restore |
| `buffer_size` | int | `8192` | Stream buffer size (bytes) |
| `timeout` | int | `3600` | Connection timeout (seconds) |
| `encoding` | str | `'utf-8'` | File encoding |
| `disable_triggers` | bool | `True` | Disable triggers during restore |
| `disable_fsync` | bool | `True` | Disable fsync during restore |
| `include_header` | bool | `True` | Include header comments in SQL |
| `verbose_logging` | bool | `True` | Enable detailed logging |

## Restore Backup

To restore a backup created by this plugin:

```bash
# Simple restore
psql -U postgres -d mydb < backup.sql

# With specific schema
psql -U postgres -d mydb -f backup.sql

# Remote database
psql -h remote-host -U postgres -d mydb < backup.sql

# Restore with progress monitoring
pv backup.sql | psql -U postgres -d mydb

# Restore to a different schema (if backup includes schema setup)
psql -U postgres -d mydb < backup_with_schema.sql
```

## Real-World Examples

### Example 1: Multi-Tenant Backup

Backup each tenant's schema separately:

```python
from postgres_backup_plugin import PostgresBackupEngine

engine = PostgresBackupEngine.from_django_settings()

# List of tenant schemas
tenant_schemas = ['tenant_001', 'tenant_002', 'tenant_003']

for tenant_schema in tenant_schemas:
    output_file = f'/backups/{tenant_schema}_backup.sql'

    result = engine.backup(
        output_path=output_file,
        source_schema=tenant_schema,
        metadata={'tenant': tenant_schema, 'backup_type': 'daily'}
    )

    if result.success:
        print(f"✓ Backed up {tenant_schema}: {result.tables_count} tables, {result.total_rows:,} rows")
    else:
        print(f"✗ Failed to backup {tenant_schema}: {result.error_message}")
```

### Example 2: Cross-Schema Migration

Backup from production schema and restore to staging:

```python
# Step 1: Backup from production schema
result = engine.backup(
    output_path='/tmp/production_backup.sql',
    source_schema='production',       # FROM production
    schema_name='staging'             # TO staging (in restore)
)

# Step 2: Restore to staging schema (on same or different database)
# $ psql -U postgres -d mydb < /tmp/production_backup.sql
```

### Example 3: Clean Production Data for Development

Backup with filtering and cleaning for safe dev/test usage:

```python
from postgres_backup_plugin.filters import DateRangeFilter, StatusFilter
from datetime import datetime, timedelta

# Get last 30 days of data only
thirty_days_ago = (datetime.now() - timedelta(days=30)).strftime('%Y-%m-%d')
today = datetime.now().strftime('%Y-%m-%d')

filters = {
    'orders': DateRangeFilter('created_at', thirty_days_ago, today),
    'users': StatusFilter('status', ['test', 'demo']),  # Only test users
    'payments': 'SELECT * FROM payments WHERE amount < 100'  # Small transactions only
}

backup_config = BackupConfig(
    clean_output=True,
    target_schema='dev_data',
    excluded_tables=['audit_logs', 'sessions', 'cache']
)

engine = PostgresBackupEngine(db_config, backup_config)

result = engine.backup(
    output_path='/tmp/dev_safe_backup.sql',
    source_schema='production',
    schema_name='dev_data',
    filters=filters,
    metadata={'purpose': 'dev_environment', 'sanitized': True}
)
```

### Example 4: Automated Daily Backups

```python
import schedule
import time
from datetime import datetime
from postgres_backup_plugin import PostgresBackupEngine, S3Exporter

def daily_backup():
    engine = PostgresBackupEngine.from_django_settings()

    # Create timestamped backup
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    backup_file = f'/tmp/backup_{timestamp}.sql'

    result = engine.backup(
        output_path=backup_file,
        metadata={'type': 'daily', 'timestamp': timestamp}
    )

    if result.success:
        # Upload to S3
        exporter = S3Exporter(
            bucket='my-backups',
            prefix=f'daily/{datetime.now().strftime("%Y/%m")}/',
            delete_local=True
        )

        s3_url = exporter.export(backup_file)
        print(f"✓ Daily backup completed: {s3_url}")
    else:
        print(f"✗ Backup failed: {result.error_message}")
        # Send alert notification here

# Schedule daily at 2 AM
schedule.every().day.at("02:00").do(daily_backup)

while True:
    schedule.run_pending()
    time.sleep(60)
```

## Performance

### Benchmark (1GB table, 10 million rows)

| Method | Backup Time | RAM Usage | Disk I/O |
|--------|-------------|-----------|----------|
| **This Plugin** | ~30-35s | ~50MB | 1GB |
| INSERT statements | ~120s | ~50MB | 1GB |
| pg_dump (custom) | ~40s | ~50MB | 1GB |
| Temp CSV method | ~45s | ~50MB | 2GB |

### Why Fast?

- Direct streaming from PostgreSQL to file
- COPY format (native PostgreSQL bulk export)
- No intermediate files
- Minimal memory usage

## Architecture

```
postgres_backup_plugin/
├── core/
│   ├── backup_engine.py      # Main backup logic
│   ├── stream_wrapper.py     # Direct streaming
│   └── query_builder.py      # SQL utilities
├── filters/
│   ├── base.py               # Filter interface
│   └── common_filters.py     # Pre-built filters
├── exporters/
│   ├── base.py               # Exporter interface
│   ├── file_exporter.py      # Local file
│   └── s3_exporter.py        # AWS S3
├── config.py                 # Configuration classes
└── exceptions.py             # Custom exceptions
```

## API Reference

### BackupEngine.backup()

```python
def backup(
    self,
    output_path: str,
    filters: Optional[Dict[str, Any]] = None,
    schema_name: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
    source_schema: Optional[str] = 'public'
) -> BackupResult
```

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `output_path` | str | Required | Output file path for backup SQL |
| `filters` | Dict[str, Any] | `None` | Per-table filters for selective backup |
| `schema_name` | Optional[str] | `None` | Target schema name for restore (creates schema in SQL) |
| `metadata` | Optional[Dict] | `None` | Custom metadata to include in backup header |
| `source_schema` | Optional[str] | `'public'` | Source schema to backup from |

**Returns:** `BackupResult` object with:
- `success`: bool - Whether backup succeeded
- `file_path`: str - Path to backup file
- `file_size`: int - Backup file size in bytes
- `tables_count`: int - Number of tables backed up
- `total_rows`: int - Total rows backed up
- `duration`: float - Backup duration in seconds
- `error_message`: Optional[str] - Error message if failed
- `metadata`: Dict - Backup metadata

**Example:**
```python
result = engine.backup(
    output_path='/tmp/backup.sql',
    source_schema='production',      # Backup FROM this schema
    schema_name='staging',           # Restore TO this schema
    filters={'users': 'SELECT * FROM users WHERE active = true'},
    metadata={'purpose': 'migration', 'ticket': 'PROJ-123'}
)
```

### BackupConfig Options

Complete list of configuration options:

```python
BackupConfig(
    # Core options
    excluded_tables: List[str] = [],
    buffer_size: int = 8192,
    timeout: int = 3600,
    encoding: str = 'utf-8',

    # SQL Cleaning options
    clean_output: bool = True,
    target_schema: Optional[str] = None,

    # Performance options
    disable_triggers: bool = True,
    disable_fsync: bool = True,

    # Output options
    include_header: bool = True,
    verbose_logging: bool = True,

    # COPY format options
    copy_delimiter: str = '\\t',
    copy_null_string: str = '\\N',
    copy_quote_char: str = '\\b',
    copy_escape_char: str = '\\b'
)
```

## Troubleshooting

### Common Issues

**1. Permission Denied Error**
```
PermissionError: [Errno 13] Permission denied: '/path/to/backup.sql'
```
**Solution:** Ensure the output directory exists and is writable:
```python
import os
os.makedirs(os.path.dirname(output_path), exist_ok=True)
```

**2. Schema Does Not Exist**
```
psycopg2.errors.InvalidSchemaName: schema "my_schema" does not exist
```
**Solution:** Verify the schema exists:
```sql
SELECT schema_name FROM information_schema.schemata WHERE schema_name = 'my_schema';
```

**3. Connection Timeout**
```
psycopg2.OperationalError: timeout expired
```
**Solution:** Increase timeout in configuration:
```python
backup_config = BackupConfig(timeout=7200)  # 2 hours
```

**4. Large Backup Files**
```
File is too large (>2GB) for processing
```
**Solution:** Use compression or split by schema/table:
```bash
# Compress on-the-fly
python backup_script.py | gzip > backup.sql.gz

# Or split by schema
for schema in schema1 schema2 schema3; do
    python backup_script.py --schema=$schema
done
```

## Requirements

- Python 3.7+
- PostgreSQL 9.5+
- psycopg2 or psycopg2-binary
- boto3 (optional, for S3 export)
- schedule (optional, for automated backups)

## Changelog

### Version 1.1.0 (Latest)
- ✨ Added multi-schema support with `source_schema` parameter
- ✨ Automatic SQL cleaning (remove schema prefixes, psql commands)
- 🔧 Improved COPY data preservation during cleaning
- 📝 Enhanced documentation with real-world examples
- 🐛 Fixed pg_dump psql meta-command handling

### Version 1.0.0
- 🎉 Initial release
- ⚡ Direct streaming with COPY format
- 🎯 Flexible per-table filtering
- 📤 S3 export support
- 🔧 Django integration

## License

MIT License

## Contributing

Contributions welcome! Please open an issue or PR.

### Development Setup

```bash
# Clone repository
git clone https://github.com/yourusername/postgres_backup_plugin.git
cd postgres_backup_plugin

# Install dependencies
pip install -r requirements.txt

# Run tests
python -m pytest tests/

# Run specific test
python test_source_schema_detailed.py
```
