Metadata-Version: 2.4
Name: splurge-dsv
Version: 2025.1.1
Summary: A utility library for working with DSV (Delimited String Values) files
Author: Jim Schilling
License-Expression: MIT
Project-URL: Homepage, https://github.com/jim-schilling/splurge-dsv
Project-URL: Repository, https://github.com/jim-schilling/splurge-dsv
Project-URL: Documentation, https://github.com/jim-schilling/splurge-dsv#readme
Project-URL: Bug Tracker, https://github.com/jim-schilling/splurge-dsv/issues
Keywords: dsv,csv,tsv,delimited,parsing,file-processing
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Filters
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-xdist>=3.0.0; extra == "dev"
Dynamic: license-file

# splurge-dsv

A robust Python library for parsing and processing delimited-separated value (DSV) files with advanced features for data validation, streaming, and error handling.

## Features

### 🔧 Core Functionality
- **Multi-format DSV Support**: Parse CSV, TSV, pipe-delimited, semicolon-delimited, and custom delimiter files
- **Flexible Parsing Options**: Configurable whitespace handling, bookend removal, and encoding support
- **Memory-Efficient Streaming**: Process large files without loading entire content into memory
- **Header/Footer Skipping**: Skip specified numbers of rows from start or end of files
- **Unicode Support**: Full Unicode character and delimiter support

### 🛡️ Security & Validation
- **Path Validation**: Comprehensive file path security validation with traversal attack prevention
- **File Permission Checks**: Automatic file accessibility and permission validation
- **Encoding Validation**: Robust encoding error detection and handling
- **Resource Management**: Automatic file handle cleanup and resource management

### 📊 Advanced Processing
- **Chunked Processing**: Configurable chunk sizes for streaming large datasets
- **Mixed Content Handling**: Support for quoted and unquoted values in the same file
- **Line Ending Flexibility**: Automatic handling of different line ending formats
- **Error Recovery**: Graceful error handling with detailed error messages

### 🧪 Testing & Quality
- **Comprehensive Test Suite**: 90%+ code coverage with 250+ tests
- **Cross-Platform Support**: Tested on Windows, Linux, and macOS
- **Type Safety**: Full type annotations and validation
- **Documentation**: Complete API documentation with examples

## Installation

```bash
pip install splurge-dsv
```

## Quick Start

### Basic CSV Parsing

```python
from splurge_dsv import DsvHelper

# Parse a simple CSV string
data = DsvHelper.parse("a,b,c", delimiter=",")
print(data)  # ['a', 'b', 'c']

# Parse a CSV file
rows = DsvHelper.parse_file("data.csv", delimiter=",")
for row in rows:
    print(row)  # ['col1', 'col2', 'col3']
```

### Streaming Large Files

```python
from splurge_dsv import DsvHelper

# Stream a large CSV file in chunks
for chunk in DsvHelper.parse_stream("large_file.csv", delimiter=",", chunk_size=1000):
    for row in chunk:
        process_row(row)
```

### Advanced Parsing Options

```python
from splurge_dsv import DsvHelper

# Parse with custom options
data = DsvHelper.parse(
    '"a","b","c"',
    delimiter=",",
    bookend='"',
    strip=True,
    bookend_strip=True
)
print(data)  # ['a', 'b', 'c']

# Skip header and footer rows
rows = DsvHelper.parse_file(
    "data.csv",
    delimiter=",",
    skip_header_rows=1,
    skip_footer_rows=2
)
```

### Text File Operations

```python
from splurge_dsv import TextFileHelper

# Count lines in a file
line_count = TextFileHelper.line_count("data.txt")

# Preview first N lines
preview = TextFileHelper.preview("data.txt", max_lines=10)

# Read entire file with options
lines = TextFileHelper.read(
    "data.txt",
    strip=True,
    skip_header_rows=1,
    skip_footer_rows=1
)

# Stream file content
for chunk in TextFileHelper.read_as_stream("large_file.txt", chunk_size=500):
    process_chunk(chunk)
```

### Path Validation

```python
from splurge_dsv import PathValidator

# Validate a file path
valid_path = PathValidator.validate_path(
    "data.csv",
    must_exist=True,
    must_be_file=True,
    must_be_readable=True
)

# Check if path is safe
is_safe = PathValidator.is_safe_path("user_input_path.txt")
```

## API Reference

### DsvHelper

Main class for DSV parsing operations.

#### Methods

- `parse(content, delimiter, strip=True, bookend=None, bookend_strip=True)` - Parse a single string
- `parses(content_list, delimiter, strip=True, bookend=None, bookend_strip=True)` - Parse multiple strings
- `parse_file(file_path, delimiter, strip=True, bookend=None, bookend_strip=True, skip_header_rows=0, skip_footer_rows=0, encoding='utf-8')` - Parse a file
- `parse_stream(file_path, delimiter, strip=True, bookend=None, bookend_strip=True, skip_header_rows=0, skip_footer_rows=0, encoding='utf-8', chunk_size=500)` - Stream parse a file

### TextFileHelper

Utility class for text file operations.

#### Methods

- `line_count(file_path, encoding='utf-8')` - Count lines in a file
- `preview(file_path, max_lines=100, strip=True, encoding='utf-8', skip_header_rows=0)` - Preview file content
- `read(file_path, strip=True, encoding='utf-8', skip_header_rows=0, skip_footer_rows=0)` - Read entire file
- `read_as_stream(file_path, strip=True, encoding='utf-8', skip_header_rows=0, skip_footer_rows=0, chunk_size=500)` - Stream read file

### PathValidator

Security-focused path validation utilities.

#### Methods

- `validate_path(file_path, must_exist=False, must_be_file=False, must_be_readable=False, allow_relative=False, base_directory=None)` - Validate file path
- `is_safe_path(file_path)` - Check if path is safe
- `sanitize_filename(filename, default_name='file')` - Sanitize filename

### ResourceManager

Context managers for safe resource handling.

#### Classes

- `FileResourceManager` - Context manager for file operations
- `StreamResourceManager` - Context manager for stream operations

#### Functions

- `safe_file_operation(file_path, mode='r', encoding='utf-8', ...)` - Safe file operation context manager
- `safe_stream_operation(stream, auto_close=True)` - Safe stream operation context manager

## Error Handling

The library provides comprehensive error handling with custom exception classes:

- `SplurgeParameterError` - Invalid parameter values
- `SplurgeFileNotFoundError` - File not found
- `SplurgeFilePermissionError` - File permission issues
- `SplurgeFileEncodingError` - File encoding problems
- `SplurgePathValidationError` - Path validation failures
- `SplurgeResourceAcquisitionError` - Resource acquisition failures
- `SplurgeResourceReleaseError` - Resource cleanup failures

## Development

### Running Tests

```bash
# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=splurge_dsv --cov-report=html

# Run specific test file
pytest tests/test_dsv_helper.py -v
```

### Code Quality

The project follows strict coding standards:
- PEP 8 compliance
- Type annotations for all functions
- Google-style docstrings
- 90%+ test coverage requirement
- Comprehensive error handling

## Changelog

### 2025.1.1 (2025-01-XX)

#### 🔧 Code Quality Improvements
- **Refactored Complex Regex Logic**: Extracted Windows drive letter validation logic from `_check_dangerous_characters` into a dedicated `_is_valid_windows_drive_pattern` helper method in `PathValidator` for better readability and maintainability
- **Exception Handling Consistency**: Fixed inconsistency in `ResourceManager.acquire()` method to properly re-raise `NotImplementedError` without wrapping it in `SplurgeResourceAcquisitionError`
- **Import Organization**: Moved all imports to the top of modules across the entire codebase for better code structure and PEP 8 compliance

#### 🧪 Testing Enhancements
- **Public API Focus**: Removed all tests that validated private implementation details, focusing exclusively on public API behavior validation
- **Comprehensive Resource Manager Tests**: Added extensive test suite for `ResourceManager` module covering all public methods, edge cases, error scenarios, and context manager behavior
- **Bookend Logic Clarification**: Updated and corrected all tests related to `StringTokenizer.remove_bookends` to properly reflect its single-character, symmetric bookend matching behavior
- **Path Validation Test Clarity**: Clarified test expectations and comments for Windows drive-relative paths (e.g., "C:file.txt") to reflect the validator's intentionally strict security design

#### 🐛 Bug Fixes
- **Test Reliability**: Fixed failing tests in `ResourceManager` context manager scenarios by properly handling file truncation and line ending normalization
- **Ruff Compliance**: Resolved all linting warnings including unused variables and imports

#### 📚 Documentation Updates
- **Method Documentation**: Updated `ResourceManager.acquire()` docstring to include `NotImplementedError` in the Raises section
- **Test Comments**: Enhanced test documentation with clearer explanations of expected behaviors and edge cases

### 2025.1.0 (2025-08-25)

#### 🎉 Major Features
- **Complete DSV Parser**: Full-featured delimited-separated value parser with support for CSV, TSV, and custom delimiters
- **Streaming Support**: Memory-efficient streaming for large files with configurable chunk sizes
- **Advanced Parsing Options**: Bookend removal, whitespace handling, and encoding support
- **Header/Footer Skipping**: Skip specified numbers of rows from start or end of files

#### 🛡️ Security Enhancements
- **Path Validation System**: Comprehensive file path security validation with traversal attack prevention
- **File Permission Checks**: Automatic file accessibility and permission validation
- **Encoding Validation**: Robust encoding error detection and handling

#### 🔧 Core Components
- **DsvHelper**: Main DSV parsing class with parse, parses, parse_file, and parse_stream methods
- **TextFileHelper**: Utility class for text file operations (line counting, preview, reading, streaming)
- **PathValidator**: Security-focused path validation utilities
- **ResourceManager**: Context managers for safe resource handling
- **StringTokenizer**: Core string parsing functionality

#### 🧪 Testing & Quality
- **Comprehensive Test Suite**: 250+ tests with 90%+ code coverage
- **Cross-Platform Testing**: Tested on Windows, Linux, and macOS
- **Type Safety**: Full type annotations throughout the codebase
- **Error Handling**: Custom exception hierarchy with detailed error messages

#### 📚 Documentation
- **Complete API Documentation**: Google-style docstrings for all public methods
- **Usage Examples**: Comprehensive examples for all major features
- **Error Documentation**: Detailed error handling documentation

#### 🚀 Performance
- **Memory Efficiency**: Streaming support for large files
- **Optimized Parsing**: Efficient string tokenization and processing
- **Resource Management**: Automatic cleanup and resource management

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

## Support

For support, please open an issue on the GitHub repository or contact the maintainers.
