Metadata-Version: 2.4
Name: carp-analytics-python
Version: 0.1.0
Summary: A high-performance Python library for processing and analysing data from CARP (Copenhagen Research Platform) clinical studies
Project-URL: Homepage, https://carp.dk
Project-URL: Documentation, https://docs.carp.dk
Project-URL: Repository, https://github.com/carp-dk/carp-analytics-python.git
Project-URL: Issues, https://github.com/carp-dk/carp-analytics-python/issues
Project-URL: Changelog, https://github.com/carp-dk/carp-analytics-python/blob/main/CHANGELOG.md
Author-email: CARP Team <support@carp.dk>
Maintainer-email: CARP Team <support@carp.dk>
License-Expression: MIT
License-File: LICENSE
Keywords: carp,clinical-studies,data-processing,health-data,json-streaming,mhealth,pandas,parquet,research
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: ijson>=3.2.0
Requires-Dist: rich>=13.0.0
Requires-Dist: tqdm>=4.65.0
Provides-Extra: all
Requires-Dist: folium>=0.14.0; extra == 'all'
Requires-Dist: matplotlib>=3.7.0; extra == 'all'
Requires-Dist: numba>=0.57.0; extra == 'all'
Requires-Dist: numpy>=1.24.0; extra == 'all'
Requires-Dist: pandas>=2.0.0; extra == 'all'
Requires-Dist: pyarrow>=14.0.0; extra == 'all'
Requires-Dist: scikit-learn>=1.3.0; extra == 'all'
Requires-Dist: scipy>=1.10.0; extra == 'all'
Provides-Extra: pandas
Requires-Dist: pandas>=2.0.0; extra == 'pandas'
Requires-Dist: pyarrow>=14.0.0; extra == 'pandas'
Provides-Extra: science
Requires-Dist: numpy>=1.24.0; extra == 'science'
Requires-Dist: scikit-learn>=1.3.0; extra == 'science'
Requires-Dist: scipy>=1.10.0; extra == 'science'
Provides-Extra: viz
Requires-Dist: folium>=0.14.0; extra == 'viz'
Requires-Dist: matplotlib>=3.7.0; extra == 'viz'
Description-Content-Type: text/markdown

# CARP Analytics Python

[![PyPI version](https://badge.fury.io/py/carp-analytics-python.svg)](https://badge.fury.io/py/carp-analytics-python)
[![Python versions](https://img.shields.io/pypi/pyversions/carp-analytics-python.svg)](https://pypi.org/project/carp-analytics-python/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A high-performance Python library for processing and analysing data from [CARP](https://carp.dk/) (Copenhagen Research Platform) studies.

## Features

- **Streaming JSON Parsing**: Uses `ijson` to handle very large JSON files with a minimal memory footprint
- **Schema Discovery**: Automatically scans and infers the schema of the data
- **Data Grouping**: Efficiently groups data by any field (e.g., data type, device ID) into separate files
- **Parquet Export**: Convert JSON data to Parquet for faster subsequent analysis
- **Participant Management**: Link and track participants across multiple study phases
- **Visualization**: Generate location heatmaps and other visualizations
- **Pandas Integration**: Seamlessly work with DataFrames
- **Rich Terminal Output**: Beautiful progress bars and formatted tables
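
To make the schema-discovery idea concrete, here is a minimal sketch that uses only the standard library. It is not the library's implementation: `infer_schema` and the sample records are hypothetical, and the record shape is only loosely modelled on the `dataStream.dataType.name` path used elsewhere in this README.

```python
from collections import defaultdict
from typing import Any, Iterable


def infer_schema(records: Iterable[dict[str, Any]]) -> dict[str, set[str]]:
    """Map each dotted field path to the set of Python type names seen there."""
    schema: dict[str, set[str]] = defaultdict(set)

    def walk(value: Any, path: str) -> None:
        if isinstance(value, dict):
            # Descend into nested objects, extending the dotted path.
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        else:
            # Leaf value (lists are treated as leaves here for brevity).
            schema[path].add(type(value).__name__)

    for record in records:
        walk(record, "")
    return dict(schema)


# Hypothetical records, for illustration only.
sample_records = [
    {"dataStream": {"dataType": {"name": "stepcount"}}, "measurement": {"steps": 12}},
    {"dataStream": {"dataType": {"name": "heartbeat"}}, "measurement": {"steps": None}},
]

schema = infer_schema(sample_records)
for path, types in sorted(schema.items()):
    print(path, sorted(types))
```

Because the function consumes any iterable one record at a time, it could in principle be fed from a streaming parser rather than a fully loaded list.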

## Installation

### Basic Installation

```bash
pip install carp-analytics-python
```

### With Optional Dependencies

```bash
# Quote the package spec so shells such as zsh do not treat the brackets as a glob pattern

# For pandas/parquet support
pip install "carp-analytics-python[pandas]"

# For visualization support
pip install "carp-analytics-python[viz]"

# For scientific computing (numpy, scipy, scikit-learn)
pip install "carp-analytics-python[science]"

# Install everything
pip install "carp-analytics-python[all]"
```

### Development Installation

```bash
git clone https://github.com/carp-dk/carp-analytics-python.git
cd carp-analytics-python

# Using uv (recommended)
uv sync

# Or using pip
pip install -e .
```

## Quick Start

```python
from carp import CarpDataStream

# Initialize with a data file
data = CarpDataStream("data/study-phase-1/data-streams.json")

# Scan and print the schema
data.print_schema()

# Convert to Parquet for faster analysis
data.convert_to_parquet("output_parquet")

# Load data as a DataFrame
df = data.get_dataframe("dk.cachet.carp.stepcount", "output_parquet")
print(df.head())
```

## Working with Participants

```python
from carp import CarpDataStream

# Load data from multiple phases
data = CarpDataStream([
    "data/phase-1/data-streams.json",
    "data/phase-2/data-streams.json",
])

# Print participant summary
data.print_participants()

# Access participant data via email
participant = data.participant("user@example.com")

# Get participant info
print(participant.info())

# Get available data types for this participant
participant.print_data_types()

# Get a DataFrame of step count data
df = participant.dataframe("dk.cachet.carp.stepcount", "output_parquet")
```

## Data Export

```python
# `data` is a CarpDataStream instance, created as in Quick Start

# Export a specific data type to JSON
data.export_to_json("heartbeat_data.json", data_type="dk.cachet.carp.heartbeat")

# Group data by data type
data.group_by_field("dataStream.dataType.name", "output_by_type")

# Group data by participant
data.group_by_participant("output_by_participant")
```
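
The `field_path` argument above is a dotted path into each record. As an illustration only (not the library's code), the sketch below shows one way such a path could be resolved and used to bucket records. `resolve_path`, `group_records`, and the sample records are hypothetical names and data introduced for this example.

```python
from collections import defaultdict
from functools import reduce
from typing import Any, Iterable


def resolve_path(record: dict[str, Any], field_path: str) -> Any:
    """Follow a dotted path through nested dicts; return None if any key is missing."""

    def step(current: Any, key: str) -> Any:
        return current.get(key) if isinstance(current, dict) else None

    return reduce(step, field_path.split("."), record)


def group_records(
    records: Iterable[dict[str, Any]], field_path: str
) -> dict[Any, list[dict[str, Any]]]:
    """Bucket records by the value found at `field_path`."""
    groups: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for record in records:
        groups[resolve_path(record, field_path)].append(record)
    return dict(groups)


# Hypothetical records, for illustration only.
sample_records = [
    {"dataStream": {"dataType": {"name": "stepcount"}}, "steps": 10},
    {"dataStream": {"dataType": {"name": "heartbeat"}}, "bpm": 61},
    {"dataStream": {"dataType": {"name": "stepcount"}}, "steps": 4},
    {"steps": 1},  # no dataStream key, so this record is grouped under None
]

groups = group_records(sample_records, "dataStream.dataType.name")
for key, members in groups.items():
    print(key, len(members))
```

This sketch keeps every group in memory for simplicity. An implementation aimed at very large files would more likely write each record to its group's output file as it is read.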

## Visualization

```python
# Generate a location heatmap for a participant (`data` is a CarpDataStream, as in Quick Start)
participant = data.participant("user@example.com")
participant.visualize.location(output_file="user_locations.html")
```

## Command Line Interface

The package includes a CLI for common operations:

```bash
# Show schema of data files
carp schema data/study/data-streams.json

# Convert JSON to Parquet
carp convert data/study/data-streams.json -o output_parquet

# Count items in data files
carp count data/study/data-streams.json

# List participants
carp participants data/study/data-streams.json

# Export filtered data
carp export data/study/data-streams.json -o output.json -t dk.cachet.carp.stepcount

# Group data by field
carp group data/study/data-streams.json -f dataStream.dataType.name -o grouped_output
```
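
When several study phases need converting, the CLI can be scripted from Python. The following is a rough sketch that assumes the `carp` executable is on `PATH` and uses only the `convert` subcommand and `-o` flag shown above. The helper names and the per-phase output layout are hypothetical choices made for this example.

```python
import shlex
import subprocess
from pathlib import Path


def convert_commands(inputs: list[str], output_root: str) -> list[list[str]]:
    """Build one `carp convert` argument list per input file."""
    commands = []
    for input_path in inputs:
        # Name each output directory after the input's parent folder, e.g. "phase-1".
        output_dir = Path(output_root) / Path(input_path).parent.name
        commands.append(["carp", "convert", input_path, "-o", output_dir.as_posix()])
    return commands


def run_commands(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


commands = convert_commands(
    ["data/phase-1/data-streams.json", "data/phase-2/data-streams.json"],
    "output_parquet",
)
run_commands(commands)  # dry run: only prints the commands
```

Passing `dry_run=False` would execute each command and raise `subprocess.CalledProcessError` on the first failure.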

## API Reference

### `CarpDataStream`

The main class for working with CARP data streams.

| Method | Description |
|--------|-------------|
| `scan_schema()` | Scan and infer the data schema |
| `print_schema()` | Print the inferred schema as a table |
| `convert_to_parquet(output_dir)` | Convert JSON to Parquet files |
| `get_dataframe(data_type, parquet_dir)` | Load data as a pandas DataFrame |
| `export_to_json(output_path, data_type)` | Export data to JSON file |
| `group_by_field(field_path, output_dir)` | Group data by a specific field |
| `group_by_participant(output_dir)` | Group data by participant |
| `participant(email)` | Access participant data via fluent API |
| `print_participants()` | Print participant summary table |

### `ParticipantAccessor`

Fluent API for accessing individual participant data.

| Method | Description |
|--------|-------------|
| `info()` | Get participant information as a dictionary |
| `print_info()` | Print participant info as a table |
| `all_data(data_type)` | Generator for all participant data |
| `data_types()` | Get all unique data types |
| `print_data_types()` | Print the data types available for this participant |
| `dataframe(data_type, parquet_dir)` | Get data as a pandas DataFrame |
| `visualize.location()` | Generate location heatmap |

## Requirements

- Python 3.10+
- ijson (for streaming JSON parsing)
- rich (for terminal output)
- tqdm (for progress bars)

Optional:
- pandas, pyarrow (for DataFrame and Parquet support)
- matplotlib, folium (for visualization)
- numpy, scipy, scikit-learn (for scientific computing)
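
To see which optional features are usable in the current environment, a quick check along these lines may help. It is a standalone sketch, not part of the package: `EXTRAS` and `missing_modules` are hypothetical names, and the module lists mirror the optional dependencies listed above.

```python
from importlib.util import find_spec

# Extra name -> importable module names, mirroring the optional dependencies above.
# Note that scikit-learn is imported as `sklearn`.
EXTRAS = {
    "pandas": ["pandas", "pyarrow"],
    "viz": ["matplotlib", "folium"],
    "science": ["numpy", "scipy", "sklearn"],
}


def missing_modules() -> dict[str, list[str]]:
    """Return, for each extra, the modules that cannot be found in this environment."""
    return {
        extra: [name for name in modules if find_spec(name) is None]
        for extra, modules in EXTRAS.items()
    }


report = missing_modules()
for extra, missing in report.items():
    if missing:
        print(f"[{extra}] missing: {', '.join(missing)}")
    else:
        print(f"[{extra}] ready")
```

`find_spec` only locates a module without importing it, so the check stays fast even when heavy packages are installed.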

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## Licence

This project is licensed under the MIT Licence - see the [Licence](LICENSE) file for details.

## Acknowledgements

- [CARP - Copenhagen Research Platform](https://carp.dk/)