Metadata-Version: 2.1
Name: owid-repack
Version: 0.1.0
Summary: Pack Pandas data frames into smaller, more memory-efficient data types.
Home-page: https://github.com/owid/owid-catalog-py
License: MIT
Author: Our World In Data
Author-email: tech@ourworldindata.org
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: numpy (>=1.24.0,<2.0.0)
Requires-Dist: pandas (>=1.5.2,<2.0.0)
Project-URL: Repository, https://github.com/owid/owid-catalog-py
Description-Content-Type: text/markdown

# owid-repack-py

![version](https://img.shields.io/badge/python-3.10--3.11-blue.svg?&logo=python&logoColor=yellow)

_Pack Pandas DataFrames into smaller, more memory-efficient types._

## Overview

When you load data into Pandas, it will use standard types by default:

- `object` for strings
- `int64` for integers
- `float64` for floating point numbers

However, many datasets have a much more compact representation than the one Pandas picks by default. A more compact representation lowers memory usage and produces smaller files on disk when using binary formats such as Feather and Parquet.
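To make the difference concrete, here is a minimal sketch in plain Pandas that casts columns by hand and compares memory usage. It does not use `owid-repack`; the column names and target types are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: Pandas picks int64, float64 and a string dtype here.
df = pd.DataFrame(
    {
        "year": np.arange(1900, 2000, dtype="int64"),
        "share": np.linspace(0.0, 1.0, 100),
        "country": ["France", "Kenya"] * 50,
    }
)

# Hand-picked smaller types (an assumption for this sketch, not what the library chooses).
compact = df.astype({"year": "uint16", "share": "float32", "country": "category"})

before = int(df.memory_usage(deep=True).sum())
after = int(compact.memory_usage(deep=True).sum())
print(f"default types: {before} bytes, compact types: {after} bytes")
```

Note that casting `float64` to `float32` can lose precision, which is why choosing types by hand needs care.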

This library does just one thing: it shrinks your data frames to use smaller types.

## Installing

`pip install owid-repack`

## Usage

The `owid.repack` module exposes two functions, `repack_series()` and `repack_frame()`.

`repack_series()` will detect the smallest type that can accurately fit the existing data in the series.

```ipython
In [1]: import pandas as pd
   ...: from owid import repack

In [2]: pd.Series([1, 2, 3])
Out[2]:
0    1
1    2
2    3
dtype: int64

In [3]: repack.repack_series(pd.Series([1.5, 2, 3]))
Out[3]:
0    1.5
1    2.0
2    3.0
dtype: float32

In [4]: repack.repack_series(pd.Series([1, None, 3]))
Out[4]:
0       1
1    <NA>
2       3
dtype: UInt8

In [5]: repack.repack_series(pd.Series([-1, None, 3]))
Out[5]:
0      -1
1    <NA>
2       3
dtype: Int8
```

The `repack_frame()` function applies the same logic to every column in your DataFrame and returns a new DataFrame.
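The general idea of shrinking column by column can be sketched with plain Pandas. This is not the library's implementation: `shrink_series()` and `shrink_frame()` are hypothetical helpers built on `pd.to_numeric(..., downcast=...)`, and unlike `repack_series()` they do not handle missing values or string columns.

```python
import pandas as pd
from pandas.api import types as ptypes


def shrink_series(s: pd.Series) -> pd.Series:
    """Hypothetical helper: downcast a numeric series, leave anything else alone."""
    if ptypes.is_bool_dtype(s):
        return s
    if ptypes.is_integer_dtype(s):
        # Use unsigned types only when there are no negative values.
        kind = "unsigned" if len(s) > 0 and s.min() >= 0 else "integer"
        return pd.to_numeric(s, downcast=kind)
    if ptypes.is_float_dtype(s):
        return pd.to_numeric(s, downcast="float")
    return s


def shrink_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply shrink_series to each column, returning a new frame."""
    return pd.DataFrame(
        {col: shrink_series(df[col]) for col in df.columns}, index=df.index
    )


frame = pd.DataFrame(
    {
        "a": [1, 2, 3],
        "b": [-1, 2, 300],
        "c": [1.5, 2.0, 3.0],
        "d": ["x", "y", "z"],
    }
)
small = shrink_frame(frame)
print(small.dtypes)
```

The input frame is left untouched, which mirrors the "returns a new DataFrame" behaviour described above.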

## Releases

- `0.1.0`:
  - Migrate first version from `owid-catalog-py` repo

