Metadata-Version: 2.1
Name: preprocess_toolbox
Version: 0.0.2
Summary: A preprocessing toolbox for converting datasets into assets ready for ML based workloads
Author-email: James Byrne <digitalinnovation@bas.ac.uk>
Maintainer-email: James Byrne <digitalinnovation@bas.ac.uk>
License: MIT License
        
        Copyright (c) 2024, British Antarctic Survey
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
        
Project-URL: bugs, https://github.com/environmental-forecasting/preprocess_toolbox/issues
Project-URL: changelog, https://github.com/environmental-forecasting/preprocess_toolbox/blob/master/changelog.md
Project-URL: homepage, https://github.com/environmental-forecasting/preprocess_toolbox
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/x-rst
License-File: LICENSE
License-File: AUTHORS.rst
Requires-Dist: download-toolbox
Requires-Dist: orjson
Provides-Extra: dev
Requires-Dist: watchdog; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: tox; extra == "dev"
Requires-Dist: coverage; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: importlib-metadata; extra == "dev"
Requires-Dist: pre-commit; extra == "dev"
Provides-Extra: docs
Requires-Dist: jinja2; extra == "docs"
Requires-Dist: Sphinx; extra == "docs"
Requires-Dist: myst-parser; extra == "docs"

# Preprocessing Toolbox

![GitHub issues](https://img.shields.io/github/issues/environmental-forecasting/preprocess-toolbox?style=plastic)
![GitHub closed issues](https://img.shields.io/github/issues-closed/environmental-forecasting/preprocess-toolbox?style=plastic)
![GitHub](https://img.shields.io/github/license/environmental-forecasting/preprocess-toolbox)
![GitHub forks](https://img.shields.io/github/forks/environmental-forecasting/preprocess-toolbox?style=social)
![GitHub forks](https://img.shields.io/github/stars/environmental-forecasting/preprocess-toolbox?style=social)

This is the preprocessing library for taking `download-toolbox` datasets and
combining / composing multi-source data loaders that can be used to cache or supply downstream applications.

This is only just getting started, more info will appear soon.

Contact `jambyr <at> bas <dot> ac <dot> uk` if you want further information.

## Table of contents

* [Overview](#overview)
* [Installation](#installation)
* [Implementation](#implementation)
* [Contributing](#contributing)
* [Credits](#credits)
* [License](#license)

## Installation

Not currently released to pip.

Please refer to [the contribution guidelines for more information.](CONTRIBUTING.rst)

## Implementation

When installed, the library will provide a series of CLI commands. Please use
the `--help` switch for more initial information, or the documentation.

### Basic principles

The library provides the ability to preprocess `download-toolbox` datasets and create singular configurations for reading out the data in a multi-channel format for dataset construction:

1. Preprocess datasets from download-toolbox so that the dataset is continuous and normalised for the downstream application
1. Generate a loader configuration, applying additional metadata (arbitrary channels and masks) providing initial access to the collected data
1. Use this data loader to produce usable application datasets for downstream applications (testing with IceNet and another internal application)

This is a base library upon which application specific processing is based, lowering the implementation overhead for creating multi-source datasets for environmental applications that require integration of data from sources that `download-toolbox` provide.

This library doesn't have knowledge of those datasets, it forms the basis for processing things specific to an application by importing application-specific logic dynamically. See [this issue](https://github.com/environmental-forecasting/preprocess-toolbox/issues/1) for a quick idea of how this works with the [IceNet workflow](https://github.com/icenet-ai/icenet).

## Limitations

There are some major limitations to this as a general purpose tool, these will hopefully be dealt with in time! I'm raising issues as I go

**This is currently very heavy development functionality, but the following commands already work**:

* preprocess_missing_spatial - poorly at present due to missing mask backref implementation
* preprocess_missing_time
* preprocess_regrid
* preprocess_rotate
* preprocess_dataset
* preprocess_loader_init
* preprocess_add_mask
* preprocess_add_channel

Other stubs probably don't work, unless I forgot to update these docs!

## Contributing

Please refer to [the contribution guidelines for more information.](CONTRIBUTING.rst)

## Credits

<a href="https://github.com/environmental-forecasting/preprocess-toolbox/graphs/contributors"><img src="https://contrib.rocks/image?repo=environmental-forecasting/preprocess-toolbox" /></a>

## License

This is licensed using the [MIT License](LICENSE)
