Metadata-Version: 2.3
Name: ebdamame
Version: 0.3.0
Summary: A scraper to library to scrape .docx files with 'Entscheidungsbaumdiagramm' tables into a truely machine readable structure
Project-URL: Changelog, https://github.com/Hochfrequenz/ebdamame/releases
Project-URL: Homepage, https://github.com/Hochfrequenz/ebdamame
Author-email: Hochfrequenz Unternehmensberatung GmbH <info@hochfrequenz.de>
License: GPL
License-File: LICENSE
Keywords: EBD,Energiewirtschaft,Marktkommunikation
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.11
Requires-Dist: attrs
Requires-Dist: click
Requires-Dist: more-itertools
Requires-Dist: python-docx
Requires-Dist: rebdhuhn>=0.4.1
Provides-Extra: coverage
Requires-Dist: coverage==7.6.4; extra == 'coverage'
Provides-Extra: formatting
Requires-Dist: black==24.10.0; extra == 'formatting'
Requires-Dist: isort==5.13.2; extra == 'formatting'
Provides-Extra: linting
Requires-Dist: pylint==3.3.1; extra == 'linting'
Provides-Extra: spellcheck
Requires-Dist: codespell==2.3.0; extra == 'spellcheck'
Provides-Extra: test-packaging
Requires-Dist: build==1.2.2.post1; extra == 'test-packaging'
Requires-Dist: twine==5.1.1; extra == 'test-packaging'
Provides-Extra: tests
Requires-Dist: pytest-datafiles==3.0.0; extra == 'tests'
Requires-Dist: pytest-subtests==0.13.1; extra == 'tests'
Requires-Dist: pytest==8.3.3; extra == 'tests'
Requires-Dist: syrupy==4.7.2; extra == 'tests'
Provides-Extra: type-check
Requires-Dist: mypy==1.13.0; extra == 'type-check'
Description-Content-Type: text/markdown

# ebdamame

[![License: GPL](https://img.shields.io/badge/License-GPL-yellow.svg)](LICENSE)
![Python Versions (officially) supported](https://img.shields.io/pypi/pyversions/ebdamame.svg)
![Unittests status badge](https://github.com/Hochfrequenz/ebdamame/workflows/Unittests/badge.svg)
![Coverage status badge](https://github.com/Hochfrequenz/ebdamame/workflows/Coverage/badge.svg)
![Linting status badge](https://github.com/Hochfrequenz/ebdamame/workflows/Linting/badge.svg)
![Formatting status badge](https://github.com/Hochfrequenz/ebdamame/workflows/Formatting/badge.svg)
![PyPi Status Badge](https://img.shields.io/pypi/v/ebdamame)

🇩🇪 Dieses Repository enthält ein Python-Paket namens [`ebdamame`](https://pypi.org/project/ebdamame) (früher: `ebddocx2table`), das genutzt werden kann, um aus .docx-Dateien maschinenlesbare Tabellen, die einen Entscheidungsbaum (EBD) modellieren, zu extrahieren (scrapen).
Diese Entscheidungsbäume sind Teil eines regulatorischen Regelwerks für die deutsche Energiewirtschaft und kommen in der Eingangsprüfung der Marktkommunikation zum Einsatz.
Die mit diesem Paket erstellten maschinenlesbaren Tabellen können mit [`rebdhuhn`](https://pypi.org/project/rebdhuhn) (früher: `ebdtable2graph`) in echte Graphen und Diagramme umgewandelt werden.
Exemplarische Ergebnisse des Scrapings finden sich als .json-Dateien im Repository [`machine-readable_entscheidungsbaumdiagramme`](https://github.com/Hochfrequenz/machine-readable_entscheidungsbaumdiagramme/).

🇬🇧 This repository contains the source code of the Python package [`ebdamame`](https://pypi.org/project/ebdamame) (formerly published as `ebddocx2table`).

## Rationale

Assume that you want to analyse or visualize the Entscheidungsbaumdiagramme (EBD) by EDI@Energy.
The website edi-energy.de, as always, only provides you with PDF or Word files instead of _really_ digitized data.

The package `ebdamame` scrapes the `.docx` files and returns data in a model defined in the "sister" package [`rebdhuhn`](https://pypi.org/project/rebdhuhn) (formerly known as `ebdtable2graph`).

Once you scraped the data (using this package) you can plot it with [`rebdhuhn`](https://pypi.org/project/rebdhuhn).

## How to use the package

In any case, install the repo from PyPI:

```bash
pip install ebdamame
```

### Use as a library

```python
import json
from pathlib import Path

import cattrs

from ebdamame import TableNotFoundError, get_all_ebd_keys, get_ebd_docx_tables  # type:ignore[import]
from ebdamame.docxtableconverter import DocxTableConverter  # type:ignore[import]

docx_file_path = Path("unittests/test_data/ebd20230629_v34.docx")
# download this .docx File from edi-energy.de or find it in the unittests of this repository.
# https://github.com/Hochfrequenz/ebddocx2table/blob/main/unittests/test_data/ebd20230629_v34.docx
docx_tables = get_ebd_docx_tables(docx_file_path, ebd_key="E_0003")
converter = DocxTableConverter(
    docx_tables,
    ebd_key="E_0003",
    chapter="MaBiS",
    sub_chapter="7.42.1: AD: Bestellung der Aggregationsebene der Bilanzkreissummenzeitreihe auf Ebene der Regelzone",
)
result = converter.convert_docx_tables_to_ebd_table()
with open(Path("E_0003.json"), "w+", encoding="utf-8") as result_file:
    # the result file can be found here:
    # https://github.com/Hochfrequenz/machine-readable_entscheidungsbaumdiagramme/tree/main/FV2310
    json.dump(cattrs.unstructure(result), result_file, ensure_ascii=False, indent=2, sort_keys=True)
```

### Use as a CLI tool

_to be written_

## How to use this Repository on Your Machine (for development)

Please follow the instructions in our
[Python Template Repository](https://github.com/Hochfrequenz/python_template_repository#how-to-use-this-repository-on-your-machine).
And for further information, see the [Tox Repository](https://github.com/tox-dev/tox).

## Contribute

You are very welcome to contribute to this template repository by opening a pull request against the main branch.

## Related Tools and Context

This repository is part of the [Hochfrequenz Libraries and Tools for a truly digitized market communication](https://github.com/Hochfrequenz/digital_market_communication/).
