Metadata-Version: 2.1
Name: openmetadata-ingestion
Version: 0.2.1
Summary: Ingestion Framework for OpenMetadata
Home-page: https://open-metadata.org/
Author: OpenMetadata Committers
License: Apache License 2.0
Project-URL: Documentation, https://docs.open-metadata.org/
Project-URL: Source, https://github.com/open-metadata/OpenMetadata
Platform: UNKNOWN
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: python-jose (==3.3.0)
Requires-Dist: google-auth (>=1.33.0)
Requires-Dist: wheel (~=0.36.2)
Requires-Dist: pandas (~=1.3.1)
Requires-Dist: commonregex
Requires-Dist: sql-metadata (~=2.0.0)
Requires-Dist: requests (~=2.25.1)
Requires-Dist: pydantic[email] (>=1.7.2)
Requires-Dist: google (>=3.0.0)
Requires-Dist: click (<7.2.0,>=7.1.1)
Requires-Dist: elasticsearch (<8.0.0,>=7.0.0)
Requires-Dist: typing-inspect
Requires-Dist: spacy (==3.0.5)
Requires-Dist: okta (==1.7.0)
Requires-Dist: email-validator (>=1.0.3)
Requires-Dist: pydantic (~=1.7.4)
Requires-Dist: requests (>=2.25.1)
Requires-Dist: expandvars (>=0.6.5)
Requires-Dist: dataclasses (>=0.8)
Requires-Dist: typing_extensions (>=3.7.4)
Requires-Dist: mypy_extensions (>=0.4.3)
Requires-Dist: python-dateutil (>=2.8.1)
Requires-Dist: sqlalchemy (>=1.3.24)
Provides-Extra: all
Requires-Dist: elasticsearch (~=7.13.1) ; extra == 'all'
Requires-Dist: python-jose (==3.3.0) ; extra == 'all'
Requires-Dist: faker (~=8.1.1) ; extra == 'all'
Requires-Dist: google-auth (>=1.33.0) ; extra == 'all'
Requires-Dist: sasl (==0.3.1) ; extra == 'all'
Requires-Dist: sqlalchemy-redshift ; extra == 'all'
Requires-Dist: wheel (~=0.36.2) ; extra == 'all'
Requires-Dist: psycopg2-binary ; extra == 'all'
Requires-Dist: pandas (~=1.3.1) ; extra == 'all'
Requires-Dist: commonregex ; extra == 'all'
Requires-Dist: sql-metadata (~=2.0.0) ; extra == 'all'
Requires-Dist: requests (~=2.25.1) ; extra == 'all'
Requires-Dist: sqlalchemy-pytds (>=0.3) ; extra == 'all'
Requires-Dist: pydantic[email] (>=1.7.2) ; extra == 'all'
Requires-Dist: PyAthena[sqlalchemy] ; extra == 'all'
Requires-Dist: google (>=3.0.0) ; extra == 'all'
Requires-Dist: click (<7.2.0,>=7.1.1) ; extra == 'all'
Requires-Dist: python-dateutil (>=2.8.1) ; extra == 'all'
Requires-Dist: snowflake-sqlalchemy (<=1.2.4) ; extra == 'all'
Requires-Dist: pybigquery (>=0.6.0) ; extra == 'all'
Requires-Dist: pyhive (~=0.6.3) ; extra == 'all'
Requires-Dist: elasticsearch (<8.0.0,>=7.0.0) ; extra == 'all'
Requires-Dist: cx-Oracle ; extra == 'all'
Requires-Dist: thrift (~=0.13.0) ; extra == 'all'
Requires-Dist: typing-inspect ; extra == 'all'
Requires-Dist: pymysql (>=1.0.2) ; extra == 'all'
Requires-Dist: spacy (==3.0.5) ; extra == 'all'
Requires-Dist: okta (==1.7.0) ; extra == 'all'
Requires-Dist: thrift-sasl (==0.4.3) ; extra == 'all'
Requires-Dist: google-cloud-logging ; extra == 'all'
Requires-Dist: pyodbc ; extra == 'all'
Requires-Dist: email-validator (>=1.0.3) ; extra == 'all'
Requires-Dist: pydantic (~=1.7.4) ; extra == 'all'
Requires-Dist: cachetools ; extra == 'all'
Requires-Dist: requests (>=2.25.1) ; extra == 'all'
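Requires-Dist: expandvars (>=0.6.5) ; extra == 'all'
Requires-Dist: dataclasses (>=0.8) ; extra == 'all'
Requires-Dist: typing_extensions (>=3.7.4) ; extra == 'all'
Requires-Dist: mypy_extensions (>=0.4.3) ; extra == 'all'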
Requires-Dist: ldap3 (==2.9.1) ; extra == 'all'
Requires-Dist: GeoAlchemy2 ; extra == 'all'
Requires-Dist: sqlalchemy (>=1.3.24) ; extra == 'all'
Provides-Extra: athena
Requires-Dist: PyAthena[sqlalchemy] ; extra == 'athena'
Provides-Extra: base
Requires-Dist: python-jose (==3.3.0) ; extra == 'base'
Requires-Dist: google-auth (>=1.33.0) ; extra == 'base'
Requires-Dist: wheel (~=0.36.2) ; extra == 'base'
Requires-Dist: pandas (~=1.3.1) ; extra == 'base'
Requires-Dist: commonregex ; extra == 'base'
Requires-Dist: sql-metadata (~=2.0.0) ; extra == 'base'
Requires-Dist: requests (~=2.25.1) ; extra == 'base'
Requires-Dist: pydantic[email] (>=1.7.2) ; extra == 'base'
Requires-Dist: google (>=3.0.0) ; extra == 'base'
Requires-Dist: click (<7.2.0,>=7.1.1) ; extra == 'base'
Requires-Dist: elasticsearch (<8.0.0,>=7.0.0) ; extra == 'base'
Requires-Dist: typing-inspect ; extra == 'base'
Requires-Dist: spacy (==3.0.5) ; extra == 'base'
Requires-Dist: okta (==1.7.0) ; extra == 'base'
Requires-Dist: email-validator (>=1.0.3) ; extra == 'base'
Requires-Dist: pydantic (~=1.7.4) ; extra == 'base'
Requires-Dist: requests (>=2.25.1) ; extra == 'base'
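Requires-Dist: expandvars (>=0.6.5) ; extra == 'base'
Requires-Dist: dataclasses (>=0.8) ; extra == 'base'
Requires-Dist: typing_extensions (>=3.7.4) ; extra == 'base'
Requires-Dist: mypy_extensions (>=0.4.3) ; extra == 'base'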
Requires-Dist: python-dateutil (>=2.8.1) ; extra == 'base'
Requires-Dist: sqlalchemy (>=1.3.24) ; extra == 'base'
Provides-Extra: bigquery
Requires-Dist: pybigquery (>=0.6.0) ; extra == 'bigquery'
Provides-Extra: bigquery-usage
Requires-Dist: google-cloud-logging ; extra == 'bigquery-usage'
Requires-Dist: cachetools ; extra == 'bigquery-usage'
Provides-Extra: elasticsearch
Requires-Dist: elasticsearch (~=7.13.1) ; extra == 'elasticsearch'
Provides-Extra: hive
Requires-Dist: thrift (~=0.13.0) ; extra == 'hive'
Requires-Dist: sasl (==0.3.1) ; extra == 'hive'
Requires-Dist: thrift-sasl (==0.4.3) ; extra == 'hive'
Requires-Dist: pyhive (~=0.6.3) ; extra == 'hive'
Provides-Extra: ldap-users
Requires-Dist: ldap3 (==2.9.1) ; extra == 'ldap-users'
Provides-Extra: mssql
Requires-Dist: sqlalchemy-pytds (>=0.3) ; extra == 'mssql'
Provides-Extra: mssql-odbc
Requires-Dist: pyodbc ; extra == 'mssql-odbc'
Provides-Extra: mysql
Requires-Dist: pymysql (>=1.0.2) ; extra == 'mysql'
Provides-Extra: oracle
Requires-Dist: cx-Oracle ; extra == 'oracle'
Provides-Extra: postgres
Requires-Dist: pymysql (>=1.0.2) ; extra == 'postgres'
Requires-Dist: GeoAlchemy2 ; extra == 'postgres'
Requires-Dist: psycopg2-binary ; extra == 'postgres'
Provides-Extra: redshift
Requires-Dist: sqlalchemy-redshift ; extra == 'redshift'
Requires-Dist: GeoAlchemy2 ; extra == 'redshift'
Requires-Dist: psycopg2-binary ; extra == 'redshift'
Provides-Extra: redshift-usage
Requires-Dist: sqlalchemy-redshift ; extra == 'redshift-usage'
Requires-Dist: GeoAlchemy2 ; extra == 'redshift-usage'
Requires-Dist: psycopg2-binary ; extra == 'redshift-usage'
Provides-Extra: sample-tables
Requires-Dist: faker (~=8.1.1) ; extra == 'sample-tables'
Provides-Extra: snowflake
Requires-Dist: snowflake-sqlalchemy (<=1.2.4) ; extra == 'snowflake'
Provides-Extra: snowflake-usage
Requires-Dist: snowflake-sqlalchemy (<=1.2.4) ; extra == 'snowflake-usage'

---
This guide will help you set up the Ingestion framework and connectors
---

![Python version 3.8+](https://img.shields.io/badge/python-3.8%2B-blue)

OpenMetadata Ingestion is a simple framework for building connectors and ingesting metadata from various systems through the OpenMetadata APIs. It can be used in an orchestration framework (e.g. Apache Airflow) to ingest metadata.

**Prerequisites**

- Python >= 3.8.x

### Install From PyPI

```text
python3 -m pip install --upgrade pip wheel setuptools openmetadata-ingestion
python3 -m spacy download en_core_web_sm
```

### Install Ingestion Connector Dependencies

See the [Ingestion Connector documentation](https://docs.open-metadata.org/install/metadata-ingestion) for connector-specific installation steps.
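
The package metadata publishes each connector's dependencies as an optional extra (for example `mysql`, `bigquery`, or `redshift`). As a minimal sketch, assuming you want to script the installation, the helper below builds the matching `pip` command. The name `pip_install_command` is illustrative and not part of the framework.

```python
import sys
from typing import Iterable, List

PACKAGE = "openmetadata-ingestion"


def pip_install_command(extras: Iterable[str]) -> List[str]:
    """Build a pip command that installs the package with the given extras."""
    names = sorted(set(extras))  # de-duplicate and keep a stable order
    requirement = f"{PACKAGE}[{','.join(names)}]" if names else PACKAGE
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(pip_install_command(["mysql", "bigquery"])))
```

Pass the returned list to `subprocess.run(..., check=True)` to perform the installation.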

#### Generate Redshift Data

```text
metadata ingest -c ./pipelines/redshift.json
```

#### Generate Redshift Usage Data

```text
metadata ingest -c ./pipelines/redshift_usage.json
```

#### Generate Sample Tables

```text
metadata ingest -c ./pipelines/sample_tables.json
```

#### Generate Sample Users

```text
metadata ingest -c ./pipelines/sample_users.json
```
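
The commands above all share the shape `metadata ingest -c <config>`. When a full orchestration framework is not needed, several pipelines can be run in sequence from a small script. The sketch below is one way to do that, assuming the `metadata` CLI is on `PATH`. The helper names `ingest_command` and `run_pipelines` are hypothetical.

```python
import subprocess
from typing import List, Sequence


def ingest_command(config_path: str) -> List[str]:
    """Return the CLI invocation used in this guide for one pipeline config."""
    return ["metadata", "ingest", "-c", config_path]


def run_pipelines(config_paths: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Build one command per config and, unless dry_run is set, run them in order.

    Execution stops at the first pipeline that exits with a non-zero status.
    """
    commands = [ingest_command(path) for path in config_paths]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    configs = ["./pipelines/sample_tables.json", "./pipelines/sample_users.json"]
    for cmd in run_pipelines(configs):  # dry run: only prints the commands
        print(" ".join(cmd))
```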

#### Ingest MySQL data to Metadata APIs

```text
metadata ingest -c ./pipelines/mysql.json
```

#### Ingest BigQuery data to Metadata APIs

```text
export GOOGLE_APPLICATION_CREDENTIALS="$PWD/pipelines/creds/bigquery-cred.json"
metadata ingest -c ./pipelines/bigquery.json
```
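
A missing or malformed credentials file is a common reason for the BigQuery pipeline to fail at startup. The check below is a rough sketch, not part of the framework, that can be run before the ingestion command. It only verifies that `GOOGLE_APPLICATION_CREDENTIALS` points to a readable JSON file. The function name `credentials_problem` is an assumption made for illustration.

```python
import json
import os
from typing import Mapping, Optional


def credentials_problem(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a description of what is wrong, or None if the file looks usable."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"credentials file not found: {path}"
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)  # only checks that the file parses as JSON
    except (OSError, ValueError) as exc:
        return f"credentials file is not readable JSON: {exc}"
    return None


if __name__ == "__main__":
    problem = credentials_problem()
    print(problem or "credentials file looks usable")
```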

#### Index Metadata into Elasticsearch

#### Run Elasticsearch docker

```text
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.2
```
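
The container can take some time to start accepting requests. As a hedged sketch, the helper below polls the HTTP port published by the `docker run` command above (`9200`) and reports whether the node answers. It assumes a local, unsecured single-node instance; `elasticsearch_is_up` is a hypothetical name, not a framework function.

```python
import http.client
import urllib.request


def elasticsearch_is_up(url: str = "http://localhost:9200", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on the root endpoint answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses: refused connections,
        # timeouts, and non-2xx responses all count as "not ready".
        return False


if __name__ == "__main__":
    print("ready" if elasticsearch_is_up() else "not reachable yet")
```

Calling this in a retry loop before starting the ingestion connector avoids a failed run while the node is still booting.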

#### Run ingestion connector

```text
metadata ingest -c ./pipelines/metadata_to_es.json
```


Changelog
=========



