Metadata-Version: 2.1
Name: heps-ds-utils
Version: 0.2.1a0
Summary: A Module to enable Hepsiburada Data Science Team to utilize different tools.
License: MIT
Author: FarukBuldur
Author-email: faruk.buldur@hepsiburada.com
Maintainer: FıratÖncü
Maintainer-email: firat.oncu@hepsiburada.com
Requires-Python: >=3.8,<3.11
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: PyHive (>=0.6.5,<0.7.0)
Requires-Dist: colorama (>=0.4.4,<0.5.0)
Requires-Dist: google-cloud-bigquery[bqstorage,pandas] (>=3.0.1,<4.0.0)
Requires-Dist: pandas (>=1.4.1,<2.0.0)
Requires-Dist: paramiko (>=2.10.3,<3.0.0)
Requires-Dist: sasl (>=0.3.1,<0.4.0); sys_platform == "linux" or sys_platform == "darwin"
Requires-Dist: scp (>=0.14.4,<0.15.0)
Requires-Dist: thrift (>=0.15.0,<0.16.0)
Requires-Dist: thrift-sasl (>=0.4.3,<0.5.0)
Requires-Dist: tqdm (>=4.64.0,<5.0.0)
Description-Content-Type: text/markdown

# Hepsiburada Data Science Utilities

This module provides utilities for the Hepsiburada Data Science Team.

The library is available on PyPI.
It can be installed with pip: `pip install heps-ds-utils`.
An existing installation can be upgraded with: `pip install heps-ds-utils --upgrade`.
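
To confirm which version pip installed (for example, after an upgrade), the standard-library `importlib.metadata` module (Python 3.8+) can be queried. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of `heps-ds-utils`:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "heps-ds-utils") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    print(installed_version())
```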

***
## Available Modules

1. Hive Operations

```python
import os
from heps_ds_utils import HiveOperations

# A connection must be created within each runtime.
# There are three ways to set the connection credentials.

# 1) By default, the instance reads its credentials from environment variables.
hive_ds = HiveOperations()
hive_ds.connect_to_hive()

# 2) Credentials can be passed at initialization to override the defaults.
hive_ds = HiveOperations(HIVE_HOST="XXX", HIVE_PORT="YYY", HIVE_USER="ZZZ", HIVE_PASS="WWW", HADOOP_EDGE_HOST="QQQ")
hive_ds.connect_to_hive()

# 3) Any credential can be changed after initialization via the corresponding attribute.
hive_ds = HiveOperations()
hive_ds.hive_username = 'XXX'
hive_ds.connect_to_hive()

# Execute an SQL query to retrieve data.
# Currently implemented return types: DataFrame, NumPy array, dictionary, list.
SQL_QUERY = "SELECT * FROM {db}.{table}"
data, columns = hive_ds.execute_query(SQL_QUERY, return_type="dataframe", return_columns=False)

# Execute an SQL query that creates a table and/or inserts data into it.
SQL_QUERY = "INSERT INTO .."
hive_ds.create_insert_table(SQL_QUERY)

# Send data to Hive and create a table from it.
# Currently a DataFrame or a NumPy array can be sent.
# When sending a NumPy array, `columns` must be provided.
hive_ds.send_files_to_hive("{db}.{table}", data, columns=None)

# Close the connection at the end of the runtime.
hive_ds.disconnect_from_hive()

```
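
The three ways of setting credentials above imply an order of precedence: environment-variable defaults, then constructor arguments, then attribute assignment after construction. The sketch below illustrates that pattern with a hypothetical `HiveCredentials` dataclass. It is not the library's implementation, and it assumes the environment variables share the names of the constructor keywords (`HIVE_HOST`, `HIVE_PORT`, and so on), which this README does not confirm.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Assumption: environment variables use the same names as the HiveOperations keywords.
_ENV_KEYS = ("HIVE_HOST", "HIVE_PORT", "HIVE_USER", "HIVE_PASS", "HADOOP_EDGE_HOST")


@dataclass
class HiveCredentials:
    """Hypothetical credential holder: explicit values override environment defaults."""

    HIVE_HOST: Optional[str] = None
    HIVE_PORT: Optional[str] = None
    HIVE_USER: Optional[str] = None
    HIVE_PASS: Optional[str] = None
    HADOOP_EDGE_HOST: Optional[str] = None

    def __post_init__(self) -> None:
        # Fall back to the environment only for values not passed explicitly.
        for key in _ENV_KEYS:
            if getattr(self, key) is None:
                setattr(self, key, os.environ.get(key))


if __name__ == "__main__":
    os.environ["HIVE_HOST"] = "env-host"        # way 1: environment default
    creds = HiveCredentials(HIVE_PORT="10000")  # way 2: explicit keyword
    creds.HIVE_USER = "analyst"                 # way 3: attribute override
    print(creds.HIVE_HOST, creds.HIVE_PORT, creds.HIVE_USER)
```

Resolving environment defaults once, at construction time, keeps later attribute assignments authoritative, which matches the order in which the three options are listed.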
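
`execute_query` can return several container types. One way such a dispatch could work, given the raw rows a DB-API cursor's `fetchall()` returns plus the column names, is sketched below. The function `convert_rows` and every type string other than `"dataframe"` are assumptions for illustration, not the library's API.

```python
from typing import Any, List, Sequence

import numpy as np
import pandas as pd


def convert_rows(rows: Sequence[Sequence[Any]], columns: List[str], return_type: str = "dataframe"):
    """Hypothetical helper: turn fetched rows into the requested container type."""
    if return_type == "dataframe":
        return pd.DataFrame(list(rows), columns=columns)
    if return_type == "numpy":
        # Object dtype keeps mixed column types (ints, strings) intact.
        return np.array(rows, dtype=object)
    if return_type == "dict":
        # Column-oriented dictionary: {column_name: [values, ...]}
        return {col: [row[i] for row in rows] for i, col in enumerate(columns)}
    if return_type == "list":
        return [list(row) for row in rows]
    raise ValueError(f"Unsupported return_type: {return_type!r}")
```

Only the DataFrame form carries column names intrinsically, which is why returning the column list alongside the other forms can be useful.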
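
The requirement that `columns` accompany a NumPy array follows from the fact that an array carries no column names, while a DataFrame does. A minimal pre-upload normalization step might look like the following; `to_upload_frame` is hypothetical and not part of the library.

```python
from typing import Optional, Sequence, Union

import numpy as np
import pandas as pd


def to_upload_frame(
    data: Union[pd.DataFrame, np.ndarray], columns: Optional[Sequence[str]] = None
) -> pd.DataFrame:
    """Hypothetical pre-upload check: return a DataFrame with named columns."""
    if isinstance(data, pd.DataFrame):
        # A DataFrame already has column names; rename only if columns are given.
        return data if columns is None else data.set_axis(list(columns), axis=1)
    if isinstance(data, np.ndarray):
        if columns is None:
            raise ValueError("columns must be provided when sending a NumPy array")
        return pd.DataFrame(data, columns=list(columns))
    raise TypeError(f"Unsupported data type: {type(data).__name__}")
```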
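
Because the connection should be closed at the end of the runtime even if a query raises an error, it can help to wrap `connect_to_hive()` and `disconnect_from_hive()` in a context manager. This sketch relies only on those two methods; `hive_session` is a hypothetical helper, not part of `heps_ds_utils`.

```python
from contextlib import contextmanager
from typing import Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def hive_session(hive_ops: T) -> Iterator[T]:
    """Connect on entry and always disconnect on exit, even if the body raises."""
    hive_ops.connect_to_hive()
    try:
        yield hive_ops
    finally:
        hive_ops.disconnect_from_hive()


# Example usage (requires a reachable Hive instance):
# from heps_ds_utils import HiveOperations
# with hive_session(HiveOperations()) as hive_ds:
#     hive_ds.create_insert_table("INSERT INTO ..")
```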

2. BigQuery Operations
