Metadata-Version: 2.1
Name: ibis-framework
Version: 9.0.0.dev686
Summary: The portable Python dataframe library
Home-page: https://ibis-project.org
License: Apache-2.0
Author: Ibis Maintainers
Author-email: maintainers@ibis-project.org
Maintainer: Ibis Maintainers
Maintainer-email: maintainers@ibis-project.org
Requires-Python: >=3.9,<4.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: SQL
Classifier: Topic :: Database :: Front-Ends
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Software Development :: User Interfaces
Provides-Extra: bigquery
Provides-Extra: clickhouse
Provides-Extra: dask
Provides-Extra: datafusion
Provides-Extra: decompiler
Provides-Extra: deltalake
Provides-Extra: druid
Provides-Extra: duckdb
Provides-Extra: examples
Provides-Extra: exasol
Provides-Extra: flink
Provides-Extra: geospatial
Provides-Extra: impala
Provides-Extra: mssql
Provides-Extra: mysql
Provides-Extra: oracle
Provides-Extra: pandas
Provides-Extra: polars
Provides-Extra: postgres
Provides-Extra: pyspark
Provides-Extra: risingwave
Provides-Extra: snowflake
Provides-Extra: sqlite
Provides-Extra: trino
Provides-Extra: visualization
Requires-Dist: atpublic (>=2.3,<5)
Requires-Dist: bidict (>=0.22.1,<1)
Requires-Dist: black (>=22.1.0,<25) ; extra == "decompiler"
Requires-Dist: clickhouse-connect[arrow,numpy,pandas] (>=0.5.23,<1) ; extra == "clickhouse"
Requires-Dist: dask[array,dataframe] (>=2022.9.1,<2024.3.0) ; extra == "dask"
Requires-Dist: datafusion (>=0.6,<37) ; extra == "datafusion"
Requires-Dist: db-dtypes (>=0.3,<2) ; extra == "bigquery"
Requires-Dist: deltalake (>=0.9.0,<1) ; extra == "deltalake"
Requires-Dist: duckdb (>=0.8.1,<1) ; extra == "duckdb"
Requires-Dist: fsspec (<2024.3.0) ; extra == "examples"
Requires-Dist: geopandas (>=0.6,<1) ; extra == "geospatial"
Requires-Dist: google-cloud-bigquery (>=3,<4) ; extra == "bigquery"
Requires-Dist: google-cloud-bigquery-storage (>=2,<3) ; extra == "bigquery"
Requires-Dist: graphviz (>=0.16,<1) ; extra == "visualization"
Requires-Dist: impyla (>=0.17,<1) ; extra == "impala"
Requires-Dist: numpy (>=1.23.2,<2)
Requires-Dist: oracledb (>=1.3.1,<3) ; extra == "oracle"
Requires-Dist: packaging (>=21.3,<25) ; extra == "dask" or extra == "oracle" or extra == "pandas" or extra == "polars" or extra == "pyspark"
Requires-Dist: pandas (>=1.5.3,<3)
Requires-Dist: parsy (>=2,<3)
Requires-Dist: pins[gcs] (>=0.8.3,<1) ; extra == "examples"
Requires-Dist: polars (>=0.20.17,<1) ; extra == "polars"
Requires-Dist: psycopg2 (>=2.8.4,<3) ; extra == "postgres" or extra == "risingwave"
Requires-Dist: pyarrow (>=10.0.1,<17)
Requires-Dist: pyarrow-hotfix (>=0.4,<1)
Requires-Dist: pydata-google-auth (>=1.4.0,<2) ; extra == "bigquery"
Requires-Dist: pydruid[sqlalchemy] (>=0.6.5,<1) ; extra == "druid"
Requires-Dist: pyexasol[pandas] (>=0.25.2,<1) ; extra == "exasol"
Requires-Dist: pymysql (>=1,<2) ; extra == "mysql"
Requires-Dist: pyodbc (>=4.0.39,<6) ; extra == "mssql"
Requires-Dist: pyspark (>=3,<4) ; extra == "pyspark"
Requires-Dist: python-dateutil (>=2.8.2,<3)
Requires-Dist: pytz (>=2022.7)
Requires-Dist: regex (>=2021.7.6) ; extra == "dask" or extra == "pandas" or extra == "sqlite"
Requires-Dist: rich (>=12.4.4,<14)
Requires-Dist: shapely (>=2,<3) ; extra == "geospatial"
Requires-Dist: snowflake-connector-python (>=3.0.2,<4,!=3.3.0b1) ; extra == "snowflake"
Requires-Dist: sqlglot (>=23.4,<23.13)
Requires-Dist: toolz (>=0.11,<1)
Requires-Dist: trino (>=0.321,<1) ; extra == "trino"
Requires-Dist: typing-extensions (>=4.3.0,<5)
Project-URL: Documentation, https://ibis-project.org
Project-URL: Issue Tracker, https://github.com/ibis-project/ibis/issues
Project-URL: Repository, https://github.com/ibis-project/ibis
Description-Content-Type: text/markdown

# Ibis

[![Documentation status](https://img.shields.io/badge/docs-docs.ibis--project.org-blue.svg)](https://ibis-project.org)
[![Project chat](https://img.shields.io/badge/zulip-join_chat-purple.svg?logo=zulip)](https://ibis-project.zulipchat.com)
[![Anaconda badge](https://anaconda.org/conda-forge/ibis-framework/badges/version.svg)](https://anaconda.org/conda-forge/ibis-framework)
[![PyPI](https://img.shields.io/pypi/v/ibis-framework.svg)](https://pypi.org/project/ibis-framework)
[![Build status](https://github.com/ibis-project/ibis/actions/workflows/ibis-main.yml/badge.svg)](https://github.com/ibis-project/ibis/actions/workflows/ibis-main.yml?query=branch%3Amain)
[![Build status](https://github.com/ibis-project/ibis/actions/workflows/ibis-backends.yml/badge.svg)](https://github.com/ibis-project/ibis/actions/workflows/ibis-backends.yml?query=branch%3Amain)
[![Codecov branch](https://img.shields.io/codecov/c/github/ibis-project/ibis/main.svg)](https://codecov.io/gh/ibis-project/ibis)

## What is Ibis?

Ibis is the portable Python dataframe library:

- Fast local dataframes (via DuckDB by default)
- Lazy dataframe expressions
- Interactive mode for iterative data exploration
- [Compose Python dataframe and SQL code](#python--sql-better-together)
- Use the same dataframe API for [20+ backends](#backends)
- Iterate locally and deploy remotely by [changing a single line of code](#portability)

See the documentation on ["Why Ibis?"](https://ibis-project.org/why) to learn more.

## Getting started

You can `pip install` Ibis with a backend and example data:

```bash
pip install 'ibis-framework[duckdb,examples]'
```

> [!TIP]
> See the [installation guide](https://ibis-project.org/install) for more installation options.

Then use Ibis:

```python
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │               190 │        3650 │ male   │  2007 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │               181 │        3625 │ female │  2007 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │               195 │        4675 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │               193 │        3475 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │               190 │        4250 │ NULL   │  2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> g = t.group_by(["species", "island"]).agg(count=t.count()).order_by("count")
>>> g
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Adelie    │ Biscoe    │    44 │
│ Adelie    │ Torgersen │    52 │
│ Adelie    │ Dream     │    56 │
│ Chinstrap │ Dream     │    68 │
│ Gentoo    │ Biscoe    │   124 │
└───────────┴───────────┴───────┘
```

> [!TIP]
> See the [getting started tutorial](https://ibis-project.org/tutorials/getting_started) for a full introduction to Ibis.

## Python + SQL: better together

For most backends, Ibis works by compiling its dataframe expressions into SQL:

```python
>>> ibis.to_sql(g)
SELECT
  "t1"."species",
  "t1"."island",
  "t1"."count"
FROM (
  SELECT
    "t0"."species",
    "t0"."island",
    COUNT(*) AS "count"
  FROM "penguins" AS "t0"
  GROUP BY
    1,
    2
) AS "t1"
ORDER BY
  "t1"."count" ASC
```

You can mix SQL and Python code:

```python
>>> a = t.sql("SELECT species, island, count(*) AS count FROM penguins GROUP BY 1, 2")
>>> a
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Adelie    │ Torgersen │    52 │
│ Adelie    │ Biscoe    │    44 │
│ Adelie    │ Dream     │    56 │
│ Gentoo    │ Biscoe    │   124 │
│ Chinstrap │ Dream     │    68 │
└───────────┴───────────┴───────┘
>>> b = a.order_by("count")
>>> b
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Adelie    │ Biscoe    │    44 │
│ Adelie    │ Torgersen │    52 │
│ Adelie    │ Dream     │    56 │
│ Chinstrap │ Dream     │    68 │
│ Gentoo    │ Biscoe    │   124 │
└───────────┴───────────┴───────┘
```

This allows you to combine the flexibility of Python with the scale and performance of modern SQL.

## Backends

Ibis supports 20+ backends:

- [Apache DataFusion](https://ibis-project.org/backends/datafusion/)
- [Apache Druid](https://ibis-project.org/backends/druid/)
- [Apache Flink](https://ibis-project.org/backends/flink)
- [Apache Impala](https://ibis-project.org/backends/impala/)
- [Apache PySpark](https://ibis-project.org/backends/pyspark/)
- [BigQuery](https://ibis-project.org/backends/bigquery/)
- [ClickHouse](https://ibis-project.org/backends/clickhouse/)
- [Dask](https://ibis-project.org/backends/dask/)
- [DuckDB](https://ibis-project.org/backends/duckdb/)
- [Exasol](https://ibis-project.org/backends/exasol)
- [MySQL](https://ibis-project.org/backends/mysql/)
- [Oracle](https://ibis-project.org/backends/oracle/)
- [pandas](https://ibis-project.org/backends/pandas/)
- [Polars](https://ibis-project.org/backends/polars/)
- [PostgreSQL](https://ibis-project.org/backends/postgresql/)
- [RisingWave](https://ibis-project.org/backends/risingwave/)
- [SQL Server](https://ibis-project.org/backends/mssql/)
- [SQLite](https://ibis-project.org/backends/sqlite/)
- [Snowflake](https://ibis-project.org/backends/snowflake)
- [Trino](https://ibis-project.org/backends/trino/)

## How it works

Most Python dataframe libraries are tightly coupled to their execution engine, and many databases only support SQL, with no Python API. Ibis addresses both problems by providing a common Python API for data manipulation and compiling it into each backend's native language. You can learn a single API and use it with any supported backend (execution engine).

Ibis supports three types of backend:

1. SQL-generating backends
2. Expression-generating backends
3. Naïve execution backends

![Ibis backend types](docs/images/backends.png)

## Portability

To switch backends, you can set the default backend that Ibis uses:

```python
>>> ibis.set_backend("duckdb")
>>> ibis.set_backend("polars")
>>> ibis.set_backend("datafusion")
```

Typically, you'll create a connection object:

```python
>>> con = ibis.duckdb.connect()
>>> con = ibis.polars.connect()
>>> con = ibis.datafusion.connect()
```

And work with tables in that backend:

```python
>>> con.list_tables()
['penguins']
>>> t = con.table("penguins")
```

You can also read from common file formats like CSV or Apache Parquet:

```python
>>> t = con.read_csv("penguins.csv")
>>> t = con.read_parquet("penguins.parquet")
```

This allows you to iterate locally and deploy remotely by changing a single line of code.

> [!TIP]
> Check out [the blog on backend agnostic arrays](https://ibis-project.org/posts/backend-agnostic-arrays/) for one example using the same code across DuckDB and BigQuery.

## Community and contributing

Ibis is an open source project and welcomes contributions from anyone in the community.

- Read [the contributing guide](https://github.com/ibis-project/ibis/blob/main/docs/CONTRIBUTING.md).
- We care about keeping the community welcoming for all. Check out [the code of conduct](https://github.com/ibis-project/ibis/blob/main/docs/CODE_OF_CONDUCT.md).
- The Ibis project is open sourced under the [Apache License](https://github.com/ibis-project/ibis/blob/main/LICENSE.txt).

Join our community by interacting on GitHub or chatting with us on [Zulip](https://ibis-project.zulipchat.com/).

For more information, visit https://ibis-project.org/.

