Metadata-Version: 2.1
Name: dvcx
Version: 0.67.0
Summary: DQL
Author-email: Dmitry Petrov <support@dvc.org>
License: Apache-2.0
Project-URL: Issues, https://github.com/iterative/dql/issues
Project-URL: Source, https://github.com/iterative/dql
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Development Status :: 2 - Pre-Alpha
Requires-Python: >=3.8
Description-Content-Type: text/x-rst
License-File: LICENSE
Requires-Dist: pyyaml
Requires-Dist: tomlkit
Requires-Dist: tqdm
Requires-Dist: python-dateutil >=2
Requires-Dist: attrs >=21.3.0
Requires-Dist: s3fs >=2024.2.0
Requires-Dist: gcsfs >=2024.2.0
Requires-Dist: adlfs >=2024.2.0
Requires-Dist: dvc-data <4,>=3.10
Requires-Dist: dvc-objects <6,>=4
Requires-Dist: shtab <2,>=1.3.4
Requires-Dist: sqlalchemy <1.5,>=1.4.24
Requires-Dist: multiprocess ==0.70.15
Requires-Dist: dill ==0.3.7
Requires-Dist: ujson ==5.9.0
Requires-Dist: types-ujson ==5.9.0.0
Provides-Extra: cv
Requires-Dist: Pillow <11,>=10.0.0 ; extra == 'cv'
Requires-Dist: torch >=2.1.0 ; extra == 'cv'
Requires-Dist: numpy ; extra == 'cv'
Requires-Dist: transformers >=4.36.0 ; extra == 'cv'
Provides-Extra: dev
Requires-Dist: dvcx[tests] ; extra == 'dev'
Requires-Dist: mypy ==1.8.0 ; extra == 'dev'
Requires-Dist: types-python-dateutil ; extra == 'dev'
Requires-Dist: types-PyYAML ; extra == 'dev'
Requires-Dist: types-requests ; extra == 'dev'
Provides-Extra: pandas
Requires-Dist: pandas >=1.4.0 ; extra == 'pandas'
Provides-Extra: remote
Requires-Dist: dvcx[pandas] ; extra == 'remote'
Requires-Dist: lz4 ; extra == 'remote'
Requires-Dist: pyarrow ; extra == 'remote'
Requires-Dist: numpy ; extra == 'remote'
Requires-Dist: msgpack <2,>=1.0.4 ; extra == 'remote'
Requires-Dist: requests >=2.22.0 ; extra == 'remote'
Provides-Extra: tests
Requires-Dist: dvcx[cv,pandas,remote,vector] ; extra == 'tests'
Requires-Dist: pytest <8,>=7 ; extra == 'tests'
Requires-Dist: pytest-sugar >=0.9.6 ; extra == 'tests'
Requires-Dist: pytest-cov >=4.1.0 ; extra == 'tests'
Requires-Dist: pytest-mock >=3.12.0 ; extra == 'tests'
Requires-Dist: pytest-servers[all] >=0.4.0 ; extra == 'tests'
Requires-Dist: pytest-benchmark[histogram] ; extra == 'tests'
Requires-Dist: pytest-asyncio >=0.23.2 ; extra == 'tests'
Requires-Dist: virtualenv ; extra == 'tests'
Requires-Dist: dulwich ; extra == 'tests'
Requires-Dist: hypothesis ; extra == 'tests'
Requires-Dist: numpy ; extra == 'tests'
Requires-Dist: aiotools >=1.7.0 ; extra == 'tests'
Provides-Extra: vector
Requires-Dist: numpy ; extra == 'vector'
Requires-Dist: scipy ; extra == 'vector'

|PyPI| |Status| |Python Version| |License|

|Tests| |Codecov| |pre-commit| |Black|

.. |PyPI| image:: https://img.shields.io/pypi/v/dql.svg
   :target: https://pypi.org/project/dvcx/
   :alt: PyPI
.. |Status| image:: https://img.shields.io/pypi/status/dql.svg
   :target: https://pypi.org/project/dql/
   :alt: Status
.. |Python Version| image:: https://img.shields.io/pypi/pyversions/dql
   :target: https://pypi.org/project/dql
   :alt: Python Version
.. |License| image:: https://img.shields.io/pypi/l/dql
   :target: https://opensource.org/licenses/Apache-2.0
   :alt: License
.. |Tests| image:: https://github.com/iterative/dql/workflows/Tests/badge.svg
   :target: https://github.com/iterative/dql/actions?workflow=Tests
   :alt: Tests
.. |Codecov| image:: https://codecov.io/gh/iterative/dql/branch/main/graph/badge.svg
   :target: https://app.codecov.io/gh/iterative/dql
   :alt: Codecov
.. |pre-commit| image:: https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white
   :target: https://github.com/pre-commit/pre-commit
   :alt: pre-commit
.. |Black| image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black
   :alt: Black


What is DVCx?
-------------

DVCx is a Python data manipulation library designed to work with unstructured AI datasets.
It provides a dataframe-like interface which can automatically reference data stored as files
(text, images, video) locally or in the cloud.

Why use DVCx?
-------------

1. **Storage as a single source of truth.** DVCx can organize unstructured data from storage
   systems and data lakes (local files, S3, GCS, Azure ADLS) into overlapping datasets without
   unnecessary file copies.
2. **Compute.** DVCx supports local parallelization and external compute workers for efficient
   data processing and AI metadata creation.
3. **Large scale.** In contrast to in-memory frameworks (such as pandas DataFrames), DVCx can
   work with datasets of millions or billions of records by using out-of-core algorithms.
4. **Persistence and versioning.** Your datasets, your computed metadata, and paid API call
   results remain versioned and reusable.

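To illustrate the large-scale point, processing data that does not fit in memory usually
means streaming records in bounded batches. The sketch below shows that general technique in
plain Python. It is not DVCx's API or implementation: ``batched``, ``fake_records``, and
``total_size`` are hypothetical names introduced only for this example.

.. code:: py

    from itertools import islice
    from typing import Iterable, Iterator, List


    def batched(records: Iterable, size: int) -> Iterator[List]:
        """Yield lists of at most `size` items without materializing the input."""
        it = iter(records)
        while True:
            batch = list(islice(it, size))
            if not batch:
                return
            yield batch


    def fake_records(n: int) -> Iterator[dict]:
        """Hypothetical record source standing in for a large on-disk dataset."""
        for i in range(n):
            yield {"path": f"img_{i}.jpg", "size": i % 1000}


    def total_size(records: Iterable[dict], batch_size: int = 10_000) -> int:
        """Aggregate over all records while holding only one batch in memory."""
        total = 0
        for batch in batched(records, batch_size):
            total += sum(r["size"] for r in batch)
        return total


    if __name__ == "__main__":
        # Memory use is bounded by batch_size, not by the number of records.
        print(total_size(fake_records(1_000_000)))

Because ``fake_records`` is a generator, memory use depends on ``batch_size`` and not on the
total number of records.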

Installation
------------

You can install *DVCx* via pip_ from PyPI_:

.. code:: console

   $ pip install dvcx


Usage
-----
DVCx can be used as a CLI (from a system terminal) or as a Python library.

TODO: CLI usage

To use it from Python code, import the ``dql.catalog.Catalog`` class, which provides methods that mirror the CLI commands, such as ``ls()``, ``get()``, ``find()``, ``du()``, and ``index()``.

.. code:: py

    from dql.catalog import Catalog

    # Create a catalog instance.
    catalog = Catalog()
    # List the entries under a public S3 prefix.
    catalog.ls(["s3://ldb-public/remote/data-lakes/dogs-and-cats/"], update=True)


How is it related to DVC?
-------------------------

`DVC <https://github.com/iterative/dvc/>`_ is an ML framework that helps connect
unstructured data to ML models through pipelines to ensure reproducibility. DVCx,
created by the DVC team, is designed to handle the data preparation phase and thus sits
upstream of DVC in the data management process.

Contributing
------------

Contributions are very welcome.
To learn more, see the `Contributor Guide`_.


License
-------

Distributed under the terms of the `Apache 2.0 license`_,
*DVCx* is free and open source software.


Issues
------

If you encounter any problems,
please `file an issue`_ along with a detailed description.


.. _Apache 2.0 license: https://opensource.org/licenses/Apache-2.0
.. _PyPI: https://pypi.org/
.. _file an issue: https://github.com/iterative/dql/issues
.. _pip: https://pip.pypa.io/
.. github-only
.. _Contributor Guide: CONTRIBUTING.rst
