Metadata-Version: 2.4
Name: tarzi
Version: 0.1.3
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: maturin>=1.5,<2.0 ; extra == 'dev'
Requires-Dist: pytest>=7.4,<9 ; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21,<0.25 ; extra == 'dev'
Requires-Dist: pytest-cov>=0.6 ; extra == 'dev'
Requires-Dist: pytest-mock>=0.1.0 ; extra == 'dev'
Requires-Dist: docopt>=0.6.2 ; extra == 'dev'
Requires-Dist: patchelf>=0.17.2.0 ; sys_platform == 'linux' and extra == 'dev'
Requires-Dist: black>=23.12,<25 ; extra == 'dev'
Requires-Dist: ruff>=0.3,<0.6 ; extra == 'dev'
Requires-Dist: isort>=5.13,<6 ; extra == 'dev'
Requires-Dist: autoflake>=2.2,<3 ; extra == 'dev'
Requires-Dist: twine>=4.0,<6 ; extra == 'dev'
Requires-Dist: build>=1.0,<2 ; extra == 'dev'
Requires-Dist: sphinx>=6.0.0 ; extra == 'docs'
Requires-Dist: sphinx-copybutton>=0.5.2 ; extra == 'docs'
Requires-Dist: myst-parser>=2.0.0 ; extra == 'docs'
Requires-Dist: sphinx-tabs>=3.4.1 ; extra == 'docs'
Requires-Dist: sphinx-design>=0.5.0 ; extra == 'docs'
Requires-Dist: furo>=2023.9.10 ; extra == 'docs'
Requires-Dist: sphinx-autoapi>=3.0.0 ; extra == 'docs'
Provides-Extra: dev
Provides-Extra: docs
License-File: LICENSE
Summary: Rust-native lite search for AI applications
Keywords: web-scraping,search-engine,ai-tools,rust,browser-automation
Author-email: Xiaming Chen <chenxm35@gmail.com>
Maintainer-email: Xiaming Chen <chenxm35@gmail.com>
License: Apache-2.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/mirasurf/tarzi
Project-URL: Documentation, https://tarzi.readthedocs.io/
Project-URL: Repository, https://github.com/mirasurf/tarzi

<div align="center">
  <img src="https://github.com/mirasurf/tarzi/blob/4e751f8d389c0ac7f2061afa9286d2d7fa551aaf/static/tarzi-320.png" alt="Tarzi Logo" width="200" height="200">
</div>
<h1 align="center">tarzi</h1>  
<p align="center">
  <a href="https://crates.io/crates/tarzi">
    <img src="https://img.shields.io/crates/v/tarzi.svg?style=flat-square" alt="Crate Version" />
  </a>
  <a href="https://pypi.org/project/tarzi/">
    <img src="https://img.shields.io/pypi/v/tarzi.svg?style=flat-square" alt="PyPI Version" />
  </a>
  <!-- CI and Docs -->
  <a href="https://github.com/mirasurf/tarzi/actions/workflows/rust-ci.yml">
    <img src="https://github.com/mirasurf/tarzi/actions/workflows/rust-ci.yml/badge.svg" alt="Rust CI" />
  </a>
  <a href="https://github.com/mirasurf/tarzi/actions/workflows/python-ci.yml">
    <img src="https://github.com/mirasurf/tarzi/actions/workflows/python-ci.yml/badge.svg" alt="Python CI" />
  </a>
</p>
<p align="center">
  <!-- License -->
  <a href="https://www.apache.org/licenses/LICENSE-2.0">
    <img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg?style=flat-square" alt="License" />
  </a>
  <!-- X (formerly Twitter) -->
  <a href="https://x.com/mirasurf_ai">
    <img src="https://img.shields.io/twitter/follow/mirasurf_ai?label=@mirasurf_ai&style=flat-square" alt="X Follow" />
  </a>
</p>

> **⚠️ Current Limitation**: Only search engines without anti-bot protection, such as DuckDuckGo and Brave, currently work. Other engines such as Google and Bing require advanced anti-bot evasion features that are still under development.

## 🐒 Tarzi

**Tarzi** is a unified search interface designed for **Retrieval-Augmented Generation (RAG)** and **agentic systems** built on large language models. Search is a core function in these systems, yet most search engine providers (SEPs) impose API paywalls or strict rate limits. **Tarzi**, powered by browser automation and web crawling, removes these barriers by supporting token-free queries across multiple search engines. With a single dependency, you can integrate different SEPs and switch between them as needed.

<div align="center">
  <img src="static/tariz-workflow.png" alt="Tarzi workflow" width="100%">
</div>

## ⚙️ Core Capabilities

- 🦀 **Dual Implementation**: Native Rust library and Python wrapper with CLI tools
- 🔄 **Content Conversion**: Convert raw HTML into LLM-ready Markdown, JSON, or YAML
- 🔍 **Search Integration**: Fetch fully rendered result pages through a unified interface, using a token-free headless browser mode
- 🧠 **Multi-Engine Support**: Designed to work with Bing, Google, DuckDuckGo, Baidu, and others (subject to the current limitation noted above)
- 🛡️ **Proxy Support**: Use proxies to get around network restrictions and reach global SEPs
- 🚀 **End-to-End Workflow**: Full pipeline from search to content extraction for AI and automation use cases
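
To make the **Content Conversion** capability concrete, here is a minimal sketch of what turning raw HTML into Markdown involves. It uses only the Python standard library and is not Tarzi's implementation or API; the names `MarkdownSketch` and `html_to_markdown` are hypothetical and exist only for this illustration. It handles headings, paragraphs, links, and list items, and ignores everything else.

```python
import re
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Illustrative HTML-to-Markdown converter (hypothetical, not Tarzi's code)."""

    def __init__(self):
        super().__init__()
        self.parts = []   # output fragments, joined at the end
        self.href = None  # target of the currently open <a> tag, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            # <h2> becomes "## ", and so on
            self.parts.append("#" * int(tag[1]) + " ")
        elif tag == "li":
            self.parts.append("- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3", "p"):
            self.parts.append("\n\n")  # blank line after block elements
        elif tag == "li":
            self.parts.append("\n")
        elif tag == "a":
            self.parts.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        # Skip whitespace-only text; otherwise collapse whitespace runs
        # to single spaces so words next to inline links stay separated.
        if data.strip():
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    sample = (
        "<h1>Tarzi</h1>"
        '<p>See <a href="https://tarzi.readthedocs.io/">the docs</a> for details.</p>'
        "<ul><li>search</li><li>fetch</li></ul>"
    )
    print(html_to_markdown(sample))
```

A real converter also has to deal with scripts, navigation chrome, nested lists, tables, and malformed markup, which is why a dedicated conversion step is useful before handing pages to an LLM.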

## 🧪 Advanced Features (Under Development)

- 🕵️‍♂️ **Anti-Bot Evasion**: Use fingerprint spoofing, proxy rotation, and human-like actions to avoid detection  
- 🧠 **Smarter Queries**: Improve search results with prompt rewriting and intent-aware queries 
- 🔗 **Workflow Automation**: Chain steps like search, click, form fill, and scraping into automated flows  

## Install

```
pip install tarzi
```
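
After installing, you may want to confirm that the environment matches the package requirements. The sketch below uses only the standard library (`importlib.metadata` and `sys`) and does not call any Tarzi API; the helper names `installed_version` and `python_is_supported` are hypothetical. The `(3, 10)` minimum reflects the `Requires-Python: >=3.10` declared in the package metadata.

```python
from __future__ import annotations

import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str | None:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def python_is_supported(minimum: tuple[int, int] = (3, 10)) -> bool:
    """Check the running interpreter against the minimum required Python version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("Installed tarzi version:", installed_version("tarzi"))
```

If the second line prints `None`, the package is not visible to the interpreter you ran the script with, which usually means `pip` installed it into a different environment.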

## CLI Commands

Tarzi provides two command-line interfaces:

- **`tarzi`**: Native Rust CLI (faster, more efficient)
- **`pytarzi`**: Python CLI (easier to extend, same functionality)

Both CLIs support the same commands and configuration precedence.

## Usage Examples

* Examples in Python and Rust: [examples](/examples/)

## Alternatives

* LangChain [PlayWrightBrowserToolkit](https://python.langchain.com/docs/integrations/tools/playwright/)

## Contributors

Thank you ❤ to all human and non-human contributors.

[![tarzi contributors](https://contrib.rocks/image?repo=mirasurf/tarzi "tarzi contributors")](https://github.com/mirasurf/tarzi/graphs/contributors)

