Metadata-Version: 2.4
Name: sparrow-python
Version: 0.3.21
Project-URL: homepage, https://github.com/beidongjiedeguang/sparrow
Project-URL: repository, https://github.com/beidongjiedeguang/sparrow
Project-URL: documentation, https://github.com/beidongjiedeguang/sparrow#sparrow_python
Project-URL: Issues, https://github.com/beidongjiedeguang/sparrow/issues
Project-URL: Source, https://github.com/beidongjiedeguang/sparrow
Author-email: kunyuan <beidongjiedeguang@gmail.com>
License-File: LICENSE
Keywords: Machine Learning,cli,cv,nlp
Classifier: Development Status :: 5 - Production/Stable
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Build Tools
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: aiohttp
Requires-Dist: attrs>=22.2.0
Requires-Dist: chevron
Requires-Dist: colour
Requires-Dist: deprecated
Requires-Dist: diff-match-patch
Requires-Dist: fire
Requires-Dist: json5
Requires-Dist: loguru>=0.6.0
Requires-Dist: more-itertools
Requires-Dist: mpire
Requires-Dist: pillow
Requires-Dist: pretty-errors~=1.2.25
Requires-Dist: psutil
Requires-Dist: pyyaml
Requires-Dist: requests
Requires-Dist: rich
Requires-Dist: tabulate
Provides-Extra: cli
Requires-Dist: asciinema; extra == 'cli'
Requires-Dist: docker; extra == 'cli'
Requires-Dist: gitpython; extra == 'cli'
Requires-Dist: httpie; extra == 'cli'
Requires-Dist: objprint; extra == 'cli'
Requires-Dist: orjsonl; extra == 'cli'
Requires-Dist: paramiko; extra == 'cli'
Requires-Dist: schedule; extra == 'cli'
Requires-Dist: twine; extra == 'cli'
Requires-Dist: typer; extra == 'cli'
Requires-Dist: viztracer; extra == 'cli'
Provides-Extra: crawl
Requires-Dist: crawl4ai; extra == 'crawl'
Provides-Extra: dev
Requires-Dist: asciinema; extra == 'dev'
Requires-Dist: black; extra == 'dev'
Requires-Dist: concurrent-log-handler; extra == 'dev'
Requires-Dist: fake-headers; extra == 'dev'
Requires-Dist: faker~=13.0.0; extra == 'dev'
Requires-Dist: fastapi>=0.80.0; extra == 'dev'
Requires-Dist: gitpython; extra == 'dev'
Requires-Dist: gpustat>=1.0.0; extra == 'dev'
Requires-Dist: ordered-set; extra == 'dev'
Requires-Dist: orjson; extra == 'dev'
Requires-Dist: pandas~=1.5.0; extra == 'dev'
Requires-Dist: paramiko; extra == 'dev'
Requires-Dist: pendulum>=2.1.2; extra == 'dev'
Requires-Dist: pre-commit>=2.8; extra == 'dev'
Requires-Dist: psutil>=5.9.2; extra == 'dev'
Requires-Dist: pyahocorasick~=1.4.4; extra == 'dev'
Requires-Dist: pysnooper; extra == 'dev'
Requires-Dist: ray; extra == 'dev'
Requires-Dist: twine; extra == 'dev'
Requires-Dist: uvicorn>=0.16.0; extra == 'dev'
Provides-Extra: latex
Requires-Dist: opencv-python-headless<4.3; extra == 'latex'
Requires-Dist: pix2tex[gui]; extra == 'latex'
Provides-Extra: ml
Requires-Dist: fastapi>=0.80.0; extra == 'ml'
Requires-Dist: marisa-trie>=0.7.8; extra == 'ml'
Requires-Dist: orjson; extra == 'ml'
Requires-Dist: pysnooper; extra == 'ml'
Requires-Dist: ray; extra == 'ml'
Requires-Dist: uvicorn>=0.16.0; extra == 'ml'
Provides-Extra: nlp
Requires-Dist: jionlp; extra == 'nlp'
Requires-Dist: levenshtein; extra == 'nlp'
Requires-Dist: nltk; extra == 'nlp'
Requires-Dist: rouge-chinese; extra == 'nlp'
Provides-Extra: other
Requires-Dist: aiortc; extra == 'other'
Requires-Dist: arrayfire; extra == 'other'
Requires-Dist: awkward; extra == 'other'
Requires-Dist: cn2an; extra == 'other'
Requires-Dist: gradio; extra == 'other'
Requires-Dist: grpcio-reflection~=1.46.3; extra == 'other'
Requires-Dist: grpcio-tools~=1.46.3; extra == 'other'
Requires-Dist: grpcio~=1.46.3; extra == 'other'
Requires-Dist: keyborad; extra == 'other'
Requires-Dist: memray; extra == 'other'
Requires-Dist: protobuf~=3.19.1; extra == 'other'
Requires-Dist: pyzmq; extra == 'other'
Requires-Dist: recordclass; extra == 'other'
Requires-Dist: textdistance[extras]; extra == 'other'
Requires-Dist: wordfreq; extra == 'other'
Requires-Dist: zigzag; extra == 'other'
Provides-Extra: prompt
Requires-Dist: streamlit; extra == 'prompt'
Requires-Dist: streamlit-ace; extra == 'prompt'
Provides-Extra: test
Requires-Dist: openpyxl; extra == 'test'
Requires-Dist: pandas; extra == 'test'
Requires-Dist: pytest; extra == 'test'
Requires-Dist: scikit-learn; extra == 'test'
Provides-Extra: torch
Requires-Dist: bert4torch; extra == 'torch'
Requires-Dist: bertviz; extra == 'torch'
Requires-Dist: datasets; extra == 'torch'
Requires-Dist: einops; extra == 'torch'
Requires-Dist: fairseq; extra == 'torch'
Requires-Dist: koila; extra == 'torch'
Requires-Dist: lightseq; extra == 'torch'
Requires-Dist: orjson; extra == 'torch'
Requires-Dist: pytorch-lightning; extra == 'torch'
Requires-Dist: ray; extra == 'torch'
Requires-Dist: sacremoses; extra == 'torch'
Requires-Dist: seqevae; extra == 'torch'
Requires-Dist: transformers; extra == 'torch'
Requires-Dist: whylogs; extra == 'torch'
Description-Content-Type: text/markdown

# sparrow-python
[![image](https://img.shields.io/badge/Pypi-0.3.21-green.svg)](https://pypi.org/project/sparrow-python)
[![image](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/)
[![image](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![image](https://img.shields.io/badge/author-kunyuan-orange.svg?style=flat-square&logo=appveyor)](https://github.com/beidongjiedeguang)


-------------------------
## TODO
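- [ ] `messages_preprocess` (`from mod_base.cv.image.image_processor import messages_preprocess`): add an option that controls whether network URLs are replaced with base64, and add support for splitting video into frames.

Recognize the scrolling screenshot at the link below:  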
https://sjh.baidu.com/site/dzfmws.cn/da721a31-476d-42ed-aad1-81c2dc3a66a3 


vLLM asynchronous inference example:

```python
import asyncio
import uuid
from typing import List

import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.sampling_params import SamplingParams

# Define request data model
class RequestData(BaseModel):
    prompts: List[str]
    max_tokens: int = 2048
    temperature: float = 0.7

# Initialize FastAPI app
app = FastAPI()

# Initialize AsyncLLMEngine
engine_args = AsyncEngineArgs(
    model="your-model-name",  # Replace with your model name
    dtype="bfloat16",
    gpu_memory_utilization=0.8,
    max_model_len=4096,
    trust_remote_code=True,
)
llm_engine = AsyncLLMEngine.from_engine_args(engine_args)

async def generate_one(prompt: str, sampling_params: SamplingParams) -> str:
    # Every request submitted to the engine needs its own unique ID
    request_id = str(uuid.uuid4())
    final_output = None
    # The engine streams partial results; keep only the last (complete) one
    async for request_output in llm_engine.generate(prompt, sampling_params, request_id):
        final_output = request_output
    if final_output is None:
        raise RuntimeError("the engine returned no output")
    return final_output.outputs[0].text

# Define the inference endpoint
@app.post("/predict")
async def generate_text(data: RequestData):
    sampling_params = SamplingParams(
        max_tokens=data.max_tokens,
        temperature=data.temperature,
    )
    # generate() handles one prompt per call, so submit the prompts concurrently
    text_outputs = await asyncio.gather(
        *(generate_one(prompt, sampling_params) for prompt in data.prompts)
    )
    return {"responses": text_outputs}

# Run the server
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

```

## Scripts to be added



## Install

```bash
pip install sparrow-python
# Or with the dev extras
pip install "sparrow-python[dev]"
# Or an editable install from a local checkout
pip install -e .
# Or an editable install with the dev extras
pip install -e ".[dev]"
```

## Usage

### Multiprocessing SyncManager

Start the server first:

```bash
$ spr start-server
```

The default port is `50001`.

(Process 1) producer:

```python
from sparrow.multiprocess.client import Client

client = Client(port=50001)
client.update_dict({'a': 1, 'b': 2})
```

(Process 2) consumer:

```python
from sparrow.multiprocess.client import Client

client = Client(port=50001)
print(client.get_dict_data())

# {'a': 1, 'b': 2}
```
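
For readers unfamiliar with the mechanism, the shared-dict pattern can be sketched with the standard library's `multiprocessing.managers` alone. This is not sparrow's implementation: `StoreManager`, `start_server`, `connect`, and the `authkey` value are hypothetical names chosen for illustration, and the server runs in a background thread here only to keep the sketch in one file.

```python
import threading
from multiprocessing.managers import BaseManager, DictProxy

# The single dict that the server process owns (illustrative only).
_store = {}


def _get_store():
    return _store


class StoreManager(BaseManager):
    """Hypothetical manager that exposes one shared dict to clients."""


# Clients receive a DictProxy, so dict methods are forwarded to the server.
StoreManager.register("get_store", callable=_get_store, proxytype=DictProxy)


def start_server(port=0, authkey=b"sparrow-demo"):
    """Serve the dict on localhost; port 0 lets the OS pick a free port."""
    manager = StoreManager(address=("127.0.0.1", port), authkey=authkey)
    server = manager.get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address


def connect(address, authkey=b"sparrow-demo"):
    """Connect to a running server and return a proxy to its dict."""
    client = StoreManager(address=address, authkey=authkey)
    client.connect()
    return client.get_store()


if __name__ == "__main__":
    address = start_server()
    producer = connect(address)
    producer.update({"a": 1, "b": 2})
    consumer = connect(address)
    print(consumer.copy())
```

In a real deployment the server would run in its own process and the producer and consumer would each call `connect` from theirs, which is the role `spr start-server` and `Client` play above.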

### Common tools

- **Kill process by port**

```bash
$ spr kill {port}
```

- **pack & unpack**  
  Supported archive formats: "zip", "tar", "gztar", "bztar", and "xztar".

```bash
$ spr pack pack_dir
```

```bash
$ spr unpack filename extract_dir
```

- **Scaffold**

```bash
$ spr create awesome-project
```
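
The archive format names listed for pack and unpack are the same ones the standard library's `shutil` accepts. Whether `spr` uses `shutil` internally is an assumption; the sketch below only shows the equivalent operation in plain Python, and the `pack` and `unpack` helper names are hypothetical.

```python
import shutil
from pathlib import Path


def pack(src_dir, fmt="zip"):
    """Archive src_dir next to itself and return the archive path."""
    src = Path(src_dir).resolve()
    # base_name has no extension; shutil appends the one matching fmt.
    return shutil.make_archive(
        str(src), fmt, root_dir=str(src.parent), base_dir=src.name
    )


def unpack(filename, extract_dir):
    """Extract an archive; the format is inferred from the file extension."""
    shutil.unpack_archive(str(filename), str(extract_dir))
```

For example, `pack("pack_dir", "gztar")` would write `pack_dir.tar.gz` beside the directory, and `unpack("pack_dir.tar.gz", "extract_dir")` would restore it under `extract_dir/pack_dir`.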

### Some useful functions

> `sparrow.relp`  
> Relative path, which is used to read or save files more easily.

> `sparrow.performance.MeasureTime`  
> Measures elapsed time (including GPU time)

> `sparrow.performance.get_process_memory`  
> Get the memory size occupied by the process

> `sparrow.performance.get_virtual_memory`  
> Get system virtual memory information

> `sparrow.add_env_path`  
> Add a path to the Python environment (a relative file path may be used)
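
The signatures of these helpers are not documented here, so the following is only a generic sketch of what a timing helper in the spirit of `MeasureTime` might look like, written with the standard library. The `Timer` class and its `elapsed` attribute are hypothetical and are not sparrow's API.

```python
import time


class Timer:
    """Hypothetical context manager that records wall-clock time."""

    def __enter__(self):
        self._start = time.perf_counter()
        self.elapsed = None
        return self

    def __exit__(self, exc_type, exc, tb):
        # perf_counter is monotonic, so the difference is never negative.
        self.elapsed = time.perf_counter() - self._start
        return False  # do not swallow exceptions raised in the block


if __name__ == "__main__":
    with Timer() as t:
        sum(range(1_000_000))
    print(f"took {t.elapsed:.4f} s")
```

Timing GPU work generally also requires waiting for the device to finish before reading the clock, since GPU kernels are launched asynchronously; this sketch does not do that.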

### Safe logger in `multiprocessing`

```python
from sparrow.log import Logger
import numpy as np

logger = Logger(name='train-log', log_dir='./logs')
logger.info("hello", "numpy:", np.arange(10))

# Requesting the same name returns the same logger instance
logger2 = Logger.get_logger('train-log')
print(id(logger2) == id(logger))
# True
```
