Metadata-Version: 2.1
Name: llama-cpp-cffi
Version: 0.0.4
Summary: Python binding for llama.cpp using cffi
Home-page: https://github.com/mtasic85/llama-cpp-cffi
License: MIT
Author: Marko Tasic
Author-email: mtasic85@gmail.com
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: attrs (>=23.2.0,<24.0.0)
Requires-Dist: cffi (>=1.16.0,<2.0.0)
Requires-Dist: huggingface-hub (>=0.23.4,<0.24.0)
Requires-Dist: jinja2 (>=3.1.4,<4.0.0)
Requires-Dist: protobuf (>=5.27.2,<6.0.0)
Requires-Dist: psutil (>=6.0.0,<7.0.0)
Requires-Dist: sentencepiece (>=0.2.0,<0.3.0)
Requires-Dist: setuptools (>=70.2.0,<71.0.0)
Requires-Dist: transformers (>=4.42.4,<5.0.0)
Project-URL: Repository, https://github.com/mtasic85/llama-cpp-cffi
Description-Content-Type: text/markdown

# llama-cpp-cffi

[![Downloads](https://img.shields.io/pypi/dm/llama-cpp-cffi)](https://pypistats.org/packages/llama-cpp-cffi)
[![Supported Versions](https://img.shields.io/pypi/pyversions/llama-cpp-cffi)](https://pypi.org/project/llama-cpp-cffi)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)

**Python** binding for [llama.cpp](https://github.com/ggerganov/llama.cpp) using **cffi** and **ctypes**. Supports **CPU** and **CUDA 12.5** execution.

## Install

```bash
pip install llama-cpp-cffi
```

## Example

```python
# Import one backend; the alternatives are left commented out.
from llama.llama_cli_cffi_cpu import llama_generate, Model, Options
# from llama.llama_cli_cffi_cuda_12_5 import llama_generate, Model, Options
# from llama.llama_cli_ctypes_cpu import llama_generate, Model, Options
# from llama.llama_cli_ctypes_cuda_12_5 import llama_generate, Model, Options

from llama.formatter import get_config

model = Model(
    'TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    'TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    'tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,  # in llama.cpp, -2 means "generate until the context is filled"
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
```
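
If you need the full reply as a single string instead of printing it as it streams, the chunks can be joined. The sketch below assumes `llama_generate` yields plain `str` chunks, as the `print` loop above suggests. `collect_chunks` and `fake_generate` are hypothetical names used only for illustration, and `fake_generate` stands in for `llama_generate(options)` so the snippet runs without a model.

```python
from typing import Iterable, Iterator


def collect_chunks(chunks: Iterable[str]) -> str:
    """Join streamed text chunks into one string."""
    return ''.join(chunks)


def fake_generate() -> Iterator[str]:
    # Hypothetical stand-in for llama_generate(options); yields text pieces.
    yield from ['1 + 2 ', 'evaluates ', 'to 3.']


# With the real binding this would be: collect_chunks(llama_generate(options))
text = collect_chunks(fake_generate())
print(text)
```

Joining once at the end avoids repeated string concatenation inside the loop, at the cost of not showing any output until generation finishes.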

## Demos

```bash
#
# run demos
#
python -B examples/demo_cffi_cpu.py
python -B examples/demo_cffi_cuda_12_5.py

python -B examples/demo_ctypes_cpu.py
python -B examples/demo_ctypes_cuda_12_5.py

# python -m http.server -d examples/demo_pyonide -b "0.0.0.0" 5000
```

