Metadata-Version: 2.1
Name: OmniSenseVoice
Version: 0.1.2
Summary: OmniSenseVoice
Home-page: https://github.com/lifeiteng/OmniSenseVoice
Download-URL: https://github.com/lifeiteng/OmniSenseVoice/releases
Author: lifeiteng0422@gmail.com
Classifier: Development Status :: 5 - Production/Stable
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE

# Omni SenseVoice 🚀

## The Ultimate Speech Recognition Solution

Built on [SenseVoice](https://github.com/FunAudioLLM/SenseVoice), Omni SenseVoice is optimized for lightning-fast inference and precise timestamps—giving you a smarter, faster way to handle audio transcription!

## Install

```
pip3 install OmniSenseVoice
```

## Usage

```
omnisense transcribe [OPTIONS] AUDIO_PATH
```

Key Options:

- `--language`: Automatically detect the language or specify one (`auto`, `zh`, `en`, `yue`, `ja`, `ko`).
- `--textnorm`: Whether to apply inverse text normalization (`withitn` for inverse-normalized text, `woitn` for raw output).
- `--device-id`: Run on a specific GPU (default: `-1` for CPU).
- `--quantize`: Use a quantized model for faster processing.
- `--help`: Display detailed help information.
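Putting the options together, a typical invocation might look like this (the audio path is a placeholder for your own file):

```shell
# Transcribe an English recording on GPU 0, applying inverse text normalization.
omnisense transcribe --language en --textnorm withitn --device-id 0 path/to/audio.wav
```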

## Benchmark

`omnisense benchmark -s -d --num-workers 2 --device-id 0 --batch-size 10 --textnorm woitn --language en benchmark/data/manifests/libritts/libritts_cuts_dev-clean.jsonl`

| Backend          | test set        | GPU           | WER ⬇️ | RTF ⬇️ | Speed Up 🔥 |
| ---------------- | --------------- | ------------- | ------ | ------ | ----------- |
| onnx             | dev-clean[:100] | NVIDIA L4 GPU | 4.47%  | 0.1200 | 1x          |
| torch            | dev-clean[:100] | NVIDIA L4 GPU | 5.02%  | 0.0022 | 50x         |
| onnx `fix cudnn` | dev-clean[all]  | NVIDIA L4 GPU | 5.60%  | 0.0027 | 50x         |
| torch            | dev-clean[all]  | NVIDIA L4 GPU | 6.39%  | 0.0019 | 50x         |

- `fix cudnn`: `cudnn_conv_algo_search: DEFAULT`
- With Omni SenseVoice, you get up to ~50x faster processing than the ONNX baseline without sacrificing accuracy.
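The Speed Up column follows directly from the RTF values (RTF here is compute time divided by audio duration, so lower is faster); a quick sanity check in Python:

```python
# RTF values from the benchmark table (dev-clean[:100] on an NVIDIA L4).
onnx_rtf = 0.1200   # baseline
torch_rtf = 0.0022  # optimized torch backend

# Speed-up relative to the baseline is the ratio of the two RTFs.
speedup = onnx_rtf / torch_rtf
print(f"{speedup:.1f}x")  # ~54.5x, reported conservatively as 50x
```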

```
# LibriTTS
DIR=benchmark/data
lhotse download libritts -p dev-clean $DIR/LibriTTS
lhotse prepare libritts -p dev-clean $DIR/LibriTTS/LibriTTS $DIR/manifests/libritts

lhotse cut simple --force-eager -r $DIR/manifests/libritts/libritts_recordings_dev-clean.jsonl.gz \
    -s $DIR/manifests/libritts/libritts_supervisions_dev-clean.jsonl.gz \
    $DIR/manifests/libritts/libritts_cuts_dev-clean.jsonl

omnisense benchmark -s -d --num-workers 2 --device-id 0 --batch-size 10 \
    --textnorm woitn --language en $DIR/manifests/libritts/libritts_cuts_dev-clean.jsonl

omnisense benchmark -s --num-workers 4 --device-id 0 --batch-size 16 \
    --textnorm woitn --language en $DIR/manifests/libritts/libritts_cuts_dev-clean.jsonl
```
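The cuts file produced above is a JSONL manifest: one JSON object per line describing a cut. A minimal reader sketch (the sample line below is synthetic; real cuts emitted by lhotse carry more fields, such as `supervisions` and `recording`):

```python
import json

# Parse an uncompressed cuts manifest: one JSON object per line.
sample_line = '{"id": "dev-clean-0001", "start": 0.0, "duration": 5.76}'

def read_cuts(lines):
    """Parse JSONL lines into dicts, skipping blank lines."""
    return [json.loads(ln) for ln in lines if ln.strip()]

cuts = read_cuts([sample_line])
total = sum(c["duration"] for c in cuts)
print(len(cuts), f"{total:.2f}")  # 1 5.76
```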

## Contributing 🙌

#### Step 1: Code Formatting

Set up pre-commit hooks:

```
pip install pre-commit==3.6.0
pre-commit install
```

#### Step 2: Pull Request

Submit your awesome improvements through a PR. 😊
