Metadata-Version: 2.1
Name: BanglaSpeech2Text
Version: 0.0.8
Summary: An open-source offline speech-to-text package for Bangla language. Fine-tuned on the latest whisper speech to text model for optimal performance.
Home-page: https://github.com/shhossain/BanglaSpeech2Text
Author: Sifat (shhossain)
Author-email: <hossain@gmail.com>
Project-URL: Documentation, https://github.com/shhossain/BanglaSpeech2Text
Project-URL: Source, https://github.com/shhossain/BanglaSpeech2Text
Project-URL: Bug Tracker, https://github.com/shhossain/BanglaSpeech2Text/issues
Keywords: python,speech to text,voice to text,bangla speech to text,bangla speech recognition,whisper model,bangla asr model,offline speech to text,offline bangla speech to text,offline bangla voice recognition
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Utilities
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: certifi (==2022.12.7)
Requires-Dist: charset-normalizer (==3.0.1)
Requires-Dist: colorama (==0.4.6)
Requires-Dist: elevate (==0.1.3)
Requires-Dist: filelock (==3.9.0)
Requires-Dist: gitdb (==4.0.10)
Requires-Dist: GitPython (==3.1.30)
Requires-Dist: huggingface-hub (==0.11.1)
Requires-Dist: idna (==3.4)
Requires-Dist: numpy (==1.24.1)
Requires-Dist: packaging (==23.0)
Requires-Dist: pySmartDL (==1.3.4)
Requires-Dist: PyYAML (==6.0)
Requires-Dist: regex (==2022.10.31)
Requires-Dist: requests (==2.28.2)
Requires-Dist: smmap (==5.0.0)
Requires-Dist: SpeechRecognition (==3.9.0)
Requires-Dist: tokenizers (==0.13.2)
Requires-Dist: torch (==1.13.1)
Requires-Dist: tqdm (==4.64.1)
Requires-Dist: transformers (==4.25.1)
Requires-Dist: typing-extensions (==4.4.0)
Requires-Dist: urllib3 (==1.26.14)

# Bangla Speech to Text
BanglaSpeech2Text is an open-source, offline speech-to-text package for the Bangla language, built on fine-tuned Whisper speech-to-text models. Transcribe speech, convert voice to text, and perform speech recognition in Python with ease, even without an internet connection.

## Installation
```bash
pip install banglaspeech2text
```

## Models
| Model | Size | Best WER |
| --- | --- | --- |
| 'tiny' | 100-200 MB | N/A |
| 'base' | 200-300 MB | 46 |
| 'small'| 2-3 GB     | 18 |

__NOTE__: Bigger models have better accuracy but slower inference speed. A lower WER (word error rate) is better. You can view the models [here](https://github.com/shhossain/whisper_bangla_models).
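
To make the WER column concrete, here is a minimal sketch of how a word error rate is usually computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative only and are not part of this package. The sketch also assumes the table values are percentages.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four reference words.
wer = word_error_rate("আমি বাংলায় গান গাই", "আমি বাংলা গান গাই")
print(f"{wer * 100:.1f}%")  # prints 25.0%
```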


## Pre-requisites
- Python 3.6+
- Git
- Git LFS
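
If you want to verify these prerequisites from Python before installing, a small check along the following lines may help. It is only a sketch using the standard library. The helper name `missing_prerequisites` is made up for this example, and it assumes Git LFS is reachable on `PATH` as the `git-lfs` executable.

```python
import shutil
import sys


def missing_prerequisites(min_python=(3, 6), tools=("git", "git-lfs")):
    """Return the names of prerequisites that could not be found."""
    missing = []
    if sys.version_info < min_python:
        missing.append("Python {}.{}+".format(*min_python))
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(tool) is None:
            missing.append(tool)
    return missing


problems = missing_prerequisites()
if problems:
    print("Missing prerequisites:", ", ".join(problems))
else:
    print("All prerequisites found.")
```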

## Download Git
### Windows
- Download Git from [here](https://git-scm.com/download/win)
- Download Git LFS from [here](https://git-lfs.github.com/)

__Note__: Make sure the Git LFS option is checked during the Git installation. If it was not, you can install Git LFS separately from [here](https://git-lfs.github.com/).

### Linux
- [Git](https://git-scm.com/download/linux)
- Git LFS

Ubuntu 16.04:
```bash
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
```

Ubuntu 18.04 and above:
```bash
sudo apt-get install git-lfs
```

### Mac
- [Git](https://git-scm.com/download/mac)
- Git LFS
```bash
brew install git-lfs
```

## Download Git with banglaspeech2text
```python
from banglaspeech2text.utils.install_packages import install_git_windows, install_git_linux

# On Windows
install_git_windows()

# On Linux
install_git_linux()
```
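
The snippet above calls both helpers one after the other, but in practice you would call only the one that matches your operating system. The following sketch picks it with `platform.system()`. The wrapper names `installer_name_for` and `install_git_for_current_platform` are hypothetical and not part of the package. Only the two helpers shown above are assumed to exist, so other platforms such as macOS raise an error here.

```python
import platform


def installer_name_for(system):
    """Map a platform.system() value to the matching helper name, or None."""
    return {
        "Windows": "install_git_windows",
        "Linux": "install_git_linux",
    }.get(system)


def install_git_for_current_platform():
    """Run the helper for this machine; raise if no helper is listed above."""
    name = installer_name_for(platform.system())
    if name is None:
        raise RuntimeError(
            "No bundled Git installer for this platform; install Git manually."
        )
    # Imported lazily so the mapping above can be used without the package.
    from banglaspeech2text.utils import install_packages
    getattr(install_packages, name)()
```

Call `install_git_for_current_platform()` once, before downloading a model.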


## Usage

### Download a model
```python
from banglaspeech2text import Model, available_models

# Download a model
models = available_models()
print(models) # see the available models from different people and in different sizes

model = models[0] # select a model
model.download() # download the model
```
### Use with file
```python
from banglaspeech2text import Model, available_models

# Load a model
models = available_models()
model = models[0] # select a model
model = Model(model) # load the model
model.load()

# Use with file
file_name = 'test.wav'
output = model.recognize(file_name)

print(output) # output will be a dict containing text
print(output['text'])
```
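
To transcribe several recordings in one go, you can loop over the files and reuse the same loaded model. This is a sketch only. `transcribe_folder` is a hypothetical helper, not part of the package, and it assumes that `model.recognize` accepts a file path and returns a dict with a `'text'` key, as in the example above.

```python
from pathlib import Path


def transcribe_folder(model, folder, pattern="*.wav"):
    """Return {file name: transcribed text} for every matching file in folder."""
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        output = model.recognize(str(path))  # same call as in the example above
        results[path.name] = output["text"]
    return results


# Usage, with `model` loaded as shown above:
# for name, text in transcribe_folder(model, "recordings").items():
#     print(name, text)
```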

### Use with SpeechRecognition
```python
import speech_recognition as sr
from banglaspeech2text import Model, available_models

# Load a model
models = available_models()
model = models[0] # select a model
model = Model(model) # load the model
model.load()


r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
    output = model.recognize(audio)

print(output) # output will be a dict containing text
print(output['text'])
```

### Use GPU
```python
import speech_recognition as sr
from banglaspeech2text import Model, available_models

# Load a model
models = available_models()
model = models[0] # select a model
model = Model(model, device="gpu") # load the model on the GPU
model.load()


r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
    output = model.recognize(audio)

print(output) # output will be a dict containing text
print(output['text'])
```
__NOTE__: This package uses PyTorch as its backend, so you can use any device supported by torch. For more information, see [here](https://pytorch.org/docs/stable/tensor_attributes.html#torch.torch.device). You need to set up PyTorch with GPU support first by following the instructions [here](https://pytorch.org/get-started/locally/).
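
If the same script has to run on machines with and without a GPU, you can choose the device at runtime. The sketch below uses `torch.cuda.is_available()` and falls back to the CPU. `pick_device` is a hypothetical helper. Passing `"cpu"` as the device is an assumption based on the note above, and `"gpu"` is simply the value used in the example above.

```python
def pick_device(gpu_name="gpu"):
    """Return gpu_name when PyTorch reports a usable CUDA device, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query, so fall back to the CPU.
        return "cpu"
    return gpu_name if torch.cuda.is_available() else "cpu"


# Usage (assumes the package is installed):
# from banglaspeech2text import Model
# model = Model('base', device=pick_device())
# model.load()
```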


### Some Methods
```python
from banglaspeech2text import Model, available_models

models = available_models()
print(models[0]) # get the first model
print(models['base']) # get the base models
print(models['whisper_base_bn_sifat']) # get a model by name

# set the download path (the default is the home directory)
model = Model(models[0], download_path=r"F:\Code\Python\BanglaSpeech2Text\models")
model.load()

# load a model directly by size
model = Model('base')
model.load()
```


