Metadata-Version: 2.4
Name: kalakan-tts
Version: 1.0.1
Summary: A fully custom, offline Text-to-Speech system for the Twi language
Home-page: https://github.com/kalakan-ai/kalakan-tts
Author: Kalakan Team
Author-email: Kalakan TTS Team <calculus069@gmail.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/Odeneho-Calculus/kanakan-TTS
Project-URL: Documentation, https://github.com/Odeneho-Calculus/kanakan-TTS/tree/main/docs
Project-URL: Repository, https://github.com/Odeneho-Calculus/kanakan-TTS
Project-URL: Issues, https://github.com/Odeneho-Calculus/kanakan-TTS/issues
Project-URL: Changelog, https://github.com/Odeneho-Calculus/kanakan-TTS/releases
Keywords: tts,text-to-speech,twi,akan,pytorch,offline
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0.0
Requires-Dist: torchaudio>=2.0.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: librosa>=0.10.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: tqdm>=4.64.0
Requires-Dist: matplotlib>=3.5.0
Requires-Dist: tensorboard>=2.10.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Provides-Extra: api
Requires-Dist: fastapi>=0.100.0; extra == "api"
Requires-Dist: uvicorn>=0.23.0; extra == "api"
Requires-Dist: grpcio>=1.56.0; extra == "api"
Requires-Dist: websockets>=11.0.0; extra == "api"
Provides-Extra: training
Requires-Dist: wandb>=0.15.0; extra == "training"
Requires-Dist: mlflow>=2.5.0; extra == "training"
Requires-Dist: optuna>=3.3.0; extra == "training"
Requires-Dist: torchmetrics>=1.0.0; extra == "training"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# Kalakan TTS

A fully custom, offline Text-to-Speech system for the Twi language.

## Features

- Specialized for the Twi language, with proper handling of special characters (ɛ, ɔ, Ɔ, tonal markers)
- Complete text processing pipeline with Twi-specific tokenization and phoneme mapping
- High-quality acoustic models:
  - Tacotron2: Attention-based sequence-to-sequence model
  - FastSpeech2: Non-autoregressive model with explicit duration modeling
  - Transformer-TTS: Transformer-based sequence-to-sequence model
- Multiple vocoders:
  - Griffin-Lim: Simple phase reconstruction algorithm
  - HiFi-GAN: High-fidelity GAN-based vocoder
  - MelGAN: Fast GAN-based vocoder
  - WaveGlow: Flow-based generative model
- Production-ready API with REST and gRPC interfaces
- Comprehensive training infrastructure with experiment tracking and data augmentation
- Advanced dataset preparation with metadata generation, phoneme conversion, and quality control
- Optimized for both cloud and edge deployment

## Installation

### Basic Installation

```bash
pip install kalakan-tts
```

### Development Installation

```bash
git clone https://github.com/kalakan-ai/kalakan-tts.git
cd kalakan-tts
pip install -e ".[dev,api,training]"
```

## Quick Start

### Text-to-Speech Synthesis

```python
from kalakan.synthesis.synthesizer import Synthesizer

# Initialize the synthesizer with specific models
synthesizer = Synthesizer(
    acoustic_model="/path/to/acoustic_model.pt",  # or use a model name like "tacotron2", "fastspeech2", "transformer_tts"
    vocoder="/path/to/vocoder.pt",  # or use a model name like "griffin_lim", "hifigan", "melgan", "waveglow"
    device="cuda:0",  # or "cpu"
)

# Generate audio from Twi text
audio = synthesizer.synthesize(
    text="Akwaaba! Wo ho te sɛn?",
    normalize=True,
    clean=True,
    speed=1.0,  # Control speech speed (for FastSpeech2)
    pitch=1.0,  # Control pitch (for FastSpeech2)
    energy=1.0,  # Control energy/volume (for FastSpeech2)
)

# Save the audio to a file
synthesizer.save_audio(audio, "output.wav")
```
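
To synthesize several sentences in one run, the two calls shown above can be wrapped in a small loop. The sketch below is illustrative only: `synthesize_lines` and the `utt_0001.wav` naming scheme are hypothetical and not part of the package, and the function assumes nothing about the synthesizer beyond the `synthesize(text=..., normalize=..., clean=...)` and `save_audio(audio, path)` calls from the example above.

```python
from pathlib import Path


def synthesize_lines(synthesizer, lines, out_dir):
    """Synthesize each non-empty line to its own WAV file and return the paths.

    `synthesizer` is any object exposing the `synthesize` and `save_audio`
    methods used in the Quick Start example. The helper itself is hypothetical.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Skip blank lines so they do not consume an utterance number.
    texts = [line.strip() for line in lines if line.strip()]

    paths = []
    for index, text in enumerate(texts, start=1):
        audio = synthesizer.synthesize(text=text, normalize=True, clean=True)
        path = out_dir / f"utt_{index:04d}.wav"  # assumed naming scheme
        synthesizer.save_audio(audio, str(path))
        paths.append(path)
    return paths
```

A call such as `synthesize_lines(synthesizer, open("sentences.txt", encoding="utf-8"), "wavs")` would then write one file per line, where `sentences.txt` is a placeholder file name.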

### Command Line Interface

```bash
# Using the demo script
python demo.py --text "Akwaaba! Wo ho te sɛn?" --output output.wav --acoustic_model /path/to/acoustic_model.pt --vocoder /path/to/vocoder.pt

# Start the REST API server
python -m kalakan.api.server --host 0.0.0.0 --port 8000 --acoustic_model /path/to/acoustic_model.pt --vocoder /path/to/vocoder.pt

# Start the gRPC API server
python -m kalakan.api.grpc_api --host 0.0.0.0 --port 50051 --acoustic_model /path/to/acoustic_model.pt --vocoder /path/to/vocoder.pt
```

### API Clients

```bash
# REST API client
python rest_client.py --text "Akwaaba! Wo ho te sɛn?" --output output.wav --host localhost --port 8000

# gRPC API client
python grpc_client.py --text "Akwaaba! Wo ho te sɛn?" --output output.wav --host localhost --port 50051
```

## Dataset Preparation

Kalakan TTS includes a metadata generation tool, `kalakan gen-metadata`, for preparing TTS datasets.

### Basic Metadata Generation

```bash
# Generate basic metadata for a dataset
kalakan gen-metadata --input-dir /path/to/dataset --verbose
```

### Advanced Features

```bash
# Generate metadata with phonemes, dataset splitting, and quality control
kalakan gen-metadata \
    --input-dir /path/to/dataset \
    --generate-phonemes \
    --split-dataset \
    --val-ratio 0.2 \
    --test-ratio 0.1 \
    --output-format both \
    --include-stats \
    --check-duplicates \
    --remove-duplicates \
    --normalize-text \
    --clean-text \
    --speaker-id "speaker_001" \
    --verbose
```
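
To make the `--val-ratio` and `--test-ratio` values concrete, here is a minimal sketch of a shuffled train/validation/test split. It is an assumption about how such a split might be done, not the tool's actual implementation: `split_dataset` and its `seed` parameter are hypothetical names introduced for illustration.

```python
import random


def split_dataset(items, val_ratio=0.2, test_ratio=0.1, seed=0):
    """Shuffle `items` and split them into (train, val, test) lists.

    Hypothetical helper: the real tool may round, stratify, or order
    the splits differently.
    """
    if val_ratio < 0 or test_ratio < 0 or val_ratio + test_ratio >= 1:
        raise ValueError("ratios must be non-negative and sum to less than 1")

    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_val = int(len(shuffled) * val_ratio)
    n_test = int(len(shuffled) * test_ratio)

    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]  # whatever remains is training data
    return train, val, test
```

With 100 utterances and the ratios from the command above, this yields 70 training, 20 validation, and 10 test items.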

### Key Features

- **Phoneme Generation**: Automatic G2P conversion for Twi text
- **Text Normalization**: Cleaning and standardization of text content
- **Dataset Splitting**: Automatic train/validation/test splits
- **Quality Control**: Duration filtering, duplicate detection, and validation
- **Multiple Formats**: CSV and JSON output support
- **Comprehensive Statistics**: Detailed dataset analytics
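
As an illustration of the duplicate-detection idea, the sketch below flags entries whose transcripts are identical after Unicode normalization, case folding, and whitespace collapsing. How the tool itself defines a duplicate is not stated here, so this matching rule is an assumption, and `find_duplicates` is a hypothetical helper.

```python
import unicodedata


def find_duplicates(entries):
    """Split (audio_id, text) pairs into kept entries and duplicates.

    Returns (kept, duplicates), where each duplicate is reported as
    (duplicate_id, first_seen_id). The matching rule is an assumption.
    """
    first_seen = {}
    kept, duplicates = [], []
    for audio_id, text in entries:
        # NFC keeps composed and decomposed spellings of a character equal.
        key = " ".join(unicodedata.normalize("NFC", text).casefold().split())
        if key in first_seen:
            duplicates.append((audio_id, first_seen[key]))
        else:
            first_seen[key] = audio_id
            kept.append((audio_id, text))
    return kept, duplicates
```

Keeping the first occurrence and reporting the rest makes the result independent of whether the caller then removes the duplicates or only logs them.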

For detailed documentation, see [docs/metadata_generation_guide.md](docs/metadata_generation_guide.md).

## Text Normalization

Kalakan TTS includes a text normalization tool, `kalakan norm`, for preparing Twi text.

### Basic Text Normalization

```bash
# Normalize Twi text directly
kalakan norm --text "Dr. Kwame na Prof. Ama bɛba ha 25 mu."
```

### File Processing

```bash
# Normalize text from files with various options
kalakan norm \
    --file input.txt \
    --output normalized.txt \
    --format json \
    --show-diff \
    --verbose
```

### Key Features

- **Abbreviation Expansion**: "Dr." → "doctor", "Prof." → "professor"
- **Number Conversion**: "25" → text representation
- **Special Character Normalization**: Proper handling of Twi characters (ɛ, ɔ)
- **Multiple Output Formats**: Text, JSON, CSV
- **Batch Processing**: Process multiple files at once
- **Interactive File Selection**: GUI file picker
- **Difference Visualization**: See what changed during normalization
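
The abbreviation and character handling listed above can be pictured with a minimal sketch. It is not the package's normalizer: `normalize_text` and the two-entry `ABBREVIATIONS` table are hypothetical, built only from the "Dr." and "Prof." examples given here, and number conversion is deliberately left out.

```python
import re
import unicodedata

# Only the two expansions named in this README; the real table is assumed to be larger.
ABBREVIATIONS = {"Dr.": "doctor", "Prof.": "professor"}

# Match an abbreviation only when it is not glued to a preceding word character.
_ABBREV_RE = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")"
)


def normalize_text(text):
    """Apply NFC normalization, expand abbreviations, and collapse whitespace."""
    # NFC gives characters such as ɛ and ɔ, and any tone marks, a single canonical form.
    text = unicodedata.normalize("NFC", text)
    text = _ABBREV_RE.sub(lambda m: ABBREVIATIONS[m.group(0)], text)
    return " ".join(text.split())
```

Applied to the sentence from the basic example, this returns `doctor Kwame na professor Ama bɛba ha 25 mu.`; the digits stay as they are because this sketch does not attempt number conversion.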

## Documentation

For detailed documentation, visit [docs/](docs/).

## License

Apache License 2.0
