Metadata-Version: 2.2
Name: whisperchain
Version: 0.1.3
Summary: Voice control using Whisper.cpp with LangChain cleanup
Author-email: Chris Choy <chrischoy@ai.stanford.edu>
License: MIT
Project-URL: Homepage, https://github.com/chrischoy/whisperchain
Project-URL: Bug Tracker, https://github.com/chrischoy/whisperchain/issues
Keywords: whisper,langchain,voice-control,speech-to-text
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pynput>=1.7.7
Requires-Dist: pyperclip
Requires-Dist: openai>=1.0.0
Requires-Dist: pywhispercpp>=1.3.0
Requires-Dist: fastapi>=0.100.0
Requires-Dist: uvicorn>=0.22.0
Requires-Dist: pyaudio>=0.2.11
Requires-Dist: langchain>=0.1.0
Requires-Dist: langchain-openai>=0.1.0
Requires-Dist: websockets>=11.0.0
Requires-Dist: streamlit>=1.20.0
Provides-Extra: test
Requires-Dist: pytest>=7.0.0; extra == "test"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "test"
Requires-Dist: httpx>=0.24.0; extra == "test"
Provides-Extra: dev
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: isort>=5.0.0; extra == "dev"
Requires-Dist: build>=0.10.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"

# Whisper Chain

<p align="center">
  <img src="https://github.com/chrischoy/WhisperChain/raw/main/assets/logo.jpg" width="30%" alt="Whisper Chain Logo" />
</p>

## Overview

Typing is boring, so let's use your voice to speed up your workflow. This project combines:
- Real-time speech recognition using Whisper.cpp
- Transcription cleanup using LangChain
- Global hotkey support for voice control
- Automatic clipboard integration for the cleaned transcription

## Requirements

- Python 3.8+
- OpenAI API Key
- On macOS:
  - ffmpeg (for audio processing)
  - portaudio (for audio capture)

## Installation

1. Install system dependencies (macOS):
```bash
# Install ffmpeg and portaudio using Homebrew
brew install ffmpeg portaudio
```

2. Install the project:
```bash
pip install whisperchain
```

## Configuration

WhisperChain looks for configuration in the following locations, in order of precedence:
1. Environment variables
2. .env file in the current directory
3. ~/.whisperchain/.env file

On first run, if no configuration is found, you will be prompted to enter your OpenAI API key. The key will be saved in `~/.whisperchain/.env` for future use.

You can also manually set your OpenAI API key in any of these ways:
```bash
# Option 1: Environment variable
export OPENAI_API_KEY=your-api-key-here

# Option 2: Create .env file in current directory
echo "OPENAI_API_KEY=your-api-key-here" > .env

# Option 3: Create global config
mkdir -p ~/.whisperchain
echo "OPENAI_API_KEY=your-api-key-here" > ~/.whisperchain/.env
```
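The lookup order above can be sketched in plain Python. Note that `read_env_file` and `resolve_api_key` are illustrative helpers written for this sketch, not part of the package's actual API:

```python
import os
from pathlib import Path
from typing import Optional


def read_env_file(path: Path, key: str) -> Optional[str]:
    """Return the value for `key` from a simple KEY=VALUE .env file, if present."""
    if not path.is_file():
        return None
    for line in path.read_text().splitlines():
        line = line.strip()
        if line.startswith(f"{key}="):
            return line.split("=", 1)[1].strip().strip('"')
    return None


def resolve_api_key() -> Optional[str]:
    """Check the environment, then ./.env, then ~/.whisperchain/.env."""
    return (
        os.environ.get("OPENAI_API_KEY")
        or read_env_file(Path(".env"), "OPENAI_API_KEY")
        or read_env_file(Path.home() / ".whisperchain" / ".env", "OPENAI_API_KEY")
    )
```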

## Usage

1. Start the application:
```bash
# Run with default settings
whisperchain

# Run with custom configuration
whisperchain --config config.json

# Override specific settings
whisperchain --port 8080 --hotkey "<ctrl>+<alt>+t" --model "large" --debug
```

2. Use the global hotkey (`<ctrl>+<alt>+r` by default; `<ctrl>+<option>+r` on macOS):
   - Press and hold to start recording
   - Speak your text
   - Release to stop recording
   - The cleaned transcription is copied to your clipboard automatically
   - Press Ctrl+V (Cmd+V on macOS) to paste it
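The press-and-hold flow above can be sketched as a small state machine. The class and callback names below are illustrative only; the actual implementation listens for global key events via pynput:

```python
from typing import Callable, Set


class HoldToRecord:
    """Fires `on_start` once all hotkey keys are held, `on_stop` when any is released."""

    def __init__(
        self,
        hotkey: Set[str],
        on_start: Callable[[], None],
        on_stop: Callable[[], None],
    ):
        self.hotkey = hotkey
        self.on_start = on_start
        self.on_stop = on_stop
        self.pressed: Set[str] = set()
        self.recording = False

    def press(self, key: str) -> None:
        self.pressed.add(key)
        # Start recording only when every key in the combination is down.
        if not self.recording and self.hotkey <= self.pressed:
            self.recording = True
            self.on_start()

    def release(self, key: str) -> None:
        self.pressed.discard(key)
        # Releasing any key in the combination stops the recording.
        if self.recording and not self.hotkey <= self.pressed:
            self.recording = False
            self.on_stop()
```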

## Development

### Streamlit UI

```bash
streamlit run src/whisperchain/ui/streamlit_app.py
```

If the Streamlit UI gets into a bad state, you can kill whatever is listening on its default port (8501):

```bash
lsof -ti :8501 | xargs kill -9
```

### Running Tests

Install test dependencies:
```bash
pip install -e ".[test]"
```

Run tests:
```bash
pytest tests/
```

Run tests with microphone input:
```bash
# Run specific microphone test
TEST_WITH_MIC=1 pytest tests/test_stream_client.py -v -k test_stream_client_with_real_mic

# Run all tests including microphone test
TEST_WITH_MIC=1 pytest tests/
```
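A microphone-gated test like the one above can be written with a `pytest.mark.skipif` guard on the `TEST_WITH_MIC` variable. The test name matches the command above, but the body here is only a placeholder:

```python
import os

import pytest

# Skip hardware-dependent tests unless TEST_WITH_MIC=1 is set,
# mirroring the commands shown above.
requires_mic = pytest.mark.skipif(
    os.environ.get("TEST_WITH_MIC") != "1",
    reason="set TEST_WITH_MIC=1 to run tests that need a real microphone",
)


@requires_mic
def test_stream_client_with_real_mic():
    ...  # record a short clip and assert a transcription comes back
```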

### Building the project

```bash
python -m build
pip install dist/*.whl
```

### Publishing to PyPI

```bash
python -m build
twine upload --repository pypi dist/*
```

## License

MIT. See [LICENSE](LICENSE).

## Acknowledgments

- [Whisper.cpp](https://github.com/ggerganov/whisper.cpp)
- [pywhispercpp](https://github.com/absadiki/pywhispercpp.git)
- [LangChain](https://github.com/langchain-ai/langchain)


## Architecture

```mermaid
graph TB
    subgraph "Client Options"
        K[Key Listener]
        A[Audio Stream]
        C[Clipboard]
    end

    subgraph "Streamlit Web UI :8501"
        WebP[Prompt]
        WebH[History]
    end

    subgraph "FastAPI Server :8000"
        WS[WebSocket /stream]
        W[Whisper Model]
        LC[LangChain Processor]
        H[History]
    end

    K -->|"Hot Key"| A
    A -->|"Audio Stream"| WS
    WS --> W
    W --> LC
    WebP --> LC
    LC --> C
    LC --> H
    H --> WebH
```
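The server-side flow in the diagram (WebSocket audio in, Whisper transcription, LangChain cleanup, history out) can be sketched with injectable stand-ins for the model and the LLM chain. All names here are illustrative, not the package's actual classes:

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class TranscriptionPipeline:
    """Server-side flow from the diagram: audio -> Whisper -> LangChain -> history."""

    transcribe: Callable[[bytes], str]  # stand-in for the Whisper model
    clean: Callable[[str], str]         # stand-in for the LangChain processor
    history: List[str] = field(default_factory=list)

    def handle_stream(self, chunks: Iterable[bytes]) -> str:
        audio = b"".join(chunks)      # accumulate the WebSocket audio frames
        raw = self.transcribe(audio)  # raw Whisper transcription
        final = self.clean(raw)       # prompt-driven cleanup
        self.history.append(final)    # surfaced later in the Streamlit history view
        return final                  # the client copies this to the clipboard
```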
