Metadata-Version: 2.1
Name: audio-miner
Version: 0.0.6
Summary: A radio streaming and transcription application.
Author: Sebastian Milchsack
Author-email: info@milchsack.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
License-File: LICENSE

# README for audio_miner

![Test Status](https://github.com/smilchsack/audio_miner/actions/workflows/ci.yml/badge.svg)

## audio_miner

audio_miner is a Python application that records audio streams and transcribes them using the Whisper model. It allows users to capture radio broadcasts and convert them into text files for easy access and analysis.

### Features

- Record audio streams in segments of configurable length.
- Transcribe recorded audio using the Whisper model.
- Organize recordings and transcriptions in a structured directory.
- Optional speaker diarization using PyAnnote (requires a Hugging Face token).

### Installation

To install audio_miner with pip, clone the repository and run the following command from the repository root:

```bash
pip install .
```
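The package metadata declares `Requires-Python: >=3.10`. If `pip install .` fails, a minimal check of the interpreter version could look like this. It assumes the interpreter is invoked as `python3` (some systems name it `python`).

```bash
# Exit status 0 only if the interpreter meets the Requires-Python >=3.10 constraint.
if python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)'; then
  py_ok=yes
  echo "Python version OK: $(python3 --version)"
else
  # Also reached when python3 is not installed at all.
  py_ok=no
  echo "audio_miner needs Python 3.10 or newer" >&2
fi
```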

audio_miner also requires `ffmpeg` to be installed on your system. On Debian or Ubuntu, you can install it with:

```bash
sudo apt-get install ffmpeg
```
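`--ffmpeg-path` is only needed when the shell cannot find `ffmpeg` on its own. A quick sketch to see which binary would be launched, using the POSIX `command -v` builtin:

```bash
# Look up ffmpeg on PATH; the assignment's exit status is that of command -v.
if ffmpeg_bin="$(command -v ffmpeg)"; then
  echo "ffmpeg found at: $ffmpeg_bin"
  # First line of the version output, e.g. "ffmpeg version ..."
  "$ffmpeg_bin" -version | head -n 1
else
  echo "ffmpeg not on PATH; install it or pass --ffmpeg-path /path/to/ffmpeg" >&2
fi
```

If the lookup fails but ffmpeg is installed somewhere else, pass that absolute path to `--ffmpeg-path`.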

### Usage

After installation, you can run audio_miner from the command line. Use the following command format:

```bash
audio_miner --stream-url <STREAM_URL> --sender <SENDER_NAME> [--segment-time <SEGMENT_TIME>] [--base-dir <BASE_DIR>] [--poll-interval <POLL_INTERVAL>] [--whisper-model <WHISPER_MODEL>] [--quality <QUALITY>] [--start-time <START_TIME>] [--end-time <END_TIME>] [--token <HF_TOKEN>] [--record-only] [--transcribe-only] [--verbose] [--ffmpeg-path <FFMPEG_PATH>]
```

#### Parameters

- `--stream-url`: The URL of the audio stream to record (required).
- `--sender`: The name of the radio station (required).
- `--segment-time`: Length of each audio segment in seconds (default: 3600).
- `--base-dir`: Base directory for storing audio and transcription files (default: current directory).
- `--poll-interval`: Interval in seconds between recordings (default: 5).
- `--whisper-model`: The Whisper model to use for transcription (default: TURBO; options: TINY, BASE, SMALL, MEDIUM, LARGE, TURBO). **Warning: TURBO requires ~6 GB VRAM and LARGE requires ~10 GB VRAM. More info: [https://github.com/openai/whisper](https://github.com/openai/whisper)**
- `--quality`: Audio bitrate for re-encoding (e.g., 64k). If not specified, the stream is copied without re-encoding, keeping its original quality.
- `--start-time`: Start time for transcription in YYYYMMDD_HHMMSS format. Only relevant when using `--transcribe-only`.
- `--end-time`: End time for transcription in YYYYMMDD_HHMMSS format. Only relevant when using `--transcribe-only`.
- `--token`: Hugging Face token for PyAnnote speaker diarization model (optional). If provided, diarization will be performed.
- `--record-only`: Record audio without transcribing.
- `--transcribe-only`: Transcribe existing audio files without recording.
- `--verbose`: Enable detailed output.
- `--ffmpeg-path`: Path to the `ffmpeg` executable. Only needed if `ffmpeg` is not on your `PATH`, i.e. it cannot be started directly from the terminal.
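`--quality` and `--segment-time` together determine roughly how much disk space recordings use: at a constant bitrate, size in bytes ≈ bitrate (bit/s) × duration (s) ÷ 8. The worked estimate below is illustrative, not measured. It assumes `--quality 64k` is handed to ffmpeg as the audio bitrate (ffmpeg reads `k` as ×1000), 300-second segments, and no container overhead.

```bash
# Illustrative inputs, not defaults: --quality 64k and --segment-time 300.
bitrate_kbps=64
segment_seconds=300

# bytes = bits per second * seconds / 8 (constant bitrate, no container overhead)
segment_bytes=$(( bitrate_kbps * 1000 * segment_seconds / 8 ))
segments_per_day=$(( 24 * 3600 / segment_seconds ))
day_bytes=$(( segment_bytes * segments_per_day ))

echo "per segment: ${segment_bytes} bytes"
echo "segments per day: ${segments_per_day}"
echo "per day: ${day_bytes} bytes"
```

This works out to 2.4 MB per 5-minute segment and about 691 MB per day of continuous recording. With the default `--segment-time 3600`, each segment would be about 28.8 MB. Without `--quality`, substitute the stream's own bitrate.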

### Example

To record a stream in 300-second (5-minute) segments and transcribe them with the TURBO model, you can use:

```bash
audio_miner --stream-url 'https://liveradio.swr.de/sw282p3/swr1rp/' --sender 'swr1' --segment-time 300 --base-dir './output' --poll-interval 5 --whisper-model TURBO
```
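Recording and transcription can also be run separately: record with `--record-only`, then transcribe later with `--transcribe-only`, limited by `--start-time` and `--end-time`. The sketch below builds a window covering the last three hours for the station in the example above. The three-hour window is arbitrary. The sketch assumes GNU `date` (the `-d @EPOCH` form is not available in BSD/macOS `date`), and it assumes timestamps are in local time, which the parameter descriptions do not state. It passes `--stream-url` and `--sender` because both are listed as required.

```bash
# Compute "now" once so both ends of the window are consistent.
now="$(date +%s)"
end_time="$(date -d "@$now" +%Y%m%d_%H%M%S)"
start_time="$(date -d "@$((now - 3 * 3600))" +%Y%m%d_%H%M%S)"

# Collect the arguments in the positional parameters (POSIX-compatible).
set -- \
  --stream-url 'https://liveradio.swr.de/sw282p3/swr1rp/' \
  --sender 'swr1' \
  --base-dir './output' \
  --whisper-model TURBO \
  --transcribe-only \
  --start-time "$start_time" \
  --end-time "$end_time"

# Only run if audio_miner is installed; otherwise print the command it would run.
if command -v audio_miner >/dev/null 2>&1; then
  audio_miner "$@"
else
  echo "audio_miner is not on PATH; would run: audio_miner $*"
fi
```

Because the `YYYYMMDD_HHMMSS` format is fixed-width, timestamps in it sort correctly as plain strings.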

### Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue for any suggestions or improvements.

### License

This project is licensed under the MIT License. See the LICENSE file for more details.
