Metadata-Version: 2.1
Name: riffusion
Version: 0.0.5
Summary: Stable diffusion for real-time music generation.
License: MIT
Author: Hayk Martiros
Author-email: hayk.mart@gmail.com
Requires-Python: >=3.8, !=2.7.*, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*, !=3.6.*, !=3.7.*
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: accelerate (>=0.16.0,<0.17.0)
Requires-Dist: argh (>=0.27.2,<0.28.0)
Requires-Dist: dacite (>=1.8.0,<2.0.0)
Requires-Dist: demucs (>=4.0.0,<5.0.0)
Requires-Dist: diffusers (>=0.9.0)
Requires-Dist: flask (>=2.2.2,<3.0.0)
Requires-Dist: flask-cors (>=3.0.10,<4.0.0)
Requires-Dist: numpy (>=1.24.2,<2.0.0)
Requires-Dist: pillow (>=9.4.0,<10.0.0)
Requires-Dist: plotly (>=5.13.0,<6.0.0)
Requires-Dist: pydub (>=0.25.1,<0.26.0)
Requires-Dist: pysoundfile (>=0.9.0.post1,<0.10.0)
Requires-Dist: scipy (>=1.10.0,<2.0.0)
Requires-Dist: soundfile (>=0.11.0,<0.12.0)
Requires-Dist: sox (>=1.4.1,<2.0.0)
Requires-Dist: streamlit (>=1.17.0,<1.18.0)
Requires-Dist: torch (>=1.13.1,<2.0.0)
Requires-Dist: torchaudio (>=0.13.1,<0.14.0)
Requires-Dist: torchvision (>=0.14.1,<0.15.0)
Requires-Dist: transformers (>=4.26.1,<5.0.0)
Requires-Dist: watchdog (>=2.3.0,<3.0.0)
Description-Content-Type: text/markdown

# :guitar: Riffusion

<!-- markdownlint-disable MD033 MD034 -->

<a href="https://github.com/riffusion/riffusion/actions/workflows/ci.yml?query=branch%3Amain"><img alt="CI status" src="https://github.com/riffusion/riffusion/actions/workflows/ci.yml/badge.svg" /></a>
<img alt="Python 3.9 | 3.10" src="https://img.shields.io/badge/Python-3.9%20%7C%203.10-blue" />
<a href="https://github.com/riffusion/riffusion/tree/main/LICENSE"><img alt="MIT License" src="https://img.shields.io/badge/License-MIT-yellowgreen" /></a>

Riffusion is a library for real-time music and audio generation with stable diffusion.

Read about it at https://www.riffusion.com/about and try it at https://www.riffusion.com/.

This is the core repository for riffusion image and audio processing code.

* Diffusion pipeline that performs prompt interpolation combined with image conditioning
* Conversions between spectrogram images and audio clips
* Command-line interface for common tasks
* Interactive app using Streamlit
* Flask server to provide model inference via an API
* Various third-party integrations

Related repositories:

* Web app: https://github.com/riffusion/riffusion-app
* Model checkpoint: https://huggingface.co/riffusion/riffusion-model-v1

## Citation

If you build on this work, please cite it as follows:

```txt
@article{Forsgren_Martiros_2022,
  author = {Forsgren, Seth* and Martiros, Hayk*},
  title = {{Riffusion - Stable diffusion for real-time music generation}},
  url = {https://riffusion.com/about},
  year = {2022}
}
```

## Install

Tested in CI with Python 3.9 and 3.10.

It's highly recommended to set up a virtual Python environment with `conda` or `virtualenv`:

```shell
conda create --name riffusion python=3.9
conda activate riffusion
```

Install Python package:

```shell
pip install -U riffusion
```

or clone the repository and install from source:

```shell
git clone https://github.com/riffusion/riffusion.git
cd riffusion
python -m pip install --editable .
```

To use audio formats other than WAV, you need [ffmpeg](https://ffmpeg.org/download.html):

```shell
sudo apt-get install ffmpeg          # linux
brew install ffmpeg                  # mac
conda install -c conda-forge ffmpeg  # conda
```
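The sketch below checks from Python whether `ffmpeg` is on your `PATH`. It uses the standard library's `shutil.which`, and `find_ffmpeg` is a hypothetical helper name. It only confirms that the executable can be found. It does not prove that pydub or torchaudio will actually use it.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the path to the ffmpeg executable on PATH, or None if it is missing."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found: audio formats other than WAV will likely fail")
    else:
        print(f"ffmpeg found at {path}")
```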

If torchaudio has no backend, you may need to install `libsndfile`. See [this issue](https://github.com/riffusion/riffusion/issues/12).
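To see which audio I/O backends torchaudio can use, call `torchaudio.list_audio_backends()`. This function is available in the torchaudio 0.13 series pinned by this package. The sketch below wraps that call, and `torchaudio_backends` is a hypothetical name. If the list comes back empty, torchaudio has no backend. In that case, libsndfile (which the `soundfile` backend wraps) may be missing.

```python
from typing import List


def torchaudio_backends() -> List[str]:
    """List the audio I/O backends torchaudio reports, or [] if torchaudio cannot be imported."""
    try:
        import torchaudio
    except (ImportError, OSError):
        # Not installed, or a shared library it depends on failed to load.
        return []
    return list(torchaudio.list_audio_backends())


if __name__ == "__main__":
    backends = torchaudio_backends()
    print(backends or "no torchaudio backend found; libsndfile may be missing")
```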

If you run into issues, try upgrading [diffusers](https://github.com/huggingface/diffusers). Riffusion has been tested with diffusers 0.9 through 0.11.

Guides:

* [Simple Install Guide for Windows](https://www.reddit.com/r/riffusion/comments/zrubc9/installation_guide_for_riffusion_app_inference/)

## Backends

### CPU

`cpu` is supported but is quite slow.

### CUDA

`cuda` is the recommended and most performant backend.

To use CUDA, make sure torch and torchaudio are installed with CUDA support. See the
[install guide](https://pytorch.org/get-started/locally/) or
[stable wheels](https://download.pytorch.org/whl/torch_stable.html).

To generate audio in real time, you need a GPU that can run stable diffusion for approximately 50
steps in under five seconds, such as an NVIDIA RTX 3090 or A10G.

Test availability with:

```python
import torch
print(torch.cuda.is_available())  # True if a CUDA device is usable
```

### MPS

The `mps` backend on Apple Silicon is supported for inference, but some operations fall back to CPU,
particularly for audio processing. You may need to set
`PYTORCH_ENABLE_MPS_FALLBACK=1`.

In addition, this backend is not deterministic.

Test availability with:

```python
import torch
print(torch.backends.mps.is_available())  # True if the MPS backend is usable
```
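You can combine the availability checks into a helper that returns a device string, for example to pass to the server's `--device` argument. This is a sketch and not part of riffusion. `pick_device` is a hypothetical name. The cuda > mps > cpu preference follows the backend notes above. The helper sets `PYTORCH_ENABLE_MPS_FALLBACK` before importing `torch`, on the assumption that PyTorch may read it at import time.

```python
import os


def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring the fastest backend that is available."""
    # Let unsupported MPS ops fall back to CPU; set before torch is imported to be safe.
    os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps exists in the torch versions this package requires; guard anyway.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```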

## Command-line interface

Riffusion comes with a command-line interface for performing common tasks.

See available commands:

```shell
riffusion -h
```

Get help for a specific command:

```shell
riffusion image-to-audio -h
```

Run a command, for example to convert a spectrogram image into an audio clip:

```shell
riffusion image-to-audio --image spectrogram_image.png --audio clip.wav
```
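To convert many spectrograms at once, you can call the same CLI command from Python for each file. This is a minimal sketch. It assumes the `riffusion` entry point is on your `PATH` and uses only the `--image` and `--audio` flags shown above. `build_commands`, `convert_all`, and the directory layout are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(image_dir: Path, audio_dir: Path) -> List[List[str]]:
    """Build one `riffusion image-to-audio` command per PNG spectrogram in image_dir."""
    commands = []
    for image in sorted(image_dir.glob("*.png")):
        # Name each WAV clip after its source image.
        audio = audio_dir / (image.stem + ".wav")
        commands.append(
            ["riffusion", "image-to-audio", "--image", str(image), "--audio", str(audio)]
        )
    return commands


def convert_all(image_dir: Path, audio_dir: Path) -> None:
    """Run the conversions in order, raising on the first command that fails."""
    audio_dir.mkdir(parents=True, exist_ok=True)
    for command in build_commands(image_dir, audio_dir):
        subprocess.run(command, check=True)
```

For example, `convert_all(Path("spectrograms"), Path("clips"))` writes one WAV file per PNG. Keeping command construction separate from execution makes it easy to inspect or log the commands before running them.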

## Riffusion Playground

Riffusion contains a [streamlit](https://streamlit.io/) app for interactive use and exploration.

Run with:

```shell
riffusion-playground
```

Then open http://127.0.0.1:8501/ in a browser.

<img alt="Riffusion Playground" style="width: 600px" src="https://i.imgur.com/OOMKBbT.png" />

## Run the model server

Riffusion can be run as a Flask server that provides inference via an API. This server enables the [web app](https://github.com/riffusion/riffusion-app) to run locally.

Run with:

```shell
riffusion-server --host 127.0.0.1 --port 3013
```

Use `--checkpoint` to load your own model, given either as a local directory or a Hugging Face model ID in diffusers format.

Use the `--device` argument to specify the torch device to use.

The model endpoint then accepts POST requests at `http://127.0.0.1:3013/run_inference`.

Example input (see [InferenceInput](https://github.com/hmartiro/riffusion-inference/blob/main/riffusion/datatypes.py#L28) for the API):

```json
{
  "alpha": 0.75,
  "num_inference_steps": 50,
  "seed_image_id": "og_beat",

  "start": {
    "prompt": "church bells on sunday",
    "seed": 42,
    "denoising": 0.75,
    "guidance": 7.0
  },

  "end": {
    "prompt": "jazz with piano",
    "seed": 123,
    "denoising": 0.75,
    "guidance": 7.0
  }
}
```
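You can post the example input from Python using only the standard library. This is a minimal sketch. `make_payload`, `make_request`, and `run_inference` are hypothetical helper names, and the payload simply mirrors the example JSON. The `Content-Type: application/json` header is an assumption that should be harmless if the server parses the raw body.

```python
import json
import urllib.request
from typing import Any, Dict

# Address from the `riffusion-server --host 127.0.0.1 --port 3013` example.
SERVER_URL = "http://127.0.0.1:3013/run_inference"


def make_payload(start_prompt: str, end_prompt: str, alpha: float = 0.75) -> Dict[str, Any]:
    """Build a dict shaped like the example InferenceInput JSON."""
    return {
        "alpha": alpha,
        "num_inference_steps": 50,
        "seed_image_id": "og_beat",
        "start": {"prompt": start_prompt, "seed": 42, "denoising": 0.75, "guidance": 7.0},
        "end": {"prompt": end_prompt, "seed": 123, "denoising": 0.75, "guidance": 7.0},
    }


def make_request(payload: Dict[str, Any], url: str = SERVER_URL) -> urllib.request.Request:
    """Wrap the payload in a JSON-encoded POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_inference(
    payload: Dict[str, Any], url: str = SERVER_URL, timeout: float = 60.0
) -> Dict[str, Any]:
    """Send the request to a running server and parse its JSON response."""
    with urllib.request.urlopen(make_request(payload, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, call `run_inference(make_payload("church bells on sunday", "jazz with piano"))`.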

Example output (see [InferenceOutput](https://github.com/hmartiro/riffusion-inference/blob/main/riffusion/datatypes.py#L54) for the API):

```json
{
  "image": "< base64 encoded JPEG image >",
  "audio": "< base64 encoded MP3 clip >"
}
```
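To use the response, decode the two base64 fields and write them to disk. This is a sketch with hypothetical helper names. It also strips an optional `data:<mime>;base64,` prefix, in case the server wraps the data as a data URI. That is an assumption, so check `InferenceOutput` for the exact format.

```python
import base64
from pathlib import Path
from typing import Any, Dict, Tuple


def decode_field(value: str) -> bytes:
    """Decode a base64 string, stripping a `data:<mime>;base64,` prefix if present."""
    if value.startswith("data:") and "," in value:
        value = value.split(",", 1)[1]
    return base64.b64decode(value, validate=True)


def save_output(output: Dict[str, Any], out_dir: Path, stem: str = "riff") -> Tuple[Path, Path]:
    """Write the decoded image and audio to out_dir as <stem>.jpg and <stem>.mp3."""
    image_path = out_dir / f"{stem}.jpg"
    audio_path = out_dir / f"{stem}.mp3"
    image_path.write_bytes(decode_field(output["image"]))
    audio_path.write_bytes(decode_field(output["audio"]))
    return image_path, audio_path
```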

## Tests

Tests live in the `test/` directory and are implemented with `unittest`.

To run all tests:

```shell
python -m unittest test/*_test.py
```

To run a single test:

```shell
python -m unittest test.audio_to_image_test
```

To preserve temporary outputs for debugging, set `RIFFUSION_TEST_DEBUG`:

```shell
RIFFUSION_TEST_DEBUG=1 python -m unittest test.audio_to_image_test
```

To run a single test case within a test:

```shell
python -m unittest test.audio_to_image_test -k AudioToImageTest.test_stereo
```

To run tests using a specific torch device, set `RIFFUSION_TEST_DEVICE`. Tests should pass with
`cpu`, `cuda`, and `mps` backends.
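To check a module against several devices, you can run it once per device with `RIFFUSION_TEST_DEVICE` set. Optionally, `RIFFUSION_TEST_DEBUG` can be set as well to keep temporary outputs. This is a sketch, and `build_test_run` and `run_on_devices` are hypothetical names. It only uses the two environment variables and the `python -m unittest` invocation described above.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple

DEVICES = ("cpu", "cuda", "mps")


def build_test_run(
    module: str, device: str, debug: bool = False
) -> Tuple[List[str], Dict[str, str]]:
    """Return argv and environment for running one test module on one torch device."""
    if device not in DEVICES:
        raise ValueError(f"unknown device: {device!r}")
    env = dict(os.environ, RIFFUSION_TEST_DEVICE=device)
    if debug:
        # Preserve temporary outputs for inspection.
        env["RIFFUSION_TEST_DEBUG"] = "1"
    argv = [sys.executable, "-m", "unittest", module]
    return argv, env


def run_on_devices(module: str, devices: List[str]) -> Dict[str, bool]:
    """Run the module once per device and report which runs passed."""
    results = {}
    for device in devices:
        argv, env = build_test_run(module, device)
        results[device] = subprocess.run(argv, env=env).returncode == 0
    return results
```

For example, `run_on_devices("test.audio_to_image_test", ["cpu", "mps"])` run from the repository root returns a pass/fail flag per device.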

## Development Guide

Install additional packages for dev with `python -m pip install -r requirements-dev.txt`.

* Linters: `ruff`, `flake8`, `pylint`
* Formatter: `black`
* Type checker: `mypy`
* Docstring checker: `pydocstyle`

These are configured in `pyproject.toml`.

The results of `mypy .`, `black .`, and `ruff .` *must* be clean to accept a PR.
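You can run these checks together before opening a PR with a small script. This is a sketch, and `failing_checks` is a hypothetical helper. It swaps in `black --check .`, which reports formatting problems without rewriting files. It keeps `ruff .` as written here, although recent ruff releases spell it `ruff check .`.

```python
import subprocess
from typing import List, Sequence

# The checks named above; `black --check` reports instead of reformatting files.
CHECKS: List[List[str]] = [["mypy", "."], ["black", "--check", "."], ["ruff", "."]]


def failing_checks(checks: Sequence[Sequence[str]] = CHECKS) -> List[str]:
    """Run each check and return those that exited non-zero or could not be started."""
    failures = []
    for command in checks:
        try:
            completed = subprocess.run(list(command))
        except FileNotFoundError:
            # The tool is not installed or not on PATH.
            failures.append(" ".join(command))
            continue
        if completed.returncode != 0:
            failures.append(" ".join(command))
    return failures
```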

CI is run through GitHub Actions from `.github/workflows/ci.yml`.

Contributions are welcome through pull requests.

