Requirements
- GPU Memory: 12GB (Inference)
- System: Linux or Windows (via WSL)
System Setup
OpenAudio supports multiple installation methods. Choose the one that best fits your development environment.
Prerequisites: Install system dependencies for audio processing:
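On Debian/Ubuntu, a typical set of audio dependencies looks like this (the package names are an assumption; adjust for your distribution):
# Debian/Ubuntu example; package names may differ on other distributions
sudo apt-get update
sudo apt-get install -y ffmpeg libsox-dev portaudio19-dev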
Conda
conda create -n fish-speech python=3.12
conda activate fish-speech
# GPU installation (choose your CUDA version: cu126, cu128, cu129)
pip install -e .[cu129]
# CPU-only installation
pip install -e .[cpu]
# Default installation (uses PyTorch default index)
pip install -e .
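A quick sanity check that the install worked and that PyTorch sees the GPU (a minimal sketch; on the CPU build the second value will be False):
# Verify the PyTorch install and CUDA visibility
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"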
UV
UV provides faster dependency resolution and installation:
# GPU installation (choose your CUDA version: cu126, cu128, cu129)
uv sync --python 3.12 --extra cu129
# CPU-only installation
uv sync --python 3.12 --extra cpu
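Commands then run inside the synced environment through uv run, for example the same sanity check as above:
# Run the check inside the uv-managed environment
uv run python -c "import torch; print(torch.__version__, torch.cuda.is_available())"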
Intel Arc XPU Support
For Intel Arc GPU users, install with XPU support:
conda create -n fish-speech python=3.12
conda activate fish-speech
# Install required C++ standard library
conda install libstdcxx -c conda-forge
# Install PyTorch with Intel XPU support
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu
# Install Fish Speech
pip install -e .
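To check that the XPU build is picked up, recent PyTorch builds expose a torch.xpu namespace (a sketch; availability depends on the installed nightly):
# Confirm PyTorch detects the Intel Arc GPU
python -c "import torch; print(torch.xpu.is_available())"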
Warning
The compile option is not supported on Windows and macOS. If you want to run with compile, you need to install Triton yourself.
Docker Setup
The OpenAudio S1 series provides multiple Docker deployment options to suit different needs: pre-built images from Docker Hub, local builds with Docker Compose, or manually built custom images. Images are provided for both the WebUI and the API server, on GPU (CUDA 12.6 by default) and CPU. To build locally, follow the instructions below; to use the pre-built images directly, follow the inference guide.
Prerequisites
- Docker and Docker Compose installed
- NVIDIA Docker runtime (for GPU support; see the check below)
- At least 12GB GPU memory for CUDA inference
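To confirm the NVIDIA runtime is wired up, a quick check against a stock CUDA image works (the image tag below is only an example):
# Should print the host GPUs from inside a container
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi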
Use Docker Compose
For development or customization, you can use Docker Compose to build and run locally:
# Clone the repository first
git clone https://github.com/fishaudio/fish-speech.git
cd fish-speech
# Start WebUI with CUDA
docker compose --profile webui up
# Start WebUI with compile optimization
COMPILE=1 docker compose --profile webui up
# Start API server
docker compose --profile server up
# Start API server with compile optimization
COMPILE=1 docker compose --profile server up
# For CPU-only deployment
BACKEND=cpu docker compose --profile webui up
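For long-running deployments, the usual Compose options apply; for example, running detached and following the logs:
# Run in the background and tail the logs
docker compose --profile webui up -d
docker compose --profile webui logs -f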
Environment Variables for Docker Compose
You can customize the deployment using environment variables:
# .env file example
BACKEND=cuda # or cpu
COMPILE=1 # Enable compile optimization
GRADIO_PORT=7860 # WebUI port
API_PORT=8080 # API server port
UV_VERSION=0.8.15 # UV package manager version
The command will build the image and run the container. You can access the WebUI at http://localhost:7860 and the API server at http://localhost:8080.
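A quick way to confirm both services respond is a plain curl against each port (exact API routes are covered in the inference guide; these only check reachability):
# The WebUI and API server should each return an HTTP status line
curl -sI http://localhost:7860 | head -n 1
curl -sI http://localhost:8080 | head -n 1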
Manual Docker Build
For advanced users who want to customize the build process:
# Build WebUI image with CUDA support
docker build \
--platform linux/amd64 \
-f docker/Dockerfile \
--build-arg BACKEND=cuda \
--build-arg CUDA_VER=12.6.0 \
--build-arg UV_EXTRA=cu126 \
--target webui \
-t fish-speech-webui:cuda .
# Build API server image with CUDA support
docker build \
--platform linux/amd64 \
-f docker/Dockerfile \
--build-arg BACKEND=cuda \
--build-arg CUDA_VER=12.6.0 \
--build-arg UV_EXTRA=cu126 \
--target server \
-t fish-speech-server:cuda .
# Build CPU-only images (supports multi-platform)
docker build \
--platform linux/amd64,linux/arm64 \
-f docker/Dockerfile \
--build-arg BACKEND=cpu \
--target webui \
-t fish-speech-webui:cpu .
# Build development image
docker build \
--platform linux/amd64 \
-f docker/Dockerfile \
--build-arg BACKEND=cuda \
--target dev \
-t fish-speech-dev:cuda .
Build Arguments
- BACKEND: cuda or cpu (default: cuda)
- CUDA_VER: CUDA version (default: 12.6.0)
- UV_EXTRA: UV extra for CUDA (default: cu126)
- UBUNTU_VER: Ubuntu version (default: 24.04)
- PY_VER: Python version (default: 3.12)
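For example, to target a newer CUDA toolchain, keep CUDA_VER and UV_EXTRA in sync (the 12.8 values below are an assumption; use whatever matches your driver):
# Build the API server against CUDA 12.8 (values are illustrative)
docker build \
--platform linux/amd64 \
-f docker/Dockerfile \
--build-arg BACKEND=cuda \
--build-arg CUDA_VER=12.8.0 \
--build-arg UV_EXTRA=cu128 \
--target server \
-t fish-speech-server:cu128 .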
Volume Mounts
Both methods require mounting these directories:
- ./checkpoints:/app/checkpoints - Model weights directory
- ./references:/app/references - Reference audio files directory
Environment Variables
- COMPILE=1 - Enable torch.compile for faster inference (~10x speedup)
- GRADIO_SERVER_NAME=0.0.0.0 - WebUI server host
- GRADIO_SERVER_PORT=7860 - WebUI server port
- API_SERVER_NAME=0.0.0.0 - API server host
- API_SERVER_PORT=8080 - API server port
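Putting the mounts and variables together, a manually built image can be run like this (a sketch; the image tag and host paths are illustrative):
# Run a locally built WebUI image with GPU access, mounts, and port mapping
docker run --rm --gpus all \
-p 7860:7860 \
-v "$(pwd)/checkpoints:/app/checkpoints" \
-v "$(pwd)/references:/app/references" \
-e GRADIO_SERVER_NAME=0.0.0.0 \
-e GRADIO_SERVER_PORT=7860 \
-e COMPILE=1 \
fish-speech-webui:cuda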
Note
The Docker containers expect model weights to be mounted at /app/checkpoints. Make sure to download the required model weights before starting the containers.
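Weights can be fetched with the Hugging Face CLI into the mounted directory; the repository id below is an assumption, so use the model named in the inference guide:
# Download weights into the directory mounted at /app/checkpoints (repo id is an example)
huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini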
Warning
GPU support requires the NVIDIA Docker runtime. For CPU-only deployment, remove the --gpus all flag and use the CPU images.