Metadata-Version: 2.4
Name: funasr-python
Version: 0.1.5
Summary: A high-performance Python client for FunASR WebSocket speech recognition service
Project-URL: Documentation, https://github.com/alibaba-damo-academy/FunASR
Project-URL: Repository, https://github.com/alibaba-damo-academy/FunASR
Project-URL: Changelog, https://github.com/alibaba-damo-academy/FunASR/blob/main/CHANGELOG.md
Project-URL: Bug Reports, https://github.com/alibaba-damo-academy/FunASR/issues
Author: FunASR Team
Maintainer: FunASR Team
License: MIT License
        Copyright (c) 2025 FunASR
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in
        all copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
        THE SOFTWARE.
License-File: LICENSE
Keywords: asr,funasr,real-time,speech-recognition,streaming,websocket
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Requires-Dist: aiofiles>=23.0.0
Requires-Dist: build>=1.2.2.post1
Requires-Dist: librosa>=0.9.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv>=0.19.0
Requires-Dist: soundfile>=0.12.0
Requires-Dist: twine>=6.1.0
Requires-Dist: typing-extensions>=4.0.0; python_version < '3.11'
Requires-Dist: websockets>=11.0.0
Provides-Extra: all
Requires-Dist: coverage[toml]>=7.0.0; extra == 'all'
Requires-Dist: hatch>=1.7.0; extra == 'all'
Requires-Dist: mypy>=1.5.0; extra == 'all'
Requires-Dist: orjson>=3.8.0; extra == 'all'
Requires-Dist: pre-commit>=3.0.0; extra == 'all'
Requires-Dist: pyaudio>=0.2.11; extra == 'all'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'all'
Requires-Dist: pytest-cov>=4.0.0; extra == 'all'
Requires-Dist: pytest-mock>=3.10.0; extra == 'all'
Requires-Dist: pytest-timeout>=2.1.0; extra == 'all'
Requires-Dist: pytest>=7.0.0; extra == 'all'
Requires-Dist: ruff>=0.1.0; extra == 'all'
Requires-Dist: types-setuptools; extra == 'all'
Requires-Dist: uvloop>=0.17.0; (sys_platform != 'win32') and extra == 'all'
Requires-Dist: webrtcvad>=2.0.10; extra == 'all'
Provides-Extra: audio
Requires-Dist: pyaudio>=0.2.11; extra == 'audio'
Requires-Dist: webrtcvad>=2.0.10; extra == 'audio'
Provides-Extra: dev
Requires-Dist: coverage[toml]>=7.0.0; extra == 'dev'
Requires-Dist: hatch>=1.7.0; extra == 'dev'
Requires-Dist: mypy>=1.5.0; extra == 'dev'
Requires-Dist: orjson>=3.8.0; extra == 'dev'
Requires-Dist: pre-commit>=3.0.0; extra == 'dev'
Requires-Dist: pyaudio>=0.2.11; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.10.0; extra == 'dev'
Requires-Dist: pytest-timeout>=2.1.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: types-setuptools; extra == 'dev'
Requires-Dist: uvloop>=0.17.0; (sys_platform != 'win32') and extra == 'dev'
Requires-Dist: webrtcvad>=2.0.10; extra == 'dev'
Provides-Extra: lint
Requires-Dist: mypy>=1.5.0; extra == 'lint'
Requires-Dist: ruff>=0.1.0; extra == 'lint'
Requires-Dist: types-setuptools; extra == 'lint'
Provides-Extra: performance
Requires-Dist: orjson>=3.8.0; extra == 'performance'
Requires-Dist: uvloop>=0.17.0; (sys_platform != 'win32') and extra == 'performance'
Provides-Extra: test
Requires-Dist: coverage[toml]>=7.0.0; extra == 'test'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'test'
Requires-Dist: pytest-cov>=4.0.0; extra == 'test'
Requires-Dist: pytest-mock>=3.10.0; extra == 'test'
Requires-Dist: pytest-timeout>=2.1.0; extra == 'test'
Requires-Dist: pytest>=7.0.0; extra == 'test'
Description-Content-Type: text/markdown

# FunASR Python Client

[![PyPI version](https://badge.fury.io/py/funasr-python.svg)](https://badge.fury.io/py/funasr-python)
[![Python versions](https://img.shields.io/pypi/pyversions/funasr-python.svg)](https://pypi.org/project/funasr-python)
[![License](https://img.shields.io/pypi/l/funasr-python.svg)](https://pypi.org/project/funasr-python)
[![Tests](https://github.com/your-org/funasr-python/workflows/Tests/badge.svg)](https://github.com/your-org/funasr-python/actions)
[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)

A high-performance, enterprise-grade Python client for the FunASR WebSocket speech recognition service. Built for production use with comprehensive error handling, automatic reconnection, and extensive customization options.

## 📖 Table of Contents

### Getting Started
- [Features](#features) - What this client offers
- [Installation](#installation) - Get up and running
- [Quick Start](#quick-start) - 5-minute tutorial
- [Common Use Cases](#common-use-cases) - Ready-to-use examples for your scenario

### Core Concepts
- [Recognition Mode Selection](#recognition-mode-selection-guide) - Choose the right mode
- [Configuration](#configuration) - Environment variables & settings
- [Error Handling](#error-handling) - Exception handling
- [Troubleshooting](#troubleshooting) - Fix common issues

### Advanced Topics
- [Advanced Usage](#advanced-usage) - Custom configurations, callbacks, streaming
- [Performance Optimization](#performance-optimization) - Tuning for production
- [API Reference](#api-reference) - Complete API documentation
- [Command Line Interface](#command-line-interface) - CLI usage

### Development
- [Testing](#testing) - Run tests
- [Development](#development) - Contribute to the project
- [Documentation & Guides](#documentation--guides) - Additional resources

## Features

### 🚀 **High Performance**
- **Asynchronous I/O**: Built on asyncio for maximum concurrency
- **Connection Pooling**: Efficient WebSocket connection management
- **Streaming Recognition**: Real-time speech recognition with minimal latency
- **Memory Efficient**: Optimized audio processing with configurable buffering

### 🔧 **Production Ready**
- **Robust Error Handling**: Comprehensive exception handling and recovery
- **Automatic Reconnection**: Smart reconnection with exponential backoff
- **Health Monitoring**: Built-in connection health checks
- **Resource Management**: Automatic cleanup and resource deallocation

### 📊 **Recognition Modes for Different Scenarios**
- **Offline Mode**: Best for complete audio files, highest accuracy
- **Online Mode**: Ultra-low latency streaming, suitable for interactive applications
- **Two-Pass Mode** ⭐: **Recommended for real-time scenarios** - combines streaming speed with offline accuracy

### 🎯 **Enterprise Features**
- **Configuration Management**: Flexible configuration with .env support
- **Comprehensive Logging**: Structured logging with configurable levels
- **Metrics & Monitoring**: Built-in performance metrics
- **Type Safety**: Full type hints for better IDE support

### 🎵 **Audio Processing**
- **Multiple Formats**: Support for WAV, FLAC, MP3, and more
- **Automatic Resampling**: Smart audio format conversion
- **Voice Activity Detection**: Optional VAD for improved efficiency
- **Microphone Integration**: Real-time microphone recording support

## Installation

### Basic Installation

```bash
pip install funasr-python
```

### With Optional Dependencies

```bash
# Audio processing capabilities
pip install funasr-python[audio]

# Performance optimizations
pip install funasr-python[performance]

# Development tools
pip install funasr-python[dev]

# Everything
pip install funasr-python[all]
```

### From Source

```bash
git clone https://github.com/alibaba-damo-academy/FunASR.git
cd FunASR/clients/funasr-python
pip install -e .
```

## Quick Start

### Step 1: Install

```bash
pip install funasr-python
```

### Step 2: Recognize Your First Audio File

```python
import asyncio
from funasr_client import AsyncFunASRClient

async def main():
    # Create client with default settings (Two-Pass mode)
    client = AsyncFunASRClient(
        server_url="ws://localhost:10095"  # Your FunASR server
    )
    
    # Recognize an audio file
    result = await client.recognize_file("path/to/audio.wav")
    print(f"Recognition result: {result.text}")
    
    await client.close()

if __name__ == "__main__":
    asyncio.run(main())
```

**That's it!** You've just transcribed your first audio file. 🎉

### Step 3: Choose the Right Mode for Your Use Case

The client supports three recognition modes. See [Recognition Mode Selection Guide](#recognition-mode-selection-guide) to choose the best one for your scenario:

- 🎯 **Two-Pass Mode** (Default, Recommended) - Best balance for real-time apps
- ⚡ **Online Mode** - Lowest latency for interactive apps  
- 🎓 **Offline Mode** - Highest accuracy for batch processing

### Next Steps

- 📖 [Common Use Cases](#common-use-cases) - See complete examples for your scenario
- ⚙️ [Configuration Guide](#configuration) - Customize behavior
- 🔧 [Advanced Topics](#advanced-usage) - Streaming, callbacks, and more

## Common Use Cases

This section provides complete, ready-to-use examples for common scenarios.

### Batch File Transcription

Process multiple audio files efficiently:

```python
import asyncio
from funasr_client import AsyncFunASRClient

async def batch_transcribe():
    client = AsyncFunASRClient(server_url="ws://localhost:10095")
    
    files = ["file1.wav", "file2.wav", "file3.wav"]
    
    # Process files concurrently
    tasks = [client.recognize_file(f) for f in files]
    results = await asyncio.gather(*tasks)
    
    for filename, result in zip(files, results):
        print(f"{filename}: {result.text}")
    
    await client.close()

asyncio.run(batch_transcribe())
```

### Real-time Customer Service (Audio Stream)

Stream audio from customer service calls for real-time transcription:

```python
import asyncio
from funasr_client import AsyncFunASRClient, ClientConfig
from funasr_client.models import RecognitionMode
from funasr_client.callbacks import SimpleCallback

async def customer_service_transcription():
    """Real-time transcription for customer service calls."""
    config = ClientConfig(
        server_url="ws://localhost:10095",
        mode=RecognitionMode.TWO_PASS,  # Best for real-time + accuracy
        enable_vad=True,                # Detect speech/silence
        chunk_interval=10               # Balanced latency
    )
    
    client = AsyncFunASRClient(config=config)
    
    # Callback to handle transcription results
    def on_partial(result):
        # Show live transcription to agent
        print(f"[LIVE] {result.text}")
    
    def on_final(result):
        # Save to database for quality assurance
        # save_to_qa_system(result.text, result.confidence)  # plug in your QA/database storage here
        print(f"[FINAL] {result.text} (confidence: {result.confidence:.2f})")
    
    callback = SimpleCallback(on_partial=on_partial, on_final=on_final)
    
    # Simulate audio stream from telephony system
    async def audio_stream_from_call():
        """Stream audio chunks from phone call (e.g., WebRTC, SIP)."""
        # In production, this would be from:
        # - WebRTC media stream
        # - SIP/RTP packets
        # - Twilio/Asterisk audio feed
        
        # Example: Read from audio buffer or network socket
        import wave
        with wave.open("customer_call.wav", 'rb') as wav:
            frames_per_chunk = 1600  # 100 ms at 16 kHz (3200 bytes of 16-bit mono PCM)
            while True:
                chunk = wav.readframes(frames_per_chunk)
                if not chunk:
                    break
                yield chunk
                await asyncio.sleep(0.1)  # Real-time simulation
    
    await client.start()
    await client.recognize_stream(audio_stream_from_call(), callback)
    await client.close()

asyncio.run(customer_service_transcription())
```

### Live Meeting/Conference Transcription

Real-time transcription for online meetings with speaker diarization support:

```python
import asyncio
from funasr_client import AsyncFunASRClient, ClientConfig
from funasr_client.models import RecognitionMode
from funasr_client.callbacks import SimpleCallback

async def meeting_transcription():
    """Real-time meeting transcription with timestamp."""
    config = ClientConfig(
        server_url="ws://localhost:10095",
        mode=RecognitionMode.TWO_PASS,
        enable_vad=True,
        chunk_interval=10
    )
    
    client = AsyncFunASRClient(config=config)
    
    transcript_buffer = []
    
    def on_final(result):
        from datetime import datetime
        timestamp = datetime.now().strftime("%H:%M:%S")
        line = f"[{timestamp}] {result.text}"
        transcript_buffer.append(line)
        print(line)
    
    callback = SimpleCallback(on_final=on_final)
    
    # Stream audio from meeting platform (Zoom, Teams, etc.)
    async def meeting_audio_stream():
        """Stream audio from meeting platform API."""
        # In production, integrate with:
        # - Zoom SDK: https://marketplace.zoom.us/docs/sdk/native-sdks/audio
        # - Teams Bot: https://docs.microsoft.com/en-us/microsoftteams/platform/bots/calls-and-meetings/
        # - Agora: https://docs.agora.io/en/voice-call-4.x-preview/landing-page
        
        # Example: Streaming from audio input device
        import pyaudio
        
        CHUNK = 1600  # 100ms at 16kHz
        FORMAT = pyaudio.paInt16
        CHANNELS = 1
        RATE = 16000
        
        p = pyaudio.PyAudio()
        stream = p.open(
            format=FORMAT,
            channels=CHANNELS,
            rate=RATE,
            input=True,
            frames_per_buffer=CHUNK
        )
        
        try:
            print("🎤 Meeting recording started...")
            while True:
                data = await asyncio.get_event_loop().run_in_executor(
                    None, stream.read, CHUNK
                )
                yield data
        except KeyboardInterrupt:
            pass
        finally:
            stream.stop_stream()
            stream.close()
            p.terminate()
            
            # Save transcript to file
            with open("meeting_transcript.txt", "w") as f:
                f.write("\n".join(transcript_buffer))
            print(f"\n📝 Transcript saved: {len(transcript_buffer)} lines")
    
    await client.start()
    await client.recognize_stream(meeting_audio_stream(), callback)
    await client.close()

asyncio.run(meeting_transcription())
```

### Voice Command Recognition (Streaming)

Low-latency streaming recognition for voice-controlled IoT devices:

```python
import asyncio
from funasr_client import AsyncFunASRClient, ClientConfig
from funasr_client.models import RecognitionMode
from funasr_client.callbacks import SimpleCallback

async def voice_control_device():
    """Voice commands for smart home devices."""
    config = ClientConfig(
        server_url="ws://localhost:10095",
        mode=RecognitionMode.ONLINE,  # Lowest latency for commands
        chunk_interval=5,
        enable_vad=True
    )
    
    client = AsyncFunASRClient(config=config)
    
    def execute_command(result):
        if not result.is_final:
            return
        
        command = result.text.lower()
        print(f"Heard: {command}")
        
        # Command matching
        if "turn on" in command and "light" in command:
            print("✅ Turning on lights")
            # control_device("light", "on")
        elif "turn off" in command and "light" in command:
            print("✅ Turning off lights")
        elif "temperature" in command:
            print("🌡️  Current temperature: 22°C")
        elif "play music" in command:
            print("🎵 Starting music playback")
        else:
            print("❓ Command not recognized")
    
    callback = SimpleCallback(on_final=execute_command)
    
    # Stream from microphone with wake word detection
    async def voice_stream_with_wakeword():
        """Stream audio only after wake word detected."""
        import pyaudio
        
        CHUNK = 1600
        FORMAT = pyaudio.paInt16
        CHANNELS = 1
        RATE = 16000
        
        p = pyaudio.PyAudio()
        stream = p.open(
            format=FORMAT,
            channels=CHANNELS,
            rate=RATE,
            input=True,
            frames_per_buffer=CHUNK
        )
        
        print("🎤 Say 'Hey Assistant' to start...")
        
        try:
            while True:
                # In production, use wake word detection here
                # e.g., Porcupine, Snowboy, or custom model
                
                data = await asyncio.get_event_loop().run_in_executor(
                    None, stream.read, CHUNK
                )
                yield data
        except KeyboardInterrupt:
            pass
        finally:
            stream.stop_stream()
            stream.close()
            p.terminate()
    
    await client.start()
    await client.recognize_stream(voice_stream_with_wakeword(), callback)
    await client.close()

asyncio.run(voice_control_device())
```

### Live Broadcast Subtitle Generation

Generate real-time subtitles for live streaming platforms:

```python
import asyncio
from funasr_client import AsyncFunASRClient, ClientConfig
from funasr_client.models import RecognitionMode
from funasr_client.callbacks import SimpleCallback

async def live_broadcast_subtitles():
    """Generate real-time subtitles for live streams."""
    config = ClientConfig(
        server_url="ws://localhost:10095",
        mode=RecognitionMode.TWO_PASS,
        enable_vad=True,
        chunk_interval=8  # Balance between latency and accuracy
    )
    
    client = AsyncFunASRClient(config=config)
    
    subtitle_queue = []
    
    def on_partial(result):
        # Show live preview (may change)
        print(f"[PREVIEW] {result.text}", end='\r')
    
    def on_final(result):
        # Send to subtitle overlay system
        subtitle = {
            'text': result.text,
            'confidence': result.confidence,
            'timestamp': result.timestamp
        }
        subtitle_queue.append(subtitle)
        
        # Push to OBS, RTMP overlay, or subtitle service
        send_to_subtitle_overlay(subtitle)
        print(f"\n[SUBTITLE] {result.text}")
    
    callback = SimpleCallback(on_partial=on_partial, on_final=on_final)
    
    # Stream from broadcast source
    async def broadcast_audio_stream():
        """Stream audio from broadcast source (RTMP, HLS, etc.)."""
        # In production, integrate with:
        # - FFmpeg for RTMP streams
        # - OBS WebSocket for local capture
        # - Media server APIs (Wowza, Ant Media, etc.)
        
        # Example: Stream from RTMP using FFmpeg subprocess
        import subprocess
        
        ffmpeg_command = [
            'ffmpeg',
            '-i', 'rtmp://live-server/stream/key',  # Input stream
            '-f', 's16le',          # Output format: signed 16-bit little-endian
            '-ar', '16000',         # Sample rate: 16kHz
            '-ac', '1',             # Channels: mono
            '-'                     # Output to stdout
        ]
        
        process = subprocess.Popen(
            ffmpeg_command,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL
        )
        
        chunk_size = 3200  # 100ms of 16-bit mono at 16kHz
        
        try:
            while True:
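                # Blocking read from FFmpeg stdout; in production, consider wrapping it in run_in_executor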
                chunk = process.stdout.read(chunk_size)
                if not chunk:
                    break
                yield chunk
                await asyncio.sleep(0)  # Yield control
        finally:
            process.terminate()
    
    await client.start()
    
    try:
        await client.recognize_stream(broadcast_audio_stream(), callback)
    except KeyboardInterrupt:
        print("\n\n📝 Broadcast ended. Saving subtitles...")
        # Save to SRT file
        save_to_srt_file(subtitle_queue, "broadcast_subtitles.srt")
    
    await client.close()

def send_to_subtitle_overlay(subtitle):
    """Send subtitle to overlay system (OBS, WebSocket, etc.)."""
    # Example: Send to OBS via WebSocket
    # obs_client.send_command("SetTextGDIPlusText", {"text": subtitle['text']})
    pass

def save_to_srt_file(subtitles, filename):
    """Save subtitles to SRT format."""
    with open(filename, 'w', encoding='utf-8') as f:
        for i, sub in enumerate(subtitles, 1):
            # SRT format requires timing - simplified example
            f.write(f"{i}\n")
            f.write(f"00:00:00,000 --> 00:00:05,000\n")
            f.write(f"{sub['text']}\n\n")

asyncio.run(live_broadcast_subtitles())
```

### Podcast/Meeting Transcription (File-based)

High-accuracy transcription for long-form content:

```python
import asyncio

from funasr_client import AsyncFunASRClient, ClientConfig
from funasr_client.models import RecognitionMode

async def transcribe_podcast():
    config = ClientConfig(
        server_url="ws://localhost:10095",
        mode=RecognitionMode.OFFLINE,  # Highest accuracy
        enable_itn=True  # Normalize numbers, dates, etc.
    )
    
    client = AsyncFunASRClient(config=config)
    
    result = await client.recognize_file("podcast_episode.wav")
    
    # Save to file
    with open("transcript.txt", "w") as f:
        f.write(result.text)
    
    print(f"Transcription saved. Confidence: {result.confidence:.2%}")
    
    await client.close()

asyncio.run(transcribe_podcast())
```

### Environment Configuration

Load settings from `.env` file for easy deployment:

```bash
# .env file
FUNASR_WS_URL=ws://production-server:10095
FUNASR_MODE=2pass
FUNASR_ENABLE_VAD=true
```

```python
import asyncio

from funasr_client import create_async_client

async def use_env_config():
    # Automatically loads from .env
    client = create_async_client()
    
    result = await client.recognize_file("audio.wav")
    print(result.text)
    
    await client.close()

asyncio.run(use_env_config())
```

## Recognition Mode Selection Guide

Choose the optimal recognition mode for your use case:

### Real-time Recognition (Microphone)

For real-time applications, we recommend **Two-Pass Mode** which provides the best balance of speed and accuracy:

```python
import asyncio
from funasr_client import AsyncFunASRClient
from funasr_client.models import RecognitionMode, ClientConfig

async def realtime_recognition():
    # Two-Pass Mode: Optimal for real-time scenarios
    config = ClientConfig(
        server_url="ws://localhost:10095",
        mode=RecognitionMode.TWO_PASS,  # Recommended for real-time
        enable_vad=True,  # Voice activity detection
        chunk_interval=10  # Balanced latency/accuracy
    )

    client = AsyncFunASRClient(config=config)
    
    # Use recognize_file for testing
    result = await client.recognize_file("examples/audio/asr_example.wav")
    print(f"Recognition: {result.text}")
    
    await client.close()

if __name__ == "__main__":
    asyncio.run(realtime_recognition())
```

### Ultra-Low Latency (Interactive Applications)

For scenarios requiring minimal latency (e.g., voice assistants):

```python
async def ultra_low_latency():
    config = ClientConfig(
        server_url="ws://localhost:10095",
        mode=RecognitionMode.ONLINE,  # Ultra-low latency
        chunk_interval=5,  # Faster processing
        enable_vad=True
    )

    client = AsyncFunASRClient(config=config)
    
    # Use recognize_file for testing
    result = await client.recognize_file("examples/audio/asr_example.wav")
    print(f"Recognition: {result.text}")
    
    await client.close()
```

### Configuration with Environment Variables

Create a `.env` file:

```env
FUNASR_WS_URL=ws://localhost:10095
FUNASR_MODE=2pass  # Recommended: Two-Pass Mode for optimal real-time performance
FUNASR_SAMPLE_RATE=16000
FUNASR_ENABLE_ITN=true
FUNASR_ENABLE_VAD=true  # Recommended for real-time scenarios
```

```python
import asyncio

from funasr_client import create_async_client

async def main():
    # Configuration is loaded automatically from .env
    # Note: create_async_client() itself is a synchronous factory function
    client = create_async_client()
    result = await client.recognize_file("examples/audio/asr_example.wav")
    print(result.text)
    await client.close()

asyncio.run(main())
```

## Advanced Usage

### Custom Configuration

```python
from funasr_client import AsyncFunASRClient, ClientConfig, AudioConfig
from funasr_client.models import RecognitionMode, AudioFormat

config = ClientConfig(
    server_url="ws://your-server:10095",  # Must specify server URL
    mode=RecognitionMode.TWO_PASS,
    timeout=30.0,
    max_retries=3,
    audio=AudioConfig(
        sample_rate=16000,
        format=AudioFormat.PCM,
        channels=1
    )
)

client = AsyncFunASRClient(config=config)
```

### Callback Handlers

```python
from funasr_client.callbacks import SimpleCallback

def on_result(result):
    print(f"Received: {result.text}")

def on_error(error):
    print(f"Error: {error}")

callback = SimpleCallback(
    on_result=on_result,
    on_error=on_error
)

client = AsyncFunASRClient(callback=callback)
```

### Multiple Recognition Sessions

```python
async def recognize_multiple():
    # Use Two-Pass Mode for optimal performance
    client = AsyncFunASRClient(
        mode=RecognitionMode.TWO_PASS  # ⭐ Recommended
    )

    # Process multiple files concurrently
    tasks = [
        client.recognize_file("examples/audio/asr_example.wav"),
        client.recognize_file("examples/audio/61-70970-0001.wav"),
        client.recognize_file("examples/audio/61-70970-0016.wav")
    ]

    results = await asyncio.gather(*tasks)
    for i, result in enumerate(results, 1):
        print(f"File {i}: {result.text}")
```

### Real-time Applications Examples

#### Live Streaming Transcription

```python
async def live_transcription():
    """Real-time transcription for live streams."""
    config = ClientConfig(
        mode=RecognitionMode.TWO_PASS,  # ⭐ Optimal for live streaming
        enable_vad=True,                # Filter silence
        chunk_interval=8,               # Balanced performance
        auto_reconnect=True             # Handle network issues
    )

    client = AsyncFunASRClient(config=config)

    def on_result(result):
        if result.is_final:
            # Send to subtitle system
            send_subtitle(result.text, result.confidence)
        else:
            # Show live preview
            show_live_text(result.text)

    from funasr_client.callbacks import SimpleCallback
    callback = SimpleCallback(on_final=on_result, on_partial=on_result)

    await client.start()
    session = await client.start_realtime(callback)

    # Your audio streaming implementation here
    await stream_audio_to_session(session)
```

#### Voice Assistant Integration

```python
async def voice_assistant():
    """Voice assistant with Two-Pass optimization."""
    config = ClientConfig(
        mode=RecognitionMode.TWO_PASS,  # ⭐ Best for voice assistants
        enable_vad=True,                # Automatic speech detection
        chunk_interval=10               # Good responsiveness
    )

    client = AsyncFunASRClient(config=config)

    async def process_command(result):
        if result.is_final and result.confidence > 0.8:
            # Process voice command
            response = await process_voice_command(result.text)
            await speak_response(response)

    from funasr_client.callbacks import AsyncSimpleCallback
    callback = AsyncSimpleCallback(on_final=process_command)

    await client.start()
    session = await client.start_realtime(callback)

    print("🎤 Voice assistant ready. Speak now...")
    # Your microphone streaming logic here
```

## Command Line Interface

The package includes a full-featured CLI:

```bash
# Basic recognition
funasr-client recognize examples/audio/asr_example.wav

# Real-time recognition from microphone
funasr-client stream --source microphone

# Batch processing
funasr-client batch examples/audio/*.wav --output results.jsonl

# Server configuration
funasr-client configure --server-url ws://localhost:10095

# Test connection
funasr-client test-connection
```

## Recognition Mode Comparison

Compare the three recognition modes at a glance:

| Mode | Latency | Accuracy | Best For | Use Cases |
|------|---------|----------|----------|-----------|
| **Two-Pass** ⭐ | Medium | **High** | **Real-time applications** | Live streaming, real-time subtitles, voice assistants |
| **Online** | **Low** | Medium | Interactive apps | Voice commands, quick responses |
| **Offline** | High | **Highest** | File processing | Transcription services, post-processing |

### Two-Pass Mode Advantages ⭐

**Recommended for real-time scenarios** because it:

- ✅ **Fast partial results** for immediate user feedback (Phase 1: Online)
- ✅ **High-accuracy final results** using 2-pass optimization (Phase 2: Offline)
- ✅ **Balanced resource usage** with smart buffering
- ✅ **Production-ready** with robust error handling

```python
# Recommended configuration for real-time applications
config = ClientConfig(
    mode=RecognitionMode.TWO_PASS,  # Best balance
    enable_vad=True,                # Improves efficiency
    chunk_interval=10,              # Optimal for most cases
    auto_reconnect=True             # Production reliability
)
```

> ⚠️ **Important**: To ensure you receive **both** partial (online) and final (offline) results in Two-Pass mode:
> - ✅ Use `recognize_file()` for complete audio files (handles end-of-speech automatically)
> - ✅ Call `end_realtime_session()` after each utterance in streaming scenarios
> - ✅ Enable VAD (`enable_vad=True`) for better speech boundary detection
> - ✅ Include sufficient silence (0.5-1s) at the end of speech segments
> 
> 📖 **See detailed guide**: [Two-Pass Best Practices](docs/TWO_PASS_BEST_PRACTICES_zh.md)
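
To illustrate the streaming advice above, here is a minimal per-utterance sketch. The `start_realtime()` and `end_realtime_session()` names follow the references in this README, while `send_audio()` is a hypothetical placeholder for however your installed version feeds PCM chunks into a session:

```python
import asyncio

from funasr_client import AsyncFunASRClient, ClientConfig
from funasr_client.models import RecognitionMode
from funasr_client.callbacks import SimpleCallback

async def per_utterance_two_pass(utterances):
    """Stream each utterance, then end the session so the offline pass emits a final result."""
    config = ClientConfig(
        server_url="ws://localhost:10095",
        mode=RecognitionMode.TWO_PASS,
        enable_vad=True,
    )
    client = AsyncFunASRClient(config=config)
    callback = SimpleCallback(
        on_partial=lambda r: print(f"[online ] {r.text}"),
        on_final=lambda r: print(f"[offline] {r.text}"),
    )

    await client.start()
    for chunks in utterances:                 # each utterance: an iterable of 16 kHz PCM chunks
        session = await client.start_realtime(callback)
        for chunk in chunks:
            await session.send_audio(chunk)   # hypothetical helper; adapt to your streaming API
        await client.end_realtime_session()   # signals end of speech so the Phase 2 result arrives
    await client.close()
```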

## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `FUNASR_WS_URL` | WebSocket server URL | `ws://localhost:10095` |
| `FUNASR_MODE` | Recognition mode (`offline`, `online`, `2pass`) | `2pass` ⭐ |
| `FUNASR_TIMEOUT` | Connection timeout | `30.0` |
| `FUNASR_MAX_RETRIES` | Max retry attempts | `3` |
| `FUNASR_SAMPLE_RATE` | Audio sample rate | `16000` |
| `FUNASR_ENABLE_ITN` | Enable inverse text normalization | `true` |
| `FUNASR_ENABLE_VAD` | Enable voice activity detection | `true` |
| `FUNASR_DEBUG` | Enable debug logging | `false` |

> 💡 **Tip**: Two-Pass Mode (`2pass`) is recommended for most real-time applications as it provides the best balance between latency and accuracy.

### Configuration File

```python
from funasr_client import ConfigManager

# Load from custom config file
config = ConfigManager.from_file("my_config.json")
client = AsyncFunASRClient(config=config.client_config)
```

## Error Handling

```python
from funasr_client.errors import (
    FunASRError,
    ConnectionError,
    AudioError,
    TimeoutError
)

try:
    result = await client.recognize_file("examples/audio/asr_example.wav")
except ConnectionError:
    print("Failed to connect to server")
except AudioError:
    print("Audio processing failed")
except TimeoutError:
    print("Request timed out")
except FunASRError as e:
    print(f"Recognition error: {e}")
```

## Troubleshooting

### Connection Issues

**Problem**: `ConnectionError: Failed to connect to server`

**Solutions**:
1. Verify the server is listening: `curl -i http://localhost:10095` (any HTTP response means the port is reachable)
2. Check server URL in config: `ws://localhost:10095` (not `http://`)
3. Test network connectivity: `ping localhost`
4. Check firewall settings

```python
# Enable debug logging to see connection details
import logging
logging.basicConfig(level=logging.DEBUG)

client = AsyncFunASRClient(
    server_url="ws://localhost:10095",
    timeout=60.0,  # Increase timeout
    max_retries=5   # More retry attempts
)
```

### Empty or No Recognition Results

**Problem**: `recognize_file()` returns empty text or no final result

**Common Causes**:
1. **Two-Pass Mode**: Missing end-of-speech signal
2. **Audio Format**: Incorrect sample rate or channels
3. **Audio Quality**: Too quiet, noisy, or non-speech content

**Solutions**:

```python
# Solution 1: Ensure proper audio format
from funasr_client import AudioProcessor, AudioConfig

processor = AudioProcessor(target_config=AudioConfig(
    sample_rate=16000,  # Match server expectation
    channels=1          # Mono audio
))
audio_data, sr = processor.load_audio_file("audio.wav")

# Solution 2: Enable VAD for better speech detection
config = ClientConfig(
    enable_vad=True,      # Detect speech boundaries
    chunk_interval=10      # Adequate processing time
)

# Solution 3: Check audio has sufficient silence at end (Two-Pass mode)
# Add 0.5-1 second silence to audio file, or use offline mode:
config = ClientConfig(mode=RecognitionMode.OFFLINE)
```

### High Latency / Slow Recognition

**Problem**: Recognition takes too long

**Solutions**:

```python
# Use Online mode for lowest latency
config = ClientConfig(
    mode=RecognitionMode.ONLINE,  # Fastest mode
    chunk_interval=5,              # Smaller chunks
    buffer_size=4096               # Smaller buffer
)

# Or optimize Two-Pass mode
config = ClientConfig(
    mode=RecognitionMode.TWO_PASS,
    chunk_interval=8,     # Reduce from default 10
    enable_vad=True       # Skip non-speech
)
```

### Audio Format Errors

**Problem**: `AudioError: Unsupported audio format`

**Solutions**:

```python
# Check supported formats
from funasr_client import AudioProcessor

processor = AudioProcessor()

# Supported: WAV, FLAC, MP3, OGG, M4A, etc.
# If format unsupported, convert first:

# Option 1: Use AudioProcessor to convert
audio_data, sr = processor.load_audio_file("audio.mp3")
processed = processor.convert_to_target_format(audio_data, sr)

# Option 2: Pre-convert with ffmpeg
# ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav
```

### Timeout Errors

**Problem**: `TimeoutError: Request timed out`

**Solutions**:

```python
# Increase timeout for large files
config = ClientConfig(
    timeout=120.0,        # 2 minutes
    max_retries=3,
    retry_delay=2.0
)

# For very large files, consider chunking or batch processing
async def process_large_file():
    client = AsyncFunASRClient(config=config)
    # Process in segments if possible
```

### Common Error Reference

| Error | Meaning | Solution |
|-------|---------|----------|
| `ConnectionError` | Cannot connect to server | Check server URL, network, firewall |
| `AudioFileNotFoundError` | File path incorrect | Verify file exists, check path |
| `AudioError` | Audio processing failed | Check format, sample rate, channels |
| `TimeoutError` | Request took too long | Increase timeout, check file size |
| `InvalidConfigurationError` | Config invalid | Check parameter values, types |
| `ResourceExhaustedError` | Connection pool full | Increase `connection_pool_size` |
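
The additional exception types listed above can be caught the same way as in the [Error Handling](#error-handling) section; a sketch assuming they are also exported from `funasr_client.errors`:

```python
from funasr_client.errors import (  # assumed export location for these classes
    AudioFileNotFoundError,
    InvalidConfigurationError,
    ResourceExhaustedError,
)

try:
    result = await client.recognize_file("examples/audio/asr_example.wav")
except AudioFileNotFoundError:
    print("Audio file not found - verify the path")
except InvalidConfigurationError as e:
    print(f"Invalid configuration: {e}")
except ResourceExhaustedError:
    print("Connection pool exhausted - increase connection_pool_size")
```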

### Getting Help

If problems persist:
1. Enable debug logging: `FUNASR_DEBUG=true`
2. Check server logs for errors
3. See [Two-Pass Best Practices](docs/TWO_PASS_BEST_PRACTICES_zh.md) for mode-specific issues
4. Open an issue: [GitHub Issues](https://github.com/alibaba-damo-academy/FunASR/issues)

## Performance Optimization

### Real-time Performance Best Practices

For optimal real-time performance, follow these recommendations:

```python
from funasr_client import AsyncFunASRClient, ClientConfig
from funasr_client.models import RecognitionMode, AudioConfig

# Optimized configuration for real-time scenarios
config = ClientConfig(
    # Core settings
    mode=RecognitionMode.TWO_PASS,  # ⭐ Best balance for real-time
    enable_vad=True,                # Reduces processing load
    chunk_interval=10,              # Optimal latency/accuracy trade-off

    # Performance settings
    auto_reconnect=True,            # Production reliability
    connection_pool_size=5,         # Connection reuse
    buffer_size=8192,               # Optimal buffer size

    # Audio optimization
    audio=AudioConfig(
        sample_rate=16000,          # Standard ASR rate
        channels=1,                 # Mono for efficiency
        sample_width=2              # 16-bit PCM
    )
)

client = AsyncFunASRClient(config=config)
```

### Performance Tuning Guidelines

| Parameter | Recommended Value | Impact |
|-----------|------------------|---------|
| `mode` | `TWO_PASS` ⭐ | Best accuracy/latency balance |
| `chunk_interval` | `10` | Standard real-time performance |
| `chunk_interval` | `5` | Lower latency, higher CPU usage |
| `chunk_interval` | `20` | Higher latency, lower CPU usage |
| `enable_vad` | `True` | Reduces unnecessary processing |
| `sample_rate` | `16000` | Optimal for most ASR models |

### Connection Pooling

```python
from funasr_client import AsyncFunASRClient, ClientConfig

# Create configuration with connection pool size
config = ClientConfig(
    server_url="ws://localhost:10095",  # Specify server URL
    connection_pool_size=10
)

# Create clients with shared configuration
client1 = AsyncFunASRClient(config=config)
client2 = AsyncFunASRClient(config=config)

# Both clients will use the same pool size configuration
```

### Audio Processing

```python
from funasr_client import AudioProcessor, AudioConfig, AsyncFunASRClient

# Create audio configuration
audio_config = AudioConfig(
    sample_rate=16000,
    channels=1
)

# Pre-process audio for better performance
processor = AudioProcessor(target_config=audio_config)

# Load and process audio file
audio_data, sample_rate = processor.load_audio_file("examples/audio/asr_example.wav")
processed_audio = processor.convert_to_target_format(audio_data, sample_rate)

# The pre-processed audio is useful when streaming raw chunks yourself;
# for whole files you can also pass the path straight to recognize_file()
client = AsyncFunASRClient()
result = await client.recognize_file("examples/audio/asr_example.wav")
```

## Testing

Run the test suite:

```bash
# Install test dependencies
pip install funasr-python[test]

# Run all tests
pytest

# Run with coverage
pytest --cov=funasr_client

# Run specific test categories
pytest -m unit
pytest -m integration
```

## Development

### Setup Development Environment

```bash
git clone https://github.com/alibaba-damo-academy/FunASR.git
cd FunASR/clients/funasr-python

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e .[dev]

# Install pre-commit hooks
pre-commit install
```

### Code Quality

```bash
# Format code
ruff format src/ tests/

# Lint code
ruff check src/ tests/

# Type check
mypy src/

# Run all quality checks
pre-commit run --all-files
```

## API Reference

### Core Classes

- **`AsyncFunASRClient`**: Main asynchronous client
- **`FunASRClient`**: Synchronous client wrapper
- **`ClientConfig`**: Client configuration
- **`AudioConfig`**: Audio processing configuration
- **`RecognitionResult`**: Recognition result container
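
The synchronous `FunASRClient` wrapper is not shown elsewhere in this README; a hypothetical sketch assuming it mirrors the async client with blocking calls (check your installed version for the exact method names):

```python
from funasr_client import FunASRClient

# Hypothetical blocking usage; method names assumed to mirror AsyncFunASRClient
client = FunASRClient(server_url="ws://localhost:10095")
result = client.recognize_file("examples/audio/asr_example.wav")
print(result.text)
client.close()
```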

### Callback System

- **`RecognitionCallback`**: Abstract callback interface
- **`SimpleCallback`**: Basic callback implementation
- **`LoggingCallback`**: Logging-based callback
- **`MultiCallback`**: Combines multiple callbacks
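
For example, `MultiCallback` can fan results out to several handlers at once; a sketch assuming it takes the callbacks to combine (constructor details may differ by version):

```python
from funasr_client import AsyncFunASRClient
from funasr_client.callbacks import SimpleCallback, LoggingCallback, MultiCallback

console = SimpleCallback(on_final=lambda r: print(f"[FINAL] {r.text}"))
log_cb = LoggingCallback()  # assumed to emit results through the logging module

# Fan out every result to both handlers (list-style constructor assumed)
callback = MultiCallback([console, log_cb])
client = AsyncFunASRClient(callback=callback)
```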

### Audio Processing

- **`AudioProcessor`**: Audio processing utilities
- **`AudioRecorder`**: Microphone recording
- **`AudioFileStreamer`**: File-based audio streaming

### Utilities

- **`ConfigManager`**: Configuration management
- **`ConnectionManager`**: Connection pooling
- **`Timer`**: Performance timing utilities

## Documentation & Guides

### Quick References ⚡
- [Two-Pass Quick Reference](docs/TWO_PASS_QUICK_REFERENCE.md) - Fast solutions for common Two-Pass mode issues
- [Examples Directory](examples/) - Comprehensive usage examples

### Detailed Guides 📖
- [Two-Pass Best Practices (中文)](docs/TWO_PASS_BEST_PRACTICES_zh.md) - Complete guide to avoid empty Phase 2 results
- API Reference (Coming soon)
- Configuration Guide (Coming soon)
- Performance Optimization (Coming soon)

### Architecture Documentation
- [FunASR WebSocket Protocol](../../runtime/docs/websocket_protocol.md)
- [Two-Pass Architecture](../../runtime/docs/funasr-wss-server-2pass-architecture.puml)

## Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for details.

### Development Process

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Run the test suite
6. Submit a pull request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for version history.

## Support

- **Documentation**: [FunASR Documentation](https://github.com/alibaba-damo-academy/FunASR)
- **Issues**: [GitHub Issues](https://github.com/alibaba-damo-academy/FunASR/issues)
- **Discussions**: [GitHub Discussions](https://github.com/alibaba-damo-academy/FunASR/discussions)

## Acknowledgments

- Built on the excellent [FunASR](https://github.com/alibaba-damo-academy/FunASR) speech recognition toolkit
- Inspired by best practices from the Python asyncio ecosystem
- Thanks to all contributors and users for feedback and improvements