Metadata-Version: 2.4
Name: visual-width
Version: 0.1.0
Summary: A Python library for calculating visual width of text, especially for subtitle display
License: MIT
Project-URL: Homepage, https://github.com/speech2srt/visual-width
Project-URL: Repository, https://github.com/speech2srt/visual-width
Project-URL: Issues, https://github.com/speech2srt/visual-width/issues
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# visual-width

A lightweight Python library for calculating visual width of text, especially designed for subtitle display.

## Features

- **Accurate width calculation**: Calculates visual width based on Unicode properties and character characteristics
- **Multi-language support**: Supports CJK (Chinese, Japanese, Korean), Latin, Arabic, Hebrew, and many other scripts
- **Subtitle-optimized**: Designed specifically for subtitle text width calculation
- **CJK comprehensive**: Full support for all CJK character ranges including extensions A-I
- **Zero dependencies**: Uses only Python standard library (`unicodedata`, `math`, `functools`)
- **Type safety**: Full type hints for better IDE support
- **Performance**: Uses LRU cache for efficient repeated calculations

## Installation

```bash
pip install visual-width
```

## Quick Start

### Calculate visual width of text

```python
from visual_width import VisualWidth

# Calculate width of English text
width = VisualWidth.calc("Hello, world!")
print(width)  # 13.0

# Calculate width of Chinese text
width = VisualWidth.calc("你好，世界")
print(width)  # 10.0 (5 characters × 2.0 width each)

# Calculate width of mixed text
width = VisualWidth.calc("Hello 世界")
print(width)  # 12.0 (5 + 1 + 6)
```

### Character width mapping

The library uses the following width mapping:

- **CJK characters** (Chinese, Japanese, Korean): 2.0
- **Latin lowercase letters**: ~1.0 (varies by character)
- **Latin uppercase letters**: ~1.15 (except I=0.4, M/W=1.3)
- **Numbers**: 0.9
- **Space**: 0.3
- **Very narrow characters** (i, l, I, etc.): 0.4
- **Narrow characters** (f, j, t, r, etc.): 0.6
- **Wide characters** (m, w, M, W, etc.): 1.3

### Return value

The `calc()` method returns a float value rounded up to one decimal place (always rounds up).

Examples:

- `1.23` → `1.3`
- `1.21` → `1.3`
- `1.20` → `1.2`

## API Reference

### VisualWidth

Main utility class for visual width calculation. All methods are static.

#### `VisualWidth.calc(text)`

Calculate the visual width of text for subtitle display.

This method calculates the visual width of text (in character units), where 1.0 represents the width of a standard Latin lowercase letter.

**Parameters:**

- `text` (str): Input text string.

**Returns:**

- `float`: Visual width in character units, rounded up to one decimal place.

**Examples:**

```python
from visual_width import VisualWidth

# English text
width = VisualWidth.calc("Hello")
print(width)  # 5.0

# Chinese text
width = VisualWidth.calc("你好")
print(width)  # 4.0

# Mixed text
width = VisualWidth.calc("Hello 世界")
print(width)  # 12.0

# Text with numbers and punctuation
width = VisualWidth.calc("Price: $99.50")
print(width)  # 13.0

# Empty string
width = VisualWidth.calc("")
print(width)  # 0.0
```

#### Class Constants

The `VisualWidth` class provides the following constants for character width categories:

- `VisualWidth.VERY_NARROW_CHARS`: Set of very narrow characters (width 0.4)
- `VisualWidth.NARROW_CHARS`: Set of narrow characters (width 0.6)
- `VisualWidth.WIDE_CHARS`: Set of wide characters (width 1.3)

These constants can be accessed directly if needed for custom logic.

## Supported Character Sets

### CJK Characters

- **CJK Unified Ideographs**: 0x4E00-0x9FFF
- **CJK Extension A-I**: All extension blocks (0x3400-0x323AF, 0x2EBF0-0x2EE5F)
- **CJK Compatibility Ideographs**: 0xF900-0xFAFF, 0x2F800-0x2FA1F
- **Hiragana**: 0x3040-0x309F
- **Katakana**: 0x30A0-0x30FF
- **Hangul Syllables**: 0xAC00-0xD7AF (plus extensions)
- **Bopomofo (Zhuyin)**: 0x3100-0x312F

### Other Scripts

- **Latin**: Standard and extended Latin characters
- **Greek**: Greek letters (uppercase: 1.1, lowercase: 1.0)
- **Cyrillic**: Cyrillic letters (uppercase: 1.15, lowercase: 1.0)
- **Arabic**: Arabic letters (0.8 width)
- **Hebrew**: Hebrew letters (0.9 width)
- **Thai**: Thai characters (0.9 width)
- **Devanagari**: Devanagari script (0.9 width)
- **Emoji**: Emoji and symbols (2.0 width)

## Requirements

- Python >= 3.10

No external dependencies required. This package uses only Python standard library modules (`unicodedata`, `math`, `functools`).

## License

MIT License
