Metadata-Version: 2.3
Name: singer-to-schema
Version: 0.2.1
Summary: A CLI and library to convert Singer catalogs to data warehouse schemas
Author: Andrew Jones
Author-email: Andrew Jones <andrew@andrew-jones.com>
License: MIT License
         
         Copyright (c) 2025 Andrew Jones
         
         Permission is hereby granted, free of charge, to any person obtaining a copy
         of this software and associated documentation files (the "Software"), to deal
         in the Software without restriction, including without limitation the rights
         to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
         copies of the Software, and to permit persons to whom the Software is
         furnished to do so, subject to the following conditions:
         
         The above copyright notice and this permission notice shall be included in all
         copies or substantial portions of the Software.
         
         THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
         IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
         FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
         AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
         LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
         OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
         SOFTWARE. 
Requires-Dist: pytest>=7.0.0 ; extra == 'dev'
Requires-Python: >=3.8
Project-URL: Homepage, https://github.com/andrew-jones/singer-to-schema
Project-URL: Repository, https://github.com/andrew-jones/singer-to-schema
Provides-Extra: dev
Description-Content-Type: text/markdown

# Singer to Schema

A Python library and command-line tool that converts Singer catalog JSON to BigQuery table schema format.

## Installation

```bash
pip install singer-to-schema
```

Or run it directly with `uvx`, without installing it first:

```bash
uvx singer-to-schema --help
```

## Usage

The `SingerToSchema` class takes a Singer catalog JSON string and converts it to BigQuery table schema format.

### Command Line Interface

The package provides a command-line interface for easy conversion:

```bash
# Convert catalog.json to BigQuery schema and print to stdout
singer-to-schema catalog.json

# Convert and save to output file
singer-to-schema catalog.json -o bigquery_schema.json

# Read from stdin and output to file
cat catalog.json | singer-to-schema - -o schema.json

# Pretty print the output
singer-to-schema catalog.json --pretty

# Convert object/array fields to STRING instead of JSON
singer-to-schema catalog.json --no-json-fields

# Show help
singer-to-schema --help
```

### Library Usage

```python
from singer_to_schema import SingerToSchema

# Example Singer catalog JSON
catalog_json = '''{
  "streams": [
    {
      "tap_stream_id": "users",
      "stream": "users",
      "schema": {
        "type": ["null", "object"],
        "additionalProperties": false,
        "properties": {
          "id": {
            "type": ["null", "string"]
          },
          "name": {
            "type": ["null", "string"]
          },
          "date_modified": {
            "type": ["null", "string"],
            "format": "date-time"
          }
        }
      }
    }
  ]
}'''

# Create converter instance (default: use JSON fields)
converter = SingerToSchema(catalog_json)

# Or disable JSON fields to use STRING instead
converter_no_json = SingerToSchema(catalog_json, use_json_fields=False)

# Convert to BigQuery schema format
bigquery_schema = converter.to_bigquery()
print(bigquery_schema)

# Or get as JSON string
json_schema = converter.to_bigquery_json()
print(json_schema)
```

### Output

The `to_bigquery()` method returns a dictionary with the following structure:

```json
{
  "users": {
    "fields": [
      {
        "name": "id",
        "type": "STRING",
        "mode": "NULLABLE"
      },
      {
        "name": "name",
        "type": "STRING",
        "mode": "NULLABLE"
      },
      {
        "name": "date_modified",
        "type": "TIMESTAMP",
        "mode": "NULLABLE"
      }
    ]
  }
}
```
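
Because the result is keyed by stream name, each table's field list can be pulled out on its own. The following is a minimal sketch, assuming the output structure shown above. `write_schema_files` is a hypothetical helper, not part of this library. It writes one file per stream containing only the bare array of fields, which is the shape that tools expecting a BigQuery schema file typically take.

```python
import json
import tempfile
from pathlib import Path

# Output in the shape shown above (shortened copy of the example).
bigquery_schema = {
    "users": {
        "fields": [
            {"name": "id", "type": "STRING", "mode": "NULLABLE"},
            {"name": "date_modified", "type": "TIMESTAMP", "mode": "NULLABLE"},
        ]
    }
}


def write_schema_files(schema, out_dir):
    """Hypothetical helper: write one <stream>.json file per stream.

    Each file holds only the list of field definitions for that stream.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for stream_name, table in schema.items():
        path = out_dir / f"{stream_name}.json"
        path.write_text(json.dumps(table["fields"], indent=2))
        paths.append(path)
    return paths


# Write into a throwaway temporary directory for demonstration.
written = write_schema_files(bigquery_schema, tempfile.mkdtemp())
print([p.name for p in written])
```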

## Type Mapping

The library maps Singer types to BigQuery types as follows:

| Singer Type | BigQuery Type |
|-------------|---------------|
| `string` | `STRING` |
| `integer` | `INT64` |
| `number` | `FLOAT64` |
| `boolean` | `BOOL` |
| `object` | `JSON` |
| `array` | Item type, with mode `REPEATED` |

### Date/Time Formats

When a string field has a `format` property, it is mapped to the corresponding BigQuery type:

| Format | BigQuery Type |
|--------|---------------|
| `date-time` | `TIMESTAMP` |
| `date` | `DATE` |
| `time` | `TIME` |
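
The two tables above can be read as a pair of lookups, with `format` taking precedence for string fields. This is an illustrative sketch only, not the library's internal implementation. The names `TYPE_MAP`, `FORMAT_MAP`, and `resolve_type` are hypothetical, and raising `ValueError` for unmapped types is an assumption.

```python
# Lookup tables copied from the tables above (illustrative only).
TYPE_MAP = {
    "string": "STRING",
    "integer": "INT64",
    "number": "FLOAT64",
    "boolean": "BOOL",
    "object": "JSON",
}
FORMAT_MAP = {"date-time": "TIMESTAMP", "date": "DATE", "time": "TIME"}


def resolve_type(prop):
    """Hypothetical helper: pick a BigQuery type for one Singer property."""
    types = prop.get("type", [])
    if isinstance(types, str):
        types = [types]  # Singer allows a single type or a list of types
    non_null = [t for t in types if t != "null"]
    if not non_null:
        raise ValueError("property has no non-null type")
    singer_type = non_null[0]
    # A recognised format on a string field overrides the plain STRING mapping.
    if singer_type == "string" and prop.get("format") in FORMAT_MAP:
        return FORMAT_MAP[prop["format"]]
    if singer_type not in TYPE_MAP:
        raise ValueError(f"unmapped Singer type: {singer_type}")
    return TYPE_MAP[singer_type]


print(resolve_type({"type": ["null", "string"], "format": "date-time"}))
print(resolve_type({"type": "integer"}))
```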

### Array Fields

Array fields are converted to BigQuery `REPEATED` mode with the appropriate item type:

```json
{
  "tags": {
    "type": "array",
    "items": {
      "type": "string"
    }
  }
}
```

Becomes:

```json
{
  "name": "tags",
  "type": "STRING",
  "mode": "REPEATED"
}
```
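
The transformation above can be sketched in a few lines. This is a simplified illustration covering scalar item types only. `SCALAR_TYPES` and `convert_array_field` are hypothetical names, and the library's actual handling (for example of nested objects or of the `use_json_fields` option) may differ.

```python
# Scalar mappings taken from the Type Mapping table (illustrative only).
SCALAR_TYPES = {
    "string": "STRING",
    "integer": "INT64",
    "number": "FLOAT64",
    "boolean": "BOOL",
}


def convert_array_field(name, prop):
    """Hypothetical helper: turn a Singer array property into a REPEATED field."""
    item_type = prop["items"]["type"]
    if isinstance(item_type, list):
        # Drop "null" and keep the first concrete type.
        item_type = next(t for t in item_type if t != "null")
    return {"name": name, "type": SCALAR_TYPES[item_type], "mode": "REPEATED"}


tags = {"type": "array", "items": {"type": "string"}}
print(convert_array_field("tags", tags))
```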

## API Reference

### SingerToSchema

#### `__init__(catalog_json: str, use_json_fields: bool = True)`

Initialize the converter with a Singer catalog JSON string.

**Parameters:**
- `catalog_json`: A JSON string containing Singer catalog data
- `use_json_fields`: If `True`, object and array fields use the `JSON` type. If `False`, they use the `STRING` type.

**Raises:**
- `ValueError`: If the catalog structure is invalid
- `json.JSONDecodeError`: If the JSON is malformed
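
When handling both exceptions, note that `json.JSONDecodeError` is a subclass of `ValueError`, so it needs to be caught first to tell the two cases apart. The sketch below shows that ordering. `validate_catalog` is a hypothetical stand-in that raises the same exception types as documented above, and its check for a `streams` list is an assumption about what counts as an invalid structure. In real code the call would be to `SingerToSchema(catalog_json)`.

```python
import json


def validate_catalog(catalog_json):
    """Hypothetical stand-in raising the documented exception types."""
    catalog = json.loads(catalog_json)  # json.JSONDecodeError on malformed JSON
    # Assumed structural rule: a top-level object with a "streams" list.
    if not isinstance(catalog, dict) or not isinstance(catalog.get("streams"), list):
        raise ValueError("catalog must be an object with a 'streams' list")
    return catalog


def describe_error(catalog_json):
    """Return a short message describing why a catalog was rejected."""
    try:
        validate_catalog(catalog_json)
    except json.JSONDecodeError as exc:  # must come first: subclasses ValueError
        return f"malformed JSON: {exc.msg}"
    except ValueError as exc:
        return f"invalid catalog: {exc}"
    return "ok"


print(describe_error("{"))
print(describe_error('{"foo": 1}'))
print(describe_error('{"streams": []}'))
```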

#### `to_bigquery() -> Dict[str, Any]`

Convert the Singer catalog to BigQuery table schema format.

**Returns:**
- Dictionary containing BigQuery schema for each stream

#### `to_bigquery_json() -> str`

Convert the Singer catalog to BigQuery table schema format as a JSON string.

**Returns:**
- JSON string containing BigQuery schema

## Development

### Running Tests

```bash
uv run pytest
```

## License

This project is licensed under the MIT License.
