Metadata-Version: 2.4
Name: dbt2looker-bigquery
Version: 0.14.2
Summary: Generate lookml view files from dbt models for Bigquery
License-File: LICENSE
Requires-Dist: datamodel-code-generator>=0.26.3
Requires-Dist: importlib-metadata>=4
Requires-Dist: lkml>=1.1
Requires-Dist: pre-commit>=4.0.1
Requires-Dist: pydantic>=2.9
Requires-Dist: pytest-mock>=3.14.0
Requires-Dist: pyyaml>=5.0
Requires-Dist: rich>=13.9.4
Requires-Dist: typing-extensions>=4.0
Description-Content-Type: text/markdown

# dbt2looker-bigquery (active as of 13.9.2024)
Use `dbt2looker-bigquery` to generate Looker view files automatically from dbt models in Bigquery.

This is a fork of dbt2looker that is specific to bigquery.
The intention is to allow one to define most of the simple and tedious lookml settings in dbt.
That way the lookml code gets less bloated, and can be more focused on advanced metrics and explores.

Want a deeper integration between dbt and your BI tool?
You should also checkout [Lightdash - the open source alternative to Looker](https://github.com/lightdash/lightdash)

**Features**
* Warehouses: BigQuery

* **Column descriptions** synced to looker
* **Dimension** for each column in dbt model
* **Define Dimensions** define common lookml settings in dbt like label, group label, hidden
* **Opinionated Primary key** automatically set the first column to be the primary key, and hide it.
* **Create explores for structs** automatically generate explores for complex tables with structs and arrays in them.
* **Dimension groups** for datetime/timestamp/date columns
* **Measures** defined through dbt column `metadata` [see below](#defining-measures)

## Quickstart

Run `dbt2looker` in the root of your dbt project *after compiling looker docs*.
(dbt2looker-bigquery uses docs to infer types and such)

**Generate Looker view files for all models:**
```shell
dbt docs generate
dbt2looker
```

**Generate Looker view files for all models tagged `prod`**
```shell
dbt2looker --tag prod
```

**Generate Looker view files for all exposed models **
[dbt docs - exposures](https://docs.getdbt.com/docs/build/exposures)
```shell
dbt2looker --exposed_only
```

**Generate Looker view files with hidden=yes paramenter for all models**
```shell
dbt2looker --hidden_dimensions
```

## Install

**Install from PyPi repository**

Install from pypi into a fresh virtual environment.

```
# Create virtual env
python3.7 -m venv dbt2looker-venv
source dbt2looker-venv/bin/activate

# Install
pip install dbt2looker-bigquery

# Run
dbt2looker
```

Or if you use Poetry 
```
# Create virtual env
poetry add dbt2looker-bigquery

# Run
dbt2looker
```


## Defining measures

You can define looker measures in your dbt `schema.yml` files. For example:

```yaml
models:
  - name: pages
    columns:
      - name: url
        description: "Page url"
      - name: event_id
        description: unique event id for page view
        meta:
            looker:
              hidden: True
              label: event
              group_label: identifiers
              value_format_name: id
              
            looker_measures:
              - type: count_distinct
                sql_distinct_key: ${url}
              - type: count
                value_format_name: decimal_1

```

## How to use in a production setting:
dbt2looker can be used in a production setting.
It is a bit of a project, so if you want to try to do it, prepare to spend some time.

This explanation is based on you using some kind of pipeline tool to do ci/cd on your dbt project.
I am using github actions and google cloud platform, so I will describe it as such.

after having done your continous deployment in the pipeline you can run:
(assuming that the manifest.json and catalog.json is in the target folder)
```bash
dbt docs generate
dbt2looker --target-dir ./target/ --output-dir ./looker --exposures_only
```

at this point you have a bunch of lookml files in a folder named /looker within your github action runner.
you will now need to store these files somewhere.

example using gsutil:
```bash
gsutil -m cp -r ./looker/views/ "gs://{{dbt2looker-backend}}/${{ github.ref_name }}/"
```
this will put the files in a google cloud storage bucket, in a folder named after the branch you deploy from.
So if that branch was main, you now have a lookml state stored in a folder called /main/ in your bucket.

The next step is to create a looker repository for exposing these autogenerated files in looker.
in this repo, you want a github action that syncs in the lookml files from the bucket into this project.

here is an example script:
(using some ENVIRONMENT VARIABLES)
(This script also allows for "locking" certain files from being synced, by adding #nosync anywhere in the file.)
```
          - name: Sync script
            id: find_script
            run: |
                #!/bin/bash
                set -e

                if [ -z "${GCS_AUTOGEN_BUCKET}" ]; then
                    echo "🗝️ The $ GCS_AUTOGEN_BUCKET repository variable is not set, Skipping dbt file sync. 😊"
                    exit 1
                fi

                filename="manifest.lkml"

                if [ -z "${DBT_SYNC_BRANCH}" ]; then
                    echo "🗝️ The DBT_SYNC_BRANCH variable is not set in lkml or as a repository variable. Skipping dbt file sync. 😊"
                    exit 1
                fi

                if [ -z "${SYNC_TO_TOP_LEVEL}" ]; then
                    echo "Sync to top level is not set. syncing to ./views/_autogen_dbt"
                    destination_dir="views/_autogen_dbt"
                else
                    echo "Sync to top level is set. syncing to ."
                    destination_dir="." 
                fi

                source_dir=".tmp/_autogen_dbt"
                mkdir .tmp

                mkdir $source_dir
                gsutil -m cp -r gs://${GCS_AUTOGEN_BUCKET}/${DBT_SYNC_BRANCH}/views/ $source_dir

                copy_recursive() {
                    local source="$1"
                    local destination="$2"
                    local -a stack=("$source")

                    mkdir -p "$destination" # create destination directory if it doesn't exist

                    while [ ${#stack[@]} -gt 0 ]; do
                        local last_index=$(( ${#stack[@]} - 1 ))  # Get the last index
                        local current="${stack[last_index]}"       # Get the last item in the stack
                        unset stack[last_index]                    # Remove the last item from the stack
                        stack=("${stack[@]}")                     # Rebuild the array to maintain contiguous indexing

                        echo "iterating over folder $current"
                        for item in "$current"/*; do
                            if [ -d "$item" ] && [ ! -L "$item" ]; then
                                local rel_path="${item#$source}"  # Remove source path prefix
                                rel_path="${rel_path#/}"          # Remove leading slash
                                mkdir -p "$destination/$rel_path"
                                stack+=("$item")                  # Add subdirectory to the stack
                            elif [ -f "$item" ]; then
                                local rel_path="${item#$source}"  # Remove source path prefix
                                rel_path="${rel_path#/}"          # Remove leading slash
                                local dest_path="$(dirname "$destination/$rel_path")"

                                if grep -q "#nosync" "$destination/$item"; then
                                    echo "'$item' contains '#nosync' tag and is not synced."
                                else
                                    mkdir -p "$dest_path"
                                    echo "copying $item to $dest_path"
                                    cp -f "$item" "$dest_path/"
                                fi
                            fi
                        done
                    done
                }

                copy_recursive "$source_dir" "$destination_dir"
```

Now you have the autogenerated, autoupdating files available in looker!
But how do you keep them fresh?

One option is to trigger the github action in the looker repository from the dbt repository.
One way to do this is to use a PAT-token to trigger a github action from one repo to another.

Another option is to schedule the refresh of the files in the looker-repository by running the github action for syncing files on a schedule.

So how do you use this automatically refreshing repository?
My recommendation is to import the project as a dependency into your main looker project, and ref the files from the autogen looker project.
That way you do not end up with merge conflicts and similar issues, and you can go back to previous versions of the sync project, if a problem arises.

