## Instructions to set up a GCE Disk Image for a Borealis Fireworker.
##
## NOTE:
## * Non-comment lines can be pasted into an ssh shell one section at a time.
##   Respond to input prompts and watch for errors.
## * `#` lines are instructions to carry out manually.
## * `##` lines are comments.

# * Create a Google Cloud Platform (GCP) project if you don't have one already.
#   If needed, enable billing, Compute Engine, Cloud Storage, Stackdriver
#   Logging, IAM, Container Registry, and Cloud Build.
# * In the Google Cloud Platform Console > IAM > Service Accounts
#   https://console.cloud.google.com/iam-admin/serviceaccounts create a
#   Service Account "fireworker" and grant it access to your Cloud project.
# * In the Google Cloud Platform Console > IAM
#   https://console.cloud.google.com/iam-admin/iam grant these permissions to
#   your Compute Engine default service account *and* to the fireworker service
#   account:
#       Service Account User
#       Compute Instance Admin (v1)
#       Logs Writer
#       Storage Object Admin
#       Project Viewer
# * Decide which GCP Region and Zone you want to use. Costs vary by region.
# * Each user will need a MongoDB database. The easiest way is to have users
#   create their own databases on Atlas
#   (https://matsci.org/t/heres-how-to-connect-to-atlas-mongodb/4816).
#   Alternatively, install MongoDB onto a Debian server on Compute Engine
#   (https://docs.mongodb.com/manual/installation/), or set up a Google managed
#   MongoDB service or a Marketplace installation.
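
## A minimal gcloud sketch of the service account and IAM steps above, as an
## alternative to the web console. Run it on your local machine, not on the VM.
## It assumes the project is already selected via `gcloud config set project`
## and that these role IDs correspond to the role names listed above.
## Uncomment the `###` lines to use them.
### PROJECT=$(gcloud config get-value core/project)
### SA="fireworker@${PROJECT}.iam.gserviceaccount.com"
### gcloud iam service-accounts create fireworker --display-name="fireworker"
### for ROLE in iam.serviceAccountUser compute.instanceAdmin.v1 \
###     logging.logWriter storage.objectAdmin viewer; do
###   gcloud projects add-iam-policy-binding "$PROJECT" \
###       --member="serviceAccount:${SA}" --role="roles/${ROLE}"
### done
## Then repeat the loop with SA set to the email address of your Compute Engine
## default service account.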


## =========================================================================
## TO CREATE A COMPUTE ENGINE DISK IMAGE
## This Disk Image will be the basis for creating Compute Engine VM instances.
## It will start an Image Family so you can swap in upgraded Images when needed.

# * Create a Compute Engine VM instance via the Google Cloud Platform web
#   console https://console.cloud.google.com/compute/instancesAdd or via a
#   `gcloud compute instances create` command line.
#     Name: fireworker
#     Region & Zone: <where you want to run everything>
#     Machine family/series: N1 n1-standard-1 [These steps create a Disk Image.
#       Its machine type won't affect the VM instances created from it.]
#     Boot disk: Debian 10 (or later), new 30 GB persistent disk. [Do NOT pick
#       "COS" Container-Optimized OS. COS doesn't have a package manager and is
#       not built to install software. The Disk Image's boot disk type won't
#       affect the VM instances created from it. In the future, you can increase
#       the disk size.]
#     Identity and API access: "fireworker" service account.
#     Open the "Management, security, disks, networking, sole tenancy" section.
#       Management > Description: Fireworks worker node
#       Networking > Network interfaces: You can set External IP to None for
#         better security from internet attackers if you set up GCP's
#         Identity-Aware Proxy (IAP) (so you can still ssh to these VMs) and set
#         up Cloud NAT (so these VMs can access PyPI, Docker repos, etc.).
#   [Create]
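
## A hedged command-line sketch of the settings above, for use instead of the
## web console. Run it on your local machine. The ZONE and IMAGE_FAMILY values
## and the `--scopes=cloud-platform` choice are assumptions; substitute your own.
## Uncomment the `###` lines to use them.
### ZONE=us-west1-b
### IMAGE_FAMILY=debian-10
### PROJECT=$(gcloud config get-value core/project)
### gcloud compute instances create fireworker \
###     --zone="$ZONE" \
###     --machine-type=n1-standard-1 \
###     --image-project=debian-cloud --image-family="$IMAGE_FAMILY" \
###     --boot-disk-size=30GB \
###     --service-account="fireworker@${PROJECT}.iam.gserviceaccount.com" \
###     --scopes=cloud-platform \
###     --description="Fireworks worker node"
## To set External IP to None, add `--no-address` to the command above. With IAP
## set up, you can then connect via:
### gcloud compute ssh fireworker --tunnel-through-iap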

## Connect to the GCE VM to run the setup steps, below.
gcloud compute ssh fireworker

sudo apt update && sudo apt upgrade && sudo apt autoremove

sudo apt install -y docker.io
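## NOTE: If apt cannot find `python-openssl` (e.g. on Debian 11 or later), use
## `python3-openssl` instead.
sudo apt install -y make build-essential libssl-dev zlib1g-dev libbz2-dev \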
    libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev \
    libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev \
    liblzma-dev python-openssl nano


## Reboot the GCE VM to ensure the apts take effect.
sudo reboot
## Give it a few seconds to reboot, then reconnect.
gcloud compute ssh fireworker


## NOTE: You might have to install the gcloud component docker-credential-gcr to
## pull GCR Docker images. (GCR is the Google Container Registry where you can
## store Docker Images.) If you selected Ubuntu instead of Debian, installing
## gcloud components requires uninstalling google-cloud-sdk and reinstalling it
## per https://cloud.google.com/sdk/docs/downloads-interactive since the Ubuntu
## snap package doesn't support installing components or updates. See the
## April 16, 2021 version of this file in git for instructions to do that.

## Enable the docker service.
sudo systemctl enable docker
sudo systemctl start docker

## Join the `docker` group so you can run docker commands without `sudo`.
sudo usermod -aG docker "$USER"

## To make that take effect, disconnect and reconnect.
exit
gcloud compute ssh fireworker

## Test docker
docker --version
docker info
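
## Optionally confirm that docker can pull and run an image without `sudo`.
## (This check is an addition to the original steps. It assumes the VM can reach
## Docker Hub. The second command removes the test image afterwards.)
docker run --rm hello-world
docker image rm hello-world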

## Set up to authenticate to GCR Docker repositories.
## You can pass specific GCR repo hostnames to this command, e.g.
##   gcloud auth configure-docker gcr.io,eu.gcr.io,us.gcr.io,asia.gcr.io
echo y | gcloud auth configure-docker

## Pull any Docker Images you want preinstalled for faster task startup.
##
## Tip: If each developer's application Docker Image is built 'FROM' a lower
## level Image that's common to the team (containing runtime components like
## Linux apts, Python, and pips), then preinstalling the runtime Image will
## save Docker Image build time as well as task startup time.
### Example: docker pull python:3.8.9
### Example: docker pull gcr.io/$(gcloud config get-value core/project)/my-app-runtime


## Install the Monitoring Agent so you can monitor memory & disk space utilization.
curl -sSO https://dl.google.com/cloudagents/add-monitoring-agent-repo.sh
sudo bash add-monitoring-agent-repo.sh --also-install
sudo service stackdriver-agent start


## ------------------------------------------------------------------------
## Create and switch to the fireworker user that'll run the borealis service.
sudo adduser --disabled-password fireworker
sudo usermod -aG docker fireworker
sudo su -l fireworker

## Set up fireworker to authenticate to GCR Docker repositories.
echo y | gcloud auth configure-docker

## Set up the software environment for borealis-fireworker
curl -L https://github.com/pyenv/pyenv-installer/raw/master/bin/pyenv-installer | bash
echo -e 'export PYENV_ROOT="$HOME/.pyenv"\nexport PATH="$PYENV_ROOT/bin:$PATH"\neval "$(pyenv init --path)"\n\n'"$(cat .profile)" > .profile
echo 'eval "$(pyenv init -)"' >> ~/.bashrc
echo -e "alias ll='ls -l'\nalias la='ls -A'" >> ~/.bash_aliases

## Log out from the fireworker account and log back in to apply these changes.
exit
sudo su -l fireworker

mkdir ~/borealis && cd ~/borealis

pyenv install 3.8.9

pyenv global 3.8.9
pyenv local 3.8.9
pip install --upgrade pip setuptools virtualenv virtualenvwrapper virtualenv-clone wheel

pyenv virtualenv fireworker
pyenv local fireworker
pip install --upgrade pip setuptools virtualenv virtualenvwrapper virtualenv-clone wheel borealis-fireworks
pyenv rehash
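
## Optional sanity check (a sketch, not part of the original steps): confirm
## that the "fireworker" virtualenv is selected in this directory and that the
## borealis-fireworks package is installed in it.
pyenv version
python --version
pip show borealis-fireworks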

## Find the borealis setup files that we'll need to copy.
SETUP=$(fireworker --setup)
echo "The borealis-fireworker setup files are in: $SETUP"

## Install the pip package versions selected for borealis-fireworks, which might
## differ from the versions installed above.
pip install -r ${SETUP}/requirements.txt

## Copy the launchpad yaml file from the borealis setup directory.
cp ${SETUP}/example_my_launchpad.yaml my_launchpad.yaml

## Configure fireworker's my_launchpad.yaml
#
# Edit my_launchpad.yaml (e.g. using nano) to set the default host and port for
# connecting to the MongoDB instance you set up, if any, e.g.:
#     host: mongo
#     port: 27017
#
# (The database connection fields
#     name:
#     username:
#     password:
# will normally be overridden by metadata attributes set when launching the GCE
# VMs, but you can set them here for a default database.)
#
# ... and optionally add settings like
#     idle_for_rockets: 900
#     idle_for_waiters: 3600
# to adjust how long Fireworker will idle before exiting. The Fireworker service
# will idle up to `idle_for_rockets` seconds (default 15 * 60) waiting for
# READY-to-run rockets to appear in the queue, or up to `idle_for_waiters`
# seconds (default 60 * 60, >= idle_for_rockets) waiting for WAITING rockets to
# become READY, that is, for queued rockets that are waiting on upstream rockets
# to finish. Consider increasing `idle_for_waiters` when your workflow contains
# tasks that wait longer than that for prerequisite tasks.
#
# ... and optionally set
#     logdir: /home/fireworker/fw/logs
# for local Fireworks log files.
nano my_launchpad.yaml

## Copy the startup.sh script and the service file.
cp ${SETUP}/startup.sh ${SETUP}/borealis-fireworker.service .

## Log out from the fireworker account.
exit

### ------------------------------------------------------------------------

# Follow the instructions in
# /home/fireworker/borealis/borealis-fireworker.service
# to set up that systemd service.

### ------------------------------------------------------------------------

## MAKE A DISK IMAGE IN THE DISK IMAGE FAMILY "fireworker":

## Start up the service and check its journal for problems:
sudo systemctl start borealis-fireworker
sudo journalctl -u borealis-fireworker --no-pager
sudo systemctl enable borealis-fireworker

## Stop the service and the server.
sudo systemctl stop borealis-fireworker
sudo shutdown -h now

# * Watch the Compute Engine > VM instances page to see when this VM has fully stopped.
# * Click on the stopped VM to open its "VM instance details" page.
# * Remove any "Custom metadata" values (set by launching the VM via the gce
#   command) such as the MongoDB "host", "uri_mode", or "db" that you don't want
#   to override fields of my_launchpad.yaml.
# * Click the disk "fireworker" (or "fireworker-0") in the Compute Engine > Disks
#   console page, https://console.cloud.google.com/compute/disks
# * Click "CREATE IMAGE".
#   * Name: fireworker-v0  -- picking the next number in the fireworker series
#   * Family: fireworker   <== MUST DO THIS. The launch-workers script will
#     instantiate workers by the Disk Image Family name.
#   * Description: Fireworks worker node  -- documentation
#   * Click [Create].
# * When it finishes:
#   * Test the Disk Image by running a test workflow, or by starting a VM from
#     it via the gce command, connecting to it via ssh, and checking its journal.
#   * Delete the stopped "fireworker" VM (including its boot disk) using the
#     console or the gcloud command line.
#   * Consider pruning the "fireworker-v*" Disk Images down to 2.
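
## A hedged gcloud sketch of the console steps above. Run it on your local
## machine. The zone, disk name, and image names are placeholder assumptions;
## check them against your project. Uncomment the `###` lines to use them.
### ZONE=us-west1-b
## This prints TERMINATED once the VM has fully stopped:
### gcloud compute instances describe fireworker --zone="$ZONE" \
###     --format='value(status)'
### gcloud compute images create fireworker-v0 \
###     --source-disk=fireworker --source-disk-zone="$ZONE" \
###     --family=fireworker --description="Fireworks worker node"
## After testing the new Disk Image, delete the VM and review the old Images:
### gcloud compute instances delete fireworker --zone="$ZONE"
### gcloud compute images list --filter="family=fireworker"
### gcloud compute images delete OLD_IMAGE_NAME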


## =========================================================================
## TO UPDATE THIS DISK IMAGE IN THE FUTURE
## First start a VM from this Disk Image using the gce.py script (or use the
## GCE console or gcloud command line to create a VM "fireworker-0" from the
## Disk Image Family "fireworker"):
gce fireworker

## Connect to it:
gcloud compute ssh fireworker-0

## Stop the service:
sudo systemctl stop borealis-fireworker

## Upgrade apts:
## (If it says "Resource temporarily unavailable", wait and try again.)
sudo apt update && sudo apt upgrade && sudo apt autoremove

## If you upgraded any apts or if it printed `*** System restart required ***`:
sudo reboot
## ... and wait a few seconds, then reconnect:
gcloud compute ssh fireworker-0
sudo systemctl stop borealis-fireworker

## Trim the service journal.
sudo journalctl --vacuum-time=2d

## Pull any Docker images that you want to preload for speed.
docker pull python:3.8.9  # or similar image(s) including your project's runtime Image

## Prune Docker images:
docker image prune

## Update the fireworker pips:
sudo su -l fireworker
cd ~/borealis
pip install --upgrade pip setuptools wheel borealis-fireworks
pip install -r $(fireworker --setup)/requirements.txt
pyenv rehash

## Log out from the fireworker account:
exit

# * Repeat the "Make a disk image" steps, above.
# * Test the new disk image. If it doesn't work, you can mark it "deprecated" in
#   the Image Family so new launches will use a previous Disk Image.
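
## A hedged sketch of that fallback using gcloud, run on your local machine. The
## image name fireworker-v1 is a placeholder for the Disk Image to retire.
## Uncomment the `###` lines to use them.
### gcloud compute images deprecate fireworker-v1 --state=DEPRECATED
## Show which Disk Image the family now resolves to:
### gcloud compute images describe-from-family fireworker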
