Metadata-Version: 2.1
Name: skypilot
Version: 0.6.1
Summary: SkyPilot: An intercloud broker for the clouds
Author: SkyPilot Team
License: Apache 2.0
Project-URL: Homepage, https://github.com/skypilot-org/skypilot
Project-URL: Issues, https://github.com/skypilot-org/skypilot/issues
Project-URL: Discussion, https://github.com/skypilot-org/skypilot/discussions
Project-URL: Documentation, https://skypilot.readthedocs.io/en/latest/
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Distributed Computing
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: wheel
Requires-Dist: cachetools
Requires-Dist: click >=7.0
Requires-Dist: colorama
Requires-Dist: cryptography
Requires-Dist: jinja2 >=3.0
Requires-Dist: jsonschema
Requires-Dist: networkx
Requires-Dist: pandas >=1.3.0
Requires-Dist: pendulum
Requires-Dist: PrettyTable >=2.0.0
Requires-Dist: python-dotenv
Requires-Dist: rich
Requires-Dist: tabulate
Requires-Dist: typing-extensions
Requires-Dist: filelock >=3.6.0
Requires-Dist: packaging
Requires-Dist: psutil
Requires-Dist: pulp
Requires-Dist: pyyaml !=5.4.*,>3.13
Requires-Dist: requests
Provides-Extra: all
Requires-Dist: urllib3 <2 ; extra == 'all'
Requires-Dist: awscli >=1.27.10 ; extra == 'all'
Requires-Dist: botocore >=1.29.10 ; extra == 'all'
Requires-Dist: boto3 >=1.26.1 ; extra == 'all'
Requires-Dist: colorama <0.4.5 ; extra == 'all'
Requires-Dist: azure-cli >=2.31.0 ; extra == 'all'
Requires-Dist: azure-core ; extra == 'all'
Requires-Dist: azure-identity >=1.13.0 ; extra == 'all'
Requires-Dist: azure-mgmt-network ; extra == 'all'
Requires-Dist: azure-storage-blob ; extra == 'all'
Requires-Dist: msgraph-sdk ; extra == 'all'
Requires-Dist: ray[default] !=2.6.0,>=2.2.0 ; extra == 'all'
Requires-Dist: google-api-python-client >=2.69.0 ; extra == 'all'
Requires-Dist: google-cloud-storage ; extra == 'all'
Requires-Dist: ibm-cloud-sdk-core ; extra == 'all'
Requires-Dist: ibm-vpc ; extra == 'all'
Requires-Dist: ibm-platform-services ; extra == 'all'
Requires-Dist: ibm-cos-sdk ; extra == 'all'
Requires-Dist: docker ; extra == 'all'
Requires-Dist: oci ; extra == 'all'
Requires-Dist: kubernetes >=20.0.0 ; extra == 'all'
Requires-Dist: protobuf !=3.19.5,>=3.15.3 ; extra == 'all'
Requires-Dist: pydantic !=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<3 ; extra == 'all'
Requires-Dist: runpod >=1.5.1 ; extra == 'all'
Requires-Dist: cudo-compute >=0.1.10 ; extra == 'all'
Requires-Dist: pyvmomi ==8.0.1.0.2 ; extra == 'all'
Requires-Dist: grpcio !=1.48.0,<=1.51.3,>=1.32.0 ; (python_version < "3.10" and sys_platform != "darwin") and extra == 'all'
Requires-Dist: grpcio !=1.48.0,<=1.49.1,>=1.32.0 ; (python_version < "3.10" and sys_platform == "darwin") and extra == 'all'
Requires-Dist: grpcio !=1.48.0,<=1.51.3,>=1.42.0 ; (python_version >= "3.10" and sys_platform != "darwin") and extra == 'all'
Requires-Dist: grpcio !=1.48.0,<=1.49.1,>=1.42.0 ; (python_version >= "3.10" and sys_platform == "darwin") and extra == 'all'
Provides-Extra: aws
Requires-Dist: urllib3 <2 ; extra == 'aws'
Requires-Dist: awscli >=1.27.10 ; extra == 'aws'
Requires-Dist: botocore >=1.29.10 ; extra == 'aws'
Requires-Dist: boto3 >=1.26.1 ; extra == 'aws'
Requires-Dist: colorama <0.4.5 ; extra == 'aws'
Provides-Extra: azure
Requires-Dist: azure-cli >=2.31.0 ; extra == 'azure'
Requires-Dist: azure-core ; extra == 'azure'
Requires-Dist: azure-identity >=1.13.0 ; extra == 'azure'
Requires-Dist: azure-mgmt-network ; extra == 'azure'
Requires-Dist: azure-storage-blob ; extra == 'azure'
Requires-Dist: msgraph-sdk ; extra == 'azure'
Requires-Dist: ray[default] !=2.6.0,>=2.2.0 ; extra == 'azure'
Provides-Extra: cloudflare
Requires-Dist: urllib3 <2 ; extra == 'cloudflare'
Requires-Dist: awscli >=1.27.10 ; extra == 'cloudflare'
Requires-Dist: botocore >=1.29.10 ; extra == 'cloudflare'
Requires-Dist: boto3 >=1.26.1 ; extra == 'cloudflare'
Requires-Dist: colorama <0.4.5 ; extra == 'cloudflare'
Provides-Extra: cudo
Requires-Dist: cudo-compute >=0.1.10 ; extra == 'cudo'
Provides-Extra: docker
Requires-Dist: docker ; extra == 'docker'
Requires-Dist: ray[default] !=2.6.0,>=2.2.0 ; extra == 'docker'
Provides-Extra: fluidstack
Provides-Extra: gcp
Requires-Dist: google-api-python-client >=2.69.0 ; extra == 'gcp'
Requires-Dist: google-cloud-storage ; extra == 'gcp'
Provides-Extra: ibm
Requires-Dist: ibm-cloud-sdk-core ; extra == 'ibm'
Requires-Dist: ibm-vpc ; extra == 'ibm'
Requires-Dist: ibm-platform-services ; extra == 'ibm'
Requires-Dist: ibm-cos-sdk ; extra == 'ibm'
Requires-Dist: ray[default] !=2.6.0,>=2.2.0 ; extra == 'ibm'
Provides-Extra: kubernetes
Requires-Dist: kubernetes >=20.0.0 ; extra == 'kubernetes'
Provides-Extra: lambda
Requires-Dist: ray[default] !=2.6.0,>=2.2.0 ; extra == 'lambda'
Provides-Extra: oci
Requires-Dist: oci ; extra == 'oci'
Requires-Dist: ray[default] !=2.6.0,>=2.2.0 ; extra == 'oci'
Provides-Extra: paperspace
Provides-Extra: remote
Requires-Dist: protobuf !=3.19.5,>=3.15.3 ; extra == 'remote'
Requires-Dist: pydantic !=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<3 ; extra == 'remote'
Requires-Dist: grpcio !=1.48.0,<=1.51.3,>=1.32.0 ; (python_version < "3.10" and sys_platform != "darwin") and extra == 'remote'
Requires-Dist: grpcio !=1.48.0,<=1.49.1,>=1.32.0 ; (python_version < "3.10" and sys_platform == "darwin") and extra == 'remote'
Requires-Dist: grpcio !=1.48.0,<=1.51.3,>=1.42.0 ; (python_version >= "3.10" and sys_platform != "darwin") and extra == 'remote'
Requires-Dist: grpcio !=1.48.0,<=1.49.1,>=1.42.0 ; (python_version >= "3.10" and sys_platform == "darwin") and extra == 'remote'
Provides-Extra: runpod
Requires-Dist: runpod >=1.5.1 ; extra == 'runpod'
Provides-Extra: scp
Requires-Dist: ray[default] !=2.6.0,>=2.2.0 ; extra == 'scp'
Provides-Extra: vsphere
Requires-Dist: pyvmomi ==8.0.1.0.2 ; extra == 'vsphere'

<p align="center">
  <img alt="SkyPilot" src="https://raw.githubusercontent.com/skypilot-org/skypilot/master/docs/source/images/skypilot-wide-light-1k.png" width=55%>
</p>

<p align="center">
  <a href="https://skypilot.readthedocs.io/en/latest/">
    <img alt="Documentation" src="https://readthedocs.org/projects/skypilot/badge/?version=latest">
  </a>

  <a href="https://github.com/skypilot-org/skypilot/releases">
    <img alt="GitHub Release" src="https://img.shields.io/github/release/skypilot-org/skypilot.svg">
  </a>

  <a href="http://slack.skypilot.co">
    <img alt="Join Slack" src="https://img.shields.io/badge/SkyPilot-Join%20Slack-blue?logo=slack">
  </a>

</p>


<h3 align="center">
    Run LLMs and AI on Any Cloud
</h3>

----
:fire: *News* :fire:
- [Jul, 2024] [Finetune](./llm/llama-3_1-finetuning/) and [serve](./llm/llama-3_1/) **Llama 3.1** on your infra
- [Jun, 2024] Reproduce **GPT** with [llm.c](https://github.com/karpathy/llm.c/discussions/481) on any cloud: [**guide**](./llm/gpt-2/)
- [Apr, 2024] Serve and finetune [**Llama 3**](https://skypilot.readthedocs.io/en/latest/gallery/llms/llama-3.html) on any cloud or Kubernetes: [**example**](./llm/llama-3/)
- [Apr, 2024] Serve [**Qwen-110B**](https://qwenlm.github.io/blog/qwen1.5-110b/) on your infra: [**example**](./llm/qwen/)
- [Apr, 2024] Using [**Ollama**](https://github.com/ollama/ollama) to deploy quantized LLMs on CPUs and GPUs: [**example**](./llm/ollama/)
- [Feb, 2024] Deploying and scaling [**Gemma**](https://blog.google/technology/developers/gemma-open-models/) with SkyServe: [**example**](./llm/gemma/)
- [Feb, 2024] Serving [**Code Llama 70B**](https://ai.meta.com/blog/code-llama-large-language-model-coding/) with vLLM and SkyServe: [**example**](./llm/codellama/)
- [Dec, 2023] [**Mixtral 8x7B**](https://mistral.ai/news/mixtral-of-experts/), a high-quality sparse mixture-of-experts model, was released by Mistral AI! Deploy via SkyPilot on any cloud: [**example**](./llm/mixtral/)
- [Nov, 2023] Using [**Axolotl**](https://github.com/OpenAccess-AI-Collective/axolotl) to finetune Mistral 7B on the cloud (on-demand and spot): [**example**](./llm/axolotl/)
- [Sep, 2023] Case study: [**Covariant**](https://covariant.ai/) transformed AI development on the cloud using SkyPilot, delivering models 4x faster and cost-effectively: [**read the case study**](https://blog.skypilot.co/covariant/)
- [Aug, 2023] **Finetuning Cookbook**: Finetuning Llama 2 in your own cloud environment, privately: [**example**](./llm/vicuna-llama-2/), [**blog post**](https://blog.skypilot.co/finetuning-llama2-operational-guide/)
- [Jun, 2023] Serving LLM 24x Faster On the Cloud [**with vLLM**](https://vllm.ai/) and SkyPilot: [**example**](./llm/vllm/), [**blog post**](https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/)

<details>
  <summary>Archived</summary>

- [Mar, 2024] Serve and deploy [**Databricks DBRX**](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm) on your infra: [**example**](./llm/dbrx/)
- [Feb, 2024] Speed up your LLM deployments with [**SGLang**](https://github.com/sgl-project/sglang) for 5x throughput on SkyServe: [**example**](./llm/sglang/)
- [Dec, 2023] Using [**LoRAX**](https://github.com/predibase/lorax) to serve 1000s of finetuned LLMs on a single instance in the cloud: [**example**](./llm/lorax/)
- [Sep, 2023] [**Mistral 7B**](https://mistral.ai/news/announcing-mistral-7b/), a high-quality open LLM, was released! Deploy via SkyPilot on any cloud: [**Mistral docs**](https://docs.mistral.ai/self-deployment/skypilot)
- [Jul, 2023] Self-Hosted **Llama-2 Chatbot** on Any Cloud: [**example**](./llm/llama-2/)
- [Apr, 2023] [SkyPilot YAMLs](./llm/vicuna/) for finetuning & serving the [Vicuna LLM](https://lmsys.org/blog/2023-03-30-vicuna/) with a single command!

</details>

----

SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, highest GPU availability, and managed execution.

SkyPilot **abstracts away cloud infra burdens**:
- Launch jobs & clusters on any cloud
- Easy scale-out: queue and run many jobs, automatically managed
- Easy access to object stores (S3, GCS, Azure, R2, IBM)

SkyPilot **maximizes GPU availability for your jobs**:
* Provision in all zones/regions/clouds you have access to ([the _Sky_](https://arxiv.org/abs/2205.07147)), with automatic failover

SkyPilot **cuts your cloud costs**:
* [Managed Spot](https://skypilot.readthedocs.io/en/latest/examples/spot-jobs.html): 3-6x cost savings using spot VMs, with auto-recovery from preemptions
* [Optimizer](https://skypilot.readthedocs.io/en/latest/examples/auto-failover.html): 2x cost savings by auto-picking the cheapest VM/zone/region/cloud
* [Autostop](https://skypilot.readthedocs.io/en/latest/reference/auto-stop.html): hands-free cleanup of idle clusters

SkyPilot supports your existing GPU, TPU, and CPU workloads, with no code changes.

Install with pip:
```bash
pip install -U "skypilot[aws,gcp,azure,oci,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp,kubernetes]"  # choose your clouds
```
To get the latest features and fixes, use the nightly build or [install from source](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html):
```bash
pip install "skypilot-nightly[aws,gcp,azure,oci,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp,kubernetes]"  # choose your clouds
```
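For scripted installs, the requirement string can be assembled from the clouds you actually use. This is a minimal sketch, assuming a hypothetical helper named `pip_requirement`; the set of extras is copied from this release's `Provides-Extra` metadata (0.6.1) and may differ in other versions:

```python
# Extras copied from this release's package metadata (Provides-Extra);
# other SkyPilot versions may declare a different set.
KNOWN_EXTRAS = frozenset({
    "all", "aws", "azure", "cloudflare", "cudo", "docker", "fluidstack",
    "gcp", "ibm", "kubernetes", "lambda", "oci", "paperspace", "remote",
    "runpod", "scp", "vsphere",
})


def pip_requirement(clouds, package="skypilot"):
    """Build a pip requirement such as 'skypilot[aws,gcp]' (hypothetical helper)."""
    unknown = sorted(set(clouds) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError(f"unknown extras: {', '.join(unknown)}")
    # Deduplicate while keeping the caller's order.
    extras = list(dict.fromkeys(clouds))
    return f"{package}[{','.join(extras)}]" if extras else package


print(pip_requirement(["aws", "gcp", "kubernetes"]))
```

Pass the resulting string to `pip install -U`, quoted so that the shell does not interpret the square brackets.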

Currently supported providers (AWS, Azure, GCP, OCI, Lambda Cloud, RunPod, Fluidstack, Paperspace, Cudo, IBM, Samsung, Cloudflare, any Kubernetes cluster):
<p align="center">
  <img alt="SkyPilot" src="https://raw.githubusercontent.com/skypilot-org/skypilot/master/docs/source/images/cloud-logos-light.png" width=85%>
</p>


## Getting Started
You can find our documentation [here](https://skypilot.readthedocs.io/en/latest/).
- [Installation](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)
- [Quickstart](https://skypilot.readthedocs.io/en/latest/getting-started/quickstart.html)
- [CLI reference](https://skypilot.readthedocs.io/en/latest/reference/cli.html)

## SkyPilot in 1 Minute

A SkyPilot task specifies: resource requirements, data to be synced, setup commands, and the task commands.

Once written in this [**unified interface**](https://skypilot.readthedocs.io/en/latest/reference/yaml-spec.html) (YAML or Python API), the task can be launched on any available cloud. This avoids vendor lock-in and makes it easy to move jobs to a different provider.

Paste the following into a file `my_task.yaml`:

```yaml
resources:
  accelerators: V100:1  # 1x NVIDIA V100 GPU

num_nodes: 1  # Number of VMs to launch

# Working directory (optional) containing the project codebase.
# Its contents are synced to ~/sky_workdir/ on the cluster.
workdir: ~/torch_examples

# Commands to be run before executing the job.
# Typical use: pip install -r requirements.txt, git clone, etc.
setup: |
  pip install "torch<2.2" torchvision --index-url https://download.pytorch.org/whl/cu121

# Commands to run as a job.
# Typical use: launch the main program.
run: |
  cd mnist
  python main.py --epochs 1
```

Prepare the workdir by cloning:
```bash
git clone https://github.com/pytorch/examples.git ~/torch_examples
```

Launch with `sky launch` (note: [access to GPU instances](https://skypilot.readthedocs.io/en/latest/cloud-setup/quota.html) is needed for this example):
```bash
sky launch my_task.yaml
```
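The same task can also be expressed through the Python API mentioned earlier. The sketch below assumes the `sky.Task`, `sky.Resources`, and `sky.launch` calls as described in the SkyPilot documentation; the function name `launch_mnist` and the cluster name `mnist-demo` are illustrative only. The import is deferred so the module can be loaded without SkyPilot installed or cloud credentials configured:

```python
# Task fields mirroring my_task.yaml.
SETUP = (
    'pip install "torch<2.2" torchvision '
    "--index-url https://download.pytorch.org/whl/cu121"
)
RUN = "cd mnist\npython main.py --epochs 1"
WORKDIR = "~/torch_examples"
ACCELERATORS = "V100:1"  # 1x NVIDIA V100 GPU


def launch_mnist(cluster_name="mnist-demo"):
    """Launch the MNIST task; needs SkyPilot and cloud credentials."""
    import sky  # deferred: only required when actually launching

    task = sky.Task(setup=SETUP, run=RUN, workdir=WORKDIR, num_nodes=1)
    task.set_resources(sky.Resources(accelerators=ACCELERATORS))
    return sky.launch(task, cluster_name=cluster_name)
```

Calling `launch_mnist()` should behave like the `sky launch my_task.yaml` command, provided `~/torch_examples` has already been cloned.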

SkyPilot then performs the heavy lifting for you, including:
1. Find the lowest-priced VM instance type across different clouds
2. Provision the VM, with auto-failover if the cloud returns capacity errors
3. Sync the local `workdir` to the VM
4. Run the task's `setup` commands to prepare the VM for running the task
5. Run the task's `run` commands
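
Steps 1 and 2 can be pictured as a sort-then-retry loop. The sketch below is a simplified illustration, not SkyPilot's actual optimizer or provisioner: the `Candidate` type, the `provision` callback, the `CapacityError` exception, and all names and prices in the catalog are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


class CapacityError(Exception):
    """Hypothetical error: the cloud has no capacity for this candidate."""


@dataclass(frozen=True)
class Candidate:
    cloud: str
    region: str
    instance_type: str
    hourly_price: float  # USD per hour; made-up values below


def launch_cheapest(candidates: Iterable[Candidate],
                    provision: Callable[[Candidate], None]) -> Candidate:
    """Try candidates from cheapest to priciest until one provisions."""
    for cand in sorted(candidates, key=lambda c: c.hourly_price):
        try:
            provision(cand)
            return cand
        except CapacityError:
            continue  # fail over to the next-cheapest option
    raise RuntimeError("no candidate could be provisioned")


# Made-up catalog; instance names and prices are placeholders.
catalog = [
    Candidate("cloud-a", "region-1", "gpu-small", 3.10),
    Candidate("cloud-b", "region-2", "gpu-small", 2.50),
    Candidate("cloud-c", "region-3", "gpu-small", 2.90),
]


def fake_provision(cand: Candidate) -> None:
    # Pretend the cheapest cloud is out of capacity.
    if cand.cloud == "cloud-b":
        raise CapacityError(cand.region)


chosen = launch_cheapest(catalog, fake_provision)
print(chosen.cloud, chosen.hourly_price)
```

Here the cheapest option reports a capacity error, so the loop falls through to the next-cheapest candidate instead of failing the launch.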

<p align="center">
  <img src="https://i.imgur.com/TgamzZ2.gif" alt="SkyPilot Demo"/>
</p>


Refer to [Quickstart](https://skypilot.readthedocs.io/en/latest/getting-started/quickstart.html) to get started with SkyPilot.

## More Information
To learn more, see our [Documentation](https://skypilot.readthedocs.io/en/latest/) and [Tutorials](https://github.com/skypilot-org/skypilot-tutorial).

<!-- Keep this section in sync with index.rst in SkyPilot Docs -->
Runnable examples:
- LLMs on SkyPilot
  - [GPT-2 via `llm.c`](./llm/gpt-2/)
  - [Llama 3](./llm/llama-3/)
  - [Qwen](./llm/qwen/)
  - [Databricks DBRX](./llm/dbrx/)
  - [Gemma](./llm/gemma/)
  - [Mixtral 8x7B](./llm/mixtral/); [Mistral 7B](https://docs.mistral.ai/self-deployment/skypilot/) (from official Mistral team)
  - [Code Llama](./llm/codellama/)
  - [vLLM: Serving LLM 24x Faster On the Cloud](./llm/vllm/) (from official vLLM team)
  - [SGLang: Fast and Expressive LLM Serving On the Cloud](./llm/sglang/) (from official SGLang team)
  - [Vicuna chatbots: Training & Serving](./llm/vicuna/) (from official Vicuna team)
  - [Train your own Vicuna on Llama-2](./llm/vicuna-llama-2/)
  - [Self-Hosted Llama-2 Chatbot](./llm/llama-2/)
  - [Ollama: Quantized LLMs on CPUs](./llm/ollama/)
  - [LoRAX](./llm/lorax/)
  - [QLoRA](https://github.com/artidoro/qlora/pull/132)
  - [LLaMA-LoRA-Tuner](https://github.com/zetavg/LLaMA-LoRA-Tuner#run-on-a-cloud-service-via-skypilot)
  - [Tabby: Self-hosted AI coding assistant](https://github.com/TabbyML/tabby/blob/bed723fcedb44a6b867ce22a7b1f03d2f3531c1e/experimental/eval/skypilot.yaml)
  - [LocalGPT](./llm/localgpt)
  - [Falcon](./llm/falcon)
  - Add yours here & see more in [`llm/`](./llm)!
- Framework examples: [PyTorch DDP](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_torch.yaml), [DeepSpeed](./examples/deepspeed-multinode/sky.yaml), [JAX/Flax on TPU](https://github.com/skypilot-org/skypilot/blob/master/examples/tpu/tpuvm_mnist.yaml), [Stable Diffusion](https://github.com/skypilot-org/skypilot/tree/master/examples/stable_diffusion), [Detectron2](https://github.com/skypilot-org/skypilot/blob/master/examples/detectron2_docker.yaml), [Distributed](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_tf_app.py) [TensorFlow](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_app_storage.yaml), [Ray Train](examples/distributed_ray_train/ray_train.yaml), [NeMo](https://github.com/skypilot-org/skypilot/blob/master/examples/nemo/nemo.yaml), [programmatic grid search](https://github.com/skypilot-org/skypilot/blob/master/examples/huggingface_glue_imdb_grid_search_app.py), [Docker](https://github.com/skypilot-org/skypilot/blob/master/examples/docker/echo_app.yaml), [Cog](https://github.com/skypilot-org/skypilot/blob/master/examples/cog/), [Unsloth](https://github.com/skypilot-org/skypilot/blob/master/examples/unsloth/unsloth.yaml), [Ollama](https://github.com/skypilot-org/skypilot/blob/master/llm/ollama), [llm.c](https://github.com/skypilot-org/skypilot/tree/master/llm/gpt-2) and [many more (`examples/`)](./examples).

Follow updates:
- [Twitter](https://twitter.com/skypilot_org)
- [Slack](http://slack.skypilot.co)
- [SkyPilot Blog](https://blog.skypilot.co/) ([Introductory blog post](https://blog.skypilot.co/introducing-skypilot/))

Read the research:
- [SkyPilot paper](https://www.usenix.org/system/files/nsdi23-yang-zongheng.pdf) and [talk](https://www.usenix.org/conference/nsdi23/presentation/yang-zongheng) (NSDI 2023)
- [Sky Computing whitepaper](https://arxiv.org/abs/2205.07147)
- [Sky Computing vision paper](https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s02-stoica.pdf) (HotOS 2021)
- [Policy for Managed Spot Jobs](https://www.usenix.org/conference/nsdi24/presentation/wu-zhanghao) (NSDI 2024)

## Support and Questions
We are excited to hear your feedback!
* For issues and feature requests, please [open a GitHub issue](https://github.com/skypilot-org/skypilot/issues/new).
* For questions, please use [GitHub Discussions](https://github.com/skypilot-org/skypilot/discussions).

For general discussions, join us on the [SkyPilot Slack](http://slack.skypilot.co).

## Contributing
We welcome and value all contributions to the project! Please refer to [CONTRIBUTING](CONTRIBUTING.md) for how to get involved.
