Computing with the Planetary Computer¶
The core components of the Planetary Computer are the datasets and APIs for querying them. This document provides an overview of the various ways you can compute on data hosted by the Planetary Computer.
Regardless of how you compute on the data, to ensure maximum efficiency you should locate your compute as close to the data as possible. Most of the Planetary Computer Data Catalog is hosted in Azure’s West Europe region, so your compute should be there too.
Use your own compute¶
For production workloads, we recommend deploying your own compute, which gives you more control over the hardware and software environment.
See Scaling with Dask for an introduction to Dask. This setup was pioneered by the Pangeo Community. The Pangeo Cloud documention provides additional background on how to use Dask-enabled JupyterHubs.
Using Azure Machine Learning¶
If you have an existing Azure Machine Learning workspace, you can use it to access data and APIs hosted by the Planetary Computer. Here we show accessing the Planetary Computer’s Metadata API from Azure Machine Learning Studio.
Under this scenario, we’re using Azure Machine Learning Studio to connect to a virtual machine running in Azure. That virtual machine has a high-bandwidth connection to the Planetary Computer Data and Metadata APIs.
Using Dask Cloud Provider¶
Users with requiring specialized software environments or a lot of compute can use their own resources to access the Planetary Computer’s Data and Metadata APIs.
In this example, we use Dask Cloud Provider
to create a Dask cluster with just an Azure subscription. After following the setup instructions at https://cloudprovider.dask.org/en/latest/azure.html, you can create your cluster:
>>> from dask_cloudprovider.azure import AzureVMCluster
>>> cluster = AzureVMCluster(resource_group="<resource group>",
... vnet="<vnet>",
... security_group="<security group>",
... n_workers=1)
Creating scheduler instance
Assigned public IP
Network interface ready
Creating VM
Created VM dask-5648cc8b-scheduler
Waiting for scheduler to run
Scheduler is running
Creating worker instance
Network interface ready
Creating VM
Created VM dask-5648cc8b-worker-e1ebfc0e
and connect to it
>>> from dask.distributed import Client
>>> client = Client(cluster)
Like the previous setup, the Dask scheduler and workers are running in Azure near the data. The local client might be outside of Azure.