Deploy your own JupyterHub

Deploying your own JupyterHub is a good option for a team of users working with data from the Planetary Computer who need a specialized environment, require additional compute resources, or want to tie a compute environment into a broader Azure deployment while still using data from the Planetary Computer.

In this guide you will:

  • Deploy an AKS cluster using the Azure CLI

  • Deploy JupyterHub and Dask Gateway using the daskhub Helm chart.

We describe two deployment scenarios, a simple and a recommended deployment. If you’re new to Azure, Kubernetes, or JupyterHub, then you should try the simple deployment to verify that the basics work, before moving on to the more advanced recommended deployment. Finally, the configuration for the Planetary Computer based JupyterHub is available on GitHub, which provides a reference for a real-world deployment.

For background, we recommend reading the Zero to JupyterHub with Kubernetes guide and the Dask Gateway on Kubernetes documentation.

Note

Make sure to use the westeurope region for your Azure resources. This will place your compute in the same region as the Planetary Computer’s data.

Prerequisites

We’ll assume that you’ve completed the prerequisites for creating an AKS cluster: you have an Azure subscription and have installed the Azure CLI, kubectl, and Helm.
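
If you want to double-check your local tooling before starting, the following quick sanity check (assuming the Azure CLI, kubectl, and Helm are already installed and on your PATH) prints their versions and confirms which subscription you’re logged in to:

# Confirm the CLI tools are available
$ az version
$ kubectl version --client
$ helm version

# Log in and confirm the subscription you'll deploy into
$ az login
$ az account show --output table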

Simple deployment

This section walks through the simplest possible deployment, which lacks basic features like authentication, HTTPS, and a user-friendly DNS name. We recommend trying this deployment to make sure the tools work, then deleting it and moving on to the recommended deployment.

Kubernetes cluster

Following the Kubernetes quickstart, we’ll use the Azure CLI to create an AKS cluster.

For ease of reading we repeat the steps here, but visit the guide to understand what each command does. We’ll delete this simple deployment shortly, so we recommend creating a brand-new resource group to make cleanup easy.

# Create a Resource group
$ az group create --name pangeo --location westeurope
{
 "id": "/subscriptions/<guid>/resourceGroups/pangeo",
 "location": "westeurope",
 "managedBy": null,
 "name": "pangeo",
 "properties": {
   "provisioningState": "Succeeded"
 },
 "tags": null
}

# Create an AKS cluster
$ az aks create --resource-group pangeo --name pangeoCluster --generate-ssh-keys \
 --node-count=1 --enable-cluster-autoscaler --min-count=1 --max-count=5

# Get credentials for kubectl / helm
$ az aks get-credentials --name pangeoCluster --resource-group pangeo

Notice that the resource group was created with --location westeurope; the AKS cluster inherits its resource group’s region, which keeps the compute nodes in the same Azure region as the Planetary Computer’s data.

At this point, you should have a Kubernetes cluster up and running. Verify that things are working with kubectl.

$ kubectl get node
NAME                                STATUS   ROLES  AGE   VERSION
aks-nodepool1-26963941-vmss000000   Ready    agent   1m   v1.19.11

JupyterHub and Dask Gateway

Now we’re ready to install JupyterHub and Dask Gateway on our AKS cluster using the daskhub Helm chart. Visit the documentation at https://github.com/dask/helm-chart/tree/main/daskhub for more background about the chart, and https://helm.sh/ for information about Helm.

Download or update the daskhub Helm chart

$ helm repo add dask https://helm.dask.org
$ helm repo update

Generate a secret token

Dask Gateway needs a token to authenticate with JupyterHub.

$ openssl rand -hex 32
<secret token>
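
If you’d rather not paste the token into a file by hand, one option (a sketch; the --set paths below correspond to the two apiToken keys in the configuration that follows) is to capture it in a shell variable and pass it to Helm at install time:

$ export API_TOKEN=$(openssl rand -hex 32)

# Add these flags to the helm upgrade command shown below:
#   --set jupyterhub.hub.services.dask-gateway.apiToken=$API_TOKEN \
#   --set dask-gateway.gateway.auth.jupyterhub.apiToken=$API_TOKEN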

Install daskhub

We’ll install daskhub into a new dhub namespace, but you can use whatever namespace you like. Create a configuration file to pass to helm, substituting the <secret token> you generated earlier.

# file: config.yaml
jupyterhub:
  hub:
    # Needed for dask-gateway<=0.9.0. https://github.com/dask/helm-chart/issues/142
    networkPolicy:
      enabled: false
  
    services:
      dask-gateway:
        apiToken: "<secret token from above>"
   
dask-gateway:
  gateway:
    auth:
      jupyterhub:
        apiToken: "<secret token from above>"
    backend:
      image:
        name: pangeo/base-notebook
        tag: 2021.06.05

Now we can deploy JupyterHub and Dask Gateway with Helm.

$ helm upgrade --wait --install --create-namespace \
      dask dask/daskhub \
      --namespace dhub \
      --values config.yaml

Release "dask" does not exist. Installing it now.
NAME: dask
LAST DEPLOYED: Fri Jun  4 14:21:33 2021
NAMESPACE: dhub
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
DaskHub
-------

Thank you for installing DaskHub, a multiuser, Dask-enabled JupyterHub!

Your release is named dask and installed into the namespace dhub.


Jupyter Hub
-----------

You can find if the hub and proxy is ready by doing:

kubectl --namespace=dhub get pod

and watching for both those pods to be in status 'Ready'.

You can find the public IP of the JupyterHub by doing:

kubectl --namespace=dhub get svc proxy-public

It might take a few minutes for it to appear!

The instructions printed above show how to get the IP address of your hub.
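
If you want just the IP address itself, a jsonpath query against the proxy-public service returns it once Azure has provisioned the load balancer (the field may be empty for the first few minutes):

$ kubectl --namespace=dhub get svc proxy-public \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'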

Warning

This simple deployment doesn’t have any kind of authentication. See the recommended deployment for instructions on creating a deployment with authentication.

When you log in (using any username and password, since we don’t have authentication yet) and start a notebook server, you should be able to connect to the Dask Gateway server and create a cluster.

>>> from dask_gateway import Gateway
>>> gateway = Gateway()
>>> gateway.list_clusters()
[]
>>> cluster = gateway.new_cluster()
>>> client = cluster.get_client()
>>> cluster.scale(1)

After a moment, the Dask scheduler and worker pods should start up. Check the pods with kubectl -n dhub get pods.

Cleanup

The easiest way to clean up the resources is to delete the resource group.

$ az group delete -n pangeo
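
Deleting the resource group removes the cluster, but the credentials written to your kubeconfig by az aks get-credentials remain. Optionally, tidy them up with kubectl:

$ kubectl config delete-context pangeoCluster
$ kubectl config delete-cluster pangeoCluster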

Recommended deployment

Azure resources

In this section we’ll use the Azure CLI to create a resource group and AKS cluster, and the Azure Portal to create an app registration. This is a nice way to build familiarity with the underlying services, but you might consider using a tool like Terraform with its Azure provider to manage your Azure resources instead.

Create a resource group

# Create a resource group
$ az group create --name pangeo --location westeurope
{
 "id": "/subscriptions/<subscriptionId>/resourceGroups/pangeo",
 "location": "westeurope",
 "managedBy": null,
 "name": "pangeo",
 "properties": {
   "provisioningState": "Succeeded"
 },
 "tags": null,
}

Create an app registration

To authenticate users, we’ll create an app registration for the Microsoft Identity Platform in the Azure Portal following these instructions. In this example, the sign-in audience will be accounts in this organizational directory only. This is appropriate when you are administering a Hub for other users within your tenant. By default, all users within your directory will be able to log into your Hub. You can manage access using Azure Active Directory groups.

When creating a new app registration, you’ll be asked for a redirect URI. This URI should match where your users will access the Hub. If your organization already has a DNS provider, use that. Alternatively, you can have Azure handle the DNS for your Hub service automatically, which is what we’ll use in this guide. We’re calling our cluster pangeo-hub and deploying it in West Europe, so the callback URL is https://pangeo-hub.westeurope.cloudapp.azure.com/hub/oauth_callback. In general the pattern is https://<hub-name>.<azure-region>.cloudapp.azure.com/hub/oauth_callback.

If you need to further customize the platform settings, do so under the “Web” platform. The JupyterHub server will be the web server in this context.

Finally, create a client secret to pass to JupyterHub: Under the Manage section, select Certificates and Secrets then New client secret. We’ll use the Value later on. You will also need the app registration’s Client ID and Tenant ID, which are available on the app registration’s main page, under Essentials.
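
If you prefer to script this step rather than click through the Portal, a recent Azure CLI can create the registration and client secret as well. The following is only a sketch (the flags assume Azure CLI 2.37 or later; check az ad app create --help for your version):

# Create the app registration with the OAuth callback as a web redirect URI
$ az ad app create \
    --display-name pangeo-hub \
    --sign-in-audience AzureADMyOrg \
    --web-redirect-uris "https://pangeo-hub.westeurope.cloudapp.azure.com/hub/oauth_callback"

# Create a client secret; the "password" field in the output is the secret Value
$ az ad app credential reset --id <client-id>

# Look up your directory's Tenant ID
$ az account show --query tenantId --output tsv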

To summarize, we now have our app registration’s:

  • Client ID

  • Tenant ID

  • Client secret

  • OAuth callback URL

For more on authentication see Authentication and Authorization in the JupyterHub documentation, in particular the section on Azure AD.

Create a Kubernetes Cluster

Now we’ll create a Kubernetes cluster. Compared to the simple deployment, this one has three node pools:

  • A “core” pool for JupyterHub pods (the Hub, etc.) and Kubernetes itself

  • A “user” pool for user pods and Dask schedulers

  • A preemptible “worker” pool for Dask workers

# Create an AKS cluster
$ az aks create --resource-group pangeo --name pangeoCluster --generate-ssh-keys \
 --node-count=1 \
 --nodepool-name core \
 --nodepool-labels hub.jupyter.org/node-purpose=core

# Add a node pool: one for the users and Dask schedulers
$ az aks nodepool add \
   --name users \
   --cluster-name pangeoCluster \
   --resource-group pangeo \
   --enable-cluster-autoscaler \
   --node-count 1 \
   --min-count 0 --max-count 10 \
   --node-vm-size Standard_D2s_v3 \
   --labels hub.jupyter.org/node-purpose=user

# Add a node pool for Dask workers
$ az aks nodepool add \
   --name workers \
   --cluster-name pangeoCluster \
   --resource-group pangeo \
   --enable-cluster-autoscaler \
   --node-count 1 \
   --min-count 0 --max-count 50 \
   --node-vm-size Standard_D2s_v3 \
   --priority Spot \
   --eviction-policy Delete \
   --spot-max-price -1 \
   --labels="k8s.dask.org/dedicated=worker"

At this point, you should have a functioning Kubernetes cluster with multiple node pools. For example:

$ az aks get-credentials \
   --name pangeoCluster \
   --resource-group pangeo \
   --output table

$ kubectl get node
NAME                              STATUS   ROLES   AGE     VERSION
aks-core-26963941-vmss000000      Ready    agent   15m     v1.19.11
aks-users-26963941-vmss000000     Ready    agent   8m19s   v1.19.11
aks-workers-26963941-vmss000000   Ready    agent   3m3s    v1.19.11
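
You can also confirm that the node-purpose and dedicated labels were applied, since the scheduling configuration below relies on them (-L adds the label values as extra columns):

$ kubectl get node \
    -L hub.jupyter.org/node-purpose \
    -L k8s.dask.org/dedicated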

Deploy DaskHub

Now that we have our Azure resources in place, we can deploy JupyterHub and Dask Gateway.

Get the Helm chart

Download or update the daskhub helm chart.

$ helm repo add dask https://helm.dask.org
$ helm repo update

Generate a secret token

Dask Gateway needs a token to authenticate with JupyterHub.

$ openssl rand -hex 32
<secret token>

Create a configuration file

This configuration file is used to customize the deployment with Helm. You can start with the reference config file.

Warning

For simplicity, we’ve included all of the configuration values, including sensitive ones, in a single config.yaml file. We recommend keeping the sensitive values in a separate, encrypted file that is decrypted only when deploying.
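
One way to do that (a sketch, assuming bash and an encryption tool such as sops) is to move the two apiToken values and the client_secret into a small secrets file with the same YAML structure, encrypt it, and decrypt it only for the duration of the helm command. Helm merges values files with later files taking precedence, so the secrets file only needs the keys it overrides:

$ helm upgrade --wait --install --create-namespace \
      dask dask/daskhub \
      --namespace dhub \
      --values config.yaml \
      --values <(sops --decrypt secrets.enc.yaml)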

# file: config.yaml 
# This file should not be stored in version control unless it's encrypted.
# Sensitive values are marked with the comment "# Sensitive".
# You should update the following fields:
# - [ ] jupyterhub.proxy.https.hosts
# - [ ] jupyterhub.proxy.https.letsencrypt.contactEmail
# - [ ] jupyterhub.proxy.service.annotations.service.beta.kubernetes.io.azure-dns-label-name
# - [ ] jupyterhub.hub.config.AzureAdOAuthenticator.oauth_callback_url
# - [ ] jupyterhub.hub.config.AzureAdOAuthenticator.client_id
# - [ ] jupyterhub.hub.config.AzureAdOAuthenticator.client_secret
# - [ ] jupyterhub.hub.config.AzureAdOAuthenticator.tenant_id
# - [ ] jupyterhub.hub.services.dask-gateway.apiToken
# - [ ] dask-gateway.gateway.auth.jupyterhub.apiToken

jupyterhub:
  proxy:
    https:
      enabled: true
      hosts:
        # Update this with your hub's name.
        # If using AKS to get a DNS name, the first section of this hostname must match the value of
        # service.beta.kubernetes.io/azure-dns-label-name.
        - "<hub-name>.<azure-region>.cloudapp.azure.com"
      letsencrypt:
        contactEmail: "<your-email>"
    service:
      annotations:
        # Update this with your hub's name if you're using Azure DNS.
        # The value should match the first part of "hosts" up above.
        service.beta.kubernetes.io/azure-dns-label-name: "<hub-name>"

  hub:
    # Disable hub network Policy, so that the dask gateway server API can reach the hub directly
    # Not required for dask-gateway>0.9.0
    # https://github.com/dask/helm-chart/issues/142
    networkPolicy:
      enabled: false

    services:
      dask-gateway:
        # output from openssl rand -hex 32. Must match dask-gateway.gateway.auth.jupyterhub.apiToken
        apiToken: "<secret token>"  # Sensitive

    config:
      # Use our Azure AD App Registration for authentication
      # https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app
      JupyterHub:
        authenticator_class: azuread
      AzureAdOAuthenticator:
        # Update this with the URL for your hub with /oauth_callback.
        # If you're using Azure to obtain the DNS name from AKS, this must match the hostname
        # configured in jupyterhub.proxy.https.hosts above.
        oauth_callback_url: "https://<hub-name>.<azure-region>.cloudapp.azure.com/hub/oauth_callback"
        # Update these with the values for your Azure AD app registration.
        client_id: "<adapp-client-id>"
        client_secret: "<adapp-client-secret>"  # Sensitive
        tenant_id: "<adapp-tenant-id>"

  singleuser:
    image:
      # Use the images from https://github.com/microsoft/planetary-computer-containers
      # We recommend pinning an exact tag.
      name: mcr.microsoft.com/planetary-computer/python
      tag: latest

    extraEnv:
      DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE: '{JUPYTER_IMAGE_SPEC}'
      DASK_DISTRIBUTED__DASHBOARD_LINK: '/user/{JUPYTERHUB_USER}/proxy/{port}/status'
      DASK_LABEXTENSION__FACTORY__MODULE: 'dask_gateway'
      DASK_LABEXTENSION__FACTORY__CLASS: 'GatewayCluster'

dask-gateway:
  gateway:
    auth:
      jupyterhub:
        # output from openssl rand -hex 32. Must match jupyterhub.services.dask-gateway.apiToken
        apiToken: "<secret token>"  # Sensitive

    backend:
      worker:
        # Ensure workers are scheduled on the worker pool
        extraPodConfig:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: "k8s.dask.org/dedicated"
                    operator: "In"
                    values:
                      - "worker"

          tolerations:
            # allow workers to be scheduled on the worker pool, which has preemptible nodes.
            - key: "k8s.dask.org/dedicated"
              operator: "Equal"
              value: "worker"
              effect: "NoSchedule"
            - key: "k8s.dask.org_dedicated"
              operator: "Equal"
              value: "worker"
              effect: "NoSchedule"
            - key: "kubernetes.azure.com/scalesetpriority"
              operator: "Equal"
              value: "spot"
              effect: "NoSchedule"
              
    extraConfig:
      01-optionHandler: |
          # Configure options to
          # 1. Have the default worker image match the singleuser image
          # 2. Place bounds on worker CPU and Memory requests
          # 3. Accept a mapping of environment variables to pass to workers.
          from dask_gateway_server.options import Options, Float, String, Mapping
          def cluster_options(user):
              def option_handler(options):
                  if ":" not in options.image:
                      raise ValueError("When specifying an image you must also provide a tag")

                  return {
                      "worker_cores": 0.88 * min(options.worker_cores / 2, 1),
                      "worker_cores_limit": options.worker_cores,
                      "worker_memory": "%fG" % (0.95 * options.worker_memory),
                      "worker_memory_limit": "%fG" % options.worker_memory,
                      "image": options.image,
                      "environment": options.environment,
                  }
              return Options(
                  Float("worker_cores", 1, min=1, max=16, label="Worker Cores"),
                  Float("worker_memory", 2, min=1, max=32, label="Worker Memory (GiB)"),
                  String("image", default="pangeo/pangeo-notebook:latest", label="Image"),
                  Mapping("environment", {}, label="Environment Variables"),
                  handler=option_handler,
              )
          c.Backend.cluster_options = cluster_options

Install daskhub

We’ll install it into a new dhub namespace, but you can use whatever namespace you like.

$ helm upgrade --wait --install --create-namespace \
      dask dask/daskhub \
      --namespace dhub \
      --values config.yaml

Verify that all the pods are running with:

$ kubectl -n dhub get pod
NAME                                           READY   STATUS    RESTARTS   AGE
api-dask-dask-gateway-947887bf9-f748w          1/1     Running   0          18m
autohttps-66bd64d49b-wskqc                     2/2     Running   0          18m
continuous-image-puller-nwq4l                  1/1     Running   0          18m
controller-dask-dask-gateway-ccf4595c8-lx2h7   1/1     Running   0          18m
hub-56d584b5b5-7rxvk                           1/1     Running   0          18m
proxy-5b4bb9b8bb-q8r7x                         1/1     Running   0          18m
traefik-dask-dask-gateway-d9d4cc45c-whmmw      1/1     Running   0          18m
user-scheduler-86c6bc8cd-h6dx2                 1/1     Running   0          18m
user-scheduler-86c6bc8cd-hhhbn                 1/1     Running   0          18m

Note

If you see an HTTPS error accessing the hub, you may need to recreate the autohttps pod created by JupyterHub. The command kubectl -n dhub delete pod -l app=jupyterhub,component=autohttps will recreate the autohttps pod, which should then get a certificate to serve your site over HTTPS.
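
If the certificate still doesn’t appear, the logs of the autohttps pod show the Let’s Encrypt negotiation and usually point to the problem (for example, a DNS name that doesn’t resolve yet):

$ kubectl -n dhub logs -l app=jupyterhub,component=autohttps --all-containers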

When you log in and start a notebook server, you should be able to connect to the Dask Gateway server and create a cluster.

>>> from dask_gateway import Gateway
>>> gateway = Gateway()
>>> gateway.list_clusters()
[]
>>> cluster = gateway.new_cluster()
>>> client = cluster.get_client()
>>> cluster.scale(1)

After a moment, the Dask scheduler and worker pods should start up. Check the pods with kubectl -n dhub get pods.

Cleanup

The easiest way to clean up the resources is to delete the resource group.

$ az group delete -n pangeo
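
Note that the Azure AD app registration lives in your directory, not in the resource group, so deleting the resource group does not remove it. If you no longer need it, delete it separately using the Client ID noted earlier:

$ az ad app delete --id <client-id>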

Next steps

Your AKS cluster and JupyterHub deployments can be customized in various ways. Visit the Azure Kubernetes Service overview for more on AKS, the Zero to JupyterHub with Kubernetes documentation for more on JupyterHub and its Helm chart, and the Dask Gateway documentation for more on Dask Gateway.