Cloudless Mosaic

This tutorial constructs a cloudless mosaic (also known as a composite) from a time series of satellite images. We’ll cover the following steps:

  • Find a time series of images at a particular point on Earth

  • Stack those images together into a single array

  • Compute the cloudless mosaic by taking a median

  • Visualize the results

This example uses Sentinel-2 Level-2A data. The techniques used here apply equally well to other remote-sensing datasets.

[1]:
import numpy as np
import xarray as xr

import rasterio.features
import stackstac
import pystac_client
import planetary_computer

import xrspatial.multispectral as ms

from dask_gateway import GatewayCluster

Create a Dask cluster

We’re going to process a large amount of data. To cut down on the execution time, we’ll use a Dask cluster to do the computation in parallel, adaptively scaling to add and remove workers as needed. See Scale With Dask for more on using Dask.

[2]:
cluster = GatewayCluster()  # Creates the Dask Scheduler. Might take a minute.

client = cluster.get_client()

cluster.adapt(minimum=4, maximum=24)
print(cluster.dashboard_link)
https://pcc-staging.westeurope.cloudapp.azure.com/compute/services/dask-gateway/clusters/staging.5d235feb3f08434baf1868f21aeb3e99/status
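
If you’re running this notebook in an environment without Dask Gateway, a local Dask cluster is a workable substitute; the sketch below is an alternative we’re assuming here, not part of the Planetary Computer setup, and it keeps the rest of the tutorial unchanged:

# Optional fallback (assumption): use a local Dask cluster if Dask Gateway isn't available.
from dask.distributed import Client

client = Client(n_workers=4)  # starts a local scheduler and workers in this process
print(client.dashboard_link)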

Discover data

In this example, we define our area of interest as a GeoJSON object. It’s near Redmond, Washington.

[3]:
area_of_interest = {
    "type": "Polygon",
    "coordinates": [
        [
            [-122.27508544921875, 47.54687159892238],
            [-121.96128845214844, 47.54687159892238],
            [-121.96128845214844, 47.745787772920934],
            [-122.27508544921875, 47.745787772920934],
            [-122.27508544921875, 47.54687159892238],
        ]
    ],
}
bbox = rasterio.features.bounds(area_of_interest)

Using pystac_client, we can search the Planetary Computer’s STAC endpoint for items matching our query parameters.

[4]:
stac = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

search = stac.search(
    bbox=bbox,
    datetime="2016-01-01/2020-12-31",
    collections=["sentinel-2-l2a"],
    query={"eo:cloud_cover": {"lt": 25}},
)

items = search.item_collection()
print(len(items))
138

So 138 items match our search requirements, over space, time, and cloudiness. Those items will still have some clouds over portions of the scenes, though. To create our cloudless mosaic, we’ll load the data into an xarray DataArray using stackstac and then reduce the time-series of images down to a single image.
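
Before stacking, you can optionally summarize the scene-level cloud cover recorded on each matched item; this quick sketch reads the same eo:cloud_cover property we filtered on above:

# Summarize scene-level cloud cover across the matched items (optional sketch).
cloud_cover = [item.properties["eo:cloud_cover"] for item in items]
print(f"cloud cover: min {min(cloud_cover):.1f}%, max {max(cloud_cover):.1f}%")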

[5]:
data = (
    stackstac.stack(
        items,
        assets=["B04", "B03", "B02"],  # red, green, blue
        chunksize=4096,
        resolution=100,
    )
    .where(lambda x: x > 0, other=np.nan)  # sentinel-2 uses 0 as nodata
    .assign_coords(band=lambda x: x.common_name.rename("band"))  # use common names
)
data
[5]:
<xarray.DataArray 'stackstac-6f1865b36c5ddc90ada70893ec944f5f' (time: 138,
                                                                band: 3,
                                                                y: 1099, x: 1099)>
dask.array<where, shape=(138, 3, 1099, 1099), dtype=float64, chunksize=(1, 1, 1099, 1099), chunktype=numpy.ndarray>
Coordinates: (12/46)
  * time                                     (time) datetime64[ns] 2016-02-08...
    id                                       (time) <U54 'S2A_MSIL2A_20160208...
  * band                                     (band) <U5 'red' 'green' 'blue'
  * x                                        (x) float64 4.999e+05 ... 6.097e+05
  * y                                        (y) float64 5.3e+06 ... 5.19e+06
    s2:processing_baseline                   (time) <U5 '03.00' ... '02.12'
    ...                                       ...
    proj:bbox                                object {5190240.0, 609780.0, 499...
    gsd                                      float64 10.0
    common_name                              (band) <U5 'red' 'green' 'blue'
    center_wavelength                        (band) float64 0.665 0.56 0.49
    full_width_half_max                      (band) float64 0.038 0.045 0.098
    epsg                                     int64 32610
Attributes:
    spec:        RasterSpec(epsg=32610, bounds=(499900, 5190200, 609800, 5300...
    crs:         epsg:32610
    transform:   | 100.00, 0.00, 499900.00|\n| 0.00,-100.00, 5300100.00|\n| 0...
    resolution:  100

Since the data matching our query isn’t too large, we can persist it in distributed memory. Once in memory, subsequent operations will be much faster.
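
As a quick sanity check, you can estimate the array’s in-memory size before persisting; xarray’s nbytes attribute reports the uncompressed size in bytes:

# Rough size estimate before persisting: nbytes is the uncompressed size in bytes.
print(f"{data.nbytes / 1e9:.2f} GB")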

[6]:
data = data.persist()

Median composite

Using normal xarray operations, we can compute the median over the time dimension. Because clouds are transient, a cloudy observation is unlikely to be the median value for any given pixel across many images, so the composite should contain few, if any, clouds.

This will be computed in parallel on the cluster (make sure to open the Dask Dashboard using the link printed out above).

[7]:
median = data.median(dim="time").compute()
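
At this point the time dimension has been reduced away; a quick check of the result’s dimensions confirms we’re left with a single band/y/x image:

# The time dimension has been reduced away, leaving one (band, y, x) image.
print(median.sizes)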

To visualize the data, we’ll use xarray-spatial’s true_color function to combine the red, green, and blue bands into a true-color image.

[8]:
image = ms.true_color(*median)  # expects red, green, blue DataArrays
[9]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 8))

ax.set_axis_off()
image.plot.imshow(ax=ax);

Monthly composite

Now suppose we don’t want to combine images from different parts of the year (for example, we might not want to mix January images, which often include snow, with images from July). Again using standard xarray syntax, we can create a set of per-month composites by grouping by month and then taking the median.

[10]:
monthly = data.groupby("time.month").median().compute()
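
The result has a new month dimension, with one entry per calendar month present in the series. To inspect a single month’s composite, for example July (assuming July scenes are present), select it by label:

# Select the July composite by its month label (sketch; assumes month 7 is present).
july = monthly.sel(month=7)
july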

Let’s convert each of those arrays to a true-color image and plot the results as a grid.

[11]:
images = [ms.true_color(*x) for x in monthly]
images = xr.concat(images, dim="time")

g = images.plot.imshow(x="x", y="y", rgb="band", col="time", col_wrap=3, figsize=(6, 8))
for ax in g.axes.flat:
    ax.set_axis_off()

plt.tight_layout()

Learn more

To learn more about using the Planetary Computer’s STAC API, see Reading data from the STAC API. To learn more about Dask, see Scaling with Dask.