041 GDAL: time series
Purpose
In this section, we'll continue to look at the MODIS LAI, with a view to forming a time series dataset. At the end of this session, you will be able to generate a 3D numpy array of some MODIS geophysical variable for a selected area and time.
Prerequisites
You must make sure you can recall the details of the work covered in 040_GDAL_mosaicing_and_masking. You will also need to know how to do graph plotting, including sub-figures and errorbars, and image display.
Test
You should run a NASA account test if you have not already done so.
Introduction
Recall that we can use getModis from geog0111.modisUtils as a simple interface to download and stitch MODIS data.
For example:
from geog0111.modisUtils import getModis
warp_args = {
'dstNodata' : 255,
'format' : 'MEM',
'cropToCutline' : True,
'cutlineWhere' : "FIPS='UK'",
'cutlineDSName' : 'data/TM_WORLD_BORDERS-0.3.shp'
}
kwargs = {
'tile' : ['h17v03','h18v03','h17v04','h18v04'],
'product' : 'MCD15A3H',
'sds' : 'Lai_500m',
'doys' : [1,5,41],
'year' : 2019,
'warp_args' : warp_args
}
datafiles,bnames = getModis(verbose=False,timeout=10000,**kwargs)
print(datafiles)
print(bnames)
['work/stitch_Lai_500m_2019_001_Tiles_h17v03_h18v03_h17v04_h18v04_Selektor_FIPS_UK_warp.vrt', 'work/stitch_Lai_500m_2019_005_Tiles_h17v03_h18v03_h17v04_h18v04_Selektor_FIPS_UK_warp.vrt', 'work/stitch_Lai_500m_2019_041_Tiles_h17v03_h18v03_h17v04_h18v04_Selektor_FIPS_UK_warp.vrt']
['2019-001', '2019-005', '2019-041']
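As a quick check (an added illustration, assuming the files above were created), we can open one of the returned VRT files and confirm its size and band count:
from osgeo import gdal
# open the first returned VRT and report its dimensions
g = gdal.Open(datafiles[0])
if g:
    print(g.RasterXSize, g.RasterYSize, g.RasterCount)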
Timeseries
We can conveniently generate a time series dataset using the gdal VRT file approach we used in 040_GDAL_mosaicing_and_masking. We can directly use the output from getModis for this.
We use gdal.BuildVRT() as we have previously, specifying the output name work/stitch_set.vrt here, and the list of SDS files that go into the time series, datafiles here. The one difference is that we set separate=True for a time series (or for multiple bands) so the data aren't merged spatially.
from osgeo import gdal
import numpy as np
# build a VRT "work/stitch_set.vrt"
stitch_vrt = gdal.BuildVRT("work/stitch_set.vrt", datafiles,separate=True)
del stitch_vrt
# test it by reading and plotting
g = gdal.Open("work/stitch_set.vrt")
# apply scaling and read
data = g.ReadAsArray() * 0.1
print(data.shape,data.dtype)
(3, 2623, 1394) float64
The dataset is now 3D. The first dimension represents the time samples, so:
data[0] -> bnames[0]
etc.
import numpy as np
for i in range(data.shape[0]):
print(f'band {bnames[i]} -> mean {np.mean(data[i])}')
band 2019-001 -> mean 18.3917659748686
band 2019-005 -> mean 18.39242177274097
band 2019-041 -> mean 18.313179488806387
These are strange average LAI numbers. Why would that be?
Exercise 1
We have seen in 040_GDAL_mosaicing_and_masking that you can use gdal to create a GeoTiff format image, for example with:
g = gdal.Warp(output_name, input_name ,format='GTiff',options=['COMPRESS=LZW'])
g.FlushCache()
- Convert the gdal file work/stitch_set.vrt to a more portable GeoTiff file called work/stitch_set.tif
- Confirm that this has worked by reading and displaying data from the file
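One possible approach is sketched below (an added illustration, assuming work/stitch_set.vrt exists from the code above; it simply mirrors the gdal.Warp call given in the exercise):
from osgeo import gdal
import matplotlib.pyplot as plt
# convert the VRT to a compressed GeoTiff
g = gdal.Warp("work/stitch_set.tif", "work/stitch_set.vrt",
              format='GTiff', options=['COMPRESS=LZW'])
g.FlushCache()
del g
# read it back and display the first band to confirm the conversion worked
g = gdal.Open("work/stitch_set.tif")
tif_data = g.ReadAsArray() * 0.1
plt.imshow(tif_data[0], vmax=7, cmap=plt.cm.inferno_r, interpolation='nearest')
plt.colorbar()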
The reason for the 'funny' averages above is that the data are not masked: they include invalid numbers (e.g. 255) where there are no data.
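We can confirm this (an added check, not in the original listing) by looking at what fraction of each band lies above the valid LAI range:
import numpy as np
# fraction of pixels in each band with scaled values above the valid LAI range
for i in range(data.shape[0]):
    frac = np.mean(data[i] > 10.0)
    print(f'band {bnames[i]}: {frac:.2%} invalid')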
Now let's mask out invalid data points (> 10.0 when scaled by 0.1), and display the results:
# valid mask
data[data>10.0] = np.nan
print(data.shape)
(3, 2623, 1394)
from osgeo import gdal
import matplotlib.pyplot as plt
fig, axs = plt.subplots(1,3,figsize=(13,5))
axs = axs.flatten()
for i in range(data.shape[0]):
im = axs[i].imshow(data[i],vmax=7,\
cmap=plt.cm.inferno_r,interpolation='nearest')
fig.colorbar(im, ax=axs[i])
axs[i].set_title(bnames[i])
A year of data
To get a year of LAI data for Luxembourg, we can specify doys = [i for i in range(1,366,4)]. Note that we could have chosen any location, but we select a small country to make the running more feasible in a practical session.
If the data are already downloaded into the local cache, it should not take too long to form the time series. It will still need to generate the VRT files for however many files and tiles you have requested, so that may take some tens of minutes, even if the HDF files are already downloaded.
If you are attempting to get data not already in the cache, it will take some considerable time to download these datasets for whole years, for multiple tiles.
If you want to download a dataset that is not covered in these notebooks, you are of course welcome to do so, but plan your work ahead of time, and try to pre-download the data before attempting any processing. In such a case, you should take some code such as that below, paste it into a Python file, and run it as a Python script from the command line. Make sure you set 'verbose' : True in the kwargs dictionary if you want some feedback on any long-running processes.
from geog0111.modisUtils import getModis
import numpy as np
warp_args = {
'dstNodata' : 255,
'format' : 'MEM',
'cropToCutline' : True,
'cutlineWhere' : "FIPS='LU'",
'cutlineDSName' : 'data/TM_WORLD_BORDERS-0.3.shp'
}
kwargs = {
'tile' : ['h18v03','h18v04'],
'product' : 'MCD15A3H',
'sds' : 'Lai_500m',
'doys' : np.arange(1,366,4),
'year' : 2019,
'warp_args' : warp_args
}
datafiles,bnames = getModis(verbose=False,timeout=10000,**kwargs)
# check the files
print(f'filenames of the form: {datafiles[0]}\n')
print(bnames)
filenames of the form: work/stitch_Lai_500m_2019_001_Tiles_h18v03_h18v04_Selektor_FIPS_LU_warp.vrt
['2019-001', '2019-005', '2019-009', '2019-013', '2019-017', '2019-021', '2019-025', '2019-029', '2019-033', '2019-037', '2019-041', '2019-045', '2019-049', '2019-053', '2019-057', '2019-061', '2019-065', '2019-069', '2019-073', '2019-077', '2019-081', '2019-085', '2019-089', '2019-093', '2019-097', '2019-101', '2019-105', '2019-109', '2019-113', '2019-117', '2019-121', '2019-125', '2019-129', '2019-133', '2019-137', '2019-141', '2019-145', '2019-149', '2019-153', '2019-157', '2019-161', '2019-165', '2019-169', '2019-173', '2019-177', '2019-181', '2019-185', '2019-189', '2019-193', '2019-197', '2019-201', '2019-205', '2019-209', '2019-213', '2019-217', '2019-221', '2019-225', '2019-229', '2019-233', '2019-237', '2019-241', '2019-245', '2019-249', '2019-253', '2019-257', '2019-261', '2019-265', '2019-269', '2019-273', '2019-277', '2019-281', '2019-285', '2019-289', '2019-293', '2019-297', '2019-301', '2019-305', '2019-309', '2019-313', '2019-317', '2019-321', '2019-325', '2019-329', '2019-333', '2019-337', '2019-341', '2019-345', '2019-349', '2019-353', '2019-357', '2019-361', '2019-365']
We can build a VRT dataset as before:
stitch_vrt = gdal.BuildVRT("work/stitch_time.vrt", datafiles,separate=True)
del stitch_vrt
And read the data into a numpy array:
from osgeo import gdal
g = gdal.Open("work/stitch_time.vrt")
data = g.ReadAsArray() * 0.1
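As with the UK example earlier, we may want to mask invalid values before plotting (an added step here), so the fill values do not dominate:
import numpy as np
# mask invalid (fill) values: scaled LAI above 10 is not valid
data[data > 10.0] = np.nan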
We can now plot the data on sub-plots:
import matplotlib.pyplot as plt
shape=(8,12)
x_size,y_size=(30,20)
fig, axs = plt.subplots(*shape,figsize=(x_size,y_size))
axs = axs.flatten()
plt.setp(axs, xticks=[], yticks=[])
for i in range(data.shape[0]):
im = axs[i].imshow(data[i],vmax=7,cmap=plt.cm.inferno_r,\
interpolation='nearest')
axs[i].set_title(bnames[i])
fig.colorbar(im, ax=axs[i])
Plotting time series
We might now want to plot some time series. We will need to know the day of year associated with each spatial dataset in the time series. We could just attempt to specify doys = [i for i in range(1,366,4)] as above. But that might not be robust (e.g. if no data were available for some particular day).
It is better to try to develop an internally consistent dataset. The doy information associated with each spatial dataset in the time series is encoded in the band name strings we examined above, such as 2019-001.
First, extract the doy from bnames:
print(bnames)
['2019-001', '2019-005', '2019-009', '2019-013', '2019-017', '2019-021', '2019-025', '2019-029', '2019-033', '2019-037', '2019-041', '2019-045', '2019-049', '2019-053', '2019-057', '2019-061', '2019-065', '2019-069', '2019-073', '2019-077', '2019-081', '2019-085', '2019-089', '2019-093', '2019-097', '2019-101', '2019-105', '2019-109', '2019-113', '2019-117', '2019-121', '2019-125', '2019-129', '2019-133', '2019-137', '2019-141', '2019-145', '2019-149', '2019-153', '2019-157', '2019-161', '2019-165', '2019-169', '2019-173', '2019-177', '2019-181', '2019-185', '2019-189', '2019-193', '2019-197', '2019-201', '2019-205', '2019-209', '2019-213', '2019-217', '2019-221', '2019-225', '2019-229', '2019-233', '2019-237', '2019-241', '2019-245', '2019-249', '2019-253', '2019-257', '2019-261', '2019-265', '2019-269', '2019-273', '2019-277', '2019-281', '2019-285', '2019-289', '2019-293', '2019-297', '2019-301', '2019-305', '2019-309', '2019-313', '2019-317', '2019-321', '2019-325', '2019-329', '2019-333', '2019-337', '2019-341', '2019-345', '2019-349', '2019-353', '2019-357', '2019-361', '2019-365']
We can now try to extract the information from these strings. Let's build up to that by considering a single example string, and work out how to deal with that. Once we know how, we can build a loop around it (e.g. in a list comprehension, like the one we had for [i for i in range(1,366,4)]).
# convert to 1 by splitting the string
test = '2019-001'
int(test.split('-')[1])
1
doy = [int(i.split('-')[1]) for i in bnames]
print(doy)
[1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61, 65, 69, 73, 77, 81, 85, 89, 93, 97, 101, 105, 109, 113, 117, 121, 125, 129, 133, 137, 141, 145, 149, 153, 157, 161, 165, 169, 173, 177, 181, 185, 189, 193, 197, 201, 205, 209, 213, 217, 221, 225, 229, 233, 237, 241, 245, 249, 253, 257, 261, 265, 269, 273, 277, 281, 285, 289, 293, 297, 301, 305, 309, 313, 317, 321, 325, 329, 333, 337, 341, 345, 349, 353, 357, 361, 365]
It is interesting to see a set of sub-plots for a range of pixel locations. We can follow the approach we have taken to sub-plots previously, but now we want to make use of the 2D nature of the sub-plot axs variable (recall that previously we have flattened this to a 1D representation).
We will define a sub-plot shape, shape=(10,10), and a starting pixel, pixel = (100,70). We will then loop over row and column to produce the sub-plots:
import matplotlib.pyplot as plt
x_size,y_size=(20,20)
shape=(10,10)
fig, axs = plt.subplots(*shape,figsize=(x_size,y_size))
plt.setp(axs, xticks=[], yticks=[])
pixel = (100,70)
x = doy
for i in range(shape[0]):
p0 = pixel[0] + i
for j in range(shape[1]):
p1 = pixel[1] + j
im = axs[i,j].plot(x,data[:,p0,p1])
axs[i,j].set_title(f'{p0} {p1}')
# ensure the same scale for all
axs[i,j].set_ylim(0,7)
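We might also summarise the whole area as a single time series (an added illustration): for example, the spatial mean over valid pixels for each date.
import numpy as np
import matplotlib.pyplot as plt
# mean LAI over all valid (non-NaN) pixels for each date
# (this assumes the masking step above has been applied)
mean_lai = np.nanmean(data, axis=(1, 2))
fig, ax = plt.subplots(1, 1, figsize=(10, 4))
ax.plot(doy, mean_lai)
ax.set_xlabel('DOY 2019')
ax.set_ylabel('mean LAI')
ax.set_ylim(0, 7)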
Exercise 2
- Produce a set of spatial plots of the quantity Fpar_500m over Luxembourg for 2019
Exercise 3
Write a function called modisAnnual(**kwargs) with arguments based on:
warp_args = {
'dstNodata' : 255,
'format' : 'MEM',
'cropToCutline' : True,
'cutlineWhere' : "FIPS='LU'",
'cutlineDSName' : 'data/TM_WORLD_BORDERS-0.3.shp'
}
kwargs = {
'tile' : ['h18v03','h18v04'],
'product' : 'MCD15A3H',
'sds' : ['Lai_500m', 'Fpar_500m'],
'doys' : np.arange(1,366,4),
'year' : 2019,
'warp_args' : warp_args,
'ofile_root': 'work/output_filename_ex3'
}
where sds is a list of SDS names.
The function should return:
bnames : names for the items in the first (time) dimension
odict : a dictionary with the items in sds as keys and the names of the associated VRT files as values
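To make the required structure concrete, a rough sketch of how such a function might look is given below. This is only an outline based on the specification above, not the actual geog0111 implementation, and the internal naming is an assumption:
from osgeo import gdal
from geog0111.modisUtils import getModis

def modisAnnual(**kwargs):
    '''
    Sketch only: build one multi-band (time series) VRT per SDS
    in kwargs['sds'], returning (odict, bnames).
    '''
    odict = {}
    bnames = None
    ofile_root = kwargs.pop('ofile_root', 'work/modisAnnual')
    sds_list = kwargs.pop('sds')
    for sds in sds_list:
        # stitched per-date files for this SDS
        datafiles, bnames = getModis(sds=sds, **kwargs)
        # stack them into one VRT, one band per date
        ofile = f'{ofile_root}_{sds}.vrt'
        stitch_vrt = gdal.BuildVRT(ofile, datafiles, separate=True)
        del stitch_vrt
        odict[sds] = ofile
    return odict, bnames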
Uncertainty and weighting
The full set of SDS available is
['Fpar_500m',
'Lai_500m',
'FparLai_QC',
'FparExtra_QC',
'FparStdDev_500m',
'LaiStdDev_500m']
We should always examine any uncertainty information available when trying to interpret a dataset. In this exercise, we will take the uncertainty, defined as a standard deviation, and generate a weight from this in the form:
W = 1/(std * std)
where std is the uncertainty. Where this weighting is high, we can put more emphasis on the reliability of the data than when it is low.
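As a quick numerical illustration of this formula (example values only), low uncertainties give large weights and high uncertainties give small ones:
# W = 1/std^2 for a few example uncertainties
for std in [0.5, 1.0, 2.0]:
    print(f'std = {std} -> weight = {1./(std*std):.2f}')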
We make modisAnnual() available from geog0111.modisUtils, so we can load multiple SDS datasets into a data dictionary:
from geog0111.modisUtils import modisAnnual
import numpy as np
warp_args = {
'dstNodata' : 255,
'format' : 'MEM',
'cropToCutline' : True,
'cutlineWhere' : "FIPS='LU'",
'cutlineDSName' : 'data/TM_WORLD_BORDERS-0.3.shp'
}
kwargs = {
'tile' : ['h18v03','h18v04'],
'product' : 'MCD15A3H',
'sds' : ['Fpar_500m','Lai_500m','FparLai_QC','FparExtra_QC','FparStdDev_500m','LaiStdDev_500m'],
'doys' : np.arange(1,366,4),
'year' : 2019,
'warp_args' : warp_args
}
# run
odict,bnames = modisAnnual(**kwargs)
# outputs
print(odict,bnames)
{'Fpar_500m': 'work/output_filename_Selektor_FIPS_LU_YEAR_2019_DOYS_1_365_SDS_Fpar_500m.vrt', 'Lai_500m': 'work/output_filename_Selektor_FIPS_LU_YEAR_2019_DOYS_1_365_SDS_Lai_500m.vrt', 'FparLai_QC': 'work/output_filename_Selektor_FIPS_LU_YEAR_2019_DOYS_1_365_SDS_FparLai_QC.vrt', 'FparExtra_QC': 'work/output_filename_Selektor_FIPS_LU_YEAR_2019_DOYS_1_365_SDS_FparExtra_QC.vrt', 'FparStdDev_500m': 'work/output_filename_Selektor_FIPS_LU_YEAR_2019_DOYS_1_365_SDS_FparStdDev_500m.vrt', 'LaiStdDev_500m': 'work/output_filename_Selektor_FIPS_LU_YEAR_2019_DOYS_1_365_SDS_LaiStdDev_500m.vrt'} ['2019-001', '2019-005', '2019-009', '2019-013', '2019-017', '2019-021', '2019-025', '2019-029', '2019-033', '2019-037', '2019-041', '2019-045', '2019-049', '2019-053', '2019-057', '2019-061', '2019-065', '2019-069', '2019-073', '2019-077', '2019-081', '2019-085', '2019-089', '2019-093', '2019-097', '2019-101', '2019-105', '2019-109', '2019-113', '2019-117', '2019-121', '2019-125', '2019-129', '2019-133', '2019-137', '2019-141', '2019-145', '2019-149', '2019-153', '2019-157', '2019-161', '2019-165', '2019-169', '2019-173', '2019-177', '2019-181', '2019-185', '2019-189', '2019-193', '2019-197', '2019-201', '2019-205', '2019-209', '2019-213', '2019-217', '2019-221', '2019-225', '2019-229', '2019-233', '2019-237', '2019-241', '2019-245', '2019-249', '2019-253', '2019-257', '2019-261', '2019-265', '2019-269', '2019-273', '2019-277', '2019-281', '2019-285', '2019-289', '2019-293', '2019-297', '2019-301', '2019-305', '2019-309', '2019-313', '2019-317', '2019-321', '2019-325', '2019-329', '2019-333', '2019-337', '2019-341', '2019-345', '2019-349', '2019-353', '2019-357', '2019-361', '2019-365']
We can read some of these data in now:
from osgeo import gdal
g = gdal.Open(odict['Lai_500m'])
if g:
Lai_500m = g.ReadAsArray() * 0.1
print(Lai_500m.shape)
g = gdal.Open(odict['LaiStdDev_500m'])
if g:
LaiStdDev_500m = g.ReadAsArray() * 0.1
print(LaiStdDev_500m.shape)
(92, 175, 122)
(92, 175, 122)
There is a feature in the LaiStdDev_500m dataset where some pixels have apparently zero uncertainty. Indeed, all LAI values with an uncertainty under 1.0 seem suspect as individual values, so we treat them here by setting all values less than 1 to 1.
We also know that LAI values over 10 (i.e. raw values over 100, after scaling by 0.1) are invalid, so we should make sure these are weighted zero:
import numpy as np
weight = np.zeros_like(LaiStdDev_500m)
std = LaiStdDev_500m
# fix low values
std[std<1.0] = 1.0
mask = (std > 0)
weight[mask] = 1./(std[mask]**2)
weight[Lai_500m > 10] = 0.
weight[LaiStdDev_500m==0] = 0.
# look at some stats
print(weight.min(),weight.max())
0.0 1.0
We can now go ahead and plot the dataset and the weights. We switch to a greyscale colourmap to make the interpretation of numbers easier.
# produce image plots of the both quantities
import matplotlib.pyplot as plt
shape=(8,12)
x_size,y_size=(30,20)
fig, axs = plt.subplots(*shape,figsize=(x_size,y_size))
axs = axs.flatten()
plt.setp(axs, xticks=[], yticks=[])
fig.suptitle("LAI Luxembourg 2019")
for i in range(Lai_500m.shape[0]):
im = axs[i].imshow(Lai_500m[i],vmax=7,cmap=plt.cm.gray,\
interpolation='nearest')
axs[i].set_title(bnames[i])
fig.colorbar(im, ax=axs[i])
# Plot the weight
import matplotlib.pyplot as plt
shape=(8,12)
x_size,y_size=(30,20)
fig, axs = plt.subplots(*shape,figsize=(x_size,y_size))
axs = axs.flatten()
plt.setp(axs, xticks=[], yticks=[])
fig.suptitle("LAI weight Luxembourg 2019")
for i in range(Lai_500m.shape[0]):
im = axs[i].imshow(weight[i],vmax=1,cmap=plt.cm.gray,\
interpolation='nearest')
axs[i].set_title(bnames[i])
fig.colorbar(im, ax=axs[i])
# LAI time series
import matplotlib.pyplot as plt
doy = [int(i.split('-')[1]) for i in bnames]
x_size,y_size=(20,20)
shape=(10,10)
fig, axs = plt.subplots(*shape,figsize=(x_size,y_size))
plt.setp(axs, xticks=[], yticks=[])
pixel = (100,70)
x = doy
for i in range(shape[0]):
p0 = pixel[0] + i
for j in range(shape[1]):
p1 = pixel[1] + j
im = axs[i,j].plot(x,Lai_500m[:,p0,p1])
axs[i,j].set_title(f'{p0} {p1}')
# ensure the same scale for all
axs[i,j].set_ylim(0,7)
# weight time series plots
import matplotlib.pyplot as plt
doy = [int(i.split('-')[1]) for i in bnames]
x_size,y_size=(20,20)
shape=(10,10)
fig, axs = plt.subplots(*shape,figsize=(x_size,y_size))
plt.setp(axs, xticks=[], yticks=[])
pixel = (100,70)
x = doy
for i in range(shape[0]):
p0 = pixel[0] + i
for j in range(shape[1]):
p1 = pixel[1] + j
im = axs[i,j].plot(x,weight[:,p0,p1])
axs[i,j].set_title(f'{p0} {p1}')
# ensure the same scale for all
axs[i,j].set_ylim(0,1)
We can look at an individual plot now, and use the weights for error bars.
In this case, we want to convert the array weight to an array error, applying the inverse of the operation we used above to recover the standard deviation, and multiplying by 1.96.
We want to avoid dividing by zero, so we first build an array the same shape as weight, but filled with zero values:
error = np.zeros_like(weight)
and then only do the division for the mask weight>0:
error[weight>0] = np.sqrt(1./(weight[weight>0] )) * 1.96
import matplotlib.pyplot as plt
error = np.zeros_like(weight)
error[weight>0] = np.sqrt(1./(weight[weight>0] )) * 1.96
doy = [int(i.split('-')[1]) for i in bnames]
p0,p1 = (107,72)
x_size,y_size=(10,5)
shape=(10,10)
fig, axs = plt.subplots(1,1,figsize=(x_size,y_size))
pixel = (100,70)
x = doy
axs.errorbar(x,Lai_500m[:,p0,p1],yerr=error[:,p0,p1])
axs.set_title(f'{p0} {p1}')
# ensure the same scale for all
axs.set_ylim(0,7)
axs.set_xlabel('DOY 2019')
axs.set_ylabel('LAI')
We might question the validity of some of the LAI uncertainty here: we expect LAI to have a smooth trajectory in time, but that is clearly not always the case here. However, these uncertainties provide a traceable estimate of the reliability of each individual measurement and are useful in that respect.
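One simple use of these weights (an added illustration) is to form a weighted mean LAI over the area for each date, so that less reliable pixels contribute less to the summary:
import numpy as np
import matplotlib.pyplot as plt
# weighted spatial mean for each date: sum(w * LAI) / sum(w)
# zero-weight (invalid) pixels drop out of both sums
sum_w = weight.sum(axis=(1, 2))
weighted_mean = (weight * Lai_500m).sum(axis=(1, 2)) / sum_w
fig, ax = plt.subplots(1, 1, figsize=(10, 4))
ax.plot(doy, weighted_mean)
ax.set_xlabel('DOY 2019')
ax.set_ylabel('weighted mean LAI')
ax.set_ylim(0, 7)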
Exercise 4
- Write a function getLai that takes as arguments:
  year : integer year
  tile : list of tiles to process
  fips : country FIPS code (e.g. BE for Belgium)
  and returns the annual LAI, standard deviation and day of year
- Test your code for Belgium for 2018 for tiles ['h17v03','h18v03','h17v04','h18v04']
- Show the shape of the arrays returned
Hint: You may find it useful to use modisAnnual
Summary
We now know how to combine geospatial data in both space and time using gdal VRT files. Remember that we can also do such things using numpy, but if we are able to keep the geospatial information and other metadata in a gdal file, so much the better. We have seen how to write the dataset to a portable file format such as GeoTiff.
We have seen that some datasets, such as the MODIS LAI product, come with a per-pixel estimate of uncertainty. We have investigated this data field and used it to provide a weighting to associate with the reliability of each pixel. We have seen how to generate, filter, and process 3D spatio-temporal datasets.
Remember:
from geog0111.modisUtils import modisAnnual
help(modisAnnual)
We have also seen how we can incrementally develop code to do more complex tasks, wrapping up a utility such as getLai in the exercise above to retrieve a ready-to-use dataset.
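For example, a hypothetical call to such a getLai function (the exact signature is whatever you design in the exercise) might look like:
# hypothetical usage: the signature here follows the exercise specification
lai, std, doy = getLai(year=2018,
                       tile=['h17v03', 'h18v03', 'h17v04', 'h18v04'],
                       fips='BE')
print(lai.shape, std.shape, len(doy))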