06. Interpolation - Multiple Linear Regression (+ residual correction)
In this tutorial, we’ll cover the interpolation of point data using the
Multiple Linear Regression (MLR) methodology followed by a residual
correction. In PyMica this is available as mlr+id2d and mlr+id3d,
depending on the interpolation method used for the residual correction.
The methodology requires, for each point, its location (lon and lat),
predictor variables such as altitude (altitude) or distance to the
coastline (among others), and the value to interpolate. If mlr+id3d is
selected, altitude must be provided in variables_files.
We’ll use Meteorological Service of Catalonia sample data to demonstrate how to apply these interpolation techniques. First, we import the required modules: json, to load the observation data, and the PyMica class.
import json
from pymica.pymica import PyMica
Interpolation mlr+id2d
Let’s call the PyMica class with the appropriate parameters, setting the
methodology to mlr+id2d and the configuration dictionary as follows:
config_file = 'sample-data/configuration_sample.json'
with open(config_file, 'r') as f_p:
    config = json.load(f_p)
config['mlr+id2d']
{'id_power': 2.5,
'id_smoothing': 0.0,
'clusters': 'None',
'variables_files': {'altitude': 'sample-data/explanatory/cat_dem_25831.tif',
'dist': 'sample-data/explanatory/cat_distance_coast.tif'},
'interpolation_bounds': [260000, 4488100, 530000, 4750000],
'resolution': 270,
'EPSG': 25831}
where:
id_power: rate at which the influence of a data point diminishes with distance from it.
id_smoothing: if 0.0, the interpolated value at a data point location is identical to the observation recorded at that point.
clusters: set to None, as no clusters will be used.
variables_files: dictionary with predictor variables as keys and the paths of their GeoTIFF files as values. Here, altitude as altitude and distance to the coastline as dist.
interpolation_bounds: [minimum_x_coordinate, minimum_y_coordinate, maximum_x_coordinate, maximum_y_coordinate]; they must match the extent of the variable files.
resolution: spatial resolution of the output grid, in the units of the coordinate system (metres for EPSG:25831).
EPSG: EPSG projection code.
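To make the roles of id_power and id_smoothing more concrete, here is a minimal, self-contained sketch of inverse distance weighting. It is not PyMica's implementation: the function name idw_estimate, the two synthetic stations, and the way smoothing enters the distance (as sqrt(d² + smoothing²), a common IDW variant) are all assumptions made for illustration.

```python
import math

def idw_estimate(point, stations, power=2.5, smoothing=0.0):
    """Inverse-distance-weighted estimate at `point` from (x, y, value) stations.

    Illustrative only: the smoothing form used here is an assumption,
    not necessarily the one PyMica applies.
    """
    weights, values = [], []
    for x, y, value in stations:
        dist = math.hypot(point[0] - x, point[1] - y)
        eff = math.hypot(dist, smoothing)  # smoothing inflates every distance
        if eff == 0.0:
            return value  # exactly on a station and no smoothing: exact value
        weights.append(eff ** -power)
        values.append(value)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Two hypothetical stations 1 km apart, with values 10 and 20.
stations = [(0.0, 0.0, 10.0), (1000.0, 0.0, 20.0)]

on_station = idw_estimate((0.0, 0.0), stations)                         # smoothing 0.0
on_station_smooth = idw_estimate((0.0, 0.0), stations, smoothing=500.0)
midpoint = idw_estimate((500.0, 0.0), stations)
near_low_power = idw_estimate((250.0, 0.0), stations, power=1.0)
near_high_power = idw_estimate((250.0, 0.0), stations, power=4.0)

print(on_station, on_station_smooth, midpoint, near_low_power, near_high_power)
```

With smoothing at 0.0 the estimate on a station equals its observation, while a positive smoothing pulls it toward the neighbouring value. A higher power keeps the estimate closer to the nearest station.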
With all these parameters and configurations set, let’s initialize the
PyMica class with the methodology set to ‘mlr+id2d’.
mlr_id2d_method = PyMica(methodology='mlr+id2d', config=config_file)
Now that the interpolator is set up, we can pass it some data to interpolate. We will use data from the automatic weather station (AWS) network of the Meteorological Service of Catalonia.
with open('sample-data/data/smc_data.json', 'r') as f_p:
    data = json.load(f_p)
data[0]
{'id': 'C6',
'value': 8.8,
'lon': 0.9517200000000001,
'lat': 41.6566,
'altitude': 264.0,
'dist': 0.8587308027349195}
As we can see, the first element of the data meets the requirements of
PyMica input data and has the same predictor variables as the ones
provided in the configuration dictionary. Therefore, we only need to
call the interpolate method of the mlr_id2d_method interpolator object.
data_field = mlr_id2d_method.interpolate(data)
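The requirement described above (each record carries its location, its value, and one entry per predictor in variables_files) can be checked for every record rather than only the first. The sketch below is a hypothetical helper, not part of PyMica. Treating id as required is an assumption based on the sample record, and the incomplete record is made up for illustration.

```python
# Keys assumed to be required in every station record (id is an assumption
# taken from the sample record shown above).
REQUIRED_KEYS = {"id", "lon", "lat", "value"}

def missing_keys(record, predictors):
    """Return the sorted keys a station record lacks, given the predictor names."""
    return sorted((REQUIRED_KEYS | set(predictors)) - set(record))

# Predictor names: the keys of variables_files in the sample configuration.
predictors = ["altitude", "dist"]

# First record of the sample data, as printed above.
record = {"id": "C6", "value": 8.8, "lon": 0.9517200000000001, "lat": 41.6566,
          "altitude": 264.0, "dist": 0.8587308027349195}

# Hypothetical record without predictor values.
incomplete = {"id": "XX", "value": 5.0, "lon": 1.0, "lat": 42.0}

complete_missing = missing_keys(record, predictors)
incomplete_missing = missing_keys(incomplete, predictors)
print(complete_missing)    # []
print(incomplete_missing)  # ['altitude', 'dist']
```

Applied to the loaded list, something like `[r["id"] for r in data if missing_keys(r, predictors)]` would list the stations that need attention.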
Now, we can take a quick look at the data_field array using matplotlib.
import matplotlib.pyplot as plt
plt.imshow(data_field)
plt.colorbar(label='Air temperature (\u00b0C)')
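The dimensions of the plotted array should follow from interpolation_bounds and resolution in the configuration. As a rough sanity check, the sketch below derives the number of rows and columns these two settings imply. The helper grid_shape is hypothetical and not part of PyMica, which may handle rounding or edge cells differently.

```python
def grid_shape(bounds, resolution):
    """Return (rows, cols) implied by [min_x, min_y, max_x, max_y] and a cell size."""
    min_x, min_y, max_x, max_y = bounds
    cols = round((max_x - min_x) / resolution)
    rows = round((max_y - min_y) / resolution)
    return rows, cols

# Values taken from the sample configuration shown earlier.
bounds = [260000, 4488100, 530000, 4750000]
rows, cols = grid_shape(bounds, 270)
print(rows, cols)  # 970 1000
```

Comparing this with `data_field.shape` is a quick way to confirm that the configuration was read as intended.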
Now, we can save the result to a GeoTIFF file using the save_file()
method of the PyMica class.
mlr_id2d_method.save_file("sample-data/results/mlr_id2d.tif")
We have now completed the first part of this tutorial on how to
interpolate station data using the mlr+id2d methodology. The obtained
result is similar to the one in 05 Interpolation - Multiple linear
regression, but with the additional application of residual correction,
which is evident in the interpolated field. You can experiment with
changing the variables_files, id_power, and id_smoothing parameters in
the configuration dictionary to observe how each parameter affects the
interpolation result.
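To illustrate the idea behind regression plus residual correction, here is a minimal NumPy sketch: fit a linear model on the predictors, compute the residuals at the stations, interpolate those residuals with 2D inverse distance, and add them back to the regression estimate. It is a conceptual sketch under stated assumptions, not PyMica's implementation. The stations are synthetic (not the SMC sample), and the names idw and predict are hypothetical.

```python
import numpy as np

# Synthetic stations (NOT the SMC sample): coordinates and altitude in metres,
# dist in arbitrary units.
rng = np.random.default_rng(0)
n = 30
xy = rng.uniform(0, 10_000, size=(n, 2))
altitude = rng.uniform(0, 2000, size=n)
dist = rng.uniform(0, 1, size=n)
value = 15.0 - 0.0065 * altitude + 2.0 * dist + rng.normal(0, 0.5, size=n)

# 1) Multiple linear regression: value ~ intercept + altitude + dist.
design = np.column_stack([np.ones(n), altitude, dist])
coef, *_ = np.linalg.lstsq(design, value, rcond=None)

# 2) Residuals: what the regression fails to explain at each station.
residuals = value - design @ coef

# 3) Interpolate the residuals with plain 2D inverse distance weighting.
def idw(target_xy, station_xy, station_values, power=2.5):
    d = np.hypot(*(station_xy - target_xy).T)
    if np.any(d == 0):
        return float(station_values[np.argmin(d)])  # exactly on a station
    w = d ** -power
    return float(np.sum(w * station_values) / np.sum(w))

# 4) Corrected estimate: regression estimate plus interpolated residual.
def predict(target_xy, target_altitude, target_dist):
    trend = coef @ np.array([1.0, target_altitude, target_dist])
    return float(trend + idw(np.asarray(target_xy), xy, residuals))

# At a station location the corrected estimate reproduces the observation.
at_station = predict(xy[0], altitude[0], dist[0])
print(at_station, value[0])
```

In this sketch the regression captures the broad dependence on altitude and distance to the coast, while the interpolated residuals restore the local departures observed at the stations.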
mlr+id3d
Let’s call the PyMica class with the appropriate parameters, setting the
methodology to mlr+id3d and the configuration dictionary as follows:
config_file = 'sample-data/configuration_sample.json'
with open(config_file, 'r') as f_p:
    config = json.load(f_p)
config['mlr+id3d']
{'id_power': 2.5,
'id_smoothing': 0.0,
'id_penalization': 30,
'clusters': 'None',
'variables_files': {'altitude': 'sample-data/explanatory/cat_dem_25831.tif',
'dist': 'sample-data/explanatory/cat_distance_coast.tif'},
'interpolation_bounds': [260000, 4488100, 530000, 4750000],
'resolution': 270,
'EPSG': 25831}
where:
id_power: rate at which the influence of a data point diminishes with distance from it.
id_smoothing: if 0.0, the interpolated value at a data point location is identical to the observation recorded at that point.
id_penalization: penalization factor for altitude differences in the 3D inverse-distance (id3d) residual correction.
clusters: set to None, as no clusters will be used.
variables_files: dictionary with predictor variables as keys and the paths of their GeoTIFF files as values. Here, altitude as altitude and distance to the coastline as dist. altitude is mandatory because the selected residual correction is id3d.
interpolation_bounds: [minimum_x_coordinate, minimum_y_coordinate, maximum_x_coordinate, maximum_y_coordinate]; they must match the extent of the variable files.
resolution: spatial resolution of the output grid, in the units of the coordinate system (metres for EPSG:25831).
EPSG: EPSG projection code.
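The following sketch gives a rough intuition for why a 3D residual correction needs altitude and a penalization factor. It assumes the vertical offset is multiplied by the penalization before entering a Euclidean distance. That form is an assumption for illustration, not a statement of PyMica's exact formula, and the function distance_3d and the two stations are hypothetical.

```python
import math

def distance_3d(p, q, penalization=30.0):
    """Distance between (x, y, z) points with the vertical offset scaled.

    Assumed form: sqrt(dx^2 + dy^2 + (penalization * dz)^2).
    """
    dx, dy, dz = p[0] - q[0], p[1] - q[1], p[2] - q[2]
    return math.sqrt(dx**2 + dy**2 + (penalization * dz) ** 2)

grid_point = (0.0, 0.0, 300.0)
valley_station = (5000.0, 0.0, 300.0)    # 5 km away, same altitude
summit_station = (3000.0, 0.0, 1300.0)   # 3 km away, 1000 m higher

d_valley = distance_3d(grid_point, valley_station)
d_summit = distance_3d(grid_point, summit_station)
d_summit_2d = distance_3d(grid_point, summit_station, penalization=0.0)

print(d_valley, d_summit, d_summit_2d)
```

Under this assumption the summit station, although horizontally closer, ends up effectively much farther from the grid point than the valley station, so its residual would weigh less there. With the penalization set to 0 the comparison reduces to the 2D case.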
With all these parameters and configurations set, let’s initialize the
PyMica class with the methodology set to ‘mlr+id3d’.
mlr_id3d_method = PyMica(methodology='mlr+id3d', config=config_file)
The data we’ll use for interpolation is the same as in the mlr+id2d
section, so we can call the interpolate method directly.
data_field = mlr_id3d_method.interpolate(data)
Now, we can take a quick look at the data_field array using matplotlib.
import matplotlib.pyplot as plt
plt.imshow(data_field)
plt.colorbar(label='Air temperature (\u00b0C)')
Finally, we can save the result to a GeoTIFF file using the
pymica.pymica.PyMica.save_file() method of the PyMica class.
mlr_id3d_method.save_file("sample-data/results/mlr_id3d.tif")
We have now completed this tutorial on how to interpolate station data
using the mlr methodology combined with residual correction (id2d and
id3d). You can experiment with changing the variables_files in the
configuration dictionary to observe how the behavior of each variable
affects the interpolation result.