01. Preparing Weather Station Data for PyMica

In this tutorial, we’ll cover the preparation of weather station data for use in PyMica.

PyMica expects weather station data as a list of dictionaries, one per weather station, each containing at least the following keys:

id: Identification code.
lon: Longitude coordinate.
lat: Latitude coordinate.
value: Observation value.

Each dictionary can also contain other keys for the explanatory variables used in the interpolation, such as altitude or distance to the coast. Altitude must be named ‘altitude’; PyMica does not require specific names for the other explanatory variables.

An element of the list containing these variables is organized as follows for each weather station:

{
    "id": "id_code",
    "lon": "longitude coordinate value",
    "lat": "latitude coordinate value",
    "value": "value",
    "altitude": "altitude value"
}

The weather station data is supplied to pymica.pymica.PyMica.interpolate() as a list of dictionaries, one for each station.
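Before passing the list on, it can help to check that every dictionary has the keys described above. The following is only a minimal sketch: check_stations and REQUIRED_KEYS are hypothetical names introduced here for illustration and are not part of PyMica, and the rule that every field except id must be a non-NaN number is our own assumption.

```python
import math

# Keys listed above as mandatory for every station (assumption: all but
# 'id' should hold plain, non-NaN numbers).
REQUIRED_KEYS = ("id", "lon", "lat", "value")


def check_stations(stations, extra_keys=()):
    """Hypothetical helper: return a list of problems, empty if none found."""
    problems = []
    for i, station in enumerate(stations):
        for key in (*REQUIRED_KEYS, *extra_keys):
            if key not in station:
                problems.append(f"station {i}: missing key '{key}'")
            elif key != "id":
                value = station[key]
                # bool is a subclass of int, so reject it explicitly
                if isinstance(value, bool) or not isinstance(value, (int, float)):
                    problems.append(f"station {i}: '{key}' is not numeric")
                elif math.isnan(value):
                    problems.append(f"station {i}: '{key}' is NaN")
    return problems


sample = [
    {"id": "C6", "lon": 0.95172, "lat": 41.6566, "value": 8.8, "altitude": 264.0},
    # This second entry deliberately lacks 'value' to trigger the check
    {"id": "C7", "lon": 1.16234, "lat": 41.66695, "altitude": 427.0},
]
print(check_stations(sample, extra_keys=("altitude",)))
```

Passing the names of the explanatory variables through extra_keys makes the same check cover predictors such as altitude.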

As an example, we’ll work with data from the Automatic Weather Station Network (XEMA) of the Meteorological Service of Catalonia (SMC). However, you can also provide your own data to PyMica.

First, let’s import the required library.

import pandas as pd

Now, let’s suppose that our data is in .csv format. In the sample-data/data directory, we’ll find data from the XEMA network for 2017/02/21 12:00 UTC and its corresponding metadata.

We’ll open both .csv files, XEMA_20170221_1200.csv and XEMA_metadata.csv, using the pandas library and display the first rows of the data file.

file_data = 'sample-data/data/XEMA_20170221_1200.csv'
file_metadata = 'sample-data/data/XEMA_metadata.csv'

station_data = pd.read_csv(file_data)
metadata = pd.read_csv(file_metadata)

station_data.head()

	key	altitude	dist	hr	lat	lon	temp
0	C6	264.0	0.858731	80.0	41.65660	0.95172	8.8
1	C7	427.0	0.839116	86.0	41.66695	1.16234	7.1
2	C8	554.0	0.825381	76.0	41.67555	1.29609	9.3
3	C9	240.0	0.448604	47.0	40.71825	0.39988	15.7
4	CC	626.0	0.849968	47.0	42.07398	2.20862	15.2

We also display the first rows of the metadata.

metadata.head()

	key	altitude	dist	lat	lon	name
0	C6	264.0	0.858731	41.65660	0.95172	Castellnou de Seana
1	C7	427.0	0.839116	41.66695	1.16234	Tàrrega
2	C8	554.0	0.825381	41.67555	1.29609	Cervera
3	C9	240.0	0.448604	40.71825	0.39988	Mas de Barberans
4	CC	626.0	0.849968	42.07398	2.20862	Orís
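Since the two tables are joined through the key column, a quick consistency check can be worthwhile before building the list. The sketch below uses small stand-in tables with values copied from the heads above. Note that C8 is deliberately left out of the toy metadata so that the check has something to report; this is an artificial example, not a property of the sample files.

```python
import pandas as pd

# Toy stand-ins for the two tables (values copied from the heads above).
station_data = pd.DataFrame({"key": ["C6", "C7", "C8"], "temp": [8.8, 7.1, 9.3]})
# C8 is intentionally omitted here to demonstrate the check.
metadata = pd.DataFrame(
    {"key": ["C6", "C7"], "lat": [41.65660, 41.66695], "lon": [0.95172, 1.16234]}
)

# Stations with an observation but no metadata row
missing = station_data.loc[~station_data["key"].isin(metadata["key"]), "key"].tolist()
# Station codes that appear more than once in the metadata
duplicated = metadata.loc[metadata["key"].duplicated(), "key"].tolist()

print("Without metadata:", missing)
print("Duplicated in metadata:", duplicated)
```

If either list is non-empty, selecting the first matching metadata row for a station would either fail or silently pick one of several candidates.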

Now, let’s prepare the data in the format required by PyMica, selecting the air temperature variable (temp) as the observation value and using altitude and dist as predictor variables. The variable dist refers to the distance from a station to the coastline and accounts for the influence of the sea.

data = []
for key in station_data['key']:
    # Select the observation and metadata rows for this station
    df_data = station_data[station_data['key'] == key]
    df_meta = metadata[metadata['key'] == key]
    data.append(
        {
            'id': key,
            'lon': float(df_meta['lon'].iloc[0]),
            'lat': float(df_meta['lat'].iloc[0]),
            # Observed air temperature
            'value': float(df_data['temp'].iloc[0]),
            # Predictor variables used in the interpolation
            'altitude': float(df_meta['altitude'].iloc[0]),
            'dist': float(df_meta['dist'].iloc[0])
        }
    )
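For larger networks, the same list can be built without an explicit loop. The following is one possible alternative, not the approach used in this tutorial: it joins the two tables with pandas' merge and converts the result with to_dict. The toy tables below stand in for the sample files, and the columns are selected before merging because, in the sample data, both tables share altitude, dist, lat and lon.

```python
import pandas as pd

# Toy stand-ins for the two tables (values copied from the heads above).
station_data = pd.DataFrame({"key": ["C6", "C7"], "temp": [8.8, 7.1]})
metadata = pd.DataFrame(
    {
        "key": ["C6", "C7"],
        "altitude": [264.0, 427.0],
        "dist": [0.858731, 0.839116],
        "lat": [41.65660, 41.66695],
        "lon": [0.95172, 1.16234],
    }
)

# Join observations with metadata on the station code. validate raises an
# error if a code is repeated in either table.
merged = station_data[["key", "temp"]].merge(
    metadata[["key", "lon", "lat", "altitude", "dist"]],
    on="key",
    how="inner",
    validate="one_to_one",
)

# Rename to the key names described earlier and emit one dict per station.
columns = ["id", "lon", "lat", "value", "altitude", "dist"]
records = merged.rename(columns={"key": "id", "temp": "value"})[columns].to_dict(
    orient="records"
)
print(records[0])
```

An inner join silently drops stations that lack metadata, so comparing len(records) with the number of rows in station_data is a sensible follow-up check.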

If we print the first element of data, we can see all the required variables for a station, which include identification code, longitude, latitude, temperature value, altitude, and distance to the coastline.

print('Sample data: ', data[0])
print('Number of points: ', len(data))

Sample data:  {'id': 'C6', 'lon': 0.95172, 'lat': 41.6566, 'value': 8.8, 'altitude': 264.0, 'dist': 0.8587308027349195}
Number of points:  180

We have now completed this tutorial on how to prepare raw weather station observations so that they are ready to be passed to the PyMica class.