01. Preparing Weather Station Data for PyMica

In this tutorial, we’ll cover the preparation of weather station data for use in PyMica.

The data format for weather station data used by PyMica is a list containing a dictionary for each weather station, including at least the following variables:

  • id: Identification code.

  • lon: Longitude coordinate.

  • lat: Latitude coordinate.

  • value: Observation value.

Each dictionary can also contain other keys for the explanatory variables used in the interpolation, such as altitude or distance to the coast. Altitude must be stored under the key 'altitude'; PyMica does not require specific names for the other explanatory variables.

An element of the list containing these variables is organized as follows for each weather station:

{
    "id": "id_code",
    "lon": "longitude coordinate value",
    "lat": "latitude coordinate value",
    "value": "value",
    "altitude": "altitude value"
}

The weather station data is supplied to pymica.pymica.PyMica.interpolate() as a list of dictionaries, one for each station.
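Before passing the list on, it can help to check that every dictionary carries the required keys. The following is a minimal sketch of such a check. `REQUIRED_KEYS` and `validate_stations` are hypothetical names introduced here for illustration and are not part of PyMica. Requiring numeric coordinates and values is an assumption based on the float conversions used later in this tutorial.

```python
# Hypothetical helper, not part of PyMica: checks the structure described above.
REQUIRED_KEYS = ("id", "lon", "lat", "value")


def validate_stations(stations):
    """Return a list of (index, problem) tuples; an empty list means no problems were found."""
    problems = []
    for index, station in enumerate(stations):
        missing = [key for key in REQUIRED_KEYS if key not in station]
        if missing:
            problems.append((index, "missing keys: " + ", ".join(missing)))
            continue
        # Assumption: coordinates and the observation value should be numeric.
        for key in ("lon", "lat", "value"):
            value = station[key]
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append((index, f"'{key}' is not numeric"))
    return problems


stations = [
    {"id": "C6", "lon": 0.95172, "lat": 41.6566, "value": 8.8, "altitude": 264.0},
    {"id": "XX", "lon": 1.0, "value": 7.1},  # made-up entry without 'lat'
]
print(validate_stations(stations))
```

Returning a list of problems rather than raising on the first one makes it easier to see every malformed station in a single pass.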

As an example, we’ll work with data from the Automatic Weather Station Network (XEMA) of the Meteorological Service of Catalonia. However, you can also provide your own data to PyMica.

First, let’s import the required library.

import pandas as pd

Now, let’s suppose that our data is in .csv format. In the sample-data/data directory, we’ll find data from the XEMA network for 2017/02/21 12:00 UTC and its corresponding metadata.

We’ll open both .csv files, XEMA_20170221_1200.csv and XEMA_metadata.csv, using the pandas library and display the first rows of the data file.

file_data = 'sample-data/data/XEMA_20170221_1200.csv'
file_metadata = 'sample-data/data/XEMA_metadata.csv'

station_data = pd.read_csv(file_data)
metadata = pd.read_csv(file_metadata)

station_data.head()
   key  altitude      dist    hr       lat      lon  temp
0   C6     264.0  0.858731  80.0  41.65660  0.95172   8.8
1   C7     427.0  0.839116  86.0  41.66695  1.16234   7.1
2   C8     554.0  0.825381  76.0  41.67555  1.29609   9.3
3   C9     240.0  0.448604  47.0  40.71825  0.39988  15.7
4   CC     626.0  0.849968  47.0  42.07398  2.20862  15.2

We also display the first rows of the metadata.

metadata.head()
   key  altitude      dist       lat      lon                 name
0   C6     264.0  0.858731  41.65660  0.95172  Castellnou de Seana
1   C7     427.0  0.839116  41.66695  1.16234              Tàrrega
2   C8     554.0  0.825381  41.67555  1.29609              Cervera
3   C9     240.0  0.448604  40.71825  0.39988     Mas de Barberans
4   CC     626.0  0.849968  42.07398  2.20862                 Orís

Now, let’s prepare the data in the format required by PyMica, selecting the air temperature variable (temp) as the observation value and using altitude and dist as predictor variables. The dist variable refers to the distance from each station to the coastline and accounts for the influence of the sea.

data = []
for key in station_data['key']:
    # Select the observation and metadata rows for this station
    df_data = station_data[station_data['key'] == key]
    df_meta = metadata[metadata['key'] == key]
    # Build the dictionary with the keys PyMica expects, plus the predictors
    data.append(
        {
            'id': key,
            'lon': float(df_meta['lon'].iloc[0]),
            'lat': float(df_meta['lat'].iloc[0]),
            'value': float(df_data['temp'].iloc[0]),
            'altitude': float(df_meta['altitude'].iloc[0]),
            'dist': float(df_meta['dist'].iloc[0])
        }
    )
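The loop above filters both tables once per station. As an alternative sketch, the same list can be built with a single pandas merge followed by `to_dict("records")`. The two-row DataFrames below are copied from the tables shown earlier so that the example runs without the .csv files. Only the key and temp columns are taken from the observations, since the real data file also repeats the metadata columns and merging them would produce suffixed duplicates.

```python
import pandas as pd

# Two rows copied from the tables shown earlier, standing in for the CSV files.
station_data = pd.DataFrame({"key": ["C6", "C7"], "temp": [8.8, 7.1]})
metadata = pd.DataFrame(
    {
        "key": ["C6", "C7"],
        "altitude": [264.0, 427.0],
        "dist": [0.858731, 0.839116],
        "lat": [41.65660, 41.66695],
        "lon": [0.95172, 1.16234],
    }
)

# Join observations with metadata on the station key. An inner join keeps only
# stations that appear in both tables.
merged = station_data[["key", "temp"]].merge(metadata, on="key", how="inner")

# Rename the columns to the key names described in this tutorial, keep only the
# needed columns, and convert each row to a dictionary.
data = (
    merged.rename(columns={"key": "id", "temp": "value"})
    [["id", "lon", "lat", "value", "altitude", "dist"]]
    .to_dict("records")
)
print(data[0])
```

Unlike the loop, which assumes every station has a metadata row, the inner join silently drops stations without metadata, so comparing `len(data)` with `len(station_data)` is a quick way to spot missing entries.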

If we print the first element of data, we can see all the required variables for a station, which include identification code, longitude, latitude, temperature value, altitude, and distance to the coastline.

print('Sample data: ', data[0])
print('Number of points: ', len(data))
Sample data:  {'id': 'C6', 'lon': 0.95172, 'lat': 41.6566, 'value': 8.8, 'altitude': 264.0, 'dist': 0.8587308027349195}
Number of points:  180
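Observation files sometimes contain missing values. This tutorial does not say how PyMica treats them, so the following sketch simply assumes that stations with a NaN in any field should be left out before interpolation. The second station is a made-up entry used only to show the filter at work.

```python
import math

# The first entry mirrors the sample output above; the second is made up and
# has a missing observation value.
data = [
    {"id": "C6", "lon": 0.95172, "lat": 41.6566, "value": 8.8, "altitude": 264.0, "dist": 0.858731},
    {"id": "ZZ", "lon": 1.0, "lat": 41.0, "value": float("nan"), "altitude": 100.0, "dist": 0.5},
]


def has_nan(station):
    """Return True if any float field of the station dictionary is NaN."""
    return any(isinstance(v, float) and math.isnan(v) for v in station.values())


# Assumption: stations with missing values are excluded rather than filled in.
clean_data = [station for station in data if not has_nan(station)]
print(len(data) - len(clean_data), "station(s) dropped")
```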

This completes the tutorial on preparing raw weather station observations so that they are ready to be passed to the PyMica class.