01. Preparing Weather Station Data for PyMica
In this tutorial, we’ll cover the preparation of weather station data for use in PyMica.
PyMica expects weather station data as a list with one dictionary per weather station. Each dictionary must contain at least the following keys:
- `id`: identification code.
- `lon`: longitude coordinate.
- `lat`: latitude coordinate.
- `value`: observation value.
Each dictionary can also contain other keys holding the explanatory variables used in the interpolation, such as altitude or distance to the coast. Altitude must be stored under the key 'altitude'; PyMica does not require specific names for the other explanatory variables.
Each element of the list is organized as follows (the values below are placeholders; in practice `lon`, `lat`, `value` and `altitude` are numbers):
```python
{
    "id": "id_code",
    "lon": "longitude coordinate value",
    "lat": "latitude coordinate value",
    "value": "value",
    "altitude": "altitude value"
}
```
The weather station data is supplied to `pymica.pymica.PyMica.interpolate()` as a list of dictionaries, one for each station.
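Before calling `interpolate()`, it can help to check the list against the format described above. The function below, `check_station_list`, is a hypothetical helper, not part of PyMica. It assumes that the required keys must be present and that the coordinates, the observation value and, if present, the altitude must be finite numbers; it reports problems instead of raising, so all issues can be seen at once.

```python
import math

# Keys every station dictionary must contain, as listed above.
REQUIRED_KEYS = ("id", "lon", "lat", "value")


def check_station_list(data):
    """Hypothetical helper (not part of PyMica): list problems found in `data`.

    An empty result means every entry has the required keys and that its
    coordinates, value and (if present) altitude are finite numbers.
    """
    problems = []
    for i, station in enumerate(data):
        missing = [key for key in REQUIRED_KEYS if key not in station]
        if missing:
            problems.append(f"entry {i}: missing keys {missing}")
            continue
        numeric_keys = ["lon", "lat", "value"]
        if "altitude" in station:
            numeric_keys.append("altitude")
        for key in numeric_keys:
            v = station[key]
            # bool is a subclass of int, so exclude it explicitly
            is_number = isinstance(v, (int, float)) and not isinstance(v, bool)
            if not is_number or not math.isfinite(v):
                problems.append(
                    f"entry {i} ({station['id']}): {key}={v!r} is not a finite number"
                )
    return problems


stations = [
    {"id": "C6", "lon": 0.95172, "lat": 41.6566, "value": 8.8, "altitude": 264.0},
    {"id": "C7", "lon": 1.16234, "lat": 41.66695},  # "value" deliberately left out
]
print(check_station_list(stations))  # ["entry 1: missing keys ['value']"]
```

Other explanatory variables (such as `dist`) are not checked, since PyMica does not fix their names.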
As an example, we’ll work with data from the Automatic Weather Station Network (XEMA) of the Meteorological Service of Catalonia (SMC). You can also provide your own data to PyMica, as long as it follows the format above.
First, let’s import the required library.
```python
import pandas as pd
```
Now, let’s suppose that our data is stored in .csv files. The `sample-data/data` directory contains data from the XEMA network for 2017/02/21 12:00 UTC together with the corresponding station metadata.
We’ll open both files, `XEMA_20170221_1200.csv` and `XEMA_metadata.csv`, with pandas and display the first rows of the data file.
```python
file_data = 'sample-data/data/XEMA_20170221_1200.csv'
file_metadata = 'sample-data/data/XEMA_metadata.csv'
station_data = pd.read_csv(file_data)
metadata = pd.read_csv(file_metadata)
station_data.head()
```
| | key | altitude | dist | hr | lat | lon | temp |
|---|---|---|---|---|---|---|---|
| 0 | C6 | 264.0 | 0.858731 | 80.0 | 41.65660 | 0.95172 | 8.8 |
| 1 | C7 | 427.0 | 0.839116 | 86.0 | 41.66695 | 1.16234 | 7.1 |
| 2 | C8 | 554.0 | 0.825381 | 76.0 | 41.67555 | 1.29609 | 9.3 |
| 3 | C9 | 240.0 | 0.448604 | 47.0 | 40.71825 | 0.39988 | 15.7 |
| 4 | CC | 626.0 | 0.849968 | 47.0 | 42.07398 | 2.20862 | 15.2 |
We also display the first rows of the metadata file.

```python
metadata.head()
```
| | key | altitude | dist | lat | lon | name |
|---|---|---|---|---|---|---|
| 0 | C6 | 264.0 | 0.858731 | 41.65660 | 0.95172 | Castellnou de Seana |
| 1 | C7 | 427.0 | 0.839116 | 41.66695 | 1.16234 | Tàrrega |
| 2 | C8 | 554.0 | 0.825381 | 41.67555 | 1.29609 | Cervera |
| 3 | C9 | 240.0 | 0.448604 | 40.71825 | 0.39988 | Mas de Barberans |
| 4 | CC | 626.0 | 0.849968 | 42.07398 | 2.20862 | Orís |
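Since the observations and the station coordinates come from two different files, it is worth checking that every station in the data file has exactly one metadata entry before matching them. The preparation loop in the next step takes `.iloc[0]` of each match, so a station missing from the metadata raises an `IndexError`, and a duplicated one silently uses its first row. A minimal sketch with pandas, using only the five rows shown above as a stand-in for the full files:

```python
import pandas as pd

# Stand-ins for the first rows of the two files shown above.
station_data = pd.DataFrame({"key": ["C6", "C7", "C8", "C9", "CC"],
                             "temp": [8.8, 7.1, 9.3, 15.7, 15.2]})
metadata = pd.DataFrame({"key": ["C6", "C7", "C8", "C9", "CC"],
                         "name": ["Castellnou de Seana", "Tàrrega", "Cervera",
                                  "Mas de Barberans", "Orís"]})

# Stations present in the data file but absent from the metadata file.
missing_meta = station_data.loc[~station_data["key"].isin(metadata["key"]), "key"]
# Station codes that appear more than once in the metadata file.
duplicated_meta = metadata.loc[metadata["key"].duplicated(), "key"]

print("Without metadata:", list(missing_meta))       # Without metadata: []
print("Duplicated in metadata:", list(duplicated_meta))  # Duplicated in metadata: []
```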
Now, let’s prepare the data in the format required by PyMica, selecting the air temperature variable (`temp`) as the observation value and using `altitude` and `dist` as predictor variables. The variable `dist` is the distance from a station to the coastline, which accounts for the influence of the sea.
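```python
data = []
for key in station_data['key']:
    # Rows of the observation file and of the metadata file for this station
    df_data = station_data[station_data['key'] == key]
    df_meta = metadata[metadata['key'] == key]
    data.append(
        {
            'id': key,
            # Coordinates and predictors are taken from the metadata file
            'lon': float(df_meta['lon'].iloc[0]),
            'lat': float(df_meta['lat'].iloc[0]),
            # The observed air temperature is the value to interpolate
            'value': float(df_data['temp'].iloc[0]),
            'altitude': float(df_meta['altitude'].iloc[0]),
            'dist': float(df_meta['dist'].iloc[0])
        }
    )
```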
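The loop above filters both DataFrames once per station, which is perfectly adequate for a few hundred stations. An equivalent vectorized sketch uses `DataFrame.merge` and `DataFrame.to_dict(orient="records")`. The function `build_station_list` and its `variable` and `predictors` parameters are hypothetical names chosen for this example; the two small DataFrames reuse values from the tables above.

```python
import pandas as pd


def build_station_list(station_data, metadata, variable, predictors):
    """Hypothetical helper: one dict per station, in PyMica's input format."""
    merged = station_data[["key", variable]].merge(
        metadata[["key", "lon", "lat"] + predictors],
        on="key",
        how="inner",            # stations without metadata are dropped
        validate="one_to_one",  # raise if a station code is duplicated
    )
    merged = merged.rename(columns={"key": "id", variable: "value"})
    columns = ["id", "lon", "lat", "value"] + predictors
    return merged[columns].to_dict(orient="records")


station_data = pd.DataFrame({"key": ["C6", "C7"], "temp": [8.8, 7.1]})
metadata = pd.DataFrame({
    "key": ["C6", "C7"],
    "lon": [0.95172, 1.16234],
    "lat": [41.65660, 41.66695],
    "altitude": [264.0, 427.0],
    "dist": [0.858731, 0.839116],
})

data = build_station_list(station_data, metadata, "temp", ["altitude", "dist"])
print(data[0])
```

Unlike the loop, which raises an `IndexError` when a station is missing from the metadata, `how="inner"` drops that station silently; `how="left"` keeps it with `NaN` coordinates instead. Selecting a different column for `variable` (for example `hr`) reuses the same code for another variable.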
If we print the first element of `data`, we can see all the variables for a station: identification code, longitude, latitude, temperature value, altitude, and distance to the coastline.
```python
print('Sample data: ', data[0])
print('Number of points: ', len(data))
```

```text
Sample data:  {'id': 'C6', 'lon': 0.95172, 'lat': 41.6566, 'value': 8.8, 'altitude': 264.0, 'dist': 0.8587308027349195}
Number of points:  180
```
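Real station files often contain gaps: pandas reads empty CSV cells as `NaN`, and `float()` keeps them as `nan`, so a station with a missing observation ends up with `'value': nan`. Assuming you prefer to exclude such stations rather than pass `NaN` to PyMica, a minimal filter could look like this. `drop_missing` is a hypothetical helper, and the second station below is a made-up entry used only to show the filtering.

```python
import math


def drop_missing(data, keys=("value",)):
    """Hypothetical helper: keep only stations whose `keys` are finite numbers."""
    return [
        station for station in data
        if all(math.isfinite(station[k]) for k in keys)
    ]


data = [
    {"id": "C6", "lon": 0.95172, "lat": 41.6566, "value": 8.8, "altitude": 264.0},
    # Hypothetical station with a missing observation
    {"id": "XX", "lon": 1.0, "lat": 41.0, "value": float("nan"), "altitude": 100.0},
]
clean = drop_missing(data, keys=("value", "altitude"))
print(len(clean))  # 1
```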
This completes the tutorial: the raw station observations are now in the format expected by the PyMica class.