Visualizing Geodata using Python

author avatar
Aniketsingh Rawat Software Engineer @ Infocusp
5 min read  .  16 July 2023

blog banner

Visualizing data is one of the most fun parts of studying it. This blog is an introduction to plotting and visualizing geo data using Python.

What is Geodata

Geodata, short for geographic data, refers to information that is associated with a specific location on the Earth's surface. It includes various types of data that are linked to spatial coordinates, such as latitude and longitude, and even temporal coordinates like time. Geodata can provide details about physical features, landmarks, boundaries, transportation networks, weather patterns, population density, etc.
It is stored in various forms like Raster, Vector, Binary, Swath, etc. These are stored in multiple file formats. Some file formats are NetCDF, Grib, Tiff, Zarr, HDF5, Burf. These files are what is called a dataset, i.e. they contain all the related weather data as a collection.

Data is collected from various sources. There are primary data sources such as GPS, surveys, aerial photographs, and remote sensing images. There are also secondary data sources like scanned maps, elevation models, topographic surveys, etc.

A lot of geodata can be accessed free of cost on various websites. Organizations upload these datasets for academic and research purposes.
Here are a few of the organizations:

  1. ecmwf
  2. copernicus
  3. nasa
  4. noaa

For the purpose of this tutorial, we are going to use this dataset but similar steps can be followed for other datasets too. https://psl.noaa.gov/data/gridded/data.ncep.reanalysis.html (open data)

Opening geodata using Python

Geodata is present in many formats but we need a unified way to read such data. Xarray is an open-source tool that enables us to open and load such files into Python.

It can be installed by running

pip install xarray

First, we import Xarray

import xarray as xr
file_name = "/path/to/downloaded/file/*.nc"

Next, let's open the dataset and view its contents

ds = xr.tutorial.open_dataset("air_temperature")
print(ds)

output:

<xarray.Dataset> 
Dimensions: (lat: 73, lon: 144, level: 17, time: 1460, nbnds: 2) Coordinates: 
* lat (lat) float32 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0 
* lon (lon) float32 0.0 2.5 5.0 7.5 ... 352.5 355.0 357.5 
* level (level) float32 1e+03 925.0 850.0 ... 30.0 20.0 10.0 
* time (time) object 0001-01-01 00:00:00 ... 0001-12-31 18:0... 
Dimensions without coordinates: nbnds 
Data variables: 
climatology_bounds (time, nbnds) datetime64[ns] ... 
air (time, level, lat, lon) float32 ... 
valid_yr_count (time, level, lat, lon) float32 ... 
Attributes: 
Conventions: COARDS 
description: Data is from NMC initialized reanalysis\n... 
platform: Model 
not_missing_threshold_percent: minimum 3% values input to have non-missi... 
history: Created 2011/06/29 by do4XdayLTM\nConvert... 
title: 4X daily ltm air from the NCEP Reanalysis 
dataset_title: NCEP-NCAR Reanalysis 1 
References: http://www.psl.noaa.gov/data/gridded/data...

Let's go through it one by one

  1. Dimensions are the shape of your name where the size of each coordinate is mentioned.
  2. Coordinates are like indexes. You can index data by specifying each coordinate.
  3. Data variables are where the actual data is stored. Data variables need not have all the coordinates as indexes. Each data variable is a xarray DataArray.
  4. Attributes are the metadata you get like geo information, description, source, etc.

You can see the Data variables by just indexing them using its name

print(ds['air'])

output:

<xarray.DataArray 'air' (time: 1460, level: 17, lat: 73, lon: 144)>
[260907840 values with dtype=float32]
Coordinates:
  * lat      (lat) float32 90.0 87.5 85.0 82.5 80.0 ... -82.5 -85.0 -87.5 -90.0
  * lon      (lon) float32 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5
  * level    (level) float32 1e+03 925.0 850.0 700.0 ... 50.0 30.0 20.0 10.0
  * time     (time) object 0001-01-01 00:00:00 ... 0001-12-31 18:00:00
Attributes:
    long_name:     Long Term Mean 4xDaily Air temperature
    units:         degK
    precision:     2
    GRIB_id:       11
    GRIB_name:     TMP
    var_desc:      Air temperature

Great now that we have the data, we are ready to plot this.

Plotting the data using matplotlib

We are going to use matplotlib basemap toolkit for plotting data. This helps us in plotting latitude and longitude based data easily on a plot while also providing different projections and outlines for coastlines and countries.

It can be installed using pip.

pip install basemap

Import the necessary libraries

import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

Get all the necessary data

air_data = ds['air'].isel(time=0, level=0)
lats = ds['lat']
lons = ds['lon']

Plot the data using Basemap

# Create base map object with projection and other params
map = Basemap(projection='mill', llcrnrlon=lons.min(), llcrnrlat=lats.min(), urcrnrlon=lons.max(), urcrnrlat=lats.max())


# Add map features
map.drawcoastlines()
map.drawcountries()


# Create mesh grid of lons and lats
lon_mesh, lat_mesh = np.meshgrid(lons, lats)
x, y = map(lon_mesh, lat_mesh)


# Plot the data
plt.figure(figsize=(10, 8))
map.pcolormesh(x, y, air_data, cmap='jet')


# Add colorbar
plt.colorbar(orientation='horizontal', label="Air temperature (units: degK)", fraction=0.046, pad=0.04)


# Show the plot
plt.title("4X daily ltm air from the NCEP Reanalysis")
plt.show()

blog banner

Looks awesome!

But we can do more, we can view this data in different projections by changing the projection parameter when creating Basemap object. More projections can be found at basemap docs.

Using hammer projection

map = Basemap(projection='hammer', lon_0 = 0)

blog banner

That's it folks, it's that easy to plot geodata in python.