Introduction to nctoolkit

nctoolkit is a multi-purpose tool for analyzing and post-processing netCDF files. We will see what it is capable of by carrying out an exploratory analysis of sea surface temperature since 1850.

We will use the temperature data set COBE2 from the National Oceanic and Atmospheric Administration. This is a global dataset of sea surface temperature at a horizontal resolution of 1 degree for every month since 1850.

It is best to import nctoolkit as follows:

[45]:

import nctoolkit as nc


We will set the file path as follows:

[46]:

ff = "sst.mon.mean.nc"


nctoolkit works with datasets. A dataset contains either a single file or a list of files. We can create one from the temperature file as follows:

[47]:

ds = nc.open_data(ff)


We can easily access the attributes of the dataset. For example, to find the range of years covered by the dataset, we can do this:

[48]:

[min(ds.years), max(ds.years)]

[48]:

[1850, 2019]


All years from 1850 to 2019 are available. We could find out the variables available like so:

[49]:

ds.variables

[49]:

['sst']


Now, if we want to do anything with a dataset, we need to use nctoolkit’s many methods. Let’s say we want to map mean temperature for the year 2000. We could do that as follows:

[50]:

ds = nc.open_data(ff)
ds.select(year = 2000)
ds.tmean()
ds.plot()



This was carried out in three clear steps. First, we selected the year 2000. Second, we calculated the temporal mean for that year. Third, we plotted the result.

We might want to do something more interesting. We have a dataset of sea surface temperature. How much has the ocean warmed over this time? We can calculate that as follows:

[51]:

ds = nc.open_data(ff)
ds.tmean("year")
ds.spatial_mean()
ds.plot("sst")



Here we did the calculation in two steps. First, we used tmean to calculate the annual mean since 1850 for each grid cell; the "year" argument tells nctoolkit that the mean should be calculated for each year. We then used spatial_mean to calculate the spatial mean of each annual field.
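To make the two steps concrete, here is a minimal pure-Python sketch on a tiny synthetic grid. It is not how nctoolkit computes the result. The grid, the temperature values, and the helper names annual_mean and area_weighted_mean are all hypothetical, and the sketch assumes an area-weighted spatial mean with weights proportional to cos(latitude), a common convention for regular latitude-longitude grids. Check the nctoolkit documentation for the exact weighting that spatial_mean applies.

```python
import math

# Hypothetical grid: three latitude bands, four longitudes (illustration only)
lats = [-60.0, 0.0, 60.0]
n_lon = 4

def synthetic_month(base):
    """Build one monthly field, indexed [lat][lon], that is warmest at the equator."""
    return [[base + 20.0 * math.cos(math.radians(lat))] * n_lon for lat in lats]

# Twelve monthly fields per year; the second year is 0.5 degrees warmer everywhere
monthly = {
    1850: [synthetic_month(0.0) for _ in range(12)],
    1851: [synthetic_month(0.5) for _ in range(12)],
}

def annual_mean(months):
    """Step 1: average the monthly fields of one year, grid cell by grid cell."""
    n = len(months)
    return [
        [sum(m[i][j] for m in months) / n for j in range(n_lon)]
        for i in range(len(lats))
    ]

def area_weighted_mean(field):
    """Step 2: collapse a field to one number, weighting each row by cos(latitude)."""
    weights = [math.cos(math.radians(lat)) for lat in lats]
    total = sum(w * v for w, row in zip(weights, field) for v in row)
    return total / (sum(weights) * n_lon)

# One value per year, analogous to the time series plotted in the text
series = {year: area_weighted_mean(annual_mean(m)) for year, m in monthly.items()}
print(series)
```

The order of the steps mirrors the notebook cell: the temporal average reduces twelve fields to one per year, and the spatial average then reduces each field to a single number.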

Now, we might want to map how much the oceans have warmed over the last century. We could do this as follows:

[52]:

ds_start = nc.open_data(ff)
ds_start.select(years = range(1900, 1920))
ds_start.tmean()

ds_increase = nc.open_data(ff)
ds_increase.select(years = range(2000, 2020))
ds_increase.tmean()
ds_increase.subtract(ds_start)


First, we created a dataset that gives the mean temperature between 1900 and 1919. We then created a second dataset, which initially holds the mean temperature between 2000 and 2019, and subtracted the 1900-19 mean from it. We can now plot the results:
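The same arithmetic can be sketched for a single grid cell in plain Python. The yearly values and the period_mean helper below are hypothetical and only illustrate the calculation. The sketch also shows why range(1900, 1920) covers 1900 to 1919: Python excludes the upper bound of a range.

```python
# Hypothetical yearly mean temperatures for one grid cell,
# with a steady trend of 0.01 degrees per year (illustration only)
yearly = {year: 15.0 + 0.01 * (year - 1850) for year in range(1850, 2020)}

def period_mean(values, years):
    """Average the yearly values over the given years."""
    years = list(years)
    return sum(values[y] for y in years) / len(years)

start_years = range(1900, 1920)  # 1900..1919, the upper bound is excluded
end_years = range(2000, 2020)    # 2000..2019

# Mean of the later period minus mean of the earlier period
increase = period_mean(yearly, end_years) - period_mean(yearly, start_years)

print(start_years[0], start_years[-1], len(start_years))
print(round(increase, 3))
```

In the notebook, the same subtraction is applied to every grid cell at once, which is what makes the result mappable.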

[53]:

ds_increase.plot()



You can see that most of the world’s oceans have warmed, but some have warmed more than others.

We might want to know how much each part of the ocean has warmed or cooled relative to the global average. We can do this using the assign method:

ds_increase.assign(sst = lambda x: x.sst - spatial_mean(x.sst))
ds_increase.plot()

Areas shown in red warmed more than the global average.

Under the hood

Let’s revisit the first code example to see how nctoolkit works behind the scenes:

[54]:

ds = nc.open_data(ff)
ds.select(year = 2000)
ds.tmean()


The plotting part has been removed. Each dataset is made up of files. We can see what they are as follows:

[55]:

ds.current

[55]:

['sst.mon.mean.nc']


You can see that this is just the file we started with. What’s going on? The answer: nctoolkit works lazily. Calculations are carried out only when the user asks for them, or when they have to be. To force calculations to be carried out, we use run. The plot method will, of course, force everything to be evaluated before plotting.

[56]:

ds.run()


We can now see that the file in the dataset has changed:

[57]:

ds.current

[57]:

['/tmp/nctoolkitutksgqhmnctoolkittmplg5lkgk4.nc']


This is now a new temporary file. Under the hood, nctoolkit uses Climate Data Operators (CDO). CDO is a powerful and highly efficient system for working with netCDF files. nctoolkit requires no knowledge of CDO, but if you want to understand it further you can read the excellent CDO user guide.

We can see the CDO commands by accessing the history attribute:

[58]:

ds.history

[58]:

['cdo -L -timmean -selyear,2000 sst.mon.mean.nc /tmp/nctoolkitutksgqhmnctoolkittmplg5lkgk4.nc']


You can see that two nctoolkit methods have been converted into one CDO call.
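The lazy behaviour described above can be illustrated with a toy class. This is a sketch of the general idea only, not nctoolkit's actual implementation: the class LazyDataset, its method select_year, and the output path are hypothetical, and the sketch only builds the command string without ever calling CDO. It assumes that chained CDO operators are written with the last operation first, as in the history entry shown above.

```python
class LazyDataset:
    """Toy model of lazy evaluation; not nctoolkit's actual implementation."""

    def __init__(self, path):
        self.current = path   # file the dataset currently points to
        self._pending = []    # operators queued but not yet run
        self.history = []     # command strings that have been "run"

    def select_year(self, year):
        # Queue the operation instead of running it immediately
        self._pending.append(f"-selyear,{year}")

    def tmean(self):
        self._pending.append("-timmean")

    def run(self, output="/tmp/hypothetical_output.nc"):
        """Collapse all queued operators into a single chained command string."""
        if not self._pending:
            return
        # The last queued operation is written first in the chain
        chain = " ".join(reversed(self._pending))
        self.history.append(f"cdo -L {chain} {self.current} {output}")
        self.current = output
        self._pending = []


ds_toy = LazyDataset("sst.mon.mean.nc")
ds_toy.select_year(2000)
ds_toy.tmean()
print(ds_toy.current)     # still the original file: nothing has run yet
ds_toy.run()
print(ds_toy.current)     # now the (hypothetical) output file
print(ds_toy.history[0])
```

Queuing operations this way means intermediate results never need to be written to disk: both steps are combined into one command, so only the final output file is created.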

And don’t worry, nctoolkit will automatically remove all of the temporary files once they are no longer needed.

Click on the tabs on the left to find out what nctoolkit is capable of.