Introduction tutorial

nctoolkit is designed for the efficient analysis and manipulation of netCDF files. This tutorial provides an overview of how to work with individual files.

Opening netCDF data

This tutorial illustrates basic usage with a dataset of average global sea surface temperature from NOAA, which is available here.

nctoolkit should be imported using the nc shorthand:

[1]:
import nctoolkit as nc

Reading in a dataset is straightforward:

[2]:
ff = "sst.mon.ltm.1981-2010.nc"
sst = nc.open_data(ff)

We might want to know some basic information about the file. The available variables can be listed as follows:

[3]:
sst.variables
[3]:
['sst', 'valid_yr_count']

The months available can be found using:

[4]:
sst.months
[4]:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

We have 12 months available. In this case it is the monthly average temperature from 1981-2010.

Modifying datasets

Each time nctoolkit executes a command that modifies a dataset, it generates a new netCDF file, which becomes the current file in the dataset. Before any modification, the current file is simply the original file:

[5]:
sst.current
[5]:
'sst.mon.ltm.1981-2010.nc'

We have seen that there are two variables in the dataset. But we only really care about sst. So let’s select that variable:

[6]:
sst.select_variables("sst")

We can now see that there is only one variable in the sst dataset:

[7]:
sst.variables
[7]:
['sst']

We can also see that a temporary file has been created with only this variable in it:

[8]:
sst.current
[8]:
'/tmp/nctoolkitesugmpemnctoolkittmpxmrohbap.nc'

We have data for 12 months. But what we might really want is an average of those values. This can be quickly calculated:

[9]:
sst.mean()
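
To make concrete what an average over time steps does, here is a minimal pure-Python sketch. It is not nctoolkit code: the function name `time_mean`, the 1 x 2 grid, and the values are all made up for illustration. It averages each grid cell across the time steps, so several fields collapse into one.

```python
def time_mean(fields):
    """Average a list of equally shaped 2D fields, cell by cell, over time."""
    n_steps = len(fields)
    n_rows = len(fields[0])
    n_cols = len(fields[0][0])
    return [
        [sum(field[r][c] for field in fields) / n_steps for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Three made-up time steps on a 1 x 2 grid (illustrative values only).
monthly = [
    [[10.0, 20.0]],
    [[12.0, 22.0]],
    [[14.0, 24.0]],
]

result = time_mean(monthly)
print(result)  # [[12.0, 22.0]]
```

The real dataset has 12 time steps on a global grid, but the idea is the same: one averaged field comes out.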

Once again a new temporary file has been generated.

[10]:
sst.current
[10]:
'/tmp/nctoolkitesugmpemnctoolkittmpgz_hzyoq.nc'

Do not worry about the temporary folder getting clogged up. nctoolkit cleans it up automatically.

Quick visualization of netCDF data is always a good thing. So nctoolkit provides an easy autoplot feature.

[11]:
sst.plot()

What we have seen so far is not computationally efficient. In the code below, nctoolkit generates a temporary file twice:

[12]:
sst = nc.open_data(ff)
sst.select_variables("sst")
sst.mean()

We can see what went on behind the scenes by accessing history:

[13]:
sst.history
[13]:
['cdo -L -selname,sst sst.mon.ltm.1981-2010.nc /tmp/nctoolkitesugmpemnctoolkittmpxpb_323a.nc',
 'cdo -L -timmean /tmp/nctoolkitesugmpemnctoolkittmpxpb_323a.nc /tmp/nctoolkitesugmpemnctoolkittmp5agj679e.nc']
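
The history entries are plain strings, so they can be inspected with ordinary Python. The sketch below copies the two entries shown above and assumes, based on those entries only, that the last token of each command is its output file. The helper name `output_files` is hypothetical and not part of nctoolkit.

```python
import shlex

# The two history entries shown above, copied verbatim.
history = [
    "cdo -L -selname,sst sst.mon.ltm.1981-2010.nc /tmp/nctoolkitesugmpemnctoolkittmpxpb_323a.nc",
    "cdo -L -timmean /tmp/nctoolkitesugmpemnctoolkittmpxpb_323a.nc /tmp/nctoolkitesugmpemnctoolkittmp5agj679e.nc",
]


def output_files(commands):
    """Return the last token of each command, assumed to be its output file."""
    return [shlex.split(command)[-1] for command in commands]


outputs = output_files(history)
print(len(outputs))  # 2

# The second command reads the file that the first command wrote.
second_input = shlex.split(history[1])[-2]
print(second_input == outputs[0])  # True
```

This confirms the point made in the text: two commands ran, and each one wrote its own temporary file.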

nctoolkit uses CDO. You do not need to understand how CDO works to use nctoolkit. But one nice feature of CDO is method chaining, which works like Python’s. To get it working you just need to set evaluation to lazy in nctoolkit. This means nothing is evaluated until you force it to be or until it has to be.

[14]:
nc.options(lazy = True)

Now, let’s run the code again:

[15]:
sst = nc.open_data(ff)
sst.select_variables("sst")
sst.mean()
sst.plot()

When we look at history, we now see that only one temporary file was generated:

[16]:
sst.history
[16]:
['cdo -L -timmean -selname,sst sst.mon.ltm.1981-2010.nc /tmp/nctoolkitesugmpemnctoolkittmpooqi1xou.nc']

In the example above, the commands were only executed when plot was called. If we want to force the commands to run, we use run:

[17]:
sst = nc.open_data(ff)
sst.select_variables("sst")
sst.mean()
sst.run()
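
The effect of lazy evaluation on the history can be sketched with a toy model. This is not nctoolkit's implementation: the class `LazyToyDataset`, its `_pending` list, and the `toytmp` file prefix are hypothetical, and the sketch only builds command strings without ever calling CDO. It queues operators until `run` is called and then joins them into a single chained command. The queue is reversed because, in a CDO chain, the operator written closest to the input file is applied first.

```python
import os
import tempfile
import uuid


class LazyToyDataset:
    """Hypothetical model of lazy evaluation; not nctoolkit's implementation."""

    def __init__(self, path):
        self.current = path   # file the dataset currently points at
        self.history = []     # command strings that would have been run
        self._pending = []    # operators queued but not yet composed

    def select_variables(self, name):
        self._pending.append(f"-selname,{name}")

    def mean(self):
        self._pending.append("-timmean")

    def run(self):
        if not self._pending:
            return
        # Build a temporary file name without creating the file.
        target = os.path.join(
            tempfile.gettempdir(), f"toytmp{uuid.uuid4().hex}.nc"
        )
        # The operator queued first must sit closest to the input file.
        chain = " ".join(reversed(self._pending))
        self.history.append(f"cdo -L {chain} {self.current} {target}")
        self.current = target
        self._pending = []


toy = LazyToyDataset("sst.mon.ltm.1981-2010.nc")
toy.select_variables("sst")
toy.mean()
print(len(toy.history))  # 0: nothing has been composed yet

toy.run()
print(len(toy.history))  # 1: both operators share one command
print(toy.history[0].startswith(
    "cdo -L -timmean -selname,sst sst.mon.ltm.1981-2010.nc "
))  # True
```

The composed string has the same shape as the single history entry produced in lazy mode: one command, one temporary file.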