Introduction tutorial

The fundamental object of analysis in this package is an nctoolkit dataset. Each dataset is initialized with a single netcdf file or an ensemble of files, and it will keep track of any manipulations carried out on it.

Behind the scenes, most of the manipulations are done using CDO. Datasets will keep track of all of the CDO and NCO commands used. However, unless you are experienced with CDO, you can ignore all of this.

Opening netcdf data

I will illustrate the basic usage using a climatology of global sea surface temperature from NOAA. We can download this from NOAA's FTP server. To download using wget:

wget ftp://ftp.cdc.noaa.gov/Datasets/COBE/sst.mon.ltm.1981-2010.nc

The first step in any analysis will be to import nctoolkit, which I will import as nc for shorthand. Please note that I am suppressing warnings to make this notebook more readable. I do not recommend suppressing warnings in general.

[1]:
import nctoolkit as nc
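
For reference, the warnings mentioned above can be silenced with Python's standard warnings module; a minimal sketch of how this might be done:

# suppress warnings for readability (not recommended in general)
import warnings
warnings.filterwarnings("ignore")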

Under the hood nctoolkit will generate temporary netcdf files. The package is designed to remove temp files that are no longer in use, and will automatically clean up any temporary files generated when Python closes. However, this is not 100% guaranteed to work during system crashes etc.

It is therefore recommended to do a deep_clean at the start of any session to remove any leftover netcdf files that might remain from a previous session. Obviously, do not run this if you have multiple instances of nctoolkit running simultaneously.

[2]:
nc.deep_clean()

We can then set up the dataset, which we will use for manipulating the SST climatology.

[3]:
ff =  "sst.mon.ltm.1981-2010.nc"
sst = nc.open_data(ff)
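
Note that open_data can also accept a list of files, which creates a multi-file dataset. A minimal sketch, using hypothetical file names:

# hypothetical file names; any set of compatible netcdf files will work
ensemble = nc.open_data(["sst_part1.nc", "sst_part2.nc"])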

Accessing dataset attributes

At this point there is very little useful information in the dataset. Essentially, all it tells us is the start file. This will always remain the same.

[4]:
sst.start
[4]:
'sst.mon.ltm.1981-2010.nc'

The current state of the dataset can be found as follows.

[5]:
sst.current
[5]:
'sst.mon.ltm.1981-2010.nc'

We can access the dataset’s history, which is initially empty, as follows.

[6]:
sst.history
[6]:
[]

A simple, but important, first task when analyzing netcdf data is knowing which variables are in the file. We can find this out quickly by accessing the variables attribute.

[7]:
sst.variables
[7]:
['sst', 'valid_yr_count']

Often, we will want to know the size of a dataset. This is most relevant when we are working with multiple files. We can do this by accessing the size attribute. To speed up computations, variables and size are computed lazily.

[8]:
sst.size
[8]:
'Number of files: 1\nFile size: 4.670688 MB'

In this case we can see that the file is roughly 4.7 MB, and we are also told that there is only one file.

Variable selection and geographic clipping

We can clip netcdf files in space or time using clip. Let’s say we only care about temperature in the North Atlantic. This can be done very easily using the methods below.

netcdf files often have variables that we are not interested in. We can therefore easily select or delete variables. If we want to select variables we can use the select_variables method, which requires either a single variable or a list of variables. Here I will select sst.

[9]:
sst.select_variables("sst")

We can now see that there is only one variable in the sst dataset.

[10]:
sst.variables
[10]:
['sst']

We can also see that a temporary file has been created with only this variable in it.

[11]:
sst.current
[11]:
'/tmp/nctoolkitmexmyssrnctoolkittmp1jbp0kvg.nc'

If we want to clip the dataset geographically we can use the clip method. All we need is the longitude and latitude range. So if we wanted to clip the SST data to the North Atlantic we would do the following.

[12]:
sst.clip(lon = [-80, 20], lat = [30, 80])

We have now carried out some manipulations on the dataset. So, the current file has now changed.
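
We could confirm this by checking the current attribute again (the exact temporary file path will vary from session to session):

sst.current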

Likewise, we now have a history to look at.

[13]:
sst.history
[13]:
['cdo -L -selname,sst sst.mon.ltm.1981-2010.nc /tmp/nctoolkitmexmyssrnctoolkittmp1jbp0kvg.nc',
 'cdo -L  -sellonlatbox,-80,20,30,80 /tmp/nctoolkitmexmyssrnctoolkittmp1jbp0kvg.nc /tmp/nctoolkitmexmyssrnctoolkittmpbp36y2p_.nc']

This will give us the list of CDO or NCO commands used under the hood. nctoolkit is designed to be usable without any prior knowledge of CDO or NCO.

Deleting an object

If we want to delete a dataset, we simply use the standard Python del approach. nctoolkit has been designed to constantly clean up the system using a simple rule: a temporary file is only kept if it is the current file of a dataset in the current session. Right now, we only have one dataset, called “sst”. So if we delete “sst”, the current temp file from that dataset will also be deleted. We can see this by looking at what happens to the temp file related to sst when we delete sst. Right now it exists on the system.

[14]:
import os
x = sst.current
os.path.exists(x)
[14]:
True

But if we delete sst, this file will disappear.

[15]:
del sst
os.path.exists(x)
[15]:
False

Viewing a dataset using the autoplot feature

nctoolkit has a built-in, though slightly experimental, method for quick plotting. This will check the contents of the dataset and plot accordingly. The general approach of autoplot is very similar to ncview on the command line.

[16]:
ff =  "sst.mon.ltm.1981-2010.nc"
sst = nc.open_data(ff)
sst.select_months(1)
sst.reduce_dims()
sst.plot()

Statistical operations

nctoolkit has a large number of built-in statistical operations, largely built around the methods available in CDO.

Time averaging

Averaging in time is one of the most common operations required on netcdf data. nctoolkit allows users to calculate long-term time averages, monthly climatologies, seasonal summaries and many other common statistics.

In this case we are analyzing a monthly climatology of SST. However, what we really might be interested in is the annual average. This can be calculated using the simple mean method, which will calculate the mean over all time steps.

[17]:
ff =  "sst.mon.ltm.1981-2010.nc"
sst = nc.open_data(ff)
sst.select_variables("sst")
sst.mean()
sst.reduce_dims()
sst.plot()

Instead of the annual mean, we might be interested in the range of temperatures during the year.

[18]:
ff =  "sst.mon.ltm.1981-2010.nc"
sst = nc.open_data(ff)
sst.select_variables("sst")
sst.range()
sst.reduce_dims()
sst.plot()

Other operations, such as maximum, minimum, and standard deviation, are available.
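
These follow the same pattern as mean and range above. A minimal sketch, assuming the maximum is exposed as a max method (check the reference documentation for the exact method names, in particular for standard deviation):

ff =  "sst.mon.ltm.1981-2010.nc"
sst = nc.open_data(ff)
sst.select_variables("sst")
# assumed method name: maximum SST across all time steps
sst.max()
sst.reduce_dims()
sst.plot()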

Spatial statistics

Let’s move on to some more advanced methods. I will illustrate these using NOAA’s long-term monthly global data set of sea surface temperatures from 1850 to the present day. You can learn more about this data set on NOAA’s website. This file is approximately 500 MB.

To download using wget:

wget ftp://ftp.cdc.noaa.gov/Datasets/COBE/sst.mon.mean.nc

This is a long-term data set of global sea surface temperature. So, let’s find out what has happened to average global sea surface temperature since 1850. Unsurprising spoiler: it has been going up. Let’s start by setting up the dataset.

[19]:
ff =  "sst.mon.mean.nc"
sst = nc.open_data(ff)

We now need to calculate the average global SST. We can do this using the spatial_mean method. This will calculate an area weighted mean for each time step.

[20]:
sst.spatial_mean()

We can now plot the time series of monthly global mean SST since 1850.

[21]:
sst.plot()

Our time series shows that, as expected, SST increased during the 20th century. However, this figure has too much noise: we do not care about month-to-month variations. Instead, let’s look at the rolling 20-year mean. To do this, we will first need to calculate an annual mean and then calculate the rolling mean using a window of 20 years. Alternatively, we could just calculate a rolling mean on the initial monthly data using a window of 20*12 = 240 months, as sketched below.
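
For reference, the monthly-window alternative mentioned above would look roughly like this (a sketch; it is not run in this tutorial):

ff =  "sst.mon.mean.nc"
sst = nc.open_data(ff)
sst.spatial_mean()
# 20 years * 12 months = 240 monthly time steps
sst.rolling_mean(window = 240)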

[22]:
ff =  "sst.mon.mean.nc"
sst = nc.open_data(ff)
sst.spatial_mean()

To calculate the annual mean we can simply use the annual_mean method.

[23]:
sst.annual_mean()

To calculate the rolling mean, we can use rolling_mean, with window set to 20.

[24]:
sst.rolling_mean(window = 20)
[25]:
sst.plot()

This looks much cleaner. Please note that, at present, nctoolkit does not adjust the time outputs from CDO. So, in this case, each rolling mean is centred in the middle of its 20-year window. As nctoolkit evolves, more windowing options will be provided to users.