Datasets

nctoolkit works with what it calls datasets. Each dataset is made up of one or more NetCDF files. Each time you apply a method to a dataset, the NetCDF file or files within the dataset will be modified.

Opening datasets

There are three ways to create a dataset: open_data, open_url or open_thredds.

If the data you want to analyze is already available on your computer, use open_data. It accepts either a path to a single file or a list of files.

If you want to use data that can be downloaded from a URL, use open_url. This will download the NetCDF files to a temporary folder, where they can then be analyzed.

If you want to analyze data that is available from a thredds server, then use open_thredds. The file paths should end with .nc.
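The choice between the three functions can be summarized in a small helper. This is only an illustrative sketch: choose_opener is a hypothetical function, not part of nctoolkit, and the rule it uses to recognize a thredds address (the text "thredds" appearing in the URL) is an assumption made for this example.

```python
from urllib.parse import urlparse


def choose_opener(source):
    """Return the name of the nctoolkit function suited to `source`.

    Hypothetical helper for illustration only; not part of nctoolkit.
    """
    # A list (or tuple) of paths is treated as a set of local files.
    if isinstance(source, (list, tuple)):
        return "open_data"
    scheme = urlparse(source).scheme
    if scheme in ("http", "https", "ftp"):
        # Assumption: thredds addresses contain "thredds" in the URL.
        if "thredds" in source.lower():
            return "open_thredds"
        return "open_url"
    # Anything else is assumed to be a path on the local machine.
    return "open_data"


print(choose_opener("data/sst.nc"))
print(choose_opener("https://psl.noaa.gov/thredds/dodsC/Datasets/COBE/sst.mon.ltm.1981-2010.nc"))
```

The helper only returns a function name. In real code you would call the matching nctoolkit function directly.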

Dataset attributes

We can find out key information about a dataset using its attributes. Here we will use a sea surface temperature file that is available via thredds.

[2]:
import nctoolkit as nc
sst = nc.open_thredds("https://psl.noaa.gov/thredds/dodsC/Datasets/COBE/sst.mon.ltm.1981-2010.nc")
1 file was created by nctoolkit in prior or current sessions. Consider running deep_clean!

If we want to know a dataset’s variables:

[3]:
sst.variables
[3]:
['sst', 'valid_yr_count']

If we want to know a dataset’s vertical levels, we use the following. In this case there is only one because the file only contains the sea surface.

[4]:
sst.levels
[4]:
[0.0]

If we want to know where the dataset’s NetCDF files are stored we can do the following:

[5]:
sst.current
[5]:
'https://psl.noaa.gov/thredds/dodsC/Datasets/COBE/sst.mon.ltm.1981-2010.nc'

If we want to find out what times are in the dataset:

[6]:
sst.times
[6]:
['0001-01-01T00:00:00',
 '0001-02-01T00:00:00',
 '0001-03-01T00:00:00',
 '0001-04-01T00:00:00',
 '0001-05-01T00:00:00',
 '0001-06-01T00:00:00',
 '0001-07-01T00:00:00',
 '0001-08-01T00:00:00',
 '0001-09-01T00:00:00',
 '0001-10-01T00:00:00',
 '0001-11-01T00:00:00',
 '0001-12-01T00:00:00']

If we want to find out what months are in the dataset:

[7]:
sst.months
[7]:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

If we want to find out what years are in the dataset:

[8]:
sst.years
[8]:
[1]
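The months and years attributes can be thought of as the distinct months and years found in times. The sketch below reproduces that relationship in plain Python for the twelve timestamps shown above. The function months_and_years is a hypothetical name used for illustration, and this is not a description of how nctoolkit computes these attributes internally.

```python
def months_and_years(times):
    """Return the sorted distinct months and years in ISO-style timestamps.

    Hypothetical helper for illustration only; not part of nctoolkit.
    """
    months = set()
    years = set()
    for stamp in times:
        # Keep the date part, e.g. "0001-06-01" from "0001-06-01T00:00:00".
        date_part = stamp.split("T")[0]
        year, month, _day = (int(part) for part in date_part.split("-"))
        months.add(month)
        years.add(year)
    return sorted(months), sorted(years)


# The same twelve monthly timestamps as in the output of sst.times.
times = ["0001-%02d-01T00:00:00" % m for m in range(1, 13)]
months, years = months_and_years(times)
print(months)
print(years)
```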

If we apply a method to the dataset, these attributes will change. Let’s calculate the average temperature over time:

[9]:
sst.mean()

We can see that there is now a new temporary file associated with the dataset:

[10]:
sst.current
[10]:
'/tmp/nctoolkityjoudsyznctoolkittmpm3zaqqam.nc'

We can also access the history of operations carried out on the dataset:

[11]:
sst.history
[11]:
['cdo -L -timmean https://psl.noaa.gov/thredds/dodsC/Datasets/COBE/sst.mon.ltm.1981-2010.nc /tmp/nctoolkityjoudsyznctoolkittmpm3zaqqam.nc']

Behind the scenes, nctoolkit mostly uses Climate Data Operators (CDO). If you are not familiar with Climate Data Operators, you can almost certainly just ignore the operations history.
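If you do want to read a history entry, it can help to split it into its parts. The sketch below is a minimal example, assuming each entry has the shape seen above: the word cdo, the -L option, one or more operators, then a single input and a single output. The function parse_history_entry is a hypothetical helper, not part of nctoolkit.

```python
import shlex


def parse_history_entry(entry):
    """Split a history entry into operators, input and output.

    Hypothetical helper for illustration only. Assumes one input file,
    one output file, and -L as the only non-operator option.
    """
    tokens = shlex.split(entry)
    if len(tokens) < 4 or tokens[0] != "cdo":
        raise ValueError("not a recognised CDO command: " + entry)
    # The last two tokens are the input and the output.
    source, target = tokens[-2], tokens[-1]
    # Everything in between is an option or an operator; -L is an option.
    operators = [tok.lstrip("-") for tok in tokens[1:-2] if tok != "-L"]
    # Chained operators are written right to left, so reverse them
    # to list them in the order they are applied.
    operators.reverse()
    return {"operators": operators, "input": source, "output": target}


entry = (
    "cdo -L -timmean "
    "https://psl.noaa.gov/thredds/dodsC/Datasets/COBE/sst.mon.ltm.1981-2010.nc "
    "/tmp/nctoolkityjoudsyznctoolkittmpm3zaqqam.nc"
)
parsed = parse_history_entry(entry)
print(parsed["operators"])
print(parsed["output"])
```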

We can also see that the times have changed. The only month now available is June, which is the mid-point of the year.

[12]:
sst.months
[12]:
[6]

Lazy evaluation of datasets

The code below will calculate the average sea surface temperature for a region in the North Atlantic for January. However, it does not do so efficiently.

[13]:
sst = nc.open_thredds("https://psl.noaa.gov/thredds/dodsC/Datasets/COBE/sst.mon.ltm.1981-2010.nc")
sst.select_months(1)
sst.clip(lon = [-80, 20], lat = [30, 70])
sst.spatial_mean()

If we look at the operation history, we see that a temporary file was created three times, once for each method. A single temporary file would be enough. We can achieve this by setting evaluation to lazy and then using run to evaluate everything when we need the result.

[14]:
sst.history
[14]:
['cdo -L -selmonth,1 https://psl.noaa.gov/thredds/dodsC/Datasets/COBE/sst.mon.ltm.1981-2010.nc /tmp/nctoolkityjoudsyznctoolkittmpwcmy57jx.nc',
 'cdo -L  -sellonlatbox,-80,20,30,70 /tmp/nctoolkityjoudsyznctoolkittmpwcmy57jx.nc /tmp/nctoolkityjoudsyznctoolkittmp34c020st.nc',
 'cdo -L -fldmean /tmp/nctoolkityjoudsyznctoolkittmp34c020st.nc /tmp/nctoolkityjoudsyznctoolkittmpwey97q4f.nc']
[15]:
nc.options(lazy = True)
sst = nc.open_thredds("https://psl.noaa.gov/thredds/dodsC/Datasets/COBE/sst.mon.ltm.1981-2010.nc")
sst.select_months(1)
sst.clip(lon = [-80, 20], lat = [30, 70])
sst.spatial_mean()
sst.run()

We can now see that only one temporary file was created:

[16]:
sst.history
[16]:
['cdo -L -fldmean  -sellonlatbox,-80,20,30,70 -selmonth,1 https://psl.noaa.gov/thredds/dodsC/Datasets/COBE/sst.mon.ltm.1981-2010.nc /tmp/nctoolkityjoudsyznctoolkittmpdixe9jo3.nc']
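The difference between the two histories can be mimicked with a toy model. The sketch below only builds command strings and never calls CDO. ToyDataset, its apply method and the /tmp/toy_N.nc file names are all hypothetical and are not how nctoolkit is implemented. In eager mode every method produces its own command and temporary file, while in lazy mode the operators are queued and chained into one command when run is called.

```python
class ToyDataset:
    """Toy model of eager versus lazy evaluation; not nctoolkit's code."""

    def __init__(self, source, lazy=False):
        self.current = source
        self.lazy = lazy
        self.history = []
        self._pending = []  # operators waiting to be evaluated
        self._n_temp = 0

    def _new_temp(self):
        # Hypothetical temporary file names, numbered in order of creation.
        self._n_temp += 1
        return "/tmp/toy_%d.nc" % self._n_temp

    def apply(self, operator):
        self._pending.append(operator)
        if not self.lazy:
            # Eager mode: evaluate immediately, one temporary file per call.
            self.run()

    def run(self):
        if not self._pending:
            return
        # Chained operators are written right to left,
        # so the most recently added operator comes first.
        chain = " ".join("-" + op for op in reversed(self._pending))
        target = self._new_temp()
        self.history.append("cdo -L %s %s %s" % (chain, self.current, target))
        self.current = target
        self._pending = []


def north_atlantic_january(lazy):
    ds = ToyDataset("sst.nc", lazy=lazy)
    ds.apply("selmonth,1")
    ds.apply("sellonlatbox,-80,20,30,70")
    ds.apply("fldmean")
    ds.run()
    return ds.history


eager_history = north_atlantic_january(lazy=False)
lazy_history = north_atlantic_january(lazy=True)
print(len(eager_history))
print(len(lazy_history))
print(lazy_history[0])
```

The single lazy command has the same operator order as the real history above: the last method applied appears first in the chain.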

Visualization of datasets

You can visualize the contents of a dataset using the plot method. Below, we will plot temperature for January and the North Atlantic:

[17]:
sst = nc.open_thredds("https://psl.noaa.gov/thredds/dodsC/Datasets/COBE/sst.mon.ltm.1981-2010.nc")
sst.select_months(1)
sst.clip(lon = [-80, 20], lat = [30, 70])
sst.plot()
Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
The global colormaps dictionary is no longer considered public API.
[17]:
[Figure: plot of January sea surface temperature for the North Atlantic]

To see how to use all of nctoolkit’s methods, check out the options on the left panel.