How-to guide¶

This guide shows how to carry out key nctoolkit operations. We will use a sea surface temperature data set and a depth-resolved ocean temperature data set. The data sets can be downloaded from here.

[1]:

import nctoolkit as nc
import os
import pandas as pd
import xarray as xr

How to select years and months¶

If we want to select specific years and months, we can use the select_years and select_months methods.

[2]:

sst = nc.open_data("sst.mon.mean.nc")
sst.select_years(1960)
sst.select_months(1)
sst.times()

[2]:

['1960-01-01T00:00:00']
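
The selection logic can be pictured as a filter over the time steps. The block below is a minimal pure-Python sketch, not nctoolkit's implementation; the function name select_times and the sample dates are assumptions made for illustration.

```python
from datetime import date


def select_times(times, years=None, months=None):
    """Keep only the time steps whose year and month were requested."""
    years = None if years is None else set(years)
    months = None if months is None else set(months)
    return [
        t for t in times
        if (years is None or t.year in years)
        and (months is None or t.month in months)
    ]


# Made-up monthly time steps covering 1959 to 1961
times = [date(y, m, 1) for y in range(1959, 1962) for m in range(1, 13)]

# Selecting the year 1960 and then month 1 leaves a single time step
selected = select_times(times, years=[1960], months=[1])
print(selected)
```

As in the nctoolkit example, applying both filters leaves only January 1960.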

How to copy a data set¶

If you want to make a deep copy of a data set, use the built-in copy method, which returns a new data set. This method should be used because nctoolkit automatically deletes temporary files that are no longer required. Behind the scenes, copy registers that the NetCDF file is needed by both the original data set and the new copy. So if you copy a data set and then delete the original, nctoolkit knows not to remove any NetCDF files that the copy still relies on.

[3]:

sst = nc.open_data("sst.mon.mean.nc")
sst.select_years(1960)
sst.select_months(1)
sst1 = sst.copy()
del sst
os.path.exists(sst1.current)

[3]:

True
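
One way to picture this bookkeeping is as reference counting: each file keeps a count of the data sets that still need it, and it is only deleted when that count reaches zero. The sketch below is an assumption about the general idea, not nctoolkit's actual code; the class TempFileRegistry and its methods are hypothetical.

```python
import os
import tempfile


class TempFileRegistry:
    """Hypothetical registry counting how many data sets need each file."""

    def __init__(self):
        self._counts = {}

    def register(self, path):
        # One more data set now depends on this file
        self._counts[path] = self._counts.get(path, 0) + 1

    def release(self, path):
        # One fewer data set depends on this file; delete it when none do
        self._counts[path] -= 1
        if self._counts[path] == 0:
            del self._counts[path]
            if os.path.exists(path):
                os.remove(path)


registry = TempFileRegistry()

# An empty temporary file standing in for a NetCDF file
handle, path = tempfile.mkstemp(suffix=".nc")
os.close(handle)

registry.register(path)  # the original data set
registry.register(path)  # the copy
registry.release(path)   # deleting the original keeps the file ...
still_there = os.path.exists(path)
registry.release(path)   # ... and deleting the copy removes it
removed = not os.path.exists(path)
```

Here still_there mirrors the True returned by os.path.exists(sst1.current) in the example above.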

How to clip to a region¶

If you want to clip the data to a specific longitude and latitude box, you can use clip, with the longitude and latitude ranges given by lon and lat.

[4]:

sst = nc.open_data("sst.mon.mean.nc")
sst.select_months(1)
sst.select_years(1980)
sst.clip(lon = [-80, 20], lat = [40, 70])
sst.plot()

How to rename a variable¶

If we want to rename a variable, we use the rename method and supply a dictionary where the key-value pairs are the original and new names.

[5]:

sst = nc.open_data("sst.mon.mean.nc")
sst.variables

[5]:

['sst']

The original dataset had only one variable called sst. We can now rename it, and display the new variables.

[6]:

sst.rename({"sst": "temperature"})
sst.variables

[6]:

['temperature']

How to create new variables¶

New variables can be created with arithmetic operations using either mutate or transmute. The mutate method will maintain the original variables, whereas transmute will not. Both methods require a dictionary, where the key-value pairs are the names of the new variables and the arithmetic operations to perform. The example below shows how to create a new variable with sea surface temperature in Kelvin.

[7]:

sst = nc.open_data("sst.mon.mean.nc")
sst.mutate({"sst_k": "sst+273.15"})
sst.variables

[7]:

['sst', 'sst_k']
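
The difference between the two methods can be sketched with plain dictionaries. This is only an illustration of the keep-or-drop behaviour, not how nctoolkit evaluates expressions: the helper functions below are hypothetical, and they take Python functions rather than expression strings such as "sst+273.15".

```python
def derive(variables, new_variables):
    """Evaluate each new variable from the existing ones."""
    return {name: func(variables) for name, func in new_variables.items()}


def mutate(variables, new_variables):
    """Keep the original variables and add the new ones."""
    return {**variables, **derive(variables, new_variables)}


def transmute(variables, new_variables):
    """Keep only the new variables."""
    return derive(variables, new_variables)


# Made-up temperatures in degrees Celsius
data = {"sst": [0.0, 10.0, 20.0]}
to_kelvin = {"sst_k": lambda v: [x + 273.15 for x in v["sst"]]}

print(list(mutate(data, to_kelvin)))     # ['sst', 'sst_k']
print(list(transmute(data, to_kelvin)))  # ['sst_k']
```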

How to calculate a spatial average¶

You can calculate a spatial average using the spatial_mean method. There are additional methods for maximums etc.

[8]:

sst = nc.open_data("sst.mon.mean.nc")
sst.spatial_mean()
sst.plot()
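
On a regular longitude/latitude grid, cells near the poles cover less area than cells near the equator, so spatial averages are usually area weighted. The sketch below assumes weights proportional to the cosine of latitude, which is a common approximation; the exact weighting nctoolkit applies may differ, and the function name and sample values are made up.

```python
import math


def weighted_spatial_mean(field, lats):
    """Area-weighted mean of a field stored as one row of values per latitude."""
    total = 0.0
    weight_sum = 0.0
    for row, lat in zip(field, lats):
        weight = math.cos(math.radians(lat))
        for value in row:
            if value is None:  # skip missing values, such as land points
                continue
            total += weight * value
            weight_sum += weight
    return total / weight_sum


lats = [0.0, 60.0]
field = [
    [20.0, 20.0],  # values at the equator
    [5.0, 5.0],    # values at 60 degrees north
]
mean = weighted_spatial_mean(field, lats)
print(round(mean, 6))
```

The equatorial row counts twice as much as the row at 60 degrees, so the result is higher than the unweighted average of 12.5.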

How to calculate an annual mean¶

You can calculate an annual mean using the annual_mean method.

[9]:

sst = nc.open_data("sst.mon.mean.nc")
sst.spatial_mean()
sst.annual_mean()
sst.plot()

How to calculate a rolling average¶

You can calculate a rolling mean using the rolling_mean method, with the window argument providing the number of time steps to average over. There are additional methods for rolling sums etc. The code below calculates a rolling mean of global SST using a 20-year window: because the data are first reduced to annual means, 20 time steps correspond to 20 years.

[10]:

sst = nc.open_data("sst.mon.mean.nc")
sst.spatial_mean()
sst.annual_mean()
sst.rolling_mean(20)
sst.plot()
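
To make the role of the window concrete, here is a minimal pure-Python sketch of a rolling mean over a list of values. It is an illustration rather than nctoolkit's implementation, and it assumes that only complete windows are kept, so the output is shorter than the input.

```python
def rolling_mean(values, window):
    """Mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the number of time steps")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Made-up annual means
annual = [1.0, 2.0, 3.0, 4.0, 5.0]
print(rolling_mean(annual, 3))  # [2.0, 3.0, 4.0]
```

With five time steps and a window of three, three complete windows fit, so three values are returned.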

How to calculate temporal anomalies¶

You can calculate annual temporal anomalies using the annual_anomaly method. This requires a baseline period, given as the first and last year.

[11]:

sst = nc.open_data("sst.mon.mean.nc")
sst.spatial_mean()
sst.annual_anomaly(baseline = [1960, 1979])
sst.plot()
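
An anomaly calculation of this kind can be sketched as follows, assuming an absolute anomaly: the mean over the baseline years is subtracted from every annual mean. The function below is a pure-Python illustration with made-up values, not nctoolkit's implementation.

```python
def annual_anomaly(annual_means, baseline):
    """Subtract the mean over the baseline years from every annual mean."""
    first, last = baseline
    base_values = [
        value for year, value in annual_means.items() if first <= year <= last
    ]
    if not base_values:
        raise ValueError("no data in the baseline period")
    base_mean = sum(base_values) / len(base_values)
    return {year: value - base_mean for year, value in annual_means.items()}


# Made-up annual mean temperatures
annual_means = {1960: 10.0, 1961: 12.0, 1962: 14.0, 1963: 16.0}
anomalies = annual_anomaly(annual_means, baseline=[1960, 1961])
print(anomalies)  # {1960: -1.0, 1961: 1.0, 1962: 3.0, 1963: 5.0}
```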

How to split data by year etc¶

Files within a dataset can be split by year, day, year and month or season using the split method. If we wanted to split by year, we do the following:

[12]:

sst = nc.open_data("sst.mon.mean.nc")
sst.split("year")
sst.size

[12]:

'Number of files in ensemble: 169\nEnsemble size: 530.445201 MB\nSmallest file: /tmp/nctoolkitayrhmwtbnctoolkittmp72u9tn3o.1898.nc has size 3.1387289999999997 MB\nLargest file: /tmp/nctoolkitayrhmwtbnctoolkittmp72u9tn3o.1898.nc has size 3.1387289999999997 MB'

How to merge files in time¶

We can merge files based on time using merge_time. We can do this by merging the dataset that results from splitting the original sst dataset. If we split the dataset by year, we see that there are 169 files, one for each year.

[13]:

sst = nc.open_data("sst.mon.mean.nc")
sst.split("year")
sst.size

[13]:

'Number of files in ensemble: 169\nEnsemble size: 530.445201 MB\nSmallest file: /tmp/nctoolkitayrhmwtbnctoolkittmp58y4ytyj.1998.nc has size 3.1387289999999997 MB\nLargest file: /tmp/nctoolkitayrhmwtbnctoolkittmp58y4ytyj.1998.nc has size 3.1387289999999997 MB'

We can then merge them together to get a single file dataset:

[14]:

sst.merge_time()
sst.size

[14]:

'Number of files: 1\nFile size: 525.828237 MB'
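
Splitting and merging are inverse operations on the time axis. The sketch below illustrates this with lists of dates standing in for files; the function names are hypothetical and the logic is a simplification, not nctoolkit's implementation.

```python
from collections import defaultdict
from datetime import date


def split_by_year(times):
    """Group time steps by year, one group per would-be file."""
    groups = defaultdict(list)
    for t in times:
        groups[t.year].append(t)
    return dict(groups)


def merge_time(groups):
    """Combine all groups into a single, time-ordered sequence."""
    return sorted(t for steps in groups.values() for t in steps)


# Made-up monthly time steps for two years
times = [date(y, m, 1) for y in (1990, 1991) for m in range(1, 13)]

by_year = split_by_year(times)
merged = merge_time(by_year)
print(len(by_year), len(merged))  # 2 24
```

Merging the yearly groups recovers the original sequence of time steps.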

How to do variables based merging¶

If we have two or more files that have the same time steps, but different variables, we can merge them using merge. The code below first creates a dataset with a NetCDF file containing sst in K. It then creates a new dataset from this NetCDF file and the original, and merges them.

[15]:

sst1 = nc.open_data("sst.mon.mean.nc")
sst2 = nc.open_data("sst.mon.mean.nc")
sst2.transmute({"sst_k": "sst+273.15"})
new_sst = nc.open_data([sst1.current, sst2.current])
new_sst.current
new_sst.merge()
new_sst.variables

[15]:

['sst.mon.mean.nc', '/tmp/nctoolkitayrhmwtbnctoolkittmp8vgtaa28.nc']

[15]:

['sst', 'sst_k']

In some cases we will have two or more datasets we want to merge. In this case we can use the merge function as follows:

[16]:

sst1 = nc.open_data("sst.mon.mean.nc")
sst2 = nc.open_data("sst.mon.mean.nc")
sst2.transmute({"sst_k": "sst+273.15"})
new_sst = nc.merge(sst1, sst2)
new_sst.variables

[16]:

['sst', 'sst_k']

How to horizontally regrid data¶

Variables can be regridded horizontally using regrid. This method requires the new grid to be defined. This can be a pandas data frame with lon/lat as columns, an xarray object, a NetCDF file or a dataset. I will demonstrate all four options by regridding SST to the North Atlantic. Let’s begin by getting a grid for the North Atlantic.

[17]:

new_grid = nc.open_data("sst.mon.mean.nc")
new_grid.clip(lon = [-80, 20], lat = [30, 70])
new_grid.select_months(1)
new_grid.select_years(2000)

First, we will use the new dataset itself to do the regridding. I will calculate mean SST using the original data, and then regrid to the North Atlantic.

[18]:

%%time
sst = nc.open_data("sst.mon.mean.nc")
sst.mean()
sst.regrid(grid = new_grid)
sst.plot()

CPU times: user 56.4 ms, sys: 94.2 ms, total: 151 ms
Wall time: 1.38 s

We can also do this using the NetCDF file, whose path is given by new_grid.current.

[19]:

%%time
sst = nc.open_data("sst.mon.mean.nc")
sst.mean()
sst.regrid(grid = new_grid.current)
sst.plot()

CPU times: user 60.1 ms, sys: 38.8 ms, total: 99 ms
Wall time: 1.48 s

In a similar way we can read the new_grid in as an xarray data set.

[20]:

%%time
na_grid = xr.open_dataset(new_grid.current)
sst = nc.open_data("sst.mon.mean.nc")
sst.mean()
sst.regrid(grid = na_grid)
sst.plot()

CPU times: user 72.7 ms, sys: 44.4 ms, total: 117 ms
Wall time: 1.49 s

Or we can use a pandas data frame. In this case I will convert the xarray data set to a data frame.

[21]:

%%time
na_grid = xr.open_dataset(new_grid.current)
na_grid = na_grid.to_dataframe().reset_index().loc[:,["lon", "lat"]]
sst = nc.open_data("sst.mon.mean.nc")
sst.mean()
sst.regrid(grid = na_grid)
sst.plot()

CPU times: user 72.6 ms, sys: 39 ms, total: 112 ms
Wall time: 1.46 s

How to temporally interpolate¶

Temporal interpolation can be carried out using time_interp. This method requires a start date (start) and an end date (end), both in the format YYYY/MM/DD, and a temporal resolution (resolution), which is either 1 day (“daily”), 1 week (“weekly”), 1 month (“monthly”), or 1 year (“yearly”).

[22]:

sst = nc.open_data("sst.mon.mean.nc")
sst.time_interp(start = "1990/01/01", end = "1990/12/31", resolution = "daily")
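
To illustrate what interpolating monthly data to a daily resolution involves, the sketch below linearly interpolates between two consecutive time steps. It assumes linear interpolation in time and uses made-up values; the function name is hypothetical and this is not nctoolkit's implementation.

```python
from datetime import date, timedelta


def interpolate_daily(t0, v0, t1, v1):
    """Linearly interpolate between two time steps at daily resolution."""
    n_days = (t1 - t0).days
    if n_days <= 0:
        raise ValueError("t1 must be later than t0")
    return [
        (t0 + timedelta(days=i), v0 + (v1 - v0) * i / n_days)
        for i in range(n_days + 1)
    ]


# Made-up values for two consecutive monthly time steps
daily = interpolate_daily(date(1990, 1, 1), 10.0, date(1990, 2, 1), 13.1)
print(len(daily))  # 32
print(daily[0], daily[-1])
```

The result has one value per day from 1 January to 1 February inclusive, rising steadily from the first value to the second.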

How to calculate a monthly average from daily data¶

If you have daily data, you can calculate a monthly average using monthly_mean. There are also methods for maximums etc.

[23]:

sst = nc.open_data("sst.mon.mean.nc")
sst.time_interp(start = "1990/01/01", end = "1990/12/31", resolution = "daily")
sst.monthly_mean()
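
A monthly mean groups the daily values by year and month and averages within each group. The sketch below shows this in pure Python with a made-up daily series; it illustrates the calculation and is not nctoolkit's implementation.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_mean(daily):
    """Average daily (date, value) pairs within each calendar month."""
    groups = defaultdict(list)
    for day, value in daily:
        groups[(day.year, day.month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(groups.items())}


# Made-up daily series for January and February 1990,
# where each value equals the day of the month
start = date(1990, 1, 1)
days = [start + timedelta(days=i) for i in range(59)]
daily = [(d, float(d.day)) for d in days]

means = monthly_mean(daily)
print(means)  # {(1990, 1): 16.0, (1990, 2): 14.5}
```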

How to calculate a monthly climatology¶

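You can calculate a monthly climatology using the monthly_mean_climatology method. The example below calculates a climatology for 1990 to 1999 and then selects January. Note that CDO outputs the date of the final month.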

[24]:

sst = nc.open_data("sst.mon.mean.nc")
sst.select_years(list(range(1990, 2000)))
sst.monthly_mean_climatology()
sst.select_months(1)
sst.plot()
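
A monthly climatology differs from a monthly mean in that values are grouped by calendar month only, so every January contributes to a single January average. The sketch below shows this in pure Python with made-up values; the function name is hypothetical and this is not nctoolkit's implementation.

```python
from collections import defaultdict


def monthly_climatology(monthly):
    """Average values for each calendar month across all years."""
    groups = defaultdict(list)
    for (year, month), value in monthly.items():
        groups[month].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(groups.items())}


# Made-up monthly means, keyed by (year, month)
monthly = {
    (1990, 1): 4.0, (1990, 7): 16.0,
    (1991, 1): 6.0, (1991, 7): 18.0,
}
climatology = monthly_climatology(monthly)
print(climatology)  # {1: 5.0, 7: 17.0}
```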

How to calculate a seasonal climatology¶

You can calculate a seasonal climatology using the seasonal_mean_climatology method. The example below then selects the first time step for plotting.

[25]:

sst = nc.open_data("sst.mon.mean.nc")
sst.seasonal_mean_climatology()
sst.select_timestep(0)
sst.plot()
