How-to guide

This guide shows how to carry out key nctoolkit operations. We will use a sea surface temperature data set and a depth-resolved ocean temperature data set. The data sets can be downloaded from here.

[1]:
import nctoolkit as nc
import os
import pandas as pd
import xarray as xr

How to select years and months

If we want to select specific years and months, we can use the select_years and select_months methods.

[2]:
sst = nc.open_data("sst.mon.mean.nc")
sst.select_years(1960)
sst.select_months(1)
sst.times
[2]:
['1960-01-01T00:00:00']

How to calculate the mean, min, max etc.

If you want to calculate the mean value of a variable over all time steps you can use mean:

[3]:
sst = nc.open_data("sst.mon.mean.nc")
sst.mean()
sst.plot()
[3]:

Similarly, if you want to calculate the minimum, maximum, sum and range of values over time just use min, max, sum and range.
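
To make clear what these reductions compute, here is a minimal pure-Python sketch that does not use nctoolkit. It assumes a toy series of three time steps on a two-cell grid (made-up values) and takes range to mean maximum minus minimum; `reduce_over_time` is a hypothetical helper, not part of nctoolkit.

```python
# Toy data: 3 time steps x 2 grid cells (made-up values, not from the SST file).
series = [
    [10.0, 20.0],  # time step 0
    [12.0, 18.0],  # time step 1
    [14.0, 25.0],  # time step 2
]


def reduce_over_time(data, func):
    """Apply func to each grid cell's values across all time steps."""
    return [func(cell_values) for cell_values in zip(*data)]


time_mean = reduce_over_time(series, lambda v: sum(v) / len(v))
time_min = reduce_over_time(series, min)
time_max = reduce_over_time(series, max)
time_sum = reduce_over_time(series, sum)
# Assumption: "range" means maximum minus minimum for each cell.
time_range = [hi - lo for hi, lo in zip(time_max, time_min)]

print(time_mean, time_min, time_max, time_sum, time_range)
```

Each result has one value per grid cell, which is why the time dimension collapses to a single step.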

How to copy a data set

If you want to make a deep copy of a data set, use the built-in copy method, which returns a new data set. Use this method rather than any other way of copying, because nctoolkit automatically deletes temporary files that are no longer required. Behind the scenes, copy makes nctoolkit register that the NetCDF file is needed by both the original dataset and the new copy. So if you copy a dataset and then delete the original, nctoolkit knows not to remove any NetCDF files related to the copy.

[4]:
sst = nc.open_data("sst.mon.mean.nc")
sst.select_years(1960)
sst.select_months(1)
sst1 = sst.copy()
del sst
os.path.exists(sst1.current)
[4]:
True

How to clip to a region

If you want to clip the data to a specific longitude and latitude box, you can use clip, with the longitude and latitude ranges given by lon and lat.

[5]:
sst = nc.open_data("sst.mon.mean.nc")
sst.select_months(1)
sst.select_years(1980)
sst.clip(lon = [-80, 20], lat = [40, 70])
sst.plot()
[5]:
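
The SST file stores longitudes from 0.5 to 359.5, while the box above is given in the -180 to 180 convention. The sketch below only illustrates the geometry of such a box selection in plain Python; how clip handles this internally is not covered here, and `wrap_lon` and `in_box` are hypothetical helpers.

```python
def wrap_lon(lon):
    """Map a longitude in degrees to the range [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


def in_box(lon, lat, lon_range, lat_range):
    """Return True if the point lies inside the lon/lat box (bounds inclusive)."""
    lon = wrap_lon(lon)
    return (lon_range[0] <= lon <= lon_range[1]
            and lat_range[0] <= lat <= lat_range[1])


# Cell centres (lon, lat) on a 0-360 longitude grid; made-up sample points.
points = [(0.5, 50.5), (300.5, 45.5), (100.5, 50.5), (350.5, 80.5)]
kept = [p for p in points if in_box(p[0], p[1], [-80, 20], [40, 70])]
print(kept)
```

Only the first two points survive: the third lies outside the longitude range and the fourth outside the latitude range.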

How to rename a variable

If we want to rename a variable, we use the rename method and supply a dictionary where the key-value pairs are the original and new names.

[6]:
sst = nc.open_data("sst.mon.mean.nc")
sst.variables
[6]:
['sst']

The original dataset had only one variable called sst. We can now rename it, and display the new variables.

[7]:
sst.rename({"sst": "temperature"})
sst.variables
[7]:
['temperature']

How to create new variables

New variables can be created with arithmetic operations using either mutate or transmute. The mutate method keeps the original variables, whereas transmute does not. Both methods require a dictionary, where the key-value pairs are the names of the new variables and the arithmetic operations to perform. The example below shows how to create a new variable with SST in kelvin.

[8]:
sst = nc.open_data("sst.mon.mean.nc")
sst.mutate({"sst_k": "sst+273.15"})
sst.variables
[8]:
['sst', 'sst_k']
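
The difference between the two methods can be pictured with plain dictionaries. This is a sketch of the semantics described above, not of nctoolkit's implementation; the functions below are hypothetical stand-ins that take Python callables rather than expression strings.

```python
def derive(variables, new_vars):
    """Compute each new variable from its source variable."""
    return {
        name: [func(v) for v in variables[source]]
        for name, (source, func) in new_vars.items()
    }


def mutate(variables, new_vars):
    """Keep the original variables and add the new ones."""
    return {**variables, **derive(variables, new_vars)}


def transmute(variables, new_vars):
    """Keep only the new variables."""
    return derive(variables, new_vars)


data = {"sst": [0.0, 10.0, 20.0]}  # made-up values in degrees Celsius
to_kelvin = {"sst_k": ("sst", lambda c: c + 273.15)}

print(list(mutate(data, to_kelvin)))     # original and new variable names
print(list(transmute(data, to_kelvin)))  # new variable name only
```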

How to calculate a spatial average

You can calculate a spatial average using the spatial_mean method. There are additional methods for maximums etc.

[9]:
sst = nc.open_data("sst.mon.mean.nc")
sst.spatial_mean()
sst.plot()
[9]:
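
On a regular latitude/longitude grid, cells shrink towards the poles, so spatial averages are normally weighted by cell area. The sketch below assumes that kind of weighting (approximated by the cosine of latitude) to show why it matters; it is an illustration with made-up values, and `area_weighted_mean` is a hypothetical helper.

```python
import math


def area_weighted_mean(values, lats):
    """Mean of values weighted by cos(latitude), one value per latitude band."""
    weights = [math.cos(math.radians(lat)) for lat in lats]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


lats = [0.0, 60.0]    # band centres in degrees north
values = [30.0, 0.0]  # made-up temperatures for each band

weighted = area_weighted_mean(values, lats)
unweighted = sum(values) / len(values)
print(weighted, unweighted)
```

The band at 60 degrees covers half the area of the equatorial band, so the weighted mean sits closer to the equatorial value than the plain average does.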

How to calculate an annual mean

You can calculate an annual mean using the annual_mean method.

[10]:
sst = nc.open_data("sst.mon.mean.nc")
sst.spatial_mean()
sst.annual_mean()
sst.plot()
[10]:

How to calculate a rolling average

You can calculate a rolling mean using the rolling_mean method, with the window argument providing the number of time steps to average over. There are additional methods for rolling sums etc. The code below calculates a rolling mean of global SST using a 20-year window.

[11]:
sst = nc.open_data("sst.mon.mean.nc")
sst.spatial_mean()
sst.annual_mean()
sst.rolling_mean(20)
sst.plot()
[11]:
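
As a reminder of what a rolling mean does to a series, here is a minimal pure-Python sketch. It assumes only complete windows are kept, so the output is shorter than the input by window minus one values; how nctoolkit labels the resulting time steps is not covered. `rolling_mean` here is a standalone function, not the nctoolkit method.

```python
def rolling_mean(values, window):
    """Return the mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


annual = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up annual means
smoothed = rolling_mean(annual, 3)
print(smoothed)
```

Under the same assumption, 169 annual values and a window of 20 would leave 150 values.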

How to calculate temporal anomalies

You can calculate annual temporal anomalies using the annual_anomaly method. This requires a baseline period.

[12]:
sst = nc.open_data("sst.mon.mean.nc")
sst.spatial_mean()
sst.annual_anomaly(baseline = [1960, 1979])
sst.plot()
[12]:
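
The idea behind a baseline anomaly can be sketched in a few lines. This assumes absolute anomalies (annual mean minus the mean over the baseline years, with both baseline years included); the data are made up and `annual_anomaly` below is a standalone illustration rather than the nctoolkit method.

```python
def annual_anomaly(annual_means, baseline):
    """Subtract the baseline-period mean from every annual mean.

    annual_means maps year -> value; baseline is [first_year, last_year],
    with both ends included.
    """
    first, last = baseline
    base_values = [v for year, v in annual_means.items() if first <= year <= last]
    base_mean = sum(base_values) / len(base_values)
    return {year: v - base_mean for year, v in annual_means.items()}


annual_means = {1960: 14.0, 1961: 16.0, 1980: 15.5, 1981: 16.5}
anomalies = annual_anomaly(annual_means, baseline=[1960, 1961])
print(anomalies)
```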

How to split data by year etc.

Files within a dataset can be split by year, day, year and month or season using the split method. If we wanted to split by year, we do the following:

[13]:
sst = nc.open_data("sst.mon.mean.nc")
sst.split("year")

How to merge files in time

We can merge files based on time using merge_time. To demonstrate, we will merge the files that result from splitting the original sst dataset. Splitting the dataset by year gives 169 files, one for each year.

[14]:
sst = nc.open_data("sst.mon.mean.nc")
sst.split("year")

We can then merge them together to get a single file dataset:

[15]:
sst.merge_time()

How to do variables-based merging

If we have two or more files that have the same time steps, but different variables, we can merge them using merge. The code below first creates a dataset whose NetCDF file holds SST in K. It then creates a new dataset from this NetCDF file and the original, and merges them.

[16]:
sst1 = nc.open_data("sst.mon.mean.nc")
sst2 = nc.open_data("sst.mon.mean.nc")
sst2.transmute({"sst_k": "sst+273.15"})
new_sst = nc.open_data([sst1.current, sst2.current])
new_sst.current
new_sst.merge()

In some cases we will have two or more datasets we want to merge. In this case we can use the merge function as follows:

[17]:
sst1 = nc.open_data("sst.mon.mean.nc")
sst2 = nc.open_data("sst.mon.mean.nc")
sst2.transmute({"sst_k": "sst+273.15"})
new_sst = nc.merge(sst1, sst2)
new_sst.variables
[17]:
['sst', 'sst_k']

How to horizontally regrid data

Variables can be regridded horizontally using regrid. This method requires the new grid to be defined. This can be a pandas data frame with lon/lat as columns, an xarray object, a NetCDF file or an nctoolkit dataset. I will demonstrate three of these options by regridding SST to the North Atlantic. Let’s begin by getting a grid for the North Atlantic.

[18]:
new_grid = nc.open_data("sst.mon.mean.nc")
new_grid.clip(lon = [-80, 20], lat = [30, 70])
new_grid.select_months(1)
new_grid.select_years(2000)

First, we will use the new dataset itself to do the regridding. I will calculate mean SST using the original data, and then regrid to the North Atlantic.

[19]:
sst = nc.open_data("sst.mon.mean.nc")
sst.mean()
sst.regrid(grid = new_grid)
sst.plot()
[19]:

We can also do this using the NetCDF file itself, which is new_grid.current:

[20]:
sst = nc.open_data("sst.mon.mean.nc")
sst.mean()
sst.regrid(grid = new_grid.current)
sst.plot()
[20]:

Or we can use a pandas data frame. In this case I will convert the xarray data set to a data frame.

[21]:
na_grid = xr.open_dataset(new_grid.current)
na_grid = na_grid.to_dataframe().reset_index().loc[:,["lon", "lat"]]
sst = nc.open_data("sst.mon.mean.nc")
sst.mean()
sst.regrid(grid = na_grid)
sst.plot()
[21]:

How to temporally interpolate

Temporal interpolation can be carried out using time_interp. This method requires a start date (start) in the format YYYY/MM/DD, an end date (end), and a temporal resolution (resolution), which is either 1 day ("daily"), 1 week ("weekly"), 1 month ("monthly"), or 1 year ("yearly").

[22]:
sst = nc.open_data("sst.mon.mean.nc")
sst.time_interp(start = "1990/01/01", end = "1990/12/31", resolution = "daily")
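
To show what interpolating monthly values to daily resolution involves, here is a pure-Python sketch. It assumes simple linear interpolation between consecutive time steps, which may differ from what time_interp does in detail; the values are made up and `interp_daily` is a hypothetical helper.

```python
from datetime import date, timedelta


def interp_daily(points):
    """Linearly interpolate sorted (date, value) pairs to one value per day."""
    out = {}
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        span = (d1 - d0).days
        for k in range(span):
            out[d0 + timedelta(days=k)] = v0 + (v1 - v0) * k / span
    # The final time step is not covered by the loop above, so add it here.
    last_date, last_value = points[-1]
    out[last_date] = last_value
    return out


monthly = [
    (date(1990, 1, 1), 10.0),
    (date(1990, 2, 1), 13.1),
    (date(1990, 3, 1), 13.1),
]
daily = interp_daily(monthly)
print(len(daily), daily[date(1990, 1, 11)])
```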

How to calculate a monthly average from daily data

If you have daily data, you can calculate a monthly average using monthly_mean. There are also methods for maximums etc.

[23]:
sst = nc.open_data("sst.mon.mean.nc")
sst.time_interp(start = "1990/01/01", end = "1990/12/31", resolution = "daily")
sst.monthly_mean()

How to calculate a monthly climatology

If we want to calculate the mean value of variables for each month in a given dataset, we can use the monthly_mean_climatology method as follows:

[24]:
sst = nc.open_data("sst.mon.mean.nc")
sst.monthly_mean_climatology()
sst.select_months(1)
sst.plot()
[24]:
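
A monthly climatology groups values by calendar month across all years, so the result has at most twelve time steps. The sketch below illustrates that grouping with made-up records; `monthly_climatology` is a hypothetical helper and not the nctoolkit method.

```python
from collections import defaultdict


def monthly_climatology(records):
    """Average values by calendar month across all years.

    records is a list of (year, month, value) tuples.
    """
    by_month = defaultdict(list)
    for _year, month, value in records:
        by_month[month].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(by_month.items())}


records = [(1990, 1, 4.0), (1990, 7, 18.0), (1991, 1, 6.0), (1991, 7, 20.0)]
clim = monthly_climatology(records)
print(clim)
```

Selecting month 1 afterwards, as in the cell above, leaves the mean of all Januaries.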

How to calculate a seasonal climatology

[25]:
sst = nc.open_data("sst.mon.mean.nc")
sst.seasonal_mean_climatology()
sst.select_timesteps(0)
sst.plot()
[25]:

How to read a dataset using pandas or xarray

To read the dataset to an xarray Dataset use to_xarray:

[27]:
sst = nc.open_data("sst.mon.mean.nc")
sst.to_xarray()
[27]:
<xarray.Dataset>
Dimensions:  (lat: 180, lon: 360, time: 2028)
Coordinates:
  * lat      (lat) float32 89.5 88.5 87.5 86.5 85.5 ... -86.5 -87.5 -88.5 -89.5
  * lon      (lon) float32 0.5 1.5 2.5 3.5 4.5 ... 355.5 356.5 357.5 358.5 359.5
  * time     (time) datetime64[ns] 1850-01-01 1850-02-01 ... 2018-12-01
Data variables:
    sst      (time, lat, lon) float32 ...
Attributes:
    title:            created 12/2013 from data provided by JRA
    history:          Created 12/2012 from data obtained from JRA by ESRL/PSD
    platform:         Analyses
    citation:         Hirahara, S., Ishii, M., and Y. Fukuda,2014: Centennial...
    institution:      NOAA ESRL/PSD
    Conventions:      CF-1.2
    References:       http://www.esrl.noaa.gov/psd/data/gridded/cobe2.html
    dataset_title:    COBE-SST2 Sea Surface Temperature and Ice
    original_source:  https://climate.mri-jma.go.jp/pub/ocean/cobe-sst2/

To read the dataset in as a pandas dataframe use to_dataframe:

[28]:
sst.to_dataframe()
[28]:
                               sst
lat    lon    time
89.5   0.5    1850-01-01    -1.712
              1850-02-01    -1.698
              1850-03-01    -1.707
              1850-04-01    -1.742
              1850-05-01    -1.725
...    ...    ...              ...
-89.5  359.5  2018-08-01       NaN
              2018-09-01       NaN
              2018-10-01       NaN
              2018-11-01       NaN
              2018-12-01       NaN

131414400 rows × 1 columns

How to calculate cell areas

If we want to calculate the area of each cell in a dataset, we use the cell_areas method. The join argument lets you choose whether to join the cell areas to the existing dataset, or to only include cell areas in the dataset.

[29]:
sst = nc.open_data("sst.mon.mean.nc")
sst.cell_areas(join=False)
sst.plot()
[29]:
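
For intuition about the values in such a plot, the area of a cell on a regular latitude/longitude grid can be computed from spherical geometry. The sketch below assumes a perfect sphere with a mean Earth radius of 6,371 km and 1-degree cells; the numbers nctoolkit returns may differ slightly, and `cell_area` is a hypothetical helper.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def cell_area(lat_centre, dlat=1.0, dlon=1.0, radius=EARTH_RADIUS_M):
    """Area in m^2 of a lat/lon cell on a sphere, given its centre latitude."""
    lat_s = math.radians(lat_centre - dlat / 2)
    lat_n = math.radians(lat_centre + dlat / 2)
    return radius ** 2 * math.radians(dlon) * (math.sin(lat_n) - math.sin(lat_s))


equator = cell_area(0.5)
polar = cell_area(89.5)

# Summing all 180 x 360 cells should recover the surface area of the sphere.
total = sum(cell_area(-89.5 + i) * 360 for i in range(180))
sphere = 4 * math.pi * EARTH_RADIUS_M ** 2
print(equator, polar, total / sphere)
```

Cells near the equator are much larger than cells near the poles, which is why area weighting matters for spatial averages.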

How to use URLs

If a file is located at a URL, we can pass the URL to open_data:

[30]:
url = "ftp://ftp.cdc.noaa.gov/Datasets/COBE2/sst.mon.ltm.1981-2010.nc"
sst = nc.open_data(url)
Downloading ftp://ftp.cdc.noaa.gov/Datasets/COBE2/sst.mon.ltm.1981-2010.nc

This will download the file from the URL and save it as a temporary file. We can then work with it as usual. A future release of nctoolkit will have THREDDS support.

How to calculate an ensemble average

nctoolkit has built-in methods for working with ensembles. Let’s start by splitting the 1850-2018 sst dataset into an ensemble, where each file is a separate year:

[31]:
sst = nc.open_data("sst.mon.mean.nc")
sst.split("year")

An ensemble mean can be calculated in two ways. First, we can calculate the mean for each time step across the files. Here each file holds the monthly temperature for one year from 1850 onwards, so this gives the mean temperature for each month over that period. From there we can calculate the global mean:

[32]:
sst.ensemble_mean()
sst.spatial_mean()
sst.plot()
[32]:

Second, we might want to calculate the average over all time steps, i.e. the mean temperature since 1850. We do this by changing the ignore_time argument:

[33]:
sst = nc.open_data("sst.mon.mean.nc")
sst.split("year")
sst.ensemble_mean(ignore_time=True)
sst.plot()
[33]:
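
The two kinds of ensemble mean can be contrasted with a toy example. This sketch assumes that the per-time-step mean matches time steps by position across files; the data are made up, and the `ensemble_mean` function below is a standalone illustration rather than the nctoolkit method.

```python
# Toy ensemble: 2 "files", each with 3 time steps of a single value.
ensemble = [
    [1.0, 2.0, 3.0],  # first file
    [3.0, 4.0, 8.0],  # second file
]


def ensemble_mean(members, ignore_time=False):
    """Average across members, per time step or over everything at once."""
    if ignore_time:
        values = [v for member in members for v in member]
        return sum(values) / len(values)
    return [sum(step) / len(step) for step in zip(*members)]


per_step = ensemble_mean(ensemble)                   # one value per time step
overall = ensemble_mean(ensemble, ignore_time=True)  # a single value
print(per_step, overall)
```

With ignore_time left at its default the result keeps the three time steps; with ignore_time=True everything collapses to one value.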