API Reference#

Session options#


Define session options.

Opening/copying data#

open_data([x, checks])

Read netCDF data as a DataSet object

open_url([x, ftp_details, wait, file_stop])

Read netCDF data from a URL as a DataSet object

open_thredds([x, wait, checks])

Read thredds data as a DataSet object


Open a GeoTIFF and convert it to a DataSet. This requires rioxarray to be installed.


Convert an xarray dataset to an nctoolkit DataSet. This will first save the xarray dataset as a temporary netCDF file.


Make a deep copy of a DataSet object.

Merging or analyzing multiple datasets#

merge(*datasets[, match])

Merge datasets

cor_time([x, y])

Calculate the temporal correlation coefficient between two datasets. This will calculate the temporal correlation coefficient, for each grid cell, between two datasets.

cor_space([x, y])

Calculate the spatial correlation coefficient between two datasets. This will calculate the spatial correlation coefficient, for each time step, between two datasets.
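
Both functions compute a Pearson correlation coefficient; cor_time correlates values through time for each grid cell, while cor_space correlates values across space for each time step. A minimal plain-Python sketch of the underlying calculation (an illustration of the statistic, not nctoolkit's implementation, which delegates to CDO):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```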

Adding and removing files to a dataset#

DataSet.append(self[, x])

append: Add new file(s) to a dataset.

DataSet.remove(self[, x])

remove: Remove file(s) from a dataset

Accessing attributes#


List variables contained in a dataset


Detailed list of variables contained in a dataset.


List times contained in a dataset


List years contained in a dataset


List months contained in a dataset


List levels contained in a dataset


The size of an object. This will print the number of files, total size, and smallest and largest files in a DataSet object.


The current file or files in the DataSet object


The history of operations on the DataSet


The starting file or files of the DataSet object


List calendars of dataset files


List formats of files contained in a dataset


DataSet.plot(self[, vars, autoscale, out, coast])

plot: Automatically plot a dataset.

Variable modification#

DataSet.assign(self[, drop])

assign: Create new variables using mathematical operations on existing variables.

DataSet.rename(self[, newnames])

rename: Rename variables in a dataset

DataSet.as_missing(self[, value])

Change a range or individual value to missing.

DataSet.missing_as(self[, value])

Convert missing values to a constant

DataSet.set_fill(self[, value])

Set the fill value

DataSet.sum_all(self[, drop, new_name])

sum_all: Calculate the sum of all variables for each time step

netCDF file attribute modification#

DataSet.set_longnames(self[, name_dict])

Set the long names of variables

DataSet.set_units(self[, unit_dict])

Set the units for variables

Vertical/level methods#


top: Extract the top/surface level from a dataset


bottom: Extract the bottom level from a dataset

DataSet.vertical_interp(self[, levels, …])

vertical_interp: Vertically interpolate a dataset based on given vertical levels

DataSet.vertical_mean(self[, thickness, …])

vertical_mean: Calculate the depth-averaged mean for each variable.


vertical_min: Calculate the vertical minimum of variable values.


vertical_max: Calculate the vertical maximum of variable values.


vertical_range: Calculate the vertical range of variable values.


vertical_sum: Calculate the vertical sum of variable values.

DataSet.vertical_integration(self[, …])

vertical_integration: Calculate the vertically integrated sum over the water column.


vertical_cumsum: Calculate the vertical cumulative sum of variable values.


Invert the levels of 3D variables.


bottom_mask: Create a mask identifying the deepest cell without missing values.

Rolling methods#

DataSet.rolling_mean(self[, window, align])

rolling_mean: Calculate a rolling mean based on a window

DataSet.rolling_min(self[, window, align])

rolling_min: Calculate a rolling minimum based on a window

DataSet.rolling_max(self[, window, align])

rolling_max: Calculate a rolling maximum based on a window

DataSet.rolling_sum(self[, window, align])

rolling_sum: Calculate a rolling sum based on a window

DataSet.rolling_range(self[, window, align])

rolling_range: Calculate a rolling range based on a window

DataSet.rolling_stdev(self[, window, align])

rolling_stdev: Calculate a rolling standard deviation based on a window

DataSet.rolling_var(self[, window, align])

rolling_var: Calculate a rolling variance based on a window
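
Each rolling method slides a fixed-length window along the time dimension and applies its statistic to the values in the window; align controls where in the window the result is stamped. The windowing itself can be sketched in plain Python (a hypothetical helper for illustration, not the nctoolkit implementation):

```python
def rolling(values, window, stat=lambda w: sum(w) / len(w)):
    """Apply `stat` to each length-`window` slice of `values`.

    Returns len(values) - window + 1 results, one per full window.
    """
    return [stat(values[i:i + window]) for i in range(len(values) - window + 1)]
```

Swapping `stat` for `min`, `max`, or `sum` gives the other rolling statistics.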

Evaluation setting#


Run all stored commands in a dataset

Cleaning functions#

Ensemble creation#

create_ensemble([path, recursive])

create_ensemble: Generate an ensemble of files from a directory.
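
Conceptually, create_ensemble collects the paths of the netCDF files under a directory (optionally recursing into subdirectories) into a list that can then be opened as a single dataset. A rough stdlib equivalent, assuming files are identified by a .nc suffix:

```python
from pathlib import Path

def list_nc_files(path, recursive=True):
    """Collect .nc file paths under `path`, sorted for reproducibility."""
    pattern = "**/*.nc" if recursive else "*.nc"
    return sorted(str(p) for p in Path(path).glob(pattern))
```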

Arithmetic methods#


abs: Method to get the absolute value of variables

DataSet.add(self[, x, var])

add: Add to a dataset

DataSet.assign(self[, drop])

assign: Create new variables using mathematical operations on existing variables.


exp: Method to get the exponential of variables


log: Method to get the natural log, ln, of variables


log10: Method to get the base 10 log, log10, of variables

DataSet.multiply(self[, x, var])

multiply: Multiply a dataset.

DataSet.power(self[, x])

power: Powers of variables in dataset


sqrt: Method to get the square root of variables


square: Method to get the square of variables

DataSet.subtract(self[, x, var])

subtract: Subtract from a dataset.

DataSet.divide(self[, x, var])

divide: Divide the data.

Ensemble statistics#

DataSet.ensemble_mean(self[, nco, ignore_time])

ensemble_mean: Calculate an ensemble mean

DataSet.ensemble_min(self[, nco, ignore_time])

ensemble_min: Calculate an ensemble minimum.

DataSet.ensemble_max(self[, nco, ignore_time])

ensemble_max: Calculate an ensemble maximum

DataSet.ensemble_percentile(self[, p])

ensemble_percentile: Calculate an ensemble percentile.
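
ensemble_percentile computes, for each grid cell and time step, a percentile across the ensemble members. A sketch of a percentile with linear interpolation between order statistics (one common convention; the actual calculation is carried out by CDO):

```python
def percentile(values, p):
    """p-th percentile (0-100) with linear interpolation between ranks."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100
    lo, hi = int(k), min(int(k) + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)
```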


ensemble_range: Calculate an ensemble range


ensemble_stdev: Calculate an ensemble standard deviation


ensemble_sum: Calculate an ensemble sum


ensemble_var: Calculate an ensemble variance

Subsetting operations#

DataSet.subset(self, **kwargs)

subset: A method for subsetting datasets to specific variables, years, longitudes etc.

DataSet.crop(self[, lon, lat, nco, nco_vars])

crop: Crop to a rectangular longitude and latitude box

DataSet.drop(self, **kwargs)

drop: Remove variables, days, months, years or time steps from a dataset

Time-based methods#

DataSet.set_date(self[, year, month, day, …])

Set the date in a dataset

DataSet.set_day(self, x)

Set the day for each time step in a dataset

DataSet.shift(self, **kwargs)

shift: Shift times in dataset by a number of hours, days, months, or years.
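
shift offsets every time stamp in the dataset by a fixed amount. For hour and day offsets the effect on a single time value can be sketched with the standard library (illustration only; month and year shifts need calendar arithmetic, and nctoolkit applies all of this through CDO):

```python
from datetime import datetime, timedelta

def shift_time(t, hours=0, days=0):
    """Shift a datetime by a number of hours and/or days."""
    return t + timedelta(hours=hours, days=days)
```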

Interpolation, matching and resampling methods#

DataSet.regrid(self[, grid, method, …])

regrid: Regrid a dataset to a target grid

DataSet.to_latlon(self[, lon, lat, res, …])

to_latlon: Regrid a dataset to a regular latlon grid

DataSet.match_points(self[, df, variables, …])

match_points: Match dataset to a spatiotemporal points dataframe

DataSet.resample_grid(self[, factor])

resample_grid: Resample the horizontal grid of a dataset

DataSet.time_interp(self[, start, end, …])

time_interp: Temporally interpolate variables based on date range and time resolution

DataSet.timestep_interp(self[, steps])

timestep_interp: Temporally interpolate a dataset to a given number of time steps between existing time steps

DataSet.fill_na(self[, n])

fill_na: Fill missing values with a distance-weighted average.

DataSet.box_mean(self[, x, y])

box_mean: Calculate the grid box mean for all variables.

DataSet.box_max(self[, x, y])

box_max: Calculate the grid box max for all variables.

DataSet.box_min(self[, x, y])

box_min: Calculate the grid box min for all variables.

DataSet.box_sum(self[, x, y])

box_sum: Calculate the grid box sum for all variables.

DataSet.box_range(self[, x, y])

box_range: Calculate the grid box range for all variables.

Masking methods#

DataSet.mask_box(self[, lon, lat])

mask_box: Mask a lon/lat box

Anomaly methods#

DataSet.annual_anomaly(self[, baseline, …])

annual_anomaly: Calculate annual anomalies for each variable based on a baseline period.

DataSet.monthly_anomaly(self[, baseline])

monthly_anomaly: Calculate monthly anomalies based on a baseline period.
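
An anomaly is the departure of each value from the mean over the baseline period. For a single cell's series, the calculation reduces to the following plain-Python sketch (not nctoolkit code; `baseline_index` marks which positions fall inside the baseline period):

```python
def anomalies(values, baseline_index):
    """Subtract the mean over the baseline positions from every value."""
    base = sum(values[i] for i in baseline_index) / len(baseline_index)
    return [v - base for v in values]
```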

Statistical methods#

DataSet.tmean(self[, over, align])

tmean: Calculate the temporal mean of all variables.

DataSet.tmin(self[, over, align])

tmin: Calculate the temporal minimum of all variables.

DataSet.tmedian(self[, over, align])

tmedian: Calculate the temporal median of all variables.

DataSet.tpercentile(self[, p, over, align])

tpercentile: Calculate the temporal percentile of all variables. Useful for monthly, annual/yearly, seasonal, and daily percentiles, and for daily, monthly, and seasonal climatologies

DataSet.tmax(self[, over, align])

tmax: Calculate the temporal maximum of all variables.

DataSet.tsum(self[, over, align])

tsum: Calculate the temporal sum of all variables.

DataSet.trange(self[, over, align])

trange: Calculate the temporal range of all variables. Useful for monthly, annual/yearly, seasonal, and daily ranges, and for daily, monthly, and seasonal climatologies

DataSet.tstdev(self[, over, align])

tstdev: Calculate the temporal standard deviation of all variables. Useful for monthly, annual/yearly, seasonal, and daily standard deviations, and for daily, monthly, and seasonal climatologies

DataSet.tcumsum(self[, align])

tcumsum: Calculate the temporal cumulative sum of all variables
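
tcumsum replaces each time step with the running total of all values up to and including it. For one cell's series this is exactly itertools.accumulate:

```python
from itertools import accumulate

values = [1.0, 2.0, 3.0, 4.0]
running = list(accumulate(values))  # running total at each time step
```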

DataSet.tvar(self[, over, align])

tvar: Calculate the temporal variance of all variables. Useful for monthly, annual/yearly, seasonal, and daily variances, and for daily, monthly, and seasonal climatologies

DataSet.cor_space(self[, var1, var2])

cor_space: Calculate the correlation coefficient between two variables in space.

DataSet.cor_time(self[, var1, var2])

cor_time: Calculate the correlation coefficient in time between two variables


spatial_mean: Calculate the area weighted spatial mean for all variables.
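
On a regular lon-lat grid, cell area shrinks toward the poles roughly in proportion to cos(latitude), so the area-weighted mean weights each latitude row by that factor. A schematic version for a single field, assuming a regular grid with no missing values (nctoolkit itself uses the true cell areas via CDO):

```python
import math

def area_weighted_mean(field, lats):
    """Mean of `field[row][col]` weighted by cos(latitude) per row."""
    total = weight_sum = 0.0
    for row, lat in zip(field, lats):
        w = math.cos(math.radians(lat))
        total += w * sum(row)
        weight_sum += w * len(row)
    return total / weight_sum
```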


spatial_min: Calculate the spatial minimum for all variables.


spatial_max: Calculate the spatial maximum for all variables.

DataSet.spatial_percentile(self[, p])

spatial_percentile: Calculate the spatial percentile for all variables


spatial_range: Calculate the spatial range for all variables.

DataSet.spatial_sum(self[, by_area])

spatial_sum: Calculate the spatial sum for all variables.


spatial_stdev: Calculate the spatial standard deviation for all variables.


spatial_var: Calculate the spatial variance for all variables.

DataSet.centre(self[, by, by_area])

centre: Calculate the latitudinal or longitudinal centre for each year/month combination in files.


zonal_mean: Calculate the zonal mean for each time step


zonal_min: Calculate the zonal minimum for each time step


zonal_max: Calculate the zonal maximum for each time step


zonal_range: Calculate the zonal range for each time step

DataSet.zonal_sum(self[, by_area])

zonal_sum: Calculate the zonal sum for each time step


meridonial_mean: Calculate the meridonial mean for each year/month combination in files.


meridonial_min: Calculate the meridonial minimum for each year/month combination in files.


meridonial_max: Calculate the meridonial maximum for each year/month combination in files.


meridonial_range: Calculate the meridonial range for each year/month combination in files.

Merging methods#

DataSet.merge(self[, join, match, check])

merge: Merge a multi-file ensemble into a single file

Splitting methods#

DataSet.split(self[, by])

split: Split the dataset

Output and formatting methods#

DataSet.to_nc(self, out[, zip, overwrite])

to_nc: Save a dataset to a named file.

DataSet.to_xarray(self[, decode_times])

to_xarray: Open a dataset as an xarray object

DataSet.to_dataframe(self[, decode_times])

to_dataframe: Convert a dataset to a pandas data frame


zip: Zip the dataset

DataSet.format(self[, ext])

format: Change the netCDF format of a dataset.

Miscellaneous methods#

DataSet.na_count(self[, over, align])

na_count: Calculate the number of missing values.

DataSet.na_frac(self[, over, align])

na_frac: Calculate the fraction of missing values in each grid cell across all time steps.

DataSet.distribute(self[, m, n])

distribute: Split the dataset into multiple new files of evenly sized horizontal and vertical extents


Collect a dataset that has been split using distribute

DataSet.cell_area(self[, join])

cell_area: Calculate the area of grid cells.

DataSet.first_above(self[, x])

first_above: Identify the time step when a value is first above a threshold.
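
For a single cell's time series, first_above amounts to finding the index of the first value exceeding the threshold. A hypothetical helper for illustration, returning None when the threshold is never crossed:

```python
def first_above(series, threshold):
    """Index of the first value strictly above `threshold`, else None."""
    return next((i for i, v in enumerate(series) if v > threshold), None)
```

first_below, last_above, and last_below are the obvious variations on the same search.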

DataSet.first_below(self[, x])

first_below: Identify the time step when a value is first below a threshold. This will do the comparison with either a number, a DataSet, or a netCDF file.

DataSet.last_above(self[, x])

last_above: Identify the final time step when a value is above a threshold. This will do the comparison with either a number, a DataSet, or a netCDF file.

DataSet.last_below(self[, x])

last_below: Identify the last time step when a value is below a threshold. This will do the comparison with either a number, a DataSet, or a netCDF file.

DataSet.cdo_command(self[, command, …])

cdo_command: Apply a cdo command

DataSet.nco_command(self[, command, ensemble])

Apply an nco command

DataSet.compare(self[, expression])

Compare all variables to a constant

DataSet.gt(self, x)

Method to calculate whether a variable in the dataset is greater than that in another file or dataset. This currently only works with single-file datasets.

DataSet.lt(self, x)

Method to calculate whether a variable in the dataset is less than that in another file or dataset. This currently only works with single-file datasets.


reduce_dims: Reduce dimensions of data

DataSet.reduce_grid(self[, mask])

reduce_grid: Reduce the dataset to non-zero locations in a mask

DataSet.set_precision(self, x)

Set the precision in a dataset


check: Check contents of files for common data problems.


is_corrupt: Check if files are corrupt


A quick hack to change the grid file in North West European shelf NEMO grids.

DataSet.set_gridtype(self, grid)

Set the grid type.


surface_mask: Create a mask identifying the shallowest cell without missing values.

DataSet.strip_variables(self[, vars])

strip_variables: Remove unwanted variables, such as bnds, from a dataset.


Remove leap years.

DataSet.as_double(self, x)

Set a variable/dimension to double. This is mostly useful when time is stored as an int but you need a double.

DataSet.as_type(self, x)

Set a variable/dimension to a specific type. This is mostly useful when time is stored as an int but you need a double.


Simple method to fully reset a dataset

Ecological methods#

DataSet.phenology(self[, var, metric, p])

phenology: Calculate phenologies from a dataset