API Reference

Session options

options(\*\*kwargs)

Define session options.

Opening/copying data

open_data([x, checks])

Read netCDF data as a DataSet object

open_url([x, ftp_details, wait, file_stop])

Read netCDF data from a url as a DataSet object

open_thredds([x, wait, checks])

Read thredds data as a DataSet object

open_geotiff([x])

Read geotiff and convert to nctoolkit dataset

from_xarray(ds)

Convert an xarray dataset to an nctoolkit dataset. This will first save the xarray dataset as a temporary netCDF file.

DataSet.copy(self)

Make a deep copy of a DataSet object.

Merging or analyzing multiple datasets

merge(\*datasets[, match])

Merge datasets

cor_time([x, y])

Calculate the temporal correlation coefficient between two datasets. This will calculate the temporal correlation coefficient, for each grid cell, between two datasets.

cor_space([x, y])

Calculate the spatial correlation coefficient between two datasets. This will calculate the spatial correlation coefficient, for each time step, between two datasets.
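As a rough illustration of what a correlation coefficient between two datasets measures, the sketch below computes a Pearson correlation between two short time series, as might be found at a single grid cell in each dataset. It is plain Python and does not call nctoolkit; the function name `pearson`, the sample values, and the assumption that the coefficient is a Pearson correlation are all illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values at one grid cell over four time steps in two datasets
cell_a = [1.0, 2.0, 3.0, 4.0]
cell_b = [2.0, 4.0, 6.0, 8.0]
print(pearson(cell_a, cell_b))  # perfectly correlated series give a value of 1
```

Applied to every grid cell this would give a map of coefficients. Applied instead to all grid cells at one time step, it would give one coefficient per time step.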

Adding and removing files to a dataset

append

remove

Accessing attributes

DataSet.variables

List variables contained in a dataset

DataSet.contents

Detailed list of variables contained in a dataset.

DataSet.times

List times contained in a dataset

DataSet.years

List years contained in a dataset

DataSet.months

List months contained in a dataset

DataSet.levels

List levels contained in a dataset

DataSet.size

The size of an object. This will print the number of files, total size, and smallest and largest files in a DataSet object.

DataSet.current

The current file or files in the DataSet object

DataSet.history

The history of operations on the DataSet

DataSet.start

The starting file or files of the DataSet object

DataSet.calendar

List calendars of dataset files

DataSet.ncformat

List formats of files contained in a dataset

Plotting

DataSet.plot(self[, vars, autoscale, out, coast])

Variable modification

DataSet.assign(self[, drop])

Create new variables. Existing variables that are re-assigned will be overwritten.

DataSet.rename(self, newnames)

Rename variables in a dataset

DataSet.as_missing(self[, value])

Set a single value or a range of values to missing

DataSet.sum_all(self[, drop])

Calculate the sum of all variables for each time step

netCDF file attribute modification

DataSet.set_longnames(self[, name_dict])

Set the long names of variables

DataSet.set_units(self[, unit_dict])

Set the units for variables

Vertical/level methods

DataSet.top(self)

Extract the top/surface level from a dataset. This extracts the first vertical level from each file in a dataset.

DataSet.bottom(self)

Extract the bottom level from a dataset. This extracts the bottom level from each netCDF file.

DataSet.vertical_interp(self[, levels, …])

Vertically interpolate a dataset based on given vertical levels. This is calculated for each time step and grid cell. Note: this requires consistent vertical levels in space.

DataSet.vertical_mean(self[, thickness, …])

Calculate the depth-averaged mean for each variable. This is calculated for each time step and grid cell.

DataSet.vertical_min(self)

Calculate the vertical minimum of variable values. This is calculated for each time step and grid cell.

DataSet.vertical_max(self)

Calculate the vertical maximum of variable values. This is calculated for each time step and grid cell.

DataSet.vertical_range(self)

Calculate the vertical range of variable values. This is calculated for each time step and grid cell.

DataSet.vertical_sum(self)

Calculate the vertical sum of variable values. This is calculated for each time step and grid cell.

DataSet.vertical_integration(self[, …])

Calculate the vertically integrated sum over the water column. This calculates the sum of the variable multiplied by the cell thickness.

DataSet.vertical_cumsum(self)

Calculate the vertical cumulative sum of variable values. This is calculated for each time step and grid cell.

DataSet.invert_levels(self)

Invert the levels of 3D variables. This is calculated for each time step and grid cell.

DataSet.bottom_mask(self)

Create a mask identifying the deepest cell without missing values.
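The vertical integration entry above describes a sum of the variable multiplied by the cell thickness. The sketch below works through that arithmetic for a single water column in plain Python, along with a thickness-weighted mean as one plausible reading of a depth-averaged mean. It does not call nctoolkit; the function names, the use of `None` for missing cells, and the sample values are assumptions made for illustration.

```python
def vertical_integration(values, thickness):
    """Sum of value * cell thickness over one water column, skipping missing cells."""
    total = 0.0
    for value, dz in zip(values, thickness):
        if value is None:  # None marks a missing value, e.g. below the sea floor
            continue
        total += value * dz
    return total


def vertical_mean(values, thickness):
    """Thickness-weighted mean over the non-missing cells of the same column."""
    valid_depth = sum(dz for value, dz in zip(values, thickness) if value is not None)
    return vertical_integration(values, thickness) / valid_depth


# One column with three levels of thickness 10, 20 and 30 m; the deepest cell is missing
column = [2.0, 4.0, None]
thickness = [10.0, 20.0, 30.0]
print(vertical_integration(column, thickness))  # 2*10 + 4*20 = 100.0
print(vertical_mean(column, thickness))  # 100 divided by the 30 m of valid depth
```

In a full dataset the same calculation would be repeated for each time step and grid cell.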

Rolling methods

DataSet.rolling_mean(self[, window, align])

Calculate a rolling mean based on a window

DataSet.rolling_min(self[, window, align])

Calculate a rolling minimum based on a window

DataSet.rolling_max(self[, window, align])

Calculate a rolling maximum based on a window

DataSet.rolling_sum(self[, window, align])

Calculate a rolling sum based on a window

DataSet.rolling_range(self[, window, align])

Calculate a rolling range based on a window

DataSet.rolling_stdev(self[, window, align])

Calculate a rolling standard deviation based on a window

DataSet.rolling_var(self[, window, align])

Calculate a rolling variance based on a window
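To make the idea of a window-based rolling statistic concrete, here is a minimal plain Python sketch of a rolling mean over one time series. It does not call nctoolkit, and the function name and edge handling (only full windows are returned) are assumptions. The sketch also ignores the `align` argument, which presumably controls which time stamp labels each window.

```python
def rolling_mean(series, window):
    """Mean over each run of `window` consecutive time steps (full windows only)."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and the series length")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Five hypothetical time steps at one grid cell, smoothed with a 3-step window
daily = [1.0, 2.0, 3.0, 4.0, 5.0]
print(rolling_mean(daily, 3))  # [2.0, 3.0, 4.0]
```

The other rolling statistics follow the same pattern with `min`, `max`, or a variance in place of the mean.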

Evaluation setting

DataSet.run(self)

Run all stored commands in a dataset

Cleaning functions


Ensemble creation

create_ensemble([path, recursive])

Generate an ensemble

Arithmetic methods

DataSet.abs(self)

Method to get the absolute value of variables

DataSet.add(self[, x, var])

Add to a dataset. This will add a constant, another dataset or a netCDF file to the dataset.

DataSet.assign(self[, drop])

Create new variables. Existing variables that are re-assigned will be overwritten.

DataSet.exp(self)

Method to get the exponential of variables

DataSet.log(self)

Method to get the natural log of variables

DataSet.log10(self)

Method to get the base 10 log of variables

DataSet.multiply(self[, x, var])

Multiply a dataset. This will multiply a dataset by a constant, another dataset or a netCDF file.

DataSet.power(self[, x])

Powers of variables in dataset

DataSet.sqrt(self)

Method to get the square root of variables

DataSet.square(self)

Method to get the square of variables

DataSet.subtract(self[, x, var])

Subtract from a dataset. This will subtract a constant, another dataset or a netCDF file from the dataset.

DataSet.divide(self[, x, var])

Divide the data. This will divide the dataset by a constant, another dataset or a netCDF file.

Ensemble statistics

DataSet.ensemble_mean(self[, nco, ignore_time])

Calculate an ensemble mean

DataSet.ensemble_min(self[, nco, ignore_time])

Calculate an ensemble minimum

DataSet.ensemble_max(self[, nco, ignore_time])

Calculate an ensemble maximum

DataSet.ensemble_percentile(self[, p])

Calculate an ensemble percentile. This will calculate the percentiles for each time step in the files.

DataSet.ensemble_range(self)

Calculate an ensemble range. The range is calculated for each time step; for example, if each file in the ensemble has 12 months of data the statistic will be calculated for each month.

DataSet.ensemble_stdev(self)

Calculate an ensemble standard deviation

DataSet.ensemble_sum(self)

Calculate an ensemble sum. The sum is calculated for each time step; for example, if each file in the ensemble has 12 months of data the statistic will be calculated for each month.

DataSet.ensemble_var(self)

Calculate an ensemble variance
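The per-time-step behaviour described for the ensemble range and sum can be sketched in plain Python as below. Each inner list stands in for one file's values at a single grid cell; the function name and data are hypothetical and nothing here calls nctoolkit.

```python
def ensemble_range(members):
    """Per-time-step range (max minus min) across ensemble members."""
    n_steps = len(members[0])
    if any(len(m) != n_steps for m in members):
        raise ValueError("all members need the same number of time steps")
    # zip(*members) groups the values of all members at the same time step
    return [max(step) - min(step) for step in zip(*members)]


# Three hypothetical ensemble members, each with three time steps
members = [
    [1.0, 5.0, 2.0],
    [3.0, 4.0, 8.0],
    [2.0, 9.0, 6.0],
]
print(ensemble_range(members))  # [2.0, 5.0, 6.0]
```

The output has one value per time step, so an ensemble of files with 12 months each would yield 12 values per grid cell.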

Subsetting operations

DataSet.subset(self, \*\*kwargs)

A method for subsetting datasets to specific variables, years, longitudes etc.

DataSet.crop(self[, lon, lat, nco, nco_vars])

Crop to a rectangular longitude and latitude box

DataSet.drop(self, \*\*kwargs)

Remove variables. This will remove stated variables from files in the dataset.

Time-based methods

DataSet.set_date(self[, year, month, day, …])

Set the date in a dataset. You should only do this if you have to fix or change a dataset with a single date, not multiple dates.

DataSet.set_year(self, x)

Set the year in a dataset

DataSet.shift(self, \*\*kwargs)

Shift method.

Interpolation, matching and resampling methods

DataSet.regrid(self[, grid, method, recycle])

Regrid a dataset to a target grid

DataSet.to_latlon(self[, lon, lat, res, …])

Regrid a dataset to a regular latlon grid

DataSet.match_points(self[, df, variables, …])

Match dataset to a spatiotemporal points dataframe

DataSet.resample_grid(self[, factor])

Resample the horizontal grid of a dataset

DataSet.time_interp(self[, start, end, …])

Temporally interpolate variables based on date range and time resolution

DataSet.timestep_interp(self[, steps])

Temporally interpolate a dataset to given number of time steps between existing time steps

DataSet.fill_na(self[, n])

Fill missing values with a distance-weighted average.

DataSet.box_mean(self[, x, y])

Calculate the grid box mean for all variables. This is performed for each time step.

DataSet.box_max(self[, x, y])

Calculate the grid box maximum for all variables. This is performed for each time step.

DataSet.box_min(self[, x, y])

Calculate the grid box minimum for all variables. This is performed for each time step.

DataSet.box_sum(self[, x, y])

Calculate the grid box sum for all variables. This is performed for each time step.

DataSet.box_range(self[, x, y])

Calculate the grid box range for all variables. This is performed for each time step.

Masking methods

DataSet.mask_box(self[, lon, lat])

Mask a lon/lat box

Anomaly methods

DataSet.annual_anomaly(self[, baseline, …])

Calculate annual anomalies for each variable based on a baseline period. The anomaly is derived by first calculating the climatological annual mean for the given baseline period.

DataSet.monthly_anomaly(self[, baseline])

Calculate monthly anomalies based on a baseline period. The anomaly is derived by first calculating the climatological monthly mean for the given baseline period.
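The two-step anomaly calculation described above can be sketched for a single grid cell as follows. The listing only states the first step (the climatological monthly mean over the baseline); subtracting that mean from each value is an assumption about the second step, as are the function name, the `(year, month)` keyed dictionary, and the inclusive baseline tuple. Nothing here calls nctoolkit.

```python
def monthly_anomaly(data, baseline):
    """data maps (year, month) -> value; baseline is (first_year, last_year), inclusive."""
    first, last = baseline
    sums, counts = {}, {}
    # Step 1: climatological monthly mean over the baseline years
    for (year, month), value in data.items():
        if first <= year <= last:
            sums[month] = sums.get(month, 0.0) + value
            counts[month] = counts.get(month, 0) + 1
    climatology = {month: sums[month] / counts[month] for month in sums}
    # Step 2 (assumed): subtract the matching monthly mean from every value
    return {key: value - climatology[key[1]] for key, value in data.items()}


# Hypothetical January and July values for three years
data = {
    (2000, 1): 10.0, (2001, 1): 12.0, (2002, 1): 14.0,
    (2000, 7): 20.0, (2001, 7): 22.0, (2002, 7): 27.0,
}
anomalies = monthly_anomaly(data, (2000, 2001))
print(anomalies[(2002, 1)])  # 14 minus the January baseline mean of 11 = 3.0
print(anomalies[(2002, 7)])  # 27 minus the July baseline mean of 21 = 6.0
```

An annual anomaly would follow the same pattern with one climatological mean for the whole baseline period rather than one per month.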

Statistical methods

DataSet.tmean(self[, over, align])

Calculate the temporal mean of all variables

DataSet.tmin(self[, over, align])

Calculate the temporal minimum of all variables

DataSet.tmedian(self[, over, align])

Calculate the temporal median of all variables

DataSet.tpercentile(self[, p, over, align])

Calculate the temporal percentile of all variables

DataSet.tmax(self[, over, align])

Calculate the temporal maximum of all variables

DataSet.tsum(self[, over, align])

Calculate the temporal sum of all variables

DataSet.trange(self[, over, align])

Calculate the temporal range of all variables

DataSet.tstdev(self[, over, align])

Calculate the temporal standard deviation of all variables

DataSet.tcumsum(self[, align])

Calculate the temporal cumulative sum of all variables

DataSet.tvar(self[, over, align])

Calculate the temporal variance of all variables

DataSet.cor_space(self[, var1, var2])

Calculate the correlation coefficient between two variables in space. This is calculated for each time step.

DataSet.cor_time(self[, var1, var2])

Calculate the correlation coefficient in time between two variables. The correlation is calculated for each grid cell, ignoring missing values.

DataSet.spatial_mean(self)

Calculate the area-weighted spatial mean for all variables. This is performed for each time step.

DataSet.spatial_min(self)

Calculate the spatial minimum for all variables. This is performed for each time step.

DataSet.spatial_max(self)

Calculate the spatial maximum for all variables. This is performed for each time step.

DataSet.spatial_percentile(self[, p])

Calculate the spatial percentile for all variables. This is performed for each time step.

DataSet.spatial_range(self)

Calculate the spatial range for all variables. This is performed for each time step.

DataSet.spatial_sum(self[, by_area])

Calculate the spatial sum for all variables. This is performed for each time step.

DataSet.spatial_stdev(self)

Calculate the spatial standard deviation for all variables. This is performed for each time step.

DataSet.spatial_var(self)

Calculate the spatial variance for all variables. This is performed for each time step.

DataSet.centre(self[, by, by_area])

Calculate the latitudinal or longitudinal centre for each year/month combination in files.

DataSet.zonal_mean(self)

Calculate the zonal mean for each year/month combination in files.

DataSet.zonal_min(self)

Calculate the zonal minimum for each year/month combination in files.

DataSet.zonal_max(self)

Calculate the zonal maximum for each year/month combination in files.

DataSet.zonal_range(self)

Calculate the zonal range for each year/month combination in files.

DataSet.zonal_sum(self[, by_area])

Calculate the zonal sum for each year/month combination in files.

DataSet.meridonial_mean(self)

Calculate the meridional mean for each year/month combination in files.

DataSet.meridonial_min(self)

Calculate the meridional minimum for each year/month combination in files.

DataSet.meridonial_max(self)

Calculate the meridional maximum for each year/month combination in files.

DataSet.meridonial_range(self)

Calculate the meridional range for each year/month combination in files.
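Among the statistical methods above, the spatial mean is described as area weighted. The plain Python sketch below shows why the weighting matters on a regular latitude/longitude grid. It assumes cell area is proportional to the cosine of latitude, which is a common approximation and not necessarily how nctoolkit derives cell areas; the function name, the `None` missing-value marker, and the sample field are also illustrative.

```python
import math


def area_weighted_mean(field, lats):
    """field[i][j] is the value at latitude lats[i]; None marks missing cells."""
    weighted_sum = 0.0
    weight_total = 0.0
    for row, lat in zip(field, lats):
        # Assumed weight: cell area shrinks with the cosine of latitude
        weight = math.cos(math.radians(lat))
        for value in row:
            if value is None:
                continue
            weighted_sum += weight * value
            weight_total += weight
    return weighted_sum / weight_total


# Two latitude rows at one time step: the equator and 60 degrees north
lats = [0.0, 60.0]
field = [[1.0, 1.0], [4.0, 4.0]]
print(area_weighted_mean(field, lats))  # close to 2.0, against an unweighted mean of 2.5
```

Cells at 60 degrees carry about half the weight of equatorial cells, so the high-latitude values pull the mean less than they would in a simple average.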

Merging methods

DataSet.merge(self[, join, match, check])

Merge a multi-file ensemble into a single file. Two methods are available.

Splitting methods

DataSet.split(self[, by])

Split the dataset. Each file in the ensemble will be separated into new files based on the splitting argument.

Output and formatting methods

DataSet.to_nc(self, out[, zip, overwrite])

Save a dataset to a named file. This will only work with single-file datasets.

DataSet.to_xarray(self[, decode_times])

Open a dataset as an xarray object

DataSet.to_dataframe(self[, decode_times])

Open a dataset as a pandas data frame

DataSet.zip(self)

Zip the dataset. This will compress the files within the dataset.

DataSet.format(self[, ext])

Change the netCDF format of the files within the dataset.

Miscellaneous methods

DataSet.na_count(self[, over, align])

Calculate the number of missing values

DataSet.na_frac(self[, over, align])

Calculate the fraction of missing values

DataSet.distribute(self[, m, n])

Split the dataset into multiple evenly sized horizontal and vertical new files

DataSet.collect(self)

Collect a dataset that has been split using distribute

DataSet.cell_area(self[, join])

Calculate the area of grid cells.

DataSet.first_above(self[, x])

Identify the time step when a value is first above a threshold. This will do the comparison with either a number, a DataSet or a netCDF file.

DataSet.first_below(self[, x])

Identify the time step when a value is first below a threshold. This will do the comparison with either a number, a DataSet or a netCDF file.

DataSet.last_above(self[, x])

Identify the final time step when a value is above a threshold. This will do the comparison with either a number, a DataSet or a netCDF file.

DataSet.last_below(self[, x])

Identify the final time step when a value is below a threshold. This will do the comparison with either a number, a DataSet or a netCDF file.

DataSet.cdo_command(self[, command, ensemble])

Apply a cdo command

DataSet.nco_command(self[, command, ensemble])

Apply an nco command

DataSet.compare(self[, expression])

Compare all variables to a constant

DataSet.gt(self, x)

Method to calculate whether a variable in the dataset is greater than that in another file or dataset. This currently only works with single-file datasets.

DataSet.lt(self, x)

Method to calculate whether a variable in the dataset is less than that in another file or dataset. This currently only works with single-file datasets.

DataSet.reduce_dims(self)

Reduce dimensions of data. This will remove any dimensions with only one value.

DataSet.reduce_grid(self[, mask])

Reduce the dataset to non-zero locations in a mask

DataSet.set_precision(self, x)

Set the precision in a dataset

DataSet.check(self)

Check contents of files for common data problems.

DataSet.is_corrupt(self)

Check if files are corrupt

DataSet.fix_nemo_ersem_grid(self)

A quick hack to change the grid file in North West European shelf Nemo grids.

DataSet.set_gridtype(self, grid)

Set the grid type.

DataSet.surface_mask(self)

Create a mask identifying the shallowest cell without missing values.

DataSet.strip_variables(self[, vars])

Remove any auxiliary variables, such as bnds, from the dataset.

DataSet.no_leaps(self)

Remove leap days (29 February).

DataSet.as_double(self, x)

Set a variable/dimension to double. This is mostly useful for cases when time is stored as an int, but you need a double.

Ecological methods

DataSet.phenology(self[, var, metric, p])

Calculate phenologies from a dataset. Each file in an ensemble must only cover a single year, and ideally have all days.