API Reference¶

Session options¶

options(\*\*kwargs)

Define session options.

Opening/copying data¶

`open_data`([x, checks])	Read netCDF data as a DataSet object
`open_url`([x, ftp_details, wait, file_stop])	Read netCDF data from a url as a DataSet object
`open_thredds`([x, wait, checks])	Read thredds data as a DataSet object
`open_geotiff`([x])	Read geotiff and convert to nctoolkit dataset
`from_xarray`(ds)	Convert an xarray dataset to an nctoolkit dataset. This will first save the xarray dataset as a temporary netCDF file.
`DataSet.copy`(self)	Make a deep copy of a DataSet object. Note: this will not make disk copies of the temporary files underlying datasets, so it will be disk-space efficient.

Merging or analyzing multiple datasets¶

`merge`(\*datasets[, match])	Merge datasets
`cor_time`([x, y])	Calculate the temporal correlation coefficient between two datasets. The coefficient is calculated for each grid cell.
`cor_space`([x, y])	Calculate the spatial correlation coefficient between two datasets. The coefficient is calculated for each time step.

Adding and removing files to a dataset¶

`append`
`remove`

Accessing attributes¶

`DataSet.variables`	List variables contained in a dataset
`DataSet.contents`	Detailed list of variables contained in a dataset.
`DataSet.times`	List times contained in a dataset
`DataSet.years`	List years contained in a dataset
`DataSet.months`	List months contained in a dataset
`DataSet.levels`	List levels contained in a dataset
`DataSet.size`	The size of an object. This will print the number of files, total size, and smallest and largest files in a DataSet object.
`DataSet.current`	The current file or files in the DataSet object
`DataSet.history`	The history of operations on the DataSet
`DataSet.start`	The starting file or files of the DataSet object
`DataSet.calendar`	List calendars of dataset files
`DataSet.ncformat`	List formats of files contained in a dataset

Plotting¶

DataSet.plot(self[, vars, autoscale, out])

Variable modification¶

`DataSet.assign`(self[, drop])	Create new variables. Existing variables that are re-assigned will be overwritten. Parameters: `drop`: set to True if you want existing variables to be removed once the new ones have been created; defaults to False.
`DataSet.rename`(self, newnames)	Rename variables in a dataset
`DataSet.set_missing`(self[, value])	Set the missing value for a single number or a range
`DataSet.sum_all`(self[, drop])	Calculate the sum of all variables for each time step

netCDF file attribute modification¶

`DataSet.set_longnames`(self[, name_dict])	Set the long names of variables
`DataSet.set_units`(self[, unit_dict])	Set the units for variables

Vertical/level methods¶

`DataSet.top`(self)	Extract the top/surface level from a dataset. This extracts the first vertical level from each file in a dataset.
`DataSet.bottom`(self)	Extract the bottom level from a dataset. This extracts the bottom level from each netCDF file.
`DataSet.vertical_interp`(self[, levels])	Vertically interpolate a dataset based on given vertical levels. This is calculated for each time step and grid cell.
`DataSet.vertical_mean`(self[, thickness, …])	Calculate the depth-averaged mean for each variable. This is calculated for each time step and grid cell.
`DataSet.vertical_min`(self)	Calculate the vertical minimum of variable values. This is calculated for each time step and grid cell.
`DataSet.vertical_max`(self)	Calculate the vertical maximum of variable values. This is calculated for each time step and grid cell.
`DataSet.vertical_range`(self)	Calculate the vertical range of variable values. This is calculated for each time step and grid cell.
`DataSet.vertical_sum`(self)	Calculate the vertical sum of variable values. This is calculated for each time step and grid cell.
`DataSet.vertical_integration`(self[, …])	Calculate the vertically integrated sum over the water column. This calculates the sum of the variable multiplied by the cell thickness.
`DataSet.vertical_cumsum`(self)	Calculate the vertical cumulative sum of variable values. This is calculated for each time step and grid cell.
`DataSet.invert_levels`(self)	Invert the levels of 3D variables. This is calculated for each time step and grid cell.
`DataSet.bottom_mask`(self)	Create a mask identifying the deepest cell without missing values.
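
The difference between a vertically integrated sum and a depth-averaged mean can be sketched in plain Python. This is an illustration only, not nctoolkit's implementation: the helper names are hypothetical, missing cells are represented as `None`, and the mean is assumed to be weighted by cell thickness.

```python
def integrate_column(values, thickness):
    """Sum of value * cell thickness over one water column, skipping missing cells."""
    total = 0.0
    for value, dz in zip(values, thickness):
        if value is None:  # None stands in for a missing value, e.g. below the sea floor
            continue
        total += value * dz
    return total


def depth_average(values, thickness):
    """Thickness-weighted mean over the non-missing cells (assumed weighting)."""
    depth = sum(dz for value, dz in zip(values, thickness) if value is not None)
    return integrate_column(values, thickness) / depth


# One water column: three valid levels and one missing level at the bottom.
column = [2.0, 4.0, 6.0, None]
thickness = [10.0, 10.0, 20.0, 40.0]

integrated = integrate_column(column, thickness)  # 2*10 + 4*10 + 6*20
mean = depth_average(column, thickness)           # integrated / (10 + 10 + 20)
```

In a dataset the same calculation would be repeated for every time step and grid cell.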

Rolling methods¶

`DataSet.rolling_mean`(self[, window])	Calculate a rolling mean based on a window. Time output is the middle time of the window.
`DataSet.rolling_min`(self[, window])	Calculate a rolling minimum based on a window. Time output is the middle time of the window.
`DataSet.rolling_max`(self[, window])	Calculate a rolling maximum based on a window. Time output is the middle time of the window.
`DataSet.rolling_sum`(self[, window])	Calculate a rolling sum based on a window. Time output is the middle time of the window.
`DataSet.rolling_range`(self[, window])	Calculate a rolling range based on a window. Time output is the middle time of the window.
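
As a rough illustration of how a rolling statistic is labelled with the middle time of its window, here is a plain-Python sketch for a single grid cell. The function name is hypothetical, the window is assumed to be odd so that a middle time exists, and only full windows are assumed to produce output.

```python
def rolling_mean_series(times, values, window):
    """Return (time, mean) pairs for each full window; time is the window's middle entry."""
    out = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        middle_time = times[start + window // 2]
        out.append((middle_time, sum(chunk) / window))
    return out


times = ["2000-01", "2000-02", "2000-03", "2000-04", "2000-05"]
values = [1.0, 2.0, 3.0, 4.0, 5.0]

# Five time steps and a window of three give three output time steps.
result = rolling_mean_series(times, values, window=3)
```

The output series is shorter than the input by `window - 1` time steps, because the first and last windows cannot be centred on the edge times.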

Evaluation setting¶

DataSet.run(self)

Run all stored commands in a dataset

Cleaning functions¶

Ensemble creation¶

create_ensemble([path, recursive])

Generate an ensemble

Arithmetic methods¶

`DataSet.abs`(self)	Method to get the absolute value of variables
`DataSet.add`(self[, x, var])	Add to a dataset. This will add a constant, another dataset or a netCDF file to the dataset.
`DataSet.assign`(self[, drop])	Create new variables. Existing variables that are re-assigned will be overwritten. Parameters: `drop`: set to True if you want existing variables to be removed once the new ones have been created; defaults to False.
`DataSet.exp`(self)	Method to get the exponential of variables
`DataSet.log`(self)	Method to get the natural log of variables
`DataSet.log10`(self)	Method to get the base 10 log of variables
`DataSet.multiply`(self[, x, var])	Multiply a dataset. This will multiply a dataset by a constant, another dataset or a netCDF file. Parameters: `x` (int, float, DataSet or netCDF file): an int, float, single file dataset or netCDF file to multiply the dataset by; if multiplying by a dataset or single file, it must contain only a single variable unless `var` is supplied, and the grids must be the same. `var` (str): a variable in `x` to multiply the dataset by.
`DataSet.power`(self[, x])	Raise the variables in a dataset to a power. Parameters: `x` (int or float): the power to raise the variables to.
`DataSet.sqrt`(self)	Method to get the square root of variables
`DataSet.square`(self)	Method to get the square of variables
`DataSet.subtract`(self[, x, var])	Subtract from a dataset. This will subtract a constant, another dataset or a netCDF file from the dataset. Parameters: `x` (int, float, DataSet or netCDF file): an int, float, single file dataset or netCDF file to subtract from the dataset; if a dataset or netCDF file is supplied, it must have only one variable unless `var` is provided, and the grids must be the same. `var` (str): a variable in `x` to use for the operation.
`DataSet.divide`(self[, x, var])	Divide the data. This will divide the dataset by a constant, another dataset or a netCDF file. Parameters: `x` (int, float, DataSet or netCDF file): an int, float, single file dataset or netCDF file to divide the dataset by; if a dataset or netCDF file is supplied, it must have only one variable unless `var` is provided, and the grids must be the same. `var` (str): a variable in `x` to use for the operation.

Ensemble statistics¶

`DataSet.ensemble_mean`(self[, nco, ignore_time])	Calculate an ensemble mean
`DataSet.ensemble_min`(self[, nco, ignore_time])	Calculate an ensemble min
`DataSet.ensemble_max`(self[, nco, ignore_time])	Calculate an ensemble maximum
`DataSet.ensemble_percentile`(self[, p])	Calculate an ensemble percentile. This will calculate the percentiles for each time step in the files.
`DataSet.ensemble_range`(self)	Calculate an ensemble range. The range is calculated for each time step; for example, if each file in the ensemble has 12 months of data, the statistic will be calculated for each month.
`DataSet.ensemble_stdev`(self)	Calculate an ensemble standard deviation
`DataSet.ensemble_sum`(self)	Calculate an ensemble sum. The sum is calculated for each time step; for example, if each file in the ensemble has 12 months of data, the statistic will be calculated for each month.
`DataSet.ensemble_var`(self)	Calculate an ensemble variance
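
The "calculated for each time step" behaviour of the ensemble statistics can be pictured with a small plain-Python sketch. It is not nctoolkit code: each inner list is a hypothetical stand-in for one file's values at a single grid cell, and all files are assumed to share the same time steps.

```python
# Three ensemble members, each with the same three time steps.
ensemble = [
    [1.0, 2.0, 3.0],  # file 1
    [4.0, 0.0, 3.5],  # file 2
    [2.0, 5.0, 1.5],  # file 3
]

# Regroup the values so that each tuple holds one time step across all files.
steps = list(zip(*ensemble))

# One statistic per time step, computed across the ensemble members.
ensemble_sum = [sum(step) for step in steps]
ensemble_range = [max(step) - min(step) for step in steps]
```

The result has as many time steps as each input file, not one value per file.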

Subsetting operations¶

`DataSet.crop`(self[, lon, lat, nco, nco_vars])	Crop to a rectangular longitude and latitude box
`DataSet.select`(self, \*\*kwargs)	A method for subsetting datasets to specific variables, years, longitudes etc.
`DataSet.drop`(self, \*\*kwargs)	Remove variables. This will remove stated variables from files in the dataset.

Time-based methods¶

`DataSet.set_date`(self[, year, month, day, …])	Set the date in a dataset. You should only do this if you have to fix/change a dataset with a single date, not multiple dates.
`DataSet.shift`(self, \*\*kwargs)	Shift method.

Interpolation, matching and resampling methods¶

`DataSet.regrid`(self[, grid, method, recycle])	Regrid a dataset to a target grid
`DataSet.to_latlon`(self[, lon, lat, res, …])	Regrid a dataset to a regular latlon grid
`DataSet.match_points`(self[, df, variables, …])	Match dataset to a spatiotemporal points dataframe. Parameters: `df` (pandas dataframe): the spatiotemporal points to match with; the column names must be made up of a subset of “lon”, “lat”, “year”, “month”, “day” and “depth”. `variables` (str or list): variables to match; all variables are matched up if this is not supplied. `depths` (nctoolkit dataset or list giving depths): only required if carrying out vertical matchups; if each cell has different vertical levels, this must be provided as a dataset; if each cell has the same vertical levels, provide it as a list; if this is not supplied, nctoolkit will try to figure out what they are. `tmean` (bool): set to True or False, depending on whether you want temporal averaging at the temporal resolution given by `df`; for example, if you only had months in `df` but had daily data in `ds`, you might want to calculate monthly averages of the daily data; this is equivalent to applying `ds.tmean(..)` to the dataset. `top` (bool): set to True if you want only the top/surface level of the dataset to be selected for matching. `nan` (float or list): value or range of values to set to nan; defaults to 0; only required if values in the dataset need to be changed to missing. `regrid` (str): regridding method; defaults to “bil”; options available are those in the nctoolkit regrid method, for example “nn” for nearest neighbour.
`DataSet.resample_grid`(self[, factor])	Resample the horizontal grid of a dataset
`DataSet.time_interp`(self[, start, end, …])	Temporally interpolate variables based on date range and time resolution
`DataSet.timestep_interp`(self[, steps])	Temporally interpolate a dataset to given number of time steps between existing time steps
`DataSet.fill_na`(self[, n])	Fill missing values with a distance-weighted average.
`DataSet.box_mean`(self[, x, y])	Calculate the grid box mean for all variables. This is performed for each time step.
`DataSet.box_max`(self[, x, y])	Calculate the grid box max for all variables. This is performed for each time step.
`DataSet.box_min`(self[, x, y])	Calculate the grid box min for all variables. This is performed for each time step.
`DataSet.box_sum`(self[, x, y])	Calculate the grid box sum for all variables. This is performed for each time step.
`DataSet.box_range`(self[, x, y])	Calculate the grid box range for all variables. This is performed for each time step.

Masking methods¶

DataSet.mask_box(self[, lon, lat])

Mask a lon/lat box

Anomaly methods¶

`DataSet.annual_anomaly`(self[, baseline, …])	Calculate annual anomalies for each variable based on a baseline period. The anomaly is derived by first calculating the climatological annual mean for the given baseline period.
`DataSet.monthly_anomaly`(self[, baseline])	Calculate monthly anomalies based on a baseline period. The anomaly is derived by first calculating the climatological monthly mean for the given baseline period.
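
The baseline logic described above can be sketched for a single grid cell in plain Python. This is a sketch under assumptions, not nctoolkit's implementation: the function name is hypothetical, the baseline is taken to be an inclusive (first year, last year) pair, and the anomaly is assumed to be an absolute difference from the climatological mean.

```python
def anomaly_from_baseline(annual_means, baseline):
    """Subtract the baseline-period climatological mean from every annual mean."""
    first, last = baseline
    base_values = [v for year, v in annual_means.items() if first <= year <= last]
    climatology = sum(base_values) / len(base_values)
    return {year: v - climatology for year, v in annual_means.items()}


# Annual means for one grid cell; 2000 to 2002 form the baseline period.
annual_means = {2000: 10.0, 2001: 12.0, 2002: 14.0, 2003: 15.0}
anomalies = anomaly_from_baseline(annual_means, baseline=(2000, 2002))
```

The baseline climatology here is 12.0, so years inside and outside the baseline are both expressed relative to that value.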

Statistical methods¶

`DataSet.tmean`(self[, over])	Calculate the temporal mean of all variables
`DataSet.tmin`(self[, over])	Calculate the temporal minimum of all variables
`DataSet.tmedian`(self[, over])	Calculate the temporal median of all variables. Parameters: `over`: time periods to average over.
`DataSet.tpercentile`(self[, p, over])	Calculate the temporal percentile of all variables
`DataSet.tmax`(self[, over])	Calculate the temporal maximum of all variables
`DataSet.tsum`(self[, over])	Calculate the temporal sum of all variables
`DataSet.trange`(self[, over])	Calculate the temporal range of all variables
`DataSet.tstdev`(self[, over])	Calculate the temporal standard deviation of all variables
`DataSet.tcumsum`(self)	Calculate the temporal cumulative sum of all variables
`DataSet.tvar`(self[, over])	Calculate the temporal variance of all variables
`DataSet.cor_space`(self[, var1, var2])	Calculate the correlation coefficient between two variables in space. This is calculated for each time step.
`DataSet.cor_time`(self[, var1, var2])	Calculate the correlation coefficient in time between two variables. The correlation is calculated for each grid cell, ignoring missing values.
`DataSet.spatial_mean`(self)	Calculate the area weighted spatial mean for all variables. This is performed for each time step.
`DataSet.spatial_min`(self)	Calculate the spatial minimum for all variables. This is performed for each time step.
`DataSet.spatial_max`(self)	Calculate the spatial maximum for all variables. This is performed for each time step.
`DataSet.spatial_percentile`(self[, p])	Calculate the spatial percentile for all variables. This is performed for each time step.
`DataSet.spatial_range`(self)	Calculate the spatial range for all variables. This is performed for each time step.
`DataSet.spatial_sum`(self[, by_area])	Calculate the spatial sum for all variables. This is performed for each time step.
`DataSet.spatial_stdev`(self)	Calculate the spatial standard deviation for all variables. This is performed for each time step.
`DataSet.spatial_var`(self)	Calculate the spatial variance for all variables. This is performed for each time step.
`DataSet.centre`(self[, by, by_area])	Calculate the latitudinal or longitudinal centre for each year/month combination in files. This applies to each file in an ensemble. Parameters: `by` (str): set to ‘latitude’ if you want the latitudinal centre calculated, or ‘longitude’ for the longitudinal centre. `by_area` (bool): if the variable is a value/m2 type variable, set to True; otherwise set to False.
`DataSet.zonal_mean`(self)	Calculate the zonal mean for each year/month combination in files.
`DataSet.zonal_min`(self)	Calculate the zonal minimum for each year/month combination in files.
`DataSet.zonal_max`(self)	Calculate the zonal maximum for each year/month combination in files.
`DataSet.zonal_range`(self)	Calculate the zonal range for each year/month combination in files.
`DataSet.meridonial_mean`(self)	Calculate the meridional mean for each year/month combination in files.
`DataSet.meridonial_min`(self)	Calculate the meridional minimum for each year/month combination in files.
`DataSet.meridonial_max`(self)	Calculate the meridional maximum for each year/month combination in files.
`DataSet.meridonial_range`(self)	Calculate the meridional range for each year/month combination in files.
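
To illustrate what a correlation in time "for each grid cell, ignoring missing values" involves, here is a plain-Python sketch for one grid cell. It assumes a Pearson coefficient and pairwise removal of missing values (represented as `None`); the function name is hypothetical and the reference does not state which estimator nctoolkit uses.

```python
import math


def pearson_ignoring_missing(x, y):
    """Pearson correlation over the time steps where both series have a value."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y)


# Two variables at one grid cell; the third time step is missing in var1,
# so that time step is dropped from both series before correlating.
var1 = [1.0, 2.0, None, 4.0, 5.0]
var2 = [2.0, 4.0, 9.0, 8.0, 10.0]
r = pearson_ignoring_missing(var1, var2)
```

Applied to a dataset, this would yield one coefficient per grid cell, giving a map with no time dimension.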

Merging methods¶

DataSet.merge(self[, join, match, check])

Merge a multi-file ensemble into a single file. Two methods are available.

Splitting methods¶

DataSet.split(self[, by])

Split the dataset. Each file in the ensemble will be separated into new files based on the splitting argument.

Output and formatting methods¶

`DataSet.to_nc`(self, out[, zip, overwrite])	Save a dataset to a named file. This will only work with single file datasets.
`DataSet.to_xarray`(self[, decode_times])	Open a dataset as an xarray object
`DataSet.to_dataframe`(self[, decode_times])	Open a dataset as a pandas data frame
`DataSet.zip`(self)	Zip the dataset. This will compress the files within the dataset.
`DataSet.format`(self[, ext])	Change the netCDF format of the files within the dataset. This works lazily. Parameters: `ext` (str): new format; must be one of “nc”, “nc1”, “nc2”, “nc4” and “nc5”. The formats are: netCDF = nc1; netCDF version 2 (64-bit offset) = nc2/nc; netCDF4 (HDF5) = nc4; netCDF4-classic = nc4c; netCDF version 5 (64-bit data) = nc5.

Miscellaneous methods¶

`DataSet.na_count`(self[, over])	Calculate the number of missing values
`DataSet.na_frac`(self[, over])	Calculate the fraction of missing values
`DataSet.distribute`(self[, m, n])	Split the dataset into multiple evenly sized horizontal and vertical new files
`DataSet.collect`(self)	Collect a dataset that has been split using distribute
`DataSet.cell_area`(self[, join])	Calculate the area of grid cells.
`DataSet.first_above`(self[, x])	Identify the time step when a value is first above a threshold. This will do the comparison with either a number, a DataSet or a netCDF file. Parameters: `x` (int, float, DataSet or netCDF file): an int, float, single file dataset or netCDF file to use for the threshold(s); if comparing with a dataset or single file, it must contain only a single variable, and the grids must be the same.
`DataSet.first_below`(self[, x])	Identify the time step when a value is first below a threshold. This will do the comparison with either a number, a DataSet or a netCDF file. Parameters: `x` (int, float, DataSet or netCDF file): an int, float, single file dataset or netCDF file to use for the threshold(s); if comparing with a dataset or single file, it must contain only a single variable, and the grids must be the same.
`DataSet.last_above`(self[, x])	Identify the final time step when a value is above a threshold. This will do the comparison with either a number, a DataSet or a netCDF file. Parameters: `x` (int, float, DataSet or netCDF file): an int, float, single file dataset or netCDF file to use for the threshold(s); if comparing with a dataset or single file, it must contain only a single variable, and the grids must be the same.
`DataSet.last_below`(self[, x])	Identify the final time step when a value is below a threshold. This will do the comparison with either a number, a DataSet or a netCDF file. Parameters: `x` (int, float, DataSet or netCDF file): an int, float, single file dataset or netCDF file to use for the threshold(s); if comparing with a dataset or single file, it must contain only a single variable, and the grids must be the same.
`DataSet.cdo_command`(self[, command, ensemble])	Apply a cdo command
`DataSet.nco_command`(self[, command, ensemble])	Apply an nco command
`DataSet.compare`(self[, expression])	Compare all variables to a constant
`DataSet.gt`(self, x)	Check whether a variable in the dataset is greater than that in another file or dataset. This currently only works with single file datasets.
`DataSet.lt`(self, x)	Check whether a variable in the dataset is less than that in another file or dataset. This currently only works with single file datasets.
`DataSet.reduce_dims`(self)	Reduce dimensions of data. This will remove any dimensions with only one value.
`DataSet.reduce_grid`(self[, mask])	Reduce the dataset to non-zero locations in a mask. Parameters: `mask` (str or dataset): single variable dataset or path to .nc file; the mask must have an identical grid to the dataset.
`DataSet.set_precision`(self, x)	Set the precision in a dataset
`DataSet.check`(self)	Check contents of files for common data problems.
`DataSet.is_corrupt`(self)	Check if files are corrupt
`DataSet.fix_nemo_ersem_grid`(self)	A quick hack to change the grid file in North West European shelf Nemo grids.
`DataSet.set_gridtype`(self, grid)	Set the grid type.
`DataSet.surface_mask`(self)	Create a mask identifying the shallowest cell without missing values.
`DataSet.strip_variables`(self[, vars])	Remove any variables, such as bnds etc., from variables.
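
The threshold methods listed above (`first_above`, `last_above` and their `below` counterparts) can be pictured with a plain-Python sketch for a single grid cell and a constant threshold. This is an assumption-laden illustration, not nctoolkit code: the helper names are hypothetical, time steps are reported as zero-based indices, and `None` is returned when the threshold is never exceeded. The reference does not say how nctoolkit encodes either case.

```python
def first_step_above(series, threshold):
    """Index of the first time step whose value exceeds the threshold, else None."""
    for step, value in enumerate(series):
        if value is not None and value > threshold:
            return step
    return None


def last_step_above(series, threshold):
    """Index of the final time step whose value exceeds the threshold, else None."""
    for step in range(len(series) - 1, -1, -1):
        value = series[step]
        if value is not None and value > threshold:
            return step
    return None


# Values at one grid cell over seven time steps.
series = [0.5, 1.2, 3.4, 2.8, 0.9, 3.1, 0.2]

first = first_step_above(series, 3.0)
last = last_step_above(series, 3.0)
never = first_step_above(series, 10.0)
```

When the threshold is a dataset or netCDF file rather than a number, the comparison value would vary by grid cell, which is why the grids must be the same.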

Ecological methods¶

DataSet.phenology(self[, var, metric, p])

Calculate phenologies from a dataset. Each file in an ensemble must only cover a single year, and ideally have all days.