Release of v0.5.1¶
This was a minor release made on 30th June 2022. It includes method enhancements.
The subset method now allows negative time slicing.
The set_missing method is deprecated and replaced with a less ambiguously named as_missing method.
The plot method will no longer show a plot title by default to make things cleaner.
The vertical_integration method now works with multi-file datasets and will not calculate vertical integrations for the thickness variable.
Error messages have been improved, and the check method now checks the data type of time.
A new method, as_type, has been added for changing the data type of individual variables and coordinates.
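The negative time slicing mentioned above for subset can be pictured with ordinary Python indexing. The sketch below is only an illustration. It assumes negative indices count back from the final time step, as they do for Python lists, and resolve_time_index is a hypothetical helper, not part of nctoolkit.

```python
def resolve_time_index(index: int, n_times: int) -> int:
    """Convert a possibly negative time index to a zero-based position."""
    if not -n_times <= index < n_times:
        raise IndexError(f"time index {index} out of range for {n_times} time steps")
    # Assumption: -1 refers to the final time step, -2 to the one before it, etc.
    return index % n_times


times = ["2000-01", "2000-02", "2000-03", "2000-04"]
last = times[resolve_time_index(-1, len(times))]
```

Under this assumption, an index of -1 selects the final time step without the user needing to know how many time steps the dataset holds.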
Release of v0.5.0¶
This release was made on 13th June 2022. The match_points method now allows extrapolation to vertical depths.
Release of v0.4.9¶
This release was made on 9th June 2022. The subset method now accepts levels.
Release of v0.4.8¶
This release improves temporal merging of large datasets. Previously on some systems this would fail on datasets made up of more than 1,000 files due to system limits. Under the hood, nctoolkit now deals with this.
The merge method also now contains a check argument that can be used to speed up merging of large datasets when you know the files can be merged problem-free. Previously, merge always checked if files being merged had the same variables when doing a temporal merge. This can now be switched off if you are confident this does not need to happen.
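The trade-off behind the check argument can be sketched in plain Python. This is not nctoolkit's implementation: merge_time and its dictionary input are hypothetical, and the sketch only assumes that the check compares the variables in each file before a temporal merge.

```python
from typing import Dict, List


def merge_time(files: Dict[str, List[str]], check: bool = True) -> List[str]:
    """Return file names in merge order, optionally verifying shared variables.

    `files` maps a file name to the variables it contains (hypothetical input).
    """
    names = sorted(files)
    if check and names:
        expected = set(files[names[0]])
        for name in names[1:]:
            # This per-file comparison is the cost that check=False avoids.
            if set(files[name]) != expected:
                raise ValueError(f"{name} does not have the same variables as {names[0]}")
    return names


ensemble = {"b.nc": ["sst"], "a.nc": ["sst"]}
merged = merge_time(ensemble)
```

Skipping the check saves one comparison per file, which matters for very large ensembles, but mismatched files are then no longer caught up front.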
Release of v0.4.7¶
Version 0.4.7 was released on June 5th 2022.
This release contained a new method called match_points that can do matchups with a spatiotemporal dataframe.
Release of v0.4.6¶
Version 0.4.6 was released on June 3rd 2022.
This release enhances existing methods.
The select method will be replaced by
subset. This behaves in the same way as
select, but will also allow users to subset data based on longitude and latitude using the
lat as args.
The export methods
to_dataframe now allow only a subset of the data to be exported. Additional arguments can be sent to the methods, which will then be sent to the
The new matchpoint methods for matching netCDF and point data have been smoothed out with additional options.
Minor bug fix: The weights in datasets with recycled regridding weights were not copied properly. This is now fixed.
Release of v0.4.5¶
Version 0.4.5 was released in late May 2022. This was a minor release that fixed an issue with
ds.variables when there were a) many variables and b) CDO version above 2.0.0.
Release of v0.4.4¶
Version 0.4.4 was released in late May 2022.
This version introduces a new class called Matchpoint which will allow automated matchups between netCDF files and point observations in pandas dataframes. This class is created using
nc.open_matchpoint. Matchups are generated by using the
ds now provides a more informative summary of dataset contents.
split method now automatically sorts the files, so that they are sorted by date when temporal splitting occurs.
tvariance have been removed after periods of deprecation. Use
Release of v0.4.3¶
Version 0.4.3 was released in May 2022. This is a release with some new methods, improvements to internals and some bug fixes. Code written for previous 0.4.x versions of nctoolkit will be compatible.
This version will be compatible with CDO versions 2.0.5x.
A new function
open_geotiff will allow GeoTiff files to be opened. This is a wrapper around rioxarray, which will convert the GeoTiff to NetCDF. It will require rioxarray to be installed.
A new method
surface_mask has been added to enable identifying top levels with data in cases when there are missing values in the actual top level.
A new method
is_corrupt has been added. This can identify whether NetCDF files are likely to be corrupt. Under the hood, methods will now suggest running
is_corrupt when system errors imply the files are corrupt.
to_dataframe no longer accepts the cdo_times argument, as this has essentially been redundant for a few nctoolkit versions.
plot method now lets users send kwargs to hvplot to make customizations, such as log scales. This will require the latest version of ncplot.
select method now lets users select days of the month, using ds.select(day = 1).
split method now allows splitting by timestep using
Release of v0.4.2¶
Version 0.4.2 was released in March 2022.
This is a minor release with a couple of method enhancements. Plots can now be saved to HTML files using the out argument. The
nco_command method now works over multiple cores when these are set using
Release of v0.4.1¶
Version 0.4.1 was released in March 2022. This is a minor release focusing on improving nctoolkit internals.
A new method, called
check is introduced that can be used to troubleshoot data problems and to ensure there are no obvious data issues (such as a lack of CF-compliance).
Users can now access dataset calendars using
drop method now lets you remove time steps using the
The dataset attribute variables_detailed is now removed after being replaced by contents in version 0.3.9.
This version will recommend CDO versions greater than 1.9.7, because ensuring nctoolkit compatibility with earlier versions was becoming difficult and likely of little need to users.
Some coding improvements have enhanced the performance of the
subtract etc. methods.
Bug fixes: The methods
multiply etc. failed when datasets did not have time as a dimension in version 0.4.0. This is now fixed. Previously, ds.contents always returned None for the number of time steps. Now fixed.
Release of v0.4.0¶
Version 0.4.0 was released in January 2022. This is a major release that features some breaking changes. Methods for adding, subtracting, multiplying and dividing datasets have been enhanced. Until now, these methods used a simplistic approach: values from matching time steps were added to each other, etc. So if you subtracted a 12 time step file from a dataset, only the first 12 time steps were subtracted from. However, often this is not what you want. For example, you might want to subtract yearly means from a file which contains monthly values for each year.
This version of nctoolkit updates these methods so that it can figure out what kind of addition etc. it should carry out. For example, if you have a dataset which has monthly values for each year from 1950 to 1999, and use
subtract to subtract the values from a file which contains annual means for each year from 1950, it will subtract the annual mean for 1950 from each month in 1950 and the annual mean for 1951 from each month in 1951, and so on.
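The year-matched subtraction described above can be illustrated with a small sketch. It is not how nctoolkit performs the operation internally. subtract_annual_means and its dictionary inputs are hypothetical and only show the matching rule: each monthly value is paired with the annual mean of its own year.

```python
from typing import Dict, Tuple


def subtract_annual_means(
    monthly: Dict[Tuple[int, int], float], annual: Dict[int, float]
) -> Dict[Tuple[int, int], float]:
    """Subtract each year's annual mean from every month of that year.

    `monthly` is keyed by (year, month); `annual` is keyed by year.
    """
    return {
        (year, month): value - annual[year]
        for (year, month), value in monthly.items()
    }


monthly = {(1950, 1): 10.0, (1950, 2): 12.0, (1951, 1): 11.0, (1951, 2): 15.0}
annual = {1950: 11.0, 1951: 13.0}
anomalies = subtract_annual_means(monthly, annual)
```

The older behaviour would instead have paired time steps by position, so only the first two monthly values would have been changed here.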
Users are now able to specify the numeric precision of datasets using
ds.set_precision. By default, nctoolkit uses the underlying netCDF file’s data type. This is normally not a problem. However, when the data type is integer, this can cause problems.
nc.open_data has been updated with this issue in mind. It will now warn users when the data type of the netCDF file is integer, and it suggests switching to float 'F64' or 'F32'.
drop method has been enhanced. It now accepts day, month and year as arguments to enable dropping specific time periods. For example
ds.drop(month = 2, day = 29) will remove leap days. Code written to use the old
drop method will now fail, as keywords are now required.
surface has now been renamed
top for consistency with
surface is deprecated and will be removed in a few months.
split method now allows users to split datasets into multiple files by variable.
ds.times now returns a datetime object, not a str as before.
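The keyword-based dropping of time periods described above can be sketched with the standard library. drop_times is a hypothetical stand-in for the method. The sketch assumes that, when several keywords are given, a time step is dropped only if it matches all of them, which is what the leap day example implies.

```python
from datetime import date
from typing import List, Optional


def drop_times(
    times: List[date],
    *,  # keyword-only, mirroring the note that keywords are now required
    day: Optional[int] = None,
    month: Optional[int] = None,
    year: Optional[int] = None,
) -> List[date]:
    """Remove time steps matching every keyword that was supplied."""
    if day is None and month is None and year is None:
        return list(times)  # nothing requested, nothing dropped

    def matches(t: date) -> bool:
        return (
            (day is None or t.day == day)
            and (month is None or t.month == month)
            and (year is None or t.year == year)
        )

    return [t for t in times if not matches(t)]


times = [date(2020, 2, 28), date(2020, 2, 29), date(2020, 3, 1)]
kept = drop_times(times, month=2, day=29)
```

Working with datetime objects rather than strings, as ds.times now does, makes this kind of comparison on day, month and year straightforward.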
Release of v0.3.9¶
Version 0.3.9 was released in November 2021. This is a minor release focusing on under-the-hood improvements and new methods.
A new method,
from_xarray is added for converting xarray datasets to nctoolkit datasets.
Methods for identifying how many missing values appear in datasets have been added: na_count and na_frac. These will identify the number or fraction of values that are missing in each grid cell. The methods operate in the same way as the temporal methods. So ds.na_frac("year") will give the fraction of values that are missing in each year.
Methods for better upscaling of datasets have been added:
box_max. This will allow you to upscale to, for example, each 10 by 10 grid box using the mean of that grid box. This is useful for upscaling things like population data where you want the upscaled grid boxes to represent the entirety of the grid box, not the centre.
merge have been made. When variables are not included in all files nctoolkit will now only merge those in each file in a multi-file dataset. Previously it threw an error.
Functions for finding the times and months in netCDF files are now available:
nc_years and nc_months.
variables_detailed has been changed to
contents. It will also now give the number of time steps available for each variable.
cdo_command now allows users to specify whether the CDO command used is an ensemble method. Previously, methods were applied on a file-by-file basis.
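The box-based upscaling described above can be pictured with a short sketch on a plain nested list. This is an illustration only: the standalone box_mean function here is hypothetical, operates on index boxes rather than on coordinates, and assumes the grid divides exactly into boxes.

```python
from typing import List


def box_mean(grid: List[List[float]], n_rows: int, n_cols: int) -> List[List[float]]:
    """Upscale a 2D grid by averaging non-overlapping n_rows x n_cols boxes."""
    if len(grid) % n_rows or len(grid[0]) % n_cols:
        raise ValueError("grid dimensions must be multiples of the box size")
    out = []
    for r in range(0, len(grid), n_rows):
        row = []
        for c in range(0, len(grid[0]), n_cols):
            # Every cell in the box contributes, not just the cell at its centre.
            cells = [
                grid[i][j]
                for i in range(r, r + n_rows)
                for j in range(c, c + n_cols)
            ]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


grid = [
    [1.0, 3.0, 5.0, 7.0],
    [1.0, 3.0, 5.0, 7.0],
]
coarse = box_mean(grid, 2, 2)
```

Replacing the mean with a sum or maximum over the same cells gives the other box statistics in the same way.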
Release of v0.3.8¶
Version 0.3.8 was released in October 2021. This is a minor release, focusing on under-the-hood improvements and introducing better handling of files with varying vertical layers.
vertical_integration for calculating vertically integrated totals for netCDF data of the likes of oceanic data, where the vertical levels vary spatially, were introduced.
vertical_mean has been improved and can now calculate vertical mean in cases where the cell thickness varies in space.
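For a single grid cell, the two vertical calculations above amount to a thickness-weighted sum and a thickness-weighted mean. The sketch below assumes exactly that and ignores missing values. The standalone functions are hypothetical and are not nctoolkit's implementation.

```python
from typing import Sequence


def vertical_integration(values: Sequence[float], thickness: Sequence[float]) -> float:
    """Sum value * cell thickness over the vertical levels of one grid cell."""
    return sum(v * t for v, t in zip(values, thickness))


def vertical_mean(values: Sequence[float], thickness: Sequence[float]) -> float:
    """Thickness-weighted mean over the vertical levels of one grid cell."""
    return vertical_integration(values, thickness) / sum(thickness)


values = [2.0, 4.0, 6.0]      # one value per vertical level
thickness = [1.0, 1.0, 2.0]   # cell thickness per level, which may differ between cells
total = vertical_integration(values, thickness)
mean = vertical_mean(values, thickness)
```

Because the thickness list is supplied per grid cell, the same calculation works when vertical levels vary spatially.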
merge_time is deprecated, and its functionality will be incorporated into
merge. So, following this release ensemble merging should use
open_url is now able to handle multiple urls. Previously it could only handle one.
Some under-the-hood improvements have been made to
assign to ensure that truth statements do not occasionally throw an error.
Release of v0.3.7¶
Version 0.3.7 was released in August 2021. This is a minor release.
New mathematical methods for simple operations on variables were added:
log10. These methods match numpy names.
assign previously did not work with
log10. Now fixed.
compare_all was deleted after a period of deprecation.
Release of v0.3.6¶
Version 0.3.6 was released in July 2021. This was a minor release.
ensemble_stdev were introduced for calculating variance and standard deviation across ensembles. The method
tvariance will be deprecated and is now renamed
tvar for naming consistency.
Release of v0.3.5¶
Version 0.3.5 was released in May 2021.
This is a minor release focusing on some under-the-hood improvements in performance and a couple of new methods.
It drops support for CDO version 1.9.3, as this is becoming too time-consuming to continue given the increasingly low reward.
A couple of new methods have been added.
distribute enables files to be split up spatially into equally sized m by n rectangles.
collect is the reverse of
distribute. It will collect distributed data into one file.
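The relationship between the two methods can be sketched on a nested list. The helper names below are hypothetical. The sketch also assumes that m and n count the rectangles along each axis and that the grid divides evenly, neither of which is stated above.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (row_start, row_end, col_start, col_end), ends exclusive


def distribute_boxes(n_rows: int, n_cols: int, m: int, n: int) -> List[Box]:
    """Split an n_rows x n_cols grid into m x n equally sized index boxes."""
    if n_rows % m or n_cols % n:
        raise ValueError("grid must divide evenly into m by n rectangles")
    height, width = n_rows // m, n_cols // n
    return [
        (i * height, (i + 1) * height, j * width, (j + 1) * width)
        for i in range(m)
        for j in range(n)
    ]


def cut(grid: List[List[int]], boxes: List[Box]) -> List[List[List[int]]]:
    """Extract one sub-grid per box."""
    return [[row[c0:c1] for row in grid[r0:r1]] for r0, r1, c0, c1 in boxes]


def collect_pieces(pieces, boxes: List[Box], n_rows: int, n_cols: int) -> List[List[int]]:
    """Reverse of cut: place each piece back at its box position."""
    out = [[0] * n_cols for _ in range(n_rows)]
    for piece, (r0, _r1, c0, c1) in zip(pieces, boxes):
        for i, row in enumerate(piece):
            out[r0 + i][c0:c1] = row
    return out


grid = [[1, 2, 3, 4], [5, 6, 7, 8]]
boxes = distribute_boxes(2, 4, 1, 2)
pieces = cut(grid, boxes)
restored = collect_pieces(pieces, boxes, 2, 4)
```

Collecting the pieces restores the original grid, which is the sense in which one operation is the reverse of the other.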
In prior releases
assign calls could not be split over multiple lines. This is now fixed.
There was a bug in previous releases where
regrid did not work with multi-file datasets. This was due to the enabling of parallel processing with nctoolkit. The issue is now fixed.
The deprecated methods
assign have now been removed. Variable creation should use
Release of v0.3.4¶
Version 0.3.4 was released in April 2021.
This was a minor release focusing on performance improvements, removal of deprecated methods and introduction of one new method.
A new method
fill_na has been introduced that allows missing values to be filled with the distance-weighted average.
cell_areas have been removed and are replaced permanently by
Release of v0.3.2¶
Version 0.3.2 was released in March 2021. This was a quick release to fix a bug causing
to_nc to not save output in the base directory.
Release of v0.3.1¶
Version 0.3.1 was released in March 2021. This is a minor release that includes new methods, under-the-hood improvements and the removal of deprecated methods.
New methods are introduced for identifying the time step at which specific numerical thresholds are first exceeded or fallen below, etc.:
last_below. The thresholds are either single numbers or, for grid-cell specific thresholds, can come from a gridded dataset.
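For one grid cell, the threshold methods above reduce to a scan along the time axis. The sketch below is a plain-Python illustration with hypothetical standalone functions. Returning a zero-based index, and None when the threshold is never crossed, are assumptions rather than documented behaviour.

```python
from typing import List, Optional, Sequence


def first_above(series: Sequence[float], threshold: float) -> Optional[int]:
    """Zero-based index of the first value above the threshold, or None."""
    for step, value in enumerate(series):
        if value > threshold:
            return step
    return None


def last_below(series: Sequence[float], threshold: float) -> Optional[int]:
    """Zero-based index of the last value below the threshold, or None."""
    for step in range(len(series) - 1, -1, -1):
        if series[step] < threshold:
            return step
    return None


temps = [1.0, 2.5, 4.0, 3.0, 5.0]
first = first_above(temps, 3.5)
last = last_below(temps, 3.5)

# Grid-cell specific thresholds: one time series and one threshold per cell.
cells: List[List[float]] = [[1.0, 4.0], [3.0, 3.5]]
thresholds = [3.0, 2.5]
per_cell = [first_above(series, t) for series, t in zip(cells, thresholds)]
```

A single number applies the same threshold everywhere, while a gridded threshold simply supplies a different value for each cell's scan.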
Methods to compare a dataset with another dataset or netCDF file have been added:
lt, which stand for ‘greater than’ and ‘less than’.
Users are now able to recycle the weights calculated when interpolating data. This can enable much faster interpolation of multiple files with the same grid.
The temporal methods replaced by
tmean etc. have now been removed from the package. So
monthly_mean etc. can no longer be used.
Release of v0.3.0¶
Version 0.3.0 was released in February 2021. This is a major release introducing major improvements to the package.
A new method
assign is now available for generating new variables. This replaces the
transmute, which were
place-holder functions in the early releases of nctoolkit until a proper method for creating variables was put in place.
assign operates in the same way as the
assign method in Pandas. Users can generate new variables using lambda functions.
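The lambda-based style of variable creation can be mimicked in a few lines of plain Python. This is only a sketch of the calling convention: the standalone assign function and the dictionary-of-lists dataset are hypothetical, and nctoolkit itself operates on netCDF files rather than on Python lists.

```python
from types import SimpleNamespace
from typing import Callable, Dict, List

Dataset = Dict[str, List[float]]


def assign(data: Dataset, **new_variables: Callable[[SimpleNamespace], float]) -> Dataset:
    """Return a copy of `data` with new variables computed cell by cell."""
    result = {name: list(values) for name, values in data.items()}
    n_cells = len(next(iter(result.values()))) if result else 0
    for name, func in new_variables.items():
        # Each lambda sees the existing variables as attributes, e.g. x.sst.
        result[name] = [
            func(SimpleNamespace(**{key: values[i] for key, values in result.items()}))
            for i in range(n_cells)
        ]
    return result


data = {"sst": [10.0, 12.5]}
out = assign(data, sst_k=lambda x: x.sst + 273.15)
```

As with the Pandas method, the keyword name becomes the new variable name and the original data are left unchanged.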
A major change in this release is that evaluation is now lazy by default. The previous default of non-lazy evaluation was designed to make life slightly easier for new users of the package, but it is probably overly annoying for users to have to set evaluation to lazy each time they use the package.
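Lazy evaluation means that operations are recorded and only carried out when a result is needed. The class below is a minimal sketch of that idea, not nctoolkit's implementation: LazyDataset and its methods are hypothetical and work on a plain list of numbers.

```python
from typing import Callable, List


class LazyDataset:
    """Record operations and only apply them when run() is called."""

    def __init__(self, values: List[float]) -> None:
        self.values = list(values)
        self._pending: List[Callable[[List[float]], List[float]]] = []

    def add(self, amount: float) -> "LazyDataset":
        self._pending.append(lambda vals: [v + amount for v in vals])
        return self

    def multiply(self, factor: float) -> "LazyDataset":
        self._pending.append(lambda vals: [v * factor for v in vals])
        return self

    def run(self) -> "LazyDataset":
        # Apply the queued operations in order, then clear the queue.
        for operation in self._pending:
            self.values = operation(self.values)
        self._pending.clear()
        return self


ds = LazyDataset([1.0, 2.0])
ds.add(1.0).multiply(10.0)   # nothing is computed yet
before = list(ds.values)
ds.run()                     # both operations are applied here
after = list(ds.values)
```

Deferring the work in this way lets a chain of operations be processed in a single pass rather than one step at a time.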
This release features a subtle shift in how datasets work, so that they have consistent list-like properties. Previously, the files in a dataset given by the current attribute could be either a str or a list, depending on whether there was one or more files in the dataset. This now always gives a list. As a result, datasets in nctoolkit have list-like properties, with
remove methods available for adding and removing files.
remove is a new method in this release. As before datasets are iterable.
This release also allows users to run nctoolkit in parallel. Previous releases allowed files in multi-file datasets to be processed in parallel. However, it was not possible to create processing chains and process files in parallel. This is now possible thanks to under-the-hood changes in nctoolkit’s code base.
Users are now able to add a configuration file, which means global settings do not need to be set in every session or in every script.