ecmwf_models.era5 package
Submodules
ecmwf_models.era5.download module
Module to download ERA5 from terminal in netcdf and grib format.
- class ecmwf_models.era5.download.CDSStatusTracker(logger=<RootLogger root (WARNING)>)[source]
Bases:
object
Track the status of the CDS download by using the CDS callback functions.
- statuscode_error = -1
- statuscode_ok = 0
- statuscode_terms_not_accepted = 11
- ecmwf_models.era5.download.download_and_move(target_path, startdate, enddate, product='era5', variables=None, keep_original=False, h_steps=(0, 6, 12, 18), grb=False, bbox=None, dry_run=False, grid=None, remap_method='bil', cds_kwds=None, stepsize='month', n_max_request=1000, keep_prelim=True, cds_token=None) → int[source]
Downloads the data from the ECMWF servers and moves it to the target path. This is done in 30-day increments between the start and end date.
The files are then extracted into separate grib files per parameter and stored in yearly folders under the target_path.
- Parameters:
target_path (str) – Path where the files are stored to
startdate (datetime) – first date to download
enddate (datetime) – last date to download
product (str, optional (default: era5)) – Either era5 or era5-land
variables (list, optional (default: None)) – Name of variables to download, see the documentation for all variable names. If None is chosen, then the ‘default’ variables are downloaded.
keep_original (bool (default: False)) – If True, keep the original downloaded data stack as received from CDS after slicing individual time stamps.
h_steps (tuple, optional (default: (0, 6, 12, 18))) – List of full hours to download data for at the selected dates, e.g. [0, 12] would download at 0:00 and 12:00. Only full hours are possible.
grb (bool, optional (default: False)) – Download data as grib files instead of netcdf. Note that downloading in grib format does not allow on-the-fly resampling (grid argument).
bbox (Tuple[int,int,int,int], optional (default: None)) – Bounding box of the area to download (min_lon, min_lat, max_lon, max_lat) - wgs84. None will download global images.
dry_run (bool) – Do not download anything, this is just used for testing the functions
grid (dict, optional (default: None)) –
A grid on which to remap the data using CDO. This must be a dictionary using CDO’s grid description format, e.g.:
    grid = {
        "gridtype": "lonlat",
        "xsize": 720,
        "ysize": 360,
        "xfirst": -179.75,
        "yfirst": 89.75,
        "xinc": 0.5,
        "yinc": -0.5,
    }
Default is to use no regridding. To use this option, it is necessary that CDO is installed.
remap_method (str, optional (default: 'bil')) – Method to be used for regridding. Available methods are:
- "bil": bilinear (default)
- "bic": bicubic
- "nn": nearest neighbour
- "dis": distance weighted
- "con": 1st order conservative remapping
- "con2": 2nd order conservative remapping
- "laf": largest area fraction remapping
cds_kwds (dict, optional (default: None)) – Additional keyword arguments to be passed to the CDS API request. This might be useful in the future, when new server-side options are added which are not yet directly supported by this package.
n_max_request (int, optional (default: 1000)) – Maximum size that a request can have to be processed by CDS. At the time of writing this is 1000 (N_timestamps * N_variables in a request), but as this is a server-side setting, it can change.
keep_prelim (bool, optional (default: True)) – Keep preliminary data from ERA5T under a different file name. These data are not yet final and might change if an issue is detected. If False is chosen, then the preliminary data will be discarded and not stored.
cds_token (str, optional (default: None)) –
To identify with the CDS. Required if no .cdsapirc file exists in the home directory (see documentation). You can find your token/key on your CDS user profile page. Alternatively, the CDSAPI_KEY environment variable can be set manually instead of passing the token here.
- Returns:
status_code – Status code summary from all requests:
    0: All downloaded data ok
    -1: Error in at least one request
    -10: No data available for requested time period
- Return type:
int
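The grid dictionary described for the grid parameter mirrors the key/value layout of a CDO grid description file. The following is a minimal sketch, assuming each dictionary entry corresponds to one `key = value` line. The helper name `to_cdo_griddes` is hypothetical and not part of this package, and how the package actually hands the grid to CDO is not documented here.

```python
def to_cdo_griddes(grid: dict) -> str:
    """Render a grid dictionary as CDO grid description text (hypothetical helper)."""
    return "\n".join(f"{key} = {value}" for key, value in grid.items()) + "\n"


# 0.5 degree global grid, taken from the example in the parameter description
grid = {
    "gridtype": "lonlat",
    "xsize": 720,
    "ysize": 360,
    "xfirst": -179.75,
    "yfirst": 89.75,
    "xinc": 0.5,
    "yinc": -0.5,
}

griddes = to_cdo_griddes(grid)
print(griddes)
```

Writing the text to a file would allow inspecting the target grid with CDO before starting a download that uses regridding.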
- ecmwf_models.era5.download.download_era5(c, years, months, days, h_steps, variables, target, grb=False, bbox=None, product='era5', dry_run=False, cds_kwds={})[source]
Download ERA5 reanalysis data on single levels for a defined time span.
- Parameters:
c (cdsapi.Client) – Client to pass the request to
years (list) – Years for which data is downloaded, e.g. [2017, 2018]
months (list) – Months for which data is downloaded, e.g. [4, 8, 12]
days (list) – Days for which data is downloaded (range(1, 32) = all days), e.g. [10, 20, 31]
h_steps (list) – List of full hours to download data for at the selected dates, e.g. [0, 12]
variables (list, optional (default: None)) – List of variables to pass to the client. If None is passed, the default variables will be downloaded.
target (str) – File name, where the data is stored.
grb (bool, optional (default: False)) – Download data in grib format instead of netcdf
bbox (Tuple[int,int,int,int], optional (default: None)) – Bounding box of the area to download (min_lon, min_lat, max_lon, max_lat) - wgs84. None will download global images.
product (str) – ERA5 data product to download, either era5 or era5-land
dry_run (bool, optional (default: False)) – Do not download anything, this is just used for testing the functionality
cds_kwds (dict, optional) – Additional arguments to be passed to the CDS API retrieve request.
- Returns:
success – True after the download has finished
- Return type:
bool
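The parameters above translate fairly directly into a CDS API request dictionary. The sketch below shows one plausible mapping. The helper `build_request` is hypothetical, the `data_format` key is an assumption (older CDS API versions used `format`), and the exact request this package sends may differ. The one detail worth noting is the axis order: CDS expects the area as [North, West, South, East], whereas bbox here is (min_lon, min_lat, max_lon, max_lat).

```python
from typing import Optional, Sequence, Tuple


def build_request(
    years: Sequence[int],
    months: Sequence[int],
    days: Sequence[int],
    h_steps: Sequence[int],
    variables: Sequence[str],
    grb: bool = False,
    bbox: Optional[Tuple[float, float, float, float]] = None,
) -> dict:
    """Assemble a CDS-style request dictionary (hypothetical helper)."""
    request = {
        "product_type": "reanalysis",
        "variable": list(variables),
        "year": [str(y) for y in years],
        "month": [f"{m:02d}" for m in months],
        "day": [f"{d:02d}" for d in days],
        "time": [f"{h:02d}:00" for h in h_steps],
        # Assumed key name; older CDS API versions used "format" instead
        "data_format": "grib" if grb else "netcdf",
    }
    if bbox is not None:
        min_lon, min_lat, max_lon, max_lat = bbox
        # Reorder to the CDS convention [North, West, South, East]
        request["area"] = [max_lat, min_lon, min_lat, max_lon]
    return request


request = build_request(
    years=[2017], months=[4], days=[10, 20], h_steps=[0, 12],
    variables=["2m_temperature"], bbox=(9.0, 46.0, 17.0, 49.0),
)
print(request)
```

With a configured cdsapi.Client `c`, a dictionary like this would typically be submitted as `c.retrieve("reanalysis-era5-single-levels", request, target)`.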
- ecmwf_models.era5.download.download_record_extension(path, dry_run=False, cds_token=None)[source]
Uses information from an existing record to download additional data from CDS.
- Parameters:
path (str) – Path where the image data to extend is stored. Must also contain a summary.yml file.
dry_run (bool, optional) – Do not download anything, this is just used for testing the functions
cds_token (str, optional (default: None)) – To identify with the CDS. Required if no .cdsapirc file exists in the home directory (see documentation). You can find your token/key on your CDS user profile page. Alternatively, the CDSAPI_KEY environment variable can be set manually instead of passing the token here.
- Returns:
status_code – Status code summary from all requests:
    0: All downloaded data ok
    -1: Error in at least one request
    -10: No data available for requested time period
- Return type:
int
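When the download functions are called from a script, the returned status code can be turned into a readable message. A small sketch using only the three codes listed in the Returns section; `describe_status` is a hypothetical helper and not part of this package.

```python
# Status codes as listed in the Returns section
STATUS_MESSAGES = {
    0: "All downloaded data ok",
    -1: "Error in at least one request",
    -10: "No data available for requested time period",
}


def describe_status(status_code: int) -> str:
    """Translate a status code into a readable message (hypothetical helper)."""
    return STATUS_MESSAGES.get(status_code, f"Unknown status code: {status_code}")


print(describe_status(-10))
```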
- ecmwf_models.era5.download.split_chunk(timestamps, n_vars, n_hsteps, max_req_size=1000, reduce=False, daily_request=False)[source]
Split the passed time stamps into chunks for a valid request. One chunk can at most hold data for one month or one day, but cannot be larger than the maximum request size.
- Parameters:
timestamps (pd.DatetimeIndex) – List of daily timestamps to split into chunks
n_vars (int) – Number of variables in each request.
max_req_size (int, optional (default: 1000)) – Maximum size of a request that the CDS API can handle
reduce (bool, optional (default: False)) – Return only the start and end of each subperiod instead of all time stamps.
daily_request (bool, optional (default: False)) – Only submit daily requests, otherwise monthly requests are allowed (if the max_req_size is not reached).
- Returns:
chunks – List of start and end dates that contain a chunk that the API can handle.
- Return type:
list
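The chunking rule can be illustrated with a short standalone sketch. It is a hypothetical re-implementation based only on the description above, not the package code: each chunk stays within one calendar month, and the number of days per chunk is capped so that days * n_hsteps * n_vars does not exceed the maximum request size. It returns only the first and last day of each chunk (comparable to reduce=True) and assumes the input days are contiguous.

```python
from datetime import date, timedelta
from itertools import groupby
from typing import List, Tuple


def split_into_chunks(
    days: List[date],
    n_vars: int,
    n_hsteps: int,
    max_req_size: int = 1000,
) -> List[Tuple[date, date]]:
    """Return (first_day, last_day) pairs, one per request (hypothetical helper)."""
    per_day = n_vars * n_hsteps  # request size contributed by a single day
    if per_day > max_req_size:
        raise ValueError("A single day already exceeds the maximum request size")
    max_days = max_req_size // per_day  # most days one request may contain

    chunks = []
    # Group the sorted days by calendar month, then cut each month into pieces
    for _, group in groupby(sorted(days), key=lambda d: (d.year, d.month)):
        month_days = list(group)
        for i in range(0, len(month_days), max_days):
            piece = month_days[i:i + max_days]
            chunks.append((piece[0], piece[-1]))
    return chunks


# All days of January and February 2020, 10 variables at 4 hours per day
start = date(2020, 1, 1)
days = [start + timedelta(days=i) for i in range(60)]
chunks = split_into_chunks(days, n_vars=10, n_hsteps=4)
for first, last in chunks:
    print(first, "to", last)
```

With 10 variables and 4 hours per day, one day contributes 40 items, so at most 25 days fit into a request of size 1000 and each month is split into two requests.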
ecmwf_models.era5.img module
This module contains ERA5/ERA5-Land specific child classes of the netcdf and grib base classes that are used for reading all ECMWF products.
- class ecmwf_models.era5.img.ERA5GrbDs(root_path: str, parameter: Collection[str] = None, h_steps: Collection[int] = (0, 6, 12, 18), product: str = 'era5', subgrid: CellGrid | None = None, mask_seapoints: bool | None = False, array_1D: bool | None = False)[source]
Bases:
ERAGrbDs
- class ecmwf_models.era5.img.ERA5GrbImg(filename: str, parameter: Collection[str] = None, subgrid: CellGrid | None = None, mask_seapoints: bool | None = False, array_1D=False)[source]
Bases:
ERAGrbImg
- class ecmwf_models.era5.img.ERA5NcDs(root_path: str, parameter: Collection[str] = None, product: str = 'era5', h_steps: Collection[int] = (0, 6, 12, 18), subgrid: CellGrid | None = None, mask_seapoints: bool | None = False, array_1D: bool | None = False)[source]
Bases:
ERANcDs
Reader for a stack of ERA5 netcdf image files.
- Parameters:
root_path (str) – Path to the image files to read.
parameter (list[str] or str, optional (default: None)) – Name of parameters to read from the image file. None means all parameters.
product (str, optional (default: 'era5')) – What era5 product, either era5 or era5-land.
h_steps (list, optional (default: (0,6,12,18))) – List of full hours to read images for.
subgrid (pygeogrids.CellGrid, optional (default: None)) – Read only data for points of this grid and not global values.
mask_seapoints (bool, optional (default: False)) – Read the land-sea mask to mask points over water and set them to nan. This option needs the ‘lsm’ parameter to be in the file!
array_1D (bool, optional (default: False)) – Read data as list, instead of 2D array, used for reshuffling.
ecmwf_models.era5.reshuffle module
Image to time series conversion tools.
- class ecmwf_models.era5.reshuffle.Reshuffler(input_root, outputpath, variables=None, h_steps=(0, 6, 12, 18), product=None, land_points=False)[source]
Bases:
object
- find_first_last_file_date() → Tuple[str, str][source]
Derive time stamp of the first and last available image
- get_img_reader(grid: CellGrid | None, h_steps: tuple) → ERA5GrbDs | ERA5NcDs[source]
Set up the Multi Image reader class
- load_grid(bbox: Tuple = None, cellsize=5.0) → CellGrid[source]
Generate ERA5 and ERA5-Land grid in the given bounding box. Ensures that GPIs are consistent with global grid.
- reshuffle(startdate=None, enddate=None, bbox=None, cellsize=5.0, imgbuffer=50)[source]
Reshuffle method applied to ERA images for conversion into netcdf time series format. Note: ERA5 and ERA5-Land files are preferred over their T-counterparts if multiple files are present for a time stamp.
- Parameters:
startdate (str) – Start date, from which images are read and time series are generated. Format YYYY-mm-dd
enddate (str, optional (default: None)) – End date, up to which images are read and time series are generated. Format YYYY-mm-dd. If None is passed, then the last available image date is used.
variables (tuple or str) – Variables to read from the passed images and convert into time series format.
product (str, optional (default: None)) – Either era5 or era5-land, if None is passed we guess the product from the downloaded image files.
bbox (tuple, optional (default: None)) – (min_lon, min_lat, max_lon, max_lat) - wgs84. To load only a subset of the global grid / file.
cellsize (float, optional (default: 5.0)) – Cell chunking of the time series in degrees.
imgbuffer (int, optional (default: 50)) – How many images to read at once before writing time series. This number affects how many images are stored in memory and should be chosen according to the available amount of memory and the size of a single image.
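To choose imgbuffer, a rough memory estimate can help. The sketch below multiplies the buffer size by the size of a single image. It assumes the global 0.25 degree ERA5 grid (721 x 1440 points) and 4 bytes per value, and it ignores any overhead from the reader itself; `buffer_size_mib` is a hypothetical helper, not part of this package.

```python
def buffer_size_mib(n_lat: int, n_lon: int, n_vars: int, imgbuffer: int,
                    bytes_per_value: int = 4) -> float:
    """Rough memory estimate in MiB for the buffered images (hypothetical helper)."""
    bytes_per_image = n_lat * n_lon * n_vars * bytes_per_value
    return imgbuffer * bytes_per_image / 1024 ** 2


# Global 0.25 degree grid, 2 variables, default imgbuffer of 50
estimate = buffer_size_mib(n_lat=721, n_lon=1440, n_vars=2, imgbuffer=50)
print(f"{estimate:.0f} MiB")
```

Passing a bbox reduces the number of grid points per image and therefore allows a larger buffer for the same amount of memory.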
- ecmwf_models.era5.reshuffle.extend_ts(ts_path, **img2ts_kwargs)[source]
Append any new data from the image path to the time series data. This function is only applied to time series files that were created using the img2ts function. It uses the start date from the previously written metadata and processes only parameters that are already present in the time series files.
- Parameters:
ts_path (str) – Directory containing time series files to extend. It is also expected that there is an overview.yml file in this directory.
img2ts_kwargs – All kwargs are optional, if they are not given, we use them from the previously created overview.yml file. If they are passed here, they will override the values from the yml file.
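The override rule for img2ts_kwargs amounts to a dictionary merge in which passed values win. A minimal sketch; `merge_settings` is a hypothetical helper, and the keys shown stand in for whatever is actually stored in overview.yml.

```python
def merge_settings(stored: dict, **img2ts_kwargs) -> dict:
    """Passed keyword arguments take precedence over stored values
    (hypothetical helper illustrating the override rule)."""
    return {**stored, **img2ts_kwargs}


# Stand-in for values read from overview.yml; the keys are illustrative only
stored = {"cellsize": 5.0, "imgbuffer": 50}
settings = merge_settings(stored, imgbuffer=100)
print(settings)
```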