Get Started#
Installation#
Install xarray-eop using pip:
$ pip install xarray-eop
A simple extended Path class#
To handle local filesystem paths as well as distributed cloud storage and fsspec-like chained URLs, xarray_eop implements a simple class, EOPath, derived from pathlib.Path:
In [1]: from xarray_eop.path import EOPath
In [2]: p = EOPath("s3://bucket/prod/file")
In [3]: print(p)
s3://bucket/prod/file
In [4]: print(p.path)
bucket/prod/file
In [5]: print(p.name)
file
In [6]: print(p.protocol)
s3
In [7]: p = EOPath("zip::s3//bucket/prod/file.zip")
In [8]: print(p)
zip::s3/bucket/prod/file.zip
In [9]: print(p.zip)
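To make the protocol and path attributes shown above more concrete, here is a minimal standard-library sketch of how an fsspec-like chained URL could be split into its protocol chain and its path. The helper split_chained_url is hypothetical and written for illustration only; it is not how EOPath is implemented, and it assumes chained URLs spelled in the form zip::s3://bucket/....

```python
from __future__ import annotations


def split_chained_url(url: str) -> tuple[list[str], str]:
    """Split an fsspec-like URL into (protocols, path).

    Hypothetical helper for illustration only; not part of xarray_eop.
    """
    # Chained URLs list the outer protocols first, separated by "::".
    *outer, last = url.split("::")
    if "://" in last:
        protocol, path = last.split("://", 1)
    else:
        # No scheme given: treat the string as a local filesystem path.
        protocol, path = "file", last
    return outer + [protocol], path


print(split_chained_url("s3://bucket/prod/file"))
print(split_chained_url("zip::s3://bucket/prod/file.zip"))
```

Keeping the protocol chain separate from the path is what lets the same path string be handed to different file systems, for example a zip archive stored in an S3 bucket.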
Cloud-storage credentials#
In xarray_eop, the functions that open a dataset or datatree can also access products in S3 object storage. The credentials can be passed explicitly to the function using the additional keyword argument backend_kwargs, whose value is a dictionary of the form {"s3": {"key": access-key, "secret": secret-key, "endpoint_url": endpoint_url}}. If no credentials are set explicitly, the function tries to retrieve them from environment variables.
Additionally, xarray_eop provides a few helper functions, among which:
- Get credentials given a URL path. For the S3 protocol, it looks first for the AWS-like (…)
- Get the S3 file system from a URL (get_s3filesystem, as used in the example that follows).
In [10]: from xarray_eop import EOPath, get_s3filesystem
In [11]: url = EOPath("s3://buc-acaw-dpr/Samples")
In [12]: s3fs = get_s3filesystem(url)
In [13]: s3fs.exists(url.path)
Out[13]: True
In [14]: s3fs.ls(url.path)
Out[14]:
['buc-acaw-dpr/Samples/CADUs',
'buc-acaw-dpr/Samples/SAFE',
'buc-acaw-dpr/Samples/Zarr']
Open Sentinel-3 SAFE product#
Open a specific dataset using xarray.dataset#
The sentinel-3 backend mainly organizes the product structure according to the new EOP zarr Product Structure specification. Simple datasets can be created from a specific NetCDF file or from a group corresponding to a valid EOP zarr path.
- xr.open_dataset(product, file_or_group, engine='sentinel-3', decoded_times=True, backend_kwargs={})#
- Parameters:
product (str or Path) – Path to the SEN3 product
file_or_group (str or Path) – Netcdf files from the product to be opened or equivalent EOP zarr group
backend_kwargs (Dict) – kwargs specific to the backend. In particular, it is used to pass the credentials for an S3 bucket, as backend_kwargs = {"storage_options": {"s3": {"key": …, "secret": …, "endpoint_url": …}}}
decoded_times (boolean)
- Returns:
xarray dataset
- Return type:
xarray.Dataset
For instance, the instrument_data.nc file from an OLCI L1 ERR product is opened as follows:
In [15]: import xarray as xr
In [16]: from xarray_eop.path import EOPath
In [17]: buc_name = EOPath("s3://buc-acaw-dpr")
In [18]: product = buc_name / "Samples/SAFE/S3B_OL_1_ERR____20230506T015316_20230506T015616_20230711T065804_0179_079_117______LR1_D_NR_003.SEN3"
In [19]: ds = xr.open_dataset(
....: product,
....: file_or_group="instrument_data.nc",
....: engine="sentinel-3",
....: )
....:
In [20]: print(ds)
<xarray.Dataset> Size: 11MB
Dimensions: (rows: 1023, columns: 1217, bands: 21,
detectors: 3700, bands1: 21, bands2: 21)
Dimensions without coordinates: rows, columns, bands, detectors, bands1, bands2
Data variables:
detector_index (rows, columns) float32 5MB ...
frame_offset (rows, columns) float32 5MB ...
lambda0 (bands, detectors) float32 311kB ...
FWHM (bands, detectors) float32 311kB ...
solar_flux (bands, detectors) float32 311kB ...
relative_spectral_covariance (bands1, bands2) float32 2kB ...
Attributes: (12/17)
netCDF_version: 4.2 of Jul 8 2020 09:28:41 $
product_name: S3B_OL_1_ERR____20230506T015316_20230506T015616_2...
title: OLCI Level 1b Product, Instrument Data Set
institution: LR1
source: IPF-OL-1-EO 06.17
history:
... ...
start_time: 2023-05-06T01:53:16.029027Z
stop_time: 2023-05-06T01:56:16.040795Z
processing_baseline: OL__L1_.003.03.00
comment:
ac_subsampling_factor: 16
al_subsampling_factor: 1
In [21]: %xmode minimal
Exception reporting mode: Minimal
Note
The credentials to the cloud storage can be passed to the function using the backend_kwargs argument as
{ "storage_options":
{ "s3":
{ "key": "", "secret": "", "endpoint_url": ""}
}
}
If storage_options is not passed, the function looks for the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_ENDPOINT_URL or S3_KEY, S3_SECRET, S3_URL.
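The fallback described in this note can be sketched with the standard library alone. The helper below, s3_backend_kwargs, is a hypothetical name introduced here for illustration and is not part of xarray_eop; the precedence (the AWS_* variables before the S3_* ones) and the error raised when nothing is found are assumptions, not documented behaviour.

```python
import os
from typing import Mapping, Optional

# Each triplet names the (key, secret, endpoint_url) environment variables.
CANDIDATES = (
    ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_ENDPOINT_URL"),
    ("S3_KEY", "S3_SECRET", "S3_URL"),
)


def s3_backend_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build a backend_kwargs dictionary from environment variables.

    Hypothetical helper; the AWS-before-S3 precedence is an assumption.
    """
    env = os.environ if env is None else env
    for names in CANDIDATES:
        # Use the first triplet for which all three variables are defined.
        if all(name in env for name in names):
            key, secret, endpoint_url = (env[name] for name in names)
            return {
                "storage_options": {
                    "s3": {"key": key, "secret": secret, "endpoint_url": endpoint_url}
                }
            }
    raise KeyError("no complete set of S3 credentials found in the environment")
```

The returned dictionary has the shape shown in the note, so it could be passed as backend_kwargs=s3_backend_kwargs() when calling xr.open_dataset with engine="sentinel-3". Passing a plain mapping as env keeps the helper easy to try without touching real credentials.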
Open the whole SAFE product#
The whole product can be represented by the experimental DataTree hierarchical structure, using the high-level function:
- open_datatree: Open a data tree (zarr sentinel product) from a bucket or the local file system, based on …
Now let’s open the whole OLCI L1 ERR product from an S3 bucket:
In [22]: import xarray as xr
In [23]: from xarray_eop import open_datatree
In [24]: from xarray_eop import EOPath
In [25]: buc_name = EOPath("s3://buc-acaw-dpr")
In [26]: product = buc_name / "Samples/SAFE/S3B_OL_1_ERR____20230506T015316_20230506T015616_20230711T065804_0179_079_117______LR1_D_NR_003.SEN3"
In [27]: dt = open_datatree(product)
In [28]: print(dt)
DataTree('None', parent=None)
├── DataTree('measurements')
│ Dimensions: (rows: 1023, columns: 1217)
│ Coordinates:
│ latitude (rows, columns) float64 10MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ longitude (rows, columns) float64 10MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ time_stamp (rows) datetime64[ns] 8kB dask.array<chunksize=(1023,), meta=np.ndarray>
│ Dimensions without coordinates: rows, columns
│ Data variables: (12/21)
│ oa01_radiance (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ oa02_radiance (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ oa03_radiance (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ oa04_radiance (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ oa05_radiance (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ oa06_radiance (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ ... ...
│ oa16_radiance (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ oa17_radiance (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ oa18_radiance (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ oa19_radiance (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ oa20_radiance (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ oa21_radiance (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
├── DataTree('conditions')
│ ├── DataTree('image')
│ │ Dimensions: (rows: 1023, columns: 1217)
│ │ Coordinates:
│ │ latitude (rows, columns) float64 10MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ │ longitude (rows, columns) float64 10MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ │ Dimensions without coordinates: rows, columns
│ │ Data variables:
│ │ frame_offset (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ │ detector_index (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ │ altitude (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│ ├── DataTree('geometry')
│ │ Dimensions: (tie_rows: 1023, tie_columns: 77)
│ │ Coordinates:
│ │ latitude (tie_rows, tie_columns) float64 630kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│ │ longitude (tie_rows, tie_columns) float64 630kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│ │ Dimensions without coordinates: tie_rows, tie_columns
│ │ Data variables:
│ │ oaa (tie_rows, tie_columns) float64 630kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│ │ oza (tie_rows, tie_columns) float64 630kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│ │ saa (tie_rows, tie_columns) float64 630kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│ │ sza (tie_rows, tie_columns) float64 630kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│ ├── DataTree('meteorology')
│ │ Dimensions: (tie_rows: 1023, tie_columns: 77,
│ │ tie_pressure_levels: 25, wind_vectors: 2)
│ │ Coordinates:
│ │ * tie_pressure_levels (tie_pressure_levels) float32 100B 1e+03...
│ │ Dimensions without coordinates: tie_rows, tie_columns, wind_vectors
│ │ Data variables:
│ │ atmospheric_temperature_profile (tie_rows, tie_columns, tie_pressure_levels) float32 8MB dask.array<chunksize=(1023, 77, 25), meta=np.ndarray>
│ │ horizontal_wind (tie_rows, tie_columns, wind_vectors) float32 630kB dask.array<chunksize=(1023, 77, 2), meta=np.ndarray>
│ │ humidity (tie_rows, tie_columns) float32 315kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│ │ sea_level_pressure (tie_rows, tie_columns) float32 315kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│ │ total_columnar_water_vapour (tie_rows, tie_columns) float32 315kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│ │ total_ozone (tie_rows, tie_columns) float32 315kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│ └── DataTree('instrument')
│ Dimensions: (bands: 21, detectors: 3700, bands1: 21,
│ bands2: 21)
│ Dimensions without coordinates: bands, detectors, bands1, bands2
│ Data variables:
│ fwhm (bands, detectors) float32 311kB dask.array<chunksize=(21, 3700), meta=np.ndarray>
│ lambda0 (bands, detectors) float32 311kB dask.array<chunksize=(21, 3700), meta=np.ndarray>
│ relative_spectral_covariance (bands1, bands2) float32 2kB dask.array<chunksize=(21, 21), meta=np.ndarray>
│ solar_flux (bands, detectors) float32 311kB dask.array<chunksize=(21, 3700), meta=np.ndarray>
└── DataTree('quality')
Dimensions: (rows: 1023, columns: 1217)
Coordinates:
latitude (rows, columns) float64 10MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
longitude (rows, columns) float64 10MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
Dimensions without coordinates: rows, columns
Data variables: (12/22)
quality_flags (rows, columns) uint32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
oa01_radiance_unc (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
oa02_radiance_unc (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
oa03_radiance_unc (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
oa04_radiance_unc (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
oa05_radiance_unc (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
... ...
oa16_radiance_unc (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
oa17_radiance_unc (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
oa18_radiance_unc (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
oa19_radiance_unc (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
oa20_radiance_unc (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
oa21_radiance_unc (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
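The printed tree nests groups inside groups. As a rough illustration of that layout, the sketch below mirrors the group names with plain dictionaries and lists the slash-separated path of every group. It uses only the standard library and does not call the DataTree API; GROUPS and iter_group_paths are illustrative names, and the variables inside each group are omitted.

```python
from typing import Iterator, Mapping

# Group layout copied from the printed tree above (variables omitted).
GROUPS = {
    "measurements": {},
    "conditions": {
        "image": {},
        "geometry": {},
        "meteorology": {},
        "instrument": {},
    },
    "quality": {},
}


def iter_group_paths(groups: Mapping[str, Mapping], prefix: str = "") -> Iterator[str]:
    """Yield the slash-separated path of every group, parents before children."""
    for name, children in groups.items():
        path = f"{prefix}/{name}" if prefix else name
        yield path
        # Recurse into the sub-groups, extending the path prefix.
        yield from iter_group_paths(children, path)


for path in iter_group_paths(GROUPS):
    print(path)
```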
Verification Tool#
xarray_eop comes with a verification tool that compares two input products (SAFE or Zarr). To run the CLI tool and display its options:
$ compare --help