Get Started#

Installation#

Install xarray-eop using pip:

$ pip install xarray-eop

A simple extended Path class#

To handle local filesystem paths as well as distributed cloud storage and fsspec-like chained URLs, xarray-eop implements a simple class, EOPath, which derives from pathlib.Path:

In [1]: from xarray_eop.path import EOPath

In [2]: p = EOPath("s3://bucket/prod/file")

In [3]: print(p)
s3://bucket/prod/file

In [4]: print(p.path)
bucket/prod/file

In [5]: print(p.name)
file

In [6]: print(p.protocol)
s3

In [7]: p = EOPath("zip::s3//bucket/prod/file.zip")

In [8]: print(p)
zip::s3/bucket/prod/file.zip

In [9]: print(p.zip)
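
To make the protocol and path attributes above more concrete, here is a minimal pure-Python sketch of how an fsspec-style chained URL can be split into its protocol chain and its path. The helper name split_chained_url is hypothetical and is not part of xarray-eop; EOPath may well do this differently internally.

```python
def split_chained_url(url: str) -> tuple[list[str], str]:
    """Split an fsspec-style chained URL into (protocols, path).

    Hypothetical helper for illustration only; not the EOPath implementation.
    """
    # Chained protocols are separated by "::", e.g. "zip::s3://bucket/file.zip".
    *outer, last = url.split("::")
    # The last element carries the storage protocol and the actual path.
    protocol, sep, path = last.partition("://")
    if not sep:
        # No "://" found: treat the remainder as a plain local path.
        return outer, last
    return outer + [protocol], path


protocols, path = split_chained_url("zip::s3://bucket/prod/file.zip")
print(protocols)  # ['zip', 's3']
print(path)       # bucket/prod/file.zip
```

For "s3://bucket/prod/file" the sketch returns the protocol list ['s3'] and the path "bucket/prod/file", which matches the p.protocol and p.path values printed above.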

Cloud-storage credentials#

In xarray_eop, the functions that open a dataset or datatree can also access products in S3 object storage. The credentials can be passed explicitly to the function through the additional keyword argument backend_kwargs, whose value is a dictionary of the form {"s3": {"key": access-key, "secret": secret-key, "endpoint_url": endpoint_url}}. If no credentials are set explicitly, the function tries to retrieve them from environment variables.

Additionally, xarray_eop provides a few helper functions, among which:
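
The environment-variable fallback can be pictured with the following sketch. It is not the library's code: the function name credentials_from_env is hypothetical, and both the lookup order (AWS-like variables before S3-like ones) and the optional endpoint are assumptions made for illustration. Only the variable names and the key, secret and endpoint_url entries come from this documentation.

```python
import os
from typing import Mapping, Optional


def credentials_from_env(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Return S3 credentials found in the environment, or None.

    Hypothetical helper; the real lookup in xarray_eop may differ.
    """
    env = os.environ if env is None else env
    # Variable triplets named in the documentation.
    # Assumption: the AWS-like names are tried before the S3-like ones.
    candidates = [
        ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_ENDPOINT_URL"),
        ("S3_KEY", "S3_SECRET", "S3_URL"),
    ]
    for key_var, secret_var, url_var in candidates:
        if key_var in env and secret_var in env:
            return {
                "key": env[key_var],
                "secret": env[secret_var],
                # Assumption: the endpoint is optional and may be missing.
                "endpoint_url": env.get(url_var),
            }
    return None


creds = credentials_from_env()
print("credentials found" if creds else "no credentials in the environment")
```

Passing a plain dictionary instead of os.environ makes it easy to try the lookup without touching the real environment.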

xarray_eop.get_credentials(url[, profile])

Get credentials for a given URL path. For the S3 protocol, it first looks for the AWS-like (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_ENDPOINT_URL) or S3-like (S3_KEY, S3_SECRET, S3_URL) environment variables, or uses the profile if one is given.

xarray_eop.get_s3filesystem(url[, profile, anon])

Get an S3 file system from a URL.

In [10]: from xarray_eop import EOPath, get_s3filesystem

In [11]: url = EOPath("s3://buc-acaw-dpr/Samples")

In [12]: s3fs = get_s3filesystem(url)

In [13]: s3fs.exists(url.path)
Out[13]: True

In [14]: s3fs.ls(url.path)
Out[14]: 
['buc-acaw-dpr/Samples/CADUs',
 'buc-acaw-dpr/Samples/SAFE',
 'buc-acaw-dpr/Samples/Zarr']

Open Sentinel-3 SAFE product#

Open a specific dataset using xarray.dataset#

The sentinel-3 backend mainly reorganizes the product structure so that it follows the new EOP Zarr product structure specification. Simple datasets can be created from a specific NetCDF file or from a group corresponding to a valid EOP Zarr path.

xr.open_dataset(product, file_or_group, engine='sentinel-3', decoded_times=True, backend_kwargs={})#
Parameters:
  • product (str or Path) – Path to the SEN3 product

  • file_or_group (str or Path) – NetCDF file from the product to be opened, or the equivalent EOP Zarr group

  • backend_kwargs (Dict) – kwargs specific to the backend. In particular, it is used to pass the credentials for an S3 bucket, as backend_kwargs = {"storage_options": {"s3": {"key": …, "secret": …, "endpoint_url": …}}}

  • decoded_times (boolean)

Returns:

xarray dataset

Return type:

xarray.Dataset

For instance, the instrument_data.nc file of an OLCI L1 ERR product can be opened as follows:

In [15]: import xarray as xr

In [16]: from xarray_eop.path import EOPath

In [17]: buc_name = EOPath("s3://buc-acaw-dpr")

In [18]: product = buc_name / "Samples/SAFE/S3B_OL_1_ERR____20230506T015316_20230506T015616_20230711T065804_0179_079_117______LR1_D_NR_003.SEN3"

In [19]: ds = xr.open_dataset(
   ....:     product,
   ....:     file_or_group="instrument_data.nc",
   ....:     engine="sentinel-3",
   ....: )
   ....: 

In [20]: print(ds)
<xarray.Dataset> Size: 11MB
Dimensions:                       (rows: 1023, columns: 1217, bands: 21,
                                   detectors: 3700, bands1: 21, bands2: 21)
Dimensions without coordinates: rows, columns, bands, detectors, bands1, bands2
Data variables:
    detector_index                (rows, columns) float32 5MB ...
    frame_offset                  (rows, columns) float32 5MB ...
    lambda0                       (bands, detectors) float32 311kB ...
    FWHM                          (bands, detectors) float32 311kB ...
    solar_flux                    (bands, detectors) float32 311kB ...
    relative_spectral_covariance  (bands1, bands2) float32 2kB ...
Attributes: (12/17)
    netCDF_version:         4.2 of Jul  8 2020 09:28:41 $
    product_name:           S3B_OL_1_ERR____20230506T015316_20230506T015616_2...
    title:                  OLCI Level 1b Product, Instrument Data Set
    institution:            LR1
    source:                 IPF-OL-1-EO 06.17
    history:                 
    ...                     ...
    start_time:             2023-05-06T01:53:16.029027Z
    stop_time:              2023-05-06T01:56:16.040795Z
    processing_baseline:    OL__L1_.003.03.00
    comment:                 
    ac_subsampling_factor:  16
    al_subsampling_factor:  1


Note

The credentials for the cloud storage can be passed to the function using the backend_kwargs argument, as:

{ "storage_options":
    { "s3":
        { "key": "", "secret": "", "endpoint_url": "" }
    }
}

If storage_options is not passed, the function looks for the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_ENDPOINT_URL or S3_KEY, S3_SECRET, S3_URL.
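
As an illustration of the note above, the sketch below builds the nested backend_kwargs dictionary and forwards it to xr.open_dataset with the sentinel-3 engine. The helper names build_backend_kwargs and open_instrument_data are hypothetical; the call itself only uses the arguments documented above (file_or_group, engine, backend_kwargs) and assumes that xarray and xarray-eop are installed and that valid credentials are supplied.

```python
def build_backend_kwargs(key: str, secret: str, endpoint_url: str) -> dict:
    """Nest S3 credentials under "storage_options" as described in the note.

    Hypothetical helper; it only assembles the dictionary.
    """
    return {
        "storage_options": {
            "s3": {"key": key, "secret": secret, "endpoint_url": endpoint_url}
        }
    }


def open_instrument_data(product, key: str, secret: str, endpoint_url: str):
    """Open instrument_data.nc from a SEN3 product with explicit credentials."""
    # Imported here so that the helper above can be used without xarray.
    import xarray as xr

    return xr.open_dataset(
        product,
        file_or_group="instrument_data.nc",
        engine="sentinel-3",
        backend_kwargs=build_backend_kwargs(key, secret, endpoint_url),
    )


# Placeholder values only; replace them with real credentials before use.
print(build_backend_kwargs("my-key", "my-secret", "https://s3.example.invalid"))
```

Keeping the dictionary construction separate from the open call makes it possible to reuse the same credentials for several products.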

Open the whole SAFE product#

The whole product can be represented by the experimental DataTree hierarchical structure, using the high-level function:

xarray_eop.api.open_datatree(product_urlpath, *)

Open a data tree (Zarr Sentinel product) from a bucket or the local file system, based on datatree.DataTree.

Now let’s open the whole OLCI L1 ERR product from the S3 bucket:

In [22]: import xarray as xr

In [23]: from xarray_eop import open_datatree

In [24]: from xarray_eop import EOPath

In [25]: buc_name = EOPath("s3://buc-acaw-dpr")

In [26]: product = buc_name / "Samples/SAFE/S3B_OL_1_ERR____20230506T015316_20230506T015616_20230711T065804_0179_079_117______LR1_D_NR_003.SEN3"

In [27]: dt = open_datatree(product)

In [28]: print(dt)
DataTree('None', parent=None)
├── DataTree('measurements')
│       Dimensions:        (rows: 1023, columns: 1217)
│       Coordinates:
│           latitude       (rows, columns) float64 10MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│           longitude      (rows, columns) float64 10MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│           time_stamp     (rows) datetime64[ns] 8kB dask.array<chunksize=(1023,), meta=np.ndarray>
│       Dimensions without coordinates: rows, columns
│       Data variables: (12/21)
│           oa01_radiance  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│           oa02_radiance  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│           oa03_radiance  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│           oa04_radiance  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│           oa05_radiance  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│           oa06_radiance  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│           ...             ...
│           oa16_radiance  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│           oa17_radiance  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│           oa18_radiance  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│           oa19_radiance  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│           oa20_radiance  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│           oa21_radiance  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
├── DataTree('conditions')
│   ├── DataTree('image')
│   │       Dimensions:         (rows: 1023, columns: 1217)
│   │       Coordinates:
│   │           latitude        (rows, columns) float64 10MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│   │           longitude       (rows, columns) float64 10MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│   │       Dimensions without coordinates: rows, columns
│   │       Data variables:
│   │           frame_offset    (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│   │           detector_index  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│   │           altitude        (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
│   ├── DataTree('geometry')
│   │       Dimensions:    (tie_rows: 1023, tie_columns: 77)
│   │       Coordinates:
│   │           latitude   (tie_rows, tie_columns) float64 630kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│   │           longitude  (tie_rows, tie_columns) float64 630kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│   │       Dimensions without coordinates: tie_rows, tie_columns
│   │       Data variables:
│   │           oaa        (tie_rows, tie_columns) float64 630kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│   │           oza        (tie_rows, tie_columns) float64 630kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│   │           saa        (tie_rows, tie_columns) float64 630kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│   │           sza        (tie_rows, tie_columns) float64 630kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│   ├── DataTree('meteorology')
│   │       Dimensions:                          (tie_rows: 1023, tie_columns: 77,
│   │                                             tie_pressure_levels: 25, wind_vectors: 2)
│   │       Coordinates:
│   │         * tie_pressure_levels              (tie_pressure_levels) float32 100B 1e+03...
│   │       Dimensions without coordinates: tie_rows, tie_columns, wind_vectors
│   │       Data variables:
│   │           atmospheric_temperature_profile  (tie_rows, tie_columns, tie_pressure_levels) float32 8MB dask.array<chunksize=(1023, 77, 25), meta=np.ndarray>
│   │           horizontal_wind                  (tie_rows, tie_columns, wind_vectors) float32 630kB dask.array<chunksize=(1023, 77, 2), meta=np.ndarray>
│   │           humidity                         (tie_rows, tie_columns) float32 315kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│   │           sea_level_pressure               (tie_rows, tie_columns) float32 315kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│   │           total_columnar_water_vapour      (tie_rows, tie_columns) float32 315kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│   │           total_ozone                      (tie_rows, tie_columns) float32 315kB dask.array<chunksize=(1023, 77), meta=np.ndarray>
│   └── DataTree('instrument')
│           Dimensions:                       (bands: 21, detectors: 3700, bands1: 21,
│                                              bands2: 21)
│           Dimensions without coordinates: bands, detectors, bands1, bands2
│           Data variables:
│               fwhm                          (bands, detectors) float32 311kB dask.array<chunksize=(21, 3700), meta=np.ndarray>
│               lambda0                       (bands, detectors) float32 311kB dask.array<chunksize=(21, 3700), meta=np.ndarray>
│               relative_spectral_covariance  (bands1, bands2) float32 2kB dask.array<chunksize=(21, 21), meta=np.ndarray>
│               solar_flux                    (bands, detectors) float32 311kB dask.array<chunksize=(21, 3700), meta=np.ndarray>
└── DataTree('quality')
        Dimensions:            (rows: 1023, columns: 1217)
        Coordinates:
            latitude           (rows, columns) float64 10MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
            longitude          (rows, columns) float64 10MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
        Dimensions without coordinates: rows, columns
        Data variables: (12/22)
            quality_flags      (rows, columns) uint32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
            oa01_radiance_unc  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
            oa02_radiance_unc  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
            oa03_radiance_unc  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
            oa04_radiance_unc  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
            oa05_radiance_unc  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
            ...                 ...
            oa16_radiance_unc  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
            oa17_radiance_unc  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
            oa18_radiance_unc  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
            oa19_radiance_unc  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
            oa20_radiance_unc  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>
            oa21_radiance_unc  (rows, columns) float32 5MB dask.array<chunksize=(1023, 1217), meta=np.ndarray>

Verification Tool#

xarray_eop comes with a verification tool that compares two input products (SAFE or Zarr). To run the CLI tool and display its options:

$ compare --help