# Python Client

Note

You need to become part of the openEO Platform "early adopter" program to access the processing infrastructure.

This Getting Started guide gives a simple overview of the capabilities of the openEO Python client library. More in-depth information can be found in its official documentation.

# Installation

The openEO Python client library is available on PyPI and can easily be installed with a tool like pip, for example:

```shell
pip install openeo
```

It's recommended to work in a virtual environment of some kind (venv, conda, ...), containing Python 3.6 or higher.

TIP

For more details, alternative installation procedures or troubleshooting tips: see the official openeo package installation documentation.

# Connect to openEO Platform and explore

First we need to establish a connection to the openEO Platform back-end, which is available at connection URL https://openeo.cloud, or just in short:

```python
import openeo

connection = openeo.connect("openeo.cloud")
```

The Connection object is your central gateway to

  • list data collections, available processes, file formats and other capabilities of the back-end
  • start building your openEO algorithm from the desired data on the back-end
  • execute and monitor (batch) jobs on the back-end
  • etc.

# Collections

The EO data available at a back-end is organised in so-called collections. For example, a back-end might provide fundamental satellite collections like "Sentinel 1" or "Sentinel 2", or preprocessed collections like "NDVI". Collections are used as input data for your openEO jobs.

Note

More information on how openEO "collections" relate to terminology used in other systems can be found in the openEO glossary.

Let's list all available collections on the back-end, using list_collections:

```python
print(connection.list_collections())
```

which returns a list of collection metadata dictionaries, e.g. something like:

```python
[{'id': 'AGERA5', 'title': 'ECMWF AGERA5 meteo dataset', 'description': 'Daily surface meteorological data ...', ...},
 {'id': 'SENTINEL2_L2A_SENTINELHUB', 'title': 'Sentinel-2 top of canopy', ...},
 {'id': 'SENTINEL1_GRD', ...},
 ...]
```

This listing includes basic metadata for each collection. If necessary, a more detailed metadata listing for a given collection can be obtained with describe_collection.
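To give a feel for working with this metadata, here's a plain-Python sketch that filters a listing by collection id. The sample dictionaries below are modeled on the listing shown above (with abbreviated, partly assumed titles), not fetched from a live back-end:

```python
# Sample collection metadata, modeled on the listing above
# (a live listing comes from connection.list_collections()).
collections = [
    {"id": "AGERA5", "title": "ECMWF AGERA5 meteo dataset"},
    {"id": "SENTINEL2_L2A_SENTINELHUB", "title": "Sentinel-2 top of canopy"},
    {"id": "SENTINEL1_GRD", "title": "Sentinel-1 GRD"},
]

# Keep only the Sentinel collections by inspecting each "id" field.
sentinel_ids = [c["id"] for c in collections if c["id"].startswith("SENTINEL")]
print(sentinel_ids)  # ['SENTINEL2_L2A_SENTINELHUB', 'SENTINEL1_GRD']
```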

TIP

Programmatically listing collections is just a very simple usage example of the Python client. In practice, you will probably want to look up or inspect the available collections on a handy web page. Check out the openEO Platform collections overview, or the openEO Hub for collection listings of other back-ends.

# Processes

Processes in openEO are operations that can be applied to (EO) data (e.g. calculate the mean of an array, or mask out observations outside a given polygon). The output of one process can be used as the input of another, and by connecting multiple processes this way you build a larger "process graph": a new (user-defined) process that implements a certain algorithm.

Note

Check the openEO glossary for more details on pre-defined processes, user-defined processes and process graphs.

Let's list the (pre-defined) processes available on the back-end with list_processes:

```python
print(connection.list_processes())
```

which returns a list of dictionaries describing each process (including expected arguments and return type), e.g.:

```python
[{'id': 'absolute', 'summary': 'Absolute value', 'description': 'Computes the absolute value of a real number `x`, which is th...', ...},
 {'id': 'mean', 'summary': 'Arithmetic mean (average)', ...},
 ...]
```

Like with collections, instead of programmatic exploration you'll probably prefer a web-based overview of the available processes on openEO Platform. You can also use the openEO Hub for back-end specific process descriptions or browse the reference specifications of openEO processes.
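If you do explore programmatically, a common pattern is to index the process listing by process id. A plain-Python sketch, using sample dictionaries modeled on the listing above rather than a live back-end:

```python
# Sample process metadata, modeled on the listing above
# (a live listing comes from connection.list_processes()).
processes = [
    {"id": "absolute", "summary": "Absolute value"},
    {"id": "mean", "summary": "Arithmetic mean (average)"},
]

# Build an index by process id for quick lookup.
by_id = {p["id"]: p for p in processes}
print(by_id["mean"]["summary"])  # Arithmetic mean (average)
```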

# Authentication

In the code snippets above we did not need to log in since we just queried publicly available back-end information. However, to run non-trivial processing queries one has to authenticate so that permissions, resource usage, etc. can be managed properly.

To handle authentication, openEO leverages OpenID Connect (OIDC). It offers some interesting features (e.g. a user can securely reuse an existing account), but it is a fairly complex topic, discussed in more depth in the general authentication documentation for openEO Platform.

The openEO Python client library tries to make authentication as streamlined as possible. For example, the following snippet illustrates how you can authenticate from a Python script or notebook:

```python
connection.authenticate_oidc()
```

This statement will reuse a previously authenticated session, when available, and otherwise print a web link and a short user code because user interaction is required. After successfully visiting the web link in a browser, authenticating there and entering the user code, the connection in the script will also be authenticated. This means that subsequent requests from the connection object will be properly identified by the back-end as coming from your user.

In a compact script, connecting and authenticating can be chained:

```python
import openeo
connection = openeo.connect("openeo.cloud").authenticate_oidc()
```

More detailed information on authentication can be found in the official openEO Python client documentation.

# Working with Datacubes

Now that we know how to discover the capabilities of the back-end and how to authenticate, let's do some real work and process some EO data in a batch job. We'll build the desired algorithm by working on so-called "datacubes", which are the central concept in openEO for representing EO data, discussed in great detail in the openEO documentation.

# Creating a Datacube

The first step is loading the desired slice of a data collection with Connection.load_collection:

```python
datacube = connection.load_collection(
    "SENTINEL1_GRD",
    spatial_extent={"west": 16.06, "south": 48.06, "east": 16.65, "north": 48.35},
    temporal_extent=["2017-03-01", "2017-04-01"],
    bands=["VV", "VH"],
)
```

This results in a Datacube object containing the "SENTINEL1_GRD" data restricted to the given spatial extent, temporal extent and bands.
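Note that no pixels are fetched at this point: the client just records a process-graph node describing the data to load. As a rough, hand-written sketch (field names follow the openEO process specification; the exact structure the client produces may differ), such a node looks like:

```python
# Simplified sketch of the JSON process-graph node that
# load_collection corresponds to behind the scenes.
load_node = {
    "process_id": "load_collection",
    "arguments": {
        "id": "SENTINEL1_GRD",
        "spatial_extent": {"west": 16.06, "south": 48.06, "east": 16.65, "north": 48.35},
        "temporal_extent": ["2017-03-01", "2017-04-01"],
        "bands": ["VV", "VH"],
    },
}
print(load_node["process_id"])  # load_collection
```

In the actual client you can inspect the generated graph, e.g. via the datacube's to_json() method.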

TIP

You can also filter the datacube step by step or at a later stage by using the following filter methods:

```python
datacube = datacube.filter_bbox(west=16.06, south=48.06, east=16.65, north=48.35)
datacube = datacube.filter_temporal(start_date="2017-03-01", end_date="2017-04-01")
datacube = datacube.filter_bands(["VV", "VH"])
```

Still, it is recommended to always use the filters directly in load_collection to avoid loading too much data upfront.

# Applying processes

By applying an openEO process to a datacube, we create a new datacube object that represents the manipulated data. The standard way to do this with the Python client is to call the appropriate Datacube method. The most common openEO processes have a dedicated Datacube method (e.g. mask, aggregate_spatial, filter_bbox, ...). Other processes without a dedicated method can still be applied in a generic way. And on top of that, there are also some convenience methods that implement openEO processes in a compact, Pythonic interface.

For example, the min_time method implements a reduce_dimension process along the temporal dimension, using the min process as reducer function:

```python
datacube = datacube.min_time()
```

This creates a new datacube (we overwrite the existing variable), where the time dimension is eliminated and for each pixel we just have the minimum value of the corresponding timeseries in the original datacube.
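Conceptually, this reduction takes, for each pixel, the minimum over its timeseries. A toy plain-Python illustration on a tiny nested-list "cube" (not an actual openEO data structure):

```python
# Toy datacube with dimensions (time, y, x): three timestamps of 2x2 pixels.
timeseries_cube = [
    [[3, 7], [5, 1]],  # t0
    [[2, 9], [6, 0]],  # t1
    [[4, 8], [5, 2]],  # t2
]

# Reduce the time dimension: per-pixel minimum over all timestamps.
n_t = len(timeseries_cube)
min_composite = [
    [min(timeseries_cube[t][y][x] for t in range(n_t)) for x in range(2)]
    for y in range(2)
]
print(min_composite)  # [[2, 7], [5, 0]]
```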

See the Python client Datacube API for a more complete listing of methods that implement openEO processes.

openEO processes that are not supported by a dedicated Datacube method can be applied in a generic way with the process method, e.g.:

```python
datacube = datacube.process(
    process_id="ndvi",
    arguments={
        "data": datacube,
        "nir": "B8",
        "red": "B4",
    },
)
```

This applies the ndvi process to the datacube, with the "nir" and "red" arguments pointing to the appropriate bands (this example assumes a datacube with bands B8 and B4).
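The underlying arithmetic is the classic normalized difference, NDVI = (NIR - RED) / (NIR + RED), computed per pixel. A tiny plain-Python illustration with made-up reflectance values:

```python
# Made-up reflectance values for one pixel.
nir, red = 0.7, 0.1

# The normalized difference the "ndvi" process computes per pixel.
ndvi = (nir - red) / (nir + red)
print(round(ndvi, 2))  # 0.75
```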

Note

Still unsure how to make use of processes with the Python client? Visit the official documentation on working with processes.

# Defining output format

After applying all the processes you want to execute, we need to tell the back-end to export the datacube, for example as GeoTIFF:

```python
result = datacube.save_result("GTiff")
```

# Execution

It's important to note that all the datacube processes we applied up to this point are not actually executed yet, neither locally nor remotely on the back-end. We just built an abstract representation of the algorithm (input data and processing chain), encapsulated in a local Datacube object (e.g. the result variable above). To trigger an actual execution (on the back-end) we have to explicitly send this representation to the back-end.

openEO defines several processing modes, but for this introduction we'll focus on batch jobs, which are a good default choice.

# Batch job execution

The result datacube object we built above describes the desired input collections, processing steps and output format. We can now just send this description to the back-end to create a batch job with the send_job method like this:

```python
# Create a new job at the back-end by sending the datacube information.
job = result.send_job()
```

TIP

It can be annoying to manage and monitor batch jobs through code. If you prefer an easier interface for your batch jobs (or other resources), you can also use the openEO Platform Editor. After logging in, you'll be able to manage and monitor your batch jobs in a near-realtime interactive environment. Look for the "Data Processing" tab.

The batch job, which is referenced by the returned job object, is only created at the back-end; it is not started yet. To start the job and let your Python script wait until it has finished, use the start_and_wait method; the results can then be downloaded:

```python
# Start the job and wait until it finishes, then download the results.
job.start_and_wait()
job.get_results().download_files("output")
```

When everything completes successfully, the processing result will be downloaded as a GeoTIFF file into a folder named "output".
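Conceptually, start_and_wait boils down to polling the job status until it reaches a final state. A simplified, self-contained sketch of that idea (the iterator below is a stand-in for real back-end status responses, not the client's API):

```python
import time

# Stand-in for successive status responses from the back-end
# (the real client queries the job status endpoint instead).
responses = iter(["queued", "running", "finished"])

# Poll until the job reaches a final state.
status = next(responses)
while status not in ("finished", "error"):
    time.sleep(0.01)  # the real client waits considerably longer between polls
    status = next(responses)
print(status)  # finished
```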

TIP

The official openEO Python client documentation has more information on batch job basics and more detailed batch job (result) management.

# Additional Information

Additional information and resources about the openEO Python client library can be found in its official documentation.

Last Updated: 11/24/2021, 4:49:05 PM