# Get started with the openEO Python Client

Important

You need to get an openEO Platform account to access the processing infrastructure.

This Getting Started guide gives you an overview of the capabilities of the openEO Python client library. More in-depth information can be found on the official documentation website.

The high-level interface of the Python client is designed to provide an opinionated, Pythonic API to interact with openEO Platform.

# Installation

The openEO Python client library can easily be installed with a tool like pip, for example:

pip install openeo

The client library is also available on Conda Forge.

It's recommended to work in a virtual environment of some kind (venv, conda, ...), containing Python 3.6 or higher.

TIP

For more details, alternative installation procedures or troubleshooting tips, see the official openeo package installation documentation.

# Connect to openEO Platform and explore

First, establish a connection to the openEO Platform back-end:

import openeo

connection = openeo.connect("openeo.cloud")

The Connection object is your central gateway to

  • list data collections, available processes, file formats and other capabilities of the back-end
  • start building your openEO algorithm from the desired data on the back-end
  • execute and monitor (batch) jobs on the back-end
  • etc.

# Collections

The Earth observation data (the input of your openEO jobs) is organised in so-called collections, e.g. fundamental satellite collections like "Sentinel 1" or "Sentinel 2", or preprocessed collections like "NDVI".

Note

More information on how openEO "collections" relate to terminology used in other systems can be found in the openEO glossary.

While it's recommended to browse the available EO collections on the openEO Platform collections overview webpage, it's possible to list and inspect them programmatically. As a very simple usage example of the openEO Python client, let's use the list_collection_ids and describe_collection methods on the connection object we just created:

>>> # Get all collection ids
>>> print(connection.list_collection_ids())
['AGERA5', 'SENTINEL1_GRD', 'SENTINEL2_L2A', ...

>>> # Get metadata of a single collection
>>> print(connection.describe_collection("SENTINEL2_L2A"))
{'id': 'SENTINEL2_L2A', 'title': 'Sentinel-2 top of canopy ...', 'stac_version': '0.9.0', ...
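Since list_collection_ids returns a plain list of strings, ordinary Python is enough to narrow it down. The sketch below is a hypothetical helper (find_collections is not part of the openeo package) that does a case-insensitive substring match. It only assumes that the connection object offers list_collection_ids() as shown above.

```python
def find_collections(connection, keyword):
    """Return the sorted collection ids that contain `keyword` (case-insensitive).

    Hypothetical helper, not part of the openeo package. `connection` is
    expected to offer `list_collection_ids()` as shown above.
    """
    needle = keyword.lower()
    return sorted(
        collection_id
        for collection_id in connection.list_collection_ids()
        if needle in collection_id.lower()
    )
```

With the connection created earlier, find_collections(connection, "sentinel2") would return whichever matching ids the back-end you are connected to happens to offer.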

TIP

The openEO Python client library comes with Jupyter (notebook) integration in a couple of places. For example, put connection.describe_collection("SENTINEL2_L2A") (without print()) as the last statement in a notebook cell and you'll get a nice graphical rendering of the collection metadata.

TIP

Find out more about data discovery, loading and filtering in the official openEO Python client documentation.

# Processes

Processes in openEO are operations that can be applied to (EO) data (e.g. calculate the mean of an array, or mask out observations outside a given polygon). The output of one process can be used as the input of another, so multiple processes can be connected into a larger "process graph" that implements a certain algorithm.

Note

Check the openEO glossary for more details on pre-defined processes, user-defined processes and process graphs.

Let's list the available pre-defined processes with list_processes:

>>> print(connection.list_processes())
[{'id': 'absolute', 'summary': 'Absolute value', 'description': 'Computes the absolute value of ... 
 {'id': 'mean', 'summary': 'Arithmetic mean(average)', ...
 ...
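The list printed above is long, so a small index can make it easier to search. This is a minimal sketch with two hypothetical helpers (neither is part of the openeo package). It only assumes what the output above shows: a list of dicts with an 'id' and, usually, a 'summary'.

```python
def summarize_processes(processes):
    """Map each process id to its summary (empty string when there is none).

    `processes` is the list of dicts returned by `connection.list_processes()`.
    Hypothetical helper, not part of the openeo package.
    """
    return {process["id"]: process.get("summary") or "" for process in processes}


def search_processes(processes, keyword):
    """Return the sorted ids whose id or summary contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return sorted(
        process_id
        for process_id, summary in summarize_processes(processes).items()
        if needle in process_id.lower() or needle in summary.lower()
    )
```

For example, search_processes(connection.list_processes(), "mean") would list every process whose id or summary mentions "mean".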

As with collections, instead of programmatic exploration you'll probably prefer a more graphical, interactive interface. Use the Jupyter notebook integration (put connection.list_processes() without print() as the last statement in a notebook cell) or visit a web-based overview of the available processes on openEO Platform.

TIP

Find out more about process discovery and usage in the official openEO Python client documentation.

# Authentication

In the code snippets above we did not need to log in as a user since we just queried publicly available back-end information. However, to run non-trivial processing queries one has to authenticate so that permissions, resource usage, etc. can be managed properly.

To handle authentication, openEO leverages OpenID Connect (OIDC). It offers some interesting features (e.g., a user can securely reuse an existing account), but is a fairly complex topic, discussed in more depth on the Free Trial page.

The openEO Python client library tries to make authentication as streamlined as possible. In most cases, for example, the following snippet is enough to obtain an authenticated connection:

import openeo

connection = openeo.connect("openeo.cloud").authenticate_oidc()

This statement will automatically reuse a previously authenticated session, when available. Otherwise, e.g. the first time you do this, some user interaction is required and it will print a web link and a short user code. Visit this web page in a browser, log in there with an existing account and enter the user code. If everything goes well, the connection object in the script will be authenticated and the back-end will be able to identify you in subsequent requests.

More detailed information on authentication can be found in the openEO Python client documentation.

# Working with Datacubes

Now that we know how to discover the capabilities of the back-end and how to authenticate, let's do some real work and process some EO data in a batch job. We'll first build the desired algorithm by working on so-called "datacubes", the central concept in openEO for representing EO data, as discussed in great detail here.

# Creating a Datacube

Note

Please note that the following code only creates a process graph representation of your EO analysis and does not take major processing power from your device.

The first step is loading the desired slice of a data collection with Connection.load_collection:

datacube = connection.load_collection(
  "SENTINEL1_GRD",
  spatial_extent={"west": 16.06, "south": 48.06, "east": 16.65, "north": 48.35},
  temporal_extent=["2017-03-01", "2017-04-01"],
  bands=["VV", "VH"]
)

This results in a DataCube object containing the "SENTINEL1_GRD" data restricted to the given spatial extent, temporal extent and bands.
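A typo in the extents (swapped west/east, or an end date before the start date) is easy to miss until the job runs. The sketch below shows one way to build the two arguments with a basic sanity check first. Both helpers are hypothetical (not part of the openeo package) and assume longitude/latitude degrees, a bounding box that does not cross the antimeridian, and dates written as YYYY-MM-DD.

```python
from datetime import datetime


def make_spatial_extent(west, south, east, north):
    """Build a spatial_extent dict after checking that the box is not inverted.

    Hypothetical helper; assumes a box that does not cross the antimeridian.
    """
    if west >= east:
        raise ValueError(f"west ({west}) must be smaller than east ({east})")
    if south >= north:
        raise ValueError(f"south ({south}) must be smaller than north ({north})")
    return {"west": west, "south": south, "east": east, "north": north}


def make_temporal_extent(start, end):
    """Build a [start, end] list of YYYY-MM-DD strings after checking their order."""
    start_day = datetime.strptime(start, "%Y-%m-%d").date()
    end_day = datetime.strptime(end, "%Y-%m-%d").date()
    if start_day >= end_day:
        raise ValueError(f"start ({start}) must be before end ({end})")
    return [start_day.isoformat(), end_day.isoformat()]
```

The two return values can be passed as the spatial_extent and temporal_extent arguments of connection.load_collection, exactly like the literal values above.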

# Applying processes

By applying an openEO process on a datacube, we create a new datacube object that represents the manipulated data. The openEO Python client lets you do this by calling DataCube object methods. The most common openEO processes have a dedicated DataCube method (e.g. mask, aggregate_spatial, filter_bbox, ...).

There are also some convenience methods that implement more complex openEO process constructs in a compact, Pythonic interface. For example, the DataCube.min_time method implements a reduce_dimension process along the temporal dimension, using the min process as reducer function:

datacube = datacube.min_time()

This creates a new datacube (we overwrite the existing variable), where the time dimension is eliminated and for each pixel we just have the minimum value of the corresponding timeseries in the original datacube.
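To make the effect of such a temporal reduction concrete, here is a toy illustration on a tiny nested list indexed as cube[t][y][x]. It only mimics the idea of taking the per-pixel minimum over time. It is not how the back-end computes it, and min_over_time is a made-up name for this sketch.

```python
def min_over_time(cube):
    """Reduce a toy cube indexed as cube[t][y][x] to image[y][x] (per-pixel minimum)."""
    height, width = len(cube[0]), len(cube[0][0])
    return [
        [min(time_slice[y][x] for time_slice in cube) for x in range(width)]
        for y in range(height)
    ]


# Three time steps of a 2x2 image.
toy_cube = [
    [[5, 9], [7, 3]],
    [[4, 8], [9, 6]],
    [[6, 2], [8, 4]],
]
print(min_over_time(toy_cube))  # [[4, 2], [7, 3]]
```

The time axis is gone in the result, and each remaining pixel holds the smallest value of its own timeseries.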

See the Python client's DataCube API documentation for a more complete listing of methods that implement openEO processes.

openEO processes that are not supported by a dedicated DataCube method can be applied in a generic way with the process method, e.g.:

datacube = datacube.process(
    process_id="ndvi",
    arguments={
        "data": datacube,
        "nir": "B8",
        "red": "B4",
    },
)

This applies the ndvi process to the datacube with the arguments "data", "nir" and "red". This example assumes a datacube with bands B8 and B4.
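For reference, the normalized difference vegetation index is computed per pixel as (nir - red) / (nir + red). The plain-Python sketch below evaluates that formula for a single pair of band values. Returning NaN when both values sum to zero is a choice made for this sketch, not a statement about what any back-end does.

```python
def ndvi(nir, red):
    """NDVI for one pixel: (nir - red) / (nir + red)."""
    total = nir + red
    if total == 0:
        # Assumption of this sketch: report "no value" instead of dividing by zero.
        return float("nan")
    return (nir - red) / total
```

Values close to 1 indicate that the near-infrared reflectance is much higher than the red reflectance, and equal band values give 0.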

Note

Still unsure how to make use of processes with the Python client? Visit the official documentation on working with processes.

# Defining output format

After applying all the processes you want to execute, you need to tell the back-end to export the datacube, for example as GeoTIFF:

result = datacube.save_result("GTiff")

TIP

You can list the available file formats using:

connection.list_file_formats()
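If you only need the names that are valid for save_result, you can pick them out of that listing. The sketch below assumes the listing is dict-like with separate "input" and "output" sections keyed by format name, as in the openEO API's file format listing. output_format_names is a hypothetical helper, not part of the openeo package.

```python
def output_format_names(file_formats):
    """Return the sorted names of the formats listed under the "output" key.

    Assumes `file_formats` is dict-like with an "output" section keyed by
    format name; returns an empty list when that section is absent.
    """
    return sorted(file_formats.get("output", {}))
```

It would be called as output_format_names(connection.list_file_formats()). The names you get back depend on the back-end.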

# Execution

It's important to note that all the datacube processes we applied up to this point are not actually executed yet, neither locally nor remotely on the back-end. We just built an abstract representation of the algorithm (input data and processing chain), encapsulated in a local DataCube object (e.g. the result variable above). To trigger actual execution on the back-end we have to explicitly send this representation to the back-end.
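The idea of building a description first and executing it later can be illustrated with a toy class that only records steps. LazyCube is a hypothetical stand-in written for this sketch. It is not the openeo client's DataCube, and its flat list of steps is much simpler than a real openEO process graph.

```python
import json


class LazyCube:
    """Toy stand-in for a datacube that records processing steps without running them."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def apply(self, process_id, **arguments):
        # Return a new object with one more recorded step; nothing is computed here.
        step = {"process_id": process_id, "arguments": arguments}
        return LazyCube(self.steps + (step,))

    def to_json(self):
        # The serialized description is what would be sent off for execution.
        return json.dumps(list(self.steps), indent=2)


cube = LazyCube().apply("load_collection", id="SENTINEL1_GRD")
cube = cube.apply("reduce_dimension", dimension="t", reducer="min")
result = cube.apply("save_result", format="GTiff")
print(result.to_json())
```

Every call returns immediately because it only appends to the description. Any real work would start only once that description is handed to something that executes it.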

openEO defines several processing modes, but for this introduction we'll focus on batch jobs, which are a good default choice.

# Batch job execution

The result datacube object we built above describes the desired input collections, processing steps and output format. We can now just send this description to the back-end to create a batch job with the create_job method like this:

# Creating a new job at the back-end by sending the datacube information.
job = result.create_job()

TIP

It can be annoying to manage and monitor batch jobs via code. If you want to use an interface for your batch jobs (or other resources) that is easier to use, you can also open the openEO Platform Editor. After login, you'll be able to manage and monitor your batch jobs in a near-realtime interactive environment; look out for the "Data Processing" tab.

The batch job, which is referenced by the returned job object, has only been created at the back-end; it has not been started yet. To start the job and let your Python script wait until it has finished, use the start_and_wait method. You can then download the results with get_results and download_files:

# Start the job, wait until it has finished, then download the results.
job.start_and_wait()
job.get_results().download_files("output")

When everything completes successfully, the processing result will be downloaded as a GeoTIFF file in a folder "output".
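If you run this sequence often, the three calls can be wrapped in one function. This is a minimal sketch that only chains the calls shown above. run_batch_job is a hypothetical convenience wrapper, not part of the openeo package, and result is assumed to be a datacube that already ends in save_result.

```python
def run_batch_job(result, output_dir="output"):
    """Create a batch job from `result`, run it and download its files.

    Hypothetical wrapper around the calls shown above; returns whatever
    `download_files` returns.
    """
    job = result.create_job()
    job.start_and_wait()  # blocks while the job runs on the back-end
    return job.get_results().download_files(output_dir)
```

Calling run_batch_job(result) is then equivalent to the step-by-step version, with "output" as the default target folder.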

TIP

You may shut down your device or log out while the job runs on the back-end. You can retrieve the status and results later, from any client.
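Picking a job up again later could look like the sketch below. It assumes you noted the job id (available as job.job_id on the job object) and that you pass in an authenticated connection. fetch_finished_results is a hypothetical helper, not part of the openeo package.

```python
def fetch_finished_results(connection, job_id, output_dir="output"):
    """Look up an existing batch job by id and download its results if it has finished.

    Returns the job status; files are only downloaded when it is "finished".
    Hypothetical helper, not part of the openeo package.
    """
    job = connection.job(job_id)
    status = job.status()
    if status == "finished":
        job.get_results().download_files(output_dir)
    return status
```

If the returned status is anything other than "finished", nothing is downloaded and you can simply try again later.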

The official openEO Python Client documentation has more information on batch job management and downloading results.

# Additional Information

Additional information and resources about the openEO Python Client Library:

Last Updated: 12/17/2024, 12:59:54 PM