# Get started with the openEO Python Client
Important
You need to get an openEO Platform account to access the processing infrastructure.
This Getting Started guide gives you an overview of the capabilities of the openEO Python client library. More in-depth information and documentation can be found on the official documentation website.
The High level Interface of the Python client is designed to provide an opinionated, Pythonic API to interact with openEO Platform.
# Installation
The openEO Python client library can easily be installed with a tool like `pip`, for example:

```shell
pip install openeo
```
The client library is also available on Conda Forge.
It's recommended to work in a virtual environment of some kind (`venv`, `conda`, ...), containing Python 3.6 or higher.
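A minimal sketch of such a setup with the standard-library `venv` module (the environment name `.venv` is just a common convention, not required by openEO):

```shell
python3 -m venv .venv                      # create an isolated environment
. .venv/bin/activate                       # on Windows: .venv\Scripts\activate
python -c "import sys; print(sys.prefix)"  # now points inside .venv
```

With the environment active, `pip install openeo` installs the client library into `.venv` only, without touching the system Python.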
TIP
For more details, alternative installation procedures or troubleshooting tips, see the official `openeo` package installation documentation.
# Connect to openEO Platform and explore
First, establish a connection to the openEO Platform back-end:
```python
import openeo

connection = openeo.connect("openeo.cloud")
```
The `Connection` object is your central gateway to:
- list data collections, available processes, file formats and other capabilities of the back-end
- start building your openEO algorithm from the desired data on the back-end
- execute and monitor (batch) jobs on the back-end
- etc.
# Collections
The Earth observation data (the input of your openEO jobs) is organised in so-called collections, e.g. fundamental satellite collections like "Sentinel 1" or "Sentinel 2", or preprocessed collections like "NDVI".
Note
More information on how openEO "collections" relate to terminology used in other systems can be found in the openEO glossary.
While it's recommended to browse the available EO collections on the
openEO Platform collections overview webpage,
it's possible to list and inspect them programmatically.
As a very simple usage example of the openEO Python client, let's use the `list_collection_ids` and `describe_collection` methods on the `connection` object we just created:
```python
>>> # Get all collection ids
>>> print(connection.list_collection_ids())
['AGERA5', 'SENTINEL1_GRD', 'SENTINEL2_L2A', ...

>>> # Get metadata of a single collection
>>> print(connection.describe_collection("SENTINEL2_L2A"))
{'id': 'SENTINEL2_L2A', 'title': 'Sentinel-2 top of canopy ...', 'stac_version': '0.9.0', ...
```
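`describe_collection` returns a plain dictionary following the STAC collection format, so you can inspect it with ordinary Python. A sketch using an abbreviated, hypothetical metadata dict (the extent values below are made up):

```python
# Abbreviated, hypothetical metadata in the shape describe_collection() returns.
metadata = {
    "id": "SENTINEL2_L2A",
    "extent": {
        "spatial": {"bbox": [[-180, -56, 180, 83]]},
        "temporal": {"interval": [["2015-07-06", None]]},
    },
}

# Pull out the spatial bounding box and the temporal coverage.
bbox = metadata["extent"]["spatial"]["bbox"][0]
start, end = metadata["extent"]["temporal"]["interval"][0]
print(f"{metadata['id']}: bbox={bbox}, from {start} to {end or 'present'}")
```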
TIP
The openEO Python client library comes with Jupyter (notebook) integration in a couple of places. For example, put `connection.describe_collection("SENTINEL2_L2A")` (without `print()`) as the last statement in a notebook cell and you'll get a nice graphical rendering of the collection metadata.
TIP
Find out more about data discovery, loading and filtering in the official openEO Python client documentation.
# Processes
Processes in openEO are operations that can be applied to (EO) data (e.g. calculate the mean of an array, or mask out observations outside a given polygon). The output of one process can be used as the input of another process, so multiple processes can be connected into a larger "process graph" that implements a certain algorithm.
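Such a process graph is ultimately serialized as JSON. A minimal hand-written sketch of that structure (the node names `load1` and `ndvi1` are arbitrary labels, and the client normally builds this for you):

```python
import json

# Each node names a process and its arguments; "from_node" wires one node's
# output to another node's input, and "result": True marks the final node.
process_graph = {
    "load1": {
        "process_id": "load_collection",
        "arguments": {"id": "SENTINEL2_L2A", "spatial_extent": None, "temporal_extent": None},
    },
    "ndvi1": {
        "process_id": "ndvi",
        "arguments": {"data": {"from_node": "load1"}},
        "result": True,
    },
}

print(json.dumps(process_graph, indent=2))
```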
Note
Check the openEO glossary for more details on pre-defined, user-defined processes and process graphs.
Let's list the available pre-defined processes with `list_processes`:

```python
>>> print(connection.list_processes())
[{'id': 'absolute', 'summary': 'Absolute value', 'description': 'Computes the absolute value of ...
{'id': 'mean', 'summary': 'Arithmetic mean (average)', ...
...
```
Like with collections, instead of programmatic exploration you'll probably prefer a more graphical, interactive interface. Use the Jupyter notebook integration (put `connection.list_processes()` without `print()` as the last statement in a notebook cell) or visit a web-based overview of the available processes on openEO Platform.
TIP
Find out more about process discovery and usage in the official openEO Python client documentation.
# Authentication
In the code snippets above we did not need to log in as a user since we just queried publicly available back-end information. However, to run non-trivial processing queries one has to authenticate so that permissions, resource usage, etc. can be managed properly.
To handle authentication, openEO leverages OpenID Connect (OIDC). It offers some interesting features (e.g., a user can securely reuse an existing account), but is a fairly complex topic, discussed in more depth on the Free Trial page.
The openEO Python client library tries to make authentication as streamlined as possible. In most cases for example, the following snippet is enough to obtain an authenticated connection:
```python
import openeo

connection = openeo.connect("openeo.cloud").authenticate_oidc()
```
This statement will automatically reuse a previously authenticated session, when available. Otherwise, e.g. the first time you do this, some user interaction is required: it will print a web link and a short user code. Visit the web page in a browser, log in there with an existing account and enter the user code. If everything goes well, the `connection` object in the script will be authenticated and the back-end will be able to identify you in subsequent requests.
More detailed information on authentication can be found in the openEO Python client documentation.
# Working with Datacubes
Now that we know how to discover the capabilities of the back-end and how to authenticate, let's do some real work and process some EO data in a batch job. We'll first build the desired algorithm by working on so-called "datacubes", the central concept in openEO to represent EO data, as discussed in great detail in the openEO documentation.
# Creating a Datacube
Note
Please note that the following code only creates a process graph representation of your EO analysis; it does not consume significant processing power on your device.
The first step is loading the desired slice of a data collection with `Connection.load_collection`:

```python
datacube = connection.load_collection(
    "SENTINEL1_GRD",
    spatial_extent={"west": 16.06, "south": 48.06, "east": 16.65, "north": 48.35},
    temporal_extent=["2017-03-01", "2017-04-01"],
    bands=["VV", "VH"],
)
```
This results in a `DataCube` object containing the "SENTINEL1_GRD" data restricted to the given spatial extent, temporal extent and bands.
# Applying processes
By applying an openEO process to a datacube, we create a new datacube object that represents the manipulated data. The openEO Python client allows you to do this by calling `DataCube` object methods. The most common or popular openEO processes have a dedicated `DataCube` method (e.g. `mask`, `aggregate_spatial`, `filter_bbox`, ...).
There are also some convenience methods that implement more complex openEO process constructs in a compact, Pythonic interface. For example, the `DataCube.min_time` method implements a `reduce_dimension` process along the temporal dimension, using the `min` process as reducer function:

```python
datacube = datacube.min_time()
```
This creates a new datacube (we overwrite the existing variable), where the time dimension is eliminated and for each pixel we just have the minimum value of the corresponding timeseries in the original datacube.
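Independently of openEO, the semantics of this reduction can be illustrated with plain Python: per pixel, take the minimum over the time dimension, which eliminates that dimension (the toy values below are made up):

```python
# Toy timeseries per (x, y) pixel position; values are made up.
timeseries = {
    (0, 0): [0.8, 0.3, 0.5],
    (0, 1): [0.2, 0.9, 0.4],
}

# "min_time": reduce the time dimension with min, keeping one value per pixel.
min_composite = {pixel: min(values) for pixel, values in timeseries.items()}
print(min_composite)  # {(0, 0): 0.3, (0, 1): 0.2}
```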
See the Python client's `DataCube` API documentation for a more complete listing of methods that implement openEO processes.

openEO processes that are not supported by a dedicated `DataCube` method can be applied in a generic way with the `process` method, e.g.:
```python
datacube = datacube.process(
    process_id="ndvi",
    arguments={"data": datacube, "nir": "B8", "red": "B4"},
)
```
This applies the `ndvi` process to the datacube with the arguments "data", "nir" and "red" (this example assumes a datacube with bands `B8` and `B4`).
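The `ndvi` process computes the normalized difference vegetation index, (nir - red) / (nir + red), per pixel. A local sketch of that formula (the sample reflectance values are made up):

```python
def ndvi(nir, red):
    # Normalized Difference Vegetation Index: ranges from -1 to 1;
    # dense vegetation reflects strongly in the near-infrared (nir) band.
    return (nir - red) / (nir + red)

print(ndvi(0.6, 0.2))  # 0.5
```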
Note
Still unsure how to make use of processes with the Python client? Visit the official documentation on working with processes.
# Defining output format
After applying all the processes you want to execute, you need to tell the back-end to export the datacube, for example as GeoTIFF:

```python
result = datacube.save_result("GTiff")
```
TIP
You can list the available file formats using:

```python
connection.list_file_formats()
```
# Execution
It's important to note that none of the datacube processes we applied up to this point have actually been executed yet, neither locally nor remotely on the back-end. We just built an abstract representation of the algorithm (input data and processing chain), encapsulated in a local `DataCube` object (e.g. the `result` variable above). To trigger actual execution we have to explicitly send this representation to the back-end.

openEO defines several processing modes, but for this introduction we'll focus on batch jobs, which are a good default choice.
# Batch job execution
The `result` datacube object we built above describes the desired input collections, processing steps and output format. We can now send this description to the back-end to create a batch job with the `create_job` method, like this:

```python
# Create a new job at the back-end by sending the datacube information.
job = result.create_job()
```
TIP
It can be annoying to manage and monitor batch jobs via code. If you prefer an easier interface for your batch jobs (or other resources), you can also open the openEO Platform Editor. After login, you'll be able to manage and monitor your batch jobs in a near-realtime interactive environment; look out for the "Data Processing" tab.
The batch job, which is referenced by the returned `job` object, is only created at the back-end; it is not started yet. To start the job and let your Python script wait until it has finished, use the `start_and_wait` method; afterwards you can download the results:

```python
# Start the job, wait until it has finished, then download the results.
job.start_and_wait()
job.get_results().download_files("output")
```
When everything completes successfully, the processing result will be downloaded as a GeoTIFF file in a folder "output".
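Under the hood, `start_and_wait` repeatedly polls the job's status until it reaches a terminal state. A simplified, self-contained sketch of such a polling loop, with a stubbed `get_status` function standing in for the real back-end call:

```python
import time

# Stub: pretend the back-end reports these statuses on successive polls.
_statuses = iter(["queued", "running", "running", "finished"])

def get_status():
    return next(_statuses)

# Poll until the job reaches a terminal state.
status = get_status()
while status not in ("finished", "error", "canceled"):
    time.sleep(0.01)  # a real client waits much longer between polls
    status = get_status()

print(status)  # finished
```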
TIP
You may shut down your device or log out while the job runs on the back-end; you can retrieve the status and results later, from any client.

The official openEO Python client documentation has more information on batch job management and downloading results.
# Additional Information
Additional information and resources about the openEO Python Client Library: