Generation and use of geo-information data
1. Find geographic data in a reasonable dimension
A simulation requires a defined geographic area that is neither too large nor too small. Define a bounding box that limits its extent.
For use with QGIS, see How to prepare a vector/raster layer file.
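A bounding box can be stored as a small GeoJSON file so that it can later be loaded as a mask. The following is a minimal sketch using only the Python standard library; the function name `bbox_to_geojson` and the coordinates are illustrative assumptions, not part of any tool mentioned here.

```python
import json


def bbox_to_geojson(min_lon, min_lat, max_lon, max_lat):
    """Return a GeoJSON FeatureCollection holding one rectangular polygon."""
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("bounding box must have positive width and height")
    # GeoJSON rings are closed: the first and the last position are identical.
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],
    ]
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {"name": "bounding box"},
                "geometry": {"type": "Polygon", "coordinates": [ring]},
            }
        ],
    }


if __name__ == "__main__":
    # Illustrative extent only (longitude/latitude in WGS 84).
    print(json.dumps(bbox_to_geojson(9.7, 53.4, 10.3, 53.7), indent=2))
```

Writing the printed JSON to a file with the extension `.geojson` should give a layer that can be dragged into QGIS.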
2. Using OSMnx to generate layer data
OSMnx is a Python library that retrieves data from OpenStreetMap (OSM) via its APIs and stores specific geographic areas in a suitable format. It provides various kinds of geographic data, although it is mostly used for graph data such as street networks.
- Installing the OSMnx package
- Examples of usage
On Windows, additional packages are required for OSMnx to work properly, in particular the GDAL package matching your Python version. The following example uses OSMnx to save the boundary of the Galapagos Islands.
import osmnx as ox
def save_place(place_name):
    # Retrieve the vector boundary of the area we need
    place = ox.geocode_to_gdf(place_name)
    place.to_file("galapagos.geojson", driver='GeoJSON')
if __name__ == '__main__':
    place_name = 'Galapagos Islands'
    # It is also possible to retrieve more than one boundary into the same vector file
    # place_name = ['Isla Isabela, Ecuador', 'Isla Santiago, Ecuador']
    save_place(place_name)
3. Using a bounding box to reduce dimension of given data from a base model
If data is available in a suitable format, it is possible to crop it using QGIS. This way it is easy to save the desired area with a few provided tools.
This applies when the model environment is already available (e.g., in the form of a base model) and a subsection corresponding to the bounding box is to be extracted.
- Import a given layer file (base layer) in QGIS (drag-and-drop). This is our Input layer.
- Create a mask (bounding box) in a temporary layer that limits the geographic extent. This is our Overlay layer.
- Layer → Create Layer → New Temporary Scratch Layer
- Drop-down menu "Geometry type": MultiPolygon (most often sufficient)
- Ok
- Edit → Add "any polygon form that fits your requirements"
- Layer → Toggle Editing → Save
- Vector (possibly "Vektor" in a German installation) → Geoprocessing Tools → Intersection...
- Drop-down menu "Input layer": base layer
- Drop-down menu "Overlay layer": bounding box (or the other way around)
- Click "Start"
- A new layer (intersect layer) containing the intersected geodata appears in the Layer window (bottom left)
- Export the intersect layer:
- Right-click the intersect layer
- Select "Export" and "Save as"
- Specify the file name and file format
- Click "OK"
Base data with red bounding box (above) and the new data in dimension of mask as result (below).
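For a quick check outside QGIS, a GeoJSON layer can be reduced to the features that touch the bounding box. The sketch below is a coarse selection and not a replacement for the QGIS Intersection tool: it keeps a feature if at least one of its vertices lies inside the box and does not cut geometries at the box edge. The function names are illustrative assumptions.

```python
def iter_positions(coords):
    """Yield every [x, y] position from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)


def select_by_bbox(collection, min_x, min_y, max_x, max_y):
    """Keep features that have at least one vertex inside the bounding box."""
    kept = []
    for feature in collection["features"]:
        geometry = feature.get("geometry") or {}
        positions = iter_positions(geometry.get("coordinates", []))
        if any(min_x <= p[0] <= max_x and min_y <= p[1] <= max_y for p in positions):
            kept.append(feature)
    return {"type": "FeatureCollection", "features": kept}


if __name__ == "__main__":
    base_layer = {
        "type": "FeatureCollection",
        "features": [
            {"type": "Feature", "properties": {"id": 1},
             "geometry": {"type": "Point", "coordinates": [10.0, 53.5]}},
            {"type": "Feature", "properties": {"id": 2},
             "geometry": {"type": "LineString", "coordinates": [[0.0, 0.0], [1.0, 1.0]]}},
        ],
    }
    subset = select_by_bbox(base_layer, 9.7, 53.4, 10.3, 53.7)
    print([f["properties"]["id"] for f in subset["features"]])
```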
4. Using the bounding box with data from the EO Browser (ESA)
If the necessary data are not available, it is possible to obtain them via a browser (satellite data freely available from ESA). How to crop the data is explained in this section.
To download the files in a geo format, an account is required. Accounts are free.
- Model environment is not available; data acquisition via external data sources
- EO Browser: freely available satellite images (globally) from the ESA (European Space Agency)
- Visit https://apps.sentinel-hub.com/eo-browser (a tutorial with first steps is available, if needed)
- Upload your bounding box:
- Click the pentagon icon (top right) and drag-and-drop the file containing the bounding box into the EO Browser
- Specify the time range for the desired data and click "Search"
- Select one of the available data sets (the down-arrows provide additional information)
- Select the desired geodata from within the data set
- Select "Download image" (right)
- Switch to the tab "Analytical"
- Select the file format TIFF
- Activate "Clip extra bands" (if not already activated)
- Select "Download"
5. Generating a raster layer from a GeoTIFF file
The MARS Runtime System uses vector and raster layers. Raster layers must be brought into a suitable format before they can be used, because GeoTIFF files cannot be used directly. This section explains how to convert them to ASC files that the runtime system can read.
1. Transform the GeoTIFF file to an ASC file
- Import the desired tiff file into QGIS (drag-and-drop)
- We require the layer in greyscale.
1. Raster → Raster Calculator
2. Enter the formula (R + G + B) / 3 into the raster calculator expression (R = channel 1, G = channel 2, B = channel 3)
3. From "raster channels", select the three channels R, G, and B. These appear in the raster calculator expression. Enclose them in round parentheses and divide the expression by 3.
4. In "Result layer" → "Output layer", specify the file name and file format (GeoTIFF)
5. Select OK
- Save the newly generated layer (let's call it GrayscaleGEOTIFF) as an asc file
- Raster → Conversion → Translate (Convert Format)/ Convert (format conversion)...
- Input layer: GrayscaleGEOTIFF
- In the field "Converted", click "…" and "Save to File…"
- Specify the file name and file format (ASC)
- Click "Save"
- Click "Run"
- The geodata are exported from QGIS and saved as an asc file
An asc file contains only one band. This example assumes a tiff file with three bands (red, green, and blue). Generally, it is important to specify which band should be exported. For a geotiff file with only one band, no additional steps are necessary.
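The greyscale formula (R + G + B) / 3 is applied cell by cell. As a minimal sketch of what the raster calculator computes, assuming each band is available as a list of rows with equal dimensions (the function name `to_grayscale` is an illustrative choice):

```python
def to_grayscale(red, green, blue):
    """Average three equally sized bands cell by cell: (R + G + B) / 3."""
    return [
        [(r + g + b) / 3 for r, g, b in zip(row_r, row_g, row_b)]
        for row_r, row_g, row_b in zip(red, green, blue)
    ]


if __name__ == "__main__":
    # Tiny 1 x 2 raster with made-up band values.
    print(to_grayscale([[30, 0]], [[60, 0]], [[90, 3]]))
```

The result is a single band, which matches the one-band restriction of asc files.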
2. Manipulation of a GeoTIFF file
GeoTIFF files are widespread. The bands may represent different data. Sometimes it is useful to mark specific areas as invalid. In the upcoming example, we want to remove values for water.
- Recognize that water has the same value in all 3 different bands.
- Open Raster → Raster Calculator...
- Insert an expression that selects only valid band data. In our case, we only want cells where all three bands have different values (here, water has the same value in all three bands; compare the screenshot).
For example:
"geo@1" != "geo@2" AND "geo@1" != "geo@3" AND "geo@2" != "geo@3"
- Set the Output layer path for layer creation (in this example, the resulting layer is named land). Click OK.
- Open Raster → Raster Calculator...
- Make a new expression that combines your data with the created mask (in our example, the land GeoTIFF is multiplied with the green band, geo@2).
"geo@2" * "land@1"
Set the Output layer. Ok.
Now all water areas have the band value 0. If you need them to be invalid (no data) instead, another transformation is necessary.
- Open Raster → Raster Calculator...
- Make a new expression that divides the value by itself and multiplies it by itself afterwards. All values that are 0 are thereby transformed to no data.
"green_land@1" / "green_land@1" * "green_land@1"
Set the Output layer. Ok.
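The masking logic of the three raster calculator steps can be summarised in a few lines. This is a sketch under assumptions: bands are lists of rows, and the no-data marker -9999 is an assumed value chosen for illustration. Unlike the divide-and-multiply trick, it writes the marker directly where the mask is 0, so green values that are legitimately 0 on land are kept.

```python
NODATA = -9999  # assumed no-data marker for this illustration


def land_mask(band1, band2, band3):
    """Return 1 where all three band values differ pairwise, otherwise 0."""
    return [
        [1 if (a != b and a != c and b != c) else 0 for a, b, c in zip(r1, r2, r3)]
        for r1, r2, r3 in zip(band1, band2, band3)
    ]


def apply_mask(band, mask, nodata=NODATA):
    """Keep band values where the mask is 1 and write nodata elsewhere."""
    return [
        [value if keep else nodata for value, keep in zip(row_v, row_m)]
        for row_v, row_m in zip(band, mask)
    ]


if __name__ == "__main__":
    # Second cell has identical values in all bands, i.e. water in this example.
    b1, b2, b3 = [[10, 5]], [[20, 5]], [[30, 5]]
    mask = land_mask(b1, b2, b3)
    print(mask, apply_mask(b2, mask))
```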
6. Public data sources
There are many freely available data sources. Research on the internet turns up many portals and sites that offer geoinformation data; some of them are listed below.
Geographically limited
- Geoportal: geodata provided by federal, state and local authorities.
- Geoportal Hamburg: geodata provided by federal, state and local authorities. Limited to Hamburg.
- The Africa GeoPortal provides a variety of data from covid to desert locust.
Global
- OpenStreetMap holds a map of the world with street networks and additional data.
- ESA Copernicus Open Access Hub provides complete, free and open access to Sentinel-1, Sentinel-2, Sentinel-3 and Sentinel-5P user products.
- EO Browser provides satellite data.
- Natural Earth provides cultural, physical and raster categories.
- IPUMS TERRA integrates population and environmental data across disciplinary scientific domains.
- NASA Earth Observation provides satellite data (ocean, atmosphere, and land surfaces).
- Socioeconomic Data and Applications Center (SEDAC) provides socioeconomic data sets.
- Open Topography provides high-resolution topography data and tools.
- FAO GeoNetwork provides interactive maps, satellite imagery and related spatial databases.
- meteostat offers worldwide weather and climate data.
- WorldClim offers maps, graphs, tables, and data of the global climate.
- United Nations Environmental Data Explorer covers themes like Freshwater, Population, Forests, Emissions, Climate, Disasters, Health and GDP.
- ArcGIS Hub provides a broad collection of diverse data.
- ESRI provides a broad collection of diverse data.
However, the data offered is available in very similar formats. The formats geotiff, geojson, and shapefile can be read by QGIS, and the data may need to be edited.
7. OpenStreetMap as data source
OpenStreetMap (OSM) is a free and rich data source for a wide variety of data. There are road networks, buildings, rivers and much more. Much of this data can be accessed via the website. It may be worthwhile to check whether it contains something suitable for your own research question.
- Download a desired area via a QGIS plugin:
- Follow these steps for installation and usage: http://www.qgistutorials.com/de/docs/downloading_osm_data.html
- Download the desired area via an internet browser:
- Visit OpenStreetMap
- Click "Export" (located in the top bar)
- Specify the desired area (this can also be done by selecting "Select another area manually")
- Click "Export"
- An osm file is downloaded which can be imported into and visualized in QGIS (drag-and-drop)
If the area is too large, the import into QGIS might fail.
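Before importing, it can help to gauge how large an export is. An osm file is XML with node, way, and relation elements, so a rough element count is possible with the standard library. This is a sketch only; the function name and the sample data are illustrative, and no threshold at which a QGIS import fails is implied.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny hand-written sample in OSM XML layout (illustrative data).
SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6">
  <node id="1" lat="53.55" lon="9.99"/>
  <node id="2" lat="53.56" lon="10.00"/>
  <way id="10"><nd ref="1"/><nd ref="2"/></way>
</osm>"""


def count_osm_elements(source):
    """Count node, way and relation elements while streaming through the file."""
    counts = Counter()
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag in ("node", "way", "relation"):
            counts[element.tag] += 1
            element.clear()  # drop children and attributes that are no longer needed
    return counts


if __name__ == "__main__":
    print(count_osm_elements(io.BytesIO(SAMPLE)))
```

`count_osm_elements` also accepts a file path, for example the name of a downloaded osm file.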
8. Time series layer
Time series layers are used to represent data that should change during the course of a simulation. In order to realise this, the desired data are provided with a time stamp to tell the system at which point in time which file has to be taken.
Time series layers are used by the MARS Runtime System to include time-dependent data in simulation runs. These data must be in the form of a raster layer (asc file). Additionally, a metadata.csv file must be present. This csv file specifies the point in time at which a particular file should be used as a layer in the simulation. The csv file is structured as follows:
Date | File |
---|---|
2010-01-01T00:00:00 | wc2.1_2.5m_tmin_2010-01.asc |
2010-02-01T00:00:00 | wc2.1_2.5m_tmin_2010-02.asc |
2010-03-01T00:00:00 | wc2.1_2.5m_tmin_2010-03.asc |
The ASC file may look like the example below.
- ncols describes the number of columns in which the data is stored
- nrows describes the number of rows respectively
- xllcorner is the lower left longitudinal corner of the whole raster
- yllcorner is the lower left latitudinal corner of the whole raster
- cellsize defines the dimension of a grid cell in minutes
- nodata_value is the default value that indicates that no data is available for that cell
ncols 4
nrows 8
xllcorner 30.5
yllcorner -26
cellsize 0.5
nodata_value -9999
17.807290649414 20.5651184082032 24.8631225585938 26.8945098876953
18.5511718750001 23.3313079833985 26.29580078125 27.8574005126953
20.4051147460938 25.4935241699219 27.5471893310547 28.5036560058594
27.4903656005859 28.1041198730469 28.9388061523438 29.2906738281251
28.8996826171876 29.4988952636719 30.3407836914063 31.0566650390625
30.1418853759766 30.5633178710938 31.2210021972657 31.8191162109376
30.4041229248047 31.273828125 31.6893402099609 31.8025604248047
31.709634399414 31.8638244628907 31.8258911132813 32.0521179199219
The asc files and the metadata.csv must be stored in the same directory (may also be zipped).
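A quick consistency check of an asc file can catch mismatches between the header and the data before a simulation run. The sketch below assumes the layout shown above, that is, six header lines followed by the data rows; the function name `parse_asc` and the sample values are illustrative assumptions and not part of the MARS tooling.

```python
SAMPLE_ASC = """ncols 2
nrows 2
xllcorner 30.5
yllcorner -26
cellsize 0.5
nodata_value -9999
1.0 2.0
3.0 -9999
"""


def parse_asc(text):
    """Parse six 'key value' header lines and the data rows of an ASCII grid."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header = {key.lower(): float(value) for key, value in lines[:6]}
    rows = [[float(value) for value in line] for line in lines[6:]]
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    # The header must describe the actual shape of the data block.
    if len(rows) != nrows or any(len(row) != ncols for row in rows):
        raise ValueError("grid size does not match ncols/nrows in the header")
    return header, rows


if __name__ == "__main__":
    header, rows = parse_asc(SAMPLE_ASC)
    print(header["cellsize"], rows)
```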
9. Data usage in MARS
Data is made usable in the MARS system via layers. These layers contain a wide variety of data with which an agent can interact.
How to use a layer file during a MARS model simulation can be found here.
Data from external sources should always be validated regarding completeness, integrity, and plausibility.
10. Data acquisition using the example of HVV
The utilisation of data obtained from external sources is an important step in the use of geoinformation data. This article uses the example of the HVV to show how to make S-Bahn lines available for a specific area of Hamburg.
- Find the source
- Open the site: https://geoportal-hamburg.de/geo-online/
- Click on "Themen" in the upper left-hand corner
- Click on "Fachdaten" in the drop-down menu
- Click on "Transport und Verkehr"
- Click on "HVV Streckennetz für Bahn-, Bus- und Fährlinien Hamburg"
- Click on the icon "i"
- A window opens. Click on the "WMS-Adresse" tab
- Save the link to the clipboard
- Integrate source in QGIS
- Launch QGIS
- In the Browser panel on the left, click on "WFS / OGC API - Features"
- Select "New Connection..."
- In the new dialogue window, give the connection a name (we use "HVV") and copy the link into the column for the URL
- Click "Ok"
- Now open the new "HVV" tab and drag "Strecken (Normalfahrplan)" into the Layer window
- Check that the coordinate system at the bottom right is set to WGS 84 (EPSG:4326)
- Now load your mask as an additional layer
- Intersect the layer as in the section "Using a bounding box to reduce dimension of given data from a base model"
- Drop-down menu "Input layer": Strecken (Normalfahrplan)
- Drop-down menu "Overlay layer": The Mask
- Click "Start"
- To add the stations repeat the same steps as with "Strecken (Normalfahrplan)" only this time with "Haltestellen"
- Now intersect the stops with the output layer from the last intersection
- To save the data in one geojson, right-click on the first intersection (routes) and export it as a geojson
- Now do the same with the second intersection (stations) and save it in the same file
- A dialogue opens. Click on "Append to Layer" and "Ok"
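If the two exports end up in separate GeoJSON files, they can also be combined afterwards. The following is a minimal sketch that concatenates the features of several FeatureCollections; the function name and the sample features are illustrative assumptions, and both inputs are assumed to use the same coordinate system.

```python
def merge_feature_collections(*collections):
    """Concatenate the features of several GeoJSON FeatureCollections."""
    features = []
    for collection in collections:
        if collection.get("type") != "FeatureCollection":
            raise ValueError("expected a GeoJSON FeatureCollection")
        features.extend(collection["features"])
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Made-up stand-ins for the route and station exports.
    routes = {"type": "FeatureCollection", "features": [
        {"type": "Feature", "properties": {"kind": "route"},
         "geometry": {"type": "LineString", "coordinates": [[9.9, 53.5], [10.0, 53.6]]}},
    ]}
    stations = {"type": "FeatureCollection", "features": [
        {"type": "Feature", "properties": {"kind": "station"},
         "geometry": {"type": "Point", "coordinates": [9.9, 53.5]}},
    ]}
    merged = merge_feature_collections(routes, stations)
    print(len(merged["features"]))
```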
11. View attributes of the data / filter by data
The attributes included in freely available data vary widely. After loading the data, QGIS can be used to check what information it contains. We will take an example from Hamburg here. The data we use is a Hamburg vector layer. It contains all districts and boroughs of the city.
- Load the file into QGIS
- Right click on the layer. In the menu click on "Open attribute table"
- All the information contained in the layer is displayed here
It is possible to apply certain operations to the attributes. In this way it is also possible to use only certain parts of a layer. In the following example we will cut out the district of Altona.
- Double click on the layer
- In the "Source" window there is a tab "Object filter on data provider". At the bottom right click on the button "Query creation"
- In the dialogue under "Fields" we double-click on "Bezirk_Nam"
- You can also view the information contained in a field in the values window
- This name is now displayed in the Provider Specific Filter Expression field
- The Provider Specific Filter Expression window must contain this expression: "Bezirk_Nam" = 'Altona'
- It is possible to create a more complex query with logical operators
- Click on "Ok"
- Now we have chosen the district of Altona
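The same attribute filter can be reproduced outside QGIS on an exported GeoJSON file. This is a minimal sketch with the standard library; the field name "Bezirk_Nam" and the value 'Altona' come from the example above, while the function name and the sample features are illustrative assumptions.

```python
def filter_features(collection, field, value):
    """Keep only the features whose property `field` equals `value`."""
    kept = [
        feature
        for feature in collection["features"]
        if (feature.get("properties") or {}).get(field) == value
    ]
    return {"type": "FeatureCollection", "features": kept}


if __name__ == "__main__":
    # Made-up stand-in for the Hamburg district layer (geometries omitted).
    districts = {"type": "FeatureCollection", "features": [
        {"type": "Feature", "properties": {"Bezirk_Nam": "Altona"}, "geometry": None},
        {"type": "Feature", "properties": {"Bezirk_Nam": "Eimsbüttel"}, "geometry": None},
    ]}
    altona = filter_features(districts, "Bezirk_Nam", "Altona")
    print(len(altona["features"]))
```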
Hint: Depending on the version of QGIS, the query may also look different. Here is an example as an SQL query with the HVV and the S-Bahn routes.