Filtering OSM Data

Often one doesn't want to investigate the whole OSM data set at once, but only a specific part of it: for example, all the OSM data in a given region, or all OSM objects that have a given tag, type, or other property.

For this, the MapReducer provides a variety of filtering methods which allow one to select any subset of the OSM data. Multiple filters can be applied one after another. The result then contains only those OSM elements that match all of the specified filters.

areaOfInterest

This defines the region to which the query is restricted. It can be either a bounding box (OSHDBBoundingBox) or any (polygonal) JTS geometry such as a Polygon or MultiPolygon.

This filter keeps only OSM entities whose geometry lies within or intersects the given areaOfInterest. This also includes OSM entities for which none of the child elements lie within the given area of interest.

For example, a large forest polygon in OSM that completely encompasses a small area of interest is returned by the OSHDB API.
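The difference between "has a vertex inside the area" and "intersects the area" can be illustrated with a minimal, self-contained Java sketch. The BBox class and its methods are hypothetical helpers written for this illustration and are not part of the OSHDB API. The forest is simplified to an axis-aligned rectangle so that its bounding box equals its geometry.

```java
class BBoxIntersection {
  /** Hypothetical axis-aligned box (illustration only, not OSHDBBoundingBox). */
  static final class BBox {
    final double minLon;
    final double minLat;
    final double maxLon;
    final double maxLat;

    BBox(double minLon, double minLat, double maxLon, double maxLat) {
      this.minLon = minLon;
      this.minLat = minLat;
      this.maxLon = maxLon;
      this.maxLat = maxLat;
    }

    /** True if the point lies inside this box or on its border. */
    boolean containsPoint(double lon, double lat) {
      return lon >= minLon && lon <= maxLon && lat >= minLat && lat <= maxLat;
    }

    /** True if at least one corner of the other box lies inside this box. */
    boolean containsAnyCornerOf(BBox other) {
      return containsPoint(other.minLon, other.minLat)
          || containsPoint(other.minLon, other.maxLat)
          || containsPoint(other.maxLon, other.minLat)
          || containsPoint(other.maxLon, other.maxLat);
    }

    /** True if the two boxes share at least one point. */
    boolean intersects(BBox other) {
      return minLon <= other.maxLon && other.minLon <= maxLon
          && minLat <= other.maxLat && other.minLat <= maxLat;
    }
  }

  public static void main(String[] args) {
    BBox area = new BBox(8.60, 49.30, 8.70, 49.40);   // small area of interest
    BBox forest = new BBox(8.00, 49.00, 9.00, 50.00); // large enclosing forest
    System.out.println(area.containsAnyCornerOf(forest)); // false: no forest vertex inside
    System.out.println(area.intersects(forest));          // true: the forest still matches
  }
}
```

A test based only on vertices would miss the forest, which is why an intersection test is the relevant criterion here.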

The resulting geometries produced by the different OSHDB views are by default clipped to the specified area of interest. This makes it possible to directly calculate the length or area of linear or polygonal OSM features within the given query region, without having to consider the fact that some features might only partially lie within the region. At the same time, it is still possible to access the full extent of the respective OSM features' unclipped geometries. You can find further information in the section about how the OSHDB builds geometries from OSM data.
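To see why clipping matters for length calculations, consider a single straight line segment that crosses a rectangular query region. The following sketch clips the segment with the Liang-Barsky algorithm and returns only the length of the part inside the rectangle. It is an illustration under simplifying assumptions: coordinates are treated as planar units rather than geographic degrees, and lengthInside is a hypothetical helper, not the method the OSHDB uses internally.

```java
class ClippedLength {
  /**
   * Length of the part of the segment (x0,y0)-(x1,y1) that lies inside
   * the axis-aligned rectangle, using Liang-Barsky parametric clipping.
   */
  static double lengthInside(double x0, double y0, double x1, double y1,
                             double minX, double minY, double maxX, double maxY) {
    double dx = x1 - x0;
    double dy = y1 - y0;
    double[] p = {-dx, dx, -dy, dy};
    double[] q = {x0 - minX, maxX - x0, y0 - minY, maxY - y0};
    double tEnter = 0.0; // parameter where the segment enters the rectangle
    double tLeave = 1.0; // parameter where the segment leaves the rectangle
    for (int i = 0; i < 4; i++) {
      if (p[i] == 0.0) {
        if (q[i] < 0.0) {
          return 0.0; // parallel to this edge and outside of it
        }
      } else {
        double t = q[i] / p[i];
        if (p[i] < 0.0) {
          tEnter = Math.max(tEnter, t);
        } else {
          tLeave = Math.min(tLeave, t);
        }
      }
    }
    if (tEnter > tLeave) {
      return 0.0; // segment misses the rectangle entirely
    }
    return (tLeave - tEnter) * Math.hypot(dx, dy);
  }

  public static void main(String[] args) {
    // A segment of length 3 crossing the unit square horizontally.
    double full = Math.hypot(3.0, 0.0);
    double clipped = lengthInside(-1.0, 0.5, 2.0, 0.5, 0.0, 0.0, 1.0, 1.0);
    System.out.println("full length: " + full + ", inside region: " + clipped);
  }
}
```

Summing the unclipped length would overestimate the amount of the feature inside the region; the clipped length counts only the portion within it.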

The OSHDB is able to cope well even with complex polygons that have many vertices as areas of interest, but keep in mind that using simpler geometries will generally result in higher query performance. For example, a bounding-box query is executed slightly faster than a query that uses an equivalent rectangular polygon as its area of interest.

timestamps

This specifies the time range and time subdivisions for the OSHDB query. It accepts one or more ISO 8601 formatted dates (given in the UTC timezone). Depending on the OSHDB view used, these timestamps are interpreted slightly differently. When using the snapshot view, the given timestamps define the dates at which the snapshots of the OSM entities are taken. When using the contribution view, all modifications to the OSM entities that lie within the time range defined by the first and last timestamp are returned, while any further timestamps can be used later to aggregate results into finer time intervals.
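How the contribution view can use intermediate timestamps may be pictured as sorting each modification into the interval between two consecutive timestamps. The sketch below uses only java.time. The intervalIndex helper is hypothetical, and treating each interval as half-open (start inclusive, end exclusive) is an assumption made for this illustration, not a statement about how the OSHDB handles boundary cases.

```java
import java.time.LocalDate;
import java.util.List;

class TimestampIntervals {
  /**
   * Returns the index i of the interval [timestamps[i], timestamps[i+1])
   * that contains the given date, or -1 if the date lies outside the
   * range spanned by the first and last timestamp.
   * Assumes the timestamps are sorted in ascending order.
   */
  static int intervalIndex(List<LocalDate> timestamps, LocalDate date) {
    for (int i = 0; i + 1 < timestamps.size(); i++) {
      boolean notBeforeStart = !date.isBefore(timestamps.get(i));
      boolean beforeEnd = date.isBefore(timestamps.get(i + 1));
      if (notBeforeStart && beforeEnd) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    List<LocalDate> timestamps = List.of(
        LocalDate.parse("2014-01-01"),
        LocalDate.parse("2014-07-01"),
        LocalDate.parse("2015-01-01"));
    // Three timestamps mean three snapshot dates, but only two intervals.
    System.out.println(intervalIndex(timestamps, LocalDate.parse("2014-03-15")));
    System.out.println(intervalIndex(timestamps, LocalDate.parse("2014-09-01")));
    System.out.println(intervalIndex(timestamps, LocalDate.parse("2015-06-01")));
  }
}
```

With n timestamps, a snapshot-style query yields n points in time, while a contribution-style aggregation yields n - 1 intervals.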

There is also a method to define common, regularly spaced time intervals within a time range, e.g. a monthly time interval between two dates.
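What a monthly subdivision of a time range looks like can be shown with plain java.time, independent of the OSHDB. The monthly helper below is a hypothetical illustration and not the OSHDB method mentioned above; it only shows the kind of ISO 8601 date list that such an interval definition stands for.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class MonthlyTimestamps {
  /**
   * Returns ISO 8601 dates spaced one month apart, starting at start
   * and not going past end (both given as yyyy-MM-dd).
   */
  static List<String> monthly(String start, String end) {
    LocalDate first = LocalDate.parse(start);
    LocalDate last = LocalDate.parse(end);
    List<String> result = new ArrayList<>();
    // Always add months to the first date, so that the day of month
    // does not drift after passing through a shorter month.
    for (int i = 0; !first.plusMonths(i).isAfter(last); i++) {
      result.add(first.plusMonths(i).toString());
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(monthly("2014-01-01", "2014-04-01"));
  }
}
```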

OSHDB filter

An easy way to provide filters is through the functionality of OSHDB filters, which allow one to define OSM data filters in a human-readable syntax. With these one can combine several tag, type and geometry filters with arbitrary boolean operators.

Simple examples of filters are "type:node and natural=tree" to select trees, or "geometry:polygon and building=*" to filter for buildings. More examples can be found on the dedicated filter documentation page.
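The meaning of the two example filters can be spelled out as ordinary Java predicates. This is only a sketch of the semantics: the Element class is a hypothetical stand-in for an OSM element and not an OSHDB type, and the OSHDB filter syntax itself is parsed by the library rather than written this way.

```java
import java.util.Map;
import java.util.function.Predicate;

class FilterSemantics {
  /** Hypothetical stand-in for an OSM element (illustration only). */
  static final class Element {
    final String type;         // "node", "way" or "relation"
    final String geometryType; // "point", "line" or "polygon"
    final Map<String, String> tags;

    Element(String type, String geometryType, Map<String, String> tags) {
      this.type = type;
      this.geometryType = geometryType;
      this.tags = tags;
    }
  }

  // Meaning of: type:node and natural=tree
  static final Predicate<Element> TREES =
      e -> e.type.equals("node") && "tree".equals(e.tags.get("natural"));

  // Meaning of: geometry:polygon and building=*
  // The "*" matches any value, so only the presence of the key is checked.
  static final Predicate<Element> BUILDINGS =
      e -> e.geometryType.equals("polygon") && e.tags.containsKey("building");

  public static void main(String[] args) {
    Element tree = new Element("node", "point", Map.of("natural", "tree"));
    Element house = new Element("way", "polygon", Map.of("building", "residential"));
    System.out.println(TREES.test(tree) + " " + BUILDINGS.test(house));
  }
}
```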

By using the methods Filter.byOSMEntity and Filter.byOSHEntity one can define arbitrary callback functions to filter OSM or OSH entities, respectively.

lambda filter

It is possible to define filter functions that discard values after they have already been transformed in a map step.

Note that it is usually best to use the OSHDB filters described above wherever possible, as they can reduce the amount of data to be iterated over right from the start of the query. Lambda filter functions are only executed after the OSM data has already been computed and transformed.
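The cost difference between filtering early and filtering after a map step can be sketched with plain Java streams. This is an analogy rather than OSHDB code: the Way class and both methods are hypothetical, and the counter only makes visible how often the map function runs.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class LambdaFilterOrder {
  /** Hypothetical stand-in for a mapped OSM way (illustration only). */
  static final class Way {
    final String highway;
    final double lengthMeters;

    Way(String highway, double lengthMeters) {
      this.highway = highway;
      this.lengthMeters = lengthMeters;
    }
  }

  /** Filters on the mapped value, so every way is mapped first. */
  static List<Double> filterAfterMap(List<Way> ways, double minLength, AtomicInteger mapCalls) {
    return ways.stream()
        .map(w -> {
          mapCalls.incrementAndGet();
          return Double.valueOf(w.lengthMeters);
        })
        .filter(length -> length > minLength)
        .collect(Collectors.toList());
  }

  /** Filters on the input, so only matching ways reach the map step. */
  static List<Double> filterBeforeMap(List<Way> ways, String highway, AtomicInteger mapCalls) {
    return ways.stream()
        .filter(w -> highway.equals(w.highway))
        .map(w -> {
          mapCalls.incrementAndGet();
          return Double.valueOf(w.lengthMeters);
        })
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Way> ways = List.of(
        new Way("residential", 50.0),
        new Way("residential", 150.0),
        new Way("footway", 300.0));
    AtomicInteger late = new AtomicInteger();
    AtomicInteger early = new AtomicInteger();
    System.out.println(filterAfterMap(ways, 100.0, late) + " map calls: " + late.get());
    System.out.println(filterBeforeMap(ways, "residential", early) + " map calls: " + early.get());
  }
}
```

Filtering on the mapped value is sometimes unavoidable, for instance when the condition depends on a computed quantity such as a length. When the condition depends only on tags or types, filtering first avoids mapping elements that would be discarded anyway.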