{"id":282,"date":"2021-05-14T10:51:13","date_gmt":"2021-05-14T09:51:13","guid":{"rendered":"http:\/\/cartopop.com\/?p=282"},"modified":"2021-05-14T11:42:55","modified_gmt":"2021-05-14T10:42:55","slug":"querying-openstreetmap-with-amazon-athena","status":"publish","type":"post","link":"https:\/\/cartopop.com\/index.php\/2021\/05\/14\/querying-openstreetmap-with-amazon-athena\/","title":{"rendered":"Querying OpenStreetMap with Amazon Athena"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"282\" class=\"elementor elementor-282\">\n\t\t\t\t\t\t<div class=\"elementor-inner\">\n\t\t\t\t<div class=\"elementor-section-wrap\">\n\t\t\t\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-6dd7c44 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6dd7c44\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t\t\t<div class=\"elementor-row\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-7506739c\" data-id=\"7506739c\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-column-wrap elementor-element-populated\">\n\t\t\t\t\t\t\t<div class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-a54bea6 elementor-widget elementor-widget-text-editor\" data-id=\"a54bea6\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t<div class=\"elementor-text-editor elementor-clearfix\">\n\t\t\t\t<p><!-- wp:paragraph {\"className\":\"\\u003cp style=\\u0022text-align: justify;\\u0022\\u003e\"} --><\/p>\n<p class=\"&lt;p style=\">OpenStreetMap (OSM) is a free, editable map of the world, created and maintained by volunteers and available for use under an open license. Companies and non-profits like Mapbox, Foursquare, Mapzen, the World Bank, the American Red Cross and others use OSM to provide maps, directions, and geographic context to users around the world.<\/p>\n<p><!-- \/wp:paragraph --><!-- wp:paragraph --><\/p>\n<p>As an open content project, regular OSM data archives are made available to the public via planet.openstreetmap.org in a few different formats (XML, PBF). This includes both snapshots of the current state of data in OSM as well as historical archives.<\/p>\n<p><!-- \/wp:paragraph --><!-- wp:paragraph --><\/p>\n<p>Working with \u201cthe planet\u201d (as the data archives are referred to) can be unwieldy. Because it contains data spanning the entire world, the size of a single archive is on the order of 50 GB. The format is bespoke and extremely specific to OSM. The data is incredibly rich, interesting, and useful, but the size, format, and tooling can often make it very difficult to even start the process of asking complex questions.<\/p>\n<p><!-- \/wp:separator --><!-- wp:paragraph --><\/p>\n<p><!-- \/wp:paragraph --><!-- wp:paragraph --><\/p>\n<p>However, Amazon can help you to work with this massive database. AWS has the right &#8220;tool&#8221; an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL and it&#8217;s called <strong>Amazon Athena<\/strong>!<\/p>\n<p><!-- \/wp:paragraph --><!-- wp:list --><\/p>\n<ul>\n<li>Amazon Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.<\/li>\n<li>Easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL.<\/li>\n<li>Fast: Most results are delivered within seconds. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.<\/li>\n<\/ul>\n<p><!-- \/wp:list --><!-- wp:paragraph --><\/p>\n<p>\u00a0<\/p>\n<p>The OSM Public Datasets consist of three tables: <em>planet<\/em>; <em>planet_history<\/em> and <em>changesets<\/em>. (In this <a href=\"https:\/\/aws.amazon.com\/blogs\/big-data\/querying-openstreetmap-with-amazon-athena\/#more-2069\">blog<\/a> post you can find the way how to register the data tables):<\/p>\n<p><!-- \/wp:paragraph --><!-- wp:list --><\/p>\n<ul>\n<li><b>Planet<\/b>: Contains the current versions of all elements present in OSM.<\/li>\n<li><b>Planet_history<\/b>: Contains a historical record of all versions of all elements (even those that have been deleted).<\/li>\n<li><b>Changesets<\/b>: Contains information about changesets in which elements were modified (and which have a foreign key relationship to both the planet and planet_history tables).<br \/><br \/><\/li>\n<\/ul>\n<p><!-- \/wp:list --><!-- wp:image {\"id\":292,\"sizeSlug\":\"large\",\"linkDestination\":\"none\"} --><\/p>\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"420\" class=\"wp-image-292\" src=\"https:\/\/cartopop.com\/wp-content\/uploads\/2021\/05\/Athena-site-1024x420.jpg\" alt=\"\" srcset=\"https:\/\/cartopop.com\/wp-content\/uploads\/2021\/05\/Athena-site-1024x420.jpg 1024w, https:\/\/cartopop.com\/wp-content\/uploads\/2021\/05\/Athena-site-300x123.jpg 300w, https:\/\/cartopop.com\/wp-content\/uploads\/2021\/05\/Athena-site-768x315.jpg 768w, https:\/\/cartopop.com\/wp-content\/uploads\/2021\/05\/Athena-site-1536x630.jpg 1536w, https:\/\/cartopop.com\/wp-content\/uploads\/2021\/05\/Athena-site.jpg 1891w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<p><!-- \/wp:image --><!-- wp:separator --><\/p>\n<hr class=\"wp-block-separator\" \/>\n<p><!-- \/wp:separator --><!-- wp:paragraph --><\/p>\n<p><strong>\u00a0<\/strong><\/p>\n<p><strong>Let&#8217;s see an example<\/strong><\/p>\n<p><!-- \/wp:paragraph --><!-- wp:paragraph --><\/p>\n<p>Tanzania Development Trust has a mapping project called Crowd2map Tanzania. Working with the HOT team, Crowd2map Tanzania (hereafter C2M) volunteers added millions of buildings to OpenStreetMap. Each &#8220;<a href=\"https:\/\/wiki.openstreetmap.org\/wiki\/Changeset\">changeset<\/a>&#8221; was uploaded with the <strong>#tanzaniadevelopmenttrust<\/strong> hashtag, which can help us in data management and querying. Now, the organisation needs all the data tagged as &#8220;amenity&#8221; and sent in a changeset in which the #tanzaniadevelopmenttrust hashtag was also used. The query will be something like this:<\/p>\n<p><!-- \/wp:paragraph --><!-- wp:image {\"align\":\"center\",\"id\":288,\"sizeSlug\":\"large\",\"linkDestination\":\"none\"} --><\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"598\" height=\"118\" class=\"wp-image-288\" src=\"https:\/\/cartopop.com\/wp-content\/uploads\/2021\/05\/Athena-query.jpg\" alt=\"\" srcset=\"https:\/\/cartopop.com\/wp-content\/uploads\/2021\/05\/Athena-query.jpg 598w, https:\/\/cartopop.com\/wp-content\/uploads\/2021\/05\/Athena-query-300x59.jpg 300w\" sizes=\"(max-width: 598px) 100vw, 598px\" \/><\/figure>\n<\/div>\n<p><!-- \/wp:image --><!-- wp:paragraph --><\/p>\n<p>The running time, in this case will be less than 23 seconds while the entire &#8220;OSM planet&#8221; data will be scanned &#8211; which is 78.83 GB in May 2021. The <a href=\"https:\/\/cartopop.com\/wp-content\/uploads\/2021\/05\/fc3bace1-7440-4455-8c29-060391d1b281.csv\">result<\/a> contains all nodes that meet the request criteria and all available information about it: its spatial position, when it was created, who created it, the number of changeset in which it was uploaded, what &#8220;tag&#8221; was used, and so on.<\/p>\n<p><!-- \/wp:paragraph --><!-- wp:separator --><\/p>\n<hr class=\"wp-block-separator\" \/>\n<p><!-- \/wp:separator --><!-- wp:paragraph --><\/p>\n<p>\u00a0<\/p>\n<p>Here are some queries that can be useful:<\/p>\n<p><!-- \/wp:paragraph --><!-- wp:code {\"fontSize\":\"small\"} --><\/p>\n<pre class=\"wp-block-code has-small-font-size\"><code><strong>Number of changeset per user, where the comment is #tanzaniadevelopmenttrust<\/strong>\n\nSELECT user,\n         count(*) AS changesets\nFROM changesets\nWHERE regexp_like(tags['comment'], '(?i)#tanzaniadevelopmenttrust')\nGROUP BY  user\nORDER BY  count(*) DESC;\n\n\n<strong>Number of user who added the at least 1 #tanzaniadevelopmenttrust related changeset<\/strong>\n\nSELECT count(DISTINCT(uid)) AS \"Number of volunteers\"\nFROM changesets\nWHERE lower(changesets.tags['comment']) LIKE '%tanzaniadevelopmenttrust%'\n\n\n<strong>Number of user who added the at least 1 #tanzaniadevelopmenttrust related changeset since 1st of January 2020<\/strong>\n\nSELECT count(distinct(uid))\nFROM changesets\nWHERE lower(changesets.tags['comment']) LIKE '%tanzaniadevelopmenttrust%'\n\tAND changesets.created_at &gt; date '2020-01-01'\n\nNote\nIf you need an intervall You can use in this way:\nAND changesets.created_at &gt;= date '2020-01-01' AND changesets.created_at &lt; date '2020-10-01'\n\n\n<strong>Number of user who added the more than 20 #tanzaniadevelopmenttrust related changeset <\/strong>\n\nSELECT count(uid) FROM (\n  SELECT uid, count(id) AS num_changesets\n  FROM changesets\n  WHERE lower(changesets.tags['comment']) LIKE '%tanzaniadevelopmenttrust%'\n  GROUP BY uid\n) WHERE num_changesets &gt; 20\n\n\n<strong>All user - name; first edit; last edit; calculated lifespan and total edits <\/strong>\n\n SELECT\n     changesets.user,\n     min(created_at) AS first_edit,\n     max(created_at) AS last_edit,\n     date_diff('day', min(created_at), max(created_at)) AS lifespan,\n     date_diff('day', max(created_at), current_date) AS offline_since,\n     sum(num_changes) AS total_edits\n FROM\n     changesets\n WHERE\n     lower(changesets.tags['comment']) LIKE '%tanzaniadevelopmenttrust%' -- hotosm changesets only\n GROUP BY\n     changesets.user\n ORDER BY lifespan DESC\n\n\n<strong>All user - name; first edit; last edit; calculated lifespan and total edits but searching the app was used aka \u201ccreated_by\u201d\n<\/strong>\n SELECT\n     changesets.user,\n     min(created_at) AS first_edit,\n     max(created_at) AS last_edit,\n     date_diff('day', min(created_at), max(created_at)) AS lifespan,\n     sum(num_changes) AS total_edits\n FROM\n     changesets\n WHERE\n     lower(changesets.tags['created_by']) LIKE '%maps.me%'\n GROUP BY\n     changesets.user\n ORDER BY lifespan DESC\n \n\n<strong>Number of village, city, etc. was added with \u201cTNZ_missing_villages\u201d hashtag<\/strong>\n\nSELECT COUNT(planet.changeset)\nFROM planet\nJOIN changesets ON planet.changeset = changesets.id\nWHERE regexp_like(changesets.tags['comment'], '(?i)#TNZ_missing_villages') AND planet.type = 'node' AND planet.tags['place'] IS NOT NULL;\n\n\n<strong>Get all data from OSM in csv, where is a tag \u201cplace\u201d and  was added under #TNZ_missing_villages <\/strong>\n\nSELECT planet.*, changesets.tags\nFROM planet\nJOIN changesets ON planet.changeset = changesets.id\nWHERE regexp_like(changesets.tags['comment'], '(?i)#TNZ_missing_villages') AND planet.tags['place'] IS NOT NULL;\n\n\n<strong>Number of building which was added under #tanzaniadevelopmenttrust<\/strong>\n\nSELECT COUNT(planet.changeset)\nFROM planet\nJOIN changesets ON planet.changeset = changesets.id\nWHERE regexp_like(changesets.tags['comment'], '(?i)#tanzaniadevelopmenttrust') AND planet.type = 'way' AND planet.tags['building'] IS NOT NULL;\n\n\n<strong>Get all building as csv which was added under #tanzaniadevelopmenttrust<\/strong>\n\nSELECT planet.*, changesets.tags\nFROM planet\nJOIN changesets ON planet.changeset = changesets.id\nWHERE regexp_like(changesets.tags['comment'], '(?i)#tanzaniadevelopmenttrust') AND planet.type = 'way' AND planet.tags['building'] IS NOT NULL;\n\n\n<strong>Get all amenity was added under #tanzaniadevelopmenttrust<\/strong>\n\nSELECT  planet.*,\n         changesets.tags\nFROM planet\nJOIN changesets\n    ON planet.changeset = changesets.id\nWHERE regexp_like(changesets.tags['comment'], '(?i)#tanzaniadevelopmenttrust') AND planet.tags['amenity'] IS NOT NULL;\n\n\n<strong>Get all amenity within the box enclosing Tanzania<\/strong>\n\nSELECT * from planet\nWHERE type = 'node'\n  AND tags['amenity'] IN ('hospital', 'clinic', 'doctors')\n  AND lon BETWEEN 29.3269 AND 40.6584\n  AND lat BETWEEN -11.7612 AND -0.9854;\n\nMARA BOX\n  AND lon BETWEEN 33.1411 AND 35.28625\n  AND lat BETWEEN -0.99878 AND -2.50865;\n\nNote: you can get spatial coordinates for a drawn box from here: http:\/\/bboxfinder.com\/\n\n\n<strong>Get all amenity within the box enclosing Tanzania box - used maps.me<\/strong>\n\nSELECT  planet.*,\n         changesets.tags\nFROM planet\nJOIN changesets\n    ON planet.changeset = changesets.id\nWHERE lower(changesets.tags['created_by']) LIKE '%maps.me%'\n        AND planet.tags['amenity'] IS NOT NULL\n        AND lon BETWEEN 29.3269 AND 40.6584\n        AND lat BETWEEN -11.7612 AND -0.9854;\n\n        \n<strong>List all changeset tags which contains #tanzaniadevelopmenttrust<\/strong>\n\nSELECT changesets.tags \nFROM changesets \nWHERE lower(changesets.tags['comment']) LIKE '%tanzaniadevelopmenttrust%'\n\n\n<strong>Get all comments + timeset and user, wich contains #tanzaniadevelopmenttrust<\/strong>\n\nSELECT changesets.tags,\n     created_at AS timestamp,\n     user AS user\nFROM changesets \nWHERE lower(changesets.tags['comment']) LIKE '%tanzaniadevelopmenttrust%'\n\n\n<strong>Look for the app like maps.me was used for mapping <\/strong>\n\nSELECT changesets.user,\n         min(created_at) AS first_edit,\n         max(created_at) AS last_edit,\n         date_diff('day', min(created_at), max(created_at)) AS lifespan, sum(num_changes) AS total_edits\nFROM changesets\nWHERE lower(changesets.tags['created_by']) LIKE '%maps.me%'\n    AND changesets.created_at &gt; date '2020-01-01'\n    AND changesets.min_lat decimal(9,7) = -11.7612, changesets.max_lat decimal(9,7) = -0.9854, changesets.min_lon decimal(10,7) = 29.3269, changesets.max_lon decimal(10,7) = 40.6584\nGROUP BY  changesets.user\nORDER BY  lifespan DESC \n<\/code><\/pre>\n<p><!-- \/wp:code --><!-- wp:paragraph --><\/p>\n<p><!-- \/wp:paragraph --><!-- wp:paragraph --><\/p>\n<p><strong>Guides<\/strong><\/p>\n<p><!-- \/wp:paragraph --><!-- wp:paragraph --><\/p>\n<p><a href=\"https:\/\/aws.amazon.com\/blogs\/big-data\/querying-openstreetmap-with-amazon-athena\/#more-2069\">AWS Big Data Blog &#8211; Querying OpenStreetMap with Amazon Athena<\/a><\/p>\n<p><!-- \/wp:paragraph --><!-- wp:list --><\/p>\n<ul>\n<li><a href=\"https:\/\/aws.amazon.com\/blogs\/big-data\/querying-openstreetmap-with-amazon-athena\/#more-2069\">AWS Big Data Blog &#8211; Querying OpenStreetMap with Amazon Athena<\/a><\/li>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/athena\/latest\/ug\/querying-geospatial-data.html\">Querying Geospatial Data (Amazon Athena user guide)<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/d3netxer\/mapgive-metrics\">Tom Gertin: Querying OpenStreetMap with Amazon Athena<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/jenningsanderson\/aws-athena-workshop\">Jennings Anderson: aws-athena-workshop<\/a> (probably the best guide if you are too lazy)<\/li>\n<\/ul>\n<p><!-- \/wp:list --><!-- wp:paragraph --><\/p>\n<p><!-- \/wp:paragraph --><\/p>\t\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>OpenStreetMap (OSM) is a free, editable map of the world, created and maintained by volunteers and available for use under an open license. Companies and non-profits like Mapbox, Foursquare, Mapzen, the World Bank, the American Red Cross and others use OSM to provide maps, directions, and geographic context to users around the world. As an&hellip; <br \/> <a class=\"read-more\" href=\"https:\/\/cartopop.com\/index.php\/2021\/05\/14\/querying-openstreetmap-with-amazon-athena\/\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[1],"tags":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/cartopop.com\/index.php\/wp-json\/wp\/v2\/posts\/282"}],"collection":[{"href":"https:\/\/cartopop.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cartopop.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cartopop.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cartopop.com\/index.php\/wp-json\/wp\/v2\/comments?post=282"}],"version-history":[{"count":18,"href":"https:\/\/cartopop.com\/index.php\/wp-json\/wp\/v2\/posts\/282\/revisions"}],"predecessor-version":[{"id":306,"href":"https:\/\/cartopop.com\/index.php\/wp-json\/wp\/v2\/posts\/282\/revisions\/306"}],"wp:attachment":[{"href":"https:\/\/cartopop.com\/index.php\/wp-json\/wp\/v2\/media?parent=282"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cartopop.com\/index.php\/wp-json\/wp\/v2\/categories?post=282"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cartopop.com\/index.php\/wp-json\/wp\/v2\/tags?post=282"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}