Spatial Search of Amazon S3 Express One Zone Data with Amazon Athena, Visualized in QGIS

I tried a spatial search of Amazon S3 Express One Zone data with Amazon Athena and visualized it with QGIS.

Published Dec 18, 2023

I previously posted an article that ran this verification with S3 Standard. This time, I ran it with the new S3 Express One Zone announced at re:Invent 2023, focusing on Athena integration, spatial search, and whether S3 Express One Zone improves search speed!
S3 Express One Zone is an Amazon S3 storage class focused on high performance. It is also available in the Tokyo Region and is designed to deliver up to 10 times better performance than the S3 Standard storage class. In addition, the request fee is 50% lower than S3 Standard. To use it, you create a specific bucket type called a "directory bucket."

Prepare GIS data for use with Amazon Athena. This time, I created four types of sample data in QGIS in advance.
I prepared GIS data for points, lines, and polygons as CSV files (tab-separated, TSV format).
I also prepared GIS data with 1 million points as a CSV file (TSV format).
I have registered this sample data on GitHub, so please feel free to use it.
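For readers who want to build similar test data without QGIS, here is a minimal sketch that writes random points as tab-separated text with a WKT column. The `name` and `wkt` column names follow the queries later in this article; the header row, the `point_N` naming, and the bounding box are assumptions for illustration, not the layout of the published sample files.

```python
import csv
import io
import random


def make_random_points_tsv(count, bbox, seed=0):
    """Return TSV text with a header row and `count` random WKT points.

    bbox is (min_lon, min_lat, max_lon, max_lat).
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["name", "wkt"])  # assumed header row
    for i in range(count):
        lon = rng.uniform(min_lon, max_lon)
        lat = rng.uniform(min_lat, max_lat)
        writer.writerow([f"point_{i}", f"POINT ({lon:.6f} {lat:.6f})"])
    return buffer.getvalue()


# Hypothetical bounding box roughly around central Tokyo.
sample_tsv = make_random_points_tsv(5, (139.6, 35.6, 139.9, 35.8))
print(sample_tsv)
```

Raising `count` to 1,000,000 and writing the text to a file would give a data set of the same scale as the large point file used here.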

Create buckets and register data with Amazon S3 Express One Zone.
Click AWS Management Console → S3.
Click "Create Bucket".
Set the Region, select "Directory" as the bucket type, then choose the Availability Zone and enter the base name.
A bucket with the specified name is created.
Select the target bucket → Click "Upload."
Select the file you want to register → Click "Upload."
Check the uploaded files.
The four types of CSV (TSV format) were saved in a directory bucket with an arbitrary name.
This completes the data registration for S3 Express One Zone!
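The console steps above can also be scripted. Directory bucket names follow the pattern `<base-name>--<az-id>--x-s3`, which is why the console asks for a base name and an Availability Zone. The sketch below assumes boto3 is installed and AWS credentials are configured; the base name `gis-sample` and the Availability Zone ID `apne1-az4` are hypothetical placeholders, not the values used in this article.

```python
def directory_bucket_name(base_name, az_id):
    """Build a directory bucket name: <base-name>--<az-id>--x-s3."""
    return f"{base_name}--{az_id}--x-s3"


def create_bucket_request(base_name, az_id):
    """Return keyword arguments for the boto3 S3 create_bucket call."""
    return {
        "Bucket": directory_bucket_name(base_name, az_id),
        "CreateBucketConfiguration": {
            "Location": {"Type": "AvailabilityZone", "Name": az_id},
            "Bucket": {
                "Type": "Directory",
                "DataRedundancy": "SingleAvailabilityZone",
            },
        },
    }


def create_and_upload(base_name, az_id, region, local_path, key):
    """Create the directory bucket and upload one file (needs AWS credentials)."""
    import boto3  # imported here so the helpers above work without boto3

    s3 = boto3.client("s3", region_name=region)
    request = create_bucket_request(base_name, az_id)
    s3.create_bucket(**request)
    s3.upload_file(local_path, request["Bucket"], key)
    return request["Bucket"]


# Hypothetical base name and Availability Zone ID.
print(directory_bucket_name("gis-sample", "apne1-az4"))
```

Calling `create_and_upload("gis-sample", "apne1-az4", "ap-northeast-1", "polygon.csv", "polygon/polygon.csv")` would then mirror the manual upload, with the file and key names again being placeholders.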

This is how to set the query destination in Amazon Athena.
Prepare an S3 bucket with an arbitrary name for the query destination in advance.
Click AWS Management Console → Athena.
Click on "Check query editor for details."
Click on "View Settings."
Click "Manage."
Specify the S3 bucket where you want to save the query results → Click "Save."
The query destination is set.
The setting of the query destination is now complete!
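The same setting can likely be applied through the Athena API instead of the console. This is a minimal sketch, assuming the default `primary` workgroup, boto3 with configured credentials, and a hypothetical results bucket and prefix; it is not how the setting was made in this article.

```python
def result_location_update(bucket, prefix="athena-results/", workgroup="primary"):
    """Return keyword arguments for the boto3 Athena update_work_group call."""
    return {
        "WorkGroup": workgroup,
        "ConfigurationUpdates": {
            "ResultConfigurationUpdates": {
                "OutputLocation": f"s3://{bucket}/{prefix}",
            }
        },
    }


def apply_result_location(bucket, region):
    """Set the query result location for the workgroup (needs AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3

    athena = boto3.client("athena", region_name=region)
    athena.update_work_group(**result_location_update(bucket))


# Hypothetical bucket name for query results.
print(result_location_update("my-athena-results")["ConfigurationUpdates"])
```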

This is how to create a table in Amazon Athena.
Click the Athena editor → Create Table and View → "S3 Bucket Data."
Set the table name, database, target S3 bucket, data format, and columns. Check the preview → Click "Create Table."
S3 Express One Zone buckets are currently not displayed in the list, so you must enter the address directly. The address should be prefixed with "s3://".
This time I created four tables with arbitrary names. Select the target table → Click "Preview Table."
The retrieved records are displayed.
Now your table creation is complete! We have confirmed that the table can be read by Athena with no problem in S3 Express One Zone.
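The table form in the console generates a DDL statement behind the scenes. As a rough sketch of what an equivalent statement could look like, the helper below builds a `CREATE EXTERNAL TABLE` for a tab-separated file with `name` and `wkt` string columns. The bucket name, the `polygon/` prefix, and the header-skipping table property are assumptions; adjust them to the actual files.

```python
def create_table_ddl(database, table, location, skip_header=True):
    """Build Athena DDL for a tab-separated table with name and wkt columns."""
    ddl = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        "  name string,\n"
        "  wkt string\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'\n"
        f"LOCATION '{location}'"
    )
    if skip_header:
        # Assumes the file starts with one header row.
        ddl += "\nTBLPROPERTIES ('skip.header.line.count'='1')"
    return ddl


# Hypothetical directory bucket and prefix; the address is entered directly
# with the s3:// prefix because the bucket is not shown in the list.
ddl = create_table_ddl(
    "geospatial_database",
    "polygon_table",
    "s3://gis-sample--apne1-az4--x-s3/polygon/",
)
print(ddl)
```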

Finally, here is how to do a spatial search in Amazon Athena.
First, get the centroid of each polygon. Run the query and download the result data.
S3 Standard
Time in queue: 0.243 sec, Run time: 0.799 sec, Data scanned: 1.5KB
S3 Express One Zone
Time in queue: 0.120 sec, Run time: 0.899 sec, Data scanned: 1.5KB
SELECT "geospatial_database"."polygon_table"."name",
ST_Centroid(ST_GeometryFromText("geospatial_database"."polygon_table"."wkt"))
FROM "geospatial_database"."polygon_table";
Visualize the downloaded data in QGIS to confirm the processed data.
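To sanity-check a centroid outside Athena, the area centroid of a simple polygon can be computed with the shoelace formula. This is a minimal sketch that treats coordinates as planar and ignores holes; it illustrates the idea and is not a description of how `ST_Centroid` is implemented.

```python
def polygon_centroid(ring):
    """Area centroid of a simple polygon given as a list of (x, y) vertices."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring if needed
    area2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("polygon has zero area")
    return cx / (3 * area2), cy / (3 * area2)


# A 2 x 2 square has its centroid at (1, 1).
print(polygon_centroid([(0, 0), (2, 0), (2, 2), (0, 2)]))
```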
Next, get the start point of each line. Run the query and download the result data.
S3 Standard
Time in queue: 0.175 sec, Run time: 0.601 sec, Data scanned: 1.05KB
S3 Express One Zone
Time in queue: 0.119 sec, Run time: 0.948 sec, Data scanned: 1.05KB
SELECT "geospatial_database"."line_table"."name",
ST_StartPoint(ST_GeometryFromText("geospatial_database"."line_table"."wkt"))
FROM "geospatial_database"."line_table";
Visualize the downloaded data in QGIS to confirm the processed data.
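The start point of a line is simply the first coordinate pair in its WKT. A small sketch for checking a downloaded result by hand, assuming plain two-dimensional `LINESTRING` text (the sample coordinates are made up):

```python
import re


def linestring_start_point(wkt):
    """Return the first (x, y) pair of a 2D WKT LINESTRING."""
    match = re.fullmatch(r"\s*LINESTRING\s*\((.+)\)\s*", wkt, flags=re.IGNORECASE)
    if not match:
        raise ValueError(f"not a LINESTRING: {wkt!r}")
    first_pair = match.group(1).split(",")[0].split()
    return float(first_pair[0]), float(first_pair[1])


# Hypothetical line with two vertices.
print(linestring_start_point("LINESTRING (139.70 35.65, 139.75 35.68)"))
```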
Next, get only the points that fall within the polygons. Run the query and download the result data.
S3 Standard
Time in queue: 0.313 sec, Run time: 1.230 sec, Data scanned: 2.01KB
S3 Express One Zone
Time in queue: 0.073 sec, Run time: 0.993 sec, Data scanned: 2.01KB
SELECT "geospatial_database"."point_table"."name", "geospatial_database"."point_table"."wkt"
FROM "geospatial_database"."point_table", "geospatial_database"."polygon_table"
WHERE ST_Within(ST_GeometryFromText("geospatial_database"."point_table"."wkt"),
ST_GeometryFromText("geospatial_database"."polygon_table"."wkt"));
Visualize the downloaded data in QGIS to confirm the processed data.
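The query pairs every point with every polygon and keeps the pairs where the point lies inside. The containment test itself can be illustrated with the classic ray casting method. This is a sketch for intuition only: it assumes a simple polygon without holes, treats coordinates as planar, and does not handle points exactly on the boundary the way `ST_Within` defines them.

```python
def point_in_polygon(point, ring):
    """Ray casting test: True if the point is strictly inside the polygon ring."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square))
print(point_in_polygon((5, 2), square))
```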
Finally, get only the points, out of 1 million, that fall within the polygons. The response time is fast even when searching a large amount of GIS data. Run the query and download the result data.
S3 Standard
Time in queue: 0.220 sec, Run time: 2.832 sec, Data scanned: 46.41MB
S3 Express One Zone
Time in queue: 0.117 sec, Run time: 2.843 sec, Data scanned: 46.41MB
SELECT "geospatial_database"."randompoint_table"."name", "geospatial_database"."randompoint_table"."wkt"
FROM "geospatial_database"."randompoint_table", "geospatial_database"."polygon_table"
WHERE ST_Within(ST_GeometryFromText("geospatial_database"."randompoint_table"."wkt"),
ST_GeometryFromText("geospatial_database"."polygon_table"."wkt"));
Visualize the downloaded data in QGIS to confirm the processed data.
By using Amazon Athena, a spatial search of data registered in S3 becomes possible!
This verification confirmed that even when using S3 Express One Zone, it is possible to link with Athena and realize spatial search.
As for spatial search, one query (points within polygons) ran about 1.2 times faster on S3 Express One Zone (run time 0.993 sec versus 1.230 sec on S3 Standard), but the other three queries were no faster, so no significant overall speed improvement was observed. This may be because S3 Express One Zone specializes in processing many small files and may not be suitable for the large spatial search data used in this verification.
However, in terms of cost, the 50% lower request fee compared with S3 Standard is a clear advantage of S3 Express One Zone!
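The run times reported above can be compared directly. The short sketch below computes the ratio of S3 Standard run time to S3 Express One Zone run time for each query, using only the figures listed in this article; a value above 1 means S3 Express One Zone was faster. Each figure comes from a single run, so small differences should be read with caution.

```python
# (S3 Standard run time, S3 Express One Zone run time) in seconds.
run_times = {
    "centroid of polygons": (0.799, 0.899),
    "start point of lines": (0.601, 0.948),
    "points within polygons": (1.230, 0.993),
    "1 million points within polygons": (2.832, 2.843),
}


def speedup(standard_sec, express_sec):
    """Ratio of S3 Standard run time to S3 Express One Zone run time."""
    return standard_sec / express_sec


for name, (standard_sec, express_sec) in run_times.items():
    print(f"{name}: {speedup(standard_sec, express_sec):.2f}x")
```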