AWS Athena is a service for running SQL queries against files held in S3. It supports a variety of file formats, including text formats such as CSV and JSON and more structured formats such as ORC, Apache Parquet and Avro. Athena integrates out of the box with AWS Glue, whose Data Catalog holds its table definitions.
Getting the best out of Athena can require some additional preparation of the data. However, at its heart it is a system that lets you dump files in a bucket and run meaningful queries against them without having to provision a database such as an RDS instance or configure an ETL (Extract, Transform, Load) process.
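To make the "dump files in a bucket and query them" idea concrete, here is a minimal sketch that builds the kind of `CREATE EXTERNAL TABLE` statement Athena uses to map a table onto CSV files in S3. The helper name `create_table_ddl`, the database, table and column names, and the bucket path are all hypothetical, and the sketch assumes comma-delimited files with a single header row. It only generates the SQL text; submitting it to Athena is left out.

```python
from typing import Iterable, Tuple


def create_table_ddl(
    database: str,
    table: str,
    columns: Iterable[Tuple[str, str]],
    location: str,
) -> str:
    """Build a CREATE EXTERNAL TABLE statement for CSV files under an S3 prefix.

    All names passed in are illustrative; nothing is sent to AWS here.
    """
    # Backticks guard column names that collide with reserved words.
    cols = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    # Table locations are S3 prefixes, so make sure there is a trailing slash.
    prefix = location.rstrip("/") + "/"
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {cols}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{prefix}'\n"
        # Assumption: each file starts with one header line to skip.
        "TBLPROPERTIES ('skip.header.line.count' = '1')"
    )


if __name__ == "__main__":
    print(
        create_table_ddl(
            "demo_db",
            "events",
            [("id", "bigint"), ("name", "string"), ("created", "timestamp")],
            "s3://example-bucket/events",
        )
    )
```

Once a statement like this has been run, the files under the prefix can be queried with ordinary `SELECT` statements, and new files dropped under the same prefix become visible without any load step.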
- Allows you to query data stored in an S3 bucket
- Requires no up-front investment in database servers
- Cheap (\$5 per 1TB of data scanned)
- Fast for bulk queries over large volumes of data
- Multiple avenues to improve on base performance
- Includes GIS libraries
- While Athena can be a good choice for querying static data, on the whole it's not suited to read/write transactional workloads. This is because its latency is relatively high and it can't mutate data in place.
- Warm-up overhead makes it unsuited to low-latency UI code
- No parameterized queries in early releases (prepared statements via `PREPARE` and `EXECUTE` were added later)
- Querying a subset of the data in S3 requires careful partition design; without it, queries fall back to relatively expensive and slow full data scans. Amazon Kinesis Data Firehose can be used to batch and partition data automatically.
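The interaction between the per-terabyte price and partition design can be illustrated with a rough calculation. The sketch below lays objects out under Hive-style `year=/month=/day=` prefixes and compares the cost of scanning a whole dataset with the cost of scanning one day's partition. The function names, the bucket path and the dataset sizes are hypothetical. The price is the \$5 per TB figure quoted in the list above, and treating 1 TB as 1024^4 bytes is an assumption. Per-query minimums and compression are ignored.

```python
from datetime import date

PRICE_PER_TB_USD = 5.0     # figure quoted in the list above
BYTES_PER_TB = 1024 ** 4   # assumption: binary terabyte
BYTES_PER_GB = 1024 ** 3   # assumption: binary gigabyte


def partition_prefix(base: str, day: date) -> str:
    """Return a Hive-style key prefix for one day's data."""
    return (
        f"{base.rstrip('/')}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


def scan_cost_usd(bytes_scanned: int) -> float:
    """Estimate query cost from bytes scanned (ignores per-query minimums)."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    # Hypothetical dataset: one year of data at 10 GB per day.
    daily_bytes = 10 * BYTES_PER_GB
    full_scan = scan_cost_usd(365 * daily_bytes)
    one_day = scan_cost_usd(daily_bytes)

    print(partition_prefix("s3://example-bucket/events", date(2024, 1, 15)))
    print(f"full scan: ${full_scan:.2f}, single partition: ${one_day:.4f}")
```

With this layout, a query that filters on the partition columns only needs to read the objects under the matching prefixes, which is where both the cost and the latency savings come from.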