
Keeping Redshift happy is simple, but tedious. Make sure to `VACUUM` your tables often, especially if you have incremental ETLs running. Every once in a while you'll also need to run `ANALYZE` too.

Whenever I give hush's Redshift cluster some love, I often get back 50GB+ of space. It's glorious. Ideally your ETL scripts will run `VACUUM`s for you as post jobs, but for general maintenance, here is my workflow:
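For the post-job case, a minimal sketch is just tacking both commands onto the end of each incremental load (the table name here is a placeholder):

```sql
-- Post-ETL maintenance step; some_schema.events is a hypothetical table.
VACUUM some_schema.events;
ANALYZE some_schema.events;
```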
Outcomes
- **Reclaim disk space**: When you delete rows in Redshift, it doesn't actually free that space for new data; the rows are only marked as deleted. `VACUUM`ing allows Redshift to reclaim the space and thus frees up more storage capacity.
- **Query performance via sorted tables**: `VACUUM` also re-sorts your tables based on their sort keys. This is great for query performance.
- **Query performance via query planner**: After several `VACUUM` cycles, Redshift's stats (used for query planning) can get out of sync. Running `ANALYZE` once in a while lets the query planner best optimize your queries.
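Worth noting: a plain `VACUUM` defaults to a full vacuum (reclaim and re-sort), but Redshift also lets you target one of these outcomes at a time:

```sql
VACUUM FULL some_schema.tablename;        -- reclaim space AND re-sort (the default)
VACUUM DELETE ONLY some_schema.tablename; -- reclaim space from deleted rows, skip sorting
VACUUM SORT ONLY some_schema.tablename;   -- re-sort rows, skip reclaiming space
```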
What you’ll need
- `CREATE SCHEMA admin` in case you don't have one yet.
- Create the views `v_space_used_per_tbl` and `v_extended_table_info` by executing their SQL snippets; you can find a bunch of other useful admin views in AWS's amazon-redshift-utils repo too. (A simplified stand-in is sketched after this list.)
- Tables with sort keys, or else `VACUUM` only reclaims disk space without re-sorting anything.
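If you just want something runnable to follow along with, here's a hedged sketch. The real view definitions live in the amazon-redshift-utils repo; this is a simplified, hypothetical stand-in built on Redshift's `SVV_TABLE_INFO` that exposes the two columns the workflow below relies on:

```sql
CREATE SCHEMA IF NOT EXISTS admin;

-- Simplified stand-in, NOT the real v_space_used_per_tbl / v_extended_table_info
-- snippets; the name and column picks are assumptions based on SVV_TABLE_INFO.
CREATE OR REPLACE VIEW admin.v_table_maintenance_info AS
SELECT "schema"  AS schemaname,
       "table"   AS tablename,
       tbl_rows  AS total_rows,
       unsorted  AS pct_unsorted, -- percent of rows in the unsorted region
       stats_off                  -- how stale the planner stats are (0 = fresh)
FROM svv_table_info;
```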
Workflow
- `v_space_used_per_tbl` is a great view that shows you how many unsorted rows you have, as well as the percentage unsorted.
- For any table with a `pct_unsorted` greater than 15%, hit it with a `VACUUM some_schema.tablename`.
- `v_extended_table_info` additionally shows you a `stats_off` column.
- For any table with `stats_off` greater than 20, hit it with an `ANALYZE some_schema.tablename` (see the sketch after this list).
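Put together, one pass of the workflow looks something like this sketch, which uses the thresholds above and the stand-in view from earlier (the real admin views expose more columns):

```sql
-- 1. Find tables worth vacuuming (> 15% unsorted):
SELECT schemaname, tablename, pct_unsorted
FROM admin.v_table_maintenance_info
WHERE pct_unsorted > 15
ORDER BY pct_unsorted DESC;

-- 2. Hit each result with a vacuum (placeholder table name):
VACUUM some_schema.tablename;

-- 3. Find tables with stale planner stats (stats_off > 20):
SELECT schemaname, tablename, stats_off
FROM admin.v_table_maintenance_info
WHERE stats_off > 20
ORDER BY stats_off DESC;

-- 4. Refresh their stats:
ANALYZE some_schema.tablename;
```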