Whenever I give hush’s Redshift cluster some love, I often get back 50GB+ of space. It’s glorious. Ideally your ETL scripts will run VACUUMs for you as post jobs, but for general maintenance, here is my workflow:
Reclaim disk space
When you delete rows in Redshift, it doesn’t actually free up that space for new data.
VACUUMing allows Redshift to reclaim it, freeing up more storage capacity.
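If reclaiming space is all you need, Redshift’s VACUUM supports a DELETE ONLY mode for exactly this case (the table name below is a placeholder):

```sql
-- Reclaim space left behind by deleted rows, without re-sorting.
-- "events" is a placeholder table name.
VACUUM DELETE ONLY events;
```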
Query performance via sorted tables
VACUUM also re-sorts your tables based on their sort keys. This is great for query performance.
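Conversely, if you only need to restore sort order (say, after heavy appends of out-of-order data), there is a SORT ONLY mode; a plain VACUUM (FULL is the default) does both the space reclamation and the re-sort in one pass. Table name is again a placeholder:

```sql
-- Re-sort rows according to the table's sort key without
-- reclaiming deleted-row space. "events" is a placeholder name.
VACUUM SORT ONLY events;
```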
Query performance via query planner
Between VACUUM cycles, Redshift’s stats (used for query planning) can get out of sync. Running
ANALYZE once in a while will allow the query planner to best optimize your queries for maximum performance.
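Running it is a one-liner; ANALYZE with no table name covers the whole database, or you can target a single table (“events” below is a placeholder):

```sql
-- Refresh planner statistics for one table, then for everything.
ANALYZE events;
ANALYZE;
```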
What you’ll need
- CREATE SCHEMA admin, in case you don’t have one yet
- Create the v_space_used_per_tbl and v_extended_table_info views by executing their SQL snippets. You can find a bunch of other useful admin views here too.
- Tables with sort keys, or else VACUUM only reclaims disk space and doesn’t re-sort anything.
v_space_used_per_tbl is a great view that shows you how many unsorted rows you have, as well as the percentage unsorted.
- Any table with a pct_unsorted greater than 15%, hit it with a VACUUM.
v_extended_table_info will additionally show you a stats_off column.
- Any table with a stats_off greater than 20, hit it with an ANALYZE.
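The thresholds above can be turned into a quick candidate list with a couple of queries against the admin views. This is a sketch: column names other than pct_unsorted and stats_off are assumptions, so check the view definitions you installed:

```sql
-- Tables worth a VACUUM: more than 15% unsorted rows.
SELECT tablename, pct_unsorted
FROM admin.v_space_used_per_tbl
WHERE pct_unsorted > 15
ORDER BY pct_unsorted DESC;

-- Tables worth an ANALYZE: stats more than 20 off.
SELECT tablename, stats_off
FROM admin.v_extended_table_info
WHERE stats_off > 20
ORDER BY stats_off DESC;
```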