Josh Peng

Full-Stack Data Scientist

github linkedin email
Happy Redshift = Happy Me: Optimization Maintenance
Jul 6, 2018
2 minutes read

Keeping Redshift happy is simple, but tedious. Make sure to VACUUM your tables often, especially if you have incremental ETLs running. Every once in a while you’ll also need to run ANALZYE too.

Whenever I give hush ’s Redshift cluster some love, I often get back 50GB+ of space. It’s glorious. Ideally you’ll have your ETL scripts run VACUUMs for you as post jobs though, but for general maintenance, here is my workflow:

Outcomes

  1. Reclaim disk space

    When you delete rows in Redshift, it doesn’t actually reallocate that space for new data. VACUUMing allows Redshift to reclaim it and thus frees up more storage capacity for usage.

  2. Query performance via sorted tables

    VACUUM also re-sorts your tables based on it’s sort key. This is great for query performance.

  3. Query performance via query planner

    After several VACUUM cycles, Redshift’s stats (used for query planning) can get out of sync. Running ANALYZE once in a while will allow the query planner to best optimize your queries for maximum performance.

What you’ll need

  1. CREATE SCHEMA admin in case you don’t have one yet
  2. Create the following views v_space_used_per_tbl and v_extended_table_info by executing the SQL snippets. You can find a bunch of other useful admin views here too.
  3. Tables with sort keys or else VACUUM only reclaims disk space, but not sorting.

Workflow

  1. v_space_used_per_tbl is a great view that shows you how many unsorted rows you have as well as the percentage unsorted.
  2. Any tables with a pct_unsorted greater than 15%, hit it with a VACUUM some_schema.tablename
  3. v_extended_table_info additionally will show you a stats_off column.
  4. Any tables with stats_off greater than 20, hit it with an ANALYZE some_schema.tablename

Tags: #redshift

Back to posts


comments powered by Disqus