As mentioned in my last post – Timestamps in Vertica – we had a query which caused our cluster of servers to run at over 90% CPU utilization, when the normally in 10-20 percent range. This query was slowing down all of the other queries that were running at the time, even to the point where the query SELECT 1 FROM dual
was taking 1200+ milliseconds.
In this case, the trick to finding the culprit was to find the queries that had been running for a long period of time.
SELECT *, (start_timestamp - statement_timestamp()) AS current_duration FROM query_requests WHERE is_executing = 't' ORDER BY start_timestamp;
We look at the query_requests
table, and find those queries that are still executing, and sort them by their start time. I also added in a column current_duration
so we could easily determine how long the queries have been running.
With that we found a query that was the culprit, killed that job, and the database cluster recovered quickly.
Hope this can help someone else when they have some runaway queries they need to find.
–Proctor
Pingback: Dew Drop – August 5, 2014 (#1829) | Morning Dew