Skip to main content
Ben Nadel at the NYC Tech Talk Meetup (Aug. 2010) with: John Britton
Ben Nadel at the NYC Tech Talk Meetup (Aug. 2010) with: John Britton ( @johndbritton )

Killing Slow MySQL Queries In An Emergency In Lucee CFML 5.3.7.47

By on
Tags: ,

At InVision, we recently upgraded our MySQL database servers to use MySQL 5.7.x. And, while I'm excited for access to new features like the native JSON column type and the sys performance schema, the upgrade did have some bumps. In particular, we had a subset of SQL queries that started running with terrible performance (typically those that have very large IN(id-list) clauses). I had to refactor a number of queries; but, while that code was being changed, I had to put together an emergency utility that would allow me to start killing SQL queries in production. I thought this was a fun little exercise that would be worth sharing in Lucee CFML 5.3.7.47.

ASIDE: Before I started killing queries manually, I did try adding a timeout to my <cfquery> tags. Unfortunately, this did not seem to have any impact. I am not sure why the timeout didn't kill long-running queries in Lucee CFML - I know that this mostly worked when the code was running on Adobe ColdFusion.

This emergency utility has two main aspects: the ProcessListGateway.cfc, which gives me access to the MySQL processlist that returns all the database thread-states; and, a View that lists the currently-running queries and allows me to kill them individually.

Let's look at my ColdFusion component gateway first. It has two main methods: one that returns the processlist and one that executes a KILL QUERY SQL statement against a given thread ID. Each of these queries takes a datasource against which to run the query since we have multiple replicas in production.

component
	output = false
	hint = "I provide low-level query methods for MySQL process-list interactions (for emergency use)."
	{

	/**
	* I return the datasources that can be inspected (and subsequently affected).
	*/
	public array function getDatasources() {

		return([ "primary", "replica", "replica2" ]);

	}


	/**
	* I get the process list info for the given datasource.
	* 
	* @datasource I am the ColdFusion datasource being inspected.
	* @minTime I am the time over which a query must be running in order to be included.
	*/
	public query function getProcessList(
		required string datasource,
		required numeric minTime
		) {

		```
		<cfquery name="local.results" datasource="#datasource#">
			/* DEBUG: processListGateway.getProcessList(). */
			SELECT
				pl.id,
				pl.time,
				pl.state,
				LEFT( pl.info, 300 ) AS info
			FROM
				INFORMATION_SCHEMA.PROCESSLIST pl
			WHERE
				pl.time > <cfqueryparam value="#minTime#" sqltype="integer" />
			AND
				pl.state != ''
			ORDER BY
				pl.time DESC
			;
		</cfquery>
		```

		return( results );

	}


	/**
	* I kill the query with the given ID in the given datasource.
	* 
	* CAUTION: Will throw an error if the given query ID / thread cannot be found.
	* 
	* @datasource I am the ColdFusion datasource being inspected.
	* @queryID I am the thread being killed.
	*/
	public void function killQuery(
		required string datasource,
		required numeric queryID
		) {

		```
		<cfquery name="local.results" datasource="#datasource#">
			/* DEBUG: processListGateway.killQuery(). */
			KILL QUERY <cfqueryparam value="#queryID#" sqltype="integer" />;
		</cfquery>
		```

	}

}

There's not too much going on here. I am using Tag Islands to run SQL inside <cfscript>, which is really just the best thing since sliced-bread! But, other than that, you can't do much. Since this is for emergency use only and running in production (behind an administrative login of course), I wanted to make this was as restrictive as possible.

With this ColdFusion gateway in toe, I then created a simple UI (User Interface) which lists out the running queries in the processlist and allows me to selectively run a KILL QUERY command against a given thread-ID:

<cfscript>

	// Param URL variables.
	param name="url.datasource" type="string" default="";
	param name="url.minTime" type="numeric" default="5";
	param name="url.queryID" type="numeric" default="0";

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	processListGateway = new ProcessListGateway();
	datasources = processListGateway.getDatasources();
	errorMessage = "";

	// Make sure that we are consuming one of the valid datasources. Since we are about
	// to start killing queries, we want to make sure this is as locked-down as possible.
	if ( ! url.datasource.len() ) {

		url.datasource = datasources.first();

	} else if ( ! datasources.contains( url.datasource ) ) {

		throw( type = "InvalidDatasource" );

	}

	// If we have a query ID / thread in the process-list, let's try to kill it.
	if ( url.queryID ) {

		try {

			processListGateway.killQuery( url.datasource, url.queryID );

		} catch ( "database" error ) {

			errorMessage = "Query with ID [#url.queryID#] could not be killed. Message: #error.message#";

		}

	}

	// NOTE: Since the goal here is to kill LONG-RUNNING queries, we're going to supply
	// a min-time that will only return queries that have run LONGER than the given
	// threshold. This way, our process-list won't be littered with common queries that
	// happen to be running at the time this ColdFusion script is run.
	processList = processListGateway.getProcessList( url.datasource, url.minTime );

</cfscript>
<cfoutput>

	<!doctype html>
	<html lang="en">
	<head>
		<meta charset="utf-8" />
		<title>
			Process List (Emergency Use Only)
		</title>
		<link rel="stylesheet" type="text/css" href="./styles.css" />
	</head>
	<body>

		<h1>
			Process List (Emergency Use Only)
		</h1>

		<p>
			<strong>Datasources:</strong>

			<cfloop value="datasource" array="#datasources#">
				<cfif ( datasource eq url.datasource )>					
					&laquo; <strong><a href="?datasource=#encodeForUrl( datasource )#">#encodeForHtml( datasource )#</a></strong> &raquo;
				<cfelse>
					<a href="?datasource=#encodeForUrl( datasource )#">#encodeForHtml( datasource )#</a>
				</cfif>
			</cfloop>
		</p>

		<h2>
			Current Process List For "#encodeForHtml( url.datasource )#"
		</h2>

		<cfif processList.recordCount>

			<table width="100%" border="1" cellpadding="5" cellspacing="0">
			<thead>
				<tr>
					<th scope="col"> ID </th>
					<th scope="col"> Time </th>
					<th scope="col"> State </th>
					<th scope="col"> Info (truncated) </th>
					<th scope="col"> Actions </th>
				</tr>
			</thead>
			<tbody>
				<cfloop query="#processList#">
					<tr>
						<td>
							#encodeForHtml( processList.id )#
						</td>
						<td>
							#numberFormat( processList.time )#s
						</td>
						<td>
							#encodeForHtml( processList.state )#
						</td>
						<td class="sql">
							#encodeForHtml( processList.info )#
						</td>
						<td class="actions">
							<a href="?datasource=#encodeForUrl( url.datasource )#&queryID=#encodeForUrl( processList.id )#">Kill query</a>
						</td>
					</tr>
				</cfloop>
			</tbody>
			</table>

		<cfelse>

			<p>
				<em>There are no running queries in the selected datasource.</em>
			</p>

		</cfif>

		<!--- Show any error from attempting to kill a query. --->
		<cfif errorMessage.len()>
			
			<h2>
				Error Message
			</h2>

			<p>
				#encodeForHtml( errorMessage )#
			</p>

		</cfif>

	</body>
	</html>

</cfoutput>

As you can see, this is just some old-school requests-response life-cycle style programming. And by, "old-school", what I really mean is solid, battle-tested, and just clever enough to get the job done. And, if I kick off some SLEEP() queries in MySQL and run this ColdFusion page, we get the following output:

Slow queries being killed in MySQL using the processlist, kill query, and Lucee CFML.

As you can see, the slow SQL queries start showing up; and, I'm able to kill the ones that I know will start crushing the database CPU.

Obviously, killing SQL queries in production is a "Bad Thing"™ since a prematurely-terminated query will almost certainly lead to a user-facing issue (data not loading, for example). However, in an emergency situation, I'll take the occasional user-facing issue over 100% CPU utilization every day of the week. And now, I have this release valve in Lucee CFML 5.3.7.47 that I can exercise if - and only if - I need to.

Epilogue: Why Not Just Run These SQL Queries Directly in the Database?

You may be asking yourself, why go through all the ColdFusion rigmarole? Why not just run the processlist and kill query SQL statements directly against the database. Well, for starters, doing that once is easy - doing that a number of times is challenging and tedious. Being able to point-and-click in a ColdFusion page makes this much faster and repeatable.

But, more importantly, you don't just give people WRITE-access to a production database. If I wanted to run a kill query statement in production without this code, I'd have to write the SQL query and then open a ticket for our database team to run it on my behalf. Given that our database team is busy, a ticket like this could take hours or event days to fulfill. And, given the fact that I needed to do this multiple times during the emergency window, this would not be a sustainable approach.

By embedding this in the ColdFusion application (behind a secure administrative login), I make it flexible and repeatable and secure. And, by locking the SQL down to accepting only a datasource and a queryID, it's significantly more secure when compared to an engineer with write-access to the database running the query on my behalf.

Want to use code from this post? Check out the license.

Reader Comments

15,674 Comments

@All,

In case this is interesting to anyone, here's the ColdFusion code I was using to spawn the long-running queries for testing:

<cfscript>

	names = [ "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k" ];

	times = names.each(
		( name ) => {

			```
			<cfquery name="local.result" datasource="primary">
				/* SLEEP FOR #name#. */
				SELECT SLEEP( #randRange( 10, 300 )# ) AS #name#;
			</cfquery>
			```

		},
		true // Run in parallel threads.
	);

</cfscript>

I just love love love the ability to run array iteration in parallel in Lucee CFML. So many great use-cases for it.

205 Comments

@Ben

ASIDE: Before I started killing queries manually, I did try adding a timeout to my <cfquery> tags. Unfortunately, this did not seem to have any impact. I am not sure why the timeout didn't kill long-running queries in Lucee CFML - I know that this mostly worked when the code was running on Adobe ColdFusion.

I could be wrong, but I believe this only defines how long the CF template should wait for a response, but the underlying MySQL process continues along completely unaware of any timing constraints.

2 Comments

TOTALLY enjoyed this post, about Killing SLOW MySQL queries, but alas, we use MSSQL (with Lucee 5.3.8.206)... and processList is not part of the INFORMATION_SCHEMA. Suggestions?

15,674 Comments

@Marilou,

I'm glad you enjoyed the post. Unfortunately, I haven't used MS SQL Server in over a decade, so most of what I used to know about it has slowly disappeared. I just tried to Google for some related information and nothing super helpful came up. Sorry :(

I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel