Skip to main content
Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

Killing Slow MySQL Queries In An Emergency In Lucee CFML 5.3.7.47

By Ben Nadel on
Tags: ColdFusion, SQL

At InVision, we recently upgraded our MySQL database servers to use MySQL 5.7.x. And, while I'm excited for access to new features like the native JSON column type and the sys performance schema, the upgrade did have some bumps. In particular, we had a subset of SQL queries that started running with terrible performance (typically those that have very large IN(id-list) clauses). I had to refactor a number of queries; but, while that code was being changed, I had to put together an emergency utility that would allow me to start killing SQL queries in production. I thought this was a fun little exercise that would be worth sharing in Lucee CFML 5.3.7.47.

ASIDE: Before I started killing queries manually, I did try adding a timeout to my <cfquery> tags. Unfortunately, this did not seem to have any impact. I am not sure why the timeout didn't kill long-running queries in Lucee CFML - I know that this mostly worked when the code was running on Adobe ColdFusion.

This emergency utility has two main aspects: the ProcessListGateway.cfc, which gives me access to the MySQL processlist that returns all the database thread-states; and, a View that lists the currently-running queries and allows me to kill them individually.

Let's look at my ColdFusion component gateway first. It has two main methods: one that returns the processlist and one that executes a KILL QUERY SQL statement against a given thread ID. Each of these queries takes a datasource against which to run the query since we have multiple replicas in production.

component
	output = false
	hint = "I provide low-level query methods for MySQL process-list interactions (for emergency use)."
	{

	/**
	* I return the datasources that can be inspected (and subsequently affected).
	*/
	public array function getDatasources() {

		return([ "primary", "replica", "replica2" ]);

	}


	/**
	* I get the process list info for the given datasource.
	* 
	* @datasource I am the ColdFusion datasource being inspected.
	* @minTime I am the time over which a query must be running in order to be included.
	*/
	public query function getProcessList(
		required string datasource,
		required numeric minTime
		) {

		```
		<cfquery name="local.results" datasource="#datasource#">
			/* DEBUG: processListGateway.getProcessList(). */
			SELECT
				pl.id,
				pl.time,
				pl.state,
				LEFT( pl.info, 300 ) AS info
			FROM
				INFORMATION_SCHEMA.PROCESSLIST pl
			WHERE
				pl.time > <cfqueryparam value="#minTime#" sqltype="integer" />
			AND
				pl.state != ''
			ORDER BY
				pl.time DESC
			;
		</cfquery>
		```

		return( results );

	}


	/**
	* I kill the query with the given ID in the given datasource.
	* 
	* CAUTION: Will throw an error if the given query ID / thread cannot be found.
	* 
	* @datasource I am the ColdFusion datasource being inspected.
	* @queryID I am the thread being killed.
	*/
	public void function killQuery(
		required string datasource,
		required numeric queryID
		) {

		```
		<cfquery name="local.results" datasource="#datasource#">
			/* DEBUG: processListGateway.killQuery(). */
			KILL QUERY <cfqueryparam value="#queryID#" sqltype="integer" />;
		</cfquery>
		```

	}

}

There's not too much going on here. I am using Tag Islands to run SQL inside <cfscript>, which is really just the best thing since sliced-bread! But, other than that, you can't do much. Since this is for emergency use only and running in production (behind an administrative login of course), I wanted to make this was as restrictive as possible.

With this ColdFusion gateway in toe, I then created a simple UI (User Interface) which lists out the running queries in the processlist and allows me to selectively run a KILL QUERY command against a given thread-ID:

<cfscript>

	// Param URL variables.
	param name="url.datasource" type="string" default="";
	param name="url.minTime" type="numeric" default="5";
	param name="url.queryID" type="numeric" default="0";

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	processListGateway = new ProcessListGateway();
	datasources = processListGateway.getDatasources();
	errorMessage = "";

	// Make sure that we are consuming one of the valid datasources. Since we are about
	// to start killing queries, we want to make sure this is as locked-down as possible.
	if ( ! url.datasource.len() ) {

		url.datasource = datasources.first();

	} else if ( ! datasources.contains( url.datasource ) ) {

		throw( type = "InvalidDatasource" );

	}

	// If we have a query ID / thread in the process-list, let's try to kill it.
	if ( url.queryID ) {

		try {

			processListGateway.killQuery( url.datasource, url.queryID );

		} catch ( "database" error ) {

			errorMessage = "Query with ID [#url.queryID#] could not be killed. Message: #error.message#";

		}

	}

	// NOTE: Since the goal here is to kill LONG-RUNNING queries, we're going to supply
	// a min-time that will only return queries that have run LONGER than the given
	// threshold. This way, our process-list won't be littered with common queries that
	// happen to be running at the time this ColdFusion script is run.
	processList = processListGateway.getProcessList( url.datasource, url.minTime );

</cfscript>
<cfoutput>

	<!doctype html>
	<html lang="en">
	<head>
		<meta charset="utf-8" />
		<title>
			Process List (Emergency Use Only)
		</title>
		<link rel="stylesheet" type="text/css" href="./styles.css" />
	</head>
	<body>

		<h1>
			Process List (Emergency Use Only)
		</h1>

		<p>
			<strong>Datasources:</strong>

			<cfloop value="datasource" array="#datasources#">
				<cfif ( datasource eq url.datasource )>					
					&laquo; <strong><a href="?datasource=#encodeForUrl( datasource )#">#encodeForHtml( datasource )#</a></strong> &raquo;
				<cfelse>
					<a href="?datasource=#encodeForUrl( datasource )#">#encodeForHtml( datasource )#</a>
				</cfif>
			</cfloop>
		</p>

		<h2>
			Current Process List For "#encodeForHtml( url.datasource )#"
		</h2>

		<cfif processList.recordCount>

			<table width="100%" border="1" cellpadding="5" cellspacing="0">
			<thead>
				<tr>
					<th scope="col"> ID </th>
					<th scope="col"> Time </th>
					<th scope="col"> State </th>
					<th scope="col"> Info (truncated) </th>
					<th scope="col"> Actions </th>
				</tr>
			</thead>
			<tbody>
				<cfloop query="#processList#">
					<tr>
						<td>
							#encodeForHtml( processList.id )#
						</td>
						<td>
							#numberFormat( processList.time )#s
						</td>
						<td>
							#encodeForHtml( processList.state )#
						</td>
						<td class="sql">
							#encodeForHtml( processList.info )#
						</td>
						<td class="actions">
							<a href="?datasource=#encodeForUrl( url.datasource )#&queryID=#encodeForUrl( processList.id )#">Kill query</a>
						</td>
					</tr>
				</cfloop>
			</tbody>
			</table>

		<cfelse>

			<p>
				<em>There are no running queries in the selected datasource.</em>
			</p>

		</cfif>

		<!--- Show any error from attempting to kill a query. --->
		<cfif errorMessage.len()>
			
			<h2>
				Error Message
			</h2>

			<p>
				#encodeForHtml( errorMessage )#
			</p>

		</cfif>

	</body>
	</html>

</cfoutput>

As you can see, this is just some old-school requests-response life-cycle style programming. And by, "old-school", what I really mean is solid, battle-tested, and just clever enough to get the job done. And, if I kick off some SLEEP() queries in MySQL and run this ColdFusion page, we get the following output:

Slow queries being killed in MySQL using the processlist, kill query, and Lucee CFML.

As you can see, the slow SQL queries start showing up; and, I'm able to kill the ones that I know will start crushing the database CPU.

Obviously, killing SQL queries in production is a "Bad Thing"™ since a prematurely-terminated query will almost certainly lead to a user-facing issue (data not loading, for example). However, in an emergency situation, I'll take the occasional user-facing issue over 100% CPU utilization every day of the week. And now, I have this release valve in Lucee CFML 5.3.7.47 that I can exercise if - and only if - I need to.

Epilogue: Why Not Just Run These SQL Queries Directly in the Database?

You may be asking yourself, why go through all the ColdFusion rigmarole? Why not just run the processlist and kill query SQL statements directly against the database. Well, for starters, doing that once is easy - doing that a number of times is challenging and tedious. Being able to point-and-click in a ColdFusion page makes this much faster and repeatable.

But, more importantly, you don't just give people WRITE-access to a production database. If I wanted to run a kill query statement in production without this code, I'd have to write the SQL query and then open a ticket for our database team to run it on my behalf. Given that our database team is busy, a ticket like this could take hours or event days to fulfill. And, given the fact that I needed to do this multiple times during the emergency window, this would not be a sustainable approach.

By embedding this in the ColdFusion application (behind a secure administrative login), I make it flexible and repeatable and secure. And, by locking the SQL down to accepting only a datasource and a queryID, it's significantly more secure when compared to an engineer with write-access to the database running the query on my behalf.



Reader Comments

@All,

In case this is interesting to anyone, here's the ColdFusion code I was using to spawn the long-running queries for testing:

<cfscript>

	names = [ "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k" ];

	times = names.each(
		( name ) => {

			```
			<cfquery name="local.result" datasource="primary">
				/* SLEEP FOR #name#. */
				SELECT SLEEP( #randRange( 10, 300 )# ) AS #name#;
			</cfquery>
			```

		},
		true // Run in parallel threads.
	);

</cfscript>

I just love love love the ability to run array iteration in parallel in Lucee CFML. So many great use-cases for it.

Reply to this Comment

@Ben

ASIDE: Before I started killing queries manually, I did try adding a timeout to my <cfquery> tags. Unfortunately, this did not seem to have any impact. I am not sure why the timeout didn't kill long-running queries in Lucee CFML - I know that this mostly worked when the code was running on Adobe ColdFusion.

I could be wrong, but I believe this only defines how long the CF template should wait for a response, but the underlying MySQL process continues along completely unaware of any timing constraints.

Reply to this Comment

@All,

I tried to write up a case study of the refactoring that this KILL QUERY approach gave me time to figure out:

www.bennadel.com/blog/4001-case-study-removing-massive-in-id-list-clauses-for-performance-in-mysql-5-7-32-and-lucee-cfml-5-3-7-47.htm

I was able to bring the slowest execution times for a set of queries down from "catastrophic" to simply "poor user experience". Not a perfect solution; but, enough that it stopped the bleeding.

Reply to this Comment

Post A Comment

You — Get Out Of My Dreams, Get Into My Blog
Live in the Now
Oops!
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.