Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
Ben Nadel at CFUNITED 2010 (Landsdown, VA) with: Joshua Cyr and Brian Rinaldi

Thought Experiment: Splitting A Single Data Table Up Into High-Writes And High-Reads For Better Performance In MySQL

By Ben Nadel on

CAUTION: This post is just me thinking out loud about an interesting MySQL database performance problem that I have faced at work.

At InVision, we store a date/time stamp that represents the last request that each user makes to the application. We used to use this data-point to indicate "presence" within the application; and now, we use it to better understand the heartbeat of the system. For the sake of this thought experiment, imagine that we have a user table that has the following columns:

  • id (pkey)
  • name
  • email
  • lastRequestAt (date/time)

As you might imagine, in a thick-client, AJAX (Asynchronous JavaScript and JSON) driven application, the user makes a lot of requests to the server. As such, this lastRequestAt column gets written-to a lot. The problem is, this lastRequestAt column is part of the user table; which, as you might also guess, gets read-from a lot. In fact, every request during a logged-in Session would cause both a read-from and a write-to on the user table.

For the sake of discussion, imagine that each request to the application results in the following two SQL queries:

UPDATE
	user u
SET
	u.lastRequestAt = UTC_TIMESTAMP()
WHERE
	u.id = ?
;

SELECT
	u.id,
	u.name,
	u.email,
	u.lastRequestAt
FROM
	user u
WHERE
	u.id = ?
;

In the early stages of the product, this wasn't a problem. We had a relatively small set of users and a reasonable amount of traffic. Reading-from and writing-to the user table was a non-issue. But, as the size of the database grew, and the traffic increased, we started to see a lot of row-level locking issues, specifically around UPDATE and SELECT calls to the user table.

Being in "start-up mode", we solved this problem with the lowest level of effort possible (so that we could get back to Feature Development). To that end, we simply lowered the fidelity of the date/time stamp. Instead of logging it on every request, we only logged it when the last recorded value was more than 60 seconds old.
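That threshold check is simple enough to sketch in isolation. Here is a hedged, stand-alone version (Python for illustration; the real check lives in the application code, and `maybe_record_request` is a hypothetical name), where `last_recorded_at` would come from the session and a `True` result would trigger the MySQL UPDATE:

```python
import time

# Hypothetical sketch of the "lowered fidelity" check: only write the
# timestamp to MySQL when the last recorded value is stale enough.
THRESHOLD_SECONDS = 60  # later bumped to 5 minutes, then 10 minutes

def maybe_record_request(last_recorded_at, now=None, threshold=THRESHOLD_SECONDS):
    # Return True when the recorded value is older than the threshold,
    # meaning this request should trigger a database write.
    now = time.time() if now is None else now
    return (now - last_recorded_at) >= threshold
```

Raising the threshold trades timestamp fidelity for fewer writes, which is exactly the trade-off described above.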

Then, we changed it to 5-minutes old.

Then, we changed it to 10-minutes old.

We kept changing it because each change would reduce - but not eliminate - the row-level locking and latency problems on the user table.

Eventually, we started dual-writing the lastRequestAt value to both the MySQL database and to the Session object, which we store in Redis. And, since Redis is insanely fast, and doesn't have to maintain database indexes, the Redis value helped to minimize the number of concurrent Reads and Writes to the user table.

Last week, in our Architecture Office Hours meeting, I brought this topic to the group and asked them what kind of data-access pattern we might create if we could go back and rebuild the way that we store this lastRequestAt column. Talking with Iccha Sethi, one of our Principal Engineers, I think we came up with some interesting ideas.

First, we could split the user table up into high-read and high-write columns. So, instead of having the lastRequestAt column live in the user table, we could elevate the column to have its own domain concept: perhaps an additional table like user_presence. This would leave us with two tables:

  • user - high-volume reads
    • id (pkey)
    • name
    • email
  • user_presence - high-volume reads and writes
    • userID (pkey)
    • lastRequestAt

This split would fix the user table, leaving it as a high-read-only table; but, we may still end up with the same locking issue on the new user_presence table. After all, we only moved the column - we didn't fundamentally change anything about the request architecture.

In addition to splitting the user table up into two tables, we could also use Redis to buffer some of that data churn. In much the same way as we are doing now, we could store the current lastRequestAt value in Redis such that Redis would act as a write-through cache. This way, every request to the application would read-from the Redis store, but write-to the MySQL table (as well as the Redis store - the way a write-through cache works).

Pseudo-code for this workflow might look like this:

<cfscript>
	
	// Read the session from Redis.
	userSession = redis.hgetall( "sessions:#userID#:#sessionToken#" );

	// Optimistically update the in-memory value for presence.
	userSession.lastRequestAt = now();

	// ASYNCHRONOUSLY log the request timestamp to both Redis and MySQL.
	runAsync(
		() => {

			// Update the write-through cache.
			redis.hset(
				"sessions:#userID#:#sessionToken#",
				"lastRequestAt",
				userSession.lastRequestAt
			);

			// Update the underlying database record.
			userPresenceGateway.update({
				userID: userSession.id,
				lastRequestAt: userSession.lastRequestAt
			});

		}
	);

</cfscript>

With this architecture, the new user_presence table becomes much more of a write-heavy table. The current user only needs to read-from it when the Session is initialized. And, other users only need to read-from it when they are rendering a list of users along with their respective presence states (which, in our application, is not nearly as often).

Of course, if the user_presence table starts to become its own performance problem, we could always put a write-through cache in front of the entire table. Meaning, we could cache the entire user_presence table, which only has two columns per user, in Redis. Then, read from Redis as needed instead of ever reading from MySQL (except when populating the cache).

ASIDE: Caching an entire table could be tricky. How does that cache get populated in a horizontally scaled application? Are there TTL (Time To Live) expirations on the keys? Could you populate the cache just-in-time as different userID values are requested?
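One possible answer to the just-in-time question is a read-through cache that populates itself per userID on the first miss. A minimal sketch (Python for illustration; `PresenceCache` and the `fetch_from_mysql` loader are hypothetical names, and plain dicts stand in for Redis and MySQL):

```python
import time

# Hypothetical read-through cache for per-user presence. Each entry carries
# its own TTL; and, misses are populated just-in-time from the backing store.
class PresenceCache:
    def __init__(self, fetch_from_mysql, ttl_seconds=300, clock=time.time):
        self._fetch = fetch_from_mysql  # fallback loader (MySQL in the post)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # userID -> (expiresAt, lastRequestAt)

    def get(self, user_id):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, still fresh
        value = self._fetch(user_id)  # populate just-in-time on a miss
        self._entries[user_id] = (now + self._ttl, value)
        return value

    def set(self, user_id, last_request_at):
        # Write-through: the caller is responsible for also updating MySQL.
        self._entries[user_id] = (self._clock() + self._ttl, last_request_at)
```

Because each node populates its own entries on demand, a horizontally scaled application never needs a coordinated warm-up step; expired keys simply fall back to MySQL.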

Or, maybe we could even store the user_presence data entirely within its own Redis instance; and then, configure that Redis instance to persist to disk. This way, we would completely isolate the data for "user presence" within its own incredibly fast RAM-based key-value store (eliminating the user_presence table from MySQL altogether).
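For what it's worth, Redis disk persistence is just a matter of configuration. A sketch of the relevant redis.conf directives (the values here are illustrative, not recommendations):

```
# redis.conf - persistence settings for the presence instance.
appendonly yes        # enable the append-only file (AOF) change log
appendfsync everysec  # fsync the AOF roughly once per second
save 900 1            # RDB snapshot after 900s if at least 1 key changed
```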

I suspect that each one of these steps becomes progressively more expensive from both a logic and a dollars-and-cents point-of-view.

Right now, some of this stuff is just theory (for me). But, taken as a whole, what it represents is a shift in how I think about Redis. As performance becomes a problem, Redis isn't just a "nice to have" way of improving performance - it becomes a fundamental part of the Application Architecture. And, to be honest, I'm not used to thinking in a multi-persistence model. I was raised on relational databases; and, even thinking about storing specific-types of data in different types of specialized databases is not yet something I have a good instinct for.

Another general pattern that I am seeing here is the elevation of "domain concepts". What I mean by that is the promotion of an "entity property" into a full-on "entity". In this case, the "User" property lastRequestAt is being promoted to the domain-level concept of "Presence".

The more complex my applications become, the more I find myself doing - or wishing I had done - this (elevating entity properties). Creating a domain-level concept makes it easier to think about how data can be consumed and evolved within an application.

And, of course, if anyone has any suggestions here on how they've solved similar problems, I would love to hear about it. There's definitely no "one right answer" here. Different contexts demand different solutions. The trickiest part is sometimes just knowing when you have to stop and modify your current approach.



Reader Comments

@All,

It might be time for me to flip-through Designing Data-Intensive Applications by Martin Kleppmann. I read this book a few years ago:

https://www.bennadel.com/blog/3344-designing-data-intensive-applications-the-big-ideas-behind-reliable-scalable-and-maintainable-systems-by-martin-kleppmann.htm

... and it mostly went way over my head. But, this sounds exactly like one of the topics Martin would have discussed.

I might also want to flip-through High Performance MySQL by Baron Schwartz, Peter Zaitsev, and Vadim Tkachenko:

https://www.bennadel.com/blog/3168-high-performance-mysql-optimization-backups-and-replication-by-baron-schwartz-peter-zaitsev-and-vadim-tkachenko.htm

This is another book I read a few years ago (most of which went over my head). But, they would also have been likely to cover the topic of high read/write tables.


Hi Ben,

Could you do something like multiple rows for each user's presence? So, say, for the table userPresence, you had:
UserId, Segment, Timestamp
1, 0, '2019-10-11 15:06:00',
2, 15, '2019-10-11 15:06:07',
3, 30, '2019-10-11 15:06:06',
3, 45, '2019-10-11 15:06:01',

Then, when you want to read Ben's last presence, you would do something like SELECT * FROM userPresence WHERE UserId = 2 ORDER BY Timestamp DESC LIMIT 1; to get the most recent.

The difficult thing is how to divide the writes up; but, perhaps you could segment consecutive writes based on time, so the 1st write went to the row with segment 0, the second to segment 15, and so on, where the segments were small enough for consecutive writes to go to different rows.

Not sure how that would perform or lock up when you read, but it would avoid you writing to the same row repeatedly.
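The segment calculation itself might look like this (Python for illustration; `presence_segment` is a hypothetical helper): each write lands in a 15-second bucket within the minute (0, 15, 30, 45), matching the sample rows above:

```python
# Hypothetical helper: route each presence write to a time-based segment so
# that consecutive writes for the same user touch different rows.
def presence_segment(epoch_seconds, bucket_size=15):
    # Bucket the seconds-within-the-minute into 0, 15, 30, or 45.
    return (int(epoch_seconds) % 60) // bucket_size * bucket_size
```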


@Adam,

Oops, sorry, mistyped - I meant each userId to be the same number, so:
1, 0, '2019-10-11 15:06:00',
1, 15, '2019-10-11 15:06:07',
1, 30, '2019-10-11 15:06:06',
1, 45, '2019-10-11 15:06:01',


I wonder if you could solve this by queuing. When you process the queue every minute or so, you remove the duplicates and only write the most recent timestamp for each user. This would probably reduce the number of writes you do significantly if there is a high percentage of duplication.
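A sketch of that dedupe-on-flush idea (Python for illustration; `PresenceWriteQueue` and the `write_batch` callback are hypothetical names): updates are buffered in memory; and, each flush writes only the most recent timestamp per user:

```python
# Hypothetical write-buffer: collapse duplicate presence updates so a
# periodic flush performs one write per user, not one per request.
class PresenceWriteQueue:
    def __init__(self, write_batch):
        self._write_batch = write_batch  # performs the MySQL UPDATEs
        self._pending = {}  # userID -> latest timestamp

    def enqueue(self, user_id, last_request_at):
        # Later timestamps for the same user overwrite earlier ones.
        current = self._pending.get(user_id)
        if current is None or last_request_at > current:
            self._pending[user_id] = last_request_at

    def flush(self):
        # Called every minute or so (by a worker process, in practice).
        batch, self._pending = self._pending, {}
        if batch:
            self._write_batch(batch)
        return len(batch)
```

In production, the buffer would more likely live in Redis (or a proper message queue) so that a worker process, rather than the web tier, performs the flush.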


@Pete,

Having been reading up a bit on Redis lately, the idea of queuing actually popped to mind using their Lists. In their accompanying documentation, they have patterns for "delayed tasks", which could maybe be used for the queuing mechanism:

https://redislabs.com/ebook/part-2-core-concepts/chapter-6-application-components-in-redis/6-4-task-queues/6-4-2-delayed-tasks/

The complexity with a queue is that you need an additional worker process to consume it. Though, we're already talking about a context where some increased complexity is warranted (hence this entire post).

I feel like if I was better with something like Redis or RabbitMQ or SQS, then such a solution would be easier for me to think about.


@Adam,

It's an interesting idea. But, I think part of the problem with our AJAX / Single-Page Application is that a user may spawn many concurrent requests to the app. So, even with the bucketing, I suspect that you'd still get contention around a single record. Of course, I think having it in its own table would go a long way towards reducing locking. Unfortunately, I don't know enough about database mechanics to know what the return on investment would be.


The only issue I can think of, with delegating more of the responsibility to Redis, is that if the server goes down, you lose the data from Redis.

The other solution is to store the timestamp, alongside the userid, in a JWT/JWE. The web token is then passed back and forth between client and server. During the JWT's validation process on the server, you could just update the timestamp. In fact, it is good practice to check the timestamp to see if the JWT has expired.
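A toy sketch of that token flow, using only the Python standard library for illustration (`issue_token`, `refresh_token`, and the secret are hypothetical; a real application should use a vetted JWT library rather than hand-rolled crypto):

```python
import base64, hashlib, hmac, json, time

# Hypothetical HS256 JWT sketch: carry lastRequestAt as a claim and refresh
# it during validation, so the client, not the database, carries the state.
SECRET = b"not-a-real-secret"

def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(user_id, last_request_at, secret=SECRET):
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps(
        {"sub": user_id, "lastRequestAt": last_request_at}
    ).encode())
    signing_input = f"{header}.{payload}".encode()
    signature = _b64(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"

def refresh_token(token, now=None, secret=SECRET):
    # Validate the signature; then, re-issue with a fresh lastRequestAt claim.
    header, payload, signature = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64(hmac.new(secret, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(signature, expected):
        raise ValueError("Invalid token signature.")
    padded = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    claims["lastRequestAt"] = time.time() if now is None else now
    return issue_token(claims["sub"], claims["lastRequestAt"], secret)
```

The refresh happens during validation; so, every authenticated request naturally carries an up-to-date lastRequestAt claim without touching the database.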

I generally issue a new JWT, when the user logs in and then instead of using session variables, I pass a custom JSON object inside the custom key of the JWT, which contains all the user related values for that session. Any values that I need to persist across page refreshes, which don't happen too often in an SPA, I store in a client cookie. You can further enhance security by using a JWE. I wrote a ColdFusion library for JWE, which you could use:

https://www.forgebox.io/view/JWTSignEncrypt

Actually, I think I wrote this, off the back of one of your articles on JWT.


Hey Ben,

Great read; it's always interesting to hear about real-world problems in systems at scale.

Rather than fighting against the row locking, could you use dirty reads to avoid the locks?

With a database like MS SQL Server you would use the "WITH NOLOCK" hint on the select. I'm no MySQL specialist, but it looks like the equivalent functionality would be achieved using "READ UNCOMMITTED" at the transaction level:
https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html

Some more info discussing this subject for MySQL:
http://itecsoftware.com/with-nolock-table-hint-equivalent-for-mysql


Probably shouldn't have a LastUpdatedAt field in the User table, or really any field that changes a lot, as an attribute in an object table. Maybe better to have a log table and stitch them together with a view. Some of these tables, like a User table, can have insert/update/delete triggers on them for auditing (say, into a UserAudit table).


@Sean,

Definitely tending to agree with that these days. A learning experience for me. I tend to think primarily about read-access and what indexes I need to create. I am only now just starting to think more holistically about write access patterns.


@Glenn,

That's a great question. I vaguely remember looking into the NOLOCK hint back in the day when I was on SQL Server. But, I don't remember much about it. I think it was all theory to me at that point. I also don't have enough insight to know how this would affect performance. I'll keep that in the back of my mind in case I find an easy way to A/B test something like this in production (where I wouldn't cause any harm).


@Charles,

I'm fascinated with JWT; but, I am still very much a n00b when it comes to pragmatic applications. Right now, our systems half-run on JWT and half-run on old-school Session IDs that are mapped to persisted session data. So, to your point, if Redis goes down, "the system" goes down :D

I have a small book on JWT queued-up in my iBooks; but, I just haven't gotten around to reading it yet.


Wait a second. You can't be that much of a noob:) You wrote an inspiring article on JWT, with an accompanying JWT library. I then extended this to create a JWE library!

Maybe, you forgot about it! It's easy to do!

