During the lifetime of an application, it's common to emit events when the system is mutated in some way. Propagation of these events might be done through a mechanism like Kafka streams, Redis Pub/Sub, Pusher, or even just an in-memory queue. These events help keep different parts of the system in sync, allowing materialized views to be updated and non-materialized data to be re-fetched. When it comes to mutating a single data-point within the system, I can wrap my head around emitting a single, corresponding event. However, I'm never sure what to do in the context of bulk operations, where a single request may end up changing a multitude of data-points within the application.
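For the single-mutation case, the shape of this pattern can be sketched with an in-memory queue (one of the propagation mechanisms mentioned above). All of the names here (EventBus, ObjectStore) are hypothetical, invented for illustration:

```python
from collections import deque

class EventBus:
    """A toy in-memory event bus: events are queued and pushed to subscribers."""
    def __init__(self):
        self.queue = deque()
        self.subscribers = []

    def emit(self, event_type, payload):
        event = {"type": event_type, "payload": payload}
        self.queue.append(event)
        for handler in self.subscribers:
            handler(event)

class ObjectStore:
    """Each single-data-point mutation emits one corresponding event."""
    def __init__(self, bus):
        self.bus = bus
        self.objects = {}

    def put_object(self, key, value):
        self.objects[key] = value
        self.bus.emit("ObjectCreated", {"key": key})

    def delete_object(self, key):
        del self.objects[key]
        self.bus.emit("ObjectDeleted", {"key": key})

bus = EventBus()
store = ObjectStore(bus)
store.put_object("a.txt", "hello")
store.delete_object("a.txt")
```

With one event per mutation, the mental model is easy; it's the bulk operations below where the one-to-one mapping starts to break down.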
Consider an Amazon AWS S3 Bucket as a thought experiment. From what I have read, you can configure an S3 Bucket to emit an event when an Object in the Bucket is deleted. Over time, an S3 Bucket may accumulate millions, maybe even billions of Objects.
Now consider deleting this S3 Bucket. Let that represent our "bulk operation" in this thought experiment. What events should be emitted from the system? Clearly, we should emit some sort of BucketDeleted event because that's "the thing" that happened. But, what about the collateral operations: the fact that inside that Bucket were Objects; and, those Objects are now gone? Should an ObjectDeleted event be emitted for every Object in the Bucket?
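The fan-out question can be made concrete with a toy sketch. The two functions below are hypothetical; they just contrast the event volume of the two strategies:

```python
def delete_bucket_cascading(bucket_objects):
    """Emit one ObjectDeleted per Object, plus one BucketDeleted.
    Event count scales with the size of the Bucket."""
    events = [{"type": "ObjectDeleted", "key": key} for key in bucket_objects]
    events.append({"type": "BucketDeleted"})
    return events

def delete_bucket_single_event(bucket_objects):
    """Emit only BucketDeleted; consumers infer that the Objects are gone.
    Event count is constant, regardless of Bucket size."""
    return [{"type": "BucketDeleted", "objectCount": len(bucket_objects)}]

# 10,000 Objects here; imagine millions or billions at S3 scale.
objects = [f"file-{i}.txt" for i in range(10_000)]
print(len(delete_bucket_cascading(objects)))     # scales with the Bucket
print(len(delete_bucket_single_event(objects)))  # always 1
```

The cascading strategy makes the cost of a single API call proportional to the historical size of the container, which is exactly the scaling problem explored below.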
NOTE: I have no idea what AWS actually does in this case - this is just a thought experiment.
It seems unreasonable that deleting a Bucket should suddenly spawn millions, maybe even billions of events. Such a deluge of events could easily overwhelm and cripple a system.
There's also a semantic question to consider: is a "Delete Object" operation really the same thing in the context of a bulk operation? Meaning, should the system differentiate between an ObjectDeleted event that represents a direct, single-Object deletion; and an ObjectDeleted event that is merely collateral of a larger bulk operation, like deleting the entire Bucket?
At the very least, having two flavors of "delete" event would allow remote systems to change the way they react to those events.
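One way to sketch the "two flavors" idea is to carry the flavor on the event itself, so that consumers can branch on it. The event shape and handler below are hypothetical, not any real system's API:

```python
def make_object_deleted(key, via_bulk_operation=False):
    """Build an ObjectDeleted event that records whether it was a direct
    deletion or collateral of a bulk operation."""
    return {
        "type": "ObjectDeleted",
        "key": key,
        "viaBulkOperation": via_bulk_operation,
    }

def handle(event, cache):
    """A remote consumer maintaining a local cache of Objects."""
    if event["viaBulkOperation"]:
        # Skip per-key work: a single BucketDeleted event will follow,
        # and the consumer can clear the whole cache in one shot.
        return
    cache.pop(event["key"], None)

cache = {"a.txt": "...", "b.txt": "..."}
handle(make_object_deleted("a.txt"), cache)                           # direct delete
handle(make_object_deleted("b.txt", via_bulk_operation=True), cache)  # bulk collateral
print(sorted(cache))  # prints ['b.txt']
```

A flag is only one option; another would be distinct event types entirely (say, ObjectDeleted vs. BucketDeleted with no per-Object events at all). The point is just that consumers get to react differently.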
Again, knowing nothing about how AWS S3 actually manages events, I like that S3 represents absurd scale. Sometimes, it's easier to get at the truth when you can't wrap your head around the size of something. And, at least for me, when I think about the relationship between AWS Buckets and Objects, it seems crazy to even consider emitting Object events when a Bucket is deleted.
And, to create a generalization from this thought experiment, it seems crazy to emit "child" deleted events when a "parent container" is deleted. It just doesn't feel like it scales; and, it just doesn't feel like it is semantically correct.
Considering Database Replication
As I was writing this, it occurred to me that database replication might be fertile ground for further consideration. If I have a read-replica database that is being synchronized with a primary database, what happens when I run DROP TABLE on the primary? Does that get replicated as a DROP TABLE operation in the read-replica? Or, does the replication process have to run a DELETE FROM operation for every row in the dropped table?
I don't really know much of anything about database replication; but, it seems absurd to manage the replication through anything other than the single DROP TABLE "event".
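The contrast can be sketched with a toy replication log. This is not how any real database implements replication (though, for what it's worth, I believe MySQL replicates DDL statements like DROP TABLE as single statements even when row-based logging is enabled):

```python
class Replica:
    """A toy read-replica that replays operations from a replication log."""
    def __init__(self, tables):
        self.tables = tables  # table name -> list of rows

    def apply(self, op):
        if op["kind"] == "drop_table":
            self.tables.pop(op["table"], None)
        elif op["kind"] == "delete_row":
            self.tables[op["table"]].remove(op["row"])

def drop_as_statement(table):
    """Replicate the DROP TABLE as one operation."""
    return [{"kind": "drop_table", "table": table}]

def drop_as_row_deletes(table, rows):
    """Replicate a DELETE for every row -- and note this still leaves
    an empty table behind on the replica."""
    return [{"kind": "delete_row", "table": table, "row": row} for row in rows]

replica = Replica({"users": [1, 2, 3]})
ops = drop_as_statement("users")
print(len(ops))  # 1, regardless of how many rows the table held
for op in ops:
    replica.apply(op)
print("users" in replica.tables)  # False
```

Here, the per-row strategy isn't just slow; it isn't even equivalent, since replaying only row deletes leaves the empty table structure on the replica. The single "parent" operation carries the full meaning.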