
Caching, virtual bits don’t rot

October 28th, 2014

Using time-based caching as the default is very common, and it annoys me, so I wanted to address it.

Time-based caching is a last resort, not a default go-to. Virtual pages don’t turn yellow over time, and data in a cache doesn’t slowly rot away. Caching isn’t a stop-gap solution against bad performance; it’s a layer, or multiple layers, in your application that you have to think about.

Proper caching strategies can improve the performance of your applications by a metric ton. The reason is obvious: instead of doing something every single time, you only do it when it’s needed. No more, no less.

But more often than I’d like, I see caching thrown in as a stop-gap solution: some part of the application couldn’t scale well enough, so some caching is thrown in around it and set to refresh every 5 minutes or every 10 hours or every 24 hours or what have you. It’s ugly and it’s setting you up for technical debt.

Caching should be a holistic solution. Applications have (spaghetti legacy code notwithstanding) natural separators between certain parts: the database model, some remote API, your controllers, etc. These are natural places to add a caching layer. More importantly, by adding caching in these places you can ensure that neither side of the code overly depends on the caching. Compare that to slapping a few lines of caching code around some bits of code but not others: that just adds complexity, potentially creates unexpected behaviour, and probably makes proper cache warming impossible.
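To make the "natural separator" idea a bit more concrete, here is a minimal sketch in Python. It assumes a repository boundary in front of the database; the names `DatabaseUserRepository` and `CachedUserRepository` are made up for illustration, and a plain dict stands in for both the database and the cache.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str


class DatabaseUserRepository:
    """Stand-in for the real database-backed repository (hypothetical)."""

    def __init__(self, rows):
        self._rows = rows
        self.queries = 0  # counts simulated database hits

    def get(self, user_id):
        self.queries += 1
        return User(user_id, self._rows[user_id])


class CachedUserRepository:
    """Same interface as the inner repository, with a cache in front."""

    def __init__(self, inner):
        self._inner = inner
        self._cache = {}

    def get(self, user_id):
        if user_id not in self._cache:
            self._cache[user_id] = self._inner.get(user_id)
        return self._cache[user_id]


db = DatabaseUserRepository({1: "alice"})
users = CachedUserRepository(db)
users.get(1)
users.get(1)  # second call never reaches the database
```

Because both classes expose the same `get`, the calling code doesn’t know or care whether it is talking to the cache or to the database, and the database code knows nothing about caching.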

Now that we have this thin caching layer, don’t just set a time to live and call it a day. Take a step back and try to get it to cache for as long as possible. Data doesn’t rot, and cached HTML output doesn’t turn yellow. What you want is independent invalidation: entries leave the cache because the underlying data changed, not because a timer ran out.

For the sake of having an example, let’s say we have added some caching to our database model, and when getting a User object from our repository we actually return a cached version instead of doing a database query. We won’t invalidate that cache until the User object actually changes. We can detect that by simply triggering the cache invalidation when the User object gets saved with changes.
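A rough sketch of that save-triggered invalidation could look like this. It is an illustration only: the `UserRepository` class, the dict-based "database" and the `db_reads` counter are assumptions made for the example, not part of any real library.

```python
class UserRepository:
    """Hypothetical repository; dicts stand in for the table and the cache."""

    def __init__(self, db):
        self._db = db
        self._cache = {}
        self.db_reads = 0  # counts simulated database queries

    def get(self, user_id):
        if user_id not in self._cache:
            self.db_reads += 1
            self._cache[user_id] = dict(self._db[user_id])
        return dict(self._cache[user_id])

    def save(self, user_id, user):
        if self._db.get(user_id) == user:
            return  # nothing changed, the cached copy is still valid
        self._db[user_id] = dict(user)
        self._cache.pop(user_id, None)  # invalidate only on a real change


repo = UserRepository({1: {"name": "alice"}})
repo.get(1)
repo.get(1)                       # served from the cache
repo.save(1, {"name": "alice"})   # saved without changes: entry survives
repo.get(1)
reads_before_change = repo.db_reads
repo.save(1, {"name": "bob"})     # saved with changes: entry is dropped
renamed = repo.get(1)
```

Note that there is no time to live anywhere: the entry stays cached until a save actually changes the user.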

You want the caching to be a separate service, though, not tightly integrated with your object model. If at some point you want to do a bulk change on the Users in your database, you want to be able to invalidate them all and, perhaps more importantly, apply cache warming so that the updated users are put back into the cache before the application actually needs them. Nothing is worse than taking the “first user to visit the page will trigger it” approach to things.
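As a sketch of what such a separate service might offer, the hypothetical `UserCache` below only knows a loader function, and exposes `invalidate` and `warm` so a bulk change can refresh the cache without going through the object model. All names here are invented for the example.

```python
class UserCache:
    """Cache service that knows nothing about how users are stored."""

    def __init__(self, loader):
        self._loader = loader  # callable: user_id -> user data
        self._entries = {}

    def get(self, user_id):
        if user_id not in self._entries:
            self._entries[user_id] = self._loader(user_id)
        return self._entries[user_id]

    def invalidate(self, user_ids):
        for user_id in user_ids:
            self._entries.pop(user_id, None)

    def warm(self, user_ids):
        """Load entries ahead of time so no visitor pays for the miss."""
        for user_id in user_ids:
            self._entries[user_id] = self._loader(user_id)


table = {1: "alice", 2: "bob"}  # stand-in for the users table
loads = []                      # records every simulated database load


def load_user(user_id):
    loads.append(user_id)
    return table[user_id]


cache = UserCache(load_user)
cache.warm(table)

# Bulk change made directly in the "database", bypassing the model.
for user_id in table:
    table[user_id] = table[user_id].upper()

cache.invalidate(table)
cache.warm(table)
loads_before_request = len(loads)
first_request = cache.get(1)  # already warm, no load triggered
```

The script doing the bulk change is the one that pays for reloading, not the first visitor after it.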

Another caching optimization step you can take is looking at the data and extracting data that isn’t dependent on each other into separate entities. The point here isn’t normalization, or necessarily cohesion. It’s about cache strategy. Say a User entity has a counter that keeps track of how often the user has logged in. That means you’d have to invalidate the cache each time the user logs in, which is not exactly ideal.

So what you can do is extract that counter into its own entity, link it to the owning user, and make it a property. Now don’t get me wrong, I’m not necessarily talking about moving tables around in your database, just the internal object representation of the data. Before, the User model had perhaps an integer loginCounter property; now it has a LoginCounter loginCounter property. The LoginCounter can be retrieved and saved by itself without disturbing the User entity, even though they might live in the same table in the database. Objects aren’t tables and all that jazz.
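Here is one way that split could look, as a sketch under a few assumptions: a dict plays the role of the single users table, the `User` is cached, and the `LoginCounter` always reads and writes its own column directly. The helper names (`get_user`, `user_loads`) are made up for the example.

```python
from dataclasses import dataclass

# One row per user; the counter lives in the same "table" as the name.
table = {1: {"name": "alice", "login_count": 41}}
user_cache = {}
user_loads = []  # records every time a User is built from the table


class LoginCounter:
    """Separate entity that is read and saved on its own, never cached."""

    def __init__(self, user_id):
        self._user_id = user_id

    @property
    def value(self):
        return table[self._user_id]["login_count"]

    def increment(self):
        table[self._user_id]["login_count"] += 1


@dataclass
class User:
    id: int
    name: str
    login_counter: LoginCounter


def get_user(user_id):
    if user_id not in user_cache:
        user_loads.append(user_id)
        name = table[user_id]["name"]
        user_cache[user_id] = User(user_id, name, LoginCounter(user_id))
    return user_cache[user_id]


user = get_user(1)
user.login_counter.increment()  # a login: no User invalidation needed
user = get_user(1)              # still the cached User
```

The table layout never changed; only the object representation did, and logging in no longer touches the cached User.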

Now, there are unfortunately valid places where you might want time-based caching: situations where, no matter how you slice it, it’s just a very expensive operation. In those situations it’s perfectly valid to have a cron job or job queue or whatever solution and defer the entire thing to manage performance.

Anyway, if some part of your application is underperforming, take a step back instead of slapping some caching around it and calling it a day.


