Jerod Santo changelog.com/posts

You might as well timestamp it

future-you will be glad you stored a timestamp instead of that boolean

In my 15+ years of web development, there are very few things I can say are unequivocally a good idea. It almost always does depend.

Storing timestamps instead of booleans, however, is one of those things I can go out on a limb and say it doesn’t really depend all that much. You might as well timestamp it. There are plenty of times in my career when I’ve stored a boolean and later wished I’d had a timestamp. There are zero times when I’ve stored a timestamp and regretted that decision.

Why is that? Because any time a piece of data is important enough to store its truth/falsehood in your database, it’s likely important enough that you’ll eventually want to know when that state was set. It’s like the exact opposite of YAGNI. You Ain’t Gonna Regret It? 🤔

Even in the rare case that you never need that timestamp… what have you lost? Its cost is negligible, both in data storage and coding overhead. Your code merely needs to consider a NULL timestamp as false and any non-NULL timestamp as true and there’s your boolean use-case. An example, in JS:

// using a boolean
let is_published = true

if (is_published) console.log("it's true!")
if (!is_published) console.log("it's false!")

// using a timestamp
let published_at = new Date()

if (published_at) console.log("it's true!")
if (!published_at) console.log("it's false!")

The ergonomics are darn-near identical. And you can pretty this up with a helper function called isPublished() which checks the published_at attribute. Basic stuff you already know, I’m sure!

  • The next time you’re tempted to store an is_deleted boolean, reach for a deleted_at timestamp instead.
  • If you think you need an is_hidden field, try using hidden_at in its place. Live with that for awhile and then ask yourself if you regret the decision.
  • Or if you’re about to create a has_signed_in boolean? You’re gonna want to know when that happened: signed_in_at to the rescue.

Future-you will thank you. 🙏


Discussion

Sign in or Join to comment or subscribe

2021-04-23T22:51:53Z ago

This came up in the Django corner of Twitter recently: https://twitter.com/webology/status/1384588570594140160 (in reply to https://twitter.com/simonw/status/1384580075329179650).

Jerod Santo

Jerod Santo

Omaha, Nebraska

Jerod co-hosts The Changelog, crashes JS Party, and takes out the trash (his old code) once in awhile.

2021-04-25T13:25:07Z ago

Funny!

Speaking of frameworks, I think I slowly learned this from Rails’ convention of automatically creating and managing created_at and updated_at timestamps on your behalf. Turns out that updated_at timestamp is super handy to answer simple questions like, “has this record ever been edited?”

2021-04-25T07:09:26Z ago

I absolutely agree with the need to store timestamps instead of booleans…. to an extent. The question ‘did this happen’ is pretty much always followed up by ‘ok, and when did it happen?’ Especially when soemtong odd happens or something needs to be debugged. Something as simple as a person complaining about now getting a product in their order, a boolean can tell you that a product is out of stock, a timestamp can tell you that it ran out of stock before the costomer tried to add it to their cart.

On the flip-side however, a timestamp is 4 bytes long in a database, a boolean is one bit long. So if you replace eight booleans (created, modified,deleted…) by eight timestamps, you are adding 32 bytes of data to each record and to the indexes, and therefor also to the RAM required to process the queries. if your table is large and you really only use the timestamp for auditing then you may want to keep the booleans and store the timestamps seperately so they don’t interfere with the bulk of the work.

Jerod Santo

Jerod Santo

Omaha, Nebraska

Jerod co-hosts The Changelog, crashes JS Party, and takes out the trash (his old code) once in awhile.

2021-04-25T13:21:41Z ago

Fair point.

if your table is large and you really only use the timestamp for auditing

I would wait until it’s 100% clear that these two conditions are true before making this optimization, myself.

2021-05-03T18:41:41Z ago

While you might think that storing booleans would require less space in a database, I don’t know of any databases that do actually store them as bits packed in a bitfield, which is what you would have to do if you really wanted to minimize disk space. In practice, most seem to store them as short ints - either set to 1 for True and 0 for False or to -1 for True and 0 for False - so the savings is much less than you think. In addition, it is mostly much faster to test an int for zero/nonzero than to mask out and test bits.

2021-05-03T19:33:33Z ago

Sorry, I was confusing the boolean with a NULL value wich is stored as a single bit in PostreSQL. A boolean in PostgreSQL requires one byte. That is still 7 bytes less than a timestamp. Sure, it’s all relative but it’s something to keep in mind; storing a boolean may not give you enough information, storing a timestamp requires eight times the storage and that is something you must take into account.

2021-05-03T19:08:05Z ago

This works for a one-way toggle, like is_published, where it is false, until it is true, and then it stays true forever. But if it’s a true toggle, when it goes back & forth between true & false, (say “is_membership_active”) then you’re gonna want to know “when did it go false?” just as often as “when did it go true?” so, you’re back to storing a Boolean, with a timestamp.

Jerod Santo

Jerod Santo

Omaha, Nebraska

Jerod co-hosts The Changelog, crashes JS Party, and takes out the trash (his old code) once in awhile.

2021-05-03T20:12:49Z ago

I think in that case it does depend on the nature of the field. I’ve certainly used published_at in reverse and never cared when I unpublished the thing. Same with deleted_at and not cared when I undeleted the thing.

I agree it gets more complicated with memberships and other things that can go back and forth between different states, where each one is important. In this case I’d use a state machine instead of a boolean/timestamp pair.

2021-05-08T00:45:18Z ago

Not familiar with the term state machine and what it would provide instead of a Boolean + time stamp combo. Do you mean like an event sourced model where there’s like “ toggle on” and “toggle of” event types and you store a record of them and the current state is derived from the source of events?

0:00 / 0:00