You might as well timestamp it
future-you will be glad you stored a timestamp instead of that boolean
In my 15+ years of web development, there are very few things I can say are unequivocally a good idea. It almost always does depend.
Storing timestamps instead of booleans, however, is one of those things I can go out on a limb and say it doesn’t really depend all that much. You might as well timestamp it. There are plenty of times in my career when I’ve stored a boolean and later wished I’d had a timestamp. There are zero times when I’ve stored a timestamp and regretted that decision.
Why is that? Because any time a piece of data is important enough to store its truth/falsehood in your database, it’s likely important enough that you’ll eventually want to know when that state was set. It’s like the exact opposite of YAGNI. You Ain’t Gonna Regret It? 🤔
Even in the rare case that you never need that timestamp… what have you lost? Its cost is negligible, both in data storage and coding overhead. Your code merely needs to consider a NULL
timestamp as false
and any non-NULL
timestamp as true
and there’s your boolean use-case. An example, in JS:
// using a boolean
let is_published = true
if (is_published) console.log("it's true!")
if (!is_published) console.log("it's false!")
// using a timestamp
let published_at = new Date()
if (published_at) console.log("it's true!")
if (!published_at) console.log("it's false!")
The ergonomics are darn-near identical. And you can pretty this up with a helper function called isPublished()
which checks the published_at
attribute. Basic stuff you already know, I’m sure!
- The next time you’re tempted to store an
is_deleted
boolean, reach for adeleted_at
timestamp instead. - If you think you need an
is_hidden
field, try usinghidden_at
in its place. Live with that for awhile and then ask yourself if you regret the decision. - Or if you’re about to create a
has_signed_in
boolean? You’re gonna want to know when that happened:signed_in_at
to the rescue.
Future-you will thank you. 🙏
Discussion
Sign in or Join to comment or subscribe
Brett Cannon
Vancouver, BC, Canada
2021-04-23T22:51:53Z ago
This came up in the Django corner of Twitter recently: https://twitter.com/webology/status/1384588570594140160 (in reply to https://twitter.com/simonw/status/1384580075329179650).
Jerod Santo
Bennington, Nebraska
Jerod co-hosts The Changelog, crashes JS Party & takes out the trash (his old code) once in awhile.
2021-04-25T13:25:07Z ago
Funny!
Speaking of frameworks, I think I slowly learned this from Rails’ convention of automatically creating and managing
created_at
andupdated_at
timestamps on your behalf. Turns out thatupdated_at
timestamp is super handy to answer simple questions like, “has this record ever been edited?”Bartosz Pieńkowski
2021-05-06T16:52:33Z ago
I wrote a gem which takes the approach you described in the article to the next level. It generates a bunch of methods on top of a timestamp field to make it easy to use as a boolean flag. The gem is called active_record-events.
Vincent Elschot
2021-04-25T07:09:26Z ago
I absolutely agree with the need to store timestamps instead of booleans…. to an extent. The question ‘did this happen’ is pretty much always followed up by ‘ok, and when did it happen?’ Especially when soemtong odd happens or something needs to be debugged. Something as simple as a person complaining about now getting a product in their order, a boolean can tell you that a product is out of stock, a timestamp can tell you that it ran out of stock before the costomer tried to add it to their cart.
On the flip-side however, a timestamp is 4 bytes long in a database, a boolean is one bit long. So if you replace eight booleans (created, modified,deleted…) by eight timestamps, you are adding 32 bytes of data to each record and to the indexes, and therefor also to the RAM required to process the queries. if your table is large and you really only use the timestamp for auditing then you may want to keep the booleans and store the timestamps seperately so they don’t interfere with the bulk of the work.
Jerod Santo
Bennington, Nebraska
Jerod co-hosts The Changelog, crashes JS Party & takes out the trash (his old code) once in awhile.
2021-04-25T13:21:41Z ago
Fair point.
I would wait until it’s 100% clear that these two conditions are true before making this optimization, myself.
Vincent Elschot
2021-04-25T16:37:35Z ago
Absolutely, your DBA should keep an eye on this and complain when it becomes an issue.
2021-04-25T16:41:50Z ago
That is nice!
Peter R. Fletcher
2021-05-03T18:41:41Z ago
While you might think that storing booleans would require less space in a database, I don’t know of any databases that do actually store them as bits packed in a bitfield, which is what you would have to do if you really wanted to minimize disk space. In practice, most seem to store them as short ints - either set to 1 for True and 0 for False or to -1 for True and 0 for False - so the savings is much less than you think. In addition, it is mostly much faster to test an int for zero/nonzero than to mask out and test bits.
Vincent Elschot
2021-05-03T19:33:33Z ago
Sorry, I was confusing the boolean with a NULL value wich is stored as a single bit in PostreSQL. A boolean in PostgreSQL requires one byte. That is still 7 bytes less than a timestamp. Sure, it’s all relative but it’s something to keep in mind; storing a boolean may not give you enough information, storing a timestamp requires eight times the storage and that is something you must take into account.
Peter R. Fletcher
2021-05-03T20:00:51Z ago
Surely, a Null value, however it may be encoded, takes up the same space on disk in a database as a non-Null value in the same field would. Or am I missing something?
James M Curran
2021-05-03T19:08:05Z ago
This works for a one-way toggle, like is_published, where it is false, until it is true, and then it stays true forever. But if it’s a true toggle, when it goes back & forth between true & false, (say “is_membership_active”) then you’re gonna want to know “when did it go false?” just as often as “when did it go true?” so, you’re back to storing a Boolean, with a timestamp.
Jerod Santo
Bennington, Nebraska
Jerod co-hosts The Changelog, crashes JS Party & takes out the trash (his old code) once in awhile.
2021-05-03T20:12:49Z ago
I think in that case it does depend on the nature of the field. I’ve certainly used
published_at
in reverse and never cared when I unpublished the thing. Same withdeleted_at
and not cared when I undeleted the thing.I agree it gets more complicated with memberships and other things that can go back and forth between different states, where each one is important. In this case I’d use a state machine instead of a boolean/timestamp pair.
John W Wolfe
2021-05-08T00:45:18Z ago
Not familiar with the term state machine and what it would provide instead of a Boolean + time stamp combo. Do you mean like an event sourced model where there’s like “ toggle on” and “toggle of” event types and you store a record of them and the current state is derived from the source of events?
2021-05-08T08:33:03Z ago
This site is also very good
Jerod Santo
Bennington, Nebraska
Jerod co-hosts The Changelog, crashes JS Party & takes out the trash (his old code) once in awhile.
2021-05-10T13:58:50Z ago
No, I’m referring to a finite-state machine, which you can code yourself but there’s often packages that do the foundational stuff for you. Here’s one in JS, for instance.
John W Wolfe
2021-05-10T17:45:32Z ago
Ah got it - I have seen this used in projects before and wasn’t sure that it was called a “state machine” - makes perfect sense. Also I think this one is very commonly used in JS world - https://xstate.js.org/docs/