The picture below explains it better than I can with words. 👇
Today I’m excited to introduce a new place for devs to polish their Postgres skills, a Postgres playground from Crunchy Data. What is the playground? Put simply it is:
- Postgres running in your local web browser
- With canned datasets you can load
- Guided tutorials to follow along to learn about the power of Postgres
Since it’s running in the browser you can only connect to it using the embedded
psql interface provided. So, it’s not actually useful for real workloads, but it is useful for learning and experimenting.
Reading other people’s code is a sure-fire way to improve as a developer. But what about as a database designer? The same process applies!
This post by Shekhar Gulati is him sharing what he learned by applying that principle to GitLab’s schema.
I learnt a lot from the GitLab schema. They don’t blindly apply the same practices to all the table designs. Each table makes the best decision based on its purpose, the kind of data it stores, and its rate of growth.
More posts like this, please!
Miguel Grinberg built a SQLite-backed web app for the blogging team at Twilio to track their effectiveness that started to respond slugishly when it grew much bigger than he expected (6.5 million individual daily traffic records, and with a user base that grew to over 200 employees).
He thought this might be a good time to switch to Postgres, but he wasn’t sure if the wins he expected would be realized:
Having publicly professed my dislike of performance benchmarks, I resisted the urge to look up any comparisons online, and instead embarked on a series of experiments to accurately measure the performance of these two databases for the specific use cases of this application.
In this in-depth article on Twilio’s blog (no doubt being tracked by Miguel’s web app as we speak) he goes into the details of his efforts with lots of interesting findings along the way (even a plot twist!). I’ll leave you with this statement from the end:
If you are going to take one thing away from this article, I hope it is that the only benchmarks that are valuable are those that run on your own platform, with your own stack, with your own data, and with your own software. And even then, you may need to add custom optimizations to get the best performance.
When we had Paul Copplestone from Supabase on The Changelog I asked him what a cloud native Postgres would look like. He replied,
Decoupled compute and storage. So the idea is that you should be able to attach the compute part of it to a storage, hopefully like an infinite storage; you know, anything that is infinitely scalable. If you can do this, and in particular, if the compute can start up very fast, maybe in, say, a hundred milliseconds via some sort of HTTP response, then that’s cloud native, yeah.
Enter Neon, which sells itself as:
The multi-cloud fully managed Postgres with a generous free tier. We separated storage and compute to offer autoscaling, branching, and bottomless storage.
Behind this new business from MemSQL co-founder Nikita Shamgunov is the linked (Apache-licensed) server written in Rust.
CloudNativePG is distributed under the Apache License 2.0, and is now owned and governed by a newly formed community of contributors to the project, built on solid principles and values inspired by the Cloud Native Computing Foundation (CNCF).
Among these are openness, fairness, inclusivity, technical excellence, “community over product/company,” built-in quality and built-in security. I’m part of the initial group of maintainers of the project, currently made up by the top six committers of the project within EDB.
If you’ve been following Ship It’s Kaizen episodes, you know we had troubles running Postgres in K8s and recently moved to a managed (sorta) database. I wonder if Gerhard will be tempted to try this operator out anyhow…
Train and deploy industry-standard or completely custom machine learning models, directly powered by your business data, into your production stack, with an open source Postgres extension.
Is there anything Postgres can’t do?! 😉
Hot off the press from our friend Ben Johnson:
Postlite is a network proxy to allow access to remote SQLite databases over the Postgres wire protocol. This allows GUI tools to be used on remote SQLite databases which can make administration easier.
The proxy works by translating Postgres frontend wire messages into SQLite transactions and converting results back into Postgres response wire messages. Many Postgres clients also inspect the
pg_catalogto determine system information so Postlite mirrors this catalog by using an attached in-memory database with virtual tables. The proxy also performs minor rewriting on these system queries to convert them to usable SQLite syntax.
It appears that Ben is taking code contributions this time around, but only for bug & documentation fixes.
Where by “auditing” they mean change tracking, not security testing:
Auditing is particularly useful for historical analysis. To demonstrate, imagine you have a
userstable that tracks when a user is online. You might add a
statuscolumn which can have one of two values:
offline. How would you track how long a user is online for throughout an entire month? An auditing system would track every change with timestamps, and so you can measure the difference between each timestamp and sum them up for the entire month.
What really impresses me about the Supabase team is how much they’re leaning in to Postgres’ rich feature set to build high-quality-yet-low-level abstractions for their users. Paul Copplestone emphasized this when we had him on the show back in January.
Schema changes are usually critical operations to perform on a high volume database. One thing off, and you are looking at an outage.
pg-osc makes it easy and safe to run any
ALTER statement on a production database table with no locking.
One of the things we discussed with Paul Copplestone from Supabase was what, exactly, might a cloud native Postgres look like? Well, perhaps it will look like OrioleDB:
A new storage engine for PostgreSQL, bringing a modern approach to database capacity, capabilities and performance to the world’s most-loved database platform.
OrioleDB consists of an extension, building on the innovative table access method framework and other standard Postgres extension interfaces. By extending and enhancing the current table access methods, OrioleDB opens the door to a future of more powerful storage models that are optimized for cloud and modern hardware architectures.
If learning from incidents grabbed your attention, here’s a good one we can all learn from:
It was 5:17pm today, just as I was wrapping up work for the day, and my manager pinged me with the following chat:
<manager>: Hi Jeremy - we have a <other team> ticket - escalated to <leader>, <leader>, etc. <principal> is on trying to advise as well. Are you available this evening if needed for diagnostics? <coworker> is on the call now
No obligation; just checking in to see what my availability is. Quickly thinking it over – I didn’t have any plans tonight, nothing in particular on my agenda. Why not? If I can help then someone else on the team won’t have to, and I don’t have anything better to do tonight.
The game is afoot. It might be a long night…
PgBouncer is a popular connection pooler for Postgres. PgCat is that same functionality, only in Rust. This is alpha software and there is no worse place to run alpha software (IMHO) than in places that have direct connections to your database. Still, this is worth linking to because it’s yet another instance of open source infrastructure code getting Rusty.
This post by Shayon Mukherjee is in response to the recently logged post by Mike Coutermarsh in praise of MySQL for infra folks:
This is not a MySQL vs PostgreSQL post. This is just a small summary of what I have come to appreciate about PostgreSQL as an Infrastructure Engineer.
Always good to keep in mind whenever comparing technologies: trade-offs abound and certain people in certain situations will value certain things about different technologies differently. That’s for certain. 😉
From Mike Coutermarsh:
For years I’ve been noticing this pattern of infrastructure engineers I really respect preferring MySQL and product engineers preferring Postgres. It took quite a while for me to understand it. Especially coming from my background as a product engineer. Infrastructure engineers generally…
Read on to hear why (from Mike’s perspective).
This post covers 5 cases where the author uses the JSON data type for column definitions:
- Dumping request data for later processing
- Supporting extra fields
- Some one-to-many relationships
- Key-value use cases
- Simpler EAV design
Craig Kerstiens has also weighed in on this topic for us.
Reshape… automatically handles complex migrations that would normally require downtime or manual multi-step changes. During a migration, Reshape ensures both the old and new schema are available at the same time, allowing you to gradually roll out your application.
Designed for Postgres 12+ and experimental for now. Do not use in production until it shapes up.
we host free-tier projects on VMs with 1 GB of memory. After tallying the resources reserved for PostgreSQL, PostgREST, Kong, GoTrue, and a handful of smaller services, we were left with a total memory budget of … 0 MB 😬. Unsurprisingly, our pathological memory target disqualified any option that required launching another process in those VMs.
By their estimates, this effort will reduces the platform’s memory requirements by 525 TB/hours every month. Seems worth it!
Mitchell Hashimoto is partaking in Advent of Code this year and he’s added an extra facet to the challenge: implementing each solution with a single SQL statement.
I’m not an expert at SQL (or PostgreSQL) by any means. One of the reasons I decided to use SQL was to regain some proficiency while also learning some new concepts. I expect that many of my solutions are suboptimal and there are likely much better ways (by various definitions) to reach the same answer. I’d love to hear about those (make an issue) but I won’t be merging any changes since this represents my approach.
Sounds like a fun challenge! But how does he find the time? By timeboxing each problem to just 30 minutes.
The team’s goal is to become “the de-facto open-source alternative to MongoDB.” Here’s why:
MongoDB is a life-changing technology for many developers, empowering them to build applications faster than using relational databases. Its easy-to-use and well-documented drivers make MongoDB one of the easiest to use database solutions available. However, MongoDB abandoned its open-source roots, changing the license to SSPL - making it unusable for many open source and commercial projects.
Most MongoDB users are not in need of many of the advanced features offered by MongoDB; however, they are in need of an open-source database solution. Recognizing this, MangoDB is here to fill the gap by providing an alternative.
This is currently a proof of concept. The concept: a proxy that converts Mongo’s wire protocol to SQL and stores everything in PostgreSQL. Fascinating idea! Will it work?
Earlier this year, we took Notion down for five minutes of scheduled maintenance. While our announcement gestured at “increased stability and performance,” behind the scenes was the culmination of months of focused, urgent teamwork: sharding Notion’s PostgreSQL monolith into a horizontally-partitioned database fleet.
This is a deep-dive with lots of lessons along the way.
Kubegres is fully integrated with Kubernetes’ lifecycle as it runs as an operator written in Go. It is minimalist in terms of codebase compared to other open-source Postgres operators. It has the minimal and yet robust required features to manage a cluster of PostgreSql on Kubernetes. We aim keeping this project small and simple.
We’ve struggled with running Postgres inside K8s around these parts. Maybe this would’ve helped?
Craig Kerstiens on the little wins coming in Postgres 14, which is scheduled for the end of September:
Postgres is, and for some time will continue to be, the first database I turn to. As Postgres focuses on the little things, it just deepens my commitment to it. Why look elsewhere when the bond just grows over time? So today I wanted to call some extra attention to those little things, the ones that don’t get the spotlight, but simply make a developer’s life better.
He goes on to highlight the JSON syntax improvements, read-only roles, what’s new in
psql, and more.
This is a ~15 minute presentation (with transcript) by Álvaro Hernández at a Data on Kubernetes Community event about why he believes Kubernetes solves a big problem with running Postgres in production.
Running a Postgres installation, with or without containers, is trivial. However, setting up a production environment is a whole different matter. Postgres is not by itself a production-ready software: it requires a set of side tools to complement its functionality: connection pooling, monitoring, backup tools, high availability software, you name it. This is called the “Stack Problem”. This brief talk discusses the Stack Problem, understanding how Kubernetes is the platform that best solves it, and what the main advantages (and disadvantages!) are of running Postgres on Kubernetes.
Not only does Drew DeVault have a lot of insightful and interesting things to say about Postgres, I’m just generally a fan of people praising good software. Many people are quick to offer criticism, but slow to heap some praise. Let’s change that, maybe?
After 25 years of persistence, and a better logo design, Postgres stands today as one of the most significant pillars of profound achievement in free software, alongside the likes of Linux and Firefox. PostgreSQL has taken a complex problem and solved it to such an effective degree that all of its competitors are essentially obsolete, perhaps with the exception of SQLite.
For even more Postgres praise, listen to our episode of The Changelog with Craig Kerstiens.