Databases Icon

Databases

Databases, structured data, data stores, etc.
123 Stories
All Topics

Databases github.com

PRQL – a modern language for transforming data

PRQL (pronounced “Prequel”) aims to be “a simpler and more powerful SQL”

Like SQL, it’s readable, explicit and declarative. Unlike SQL, it forms a logical pipeline of transformations, and supports abstractions such as variables and functions. It can be used with any database that uses SQL, since it transpiles to SQL.

To get an idea on PRQL’s design, they provide this SQL statement as an example:

SELECT TOP 20
    title,
    country,
    AVG(salary) AS average_salary,
    SUM(salary) AS sum_salary,
    AVG(salary + payroll_tax) AS average_gross_salary,
    SUM(salary + payroll_tax) AS sum_gross_salary,
    AVG(salary + payroll_tax + benefits_cost) AS average_gross_cost,
    SUM(salary + payroll_tax + benefits_cost) AS sum_gross_cost,
    COUNT(*) as count
FROM employees
WHERE salary + payroll_tax + benefits_cost > 0 AND country = 'USA'
GROUP BY title, country
ORDER BY sum_gross_cost
HAVING count > 200

And then translate it to PRQL, which looks like:

from employees
filter country = "USA"                           # Each line transforms the previous result.
let gross_salary = salary + payroll_tax          # This _adds_ a column / variable.
let gross_cost   = gross_salary + benefits_cost  # Variables can use other variables.
filter gross_cost > 0
aggregate by:[title, country] [                  # `by` are the columns to group by.
    average salary,                              # These are the calcs to run on the groups.
    sum     salary,
    average gross_salary,
    sum     gross_salary,
    average gross_cost,
    sum     gross_cost,
    count,
]
sort sum_gross_cost                              # Uses the auto-generated column name.
filter count > 200
take 20

Changelog Interviews Changelog Interviews #476

Supabase is all in on Postgres

This week Paul Copplestone, CEO of Supabase joined us to catch us up on the next big thing happening in the world of Postgres. Supabase might be best known as “the open source Firebase alternative,” a tagline they might be reluctant to maintain. But from Adam’s perspective, he’s never been more excited about what they’re bringing to market for Postgres fans. In the last year, Supabase has gone from 0 to more than 80,000 databases on their platform — and they’re still in beta…and it’s open source. Hopefully today’s show sheds some light on why everyone is talking about Supabase.

Founders Talk Founders Talk #85

Making the last database you’ll ever need

This week Adam is joined by Sam Lambert, CEO of PlanetScale. Now that PlanetScale is in general availability, Adam had to get Sam on the show to talk about the behind the scenes of building this database platform, how this is the last database you’ll ever need and what that means for developers, why serverless, its open source underpinnings with Vitess, and a preview of what’s to come.

Konrad kolaente.dev

Simple, zero-fuss docker database backups

Back in the olden days, I would just put a mysqldump > dump.sql in a crontab and called it a day. When I started to host more and more stuff with docker, I first just migrated that approach to docker and put it all in a container. That still required me to mess around with config files. Once I started to host postgres containers it all got even more complicated. Thus, I needed a new solution.

I built this tool to make backups easy: Simply point it to a host running docker containers and it will automatically inspect and find all mysql/mariadb and postgres containers and do backups of them on a schedule. No configuration required, it “just works”.

Databases github.com

A collaborative IDE for your databases, right in your browser

Slashbase is an open-source collaborative IDE for your databases in your browser. Connect to your database, browse data, run a bunch of SQL commands or share SQL queries with your team, right from your browser!

It’s written in Golang and Nextjs React Framework (SPA) and runs as a single Linux binary with PostgreSQL. Documentation is currently WIP.

It’s early days and security will be a major concern to get right, but this has a lot of potential to unlock some cool use cases.

Simon Eskildsen sirupsen.com

Careful trading complexity for 'improvements'

Simon Eskildsen (of napkin math) shares a word of warning about one possible decision-making trap:

Whenever you find yourself arguing for improving infrastructure by yanking up complexity, you need to be very careful.

He applies this thinking to a common technical proposal of switching from a general-purpose RDBMS to a specialty database to account for growth and scale.

I’m a proponent of mastering and abusing existing tools, rather than chasing greener pastures. The more facility you gain with first-principle reasoning and napkin math, the closer I’d wager you’ll inch towards this conclusion as well. A new system theoretically having better guarantees is not enough of an argument. Adding a new system to your stack is a huge deal and difficult to undo.

Databases rachelbythebay.com

A terrible schema from a clueless programmer

Rachel by the Bay:

There’s a post going around tonight about how someone forgot to put an index on some database thing and wound up doing full table scans (or something like that). The rub is that instead of just being slow, it also cost a fair amount of money because this crazy vendor system charged by the row or somesuch. So, by scanning the whole table, they touched all of those rows, and oh hey, massive amounts of money just set ablaze!

The usual venues are discussing it, and I get the impression some people have the wrong approach to this. I want to describe a truly bad database schema I encountered, and then tell you a little about what it did to the system performance.

A fun story with an excellent twist at the end.

Databases pradeepchhetri.xyz

ClickHouse vs TimescaleDB

Two up-and-coming database options compared:

Recently, TimescaleDB published a blog comparing ClickHouse & TimescaleDB using timescale/tsbs, a timeseries benchmarking framework. I have some experience with PostgreSQL and ClickHouse but never got the chance to play with TimescaleDB. Some of the claims about TimescaleDB made in their post are very bold, that made me even more curious. I thought it’d be a great opportunity to try it out and see if those claims are really true.

Founders Talk Founders Talk #82

Journey to CEO, again

Today Adam is joined by Evan Kaplan, CEO of InfluxData. Evan’s journey to become the CEO was not by way of founder, in this company. Evan has founded several companies in the past, and he’s been in a CEO position for more than 22 years. But InfluxData was founded by Paul Dix, and Paul knew years ago that his role (best role?) was to lead the technical and product direction of the company, which lead him to Evan. Today we share that story as well as a glimpse into operating the business that built the defacto platform for building time series applications with deep roots in open source.

Databases simplethread.com

Relational databases aren’t dinosaurs, they’re sharks

I’ve heard way less people throwing SQL under the bus than I did back in the high-hype NoSQL days, but this article by Justin Etheredge does a good job of laying out some of the advantages and disadvantages of the RDBMS side of the fence:

The next time you hear someone describe relational databases as yesterday’s technology, or the next time you see someone assume a relational database can’t handle the needs of their unproven MVP, stop and ask them how they are going to account for the tradeoffs they’re making. Make sure they understand they aren’t skipping a dead dinosaur, they’re taking a pass on the thousands of human-years of effort that have made relational databases the sharks of the data industry.

Changelog Interviews Changelog Interviews #461

Fauna is rethinking the database

This week we’re talking with Evan Weaver about Fauna — the database for a new generation of applications. Fauna is a transactional database delivered as a secure and scalable cloud API with native GraphQL. It’s the first implementation of its kind based on the Calvin paper as opposed to Spanner. We cover Evan’s history leading up to Fauna, deep details on the Calvin algorithm, the CAP theorem for databases, what it means for Fauna to be temporal native, applications well suited for Fauna, and what’s to come in the near future.

Startups clickhouse.com

Introducing ClickHouse, Inc.

Alexey Milovidov, announcing the formation of a (VC funded) corporation around ClickHouse, an open source analytics DBMS:

Today I’m happy to announce ClickHouse Inc., the new home of ClickHouse. The development team has moved from Yandex and joined ClickHouse Inc. to continue building the fastest (and the greatest) analytical database management system. The company has received nearly $50M in Series A funding led by Index Ventures and Benchmark with participation by Yandex N.V. and others. I created ClickHouse, Inc. with two co-founders, Yury Izrailevsky and Aaron Katz. I will continue to lead the development of ClickHouse as Chief Technology Officer (CTO), Yury will run product and engineering, and Aaron will be CEO.

ClickHouse wasn’t always a business. It also wasn’t always open source.

Making ClickHouse open source was also not an easy decision, but now I see: doing open source is hard, but it is a big win. While it takes a tremendous effort and responsibility to maintain a popular open-source product, for us, the benefits outweigh all the costs. Since we published ClickHouse, it has been deployed in production in thousands of companies across the globe for a wide range of use cases, from agriculture to self-driving cars.

Ahmed github.com

Deploy databases and services easily for dev and testing pipelines

Peanut provides a REST API, Admin Dashboard and a command line tool to deploy and configure the commonly used services like databases, message brokers, graphing, tracing, caching tools … etc. It perfectly suited for development, manual testing, automated testing pipelines where mocking is not possible and test drives.

Under the hood, it works with the containerization runtime like docker to deploy and configure the service. Destroy the service if it is a temporary one.

Technically you can achieve the same with a bunch of yaml files or using a configuration management tool or a package manager like helm but peanut is pretty small and fun to use & should speed up your workflow!

Deploy databases and services easily for dev and testing pipelines

Databases 1729.com

The billion user table

Jon Stokes believes blockchain tech has the opportunity to take us from a world where individual corporations build their siloed users tables to a world where the entire Internet shares a single users table.

In place of a decentralized network of user data silos connected by APIs, there’s a single decentralized user data store accessible via an open protocol and a decentralized network of storage nodes. So the identity-hosting blockchain represents decentralization at the datastore implementation layer, and recentralization at the datastore access layer.

What would this produce? Jon envisions this:

Moving identity on-chain, and thereby removing the possibility of users-table-centric network effects, completely up-ends the entire landscape of API-based, access-controlled interoperability that the present Internet is built on. All of the non-technical market and political dynamics around users table size, leverage, and risk suddenly go out the window.

Databases github.com

toyDB – a distributed SQL db written in Rust

This is not a use-it-in-the-real-world kinda thing. It’s being written as a learning project, but may interest you if you want to learn about database internals. It includes:

  • Raft-based distributed consensus engine for linearizable state machine replication.
  • ACID-compliant transaction engine with MVCC-based snapshot isolation.
  • Pluggable storage engine with B+tree and log-structured backends.
  • Iterator-based query engine with heuristic optimization and time-travel support.
  • SQL interface including projections, filters, joins, aggregates, and transactions.

Databases sqlbolt.com

SQLBolt – quickly learn SQL right in your browser

This series of interactive lessons and exercises is a great place to start if you want to learn SQL. And trust me: if you don’t know SQL, you want to learn SQL. Of all the technologies and tools I’ve picked up over the course of my career, SQL has had one of the highest ROIs. It’s portable across languages/runtimes and has incredible staying power in terms of skill relevancy.

Practical AI Practical AI #139

Vector databases for machine learning

Pinecone is the first vector database for machine learning. Edo Liberty explains to Chris how vector similarity search works, and its advantages over traditional database approaches for machine learning. It enables one to search through billions of vector embeddings for similar matches, in milliseconds, and Pinecone is a managed service that puts this capability at the fingertips of machine learning practitioners.

Go github.com

A high-performance, columnar, in-memory storage engine for Go

The general idea is to leverage cache-friendly ways of organizing data in structures of arrays (SoA) otherwise known “columnar” storage in database design. This, in turn allows us to iterate and filter over columns very efficiently. On top of that, this package also adds bitmap indexing to the columnar storage, allowing to build filter queries using binary and, and not, or and xor (see kelindar/bitmap with SIMD support).

Petr Stribny stribny.name

Scaling relational SQL databases

When it comes to scaling, we might need to think about:

  • data storage, if we store more and more data and it becomes expensive or slow working with them
  • fast INSERTs and UPDATES for write-heavy workloads
  • making SELECT queries faster because of their complexity or because they need to query huge amounts of data
  • concurrency if we have many clients interacting with the database

In this article, I will present some basic ideas and starting points on scaling traditional SQL databases.

SQLite unixsheikh.com

SQLite is the only database you will ever need in most cases

SQLite is so hot right now.

Even if you start out small and later need to upscale, as long as your web application can run on the same machine as the database, which it can in 99% of the time, you can just upgrade the hardware to a beefier machine and keep business as usual.

The only time you need to consider a client-server setup is…

Jerod Santo changelog.com/posts

You might as well timestamp it

In my 15+ years of web development, there are very few things I can say are unequivocally a good idea. It almost always does depend.

Storing timestamps instead of booleans, however, is one of those things I can go out on a limb and say it doesn’t really depend all that much. You might as well timestamp it. There are plenty of times in my career when I’ve stored a boolean and later wished I’d had a timestamp. There are zero times when I’ve stored a timestamp and regretted that decision.

Changelog Interviews Changelog Interviews #433

Open source, not open contribution

This week we’re talking with Ben Johnson. Ben is known for his work on BoltDB, his work in open source, and as a freelance Go developer. Late January when Ben open sourced his newest project Litestream in the readme he shared how the project was open source, but not open for contribution. His reason was to protect his mental health and the long term viability of the project. On this episode we talk with Ben about what that means, his thoughts on mental health and burnout in open source, choosing a license, and the details behind Litestream - a standalone streaming replication tool for SQLite.

Player art
  0:00 / 0:00