Databases

Databases, structured data, data stores, etc.
83 Stories
Databases github.com

toyDB – a distributed SQL db written in Rust

This is not a use-it-in-the-real-world kinda thing. It’s being written as a learning project, but may interest you if you want to learn about database internals. It includes:

  • Raft-based distributed consensus engine for linearizable state machine replication.
  • ACID-compliant transaction engine with MVCC-based snapshot isolation.
  • Pluggable storage engine with B+tree and log-structured backends.
  • Iterator-based query engine with heuristic optimization and time-travel support.
  • SQL interface including projections, filters, joins, aggregates, and transactions.

Databases sqlbolt.com

SQLBolt – quickly learn SQL right in your browser

This series of interactive lessons and exercises is a great place to start if you want to learn SQL. And trust me: if you don’t know SQL, you want to learn SQL. Of all the technologies and tools I’ve picked up over the course of my career, SQL has had one of the highest ROIs. It’s portable across languages/runtimes and has incredible staying power in terms of skill relevancy.

Practical AI Practical AI #139

Vector databases for machine learning

Pinecone is the first vector database for machine learning. Edo Liberty explains to Chris how vector similarity search works, and its advantages over traditional database approaches for machine learning. It enables one to search through billions of vector embeddings for similar matches, in milliseconds, and Pinecone is a managed service that puts this capability at the fingertips of machine learning practitioners.
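To make "vector similarity search" concrete, here is a minimal brute-force sketch in plain Python. It is not Pinecone's API or implementation: the `cosine_similarity` and `top_k` helpers and the toy three-dimensional embeddings are made up for illustration. A real vector database would use an index to avoid scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hypothetical toy embeddings; real ones have hundreds of dimensions.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

nearest = top_k([1.0, 0.0, 0.0], embeddings, k=2)
print(nearest)
```

This linear scan compares the query against every stored vector, which is why it stops being practical long before "billions of vector embeddings".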

Go github.com

A high-performance, columnar, in-memory storage engine for Go

The general idea is to leverage cache-friendly ways of organizing data in structures of arrays (SoA), otherwise known as “columnar” storage in database design. This, in turn, allows us to iterate and filter over columns very efficiently. On top of that, this package also adds bitmap indexing to the columnar storage, making it possible to build filter queries using binary and, and not, or, and xor (see kelindar/bitmap with SIMD support).
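The idea of combining structure-of-arrays storage with bitmap filters can be sketched in a few lines of Python. This is an illustration of the general technique only, not the Go package's API: the `columns` table, `build_bitmap`, and `rows_from_bitmap` are hypothetical names, and a Python integer stands in for a real bitmap.

```python
# Structure of arrays: one list per column rather than one record per row.
columns = {
    "name": ["alice", "bob", "carol", "dave"],
    "age": [31, 17, 45, 22],
    "active": [True, True, False, True],
}

def build_bitmap(column, predicate):
    """Return an int whose bit i is set when row i satisfies the predicate."""
    bitmap = 0
    for i, value in enumerate(column):
        if predicate(value):
            bitmap |= 1 << i
    return bitmap

def rows_from_bitmap(bitmap):
    """Yield the row indexes whose bits are set (bitmap must be non-negative)."""
    i = 0
    while bitmap:
        if bitmap & 1:
            yield i
        bitmap >>= 1
        i += 1

adults = build_bitmap(columns["age"], lambda age: age >= 18)
active = build_bitmap(columns["active"], lambda flag: flag)

# Filters combine with plain bitwise operators: "and" and "and not".
active_adults = [columns["name"][i] for i in rows_from_bitmap(adults & active)]
inactive_adults = [columns["name"][i] for i in rows_from_bitmap(adults & ~active)]
print(active_adults, inactive_adults)
```

Each predicate scans a single contiguous column, and combining predicates costs only a bitwise operation over the bitmaps.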

Petr Stribny stribny.name

Scaling relational SQL databases

When it comes to scaling, we might need to think about:

  • data storage, if we store more and more data and it becomes expensive or slow to work with
  • fast INSERTs and UPDATEs for write-heavy workloads
  • making SELECT queries faster because of their complexity or because they need to query huge amounts of data
  • concurrency if we have many clients interacting with the database

In this article, I will present some basic ideas and starting points on scaling traditional SQL databases.

SQLite unixsheikh.com

SQLite is the only database you will ever need in most cases

SQLite is so hot right now.

Even if you start out small and later need to upscale, as long as your web application can run on the same machine as the database, which it can 99% of the time, you can just upgrade the hardware to a beefier machine and keep business as usual.

The only time you need to consider a client-server setup is…

Jerod Santo changelog.com/posts

You might as well timestamp it

In my 15+ years of web development, there are very few things I can say are unequivocally a good idea. It almost always does depend.

Storing timestamps instead of booleans, however, is one of those things I can go out on a limb and say it doesn’t really depend all that much. You might as well timestamp it. There are plenty of times in my career when I’ve stored a boolean and later wished I’d had a timestamp. There are zero times when I’ve stored a timestamp and regretted that decision.
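As a small sketch of the idea, assuming a hypothetical `posts` table in SQLite: instead of a `published` boolean column, store a nullable `published_at` timestamp and derive the boolean from it. The table, column, and function names here are illustrative and do not come from the post.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " published_at TEXT"  # NULL means "not published"
    ")"
)
conn.execute("INSERT INTO posts (title) VALUES ('draft'), ('launch')")

def publish(conn, post_id):
    """Record when the post was published rather than just flipping a flag."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute("UPDATE posts SET published_at = ? WHERE id = ?", (now, post_id))

def is_published(conn, post_id):
    """The boolean is still available: it is simply 'timestamp IS NOT NULL'."""
    row = conn.execute(
        "SELECT published_at IS NOT NULL FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return bool(row[0])

publish(conn, 2)
print(is_published(conn, 1), is_published(conn, 2))
```

The timestamp answers both "is it published?" and "when was it published?", while a boolean can only ever answer the first question.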

The Changelog The Changelog #433

Open source, not open contribution

This week we’re talking with Ben Johnson. Ben is known for his work on BoltDB, his work in open source, and as a freelance Go developer. In late January, when Ben open sourced his newest project, Litestream, he shared in the readme that the project was open source but not open for contribution. His reason was to protect his mental health and the long-term viability of the project. On this episode we talk with Ben about what that means, his thoughts on mental health and burnout in open source, choosing a license, and the details behind Litestream, a standalone streaming replication tool for SQLite.

Founders Talk Founders Talk #75

The journey to massive scale and ultra-resilience

This week Adam talks with Spencer Kimball, CEO and co-founder of Cockroach Labs, makers of CockroachDB, an open source cloud-native distributed SQL database. Cockroach Labs recently raised $160 million at a $2 billion valuation. In this episode, Spencer shares his journey in open source, startups, and entrepreneurship, and what they’re doing to build CockroachCloud to meet the needs of applications that require massive scale and ultra-resilience.

Lawrence Hecht The New Stack

ClickHouse has rapidly rivaled other open source databases in active contributors

Lawrence Hecht:

ClickHouse has come out of seemingly nowhere to rival Elasticsearch as the database-related open source software project with the most active contributors…

ClickHouse is column-oriented and allows for analytics reports to be generated using SQL queries in real-time. ClickHouse’s rise in popularity began in 2016, which happens to be when Apache Spark peaked.

I first heard of ClickHouse last year when I learned that our friends at Plausible use it for their analytics backend (teamed with Postgres for relational data).

Databases github.com

Dolt – it's Git for data

Imagine a world where Git and MySQL got together and had a baby. They would name that baby, Dolt.

Dolt is a SQL database that you can fork, clone, branch, merge, push and pull just like a git repository. Connect to Dolt just like any MySQL database to run queries or update the data using SQL commands. Use the command line interface to import CSV files, commit your changes, push them to a remote, or merge your teammate’s changes.

All the commands you know for Git work exactly the same for Dolt. Git versions files, Dolt versions tables.

The authors also created DoltHub where you can host and share your Dolt databases.

The Changelog The Changelog #429

Community perspectives on Elastic vs AWS

This week we’re talking about the recent falling out between Elastic and AWS around the relicensing of Elasticsearch and Kibana. Like many in the community, we have been watching this very closely.

Here’s the tl;dr for context. On January 21st, Elastic posted a blog post sharing their concerns with Amazon/AWS misleading and confusing the community, saying “They have been doing things that we think are just NOT OK since 2015 and it has only gotten worse.” This led them to relicense Elasticsearch and Kibana with a dual license, a proprietary license and the Server Side Public License (SSPL). AWS responded two days later stating that they are “stepping up for a truly open source Elasticsearch,” and shared their plans to create and maintain forks of Elasticsearch and Kibana based on the latest ALv2-licensed codebases.

There’s a ton of detail and nuance beneath the surface, so we invited a handful of folks on the show to share their perspective. On today’s show you’ll hear from: Adam Jacob (co-founder and board member of Chef), Heather Meeker (open-source lawyer and the author of the SSPL license), Manish Jain (founder and CTO at Dgraph Labs), Paul Dix (co-founder and CTO at InfluxDB), VM (Vicky) Brasseur (open source & free software business strategist), and Markus Stenqvist (everyday web dev from Sweden).

Brad Fitzpatrick tailscale.com

An unlikely database migration

So the Tailscale team were using a single text file as a database (as you do) and it worked great… until it didn’t.

Even with fast NVMe drives and splitting the database into two halves (important data vs. ephemeral data that we could lose on a tmpfs), things got slower and slower. We knew the day would come. The file reached a peak size of 150MB and we were writing it as quickly as the disk I/O would let us. Ain’t that just peachy?

So, migrate to MySQL or PostgreSQL, right? Maybe SQLite?

Nope, Crawshaw had other ideas.

I won’t ruin the surprise and tell you what they went with, but I will say it’s a widely deployed system amongst cloud natives…

Chua Bok Woon github.com

sq is a code-generated, type safe query builder and struct mapper for Go

From reading through the README, this seems like a nice balance between a full-blown ORM and hand-rolling all your own SQL. For example, this point from the “The mapper function is the SELECT clause” section:

In sq whatever you SELECT is automatically mapped. This means you just have to write your query, execute it and if there were no errors, the data is already in your Go variables. No iterating rows, no specifying column scan order, no error checking three times. Write your query, run it, you’re done.

Databases github.com

Graviton is like ZFS for key-value stores

Graviton Database is a simple, fast, versioned, authenticated, embeddable key-value store database in pure Go… Every write is tracked, versioned and authenticated with cryptographic proofs. Additionally, it is possible to take snapshots of the database. It is also possible to use simple copy and rsync commands for database backup, even during live updates, without any possibility of database corruption.

Still in Alpha, but a lot of work has been done and there are features a-plenty.

Practical AI Practical AI #94

Operationalizing ML/AI with MemSQL

A lot of effort is put into the training of AI models, but, for those of us that actually want to run AI models in production, performance and scaling quickly become blockers. Nikita from MemSQL joins us to talk about how people are integrating ML/AI inference at scale into existing SQL-based workflows. He also touches on how model features and raw files can be managed and integrated with distributed databases.

Go github.com

A lightweight, high-speed immutable database for systems and applications

With immudb you can track changes in sensitive data in your transactional databases and then record those changes permanently in a tamperproof immudb database. This allows you to keep an indelible history of sensitive data, for example debit/credit card transactions.

There are so many options for storing data these days. If you haven’t heard Go Time’s excellent episode on databases yet, Jaana does a great job of explaining some of the trade-offs.

Go Time Go Time #132

The trouble with databases

Databases are tricky, especially at scale. In this episode Mat, Jaana, and Jon discuss different types of databases, the pros and cons of each, along with the many ways developers can have issues with databases. They also explore questions like, “Why are serial IDs problematic?” and “What alternatives are there if we aren’t using serial IDs?” while at it.

Jaana Dogan Medium

Things I wished more developers knew about databases

Jaana Dogan started with a draft and this tweet and ended up laying down some serious knowledge on databases.

A large majority of computer systems have some state and are likely to depend on a storage system. My knowledge on databases accumulated over time, but along the way our design mistakes caused data loss and outages. In data-heavy systems, databases are at the core of system design goals and tradeoffs. Even though it is impossible to ignore how databases work, the problems that application developers foresee and experience will often be just the tip of the iceberg.

CockroachDB openmymind.net

Migrating from Postgres to CockroachDB

This is a nice lessons learned post from one engineering team making a database switch.

Overall, I’m happy with how the effort turned out and with CockroachDB in general. Because it uses PostgreSQL’s wire protocol, existing PostgreSQL drivers should work as-is. But we did run into some challenges that are worth pointing out. Here’s a list of things you might want to consider…

I like the update at the end, which emphasizes the importance of tests for making a switch of this magnitude:

The system that was migrated has solid tests and good coverage. While a lot of the differences we ran into are obvious (like lack of range types and triggers), others were more subtle (especially the odd on conflict behavior). Test coverage made a pretty significant impact in the speed of the migration and our confidence in pushing live.
