You can think of this like
curl but entirely in SQL:
select request_url, response_status, response_headers from http_get('http://httpbin.org/get');
The cool thing is you can save everything from the request: status code, headers, the body (of course), timestamps, and more. Great for archiving!
-- initialize a table create table snapshots as select * from http_get('https://changelog.com'); -- To add more rows later on insert into snapshots select * from http_get('https://changelog.com');
Ben Johnson is at it again.
LiteFS is a FUSE-based file system for replicating SQLite databases across a cluster of machines. It works as a passthrough file system that intercepts writes to SQLite databases in order to detect transaction boundaries and record changes on a per-transction level in LTX files.
I believe Ben was alluding to this (then unreleased) tool on his recent Ship It! appearance…
Miguel Grinberg built a SQLite-backed web app for the blogging team at Twilio to track their effectiveness that started to respond slugishly when it grew much bigger than he expected (6.5 million individual daily traffic records, and with a user base that grew to over 200 employees).
He thought this might be a good time to switch to Postgres, but he wasn’t sure if the wins he expected would be realized:
Having publicly professed my dislike of performance benchmarks, I resisted the urge to look up any comparisons online, and instead embarked on a series of experiments to accurately measure the performance of these two databases for the specific use cases of this application.
In this in-depth article on Twilio’s blog (no doubt being tracked by Miguel’s web app as we speak) he goes into the details of his efforts with lots of interesting findings along the way (even a plot twist!). I’ll leave you with this statement from the end:
If you are going to take one thing away from this article, I hope it is that the only benchmarks that are valuable are those that run on your own platform, with your own stack, with your own data, and with your own software. And even then, you may need to add custom optimizations to get the best performance.
Simon Willison figured out how to run a SQL query directly against a CSV file using the
sqlite3 :memory: -cmd '.mode csv' -cmd '.import taxi.csv taxi' \ 'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count'
Ie TL;DR’d the one-liner for ya, but you’ll have to go to Simon’s site for the explainer.
Many organizations such as Commoncrawl, WebRecorder, Archive.org and libraries around the world, use the warc format to archive and store web data.
The full datasets of these services range in the few pebibytes(PiB), making them impractical to query using non-distributed systems.
This project aims to make subsets such data easier to access and query using SQL.
Crawl a site with
wget and import it into WarcDB:
wget --warc-file changelog "https://changelog.com" warcdb import archive.warcdb changelog.warc.gz
Then you can query away using SQL, such as this one to get all response headers:
sqlite3 archive.warcdb <<SQL select json_extract(h.value, '$.header') as header, json_extract(h.value, '$.value') as value from response, json_each(http_headers) h SQL
I won’t call SQLite’s current moment a comeback, because the most used database engine in the world doesn’t have anything to come back from. I’m going with “renaissance”, because despite its already mass adoption, there has been something of a rebirth of interest from one software sector that had previously relegated it to dev & test environments: web apps
Richard Hipp, announcing SQLite version 3.39.0 in their forums:
If you prefer, you can experiment with the latest 3.39.0 code directly in your web-browser and without having to download or install anything by visiting sqlite.org/fiddle/. (The page name is by analogy to sites like “JSFiddle” and “SQLFiddle”.)
The page runs a WASM-compiled copy of a recent 3.39.0 build in a sandbox in your web browser. Using controls on that page, you can import database files from your local machine (assuming they aren’t too big), work with them, then export any changes back to your desktop.
David Crawshaw gave an excellent talk (at Go Northwest in 2018) on using SQLite (so hot right now) together with Go and how enjoyable/productive the combo is. Quite worthy of its ~30 minutes runtime.
Anton Zhiyanov shows how you can store non-normalized data in SQLite and use virtual columns to query against it without the massive performance penalty that
json_extract would otherwise incur.
Hot off the press from our friend Ben Johnson:
Postlite is a network proxy to allow access to remote SQLite databases over the Postgres wire protocol. This allows GUI tools to be used on remote SQLite databases which can make administration easier.
The proxy works by translating Postgres frontend wire messages into SQLite transactions and converting results back into Postgres response wire messages. Many Postgres clients also inspect the
pg_catalogto determine system information so Postlite mirrors this catalog by using an attached in-memory database with virtual tables. The proxy also performs minor rewriting on these system queries to convert them to usable SQLite syntax.
It appears that Ben is taking code contributions this time around, but only for bug & documentation fixes.
JSON support in SQLite has been in the works for awhile now, but prior to this release you had to opt in to it. Now it’s on by default and this is a great rundown of using the query features.
SQLite and Python strike me as particularly complementary tools.
This hit my radar a few times in the past year – this week, and then most recently a few months back when we had Richard Hipp on The Changelog (again) – but, I didn’t post it to the newsfeed.
Here’s what’s interesting about the SQLite test suite – it’s their secret sauce…the sustaining enablement of building a support business around SQLite. Here’s a direct quote from Richard Hipp in that episode.
Originally we thought we were gonna sell this and make money from it, and that’s how we were gonna support ongoing development. That didn’t really play out, nobody ever bought it. It does sort of become our business value, our intellectual property. I mean, you can take the SQLite code and fork it and start your own thing…but you don’t have the full test suite. You’ve got a lot of tests, but not all of them. So we’ve got a little bit of advantage over you there.
Click here to play that episode from this quote.
Curl maintainer Daniel Stenberg:
We can’t know for sure which products are on the top list of the most widely deployed software components. There’s no method for us to count or estimate these numbers with a decent degree of certainty. We can only guess and make rough estimates – and it also depends on exactly what we count. And quite probably also depending on who‘s doing the counting.
He goes through his process of trying to determine contenders, and ends up with zlib, qlite, and libcurl. Other have added OpenSSL, expat and the Linux kernel to that list.
This week, Richard Hipp returns to catch us up on all things SQLite, his single file webserver written in C called Althttpd, and Fossil – the source code manager he wrote and uses to manage SQLite development instead of Git.
Where by “on the web” James means “in the browser”:
SQL is a great way to build apps. Especially small local web apps. Key/value stores may have their place in large distributed systems, but wow wouldn’t it be great if we could use SQLite on the web?
I’m excited to announce absurd-sql which makes this possible. absurd-sql is a filesystem backend for sql.js that allows SQLite to read/write from IndexedDB in small blocks, just like it would a disk. I ported my app to use and you can try it here.
A very cool project that was inspired by phiresky’s Hosting SQLite databases on Github Pages.
A proof-of-concept inspired and enabled by Hosting SQLite Databases on GitHub Pages and the ensuing Hacker News post. The compiled single-page app supports autocomplete for titles, automatic redirecting & other MediaWiki datasets like WikiQuote or Chinese Wikipedia. It makes no external API calls except to get Wikipedia’s images.
2MB of CSS, JS, and WASM. One 43GB SQLite file.
DenoDB has a fully-typed API (which is great for editor integration) and supports a whole host of backends: MySQL/Maria, SQLite, Postgres, and MongoDB.
Broad database support is great for library adoption, but as a user I’d prefer something that leans in to a specific ecosystem, which usually lets you squeeze more out of it.
Regardless of that, it’s great to see the Deno community building foundational tools like this.
Althttpd is a simple webserver that has run the sqlite.org website since 2004. Althttpd strives for simplicity, security, and low resource usage.
As of 2018, the althttpd instance for sqlite.org answers about 500,000 HTTP requests per day (about 5 or 6 per second) delivering about 50GB of content per day (about 4.6 megabits/second) on a $40/month Linode. The load average on this machine normally stays around 0.1 or 0.2. About 19% of the HTTP requests are CGI to various Fossil source-code repositories.
Richard has a knack for creating simple, high quality tools. When we did our (now legendary) show with him back in 2016, he was quite keen on coming back at some point to discuss Fossil. Should we make that happen?
This is a project in Go that compiles to a SQLite runtime loadable extension, which brings Redis commands into a SQL context. This allows you to write SQL queries against data in a Redis instance, using Redis commands like
LRANGEas SQL functions.
Experimental for now. But why? Patrick says:
In general, Redis is fairly accessible from many programming languages, and any query using reqlite could probably be implemented in a language of your choice using a Redis client. However, sometimes declarative SQL can be a better choice to express what you’re looking for, and that’s where this project may be most useful.
The benefits of such a setup are numerous, especially for small sites and side projects:
Hosting a static website is much easier than a “real” server - there’s many free and reliable options (like GitHub, GitLab Pages, Netlify, etc), and it scales to basically infinity without any effort.
The how is also super interesting:
So how do you use a database on a static file hoster? Firstly, SQLite (written in C) is compiled to WebAssembly. SQLite can be compiled with emscripten without any modifications, and the sql.js library is a thin JS wrapper around the wasm code.
There’s more to the story, and the resulting solution is also open source.
Even if you start out small and later need to upscale, as long as your web application can run on the same machine as the database, which it can in 99% of the time, you can just upgrade the hardware to a beefier machine and keep business as usual.
The only time you need to consider a client-server setup is…
Anton Zhiyanov lays out why he thinks SQLite is awesome:
- SQLite is the most common DBMS in the world, shipped with all popular operating systems.
- SQLite is serverless.
- For developers, SQLite is embedded directly into the app.
- For everyone else, there is a convenient database console (REPL), provided as a single file (sqlite3.exe on Windows, sqlite3 on Linux / macOS).
This echoes Ben Johnson’s sentiments in our recent conversation with him about Litestream. If you consider SQLite as merely a good databse for things that don’t matter all that much, maybe it’s time to reconsider…
This week we’re talking with Ben Johnson. Ben is known for his work on BoltDB, his work in open source, and as a freelance Go developer. Late January when Ben open sourced his newest project Litestream in the readme he shared how the project was open source, but not open for contribution. His reason was to protect his mental health and the long term viability of the project. On this episode we talk with Ben about what that means, his thoughts on mental health and burnout in open source, choosing a license, and the details behind Litestream - a standalone streaming replication tool for SQLite.