Performance

 Itamar Turner-Trauring

10× faster database tests with Docker

Testing code that talks to the database can be slow. Fakes are fast but unrealistic. What to do? With a little help from Docker, you can write tests that run fast, use the real database, and are easy to write and run. I tried Itamar’s technique on our test suite and the 679 tests complete in ~17 seconds. The same tests run directly against Postgres complete in ~12 seconds. A net loss for me, but that may have something to do with how Docker for Mac works. I’d love to hear other people’s experiences.
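As I understand the technique, the core trick is running the real database in a throwaway container with its data directory on a RAM-backed tmpfs, so writes never hit disk. A minimal sketch (the image tag, port, and password are illustrative, not from the article):

```shell
# Throwaway Postgres for tests: the data directory lives on tmpfs (RAM),
# so nothing is fsync'd to disk. Values below are placeholders.
docker run -d --name test-db \
  --tmpfs /var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=test \
  -p 5433:5432 \
  postgres:11

# Point the test suite at localhost:5433, then discard the container:
# docker rm -f test-db
```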


Bits and Pieces

Understanding Service Workers and caching strategies

Solid tutorial on Service Workers: You can think of the service worker as someone who sits between the client and server; all the requests that are made to the server pass through the service worker. Basically, a middle man. Since all the requests pass through the service worker, it is capable of intercepting these requests on the fly.
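A cache-first strategy (one of the strategies such tutorials cover) might look like the sketch below. To keep the logic testable outside a browser, the cache and network are injected as plain functions; the names are mine, not the tutorial’s.

```javascript
// Cache-first strategy: answer from the cache when possible, otherwise hit
// the network and store the response for next time. `cache` and `fetchFn`
// are injected so the decision logic also runs outside a service worker.
async function cacheFirst(request, cache, fetchFn) {
  const cached = await cache.match(request);
  if (cached) return cached;          // cache hit: skip the network entirely
  const response = await fetchFn(request);
  await cache.put(request, response); // populate the cache for next time
  return response;
}

// Inside a real service worker, the wiring is roughly:
// self.addEventListener('fetch', (event) => {
//   event.respondWith(
//     caches.open('v1').then((cache) => cacheFirst(event.request, cache, fetch))
//   );
// });
```

The same shape covers the other common strategies (network-first, stale-while-revalidate) by reordering the cache and network steps.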



Optimising startup time of Prometheus 2.6.0 with pprof

Brian Brazil:

The informal design goal of the Prometheus 2.x TSDB startup was that it should take no more than about a minute. Over the past few months there’s been reports of it taking quite a bit more than this, which is a problem if your Prometheus restarts for some reason. Almost all of that time is loading the WAL (write ahead log), which are the samples in the last few hours which have yet to be compacted into a block. I finally got a chance to dig into this at the end of October, and the outcome was PR#440 which reduced CPU time by 6.5x and walltime by 4x. Let’s look at how I arrived at these improvements.

I’ve been meaning to get more familiar with pprof, the Go profiling tool, as my job revolves around working on and around Go microservices. My team has been able to see the impact of the Go experts who can quickly find issues buried in a stack of profiles collected on a service. Brian’s post is a great example of 1) identifying an issue, 2) diagnosing said issue, and 3) observing the implemented improvements using pprof. His parting paragraph is particularly insightful, specifically:

I did spend quite a bit of time poring over the code, and had several dead ends such as removing the call to NumSamples, doing reading and decoding in separate threads, and a few variants of how the processWALSamples sharding worked

Profiling and optimization is a mix of knowing your codebase and being able to identify false leads. A tool like pprof is invaluable when identifying both issues and improvements in a measurable way.


Scott Jehl

Inlining or caching? Both please!

I was exploring patterns that enable the browser to render a page as fast as possible by including code alongside the initial HTML so that the browser has everything it needs to start rendering the page, without making additional requests. Our two go-to options to achieve this goal are inlining and server push (more on how we use those), but each has drawbacks: inlining prevents a file from being cached for reuse, and server push is still a bit experimental, with some browser bugs still being worked out. As I was preparing to describe these caveats, I thought, “I wonder if the new Service Worker and Caching APIs could enable caching for inline code.” I’ve been dabbling a bit with service workers over on Brightly Colored to improve the loading time, so this exploration of caching inline CSS is fascinating. In fact, I used to completely inline all the CSS on the site, but switched to a file request because of the way I thought service workers, well… worked. Surprisingly, this implementation doesn’t look too difficult.



Guess.js - a toolkit for enabling data-driven user-experiences on the web

Our goal with Guess.js is to minimize your bundle layout configuration, make it data-driven, and much more accurate! In the end, you should lazy load all your routes and Guess.js will figure out which bundles should be combined together and which pre-fetching mechanism should be used! All this in less than 5 minutes of setup time. That’s an excellent goal! But how will that work? During the build process, the GuessPlugin will fetch a report from Google Analytics, build a model used for predictive pre-fetching, and add a small runtime to the main bundle of your application. On route change, the runtime will query the generated model for the pages that are likely to be visited next and pre-fetch the JavaScript bundles associated with them. The tool was announced at Google I/O back in May, but as of today it’s still in alpha.
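Setup really is minimal; a sketch of the webpack wiring, based on the announcement (the Google Analytics view ID is a placeholder, and option names may shift while the project is in alpha):

```javascript
// webpack.config.js — sketch; the GA view ID below is a placeholder.
const { GuessPlugin } = require('guess-webpack');

module.exports = {
  // ...your existing entry/output/loader config...
  plugins: [
    // Trains the predictive model from your Google Analytics report at
    // build time, then adds a small pre-fetching runtime to the main bundle.
    new GuessPlugin({ GA: 'XXXXXX' })
  ]
};
```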


Noa Gruman

Implementing a multi-CDN strategy? Here's everything you need to know.

There are some seriously interesting thoughts shared here for building out a multi-CDN strategy. Having had issues with how best to use and leverage a CDN for performance, I can see how a multi-CDN implementation would allow us to choose the right CDN for a given region of the world, as well as a whole host of other options based on things like cost, performance, and of course redundancy for when things go wrong. Murphy’s law, right? From the post: This summer, the 2018 World Cup set an all-time streaming record – tripling its own 2014 record – with over 22 Tbps measured by Akamai at peak, but the event wasn’t smooth sailing for everyone. In a highly competitive market, and in an age where streaming failures make headlines, redundancy and quality of experience have never been more crucial for content publishers. Drop a comment below if there are other resources out there on this subject that we should check out.


Addy Osmani Medium

A Netflix web performance case study

Hold on to your seat! This is a deep dive on improving time-to-interactive for Netflix’s logged-out desktop homepage. Addy Osmani writes on the Dev Channel for the Chromium dev team regarding their performance tuning efforts. They were trying to determine if React was truly necessary for the logged-out homepage to function. Even though React’s initial footprint was just 45kB, removing React, several libraries and the corresponding app code from the client-side reduced the total amount of JavaScript by over 200kB, causing an over-50% reduction in Netflix’s time-to-interactivity for the logged-out homepage. There’s more to this story, so dig in. Or, share your comments on their approach to reducing time-to-interactivity and if you might have done things differently.



A high-performance PHP app server, load balancer, and process manager

RoadRunner is an open source (MIT licensed), high-performance PHP application server, load balancer and process manager. It supports running as a service with the ability to extend its functionality on a per-project basis. RoadRunner is written in Go, and can be used to replace the classic Nginx+FPM setup, boasting “much greater performance”. I’d love to see some benchmarks. Better yet, I’d love to see someone use this in production for a bit and write up their experience.


Nikita Prokopov

Software disenchantment (or, struggles with operating at 1% possible performance)

Nikita Prokopov has been programming for 15 years and has become quite frustrated with the industry’s lack of care for efficiency, simplicity, and excellence in software — to the point of depression.

Only in software, it’s fine if a program runs at 1% or even 0.01% of the possible performance. Everybody just seems to be ok with it.

Nikita cites some examples:

…our portable computers are thousands of times more powerful than the ones that brought man to the moon. Yet every other webpage struggles to maintain a smooth 60fps scroll on the latest top-of-the-line MacBook Pro. I can comfortably play games and watch 4K videos but not scroll web pages? How is it ok?

Windows 10 takes 30 minutes to update. What could it possibly be doing for that long? That much time is enough to fully format my SSD drive, download a fresh build and install it like 5 times in a row.

We put virtual machines inside Linux, and then we put Docker inside virtual machines, simply because nobody was able to clean up the mess that most programs, languages, and their environment produce. We cover shit with blankets just not to deal with it. “Single binary” is still a HUGE selling point for Go, for example. No mess == success.

Do you share Nikita’s position? Sure, be frustrated with performance (we all want to “go faster!”), but do you agree with his points beyond that? If so, read this and consider supporting him on Patreon.


Dominic Tarr

Your web app is bloated

Using Firefox’s memory snapshot tool, Dominic Tarr measured the heap usage of a variety of web apps. The results are… not good. The biggest losers are Google properties, followed closely by Slack (which is probably not a surprise). Faring much better were GitHub (7.41MB), StackOverflow (2.55MB), and Wikipedia (1.73MB). What struck me most is that while modern Gmail is one of the worst offenders (158MB), vintage Gmail uses just 0.81MB of memory. The ‘good ole’ days’ strike back!


David Mark Clements Smashing Magazine

Keeping Node.js fast

David Mark Clements shares tools, techniques, and tips for making high-performance Node.js servers in this super deep post on Smashing Magazine: The surging popularity of Node.js has exposed the need for tooling, techniques and thinking suited to the constraints of server-side JavaScript. When it comes to performance, what works in the browser doesn’t necessarily suit Node.js. So, how do we make sure a Node.js implementation is fast and fit for purpose? Let’s walk through a hands-on example.


Balaji Subramaniam

Kubernetes' CPU Manager

Feature highlights of the beta CPU Manager in Kubernetes from Balaji Subramaniam, Cloud Software Engineer and Connor Doyle, Cloud Software Architect at Intel AI… A single compute node in a Kubernetes cluster can run many pods and some of these pods could be running CPU-intensive workloads. In such a scenario, the pods might contend for the CPU resources available in that compute node. When this contention intensifies, the workload can move to different CPUs depending on whether the pod is throttled and the availability of CPUs at scheduling time. There might also be cases where the workload could be sensitive to context switches. In all the above scenarios, the performance of the workload might be affected. If your workload is sensitive to such scenarios, then CPU Manager can be enabled to provide better performance isolation by allocating exclusive CPUs for your workload.
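Concretely, with the kubelet’s static CPU Manager policy enabled, exclusive CPUs go to pods in the Guaranteed QoS class that request an integer number of CPUs. A minimal sketch of such a pod spec (names and image are placeholders):

```yaml
# Guaranteed QoS: requests == limits, and the CPU count is an integer,
# so the static CPU Manager policy pins this container to 2 exclusive CPUs.
apiVersion: v1
kind: Pod
metadata:
  name: cpu-pinned-workload      # illustrative name
spec:
  containers:
  - name: worker
    image: example/worker:latest  # placeholder image
    resources:
      requests:
        cpu: "2"
        memory: "1Gi"
      limits:
        cpu: "2"
        memory: "1Gi"
```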



Rules of optimization

Emil Persson’s optimization tweet was so well received that he decided to turn its <ol> of rules into a full-on blog post: Basically Programming Wisdom … posted a quote that suggested, more or less, that there’s never a good time to think about performance. Even experts should defer it until later! This is way worse advice than your usual “premature optimization is the root of all evil” tirade. I’m not a fan of premature optimization, myself. So there’s lots to ponder in this post. 🤔


Jaana B. Dogan (JBD) Medium

Want to debug latency?

What is latency? And how exactly do you debug it? Jaana writes on the Observability+ blog: In the recent decade, our systems got complex. Our average production environments consist of many different services (many microservices, storage systems and more) with different deployment and production-maintenance cycles. Measuring latency and being able to react to latency issues are getting equally complex as our systems grow more complex. This article will help you navigate a latency problem and shows what you need to put in place to do so effectively.


Figma

Rust in production at Figma

This is the story of how Rust dramatically improved the performance of Figma’s multiplayer server (one of their most important features). The multiplayer server we launched with two years ago is written in TypeScript and has served us surprisingly well, but Figma is rapidly growing more popular and that server isn’t going to be able to keep up. We decided to fix this by rewriting it in Rust.


Matt Jaffee YouTube

The index as a first class citizen

Matt Jaffee was on a recent episode of Go Time and also gave this talk at OSCON recently on indexes as a first class citizen. In this video Matt talks about a piece of software that’s purely an index, not a database, not a datastore, just the index — and optimizing that single piece of software to be very fast! Here’s a quick breakdown of an index as a first class citizen:

- Standalone application, not just a data structure
- Horizontally scalable, distributed
- FAST, indexes should make things faster
- Flexible, integrates with other datastores and data types

Also, learn more about Pilosa to see Matt’s work in action.


Nuster Cache Server

Nuster – a high performance caching proxy server based on HAProxy

It is 100% compatible with HAProxy, and takes full advantage of the ACL functionality of HAProxy to provide fine-grained caching policy based on the content of request, response or server status. The feature list is long. Click through to see ’em all. Nuster is very fast; some tests show Nuster is almost three times faster than Nginx when both use a single core, and nearly two times faster than Nginx and three times faster than Varnish when using all cores. Here’s a detailed benchmark backing up these claims.
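Because it’s HAProxy underneath, configuration looks like a regular HAProxy config with caching directives layered in. A sketch; the directive names follow Nuster’s README as best I recall, so treat them as assumptions and check the project docs:

```
# HAProxy-style config with Nuster caching directives (names are assumptions).
global
    nuster cache on data-size 100m   # enable the cache, 100 MB shared memory
frontend web
    bind *:8080
    default_backend app
backend app
    nuster cache on
    nuster rule all ttl 60           # cache matching responses for 60s
    server s1 127.0.0.1:8081
```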



Postgres looks to LLVM's JIT for up to 20% speed up

This was posted back in March, but it’s news to me: A long-running project has been JIT-compiling SQL queries in PostgreSQL by making use of LLVM’s just-in-time compilation support, rather than passing SQL queries through Postgres’ interpreter. With the LLVM JIT’ed queries, more efficient code is generated by being able to make more use of run-time information, and it can especially help in increasing the performance of complex SQL queries. JIT-compiling expressions for PostgreSQL has been found to be up to ~20% faster in database tests like TPC-H. Creating indexes was found to be even 5–19% faster with this JIT mode. Hopefully this feature will progress quickly enough to land in Postgres 11. 🙏
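In the Postgres 11 betas the feature is exposed through a `jit` setting (it requires a build compiled with LLVM support). A quick way to see whether a query gets JIT-compiled; the table and column names are placeholders:

```sql
-- Requires a Postgres 11 build with LLVM JIT support compiled in.
SET jit = on;
SET jit_above_cost = 0;   -- force JIT even for cheap queries, for demonstration

EXPLAIN (ANALYZE)
SELECT sum(amount) FROM orders;   -- 'orders'/'amount' are placeholder names

-- With JIT active, the plan output gains a "JIT:" section reporting the
-- number of functions generated plus compile and optimization time.
```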


CSS-Tricks

Hey hey `font-display`

Chris Coyier: Y’all know about font-display? It’s pretty great. It’s a CSS property that you can use within @font-face blocks to control how, visually, that font loads. … What do you get from it? The ability to control FOUT and FOIT as is right for your project, two things that both kinda suck in regards to font loading. Font loading strategy is pretty important. It’s one of the reasons I searched far and wide to improve the performance of fonts on Brightly Colored. Fortunately, if you’re using @font-face, using font-display is as easy as using one line of CSS, and you’ll see the performance improvements immediately. Unfortunately, as Chris points out, there’s no performant way to get around either FOUT or FOIT.
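That one line in context, using `swap` as an example value (show fallback text immediately, i.e. accept FOUT to avoid FOIT); the family name and file path are placeholders:

```css
@font-face {
  font-family: "Example Sans";                             /* placeholder name */
  src: url("/fonts/example-sans.woff2") format("woff2");   /* placeholder path */
  /* swap: render fallback text right away, swap the web font in when loaded */
  font-display: swap;
}
```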

