The Testing Pyramid
Typically, people say that testing is like a pyramid. Imagine a pyramid like in Egypt.
At the bottom you have a very wide layer of unit tests. And unit tests test the smallest piece of code.
(Imagine you write a function that adds two numbers; so you write a test. If I call that function with arguments 2 and 3, do I get 5?)
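As a minimal sketch of that example, here is what such a test could look like in TypeScript. The names `add` and `expectEqual` are hypothetical, and `expectEqual` is only a stand-in for the assertion helper a real unit testing framework would give you.

```typescript
// Hypothetical function under test: the smallest possible unit of code.
function add(a: number, b: number): number {
  return a + b;
}

// Hypothetical stand-in for a test framework's assertion helper.
function expectEqual(actual: number, expected: number): void {
  if (actual !== expected) {
    throw new Error(`expected ${expected}, got ${actual}`);
  }
}

// The unit test itself: call the function with 2 and 3, expect 5.
expectEqual(add(2, 3), 5);
```

Load a piece of code, run it, check the result: that is the whole test.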
Every language under the sun has a unit testing framework, because it’s so easy. Just load a piece of code, run it, check the result that you get, make sure it’s what you expect. So that’s why the bottom of the pyramid is usually very wide, because it’s easy to just write hundreds of tests to exercise all your little components.
And when you move higher up in the pyramid, you’re trying to put units of code together.
Maybe you’re trying to use a Todo class that represents something, together with some other pieces of code. You’re now mostly trying to see if a couple of units of code work together; how they integrate. And that’s where you discover parts where the backend team and the frontend team actually did not communicate very well. So my module doesn’t work very well with another module.
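To make the integration idea concrete, here is a small sketch under stated assumptions: `serializeTodo` plays the backend, the `Todo` class plays the frontend, and the `TodoPayload` shape and all field names are invented for illustration. The check at the end pushes data through both units instead of testing each one alone.

```typescript
// Hypothetical shape that the backend sends over the wire.
interface TodoPayload {
  id: number;
  title: string;
  completed: boolean;
}

// Hypothetical backend unit: serializes a todo to JSON.
function serializeTodo(id: number, title: string, completed: boolean): string {
  const payload: TodoPayload = { id, title, completed };
  return JSON.stringify(payload);
}

// Hypothetical frontend unit: turns that JSON into a view model.
class Todo {
  readonly id: number;
  readonly title: string;
  readonly done: boolean;

  constructor(id: number, title: string, done: boolean) {
    this.id = id;
    this.title = title;
    this.done = done;
  }

  static fromJson(json: string): Todo {
    const raw = JSON.parse(json) as TodoPayload;
    // If the frontend read a field the backend never sends, each unit could
    // still pass its own unit tests while the two fail together.
    return new Todo(raw.id, raw.title, raw.completed);
  }
}

// Integration check: run data through both units and compare the result.
const todo = Todo.fromJson(serializeTodo(1, "write tests", true));
if (todo.id !== 1 || todo.title !== "write tests" || todo.done !== true) {
  throw new Error("backend and frontend disagree on the todo format");
}
```

A mismatch in field names between the two sides is exactly the kind of miscommunication that only shows up once the units run together.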
Then at the very top of the pyramid you have end-to-end tests.
An end-to-end test is when you’re trying to run the whole thing as the end user would. For example, you open a website in your browser and you navigate and you work with your web application and you check if it updates the page correctly, if it calls the backend correctly.
The top of the pyramid is usually very sharp. That’s because you’re not supposed to write many end-to-end tests. I think this is obsolete thinking nowadays, because why was it hard to write end-to-end tests?
It was hard to install the end-to-end test runner. It was finicky. The tests were flaky and didn’t give you much confidence, so you actually spent more time maintaining those end-to-end tests than you spent writing your web application.
Many people say:
Write many unit tests, write many integration tests, but just a few end-to-end tests (maybe just as a sanity check).
And when we look at what Cypress allows you to do (which is write many useful tests that have very little flake) then you wanna write more end-to-end tests. You wanna make the pyramid almost like a rectangle or maybe as a pizza slice, where you have a lot of end-to-end tests and a few unit tests.
The Testing Pizza
It goes back to efficiency. If you test a small, single function that adds two numbers… Yeah, the test is easy, it’s fast, but it really only hits that particular function. But your web app is large and potential sources of errors are not just logical errors in your functions:
- It’s integration
- It’s assumptions
- It’s your bundler
- It’s your transpiler
- It’s your code deployment
- It’s your backend server
- It’s your DNS configuration
- It’s all the environment variables that you have to set correctly in the backend and the frontend, and all the little internet stuff in between
And then there’s the modern browser (which is an awfully, awfully complicated machine), where your assumption that it will simply execute this “add two numbers together” can be completely different from what the end thing will do.
So when you think about what’s the effectiveness, or how much do you actually exercise? How many potential errors can you find?
Well, the unit tests can find you a few logical errors, which is great. I write unit tests for that all the time. But all possible sources of error are discovered by end-to-end tests.
If you can test the site you just deployed and go through the main user story –just like a human user would do later on– if that works… the chances are when the real user goes through the same thing, it will be successful.
So for us, the end-to-end test should be the primary focus.
So if we flip the pyramid and make the top wider and wider, writing more end-to-end tests because they’re effective, it becomes almost like a pizza slice. Writing more end-to-end tests, or even starting with end-to-end tests, makes sense.
The Testing Crab
And what has happened recently? Well, a functional test runner like Cypress finds some text, clicks on a button, does all those things, but it only verifies that the application works.
It doesn’t verify that the application looks good. And we’re all humans. We like pretty things, so we like styles. Some people even add CSS to their apps. I don’t know why, but it’s crazy. 😉
Once they add CSS and they do the styling, they want to make sure the app looks the same, and they don’t accidentally break it.
If you select elements, work with them, and check the number of items, the CSS can still change. And then the application will look like crap, and users will be unhappy, and nobody is gonna buy anything from your website.
Cypress is just a functional test runner. It doesn’t care about CSS. And it’s very hard to write all the assertions saying:
The color of this element should be blue, and the border radius should be 2…
So instead, what people do –because it’s a real browser– is take a screenshot of the page (or a part of it) and then do visual testing.
So you save a screenshot and it becomes a baseline or a master image. The next time you run the test you take another screenshot and then you compare it pixel by pixel with your baseline image. And you store those baseline images with your source code in your repository.
Computers are really good at comparing images pixel by pixel, and they’ll tell you:
Oh, it used to be blue…
The computer doesn’t have a concept of blue, but it says:
It used to be this. Now it’s a different color. Here’s where it changed.
And then you can say:
Did they mean to change the CSS here? Why is it no longer blue and now red?
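The baseline comparison described above can be sketched in a few lines. This is an illustration under stated assumptions, not how Cypress or any visual testing service implements it: the `Image` shape (raw RGBA bytes), `compareToBaseline`, and the `solid` helper are all hypothetical, and real tools add things like tolerances and diff images on top.

```typescript
// Hypothetical in-memory screenshot: RGBA bytes, 4 per pixel, row by row.
interface Image {
  width: number;
  height: number;
  data: Uint8Array;
}

interface DiffResult {
  changedPixels: number;
  // Bounding box of the changed region, or null when the images match.
  box: { left: number; top: number; right: number; bottom: number } | null;
}

// Compare the current screenshot with the baseline, pixel by pixel.
function compareToBaseline(baseline: Image, current: Image): DiffResult {
  if (baseline.width !== current.width || baseline.height !== current.height) {
    throw new Error("screenshot size differs from the baseline");
  }
  let changedPixels = 0;
  let left = Infinity;
  let top = Infinity;
  let right = -1;
  let bottom = -1;
  for (let y = 0; y < baseline.height; y++) {
    for (let x = 0; x < baseline.width; x++) {
      const i = (y * baseline.width + x) * 4;
      let same = true;
      for (let c = 0; c < 4; c++) {
        if (baseline.data[i + c] !== current.data[i + c]) same = false;
      }
      if (!same) {
        changedPixels++;
        left = Math.min(left, x);
        top = Math.min(top, y);
        right = Math.max(right, x);
        bottom = Math.max(bottom, y);
      }
    }
  }
  const box = changedPixels === 0 ? null : { left, top, right, bottom };
  return { changedPixels, box };
}

// Hypothetical helper: builds a solid-color image for the demo below.
function solid(width: number, height: number, rgba: number[]): Image {
  const data = new Uint8Array(width * height * 4);
  for (let p = 0; p < width * height; p++) data.set(rgba, p * 4);
  return { width, height, data };
}

const baseline = solid(4, 4, [0, 0, 255, 255]); // all blue
const current = solid(4, 4, [0, 0, 255, 255]);
current.data.set([255, 0, 0, 255], (1 * 4 + 2) * 4); // pixel x=2, y=1 turns red

// Reports one changed pixel and where it is, without knowing what "blue" means.
const diff = compareToBaseline(baseline, current);
console.log(diff.changedPixels, diff.box);
```

The function only reports that something changed and where; deciding whether the change was intended is still up to a person.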
Visual testing to me is such an effective tool, paired with a full end-to-end test. You can literally load the application, take a screenshot, and now you know it will never change accidentally. Do something where the application reacts, changes the layout, new DOM elements appear. Then you take another screenshot.
Boom. Now you’ve tied it so closely that any visual change (any CSS, SVG, anything) will change the pixels, and you’ll know that you accidentally broke the styling. And you’ll know that accidentally you made Craigslist look like Reddit…
And you’re like:
No, no, no. Revert it, revert it, revert it!
So to me, the most useful pyramid right now is a pyramid that’s wholly end-to-end, functional and visual tests. That’s it.
And then, I can track code coverage by instrumenting my application code; the end-to-end tests that go through the whole flow, like a real user, are so effective at code coverage.
Well, hit 80% of your code (coverage).
And I usually say:
Well, because end-to-end tests exercise the whole application, a single test can cover most of it, if it goes through the whole user story.
And then you look at the lines not covered, and you write end-to-end tests for those edge cases. And if you cannot reach those lines, because there could be edge cases that are unreachable through a well-designed interface, then you write end-to-end tests, API tests and component tests, and hit those lines, so you know that those components and units of code work as well.
But it becomes a pyramid of end-to-end tests with little triangles for other types of tests, and to me it looks like a crab, because it’s kind of a big helmet shell with little armored legs under it.
But it’s all about end-to-end tests, in my opinion.
Our conversation with Gleb doesn’t stop there. Listen to the entire JS Party episode to hear the story of Cypress, how they found a business model around open source, how you can make your end-to-end tests less brittle, and much more.