Production ML systems include more than just the model. In these complicated systems, how do you ensure quality over time, especially when you are constantly updating your infrastructure, data and models? Tania Allard joins us to discuss the ins and outs of testing ML systems. Among other things, she presents a simple formula that helps you score your progress towards a robust system and identify problem areas.
Chaos Mesh is a cloud-native Chaos Engineering platform that orchestrates chaos on Kubernetes environments. At the current stage, it has the following components:
- Chaos Operator: the core component for chaos orchestration. Fully open sourced.
- Chaos Dashboard: a visualized panel that shows the impacts of chaos experiments on the online services of the system; under development; currently only supports chaos experiments on TiDB (https://github.com/pingcap/tidb).
For the uninitiated, chaos engineering is when you unleash havoc on your system to prove out its resiliency (or lack thereof).
LocalStack looks like an excellent way to develop & test your serverless apps without leaving your localhost. It appears they are basically mocking 20+ AWS services, which is undoubtedly a lot of work and something I would expect to be error prone. Is anybody out there using LocalStack on the regular who can let us know if it actually works as advertised?
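If you want to kick the tires, the basic move is pointing your AWS SDK at LocalStack's local endpoint instead of real AWS. Here's a minimal sketch with boto3, assuming LocalStack is listening on its default edge port 4566; the bucket name is made up:

```python
# Minimal sketch: talk to LocalStack instead of real AWS.
# Assumes LocalStack is running locally on its edge port 4566.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4566",  # LocalStack, not AWS
    region_name="us-east-1",
    aws_access_key_id="test",               # dummy credentials are fine locally
    aws_secret_access_key="test",
)

s3.create_bucket(Bucket="local-test-bucket")  # hypothetical bucket name
print(s3.list_buckets()["Buckets"])
```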
Writing good tests is hard, and very few people have thought about this domain more than Kent Beck. In this post, he lays out a short list of properties that good tests have.
Look at the last test you wrote. Which properties does it have? Which does it lack? Is that the tradeoff you want to make?
Kent Beck, for Increment:
It’s 2030. A programmer in Lagos extracts a helper method. Seconds later, the code of every developer working on the program around the world updates to reflect the change. Seconds later, each of the thousands of servers running the software updates. Seconds later, the device in my pocket in Berlin updates, along with hundreds of millions of other devices across the globe.
Perhaps the most absurd assumption in this story is that I’ll still have a pocket in 10 years.
Mocking is a powerful technique for isolating tests from undesired interactions among components. But often people find their mock isn’t taking effect, and it’s not clear why. Hopefully this explanation will clear things up.
Mocking isn’t always the best test isolation technique, but if/when you use it, you might as well use it correctly. Ned’s here to help you do just that.
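To make the usual gotcha concrete, here's a tiny sketch of the classic pitfall: patching a name where it's *defined* instead of where it's *used*. The `app` module here is hypothetical.

```python
# app.py (hypothetical module under test)
#     from os import listdir
#     def count_files(path):
#         return len(listdir(path))

from unittest import mock
import app

# Patching os.listdir does NOT affect app.count_files: app holds its own
# reference to listdir, bound at import time.
with mock.patch("os.listdir", return_value=["a.txt"]):
    pass  # app.count_files would still call the real os.listdir here

# Patch the name in the namespace where it's actually looked up:
with mock.patch("app.listdir", return_value=["a.txt"]):
    assert app.count_files("/any/path") == 1
```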
Some people think that usability is very costly and complex and that user tests should be reserved for the rare web design project with a huge budget and a lavish time schedule. Not true. Elaborate usability tests are a waste of resources. The best results come from testing no more than 5 users and running as many small tests as you can afford.
This article is from the year 2000 (cue Conan O’Brien’s sidekick), but it’s filled with timeless goodies. Its conclusions are a straightforward example of diminishing returns, but it’s worth reading how they arrived at them from the empirical evidence.
Mat and Carmen along with guest panelists Dave Cheney, Peter Bourgon, and Marcel van Lohuizen discuss errors in Go, including the new try proposal. Many questions get answered… What do we think about how errors work in Go? How is it different from other languages/approaches? What do and don’t we like? How do we handle errors these days? What’s going on with the try proposal?
This interesting testing tool was pointed out to me by Ned Batchelder when he was on The Changelog.
It combines human understanding of your problem domain with machine intelligence to improve the quality of your testing process while spending less time writing tests.
At its core, Hypothesis is a modern implementation of property-based testing, which came out of the Haskell world 20 (!) years ago.
Hypothesis runs your tests against a much wider range of scenarios than a human tester could, finding edge cases in your code that you would otherwise have missed. It then turns them into simple and easy to understand failures that save you time and money compared to fixing them if they slipped through the cracks and a user had run into them instead.
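For a taste of what that looks like in practice, here's a tiny made-up property test: Hypothesis generates the input lists for you and shrinks any failure down to a minimal example.

```python
# Property-based testing with Hypothesis: state a property,
# let the library generate (and shrink) the inputs.
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    assert sorted(sorted(xs)) == sorted(xs)

@given(st.lists(st.integers()))
def test_sorted_output_is_ordered(xs):
    result = sorted(xs)
    assert all(a <= b for a, b in zip(result, result[1:]))
```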
Inspired by JSParty #70, 4 quick lessons on the philosophy of testing. The motivation?
Tools like Mocha, Jasmine and Jest have made writing tests far easier… But there’s still a gap. It’s extremely hard to find information on the philosophy of testing. What to test and why. How much is enough? What type of tests should I be writing, and when does it fit into my process?
Is testing an art or a science? What and when should we test? What’s the point of testing and can it go too far? We explore all this and more in this jam-packed episode on testing.
Stop wasting time mocking APIs. MockIt gives you an interface to configure and create REAL mocked endpoints for your applications.
When I first discovered that nine of my tests were failing due to a broken external API, my first thought was, “Man, maybe I should mock out those API calls so my tests don’t fail when the API breaks.” But then I thought about it a little harder…
This is why libraries such as VCR are so awesome.
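With vcrpy (the Python take on VCR), for example, the first test run records the real HTTP exchange to a cassette file and later runs replay it, so a flaky or broken API no longer sinks your suite. The URL and cassette path below are just placeholders:

```python
# First run: hits the network and records the exchange to the cassette.
# Subsequent runs: replays the cassette, no network required.
import vcr
import requests

@vcr.use_cassette("fixtures/cassettes/users.yaml")
def test_fetch_user():
    response = requests.get("https://api.example.com/users/1")  # placeholder URL
    assert response.status_code == 200
```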
Don’t write off slow tests as a mere annoyance: do the math on how much time your team is wasting, then spend a commensurate amount of time speeding them up. A week’s worth of developer time this month will save you a whole lot more over the course of a year.
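To make “do the math” concrete, here’s a back-of-the-envelope calculation with entirely made-up numbers:

```python
# Hypothetical team: 6 devs, 10 test runs per dev per day,
# 3 minutes of avoidable waiting per run, 5-day week.
devs, runs_per_day, wasted_minutes = 6, 10, 3
weekly_hours = devs * runs_per_day * wasted_minutes * 5 / 60
print(f"~{weekly_hours:.0f} developer-hours lost per week")  # ~15 hours
```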
With prop-sets, you don’t need to outsmart your own code when writing tests. Instead of determining fragile points of failure from particular combinations of inputs, simply generate all possible combinations and assert everything.
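prop-sets itself is a JavaScript library, but the underlying idea translates to any stack. Here’s the same “generate every combination” move sketched in Python, with a hypothetical widget as the thing under test:

```python
# Enumerate the full cartesian product of inputs instead of guessing
# which particular combinations will break.
from itertools import product
import pytest

def render_widget(size, enabled, items):
    # hypothetical stand-in for the real code under test
    return f"{size}:{enabled}:{items}"

sizes = ["small", "medium", "large"]
flags = [True, False]
counts = [0, 1, 100]

@pytest.mark.parametrize("size,flag,count", product(sizes, flags, counts))
def test_render_never_crashes(size, flag, count):
    assert render_widget(size, flag, count) is not None
```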
We made a silly joke on Twitter yesterday (this is what Twitter is for, no?) about test doubles and that unfortunate moment when they inevitably surprise you.
This prompted Shlomo Kraus to reach out and tell us about Mockshot. In brief:
Imagine you could:
- Never manually write a mock again
- Have a guarantee that your mocks are always valid
Sounds nice! It works by using Jest’s snapshot tests output to generate mocks to be used in other tests.
This is purposeful coupling, which seems like it could backfire in the long run. However, the team behind the library has been using it for over a year and is still singing its praises. For more on their experience creating and using it, read this.
Testing code that talks to the database can be slow. Fakes are fast but unrealistic. What to do? With a little help from Docker, you can write tests that run fast, use the real database, are easy to write and run.
I tried Itamar’s technique on changelog.com’s test suite and the 679 tests complete in ~17 seconds. The same tests run directly against Postgres complete in ~12 seconds.
A net loss for me, but that may have something to do with how Docker for Mac works? I’d love to hear other people’s experiences.
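For the curious, here’s a rough sketch of the approach (not Itamar’s exact code): spin up a throwaway Postgres container once per test session, hand its URL to the tests, and tear it down afterwards. The port, password, image tag, and the crude sleep are all assumptions:

```python
# Rough sketch: real Postgres in Docker for the duration of the test session.
import subprocess
import time

import psycopg2
import pytest

@pytest.fixture(scope="session")
def pg_url():
    container = subprocess.check_output([
        "docker", "run", "-d", "--rm",
        "-e", "POSTGRES_PASSWORD=test",
        "-p", "5433:5432",
        "postgres:13",
    ]).decode().strip()
    time.sleep(3)  # crude wait; polling for readiness would be better
    try:
        yield "postgresql://postgres:test@localhost:5433/postgres"
    finally:
        subprocess.run(["docker", "stop", container], check=False)

def test_can_query(pg_url):
    conn = psycopg2.connect(pg_url)
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        assert cur.fetchone() == (1,)
```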
Congrats to all of Mocha’s contributors on what looks like a huge release!
Tanya Janca compares and contrasts quality bugs and security bugs, arguing that they’re quite different and should be treated differently. This logic resonates with me and she has a lot of insights to share along the way. I particularly enjoyed this bit:
You cannot have a high-quality product that is insecure; it is an oxymoron. If an application is fast, beautiful and does everything the client asked for, but someone breaks into it the first day it is released, I don’t think you will find anyone willing to call it a high-quality application.
A good read all the way through to the end. 👍
This is a great article that covers the 🐛 gamut:
- spotting bugs
- reporting bugs
- reproducing bugs
- fixing bugs
I love the “lifehack” snippets Nikita sprinkles in as well. Like this little gem right here:
Lifehack: sometimes you might want to push broken code to your branch so it triggers a CI build. After the build, the failure will be saved in your project and your colleagues will be able to link to the problem. Your next commit will have to solve the issue.
Automated tests are immensely useful, but passing tests don’t mean your software is correct. Correctness requires applying human judgement before, during, and after coding, which is why you need additional development techniques beyond just tests.
Tests are 💯 a tool in your tool belt, not a silver bullet.
JSONPlaceholder is a free online REST API that you can use whenever you need some fake data. It’s great for tutorials, testing new libraries, sharing code examples, …
It comes with a set of 6 common resources. You know, the usual suspects like /comments. Prefer to use your own data? The whole thing is powered by json-server, which will get you up and running in 30 seconds-ish.
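For instance, grabbing a fake comment is one request away (assuming the usual jsonplaceholder.typicode.com host):

```python
# Fetch a fake comment from JSONPlaceholder's /comments resource.
import requests

comment = requests.get("https://jsonplaceholder.typicode.com/comments/1").json()
print(comment["email"], comment["body"][:40])
```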
This is a spectacularly thoughtful and insightful piece by Eugen Kiss on testing:
Different kinds of tests have different costs and benefits. You have finite resources to distribute into testing. You want to get the most out of your tests, so use the most economic testing approach.
He goes on to describe why he believes that integration tests provide better ROI than unit tests and end-to-end tests. Then he turns his aim on unit tests in particular:
There is the claim that making your code unit-testable will improve its quality. Many arguments and some empirical evidence in favor of that claim exist so I will put light on the other side… Unit tests ossify the internal structure of the code.
Click through to read his whole argument, but I will say in my experience unit tests only ossify the structure when I do them poorly. In other words, the better I get at unit testing, the more useful they become. In light of that, Eugen’s big takeaway at the end might be 💯 on point:
If you desire clear, albeit unnuanced, instructions, here is what you should do: Use a typed language. Focus on integration and end-to-end tests. Use unit tests only where they make sense (e.g. pure algorithmic code with complex corner cases). Be economic. Be lean.
Testing React components can be challenging, both for beginners and for experienced developers who have already worked with tests. It may be interesting to compare your own approaches with the ones we use in our project. To cover the codebase, you have to know which components must be tested and which code in each component should be covered.