I love the name “YOLO” for this because it’s single-stage, but I have to laugh that it’s now on its sixth version. You only live once… six times? 😆
This is a minimal implementation of DALL·E Mini. It has been stripped to the bare essentials necessary for doing inference, and converted to PyTorch. The only third-party dependencies are torch for the torch model and flax for the flax model.
How much mini-er can it get from here? 🤔
Many organizations, such as Commoncrawl, WebRecorder, Archive.org and libraries around the world, use the WARC format to archive and store web data.
The full datasets of these services run to a few pebibytes (PiB), making them impractical to query using non-distributed systems.
This project aims to make subsets of such data easier to access and query using SQL.
Crawl a site with wget and import it into WarcDB:
wget --warc-file changelog "https://changelog.com"
warcdb import archive.warcdb changelog.warc.gz
Then you can query away using SQL, such as this one to get all response headers:
sqlite3 archive.warcdb <<SQL
select json_extract(h.value, '$.header') as header, json_extract(h.value, '$.value') as value
from response, json_each(http_headers) h
SQL
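The same query can also be run from Python's built-in sqlite3 module. The sketch below does not touch a real WarcDB file: it builds a tiny in-memory table named response with an http_headers JSON column, a shape inferred from the query above rather than from WarcDB's documented schema. It also assumes your SQLite build includes the JSON functions, which most recent ones do.

```python
import json
import sqlite3

# Hypothetical stand-in for a WarcDB archive: a single `response` row whose
# `http_headers` column holds a JSON array of {"header", "value"} objects.
conn = sqlite3.connect(":memory:")
conn.execute("create table response (http_headers text)")
sample_headers = [
    {"header": "Content-Type", "value": "text/html"},
    {"header": "Server", "value": "nginx"},
]
conn.execute("insert into response values (?)", (json.dumps(sample_headers),))

# Same query as the sqlite3 CLI example: json_each() expands the array into
# one row per header, and json_extract() pulls the fields out of each object.
QUERY = """
select json_extract(h.value, '$.header') as header,
       json_extract(h.value, '$.value')  as value
from response, json_each(http_headers) h
"""

rows = conn.execute(QUERY).fetchall()
for header, value in rows:
    print(f"{header}: {value}")
```

Against a real archive you would swap ":memory:" for the path to your .warcdb file and drop the setup statements.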
Last week I logged the very impressive Imagen project, which smarter people than me have said is the SOTA for text-to-image synthesis. Now a WIP implementation is just a pip install imagen-pytorch away.
Architecturally, it is actually much simpler than DALL-E2. It consists of a cascading DDPM conditioned on text embeddings from a large pretrained T5 model (attention network). It also contains dynamic clipping for improved classifier-free guidance, noise level conditioning, and a memory-efficient U-Net design.
In this article we take a look at how you can use Python’s Kubernetes Client library to automate all the boring Kubernetes tasks and operations, such as creating/patching resources, watching events, accessing containers and more.
I’ve long been fascinated by literate programming (the art of writing code as if it were a novel), but it’s been a while since I’ve seen a good example of it in practice. Here’s a good one:
I wanted to showcase the BDD-inspired low-tech solution I came up with via a toy project, demonstrating a small but significant programming task, broken down as a series of design-implementation cycles.
Wordle is a perfect target: it’s a small codebase, with a half dozen features to string together into a useable game.
This story has five chapters and a satisfying conclusion:
This project was my first foray into literate programming at this scale, an attempt to bring together all the good ideas of TDD, modern Python development, Gherkin usage for requirements traceability purposes (without overly zealous extremes of Cucumber automation). All these ideas were until now scattered, implemented each without the others in different places, and this project fuses them into something I hope is more valuable than the sum of its parts.
Nice to see some efforts around standardizing MLOps. Here are their high-level selling points:
- 📦 Docker containers without the pain
- 🤬️ No more CUDA hell
- ✅ Define the inputs and outputs for your model with standard Python
- 🎁 Automatic HTTP prediction server
- 🥞 Automatic queue worker
- ☁️ Cloud storage
- 🚀 Ready for production
From the engineering team at Bloomberg:
It can track memory allocations in Python code, in native extension modules, and in the Python interpreter itself. It can generate several different types of reports to help you analyze the captured memory usage data. While commonly used as a CLI tool, it can also be used as a library to perform more fine-grained profiling tasks.
It has a lot of nice outputs so you can grok what’s going on.
With a detailed instructions video, anyone can build it at home - you just need access to a 3D printer and have to buy a Raspberry Pi computer and an 8MP Raspberry Pi camera.
The full set of instructions is on their GitHub org.
Julia Evans lays out her process for taking API responses in her browser’s dev tools and using them in her own programs/scripts:
- look in developer tools for a promising JSON response
- copy as cURL
- remove irrelevant headers
- translate it into Python
Some of you might be wondering – can you always do this?
The answer is sort of yes – browsers aren’t magic! All the information browsers send to your backend is just HTTP requests. So if I copy all of the HTTP headers that my browser is sending, I think there’s literally no way for the backend to tell that the request isn’t sent by my browser and is actually being sent by a random Python program.
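As a rough illustration of the last step in that list, here is what a trimmed-down "copy as cURL" command might look like once translated into Python using only the standard library. The URL, header values and the build_request helper are all placeholders invented for this sketch, not taken from Julia's post.

```python
import urllib.request

# Hypothetical endpoint and header values, standing in for whatever
# "copy as cURL" produced in your own browser's developer tools.
URL = "https://example.com/api/items.json"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/100.0",
    "Accept": "application/json",
    "Cookie": "session=REPLACE_ME",
}


def build_request(url: str, headers: dict) -> urllib.request.Request:
    """Build a GET request carrying the headers kept from the cURL command."""
    return urllib.request.Request(url, headers=headers, method="GET")


request = build_request(URL, HEADERS)

# Sending is left commented out because the URL above is only a placeholder:
# import json
# with urllib.request.urlopen(request) as response:
#     data = json.load(response)
```

A handy way to find the "irrelevant" headers is to delete them one at a time and see whether the response changes.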
Before any new feature, change or improvement makes it into Python, there needs to be a Python Enhancement Proposal, also known as a PEP, outlining the proposed change. These PEPs are a great way of getting the freshest info about what might be included in the upcoming Python releases. So, in this article we will go over all the proposals that are going to bring some exciting new Python features in the near future!
In their own words:
Grist is a modern relational spreadsheet. It combines the flexibility of a spreadsheet with the robustness of a database to organize your data and make you more productive.
Since so many people make the Airtable comparison that I did in the headline, the team behind Grist has written up a comparison of the two offerings.
See how sudo works by writing your own version in Python!
One might think that sudo is actually some binary deeply integrated into the kernel, relying on a special purpose-built system call to achieve its functionality. After all, it lets you use root without even providing the password for that account! But thanks to one bit inside file permissions, sudo can exist without any of this.
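The bit in question is the setuid bit, which tells the kernel to run an executable with the file owner's privileges instead of the caller's. Here is a small sketch of how to spot it from Python with the standard stat module. The modes are made-up examples, not read from a real system, and has_setuid is a helper name invented for this illustration.

```python
import stat


def has_setuid(mode: int) -> bool:
    """Return True if the setuid bit is set in a file mode."""
    return bool(mode & stat.S_ISUID)


# Illustrative modes: a regular file with permissions 4755
# (setuid + rwxr-xr-x) and one with plain 755.
setuid_mode = stat.S_IFREG | 0o4755
plain_mode = stat.S_IFREG | 0o755

# The setuid bit shows up as an "s" in the owner's execute slot.
print(stat.filemode(setuid_mode), has_setuid(setuid_mode))  # -rwsr-xr-x True
print(stat.filemode(plain_mode), has_setuid(plain_mode))    # -rwxr-xr-x False
```

To check a real binary, pass os.stat(path).st_mode to the helper, where path is wherever sudo lives on your system.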
It’s still in closed beta, but this looks like a really cool environment for data scientists and other folks who code to accomplish other goals vs code as craft. One cool thing you can do is take your Jupyter notebooks and convert them to PyFlow graphs (and vice versa).
I love how much hacking has been inspired by Wordle.
The Wordle source code contains 2,315 days of answers (all common 5-letter English words) and 10,657 other valid, less-common 5-letter English words.
We combine these to form a set of 12,972 possible words/answers.
We then simulate playing 1,000 Wordle games for each of these possible words, guessing based on the frequency of the word in the English language and the feedback received.
Then we take three measures to evaluate the observed distribution of ⬛🟨🟩 squares on Twitter according to our valid words.
The resulting code is included in the article.
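For a feel of the feedback step that any Wordle simulation leans on, here is a minimal sketch of scoring a guess into ⬛🟨🟩 squares. This is not the article's code, and the score function is a name picked for this example. It uses the usual two-pass approach so repeated letters are only marked yellow as many times as they remain unmatched in the answer.

```python
from collections import Counter


def score(guess: str, answer: str) -> str:
    """Return the Wordle feedback squares for a guess against an answer."""
    result = ["⬛"] * len(guess)
    remaining = Counter()

    # First pass: mark exact matches and count the answer letters left over.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "🟩"
        else:
            remaining[a] += 1

    # Second pass: a non-green letter is yellow only while unmatched copies
    # of it remain in the answer.
    for i, g in enumerate(guess):
        if result[i] != "🟩" and remaining[g] > 0:
            result[i] = "🟨"
            remaining[g] -= 1

    return "".join(result)


print(score("crane", "react"))  # 🟨🟨🟩⬛🟨
print(score("speed", "abide"))  # ⬛⬛🟨⬛🟨
```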
Funny how stuff like this plays out sometimes:
My cog tool has been having a resurgence of late: a number of people are discovering it’s useful to run a little bit of Python code inside otherwise static files.
He goes on to list out a bunch of tweets from people finding it useful for various tasks and even got to talk about the project on podcast.__init__. Cool tool, cool story.
Martin Heinz on the tools/techniques for finding bottlenecks in your Python code. And fixing them, fast.
The first rule of optimization is to not do it. If you really have to though, then optimize where appropriate. Use the above profiling tools to find bottlenecks, so you don’t waste time optimizing some inconsequential piece of code. It’s also useful to create a reproducible benchmark for the piece of code you’re trying to optimize, so that you can measure the actual improvement.
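Here is a small sketch of that workflow using only the standard library: a repeatable timeit benchmark to measure the improvement, plus a cProfile run to confirm where the time goes. The two functions are toy examples invented for this illustration, and the tools shown are not necessarily the ones covered in Martin's article.

```python
import cProfile
import io
import pstats
import timeit


def slow_sum_of_squares(n: int) -> int:
    """Sum i*i for i in range(n) with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum_of_squares(n: int) -> int:
    """Same result via the closed form for 0^2 + 1^2 + ... + (n-1)^2."""
    return (n - 1) * n * (2 * n - 1) // 6


N = 10_000

# Check correctness first: a fast wrong answer is not an optimization.
assert slow_sum_of_squares(N) == fast_sum_of_squares(N)

# Reproducible benchmark: same input and same repeat count for both versions.
slow_time = timeit.timeit(lambda: slow_sum_of_squares(N), number=200)
fast_time = timeit.timeit(lambda: fast_sum_of_squares(N), number=200)
print(f"slow: {slow_time:.4f}s  fast: {fast_time:.4f}s")

# Profile the slow version to see which calls dominate.
profiler = cProfile.Profile()
profiler.enable()
slow_sum_of_squares(N)
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```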
ZenML is an extensible MLOps framework to create production-ready machine learning pipelines. Built for data scientists, it has a simple, flexible syntax, is cloud and tool agnostic, and has interfaces/abstractions that are catered towards ML workflows.
The code base was recently completely rewritten with better abstractions and to set us up for our ongoing growth and inclusion of more integrations with tools that data scientists love to use.
OpenDrop is a command-line tool that allows sharing files between devices directly over Wi-Fi. Its unique feature is that it is protocol-compatible with Apple AirDrop, which allows sharing files with Apple devices running iOS and macOS.
Super cool, but with a disclaimer: this is the result of reverse engineering the transfer protocol, so the odds of it being flakey (especially as Apple ships OS updates) are high. It’d be rad if Apple would publish an AirDrop-compatible specification for the community to rally around.
You know, like they did with FaceTime. 😉
Highlights include built-in support for caching with Redis, more easily customizable forms and error lists, and standardizing Django’s timezone implementation to align with the Python standard library’s implementation. For the full list of goodies, check the release notes. Congrats to the entire Django team and community on the big release!
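If you want to try the new Redis cache backend, the settings fragment below is a minimal sketch. The backend path is the one Django ships as of this release, while the LOCATION value assumes a Redis server on localhost at the default port, and you'll also need the redis-py package installed.

```python
# settings.py fragment: point Django's default cache at Redis.
# The LOCATION below is an assumption (local Redis, default port);
# adjust it to wherever your Redis server actually runs.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",
    }
}
```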
Side note: we haven’t done an episode on Django for many moons. Are we due? Who’d be the perfect guest you’d love to hear from on the topic?
When I was in college my professor tried teaching us regular expressions: the theory, the algebra, all that. I left dumbfounded. When I hit the Real World I quickly ran into many practical use cases where a regular expression helped me solve a problem. That’s when I finally grokked them.
If you want to improve your understanding (and deployment of) regular expressions to solve problems, but don’t have a use case at the moment… here’s a whole bunch of them for you to practice on in the meantime.
The bulk of this post by Brett Cannon is a detailed argument that Python makes sense to select even for projects with known performance concerns, but I got my money’s worth from the concept in the title and opener:
… it dawned on me that the problem is people are not treating language selection as a potential form of premature optimization: if you select a programming language based on your preconceived notions of how a language performs, you will never know if the language that might be a better, more productive fit for your developers would have actually worked out.
Doug Turnbull breaks down a major difference between two beloved programming languages in how they handle iteration.
for. Objects tell for how to work with them, and the for loop’s body processes what’s given back by the object. Ruby does the opposite. In Ruby, each) is a method of the Object. The caller passes the body of the for loop to this method.
He goes on to give examples and explain why each approach might map to different developers’ brains… differently. Here’s a succinct summary, if you don’t have time for the deeper discussion:
In Ruby, the objects control the affordances. In Python, the language does.
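To make the contrast concrete, here is a small sketch that shows both styles side by side in Python. Countdown and its each method are invented for this illustration and are not from Doug's post: __iter__ is the Python way (the object tells for how to step through it), while each imitates the Ruby way (the caller hands the loop body to the object).

```python
class Countdown:
    """Counts down from `start` to 1."""

    def __init__(self, start: int):
        self.start = start

    def __iter__(self):
        # Python style: the object tells `for` how to step through it,
        # and the loop body lives with the caller.
        n = self.start
        while n > 0:
            yield n
            n -= 1

    def each(self, body):
        # Ruby style: the caller passes the loop body in, and the object
        # decides how and when to run it.
        for n in self:
            body(n)


# Python style: the language's `for` drives the iteration.
python_style = []
for n in Countdown(3):
    python_style.append(n)

# Ruby style: the object drives the iteration and calls the body.
ruby_style = []
Countdown(3).each(ruby_style.append)

print(python_style, ruby_style)  # [3, 2, 1] [3, 2, 1]
```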