With every Python release, new modules are added and new, better ways of doing things are introduced. We all get used to the good old Python libraries, but it’s time to say goodbye to namedtuple and many more obsolete Python libraries and start using the latest and greatest ones instead.
A common argument against using Nx for a new machine learning project is its perceived lack of a library/support for some common task that is available in Python. In this post, I’ll do my best to highlight areas where this is not the case, and compare and contrast Elixir projects with their Python equivalents. Additionally, I’ll discuss areas where the Elixir ecosystem still comes up short, and using Nx for a new project might not be the best idea.
Sean is a prominent member of the Elixir community, so that’s the perspective on display here, but it’s a thorough and well-reasoned comparison. He concludes:
While there are still many gaps in the Elixir ecosystem, the progress over the last year has been rapid. Almost every library I’ve mentioned in this post is less than two years old, and I suspect there will be many more projects to fill some of the gaps I’ve mentioned in the coming months.
One beautiful thing about open source software: hundreds of thousands (millions?) of people’s Python apps got faster while they were sound asleep. From 3.11’s release notes:
CPython 3.11 is on average 25% faster than CPython 3.10 when measured with the pyperformance benchmark suite, and compiled with GCC on Ubuntu Linux. Depending on your workload, the speedup could be up to 10-60% faster.
Brenton Cleeland starts a lot of projects. Django is his go-to framework, so he’s settled on these common steps he performs right after creating a new project:
- Move the SECRET_KEY into an environment variable
- Change the database configuration to DATABASE_URL
- Set up a custom user model
- Create your Django app
- Make a base.html
- Gibo and Git Init
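The first two steps on the list above can be sketched in a `settings.py`; the variable names and fallback values here are my own convention, not from Brenton’s post:

```python
# settings.py (sketch) -- read secrets and connection info from the
# environment instead of hard-coding them.
import os
from urllib.parse import urlparse

# The fallback is for local development only; set DJANGO_SECRET_KEY in production.
SECRET_KEY = os.environ.get("DJANGO_SECRET_KEY", "insecure-dev-only-key")

# A DATABASE_URL like postgres://user:pass@host:5432/dbname is usually parsed
# with the third-party dj-database-url package; parsed by hand here to stay
# dependency-free.
url = urlparse(os.environ.get("DATABASE_URL", "postgres://app:secret@localhost:5432/app"))
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": url.path.lstrip("/"),
        "USER": url.username,
        "PASSWORD": url.password,
        "HOST": url.hostname,
        "PORT": url.port,
    }
}
```

In practice most people reach for `dj-database-url` for the second half; the point is the same either way: configuration lives outside the repo.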
I love the name “YOLO” for this because it’s single-stage, but I have to laugh that it’s now on its sixth version. You only live once… six times? 😆
This is a minimal implementation of DALL·E Mini. It has been stripped to the bare essentials necessary for doing inference, and converted to PyTorch. The only third-party dependencies are `torch` for the torch model and `flax` for the flax model.
How much mini-er can it get from here? 🤔
Many organizations, such as Commoncrawl, WebRecorder, Archive.org, and libraries around the world, use the WARC format to archive and store web data.

The full datasets of these services range in the few pebibytes (PiB), making them impractical to query using non-distributed systems.

This project aims to make subsets of such data easier to access and query using SQL.
Crawl a site with `wget` and import it into WarcDB:

```shell
wget --warc-file changelog "https://changelog.com"
warcdb import archive.warcdb changelog.warc.gz
```
Then you can query away using SQL, such as this one to get all response headers:

```shell
sqlite3 archive.warcdb <<SQL
select json_extract(h.value, '$.header') as header,
       json_extract(h.value, '$.value') as value
from response, json_each(http_headers) h
SQL
```
Last week I logged the very impressive Imagen project, which smarter people than me have said is the SOTA for text-to-image synthesis. Now a WIP implementation is just a `pip install imagen-pytorch` away.
Architecturally, it is actually much simpler than DALL-E2. It consists of a cascading DDPM conditioned on text embeddings from a large pretrained T5 model (attention network). It also contains dynamic clipping for improved classifier free guidance, noise level conditioning, and a memory efficient unet design.
In this article we take a look at how you can use Python’s Kubernetes client library to automate all the boring Kubernetes tasks and operations, such as creating/patching resources, watching events, accessing containers, and more.
This week we’re joined by Mike Riley and we’re talking about his book Portable Python Projects (Running your home on a Raspberry Pi). We break down the details of the latest Raspberry Pi hardware, various automation ideas from the book, why Mike prefers Python for scripting on a Raspberry Pi, and of course why the Raspberry Pi makes sense for home labs concerned about data security.
Use the code `PYPROJECTS` to get a 35% discount on the book. That code is valid for approximately 60 days after the episode’s publish date.
I’ve long been fascinated by literate programming (the art of writing code as if it were a novel), but it’s been a while since I’ve seen a good example of it in practice. Here’s a good one:
I wanted to showcase the BDD-inspired low-tech solution I came up with via a toy project, demonstrating a small but significant programming task, broken down as a series of design-implementation cycles.

Wordle is a perfect target: it’s a small codebase, with a half dozen features to string together into a usable game.
This story has five chapters and a satisfying conclusion:
This project was my first foray into literate programming at this scale: an attempt to bring together all the good ideas of TDD, modern Python development, and Gherkin usage for requirements traceability (without the overly zealous extremes of Cucumber automation). All these ideas were until now scattered, each implemented without the others in different places, and this project fuses them into something I hope is more valuable than the sum of its parts.
Nice to see some efforts around standardizing MLOps. Here are their high-level selling points:
- 📦 Docker containers without the pain
- 🤬️ No more CUDA hell
- ✅ Define the inputs and outputs for your model with standard Python
- 🎁 Automatic HTTP prediction server
- 🥞 Automatic queue worker
- ☁️ Cloud storage
- 🚀 Ready for production
From the engineering team at Bloomberg:
It can track memory allocations in Python code, in native extension modules, and in the Python interpreter itself. It can generate several different types of reports to help you analyze the captured memory usage data. While commonly used as a CLI tool, it can also be used as a library to perform more fine-grained profiling tasks.
It has a lot of nice outputs so you can grok what’s going on.
With a detailed instruction video, anyone can build it at home: you just need access to a 3D printer, and to buy a Raspberry Pi computer and an 8MP Raspberry Pi camera.

The full set of instructions is on their GitHub org.
Julia Evans lays out her process for taking API responses in her browser’s dev tools and using them in her own programs/scripts:
- look in developer tools for a promising JSON response
- copy as cURL
- remove irrelevant headers
- translate it into Python
Some of you might be wondering – can you always do this?
The answer is sort of yes – browsers aren’t magic! All the information browsers send to your backend is just HTTP requests. So if I copy all of the HTTP headers that my browser is sending, I think there’s literally no way for the backend to tell that the request isn’t sent by my browser and is actually being sent by a random Python program.
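A hand-translation of a “copy as cURL” command might look like the sketch below (the URL, headers, and token are placeholders). Building a prepared request first lets you inspect exactly what will be sent before anything hits the network:

```python
import requests

# Copied from dev tools (placeholder values):
# curl 'https://example.com/api/items' \
#   -H 'Accept: application/json' -H 'Authorization: Bearer XYZ'
req = requests.Request(
    "GET",
    "https://example.com/api/items",
    headers={"Accept": "application/json", "Authorization": "Bearer XYZ"},
)
prepared = req.prepare()

# Inspect before sending -- nothing has gone over the wire yet.
print(prepared.method, prepared.url)

# To actually send it:
# response = requests.Session().send(prepared)
# data = response.json()
```

From there it’s a matter of trimming headers one at a time until the request stops working, which tells you which ones the backend actually checks.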
Before any new feature, change, or improvement makes it into Python, there needs to be a Python Enhancement Proposal, also known as a PEP, outlining the proposed change. These PEPs are a great way of getting the freshest info about what might be included in upcoming Python releases. So, in this article we will go over all the proposals that are going to bring some exciting new Python features in the near future!
In their own words:
Grist is a modern relational spreadsheet. It combines the flexibility of a spreadsheet with the robustness of a database to organize your data and make you more productive.
Since so many people make the Airtable comparison that I did in the headline, the team behind Grist has written up a comparison of the two offerings.
Learn how sudo works by writing your own version in Python!
One might think that `sudo` is actually some binary deeply integrated into the kernel, relying on a special purpose-built system call to achieve its functionality. After all, it lets you use `root` without even providing the password for that account! But thanks to one bit inside file permissions, `sudo` can exist without any of this.
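That one bit is the setuid bit: `sudo` is a root-owned binary with setuid set (mode `-rwsr-xr-x`), so it runs as its owner rather than as the caller. A quick stdlib sketch of checking and setting the bit on a scratch file you own:

```python
import os
import stat
import tempfile


def has_setuid(path):
    """True if the file's setuid bit is set."""
    return bool(os.stat(path).st_mode & stat.S_ISUID)


# Demonstrate on a throwaway file we own (real sudo is root-owned,
# which is what makes the privilege escalation work).
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name

os.chmod(path, 0o4755)  # the leading 4 is the setuid bit
print(has_setuid(path))
os.remove(path)
```

Setting the bit on your own file is harmless, since the file then runs as you anyway; the magic only happens when the owner is root.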
It’s still in closed beta, but this looks like a really cool environment for data scientists and other folks who code to accomplish other goals vs code as craft. One cool thing you can do is take your Jupyter notebooks and convert them to PyFlow graphs (and vice versa).
I love how much hacking has been inspired by Wordle.
The Wordle source code contains 2,315 days of answers (all common 5-letter English words) and 10,657 other valid, less-common 5-letter English words.
We combine these to form a set of 12,972 possible words/answers.
We then simulate playing 1,000 Wordle games for each of these possible words, guessing based on the frequency of the word in the English language and the feedback received.
Then we take three measures to evaluate the observed distribution of ⬛🟨🟩 squares on Twitter according to our valid words.
The resulting code is included in the article.
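The heart of such a simulation is scoring a guess against an answer. A sketch with the usual duplicate-letter handling (greens claim their letters first, then yellows consume what remains):

```python
def score(guess: str, answer: str) -> str:
    """Return the ⬛🟨🟩 feedback Wordle would give for a guess."""
    result = ["⬛"] * len(guess)
    remaining = list(answer)

    # First pass: exact matches consume their letter.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "🟩"
            remaining.remove(g)

    # Second pass: right letter, wrong spot, limited by letters left over.
    for i, g in enumerate(guess):
        if result[i] == "⬛" and g in remaining:
            result[i] = "🟨"
            remaining.remove(g)

    return "".join(result)


print(score("eeeee", "crane"))  # ⬛⬛⬛⬛🟩 -- only one 'e' to award
```

The two-pass structure is what keeps repeated letters honest: a guess can never earn more colored squares for a letter than the answer contains.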
Funny how stuff like this plays out sometimes:
My cog tool has been having a resurgence of late: a number of people are discovering it’s useful to run a little bit of Python code inside otherwise static files.
He goes on to list out a bunch of tweets from people finding it useful for various tasks and even got to talk about the project on podcast.__init__. Cool tool, cool story.
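For the curious, a cog-annotated file keeps its generator inline as comments between markers, and running cog rewrites the lines in between (the constants here are made up for illustration):

```python
# example.py -- the generator lives in the [[[cog ... ]]] comment block;
# running `python -m cogapp -r example.py` regenerates the lines below it.
# [[[cog
# import cog
# for name in ["alpha", "beta"]:
#     cog.outl(f'{name.upper()} = "{name}"')
# ]]]
ALPHA = "alpha"
BETA = "beta"
# [[[end]]]
```

Because the generator is a comment, the file stays valid Python (or HTML, or whatever it is) the whole time, which is exactly why it works so well inside otherwise static files.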
Martin Heinz on the tools/techniques for finding bottlenecks in your Python code. And fixing them, fast.
The first rule of optimization is to not do it. If you really have to though, then optimize where appropriate. Use the above profiling tools to find bottlenecks, so you don’t waste time optimizing some inconsequential piece of code. It’s also useful to create a reproducible benchmark for the piece of code you’re trying to optimize, so that you can measure the actual improvement.
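A reproducible benchmark needs nothing beyond the standard library. Here is one possible shape using timeit, comparing two ways of building a string (the sizes and candidates are arbitrary examples, not from Martin’s article):

```python
import timeit

setup = "words = [str(i) for i in range(1000)]"

# Candidate 1: repeated concatenation in a loop.
concat = timeit.timeit(
    "s = ''\nfor w in words:\n    s += w + ','", setup=setup, number=200
)

# Candidate 2: a single str.join.
joined = timeit.timeit("','.join(words)", setup=setup, number=200)

print(f"concat: {concat:.4f}s  join: {joined:.4f}s")
```

Keeping the `setup` and `number` fixed means you can rerun the benchmark after each change and trust that the delta reflects your code, not the harness.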