Python

Python is a dynamically typed programming language.

SQLite github.com

Web crawl data as SQLite databases

Many organizations, such as Commoncrawl, WebRecorder, Archive.org, and libraries around the world, use the WARC format to archive and store web data.

The full datasets of these services run to a few pebibytes (PiB), making them impractical to query using non-distributed systems.

This project aims to make subsets of such data easier to access and query using SQL.

Crawl a site with wget and import it into WarcDB:

wget --warc-file changelog "https://changelog.com"

warcdb import archive.warcdb changelog.warc.gz

Then you can query away using SQL, such as this one to get all response headers:

sqlite3 archive.warcdb <<SQL
select  json_extract(h.value, '$.header') as header, 
        json_extract(h.value, '$.value') as value
from response,
     json_each(http_headers) h
SQL
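The same query can be run from Python's built-in sqlite3 module. Here's a minimal sketch, assuming your SQLite build includes the JSON functions (most recent ones do). The in-memory `response` table and its sample rows are made up for illustration and only mimic the shape the query above implies; a real WarcDB file has its own schema.

```python
import json
import sqlite3

# Same query as the sqlite3 CLI example above.
HEADERS_QUERY = """
    select json_extract(h.value, '$.header') as header,
           json_extract(h.value, '$.value')  as value
    from response,
         json_each(http_headers) h
"""


def demo_connection():
    """Build an in-memory stand-in for a WarcDB file (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table response (http_headers text)")
    sample = [
        {"header": "Content-Type", "value": "text/html"},
        {"header": "Server", "value": "nginx"},
    ]
    conn.execute("insert into response values (?)", (json.dumps(sample),))
    return conn


def response_headers(conn):
    """Return one (header, value) tuple per HTTP header stored in `response`."""
    return conn.execute(HEADERS_QUERY).fetchall()


if __name__ == "__main__":
    for header, value in response_headers(demo_connection()):
        print(f"{header}: {value}")
```

To try it against a real archive, swap `demo_connection()` for `sqlite3.connect("archive.warcdb")`.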

Python github.com

Imagen (Google's text-to-image neural net) implemented in Pytorch

Last week I logged the very impressive Imagen project, which smarter people than me have said is the SOTA for text-to-image synthesis. Now a WIP implementation is just a pip install imagen-pytorch away.

Architecturally, it is actually much simpler than DALL-E2. It consists of a cascading DDPM conditioned on text embeddings from a large pretrained T5 model (attention network). It also contains dynamic clipping for improved classifier-free guidance, noise level conditioning, and a memory-efficient U-Net design.

Documentation jiby.tech

Literate programming Wordle

I’ve long been fascinated by literate programming (the art of writing code as if it were a novel), but it’s been a while since I’ve seen a good example of it in practice. Here’s a good one:

I wanted to showcase the BDD-inspired low-tech solution I came up with via a toy project, demonstrating a small but significant programming task, broken down as a series of design-implementation cycles.

Wordle is a perfect target: it’s a small codebase, with a half dozen features to string together into a useable game.

This story has five chapters and a satisfying conclusion:

This project was my first foray into literate programming at this scale, an attempt to bring together all the good ideas of TDD, modern Python development, Gherkin usage for requirements traceability purposes (without overly zealous extremes of Cucumber automation). All these ideas were until now scattered, implemented each without the others in different places, and this project fuses them into something I hope is more valuable than the sum of its parts.

Python bloomberg.github.io

Memray is a memory profiler for Python

From the engineering team at Bloomberg:

It can track memory allocations in Python code, in native extension modules, and in the Python interpreter itself. It can generate several different types of reports to help you analyze the captured memory usage data. While commonly used as a CLI tool, it can also be used as a library to perform more fine-grained profiling tasks.

It has a lot of nice outputs so you can grok what’s going on.


Julia Evans jvns.ca

How to use undocumented web APIs

Julia Evans lays out her process for taking API responses in her browser’s dev tools and using them in her own programs/scripts:

  1. look in developer tools for a promising JSON response
  2. copy as cURL
  3. remove irrelevant headers
  4. translate it into Python

Some of you might be wondering – can you always do this?

The answer is sort of yes – browsers aren’t magic! All the information browsers send to your backend is just HTTP requests. So if I copy all of the HTTP headers that my browser is sending, I think there’s literally no way for the backend to tell that the request isn’t sent by my browser and is actually being sent by a random Python program.
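The last step of that process (translate it into Python) might look something like this with only the standard library. It's a sketch, not Julia's code: the URL, the header values, and the function names are all placeholders standing in for whatever "copy as cURL" gives you.

```python
import json
import urllib.request

# Hypothetical endpoint and headers: replace with the ones copied from
# your browser's dev tools after trimming the irrelevant ones.
API_URL = "https://example.com/api/items.json"
HEADERS = {
    "Accept": "application/json",
    "Cookie": "session=PLACEHOLDER",
}


def build_request(url, headers):
    """Recreate the browser's GET request with the headers that matter."""
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_request(API_URL, HEADERS)

if __name__ == "__main__":
    # Only this part touches the network.
    print(fetch_json(request))
```

Building the request separately from sending it makes it easy to inspect exactly what will go over the wire before you hit the server.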

Martin Heinz martinheinz.dev

Upcoming Python features brought to you by PEPs

Before any new feature, change or improvement makes it into Python, there needs to be a Python Enhancement Proposal, also known as a PEP, outlining the proposed change. These PEPs are a great way of getting the freshest info about what might be included in the upcoming Python releases. So, in this article we will go over all the proposals that are going to bring some exciting new Python features in the near future!

Databases github.com

Grist is a lot like Airtable, but open source and more customizable

In their own words:

Grist is a modern relational spreadsheet. It combines the flexibility of a spreadsheet with the robustness of a database to organize your data and make you more productive.

Since so many people make the Airtable comparison that I did in the headline, the team behind Grist has written up a comparison of the two offerings.

Learn rtpg.co

Writing your own sudo

Learn how sudo works by writing your own version in Python!

One might think that sudo is actually some binary deeply integrated into the kernel, relying on a special purpose-built system call to achieve its functionality. After all, it lets you use root without even providing the password for that account! But thanks to one bit inside file permissions, sudo can exist without any of this.
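That "one bit" is the setuid bit, which tells the kernel to run an executable with its owner's privileges. Here's a small sketch of how you might check for it from Python; the helper names are mine, not from the article.

```python
import os
import stat


def has_setuid_bit(mode):
    """True if the setuid bit is set in a file mode such as os.stat().st_mode."""
    return bool(mode & stat.S_ISUID)


def is_setuid_file(path):
    """Check an actual file on disk for the setuid bit."""
    return has_setuid_bit(os.stat(path).st_mode)


if __name__ == "__main__":
    # The sudo location varies by system; this path is only a common default.
    candidate = "/usr/bin/sudo"
    if os.path.exists(candidate):
        print(candidate, "setuid:", is_setuid_file(candidate))
```

In octal terms, a mode of 4755 has the bit set and 0755 does not.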

Python kaggle.com

Get the daily Wordle on the first try using the tweet distribution

I love how much hacking has been inspired by Wordle.

The Wordle source code contains 2,315 days of answers (all common 5-letter English words) and 10,657 other valid, less-common 5-letter English words.

We combine these to form a set of 12,972 possible words/answers.

We then simulate playing 1,000 Wordle games for each of these possible words, guessing based on the frequency of the word in the English language and the feedback received.

Then we take three measures to evaluate the observed distribution of ⬛🟨🟩 squares on Twitter according to our valid words.

The resulting code is included in the article.
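For a feel of the "feedback received" part, here's a minimal sketch of how a guess can be scored into ⬛🟨🟩 squares. This is my own illustration of the standard Wordle rules (including repeated letters), not the code from the Kaggle notebook.

```python
from collections import Counter


def score(guess, answer):
    """Return the row of squares Wordle would show for `guess`."""
    result = ["⬛"] * len(guess)
    unmatched = Counter()

    # First pass: mark exact matches and count answer letters left over.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "🟩"
        else:
            unmatched[a] += 1

    # Second pass: a letter is yellow only while unmatched copies remain.
    for i, g in enumerate(guess):
        if result[i] != "🟩" and unmatched[g] > 0:
            result[i] = "🟨"
            unmatched[g] -= 1

    return "".join(result)


if __name__ == "__main__":
    print(score("speed", "abide"))
```

The two-pass approach matters for repeated letters: in the example, only one of the two e's in "speed" can be yellow because "abide" has a single e.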

Ned Batchelder nedbatchelder.com

Ned Batchelder's cog tool is an overnight success 17 years in the making

Funny how stuff like this plays out sometimes:

My cog tool has been having a resurgence of late: a number of people are discovering it’s useful to run a little bit of Python code inside otherwise static files.

He goes on to list a bunch of tweets from people finding it useful for various tasks, and he even got to talk about the project on Podcast.__init__. Cool tool, cool story.

Martin Heinz martinheinz.dev

Profiling and analyzing performance of Python programs

Martin Heinz on the tools/techniques for finding bottlenecks in your Python code. And fixing them, fast.

The first rule of optimization is to not do it. If you really have to though, then optimize where appropriate. Use the above profiling tools to find bottlenecks, so you don’t waste time optimizing some inconsequential piece of code. It’s also useful to create a reproducible benchmark for the piece of code you’re trying to optimize, so that you can measure the actual improvement.
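If you want to try the "profile first" advice, the standard library's cProfile is enough to get started. A minimal sketch follows; `slow_sum` is a made-up stand-in for your own code, and `profile_call` is a helper name I picked, not something from the article.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive workload to have something to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and show only the top few entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    value, report = profile_call(slow_sum, 1_000_000)
    print(value)
    print(report)
```

Keeping the workload in a function with fixed input doubles as the reproducible benchmark the quote recommends: rerun it after each change and compare.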

Alex Strick van Linschoten github.com

ZenML helps data scientists work across the full stack

ZenML is an extensible MLOps framework to create production-ready machine learning pipelines. Built for data scientists, it has a simple, flexible syntax, is cloud and tool agnostic, and has interfaces/abstractions that are catered towards ML workflows.

The code base was recently completely rewritten with better abstractions and to set us up for our ongoing growth and inclusion of more integrations with tools that data scientists love to use.

Command line interface github.com

AirDrop files directly from your CLI with OpenDrop

OpenDrop is a command-line tool that allows sharing files between devices directly over Wi-Fi. Its unique feature is that it is protocol-compatible with Apple AirDrop, which allows sharing files with Apple devices running iOS and macOS.

Super cool, but with a disclaimer: this is the result of reverse engineering the transfer protocol, so the odds of it being flakey (especially as Apple ships OS updates) are high. It’d be rad if Apple would publish an AirDrop-compatible specification for the community to rally around.

You know, like they did with FaceTime. 😉

Django djangoproject.com

Django 4.0 released

Highlights include built-in support for caching with Redis, more easily customizable forms and error lists, and standardizing Django’s timezone implementation to align with the Python standard library’s implementation. For the full list of goodies, check the release notes. Congrats to the entire Django team and community on the big release!
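For the curious, turning on the new Redis cache backend is a settings change along these lines. Treat it as a sketch: the Redis URL is a placeholder for a local server, and the backend expects the redis-py package to be installed.

```python
# settings.py fragment (Django 4.0+)
CACHES = {
    "default": {
        # Built-in Redis backend added in Django 4.0.
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        # Placeholder address: point this at your own Redis server.
        "LOCATION": "redis://127.0.0.1:6379",
    }
}
```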

Side note: we haven’t done an episode on Django for many moons. Are we due? Who’d be the perfect guest you’d love to hear from on the topic?

Python github.com

Improve your Python regex skills with 75 interactive exercises

When I was in college my professor tried teaching us regular expressions: the theory, the algebra, all that. I left dumbfounded. When I hit the Real World I quickly ran into many practical use cases where a regular expression helped me solve a problem. That’s when I finally grokked them.

If you want to improve your understanding (and deployment of) regular expressions to solve problems, but don’t have a use case at the moment… here’s a whole bunch of them for you to practice on in the meantime.

Brett Cannon snarky.ca

Selecting a programming language can be a form of premature optimization

The bulk of this post by Brett Cannon is a detailed argument that Python makes sense to select even for projects with known performance concerns, but I got my money’s worth from the concept in the title and opener:

… it dawned on me that the problem is people are not treating language selection as a potential form of premature optimization: if you select a programming language based on your preconceived notions of how a language performs, you will never know if the language that might be a better, more productive fit for your developers would have actually worked out.

Ruby softwaredoug.com

Ruby vs Python comes down to the for loop

Doug Turnbull breaks down a major difference between two beloved programming languages in how they handle iteration.

Python embraces for. Objects tell for how to work with them, and the for loop’s body processes what’s given back by the object. Ruby does the opposite. In Ruby, for itself (via each) is a method of the Object. The caller passes the body of the for loop to this method.

He goes on to give examples and explain why each approach might map to different developers’ brains… differently. Here’s a succinct summary, if you don’t have time for the deeper discussion:

In Ruby, the objects control the affordances. In Python, the language does.
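To make the contrast concrete, here's a small Python sketch of both styles. `Countdown` and `each` are names I made up for illustration; the second function only imitates the Ruby approach of handing the loop body to a method.

```python
class Countdown:
    """An object that tells `for` how to iterate over it via __iter__."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1


def each(iterable, body):
    """Ruby-flavored iteration: the caller passes the loop body in."""
    for item in iterable:
        body(item)


if __name__ == "__main__":
    # Python style: the language's for loop drives the object.
    for n in Countdown(3):
        print(n)

    # Ruby-like style: the body is handed over as a callable.
    each(Countdown(3), print)
```

In Python the `for` statement owns the loop and the object just supplies values; in the `each` version the callee decides how and when the body runs.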
