Python

Python is a dynamically typed programming language.
309 Stories
Sean Moriarity dockyard.com

Elixir versus Python for data science

Sean Moriarity:

A common argument against using Nx for a new machine learning project is its perceived lack of a library/support for some common task that is available in Python. In this post, I’ll do my best to highlight areas where this is not the case, and compare and contrast Elixir projects with their Python equivalents. Additionally, I’ll discuss areas where the Elixir ecosystem still comes up short, and using Nx for a new project might not be the best idea.

Sean is a prominent member of the Elixir community, so that’s the perspective on display here, but it’s a thorough and well-reasoned comparison. He concludes:

While there are still many gaps in the Elixir ecosystem, the progress over the last year has been rapid. Almost every library I’ve mentioned in this post is less than two years old, and I suspect there will be many more projects to fill some of the gaps I’ve mentioned in the coming months.

Python docs.python.org

Python 3.11 is up to 10-60% faster than Python 3.10

One beautiful thing about open source software: how hundreds of thousands (millions?) of people’s Python apps got faster while they were sound asleep. From 3.11’s release notes:

CPython 3.11 is on average 25% faster than CPython 3.10 when measured with the pyperformance benchmark suite, and compiled with GCC on Ubuntu Linux. Depending on your workload, the speedup could be up to 10-60% faster.

SQLite github.com

Web crawl data as SQLite databases

Many organizations, such as Commoncrawl, WebRecorder, Archive.org, and libraries around the world, use the WARC format to archive and store web data.

The full datasets of these services range in the few pebibytes (PiB), making them impractical to query using non-distributed systems.

This project aims to make subsets of such data easier to access and query using SQL.

Crawl a site with wget and import it into WarcDB:

wget --warc-file changelog "https://changelog.com"

warcdb import archive.warcdb changelog.warc.gz

Then you can query away using SQL, such as this one to get all response headers:

sqlite3 archive.warcdb <<SQL
select  json_extract(h.value, '$.header') as header, 
        json_extract(h.value, '$.value') as value
from response,
     json_each(http_headers) h
SQL
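The same query can also be run from Python's built-in sqlite3 module. What follows is only a sketch: instead of a real archive.warcdb it builds an in-memory stand-in with a single response table, and it assumes http_headers holds a JSON array of objects with "header" and "value" keys, which is what the query above implies. The sample rows and the response_headers helper are made up for illustration, and the SQLite build needs its JSON functions enabled.

```python
import json
import sqlite3


def response_headers(conn):
    """Return (header, value) rows by expanding each response's JSON header list."""
    query = """
        select json_extract(h.value, '$.header') as header,
               json_extract(h.value, '$.value') as value
        from response,
             json_each(http_headers) h
    """
    return conn.execute(query).fetchall()


# Stand-in for a real archive.warcdb (assumed shape, not WarcDB's full schema).
conn = sqlite3.connect(":memory:")
conn.execute("create table response (http_headers text)")

# Invented sample data: one response with two headers.
sample = [
    {"header": "Content-Type", "value": "text/html"},
    {"header": "Server", "value": "nginx"},
]
conn.execute("insert into response values (?)", (json.dumps(sample),))

rows = response_headers(conn)
for header, value in rows:
    print(f"{header}: {value}")
```

Against a real file, sqlite3.connect("archive.warcdb") would replace the in-memory setup.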

Python github.com

Imagen (Google's text-to-image neural net) implemented in Pytorch

Last week I logged the very impressive Imagen project, which smarter people than me have said is the SOTA for text-to-image synthesis. Now a WIP implementation is just a pip install imagen-pytorch away.

Architecturally, it is actually much simpler than DALL-E2. It consists of a cascading DDPM conditioned on text embeddings from a large pretrained T5 model (attention network). It also contains dynamic clipping for improved classifier free guidance, noise level conditioning, and a memory efficient unet design.

Changelog Interviews Changelog Interviews #489

Run your home on a Raspberry Pi

This week we’re joined by Mike Riley and we’re talking about his book Portable Python Projects (Running your home on a Raspberry Pi). We break down the details of the latest Raspberry Pi hardware, various automation ideas from the book, why Mike prefers Python for scripting on a Raspberry Pi, and of course why the Raspberry Pi makes sense for home labs concerned about data security.

Use the code PYPROJECTS to get a 35% discount on the book. That code is valid for approximately 60 days after the episode’s publish date.

Documentation jiby.tech

Literate programming Wordle

I’ve long been fascinated by literate programming (the art of writing code as if it were a novel), but it’s been a while since I’ve seen a good example of it in practice. Here’s a good one:

I wanted to showcase the BDD-inspired low-tech solution I came up with via a toy project, demonstrating a small but significant programming task, broken down as a series of design-implementation cycles.

Wordle is a perfect target: it’s a small codebase, with a half dozen features to string together into a useable game.

This story has five chapters and a satisfying conclusion:

This project was my first foray into literate programming at this scale, an attempt to bring together all the good ideas of TDD, modern Python development, Gherkin usage for requirements traceability purposes (without overly zealous extremes of Cucumber automation). All these ideas were until now scattered, implemented each without the others in different places, and this project fuses them into something I hope is more valuable than the sum of its parts.

Python bloomberg.github.io

Memray is a memory profiler for Python

From the engineering team at Bloomberg:

It can track memory allocations in Python code, in native extension modules, and in the Python interpreter itself. It can generate several different types of reports to help you analyze the captured memory usage data. While commonly used as a CLI tool, it can also be used as a library to perform more fine-grained profiling tasks.

It has a lot of nice outputs so you can grok what’s going on.

Julia Evans jvns.ca

How to use undocumented web APIs

Julia Evans lays out her process for taking API responses in her browser’s dev tools and using them in her own programs/scripts:

  1. look in developer tools for a promising JSON response
  2. copy as cURL
  3. remove irrelevant headers
  4. translate it into Python

Some of you might be wondering – can you always do this?

The answer is sort of yes – browsers aren’t magic! All the information browsers send to your backend is just HTTP requests. So if I copy all of the HTTP headers that my browser is sending, I think there’s literally no way for the backend to tell that the request isn’t sent by my browser and is actually being sent by a random Python program.
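The last step, translating the trimmed cURL command into Python, might look roughly like the sketch below. It uses only the standard library's urllib; the URL, the header values, and the helper names build_request and fetch_json are placeholders for illustration, not anything from Julia's post. The request is built but not sent here.

```python
import json
import urllib.request


def build_request(url, headers):
    """Build (but do not send) a GET request carrying the copied browser headers."""
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical endpoint and headers, standing in for what "copy as cURL" gives you
# after the irrelevant headers have been removed.
request = build_request(
    "https://example.com/api/items?page=1",
    {
        "Accept": "application/json",
        "User-Agent": "Mozilla/5.0 (placeholder)",
        "Cookie": "session=PLACEHOLDER",
    },
)

# data = fetch_json(request)  # uncomment once the URL points at a real endpoint
```

Keeping the header dictionary separate from the sending code makes it easy to delete headers one at a time and see which ones the backend actually requires.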

Martin Heinz martinheinz.dev

Upcoming Python features brought to you by PEPs

Before any new feature, change or improvement makes it into Python, there needs to be a Python Enhancement Proposal, also known as a PEP, outlining the proposed change. These PEPs are a great way of getting the freshest info about what might be included in upcoming Python releases. So, in this article we will go over all the proposals that are going to bring some exciting new Python features in the near future!

Databases github.com

Grist is a lot like Airtable, but open source and more customizable

In their own words:

Grist is a modern relational spreadsheet. It combines the flexibility of a spreadsheet with the robustness of a database to organize your data and make you more productive.

Since so many people make the Airtable comparison that I did in the headline, the team behind Grist has written up a comparison of the two offerings.

Learn rtpg.co

Writing your own sudo

Learn how sudo works by writing your own version in Python!

One might think that sudo is actually some binary deeply integrated into the kernel, relying on a special purpose-built system call to achieve its functionality. After all, it lets you use root without even providing the password for that account! But thanks to one bit inside file permissions, sudo can exist without any of this.
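The excerpt doesn't name the bit, but it is presumably the setuid bit, which makes an executable run with the privileges of the file's owner rather than those of the user who launched it. A small sketch for checking it from Python follows; the helper names are invented for illustration.

```python
import os
import stat


def mode_has_setuid(mode):
    """Return True if the setuid bit is set in a numeric file mode."""
    return bool(mode & stat.S_ISUID)


def file_has_setuid(path):
    """Return True if the file at `path` has the setuid bit set."""
    return mode_has_setuid(os.stat(path).st_mode)


# 0o4755 is a typical mode for a setuid executable; 0o755 is the same without the bit.
print(mode_has_setuid(0o4755))
print(mode_has_setuid(0o755))
```

On many Linux systems, file_has_setuid("/usr/bin/sudo") would be a natural thing to try, though that path is an assumption and varies by distribution.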

Python kaggle.com

Get the daily Wordle on the first try using the tweet distribution

I love how much hacking has been inspired by Wordle.

The Wordle source code contains 2,315 days of answers (all common 5-letter English words) and 10,657 other valid, less-common 5-letter English words.

We combine these to form a set of 12,972 possible words/answers.

We then simulate playing 1,000 Wordle games for each of these possible words, guessing based on the frequency of the word in the English language and the feedback received.

Then we take three measures to evaluate the observed distribution of ⬛🟨🟩 squares on Twitter according to our valid words.

The resulting code is included in the article.
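For a sense of what the simulation has to compute for every guess, here is a minimal sketch of Wordle's feedback rule. It is not the notebook's code; score_guess is a name made up for this example, and the duplicate-letter handling follows the commonly described two-pass approach (greens first, then yellows limited by the remaining letter counts).

```python
from collections import Counter


def score_guess(guess, answer):
    """Return the row of ⬛🟨🟩 squares Wordle would show for `guess`."""
    marks = ["⬛"] * len(guess)
    remaining = Counter()  # answer letters not already matched as green

    # First pass: exact position matches.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            marks[i] = "🟩"
        else:
            remaining[a] += 1

    # Second pass: right letter, wrong position, limited by what is left over.
    for i, g in enumerate(guess):
        if marks[i] != "🟩" and remaining[g] > 0:
            marks[i] = "🟨"
            remaining[g] -= 1

    return "".join(marks)


print(score_guess("speed", "abide"))  # ⬛⬛🟨⬛🟨
```

Only the first "e" in "speed" turns yellow because "abide" contains a single "e".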

Ned Batchelder nedbatchelder.com

Ned Batchelder's cog tool is an overnight success 17 years in the making

Funny how stuff like this plays out sometimes:

My cog tool has been having a resurgence of late: a number of people are discovering it’s useful to run a little bit of Python code inside otherwise static files.

He goes on to list a bunch of tweets from people finding it useful for various tasks, and he even got to talk about the project on podcast.__init__. Cool tool, cool story.

Martin Heinz martinheinz.dev

Profiling and analyzing performance of Python programs

Martin Heinz on the tools/techniques for finding bottlenecks in your Python code. And fixing them, fast.

The first rule of optimization is to not do it. If you really have to though, then optimize where appropriate. Use the above profiling tools to find bottlenecks, so you don’t waste time optimizing some inconsequential piece of code. It’s also useful to create a reproducible benchmark for the piece of code you’re trying to optimize, so that you can measure the actual improvement.
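As a rough illustration of that advice, the sketch below profiles a function first and then times it with a repeatable benchmark. It uses cProfile and timeit from the standard library, which may or may not be among the tools the article covers; the function being measured and the helper names are invented for the example.

```python
import cProfile
import io
import pstats
import timeit


def slow_sum_of_squares(n):
    """Deliberately naive loop that serves as the code under test."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_report(func, *args, limit=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


def benchmark(func, *args, repeat=5, number=100):
    """Return the best per-call time in seconds over several repeated runs."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number


print(profile_report(slow_sum_of_squares, 10_000))
print(f"best per-call time: {benchmark(slow_sum_of_squares, 10_000):.6f}s")
```

Taking the minimum over several repeats reduces noise from other processes, so rerunning the same benchmark before and after a change gives a fairer comparison.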
