Search results for [BCCMINING]💯long-tail cryptocurrency mining keywords

The world's largest open library dataset

Unsplash has released the world’s largest open library dataset, which includes 2M+ high-quality Unsplash photos, 5M keywords, and over 250M searches. They have big ideas about how the dataset might be used by ML/AI folks, and there have already been some interesting applications. In this episode, Luke and Tim discuss why they released this data and what it takes to maintain a dataset of this size.

fasterthanli.me

30 minutes to learn Rust

In order to increase fluency in a programming language, one has to read a lot of it. But how can you read a lot of it if you don’t know what it means?

This 28 minute read will walk you through lots of Rust snippets and explain the meaning of the keywords and symbols they contain. Additional learning resources are included at the end too.

Special thanks to the 46 patrons mentioned by name at the end of the post who enable Amos to write and share this type of content.

cvcompiler.com

An NLP tool for improving dev resumes

CV Compiler is an online resume analysis tool designed exclusively for software engineers.

The review technology scans for keywords from the world of programming and how they are used in the resume, relative to the best practices in the industry.

CV Compiler was built using Python with libraries NLTK and spaCy for tokenization, lemmatization, and POS-tagging.

The internal analysis engine for large datasets (resumes, job descriptions) was built upon a Seq2Seq model in TensorFlow.

changelog.com/posts

Today is a big day for Rust

Lots of great stuff in the Rust world today.

Rust 0.6

First of all, Rust 0.6.0 has been released! You can find the announcement here.

As always, Rust works on Mac, Windows, and Linux. To get it, do this:

$ wget http://static.rust-lang.org/dist/rust-0.6.tar.gz
$ shasum -a 256 rust-0.6.tar.gz
e11cb529a1e20f27d99033181a9e0e131817136b46d2742f0fa1afa1210053e5  rust-0.6.tar.gz
$ tar xvf rust-0.6.tar.gz
$ cd rust-0.6
$ ./configure
$ make
# make install

I added the SHA in there so you can verify you got everything properly. Now, compiling Rust is still pretty slow: it took about an hour on my MacBook Air.

I also have a pull request in on Homebrew, so after that’s merged, you should be able to use homebrew instead of mucking about on the command line.

What's new

So what’s new in Rust 0.6? You can find a detailed list of changes here, and the commit list here. There were 2,398 commits by 17 authors, damn!

While we cannot promise that this is the last time there will be incompatible changes, the great majority of anticipated language-level changes are complete in this version. We expect subsequent releases before a beta and final 1.0 to be more focused on non-language-level work (performance, libraries, packaging and building, runtime system) with only modest language-level changes as we discover bugs and areas requiring residual polish (primarily in the trait system, macro system, and borrow check).

This is the biggest thing for me. The language is almost completely settled down. You can find the meta-bug which describes all of the things that have yet to be removed here.

Here’s some of my favorite changes from the release:

Trailing sigils on closure types such as fn@, fn~ and fn& were removed in favour of the more-consistent leading sigils @fn, ~fn and &fn. (More consistent syntax is always good)
The move keyword was removed; owned types are always passed and assigned by moving now. (It was sorta odd that move was needed in places where the compiler could just infer it. This removes a bunch of clutter in code that sent owned types into spawned tasks, for example.)
The fail and assert keywords were replaced with macros fail!() and assert!(). (I'm generally pro-remove keywords, add macros)
in all cases mutability is controlled by mutability of the owner (inherited mutability). (Read this section in more depth, but I think this really helps the visibility of the mutability rules for a struct).
impl Ty : Trait { -> impl Trait for Ty {. (pretty!)
the "main function" doesn't need to be called main anymore, you can use #[main] to change it.
Rust now supports using inline assembly through the asm! macro. (WEBSCALE!!!!111lolz)

Neat stuff!

Mozilla + Samsung

Mozilla put out a press release today called “Mozilla and Samsung Collaborate on Next Generation Web Browser Engine”.

Translation? Servo is a real project now. At least, that’s how I read it.

We are now pleased to announce with Samsung that together we are bringing both the Rust programming language and Servo, the experimental web browser engine, to Android and ARM. This is an exciting step in the evolution of both projects that will allow us to start deeper research with Servo on mobile.
In the coming year, we are racing to complete the first major revision of Rust – cleaning up, expanding and documenting the libraries, building out our tools to improve the user experience, and beefing up performance. At the same time, we will be putting more resources into Servo, trying to prove that we can build a fast web browser with pervasive parallelism, and in a safe, fun language.

Cool! If you’re not aware, Servo is a massively parallel browser rendering engine, written in Rust. I haven’t been covering it here on the Changelog because it’s been a purely research project, but I’m really excited to see it move forward.

changelog.com/posts

Opa - Event-driven, non-blocking, strongly statically typed web framework with JavaScript-like syntax

It seems our industry’s search for a unified theory of web development is resulting in a blurring of the line between client and server. Opa, the latest on our radar, even goes so far as to introduce client and server keywords:

 // Opa decides
function client_or_server(x, y) { ... }
 // Client-side
client function client_function(x, y) { ... }
 // Server-side
server function server_function(x, y) { ... }

Personally, Opa’s JavaScript-like syntax doesn’t get me excited. JavaScript’s strength lies in its ubiquity, not in its syntax.

Opa's web site is very well done and the source is on GitHub for your perusal.

changelog.com/posts

jsonpipe: Convert JSON to a UNIX-friendly line-based format using Python

As the web seems to be moving to JavaScript Object Notation over XML for data transfer, it’s nice to find tools to help you work with JSON from the command line. Changelog listener Jason Williams pointed us to JSON Pipe, a simple Python command line utility for visualizing JSON document structure.

Installation and usage

Install JSON Pipe via pip

pip install jsonpipe

Now we can pipe data into jsonpipe to see a breakdown of the structure of the document:

$ echo '[{"a": [{"b": {"c": ["foo"]}}]}]' | jsonpipe
/   []
/0  {}
/0/a        []
/0/a/0      {}
/0/a/0/b    {}
/0/a/0/b/c  []
/0/a/0/b/c/0        "foo"

How about inspecting an NPM package?

curl https://github.com/jashkenas/coffee-script/raw/master/package.json | jsonpipe

/ {}
/name "coffee-script"
/description  "Unfancy JavaScript"
/keywords []
/keywords/0 "javascript"
/keywords/1 "language"
/keywords/2 "coffeescript"
/keywords/3 "compiler"
/author "Jeremy Ashkenas"
/version  "1.1.0-pre"
/licenses []
/licenses/0 {}
/licenses/0/type  "MIT"
/licenses/0/url "http://github.com/jashkenas/coffee-script/raw/master/LICENSE"
/engines  {}
/engines/node ">=0.2.5"
/directories  {}
/directories/lib  "./lib"
/main "./lib/coffee-script"
/bin  {}
/bin/coffee "./bin/coffee"
/bin/cake "./bin/cake"
/homepage "http://coffeescript.org"
/repository {}
/repository/type  "git"
/repository/url "git://github.com/jashkenas/coffee-script.git"

Since the output is regular STDOUT, JSON Pipe is piping hot:

curl https://github.com/jashkenas/coffee-script/raw/master/package.json | jsonpipe | grep "homepage"

/homepage   "http://coffeescript.org"

JSON Pipe will show you the key/value structure of your JSON documents faster than you can say XPath.
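
To make the path/value format above concrete, here is a minimal Python sketch of the same idea. It is not jsonpipe's actual implementation; the `flatten` helper is a hypothetical name, and the sketch only assumes that each node of the document becomes one line with a slash-separated path.

```python
import json


def flatten(value, path=""):
    """Yield (path, rendered_value) pairs, one per node of a JSON document."""
    if isinstance(value, dict):
        yield (path or "/", "{}")
        for key, child in value.items():
            yield from flatten(child, f"{path}/{key}")
    elif isinstance(value, list):
        yield (path or "/", "[]")
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}/{index}")
    else:
        # Scalars are rendered as JSON literals, so strings keep their quotes.
        yield (path or "/", json.dumps(value))


document = json.loads('[{"a": [{"b": {"c": ["foo"]}}]}]')
for node_path, rendered in flatten(document):
    print(f"{node_path}\t{rendered}")
```

Because every node ends up on its own line, line-oriented tools such as grep can filter the result without understanding JSON.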

[Source on GitHub]

Changelog & Friends #95

wsl.exe -- cat hello.cs

We bring you back to Microsoft Build 2025 to nerd out with Craig Loewen on Windows Subsystem for Linux and Mads Torgersen on leading the design of C#.

Matched from the episode's transcript 👇

Mads Torgersen: …although we’re trying to kind of get just a regular C# closer to that. C# used to be – like, as a member of that Java [unintelligible 00:54:52.19] H club, it used to be fairly clunky in terms of syntax… Like, declaring a virtual method, and overriding, and all that kind of stuff would, it’s a thing with a lot of keywords involved, and your program would be a main method that lives in a class… So you have five lines of code –

Changelog & Friends #91

When life gives you LLMs...

Our old friend, Zeno Rocha, returns to discuss email etiquette, the strange new world of AI SEO, the coming LLM enshittification, and SLATE Auto – the just-announced $20k modular EV truck.

Matched from the episode's transcript 👇

Adam Stacoviak: But how do you research the things you want to buy or consume or enjoy in the world? And I really feel like the place I go to learn… I’m more conversationally asking questions to this thing, versus just throwing in keywords into Google and hoping I get a web page that may help me out. I feel like the internet is dramatically changing as we speak insofar as how we find information, and I wonder how that will impact publishing of information. Because if you don’t go to the website anymore to get the info and the LLM just consumes it… In a case like Resend you don’t really care, because you’re just trying to get them to become a customer, and enjoy your product. But in the case of something else, you may really want them to come to your website, because that’s the value to your brand. It’s a captured consumer, whether they’re a curious person, an advocate, a customer, you name it. I just wonder how this is going to change things.

Changelog Interviews #637

Making DNSimple

Anthony Eden, Founder & CEO of DNSimple, joins the show to talk about the world of managed hosting for DNS and more.

Matched from the episode's transcript 👇

Anthony Eden: So yeah. I mean, I know how we do it. I’m not going to say we do it exceptionally well, but we do it well enough to stay in business and continue that slow growth. But the competition is fierce, on both sides; on the operational side and on the domain side. But especially on the domain registration side. If you’re into looking at keyword prices and things like that on advertising, go look at some of the keywords around domain registration. They’re some of the most expensive keywords in the world, because the competition is just fierce. And those companies are buying Super-Bowl ads; they’re competing at that level. We can’t compete at that level, so we have to instead compete at a level that’s more appropriate for us.

Changelog & Friends #77

Fallthrough & Friends

Kris Brandow & Matthew Sanabria from Fallthrough.fm join Jerod to discuss tools we’re switching to, whether or not Go is still a great systems programming language choice, user-centric documentation, the need for archivists & more.

Matched from the episode's transcript 👇

Matthew Sanabria: It reminds me of like your favorite detective series or whatever, where they have to go into the evidence room and find evidence, and it’s tagged very well, the case files are there, you can find them, they’re dated, there’s keywords, there’s identifiers… And it’s like, that’s pretty decent, actually. You can find what you need, relevant to the things you’re working on. And like what Kris was saying, we’ve digitized a lot of this stuff and we kind of just forgot to add that metadata. We just kind of left that behind. And it’s actually more important than ever because we’re generating so much digital content that now we have the problem of finding it. It’s like, why did we do this to ourselves?

Changelog & Friends #76

Other people's robots

Jerod & Adam discuss Nvidia’s recently announced personal AI supercomputer, Waymo’s latest infinite loop, what’s involved in getting a “modern” terminal setup, and whether or not AI has gone mainstream… warts & all!

Matched from the episode's transcript 👇

Adam Stacoviak: I’m pretty sure it’s Friday, but you can throw some keywords in there. Friday, recent update…

JS Party #344

Kind of a big deal

Jerod & the gang play “Twenty” Questions to get to know Amy, review the big Svelte 5 release, discuss commercial open source & get Nick’s report from SquiggleConf!

Matched from the episode's transcript 👇

Kevin Ball: So I have not used it enough to have a strong opinion. I do think – there are a couple of things I like about it, which is, one, they continue to sort of lean into this idea of “Hey, we control the compilation stack, so that means that we can extend the language in ways that are beneficial for developer ergonomics.” So I think that is nice.

They are moving – I like the fact they’re moving to this more granular reactivity. It feels like everybody’s moving to signals as the way to do that. That’s just sort of become the primitive that people are saying “Oh, this is the way to get high-performance reactivity. We’re going to go to signals.” So they’re clearly adopting that.

[00:32:06.04] I’m a little torn on the “We’re building it into the language and we’re going to keep extending the set of keywords that are magical.” I feel like that overall is not my favorite trend. I think it’s nice to have a small language surface area and then be extending things, but if you are having the compiler do magical things in some ways, you’re already doing that anyway, so you may as well just kind of be explicit about it and say “Hey, okay, these things - they exist. They’re magical. Go.”

Practical AI #292

Big data is dead, analytics is alive

We are on the other side of “big data” hype, but what is the future of analytics and how does AI fit in? Till and Adithya from MotherDuck join us to discuss why DuckDB is taking the analytics and AI world by storm. We dive into what makes DuckDB, a free, in-process SQL OLAP database management system, unique including its ability to execute lightning-fast analytics queries against a variety of data sources, even on your laptop! Along the way we dig into the intersections with AI, such as text-to-sql, vector search, and AI-driven SQL query correction.

Matched from the episode's transcript 👇

Adithya Krishnan: Yeah, that’s one of the exciting aspects of DuckDB as well. So if I could take a step back and think about other ecosystems where let’s say Postgres has been shining a lot… Postgres has exploded into the kind of possibilities that you can do because it has an amazing extension mechanism, where you could add extensions and capabilities of Postgres. And in a similar way, DuckDB has an extension mechanism that you have access to the internal workings of DuckDB, and you could add more workflows on top of what DuckDB can do.

DuckDB has these capabilities of doing vector search, for example, and it also has hybrid search, where you also have full-text search, and vector search that you could put together to create hybrid search. One of the ways it does is that it has a really nice data type. I can go into the rabbit hole of the inner workings of how they make this happen, which is also pretty exciting… But one of the things that they make this possible is to provide an array data type where you can have an array of floating points, and then you can store this as a data type, and then that eventually becomes an embedding vector that you can do cosine similarity against.

So that is to do an embedding-based search. Then you can also have full-text search, where you can create an inverted index of keywords to your documents, and you can search across your keywords to find your ideal documents and rank them according to the score. And then you could fuse both of these scores from embedding search and from full-text search to have like a hybrid search. So yeah, so all of these are possible, and they’re very accessible.
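
The score fusion Adithya describes can be sketched in plain Python. This is not DuckDB's API; the document layout, the `hybrid_search` helper, and the `alpha` weight are all assumptions made for illustration. The keyword score is simply the fraction of query terms found through the inverted index, and the embedding score is cosine similarity over arrays of floats.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in doc["text"].lower().split():
            index[term].add(doc_id)
    return index


def hybrid_search(docs, index, query_terms, query_vector, alpha=0.5):
    """Fuse a keyword score and an embedding score with a weighted sum."""
    results = []
    for doc_id, doc in docs.items():
        matched = sum(1 for term in query_terms if doc_id in index.get(term, ()))
        keyword_score = matched / len(query_terms) if query_terms else 0.0
        vector_score = cosine_similarity(query_vector, doc["embedding"])
        results.append((doc_id, alpha * keyword_score + (1 - alpha) * vector_score))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical documents with made-up two-dimensional embeddings.
docs = {
    "d1": {"text": "duck database analytics", "embedding": [1.0, 0.0]},
    "d2": {"text": "vector search embeddings", "embedding": [0.0, 1.0]},
    "d3": {"text": "hybrid vector search in a duck database", "embedding": [0.6, 0.8]},
}
index = build_inverted_index(docs)
ranked = hybrid_search(docs, index, ["vector", "search"], [0.0, 1.0])
print(ranked)
```

A weighted sum is only one way to fuse the two scores; the right weighting depends on how the two score ranges compare in practice.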

Break: [00:30:44.26]

Practical AI #288

GraphRAG (beyond the hype)

Seems like we are hearing a lot about GraphRAG these days, but there are lots of questions: what is it, is it hype, what is practical? One of our all time favorite podcast friends, Prashanth Rao, joins us to dig into this topic beyond the hype. Prashanth gives us a bit of background and practical use cases for GraphRAG and graph data.

Matched from the episode's transcript 👇

Prashanth Rao: [00:24:25.23] Absolutely. So let’s understand what we were doing with RAG, and then go into graph RAG. So the early approach to doing RAG is – we call it naive RAG now. And in that approach, you just create chunks of your data, you embed that using an embedding model, and you store them in a vector database. So essentially, you just store the chunks and the chunk embeddings in a vector database, and when you do a retrieval, you convert your query into an embedding, using the same embedding model that you used to embed the data. And this returns the most similar chunks, that are similar to the query vector.

So you typically return like the top K, let’s say top 5 or top 10, whatever number you choose. And these top K chunks can then be sent to the LLM as context to synthesize a response in natural language. So in a nutshell, that’s kind of what you could say traditional RAG does.
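
The chunk, embed, store, and retrieve steps of naive RAG can be sketched in a few lines of Python. Everything here is a stand-in: `toy_embed` is a hashed bag-of-words, not a real embedding model, the "vector database" is just a list, and the chunk size is an arbitrary assumption.

```python
import math
import zlib

DIM = 64  # Arbitrary embedding width for the toy model.


def toy_embed(text):
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def chunk(text, size=8):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_store(text):
    """Embed every chunk and keep (chunk, embedding) pairs in a list."""
    return [(piece, toy_embed(piece)) for piece in chunk(text)]


def retrieve(store, query, k=2):
    """Embed the query with the same model and return the top-k chunks."""
    query_vec = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(query_vec, emb)), piece) for piece, emb in store]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [piece for _, piece in scored[:k]]


text = (
    "rust is a systems language focused on safety "
    "duckdb is an analytics database that runs locally"
)
store = build_store(text)
top_chunks = retrieve(store, "analytics database", k=1)
print(top_chunks)
```

In a real pipeline the returned chunks would then be pasted into the prompt as context for the LLM.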

Now on paper, this naive approach to doing RAG is great. But it quickly became obvious that this has limitations. The first limitation is that the dense embeddings are typically done at the sentence level. And many user queries use keywords. And keyword-based search methods like BM25 can do a fair job at this and they’ve been around for a long time.
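
For readers who have not seen BM25 written out, here is a minimal sketch of the standard Okapi BM25 scoring formula in Python. The whitespace tokenizer, the `k1` and `b` defaults, and the sample corpus are assumptions for illustration; production search engines add stemming, stop words, and index structures on top.

```python
import math
from collections import Counter


def bm25_scores(corpus, query, k1=1.5, b=0.75):
    """Score every document in a non-empty corpus against the query with BM25."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Long documents are penalized relative to the average length.
            denom = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


corpus = [
    "error code E1234 in payment service",
    "the service restarted after an update",
    "payment failed with a timeout",
]
print(bm25_scores(corpus, "E1234"))
```

An exact identifier like the made-up `E1234` is the kind of keyword query where this sort of scoring tends to do well, because only documents containing the literal term get a non-zero score.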

So towards the end of last year, you could see a lot of these vector database vendors starting to offer a combination of hybrid search methods, and the term hybrid search itself becoming more popular, where you perform both keyword-based search, which is a form of sparse vector search, with dense vector search, which is a search via dense embeddings. And you pass the retrieved chunks from either of these approaches to a re-ranker module. So you had specialized modules that do re-ranking, that give you the most relevant chunks from either of these retrievals. And this is how you combine the sparse and dense vectors into what you call a hybrid search.
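
As a rough illustration of merging the two result lists, the sketch below uses reciprocal rank fusion. Note that this is a simple rank-merging heuristic swapped in for the specialized re-ranker modules mentioned above, which are typically learned models; the chunk ids and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk ids into one list.

    Each chunk earns 1 / (k + rank) from every list it appears in, so chunks
    ranked well by both the sparse and the dense retriever rise to the top.
    """
    fused = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            fused[chunk_id] = fused.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


sparse_hits = ["c1", "c2", "c3"]  # e.g. from a keyword-based search
dense_hits = ["c3", "c1", "c4"]   # e.g. from a dense embedding search
print(reciprocal_rank_fusion([sparse_hits, dense_hits]))
# prints ['c1', 'c3', 'c2', 'c4']
```

The appeal of a rank-based merge is that it never compares raw scores, so the sparse and dense retrievers do not need to be calibrated against each other.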

Now, even hybrid search can have its limitations, which is, I guess, where people began exploring further options earlier this year and maybe beyond. Because neither sparse nor dense embeddings can capture explicit relationships between entities very well. And I’ll demonstrate this with an example. In certain cases, you can really benefit by modeling some of these entities explicitly.

So let’s look at an example of a professor and let’s say the PhD students the professor is advising. So let’s say you had a block of text, which is talking about the students and the professor and a bunch of other things related to their work in the university. So in natural language, we understand the relationship between the professor and the student as follows. Student X worked with Professor Y, because we know that the act of being a student of a professor means that you worked with them. But in the text itself, you may not have expressed it that way. The text may be written as so and so, person X, was a student of person Y. Now, if you try to search this using the query “Who did X work with?”, this is a very intuitive question in natural language. We humans immediately can put two and two together, that “work with” and “student relationship” are more or less semantically similar here… So we are able to piece together this information and know that a person was a student of someone, and inherently they worked with that person. However, if you try to search for this using vector search, the dense embedding may not capture the relationship correctly, where “student of” isn’t close enough to “work with” in the vector space.

So your vector search alone may not retrieve this answer, because you didn’t model the relationships in that explicit way. However, if you had chosen to model this as a graph, you would explicitly capture this relationship using this concept of a triple, which is “person X worked with person Y.”

[00:28:03.21] So this is where triples come in. A triple essentially is two nodes that are connected via a relationship. You have a source and a target, and the person X is a source, person Y is a target, and the “worked with” is what represents the relationship.

So the very powerful idea here is that where graphs come into this whole picture and why it’s relevant to RAG is that you can actually provide additional valuable context to an LLM by modeling these relationships explicitly, and simultaneously retrieving, both from a dense embedding vector search, as well as a graph traversal. And then using the retrievals in combination with one another to provide additional context to the generation LLM, so that you can actually include this explicit relationship in your answer. And this actually has been proven in practice from some work that’s been done recently.
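
The triple idea and the combined context can be sketched as follows. The `TripleStore` class, the `build_context` helper, and the prompt layout are hypothetical; a real system would use a graph database and an actual vector search rather than hard-coded inputs.

```python
from collections import defaultdict


class TripleStore:
    """Stores (source, relationship, target) triples, indexed by source node."""

    def __init__(self):
        self._by_source = defaultdict(list)

    def add(self, source, relationship, target):
        self._by_source[source].append((relationship, target))

    def facts_about(self, source):
        """Return every outgoing relationship of a node as a readable sentence."""
        return [f"{source} {rel} {tgt}" for rel, tgt in self._by_source.get(source, [])]


def build_context(vector_chunks, graph_facts):
    """Combine passages from vector search with explicit graph relationships."""
    lines = ["Retrieved passages:"] + [f"- {chunk}" for chunk in vector_chunks]
    lines += ["Known relationships:"] + [f"- {fact}" for fact in graph_facts]
    return "\n".join(lines)


graph = TripleStore()
graph.add("person X", "worked with", "person Y")

context = build_context(
    ["Person X was a student of person Y."],  # pretend this came from vector search
    graph.facts_about("person X"),            # graph traversal from the source node
)
print(context)
```

With the relationship stated explicitly in the context, the question “Who did X work with?” no longer depends on the embedding placing “student of” near “work with”.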

Changelog Interviews #608

Building customizable ergonomic keyboards

Erez Zukerman shares the story of launching the ErgoDox EZ on Indiegogo (May 2015), what it takes to create customizable ergonomic keyboards, the benefits of split keyboards and custom key layouts, repairability and longevity, community engagement, and the attention to detail required in everything they create. We talk through their keyboard lineup, our personal experience with how we mouse and keyboard…we cover it all.

Matched from the episode's transcript 👇

Erez Zukerman: No, no. The normal keywords are –

Ship It! #119

The diagram IS the code

What if your infrastructure diagram was responsible for the actual infrastructure?! John Watson & Scott Prutton from System Initiative join Justin & Autumn to discuss.

Matched from the episode's transcript 👇

Autumn Nash: I feel like certain keywords, and stuff… I’m just like, if you’re hurting little kids, or like doing horrible things, it should just be fair game.

Changelog & Friends #53

There’s a TUI for that

Nick Janetakis is back and this time we’re talking about TUIs (text-based user interfaces) — some we’ve tried and some we plan to try. All are collected from Justin Garrison’s Awesome TUIs repo on GitHub. This episode is “AI free.”

Matched from the episode's transcript 👇

Adam Stacoviak: No, that’s not selling out. That’s leveraging your channel, bro. You’re missing out. There’s good content waiting for you to build your next machine. All the choices you’ll make as a Bash scripter, a Vim master, a Docker dude, whatever you want to call yourself. I’m just looking at your – I’m looking at all your keywords on your YouTube channel… You’re missing out, man. Build yourself a new machine, use this promotion, get some friends, get some network… Boom.

Go Time #323

Aha moments reading Go's source: Part 1

Jesús Espino from Mattermost tells Natalie all about (the first six of) his 10 “aha moments” he had reading the Go source code. Part 2 (with the rest of his aha moments) coming soon!

Matched from the episode's transcript 👇

Natalie Pistunovich: [00:28:03.07] And you say that what you liked more than that, this is like defining, scoping it for this file, it’s kind of like scoping it for Go in general, that this is your entire toolbox, and there will be no surprises. It’s not keywords, but a toolbox, really.

Practical AI #277

Vectoring in on Pinecone

Daniel & Chris explore the advantages of vector databases with Roie Schwaber-Cohen of Pinecone. Roie starts with a very lucid explanation of why you need a vector database in your machine learning pipeline, and then goes on to discuss Pinecone’s vector database, designed to facilitate efficient storage, retrieval, and management of vector data.

Matched from the episode's transcript 👇

Daniel Whitenack: Just to draw that out a little bit more… So from your perspective, what would be – if you were to kind of explain to someone “Hey, here I’ve got one piece of text, and I’m wanting to match to some close piece of text in this vector space”, what might be advantageous about using this vector-based search approach and these embeddings, in terms of what they mean, and what they represent, versus doing like a… You know, TF-IDF has been around for a long time; I can search based on keywords, I can do a full-text search… There’s lots of ways to search text. That concept isn’t new. But this vector search seems to be powerful in a certain way. From your perspective, how would you describe that?

Changelog Interviews #597

MAJOR.SEMVER.PATCH

Predrag Gruevski and Chris Krycho joined the show to talk about SemVer. We explore the challenges and the advantages of semantic versioning (aka SemVer), the need for improving the tooling around SemVer, where semantic versioning really shines and where it’s needed, Types and SemVer, whether or not there’s a better way, and why it’s not as simple as just opting out.

Matched from the episode's transcript 👇

Chris Krycho: I think the answer is yes. And one of the reasons that I’m more bullish on sticking with SemVer and putting tooling around it is because I did survey the rest of the world as it were when it comes to versioning, and there are a lot of approaches that just say “Ah, these problems with SemVer are fundamental. Scrap it.” One of them, SoloVer, is just have one version number - it’s 1, 2, 3, etc, just go up. And that has a certain appeal to it, but the actual fundamental issue there hasn’t changed. All it does is take the burden off of the maintainer of a library, and put it on all of the users. It says “Okay, now you’re responsible anytime any one of your dependencies changes, including transitively, anywhere in your dependency tree. So go read the release notes”, which tend to encode things like breaking changes in the release notes. Because again, communication problem, right? We want to know “What did this do?” Also, as an aside, all of those proposals include things like “Well, you can also stick like pre-release numbers on the end.” And I’m like “Hold on, hold on… It seems kind of like we’re backing our way back toward this whole SemVer thing now, aren’t we? Shouldn’t your pre-release just be another number?”

I think there is a sense in which there is a maybe fundamental local maximum. Maybe it’s local, but the hill’s so big that we’re not going to find a different path. I could be wrong about that. But when I go looking around, the things that seem like they might change the calculus here don’t so much eliminate the value of SemVer as they do build on it. So a good example here is what the Unison programming language does. Pretty small language, but it is aimed at industry. It’s not pure research. And they do something that’s really wacky, in the best way. You don’t store your code as plain text. Instead, they take advantage of the fact that they’re a pure functional programming language, with really well specified semantics, and they say “Okay, we can take your code, normalize it, hash it, and store the compiled output of it with a pointer to it”, which means a whole bunch of interesting things… But for the purposes of versioning means when I make a breaking change, the original version is still there, because that hashed, compiled version of it got committed to a database instead of to plain text. And that database version is what anybody who depends on it sees.

[00:34:06.12] So when I add a new parameter to my function, the consumers are still pointing to the old function, which means they can pull this update and say “Okay, I can progressively switch over to the new function signature, but I can do that at will, and the two can live next to each other”, and because it is a pure functional programming language with no side effects that aren’t managed off in the runtime, etc, etc. You know, leave all that aside. Suffice it to say because of that choice, they can just “ship a breaking change” without ever breaking anyone.
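
The normalize, hash, and store idea can be approximated in Python to show why old consumers keep working. This is only a loose analogy and not how Unison is implemented: the `CodeStore` class is hypothetical, and the normalization here just parses the source so that formatting and comments do not affect the hash.

```python
import ast
import hashlib


class CodeStore:
    """Toy content-addressed store: definitions are keyed by a hash of their AST."""

    def __init__(self):
        self.definitions = {}  # hash -> source text, never overwritten
        self.names = {}        # human-readable name -> latest hash

    def publish(self, name, source):
        # ast.dump omits line numbers by default, so whitespace and comments
        # do not change the digest.
        normalized = ast.dump(ast.parse(source))
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        self.definitions.setdefault(digest, source)
        self.names[name] = digest
        return digest


store = CodeStore()
old_hash = store.publish("greet", "def greet(name):\n    return 'hi ' + name\n")

# A consumer pins the hash it was built against, not the name.
consumer_dependency = old_hash

# The author adds a parameter, which would normally be a breaking change.
new_hash = store.publish(
    "greet",
    "def greet(name, punctuation):\n    return 'hi ' + name + punctuation\n",
)

print(consumer_dependency in store.definitions)  # the old definition is still there
print(store.names["greet"] == new_hash)          # the name now points at the new one
```

Since the consumer refers to a hash, it can switch to the new signature whenever it chooses, and both definitions live side by side in the store.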

The reason you still want SemVer here though is because SemVer is a communication tool. And so SemVer lets you say, “Okay, there are these new features in the library. Here’s a bug fix. You’re going to want this one.” And even though that means you need to actually go update which compiled version of this function you’re pointing to, you’re getting data from that, and when you go to publish your library, you want to be able to use that information. Even knowing that it’s not going to break your users in the same way, it does let you then say “Oh, I didn’t actually mean to make a breaking change here. I wanted this to be compatible and to just keep working forward.”
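
As a small illustration of SemVer as a communication tool, the sketch below reads what a version bump is telling the consumer. The `classify_bump` helper is hypothetical and only handles plain MAJOR.MINOR.PATCH strings; pre-release tags and build metadata are deliberately left out.

```python
def parse(version):
    """Split a plain MAJOR.MINOR.PATCH string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_bump(old, new):
    """Describe what the change from `old` to `new` communicates."""
    old_version, new_version = parse(old), parse(new)
    if new_version <= old_version:
        raise ValueError("new version must be greater than old version")
    if new_version[0] != old_version[0]:
        return "major: breaking change"
    if new_version[1] != old_version[1]:
        return "minor: new backwards-compatible features"
    return "patch: backwards-compatible bug fix"


print(classify_bump("1.4.2", "1.4.3"))  # prints "patch: backwards-compatible bug fix"
print(classify_bump("1.4.2", "1.5.0"))  # prints "minor: new backwards-compatible features"
print(classify_bump("1.4.2", "2.0.0"))  # prints "major: breaking change"
```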

So things like that, I think, are pointers in the right direction. There’s also a couple of papers out there from folks at the Nova University of Lisbon, who are asking “What happens if you bake versions as types into Java?” Java because it’s the kind of default language to do this kind of research on. Their proposal is very interesting from a type theoretic and versioning perspective, and would never get adopted in industry in a million years, because it’s just way too much boilerplate… But it does the same thing we’re talking about; it bakes this notion of backwards compatibility in, in a way that I think if you were going to actually ship something like that in an industrial programming language, you would actually want SemVer as basically how you do it. And their type system that they slap on top of Java effectively encodes SemVer with keywords. It’s upgrades, and replaces, and things like that.

So I think there’s work to be done here, but I don’t think it’s going to be in the near term, for one. So we’re going to need the tooling. And even if and when we see something like that type system on top of Java, or what Unison is doing, becoming more widespread, I think those kinds of things lower the risks in really interesting and important ways… But they would still really benefit from the kinds of tooling that we’re talking about. They also though highlight, I think, one of the things that’s easy to miss in these kinds of discussions, which is a lot of times people like me, who are type theory nerds, etc. like to go looking for that kind of a solution to a problem. It has two limitations. One is that’s never going to work for Ruby. I say “never”, but you could imagine a world in which type adoption for Ruby is at 100%, but that world seems very unlikely to me… Not least because a lot of people who love Ruby love it because it’s dynamically typed.

And second, doing all of those things purely at that, like “Let’s bake it into the type system level” has costs, because it turns out that itself then becomes a thing that you need to think about in terms of the versioning of your language. Because one of the things that shows up is that the more foundational, whatever your tool is - like, if you’re an app, and you just have consumers, it’s not that big of a deal. Your versioning is basically purely marketing. If you’re a library, you have a bunch of apps that use you and maybe some other libraries. If you’re a framework that everybody else builds on, how well you do this now affects everybody else in the entire ecosystem. If you’re a programming language, you’re kind of doing it at the maximum level, and you still have to communicate those versioning constraints to other people. And the more complicated your type system is, the harder it is to actually understand what the implications are for versioning.

[00:38:09.28] So the tendency that people like me have, to say “Ah, bake it into the types, and it’ll be rigorous and checked forever” can actually undermine your net goals here, because now you’ve made it harder to think about this fundamental communication problem.

Practical AI #275

Apple Intelligence & Advanced RAG

Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.

Matched from the episode's transcript 👇

Daniel Whitenack: Those might both factor in. There’s another kind of hybrid or two-level type of search that happens, and this is implemented in several different vector databases, even natively now, because it can be quite useful… It’s actually doing two levels of searching: first a traditional full-text search or keyword search, and then a vector comparison, rather than just relying on the vector comparison. So you kind of home in on the full-text keywords first, and then do a vector comparison.

[00:40:38.00] And you could even ensemble these in various ways, and use one for re-ranking or ordering versus the other one… There’s a variety of ways to implement this. But this would be kind of generally categorized as hybrid ways of searching, I think is most frequently the term. So there’s the context enrichment, there’s the hierarchical search or index retrieval - that’s the kind of summary, then chunk - and then there’s the hybrid search, which would be actually using two different search methodologies.

And notice all of this has to do with the retrieval part for the most part that we’re talking about here, not mostly the LLM side… Although you could use an LLM to generate the summaries for the hierarchical approach. So it’s interesting that those TF-IDF, keyword searching, full-text search sort of things are coming up again… So back to our original way we started this episode, the data science pieces still survive, in many ways.
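The two-level search described above can be sketched roughly as follows. This is a minimal illustration, not how any particular vector database implements it: the `DOCS` list, the `hybrid_search` function, and the toy bag-of-words `embed` function are all assumptions made for the example, and a real system would use a proper full-text index and a learned embedding model.

```python
import math
from collections import Counter

# Hypothetical corpus, for illustration only.
DOCS = [
    "Rust ownership and borrowing explained",
    "Keyword search with an inverted index",
    "Vector search compares embeddings",
    "Hybrid search combines keyword search and vector search",
]

def tokenize(text):
    return text.lower().split()

def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(tokenize(text))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hybrid_search(query, docs, top_k=2):
    # Stage 1: keyword filter. Keep only documents sharing a query term.
    query_terms = set(tokenize(query))
    candidates = [d for d in docs if query_terms & set(tokenize(d))]
    # Stage 2: vector comparison. Rank the survivors by similarity.
    query_vec = embed(query)
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:top_k]

if __name__ == "__main__":
    for doc in hybrid_search("hybrid search", DOCS):
        print(doc)
```

The keyword stage here acts as a hard filter, which is only one of the options mentioned; the two scores could instead be combined, or one could be used purely for re-ranking.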

Go Time #320

Gophers Say! GopherCon EU Berlin 2024

Our award ~~winning~~ worthy survey game show is back, this time Mat Ryer hosts it live on stage at GopherCon EU Berlin 2024! Join in & play along as we see which team can better guess what these GopherCon gophers had to say!

Matched from the episode's transcript 👇

Mat Ryer: Well, actually, we aren’t necessarily, because it’s basically whatever people said. So they don’t have to answer with keywords at all.

Changelog & Friends #49

Where DOESN’T curl run

Daniel Stenberg shares his guiding principles for BDFL’ing curl, gives us his perspective on the state of the internet, talks financial independence, ensuring curl won’t be the next XZ & more!

Matched from the episode's transcript 👇

Adam Stacoviak: Absolutely. And you could use an LLM to generate those, too. Like “Hey, I have a business in XYZ sector. Can you give me keywords?”, and it will give you keywords all day long.

Ship It! #109

How to build a Nushell

Devyn Cairns & Jakub Žádník join Justin & Autumn to talk about building a new kind of cross-platform shell that provides easy extensions with traditional command compatibility. That’s no easy feat!

Matched from the episode's transcript 👇

Jakub Žádník: Yeah. Each command has search terms, so like related keywords that are searched by the help. So even if it’s not the command name directly, if it’s a related term, it will show up.
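As a rough sketch of that idea (not Nushell's actual implementation), a help lookup could match a query against both the command name and its related keywords. The `Command` class, the `help_find` function, and the search terms listed here are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Command:
    name: str
    # Related keywords that should also lead a user to this command.
    search_terms: list[str] = field(default_factory=list)

# Hypothetical command table; the search terms are made up for this example.
COMMANDS = [
    Command("str join", ["concatenate", "collect"]),
    Command("ls", ["dir", "list"]),
    Command("http get", ["fetch", "curl", "download"]),
]

def help_find(query, commands):
    # Return names of commands whose name or search terms contain the query.
    q = query.lower()
    return [
        c.name
        for c in commands
        if q in c.name.lower() or any(q in term.lower() for term in c.search_terms)
    ]

if __name__ == "__main__":
    print(help_find("curl", COMMANDS))
```

With this table, searching for "curl" surfaces `http get` even though the query never appears in the command name.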

Go Time #319

Is Go evolving in the wrong direction?

This week we’re catching up on the news! Kris is joined by Ian to discuss some of the recent news from around the Go community. Listen in to hear whether the co-hosts believe there’s software that shouldn’t be written in Go, their thoughts on if Go is evolving in the right direction & whether common nouns make good package names.

Matched from the episode's transcript 👇

Ian Lopshire: And there’s been so many times I’m like “I would like to continue here, please.” So I honestly do think it’s going to be simpler overall, just being able to use your normal semantics and keywords.

Changelog Interviews #594

Microsoft is all-in on AI: Part 2

Mark Russinovich, Eric Boyd & Neha Batra join us to discuss the state of AI for Microsoft and OpenAI at Microsoft Build 2024. It’s safe to say that Microsoft is all-in on AI.

Matched from the episode's transcript 👇

Adam Stacoviak: And the power of a good name, obviously, and the power of a good description is probably equal. Every time I come up with a podcast show summary, I’m always like “How do I do it?” And now we use Riverside. Not here in Seattle, but when we’re in our distributed studios, we use riverside.fm… And when we’re done with that, we can just hit “Summary notes”, and it summarizes the podcast, it gives us keywords that were in there, it helps with some chaptering information, like what are we talking about at each point… So even when we’re editing and doing chaptering, we can define that kind of stuff. That to me is like paramount for just not burning out.

Changelog Interviews #581

It's not always DNS

This week we’re talking about DNS with Paul Vixie — Paul is well known for his contributions to DNS and agrees with Adam on having a “love/hate relationship with DNS.” We discuss the limitations of current DNS technologies and the need for revisions to support future internet scale, the challenges in doing that. Paul shares insights on the future of the internet and how he’d reinvent DNS if given the opportunity. We even discuss the cultural idiom “It’s always DNS,” and the shift to using DNS resolvers like OpenDNS, Google’s 8.8.8.8 and Cloudflare’s 1.1.1.1. Buckle up, this is a good one.

Matched from the episode's transcript 👇

Paul Vixie: So any company who comes into the internet and says “Yeah, we want to deliver value”, it’s like, they’ll look around for opportunities. Well, what’s not working well today? Sometimes their solution will just be “Let’s relax a constraint”, and then it will be the company you go to. And a lot of people have come in with online services, for example, that used to be enterprise services. For example, if we think about Dropbox, or any of the file service companies - we all used to just pile on hard drives, and plug them into a lot of servers, and so forth. But it turns out, for a lot of what you need storage for, you don’t care where it is, and you don’t mind that you have to go across a wide area network to get to it, and you’re happy that they’re backing it up instead of you, and so forth. So there’s a lot of value to be created that takes the form of [unintelligible 01:04:45.00] or just simple disruption. And that’s not a bad thing. In fact, had we gone the other way, had TCP/IP not won the war, had we been on the OSI protocol suites as developed by the phone companies, none of that would be possible. We’d only be able to do the things that they wanted us to do, whereas the internet is designed to kind of let you try almost anything. It’s so called permissionless innovation, as we’ve been [unintelligible 01:05:12.24]

So one of the things that got done with DNS was done by OpenDNS. And that was to say “You know, people hate their ISP DNS service” or “They hate something about DNS, and so we’re going to create a global anycast DNS service, OpenDNS, so that anybody in the world can instantly stop using their own enterprise DNS, or their ISP DNS, or any other DNS, just use us, and we will be more reliable. We won’t data-mine their queries to figure out where they’re going, and send them ads…” So they actually did that for a while. “We won’t block things that – we’re not a nanny state, we’re not gonna say “No, you can’t reach this, because it might be harmful in some way”, although there’s always somebody out there being harmed in a lawsuit [unintelligible 01:06:10.21] And there are some costs there. But they just wanted to centralize something that used to be distributed. And it worked really well. But you know, they were growing a for-profit company, and they needed to figure out “Okay, we’re here, we have a lot of users. How do we monetize this thing?” And so they did end up – they did this strange thing, they intercepted queries for www.google.com, and instead of getting back the real address, which would be the Google web server, they gave back their address, of their website. And it did not falsely indicate that it was Google. It said “This is the OpenDNS search engine.” And then you would type something into the search bar, just the way we would anyway, and they didn’t have a search engine; they couldn’t answer it. All they would do is then forward that question onto Google, and then [unintelligible 01:07:08.03] the response back toward you. But it gave them an opportunity to associate your interests to keywords that denoted your interest with your IP address. And then they sold that data to advertisers, so that when you then later reached some web server, that web server could ask the question “Hey, this IP address. Tell me what they’re interested in.”

Now, you might be able to imagine that Google wasn’t super-happy about this, and they even went so far as to say “Hey, stop.” But the story is that people at OpenDNS said “You know, there’s no law that protects you in this way. We’re not breaking any law [unintelligible 01:07:51.25] getting back the wrong answer. And we’re certainly not costing Google any money, because you’re receiving every bit of query data that you would otherwise have received. So Google is still going to be able to make its old business plan work.”

[01:08:05.04] Somebody at Google probably said “Yeah, but we didn’t want you to get free access to the thing that we monetize. So we don’t want you to be an intermediary here.” But OpenDNS was resolute; they were not going to stop. And that, in my opinion, is why we have 8.8.8.8 today. The only way Google could prevent OpenDNS from continuing to intermediate itself between Google and its search customers was to build a bigger, more popular system. Once they did it, it was inevitable that we have 9.9, and 1.1… You know, if you think about it, in the IP version 4 protocol there are 256 possible values in that first octet, so there are maybe 250 more companies who are going to get out there and try to get 11.11, and 12.12, and all the rest… Because if you can put yourself in the middle of DNS queries, then you can learn a lot. And then you can take that learning, and even if you’re totally privacy-respecting - which according to their stated privacy policy, Google is, and I have no reason to doubt it - you can still learn a lot that is not privacy-violating. And so why wouldn’t everybody and his brother try to create a system that would cause millions of people to send them their most vital information, which is what they’re working on and what they’re interested in.

Okay, so let’s fast-forward… You’re asking “Why is it always DNS?”

JS Party #314

Take a look, it's in a book

Nick delves into the intricacies of technical book writing with authors Adrienne Braganza Tacke and Dylan Hildenbrand. We talk about the process of working with a publisher, coming up with an outline, actually writing the book, and everything that comes after the book is finished.

Matched from the episode's transcript 👇

Dylan Hildenbrand: So I’m a Vim user, and it was incredibly painful for me to not be using Vim. With Packt they had a SharePoint instance setup, where I could then upload Word documents. They also had an Office365 in there, so I could write in the cloud. Part of the contract that I’d signed though stipulated that should either of us part ways with the other, any chapters written up to that point would then become my property. And so I wanted to make sure that I had local copies of everything, should anything happen. Not that I was saying things were going to, but if for whatever reason one of us had to back out of the deal, I still wanted to take the work that I had and be able to publish later at my leisure.

I’m a Linux fanboy, and so I used Libre Office, and I was also given style guides that I needed to follow… The style guides are there to – I was told that I can ignore them if I wanted, but I think it made a lot of the editing process a lot easier, for both myself and the editor, because things were… You know, keywords are bolded, that can later be looked up in the appendix; chapter titles are highlighted appropriately, and subsections are also laid out in a way that I think that they will look best for the reader. And when it comes to code, I didn’t want the code to look hacky, right? I wanted the code to be nicely indented, and easily readable, because one of my frustrations, like I said earlier, was reading other people’s code can be painful. So I took a lot of care and effort to make sure that the code was properly formatted and followed the style guidelines. But ideally, it would have had some sort of Vim markdown with a git repository setup. So I think if I ever write another book in the future, that’s something that I’m going to push hard for before I sign any contracts. [laughs]

Practical AI #258

Representation Engineering (Activation Hacking)

Recently, we briefly mentioned the concept of “Activation Hacking” in the episode with Karan from Nous Research. In this fully connected episode, Chris and Daniel dive into the details of this model control mechanism, also called “representation engineering”. Of course, they also take time to discuss the new Sora model from OpenAI.

Matched from the episode's transcript 👇

Daniel Whitenack: Yeah. And really cool – of course, there were some major categories of interest, in doing hardware things with robots, and other stuff… But of course, one of the main areas of interest was AI, which was interesting to see… And in the track that I was a judge and mentor in, one of the cool projects that won that track was called Meshworks. So what they did - and this was all news to me; well, some of this I learned from the brilliant students… But they said they were doing something with LoRa. And I was like “Oh, LoRa…” That’s the fine-tuning methodology for large language models. I was like “Yeah, that figures… People are probably using LoRa.” But I didn’t realize – and then they came up to the table, and they had these little hardware devices; then it clicked that something else was going on, and they explained to me they were using LoRa, which stands for long range… It’s these sets of radio devices that communicate on these unregulated frequency bands, and can communicate in a mesh network. So like you put out these devices, and they communicate in a mesh network, and can communicate over long distances for very, very low power. And so they created a project that was disaster relief-focused, where you would drop these in the field, and there was a kind of command and control central zone, and they would communicate back, transcribe the audio commands from the people in the field, and would say “Oh, I’ve got an injury out here. It’s a broken leg. I need help”, whatever. Or “Meds over here. This is going on over here.” And then they had an LLM at the command and control center parsing that text that was transcribed, and actually creating, like tagging certain keywords, or events, or actions, and creating this nice command control interface, which was awesome. They even had mapping stuff going on, with computer vision trying to detect where a flood zone was, or there was damage in satellite images… So it was just really awesome. So all of that over a couple day period. It was incredible.

Changelog Interviews #576

In the beginning (of generative AI)

This week on The Changelog we’re talking with Joe Reis about data engineering and the beginning of generative AI. We discuss phone hacking via frequency, the role of a data engineer, this AI hype cycle we’re in, build vs buy, the disconnect between data analysts and the business, ethical considerations around AI-generated content, and more. We also discuss the tension between AI and traditional engineering, as well as the inevitability of AI integration into pretty much everything.

Matched from the episode's transcript 👇

Jerod Santo: We didn’t know exactly what to call it… I think we settled on that. And then we did the comma-separated list of keywords behind it. So it was called Practical AI, colon…