Amal Hussein (Engineering Manager at npm) joined the show to talk about AST’s — aka, abstract syntax trees. Amal is giving a talk at All Things Open on the subject so we asked her to give us an early preview. She’s on a mission to democratize the knowledge and usage of AST’s to push legacy code and the web forward.
Featuring
Sponsors
All Things Open – Exploring open source, open tech, and the open web in the enterprise. Raleigh, NC — October 13-15, 2019
Linode – Our cloud server of choice. Deploy a fast, efficient, native SSD cloud server for only $5/month. Get 4 months free using the code changelog2019. Start your server - head to linode.com/changelog
GitPrime – GitPrime helps software teams accelerate their velocity and release products faster by turning historical git data into easy to understand insights and reports. Ship faster because you know more. Not because you’re rushing. Learn more at gitprime.com/changelog.
TeamCity by JetBrains – Build and release your software faster with TeamCity — a self-hosted continuous integration and delivery server developed by JetBrains. TeamCity is super-smart at running incremental builds, reusing artifacts, and building only what needs to be built, which can save over 30% of the daily build time. Learn more at teamcity.com/changelog.
Notes & Links
We’ll be at All Things Open! We’re hosting a LIVE JS Party on stage and Jerod Santo is giving a talk on using Svelte for a radical new approach to building user interfaces. And as a special thanks from the team behind All Things Open, we’re giving away 5 free passes to the conference. All you have to do is tweet “I want a free pass to All Things Open because…” and state your reason and copy @Changelog and @AllThingsOpen in the tweet.
We’ll DM the winners next Friday, September 27th — good luck!
For those who don’t want to wait and just want 20% off your ticket right now: use the code changelog20 when you buy your tickets. This code has UNLIMITED uses, so tell your friends! Head to allthingsopen.com to learn more and register.
- Amal’s talk at All Things Open — Machine powered refactoring: leverage AST’s to push your legacy code (& the web) forward
- AllThingsOpen.org
- AST Explorer
- StranglerFigApplication
- Abstract syntax tree on Wikipedia
- The Web Platform podcast
- Jscodeshift
Transcript
Play the audio to listen along while you enjoy the transcript. 🎧
Amal, thanks for joining us. First of all, congratulations for your first week as engineering manager at npm. It’s bittersweet. Tell us, what’s new here?
Thanks so much, Jerod and Adam. Hi everyone, my name is Amal Hussein. I am a new engineering manager at npm; it’s my first week. I came to npm via Bocoup, where I was an Open Web engineer, working on some pretty awesome stuff in terms of web conformance suite testing, with browser interoperability, as well as working most recently on GameBender, which is a Scratch-based game console which uses computer vision and all this other cool stuff, all Open Web APIs, to teach kids how to code creatively.
So that’s what you were doing at Bocoup, or that’s what you’re doing now at npm?
That’s what I was doing at Bocoup. I was doing a lot of work around products, I would say product engineering, and it became very clear to me that I needed to boss up a little bit… Because I was just really strong at managing up, sideways, down for a pretty large project; I was a tech lead for that project. I just am stepping into my love of product by being an engineering manager, which combines for me the best of both worlds… You’re able to be hands-on with the team and drive technical strategy, and you’re also able to work with all of the stakeholders that are involved in the software delivery process. It’s something that I really enjoy doing. I’ve kind of consistently been the go-to person at every team, at every company, for a variety of things…
[04:11] It was a really difficult decision to make, if I’m honest. It was very, very difficult. I identify as a woman, and as a person of color, and for me to walk away from the full-time responsibilities of delivering software, just that aspect, it was a very difficult decision. But I realize that there’s even less of me in engineering leadership, and so… You know, that’s where I think I get some kind of solace. I’m giving folks an opportunity to have a woman of color as a manager, which is a very rare thing for most people in our industry.
Well, that’s awesome. I will say congratulations, and good luck, because you’re just getting started; I hope you have a lot of success there. Boss up - it was time to boss up, I like that.
Thank you.
I like that, too.
It was time to boss up, and own my bossiness too, which is…
[laughs]
It’s something you just have to take a step back and realize “Hey, I can do this.” It’s quite simply that, and I think a lot more folks from our industry need to make the hard decision that I’ve made, because there’s a ton of really bad managers, and folks who really don’t focus enough on mentoring, or don’t focus enough on just the overall technical strategy.
I’ll tell you what, being a leader is one of the toughest positions, because you get criticized, scrutinized, not only by yourself - which is where it usually begins - but then also from the externals; people who don’t even know you will criticize you. And then people who really know you also criticize you, so…
Everybody’s a critic.
Being a leader is tough. It’s a really tough position. And that one in particular that you mentioned, interfacing with so many stakeholders - it really requires somebody who’s very empathetic, can see all sides… Put themselves in everyone else’s position to drive the ball forward and take nothing personally. Or at least try to.
Yeah, I agree wholeheartedly with your analysis there. There’s a great quote, “Heavy lies the head who wears the crown”, or something like that… There’s a lot of freedom you get in a leadership role, where there’s a lot of autonomy, you’re able to drive decisions and really make an impact for good or for bad, but with that comes a lot of responsibility, and one of those is taking responsibility for failures, or missed opportunities.
I think what’s interesting at npm about this is – you know, I’ve always had a dream of being a toolmaker; tooling, that’s kind of my stuff, it’s my jam. I’m always into architecture, infrastructure, how things connect… I’m very much an in-between person. When I worked on server-side code, middleware was something that was interesting to me, because of the intersectional nature of it. So at npm, in many ways, I’m fulfilling my life-long dream of being a toolmaker. And I think as an engineer that’s a toolmaker, we have the toughest customers, because people are relying on us to then do their jobs, and make their magic happen. So there’s this extra layer of not only scrutiny, but also really – we’re the toughest customers, software engineers… We’re the toughest customers, because we can make the thing that we’re using if we really sat down.
[08:15] And sometimes you do. Sometimes you make your own thing, because somebody else’s thing isn’t good enough… And you’ve got two things.
Right. Is anybody else’s thing ever good enough? Let’s be honest… [laughs] So yeah, if you could wave a magic wand and have the skills to write your own IDE in a day or a week, I bet you would; because you want it your way. So there’s an arrogance and there’s a pickiness in our industry… And much of that I think is to be expected. We have really hard jobs, because ultimately, the engineers that criticize you as a toolmaker that serves them - those same engineers are also criticized by their users and customers; so ultimately, they’re also being judged. It’s like an exponential judging chain. [laughter]
What’s interesting there is that contentment is often the enemy of progress. If you’re content, you tend to not wanna progress and get better. So then you have this idea of discontentment sort of like becoming a norm in our industry, where in some cases discontentment is sort of frowned upon. To be discontent is just not a good position to be in, I suppose.
Because it breeds envy and jealousy.
Right, right. So as an industry, just based on the desire to progress - which we all want to, because that means that our tooling gets better, our software gets better etc. if we have to live lives of discontentment, I wonder how that really impacts us psychologically in our industry.
Yeah, I think that is a topic that I would like to dive into. Not right now, right here, but definitely in the future. The intersection of psychology and all of the pressures that are on us as engineers, and the continuous improvement, continuous change… I wish we had more cultural anthropologists that were studying technologists, because I think there’s a lot of really insightful behavior, and just insightful things in general that are probably very unique to our industry… And how those things kind of play out on our lives, outside of the terminal - that’s another really interesting story.
“Beyond the terminal”, I like that.
Yeah, beyond the terminal. Buy that domain now, if there isn’t already one.
[laughs] It sounds like a podcast.
Since you’ve mentioned your desire for this, it sounds like a podcast we’re actually creating, called Brain Science.
That’s true.
Oh, that’s dope.
In the pre-call I think we mentioned –
I should be on your podcast.
You know, we’re actually taking guests sometime soon. We wanna dive into this. We’re exploring the inner workings of the human brain to understand things like behavior change, habit formation, mental health… And basically what it means to be human. Brain science applied. Not just what do we know about the brain, but how can we apply what we know about the brain to transform our lives and better our lives… And some of that is this anthropologist-type approach towards our industry.
Yeah, I’m really happy to hear that. There’s a major at my college that was called Society Technology and Policy. If I was 20 years older when I went to school, I feel like that’s what I would have done… I would have probably done that as a double major, because for me, I consider myself a very intersectional human, because of a variety of things; not just my family background and life experiences, but even just my interests within the industry. I’m an engineering manager - that job is hugely intersectional.
[12:12] So I think that’s a super-relevant thing to explore… And what the effects of that are moving forward, as we progress in this new and uncharted territory of the digital age.
Yeah.
Absolutely.
Well, I’ll add one more layer to that. We often look at the internet as in like so many years, like being a teenager. What is it - about 20 years old now, -ish…?
I remember a couple years ago it got its driver’s license, so I think it’s probably around drinking age, like 21 in the U.S. [laughs]
I know that software’s been around longer than that, but that would mean that in a similar way engineers in that era are similar in their maturity level. Not so much individually, but corporately.
Meaning that we’ve been doing this internet thing for the same amount of time the internet’s been around?
Basically, yeah. We can assume, to some degree, that our awareness of how to best drive the thing is predicated on how old the thing is.
Definitely a young industry.
Right. So we’re still learning. We make mistakes, and that’s human.
Yeah, it’s human.
And a changing industry. Physicists (like astrophysicists) - there’s a lot to learn beyond, but the basics of physics are the same as they’ve been.
Right, theoretically.
Yeah, I go back to the idea of civil engineering - how to build a bridge in a structurally sound manner is a tried and true science.
Right, right. It doesn’t change every year.
You could have written that book a hundred years ago and it’d be slightly different now, but it’d be pretty much the same foundations. Whereas we’re kind of figuring out this software engineering, network-based industry, where we live our lives and we have our jobs, and they’re kind of like in the same milieu… All that kind of stuff is very much living it out as we’re trying to develop it, and we’re making mistakes that impact people that we don’t even know etc. So it’s very young, and therefore I feel like we really don’t understand what all the implications are at this point.
Yeah, I wanna go a little deep – like, a dollar store philosophy maybe on y’all…
Okay.
I like it.
Here’s my dollar store philosophy. What’s really interesting about the web is not only how young it is, but also the impact that it’s had in the amount of time, and how exponential it is in so many ways… Then you look at the under-the-hood experience with developers and just how much change we’ve had, and how actually developing for the web is like an extremely hostile thing… And what other industry do you know where we’re like “Well, I hope this works… Ship this, and I hope this works.” [laughter]
And it’s really interesting to watch the transitions that we’ve had, where 15 years ago - or more; or actually probably about 15 years ago… It was like, a user comes to a website, and the server is like “Hey, tell me who you are”, and it’s like “Netscape!” And then it’s like “Okay, here’s your code for Netscape.” We’ve come a long way…
Yeah, for sure.
[15:46] …even just in that, where we’re now driven by features, or more of like progressive enhancement… But it’s still very hostile, because there’s a ton of variability now. It’s a different type of variability. It’s not so much that the browsers have a really low interoperability score, it’s that browsers are – you know, they’re just so much more powerful, and there’s a bunch of other capabilities. There’s assumptions that you can make on the device size, there’s assumptions that you can or can’t make on the capabilities that are enabled… It’s like the matrix is growing, and the problems are changing. It’s really interesting. I kind of think of it like quantum computing style, there’s just so many things…
Yeah…
Some of this might even lead into the bigger topic we’re here to talk about too, which is ASTs and legacy code, and stuff like that.
Change, yes…
Maybe a smaller topic, actually…
Well, something I wanna say is that yesterday’s choices are today’s consequences. Yesterday’s choices – and we’re talking about our maturity level in terms of an industry, and people, and even as an internet, we’re still learning… But yesterday’s choices are today’s consequences. That’s kind of where we get legacy code from, and this need to transpile into new ways, and take care of tech debt, and all these things that come along with building software.
Good segue, Adam. Good segue.
Yeah, great segue. I saw that segue coming, because I’m a podcaster myself… [laughter] I was like “We’re getting there. This is a long-winded introduction to a talk on ASTs…”
I saw it coming as well, and it was so smooth that I decided to call it out and make it completely a noob move and destroy it.
That’s right.
I killed the segue.
You just janked that up. It’s okay.
That’s fine, Jerod. I forgive you.
We forgive you. [laughs]
Thank you for the forgiveness.
But yes, change. Change. The internet is change, right? It’s all about change, and that’s what we’re here to talk about, because I’m really excited I’m going to be talking about ASTs at All Things Open this fall… And yeah, I’m here to answer all of your questions, Jerod and Adam.
Give us the rundown from the uninitiated standpoint. What are ASTs, who uses them, why do they use them? What’s their purpose? …etc.
Sure. So when we write software nowadays, it’s really high-level. Var = foo… It’s human-readable words that are high-level, and in order for those things to be fed into a machine, and for your code to get turned into ones and zeroes, there’s a series of steps that it goes through in a compiler engine. One of the first steps is taking your code and tokenizing it. Tokenizing is the process where the valid syntax items - in JavaScript that might be triple equals, const is a token - all of these things are parsed. So it’s tokenized and then a tree is generated from the structure of your code. That tree is called an abstract syntax tree. It’s not limited to JavaScript; every programming language uses abstract syntax trees to feed into the compiler engine, which translates all of that stuff down to bytecode.
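To make the tokenizing step a little more concrete, here is a minimal sketch of a tokenizer for a tiny slice of JavaScript. It is a toy written for this transcript, not how any real engine or parser is implemented; the `tokenize` function and the token type names (loosely modeled on the kind of labels JavaScript parsers use) are assumptions for illustration.

```javascript
// Toy tokenizer: handles identifiers, integers, double-quoted strings and a
// few punctuators. Real tokenizers also deal with comments, regex literals,
// template strings and much more.
const KEYWORDS = new Set(["var", "let", "const"]);

function tokenize(source) {
  // Sticky (y) regex: each match must start exactly where the last one ended.
  // Longer punctuators (===) are listed before shorter ones (=).
  const pattern = /\s*(?:([A-Za-z_$][\w$]*)|(\d+)|("[^"]*")|(===|==|=|;))/y;
  const text = source.trim();
  const tokens = [];
  while (pattern.lastIndex < text.length) {
    const start = pattern.lastIndex;
    const match = pattern.exec(text);
    if (!match) throw new SyntaxError(`Unexpected character at offset ${start}`);
    const [, word, number, string, punctuator] = match;
    if (word !== undefined) {
      tokens.push({ type: KEYWORDS.has(word) ? "Keyword" : "Identifier", value: word });
    } else if (number !== undefined) {
      tokens.push({ type: "Numeric", value: number });
    } else if (string !== undefined) {
      tokens.push({ type: "String", value: string });
    } else {
      tokens.push({ type: "Punctuator", value: punctuator });
    }
  }
  return tokens;
}

const summary = tokenize('const jerod = "awesome";')
  .map((token) => `${token.type}:${token.value}`)
  .join(" ");
console.log(summary);
// Keyword:const Identifier:jerod Punctuator:= String:"awesome" Punctuator:;
```

The flat token list carries no structure yet; the parser is what arranges these tokens into a tree.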
Abstract syntax trees are extremely useful in programming, because they give us a predictable data structure, which helps us understand our code. So if you’re looking at a variable declaration, for example “const jerod = string awesome.”
That’s what I was gonna say…!
[19:52] Yeah, right?! That one line of code, including the semicolon, gets translated into a tree that has a predictable structure. The first thing – it’s a JSON tree that has a type program, it has a body that’s an array, that body has declarations, which is an array of objects, an object type tree… So it gives you this lovely output which is like a programmatic walkthrough of your code. And the kind of secret sauce to ASTs here is that there is a structure for you to understand what something is. You can understand “const jerod = string awesome”, I know that the identifier – it’s a variable declarator, and the value is “jerod”, and I know that “awesome” is a string… So there’s no guesswork.
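For readers following along without a screen, the tree described here looks roughly like the object below. This is a hand-written sketch in the ESTree style that JavaScript parsers commonly emit, assuming the spoken example means `const jerod = "awesome";`. Real parser output carries extra fields (source positions, `sourceType`, and so on) that are left out.

```javascript
// Hand-written, simplified tree for:  const jerod = "awesome";
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "const",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "jerod" },
          init: { type: "Literal", value: "awesome", raw: '"awesome"' },
        },
      ],
    },
  ],
};

// Because the shape is predictable, reading facts off the tree is plain
// property access rather than guesswork.
const declarator = ast.body[0].declarations[0];
console.log(ast.body[0].kind);             // const
console.log(declarator.id.name);           // jerod
console.log(typeof declarator.init.value); // string
```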
If you think about things like regular expressions that we’ve used to parse and understand our code to find matches, there’s inherently a conflict between trying to find something with regex, versus using something like a tree that has more detail and metadata… Because the regular expressions are really good for analyzing static code, but they’re really not good at understanding the nuances, the differences in your code.
I’ll give you an example - if you have something that’s commented out that’s a variable declaration, versus something that isn’t, or if you have a function that uses the same name as the variable… So if you’re trying to find matches for that thing, it’s very difficult; technically, you can do it, but it’s just an extremely complicated set of regex that you would have to write in order to make sure that the thing that you are looking for is a function.
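A small sketch of the problem being described: a plausible-looking regex for "find every var declaration" also matches text inside a comment and inside a string. The sample source and the regex are made up for illustration.

```javascript
// Four lines of sample source; only the second is a real var declaration.
const source = [
  "// var oldName = 1;   (commented out)",
  "var total = 0;",
  'const note = "var fake = 2";',
  "function total2() {}",
].join("\n");

// Naive text search for declared variable names.
const declared = [...source.matchAll(/\bvar\s+([A-Za-z_$][\w$]*)/g)].map(
  (match) => match[1]
);

console.log(declared); // [ 'oldName', 'total', 'fake' ]
```

Two of the three hits are false positives. A parser would report a comment, one variable declaration, and a string literal as three different kinds of thing, so a query for variable declarations would only see `total`.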
So what this tree allows us to do is it basically opens up a whole body of being able to really query your code, and query it in a way that is extremely precise and granular. So you can say “I wanna find all of the functions that contain these conditions”, the conditions maybe being things that have more than ten variable declarations, things that have more than four if statements, functions with more than 20 lines, I wanna find promises that don’t have error handling… So it enables us to do a multitude of things in order to understand our code programmatically and deterministically… And then the flipside of that is using tools that allow us to take ASTs and transform the code, so that we can actually do an in-place replacement. You can now not only programmatically understand your code and find things, but also you can use that to do safe, in-place refactoring of your code.
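One of the queries mentioned above, "functions with more than four if statements", can be sketched with a generic tree walk. The `walk`, `countIfs` and `functionsWithManyIfs` helpers are hypothetical names written for this example, and the tree is built by hand instead of coming from a real parser so the snippet has no dependencies.

```javascript
// Depth-first walk over any ESTree-style tree: every object with a string
// `type` is treated as a node and passed to `visit`.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
    return;
  }
  if (!node || typeof node !== "object") return;
  if (typeof node.type === "string") visit(node);
  for (const key of Object.keys(node)) walk(node[key], visit);
}

// Counts every IfStatement under a function body (nested functions included).
function countIfs(fn) {
  let count = 0;
  walk(fn.body, (node) => {
    if (node.type === "IfStatement") count += 1;
  });
  return count;
}

function functionsWithManyIfs(ast, limit) {
  const names = [];
  walk(ast, (node) => {
    if (node.type === "FunctionDeclaration" && countIfs(node) > limit) {
      names.push(node.id.name);
    }
  });
  return names;
}

// Small builders for a hand-made tree.
const ifStatement = () => ({
  type: "IfStatement",
  test: { type: "Identifier", name: "x" },
  consequent: { type: "BlockStatement", body: [] },
  alternate: null,
});
const fn = (name, statements) => ({
  type: "FunctionDeclaration",
  id: { type: "Identifier", name },
  params: [],
  body: { type: "BlockStatement", body: statements },
});

const ast = {
  type: "Program",
  body: [
    fn("simple", [ifStatement()]),
    fn("branchy", Array.from({ length: 5 }, ifStatement)),
  ],
};

console.log(functionsWithManyIfs(ast, 4)); // [ 'branchy' ]
```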
The title of your talk is “Machine-powered refactoring: Leverage ASTs to push your legacy code and the web forward.” You’ve just described what ASTs are and what’s interesting about them. I think historically ASTs have been the playground or the domain of people who are writing languages, or thinking about programming languages and have to have parsers that produce ASTs in order to take a syntax and turn it into a thing a machine can understand. It sounds like what you’re arguing for is that there’s a much more mainstream use case for ASTs, where lots of developers should know what they are and be able to use them, because they provide this metadata and this structure, and we can use them not just to write a programming language, but to actually refactor, which is – I’d never thought of this before. Can you expand on how you’ve done this, what works, and is this something that lots of people should be using?
It’s really important for me to democratize this knowledge, because most developers don’t realize that they are actually already using ASTs every day in their workflows if they use things like Babel, Prettier, or ESLint. All of these tools - we allow these tools to programmatically create code for us and change code for us, and we trust them because of the precision nature that comes from leveraging ASTs.
There’s a whole domain of tools, as well as some domain areas in our industry, ASTs being one of them, that are kind of locked away in the library…
Esoteric.
Yeah, in the library author land.
For sure.
And what happens with library author land is folks are really busy, they’re maintainers for really large projects and they’re already overburdened, and getting good documentation is a challenge that most folks have out of their projects. So kind of taking the step to democratize the power of this has kind of been left on us as a wider community.
I’ve been able to leverage ASTs, actually – I worked on a project at Bocoup where we were working with the Edge team (this was a while ago) to modernize thousands of tests that were actually written for IE, but that were valid… So these tests were valid, because the web platform is – you know, we don’t break the web; when we implement the CSS feature, when we implement this API, it’s typically stable, and we just usually enhance it.
So there’s thousands of tests that were written for IE, that were still valid for the web platform, because they were testing open web standard APIs, but it was using an outdated harness, it was using a bunch of proprietary stuff etc. So we needed to modernize it and get those tests ready to be shared with the entire world via web-platform-tests, which is a project where all of the browser engineers contribute, and now have a shared test suite.
[28:24] So there were a lot of similar patterns, but there were also a ton of conditions… So I was able to leverage ASTs to help me power through a bunch of refactoring for thousands of tests, and I was able to make those changes safely; had I done that work manually, it would have been just an X number of days more.
And error-prone, too, probably.
Yeah, error-prone, and not a good use of a human brain. So I’m very pro automating repetitive work and using automation to limit your risk, but also to make it easier for you to repeat and rinse and iterate fast. And when you use automated refactoring, what you’re able to do is build up a set of transforms, you’re able to change thousands of files at once, and if you did something wrong, you just redo it; you just git checkout, change your transform and then run your refactoring again. That type of quick feedback loop is necessary to be productive in 2019 and beyond.
So we really need to examine what type of architecture – or not architecture, but what types of best practices we as a community have… Because we are entering an age where we have a ton of aging code and infrastructure because our standards are changing so fast. npm dependencies are great; it’s a good case study for looking at change. So if a library author changes an API, or if you have an internal private module and you wanna deprecate something, you can use ASTs to upgrade to a newer version of the API safely. You can also use ASTs to write your own custom linting rules around “Hey, I don’t want anyone adding new versions of this. I have a hardcoded count of all of the instances of this thing, and I don’t want any new instances of this deprecated module being used.”
So you can make that decision binary, and you can enforce those things for your team in a way that’s binary and where you’re not having folks having unproductive discussions. So I’m a huge fan of no nits, no– we shouldn’t be arguing over things that are team conventions, or previously agreed-upon things. Brainpower is expensive, and if you make it binary, you’ll have more productive discussions in code review. Let’s not talk about linting, let’s not talk about this.
And lastly, what I’ll say is that using ASTs is one way to really add a resilience layer to your codebase… Because if you’re fixing a bug, the first thing you should ask yourself is “Alright, I fixed this bug. Could I have avoided this with a linting rule?” If the answer is no, the next question is “Okay, could I have avoided this with a unit test?” If the answer is no, then an integration test.
For me, writing your own custom linting rules or custom transforms and all of these things are like a first-layer defense for a lot of things in codebases.
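As a rough illustration of the custom lint rule idea, here is a sketch of an ESLint rule that flags any import or require of a deprecated module. The module name `legacy-http-client` is hypothetical, and the rule flags every use; keeping a hardcoded count of existing instances, as described above, is not shown.

```javascript
// Hypothetical deprecated module, used only for illustration.
const DEPRECATED_MODULE = "legacy-http-client";

// An ESLint rule is an object with `meta` and a `create` function that
// returns visitor callbacks keyed by AST node type.
const noDeprecatedModule = {
  meta: {
    type: "problem",
    docs: { description: `disallow uses of ${DEPRECATED_MODULE}` },
    schema: [],
    messages: {
      deprecated: "'{{name}}' is deprecated; do not add new uses.",
    },
  },
  create(context) {
    const flag = (node) =>
      context.report({
        node,
        messageId: "deprecated",
        data: { name: DEPRECATED_MODULE },
      });

    return {
      // import ... from "legacy-http-client"
      ImportDeclaration(node) {
        if (node.source.value === DEPRECATED_MODULE) flag(node);
      },
      // require("legacy-http-client")
      CallExpression(node) {
        const [arg] = node.arguments;
        const isRequire =
          node.callee.type === "Identifier" && node.callee.name === "require";
        if (isRequire && arg && arg.type === "Literal" && arg.value === DEPRECATED_MODULE) {
          flag(node);
        }
      },
    };
  },
};

module.exports = noDeprecatedModule;
```

Because the rule works on nodes, a mention of the module name in a comment or an unrelated string would not trigger it, which is what makes the check binary enough to enforce in code review.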
Are ASTs typically written in the language that you’re testing against? Where do you begin? What language are they written in? Are they a separate project? Do they live inside the monorepo? What’s the landscape?
Yeah, great question. I’ve only worked with JavaScript, in terms of using tools around ASTs in JavaScript. So what you need is a parser, and there are projects like Babel, that have their own parser. Esprima, Recast… There’s tons of different JavaScript parsers, and the differences are really nuanced, because they all have the same general structure, but then they have some additional information. Sorry, the AST trees that they output have different information based on what the preferences are of that tool, in terms of how they wanna traverse their trees etc. But typically, it’s a three-step process.
The first thing you need – I have actually a diagram here. I was gonna say, should I share my screen…? But this is a podcast, so we’re gonna have to talk through a diagram. So you need a parser, a transformer and a generator. The parsing tool basically just creates a tree for the input code, and then you have a transformer that basically lets you query the generated tree. Then you can say “Oh, here, I’ve found the thing in the tree that I want. Now let me create a new structure for what I wanna replace.” If I wanna change the value of something, or if I wanna remove something, or whatever. Let me make that change in the tree. And basically, a new tree gets generated from all of the transforms, and then that tree that gets generated now needs to go back into code. So that’s the third step. We need a generator. That’s the reverse of the parser. So it takes the tree and then it makes code.
Those are the three things… But depending on what tool you’re using, you’re chaining together a parser, a transformer, a generator, or you’re using something that does everything for you altogether. Jscodeshift is what I really like to use, because it’s a wrapper for Recast, which uses Esprima from Mozilla; it’s a parser from Mozilla. Jscodeshift wraps Recast, and gives it a very nice jQuery-style declarative API. So it’s just really nice to write. The folks at Facebook are behind jscodeshift; but Recast - you can also use Recast, which is great. I just enjoy the declarative nature of using a tool like jscodeshift. But you’re using JavaScript to write all those things, and there’s an API that usually comes with whatever tool you’re using, so that you can query, but then you can also create.
And then there is the last step, which is “Okay, now that I’ve queried, and I’ve created, now generate the tree and do an in-file replacement.” So in theory, when you Babel (Babelify, or whatever), or when you run ESLint, if you use --fix to make the change, in theory the whole thing actually changes, but Git only shows the diff. So you only see the diff. So the whole file got replaced in place.
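The parser, transformer and generator stages described above can be sketched end to end with a deliberately tiny toy. This is not how Babel, Recast or jscodeshift work internally; the `parse`, `varToConst` and `generate` functions are assumptions for illustration, and the "grammar" only understands statements of the form `<var|let|const> <name> = <number>;`.

```javascript
// 1. Parser: source text -> tree.
function parse(source) {
  const statement = /\s*(var|let|const)\s+([A-Za-z_$][\w$]*)\s*=\s*(\d+)\s*;/y;
  const text = source.trim();
  const body = [];
  while (statement.lastIndex < text.length) {
    const start = statement.lastIndex;
    const match = statement.exec(text);
    if (!match) throw new SyntaxError(`Cannot parse at offset ${start}`);
    body.push({
      type: "VariableDeclaration",
      kind: match[1],
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: match[2] },
          init: { type: "Literal", value: Number(match[3]) },
        },
      ],
    });
  }
  return { type: "Program", body };
}

// 2. Transformer: tree -> new tree. Here: turn every `var` into `const`.
function varToConst(ast) {
  return {
    ...ast,
    body: ast.body.map((node) =>
      node.type === "VariableDeclaration" && node.kind === "var"
        ? { ...node, kind: "const" }
        : node
    ),
  };
}

// 3. Generator: tree -> source text (the reverse of the parser).
function generate(ast) {
  return ast.body
    .map((node) => {
      const [declarator] = node.declarations;
      return `${node.kind} ${declarator.id.name} = ${declarator.init.value};`;
    })
    .join("\n");
}

const output = generate(varToConst(parse("var a = 1; let b = 2;\nvar c = 3;")));
console.log(output);
// const a = 1;
// let b = 2;
// const c = 3;
```

Note that this toy generator throws away the original formatting. Production tools put considerable effort into preserving it so that the resulting diff stays small.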
[36:30] If we just take a simple example and maybe walk through these three steps… A simple example of refactoring is “Let’s change all of our vars to const”, for example. So I have all these var statements, and I wanna use const instead. I’m going to use an AST in order to do that. So the first step would be to take my file, or my chunk of code that has the vars in it, pass it through the parser… So I have raw text, I’m passing it through a parser. The parser then generates the AST for me, returns an AST…
Yeah.
Can I read that AST with my eyes, or is it a blob? Is it like Matrix-style…?
You can read that AST. You can print it, you can log it, or you can use an awesome tool that I like to use, which is really the standard around this… It’s ASTexplorer.net. It’s a site which allows you to just drop code, pick your parser, pick your language, and you can view the tree.
That’s cool.
The really great thing is you can use this tool to visualize a tree. There’s no memorization here; I don’t need to know what the tree structure is for a function that has a return value of this. I can just drop it in and see the tree, and then I can write the code for what I want to change it to, and then see what that tree is. So you can do reverse-engineering to basically say “This is what I wanna find and this is what I wanna change it to.” You have both versions, and you can use that to drive how you build your transforms.
Okay.
And I think the best part about it is this is all written in JavaScript, so these are Node scripts that are running, and you can basically do anything you want in the middle of a transform. If you want, you can say “Find me this static array list of images from some cloud server”, and then you can run a transform, and in your transform you can do an API request, get an updated list, do an in-place replacement… You can do dynamic evaluations of your code, so that you can actually have – even though your code is static, it can actually be dynamic. You can use transforms to even change your code, or do pre-evaluations, and things like that… So it’s a very nice thing.
That is interesting. So yeah, ASTexplorer.net, I’ll definitely recommend – I’m pulling it up here; there’s a link in the show notes if you wanna quick click on it. I think part of the ASTs is there’s this – like you said, you’re trying to democratize this knowledge. There is like a mystical aspect of once you get below source code, you’re like “Okay, we’re now at a machine-generated thing. That’s scary. Can I view it?” It just seems a little bit more nebulous, a little bit more vague… Abstract maybe might be a good term. [laughter] But this does a good job, I think, just looking at the example, and I’m sure as you put in your own code into something like this, it probably does a good job of demystifying some of that and saying “You know what, this is not all that unapproachable”, and something that is very valuable if you can get past maybe a little bit of that abstractness.
[39:53] Making it more concrete now, once I have my AST, like you said, you can transform it. So the transformer operations - does that depend on the transforming tool that you are using? You mentioned a couple different tools, and one has like a jQuery-style syntax.
That’s right.
What would it be like if I would take all my vars and make them const. Obviously, you don’t have to type out the code to us, but what kind of a transform would that be?
Well, you would say – so if you’re on ASTexplorer.net, for example, you can pick jscodeshift as your transform tool, and you would basically say… So it uses a declarative jQuery style API, so your first thing is you’re looking at the file source, and then you’re saying .find, I’m looking for an identifier, so I’m looking for like a variable name, or a function name… So .find, identifier, and then a .forEach… So it’s just JavaScript–
It’s just JavaScript looping.
Yeah, iterate on all of the identifiers that you find, and then you can have a matching… So you can say “If that node name is Jerod, replace the value to be awesome”, and that’s it. And then .toSource(), which prints the transformed tree back to the same file. It’s as simple as that. It’s actually mindblowingly easy.
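For the var-to-const question asked earlier, a jscodeshift transform following the find, forEach, toSource pattern described here might look like the sketch below. It assumes jscodeshift is installed, and it assumes none of the affected variables are reassigned or depend on hoisting; a production transform would need to check that before switching to `const`.

```javascript
// A jscodeshift transform is a function that receives the file and the API
// and returns the new source as a string.
function transformer(fileInfo, api) {
  const j = api.jscodeshift;

  return j(fileInfo.source)
    // Query: every variable declaration whose kind is `var`.
    .find(j.VariableDeclaration, { kind: "var" })
    // Change: flip the kind on each matching node.
    .forEach((path) => {
      // Assumption: the variable is never reassigned or redeclared.
      path.node.kind = "const";
    })
    // Generate: print the modified tree back to source text.
    .toSource();
}

module.exports = transformer;
```

Saved as, say, `var-to-const.js` (a hypothetical filename), it could be run over a directory with `npx jscodeshift -t var-to-const.js src/`. If the result is wrong, discard the changes with git, adjust the transform, and run it again.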
On the JavaScript complexity metric, this ranks really low. This is way below TypeScript, in my opinion, for example. People look at TypeScript and they’re like “I don’t understand this…!” And then a week later they’re like “Oh my god, I’m converted forever.” For me, the barrier to entry when I teach folks about ASTs is even lower than that. As soon as I show them an example, three minutes later they’re like “I’m sold.”
Yeah. I’m basically looking at this example right here and I’m pretty much sold as well, because this is way simpler than I expected; I would have expected a bigger buy-in, at least to get started. It seems like it’s pretty straightforward.
The tooling has made it really easy…
When do you reach for something like this in terms of complexity? Because the simple example of like “Change my vars to const”, in my text editor, I can basically Cmd+Shift+A and just type in “find all var and replace with const”, so there are certain things where our IDEs or our editors make those kinds of refactorings pretty straightforward, like a Find All and Replace… But then when do you know “This is a little bit too complex”? Or maybe it’s just like case-by-case; you’ll just know it when you need it, or…? Maybe the better question is: is there enough of a barrier where you don’t reach for this right away, but you kind of upgrade to it when it gets to a certain level of complexity?
Yeah, I think that’s a great question. I think it’s about understanding what your needs are, and what type of change you’re trying to make. If you’re trying to make something that’s really simple, and self-contained, and something that you can just do with a Find and Replace - great. But anytime your change is conditional, or any time your change is more than one line - if it’s like a multi-line change - that’s where you really… You know, moving around function parameters, or deleting code… Things like that.
I would say that the true needs of what we would do as developers to refactor a set of hairy code that’s widespread - that’s one where I would use transform. So I would say that at scale, if something is repeated in multiple areas, if there’s something that’s a clear pattern, if you’re updating something where it can be really hard for a regex to pick up on the differences between things…
[44:07] For example modules, when they’re being imported. I can also use the star syntax to change the name of something, so “import * as foo”. There’s lots of little nuances there, and you can use ASTs to make sure that you’re changing the thing that you need to change, and you’re not gonna accidentally change something else.
Maybe the first time your regex fails you. You’ve gotten so far with a regular expression, and now it just missed a case… And you’re like “Instead of sitting here and iterating on that regex, and just keep on tweaking it for these different cases…”, stop right there. Now it’s time for an AST, because you’ll probably save time going that direction.
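To make the regex-versus-AST point concrete, here is a small sketch. The sample source and the function name `varToConst` are invented for illustration, and the transform again assumes jscodeshift's `(file, api)` signature rather than importing the library.

```javascript
// Three lines that all contain the word "var"; only two declare variables.
const source = [
  "var total = 0;",
  'var label = "var is a keyword";',
  "// var in a comment",
].join("\n");

// Text-level find-and-replace cannot tell code from strings or comments.
const regexResult = source.replace(/\bvar\b/g, "const");
console.log(regexResult);
// const total = 0;
// const label = "const is a keyword";   <- string contents changed by accident
// // const in a comment                 <- comment changed by accident

// AST-level version: only VariableDeclaration nodes whose kind is "var" are
// matched, so string literals and comments are never candidates.
// Caveat: switching to const is only safe when the variable is never
// reassigned; a real codemod would check for that or fall back to let.
function varToConst(file, api) {
  const j = api.jscodeshift;
  const root = j(file.source);

  root
    .find(j.VariableDeclaration, { kind: "var" })
    .forEach((path) => {
      path.node.kind = "const";
    });

  return root.toSource();
}
```

The regex could be tweaked to skip strings and comments, but each new edge case means another round of tweaking, which is the loop being described here.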
Exactly. And I think the ramp-up here, which is maybe your deeper question - I’m advocating for developers to have this in their toolchain the same way they have linting support, and running tests. So we should have an easy way for folks to write transforms. We should take the day or two that it takes to set that up, get that into the project with some examples, and make it so that folks have a path for doing those things. And that can be twofold - you can use that as an opportunity to create a bunch of custom linting tools, and while you’re doing that, adding infrastructure for how to write transforms if you need to…
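A custom lint rule works on the same tree. As a rough sketch, an ESLint rule is an object with `meta` and a `create` function that returns visitor methods named after node types. The rule name `noVarRule` and its message are made up for illustration (ESLint already ships a built-in `no-var` rule, so this is purely a teaching example).

```javascript
// Sketch of a custom ESLint rule object. ESLint walks the AST and calls the
// visitor method whose name matches each node's type.
const noVarRule = {
  meta: {
    type: "suggestion",
    docs: { description: "disallow `var` declarations" },
    schema: [], // the rule takes no options
  },
  create(context) {
    return {
      // Called once for every variable declaration in the file being linted.
      VariableDeclaration(node) {
        if (node.kind === "var") {
          context.report({ node, message: "Use const or let instead of var." });
        }
      },
    };
  },
};
```

In a project, a rule like this would typically be exported from a local plugin and switched on in the ESLint config; the same node matching could later be reused as the starting point for a transform.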
But ultimately, if this is in our projects, folks become – even if they don’t use it to check in code, even if they use it while they’re developing something to find what they need, it’s a way to level up the playing field for everybody. Because the stakes are getting higher. We have bigger codebases, front-ends are huge… We have not only thick clients, but thick servers. I also think the culture of like “Let’s throw everything away and start over” is a really expensive one, that isn’t a good thing we should be promoting. Folks should feel comfortable with refactoring code, and they should feel proud about it, because you’re able to still drive value for your product and your business while pushing your code forward. I’m personally sick of seeing front-end teams start over from scratch every 12-14 months… So let’s just, like, not do that.
Amal, you’re obviously passionate about this particular subject. It is somewhat dry, you have to convince people to pay attention to somewhat arcane knowledge, like abstract syntax trees, but there’s huge value that can come out of doing these refactorings, and really allowing yourself to refactor better, faster, stronger. Is this a tough sell in engineering teams, or do you find it’s pretty easy to convince people to institutionalize this kind of a tool in their toolbox?
Yeah, that’s a great question. I have to say that I think there’s a few different things happening in our industry right now. One is there’s like a dopamine hit that we get from new tools, and new things…
Fresh starts.
…and fresh starts. And there’s a problem with consistently working on new things; there’s a set of challenges for developing software that you just don’t even get to really explore if you’re constantly starting over to do your Hello World app, or Create React App, or whatever the hell else. It’s great to do that every once in a while.
I’m not sure it’s healthy to be creating new projects all the time, in the sense that there’s some real good engineering challenge that you get from having to understand how to drive value, how to make change while still shipping to production. How do you maintain, how do you refactor safely? How do I refactor a billion-hit-a-month codebase, while still pushing to production? And understanding how to do that safely, responsibly… What are the nuances of that, in terms of testing, in terms of – there’s so many interesting things. There’s like a class of problems that you just never get exposed to.
So for me, the heroes in our industry are really the folks who are working on legacy applications and still driving them forward, and continuing to chip at them. Some of my philosophical ideology comes from Martin Fowler, who has a really great article which I think we’re gonna link in the show notes; I’ve just sent that to you all. It’s StranglerFigApplication. Basically, he was on vacation somewhere - I think in New Zealand - but there’s this tree that is growing roots and is slowly strangling the thing; it’s growing new roots, but it’s slowly strangling the old ones… And basically, the idea here, the pattern is that you can refactor your application module by module, bit by bit, while still driving value forward.
I’m personally sick of seeing the next-gen team, versus the old-gen team. So many companies – you have a group of people that are working on something that is not shipping to production for like 6 months, 12 months, 14 months, 17 months… [laughter] You get the drift, right? Ultimately, you’re building a set of things where you’re not getting that feedback loop from your customers on what’s working and what’s not. You’re developing the new version of your thing in a complete silo.
[51:35] I think a really interesting problem that I had to solve a few years ago - I was new on the team, and I was hired to rearchitect all of the UI, “Get us off of the legacy code…” [laughter] And it’s really funny, I’ve never actually talked about this story, so I’m realizing now that maybe this is the origin story for me… But it was a Backbone application, and they wanted to switch to React. And I was like “We’re not gonna get rid of React. We’re not gonna get rid of all of these Backbone views. The best part about React is it’s just a library, so maybe we just build infrastructure so that this whole new view, this new set of functionality that we’re adding - maybe that’s React, and we’re able to push forward having all of our new views be React components, while still leveraging the Backbone components. Those two things lived in one ecosystem.” It was a little more work, but we were able to slowly replace everything while still driving value, while getting feedback from customers in the wild.
That’s the type of challenge – for me, that’s what makes a senior engineer. That’s what makes an architect. That’s what makes for somebody who really understands the challenges and the nuances of our craft.
We have more code now than ever. Forget our code… Most of our code is actually third-party dependencies. I think Google just did a study on that, and out of every ten lines of JavaScript, it’s one line of code that belongs to the application. That’s a shocking number, right?
Yeah, for sure.
But if you think about it, it’s no surprise. The open source model is working, that’s what it was designed to do. We don’t wanna be reinventing the wheel, we wanna be standing on the shoulders of giants… But at the same time, we need to be able to move quickly and shift. So if I wanna switch dependencies, I wanna be able to do so in a way that isn’t going to set me back, or I wanna be able to do so in a way that’s safe, and it’s not just changing dependencies - it’s about upgrading, and all kinds of things.
There’s a culture now with some of the larger frameworks - Angular being one of them - where they’ll give you a set of transforms with the version bump. They’re like “Alright, new major release. Sorry for the breaking changes, but… We’re now gonna give you a command to run so that you can migrate from 5 to 6, or 6 to 7.”
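A migration transform of that kind might look something like the sketch below, which points imports at a replacement package. The package names `old-lib` and `new-lib` and the function name `migrateImports` are hypothetical, and the code assumes jscodeshift's `(file, api)` transform signature; it is not taken from any framework's actual migration tooling.

```javascript
// Hypothetical package names, used only for illustration.
const OLD_SOURCE = "old-lib";
const NEW_SOURCE = "new-lib";

// Sketch of a jscodeshift-style migration: rewrite the module an import
// comes from, leaving everything else in the file untouched.
function migrateImports(file, api) {
  const j = api.jscodeshift;
  const root = j(file.source);

  root
    // Match only import declarations whose source is the old package.
    .find(j.ImportDeclaration, { source: { value: OLD_SOURCE } })
    .forEach((path) => {
      // Only the source string changes. Default, named, and namespace
      // (`import * as foo`) specifiers are separate nodes and stay as written.
      path.node.source.value = NEW_SOURCE;
    });

  return root.toSource();
}
```

Because the match is on the import declaration node rather than on text, a string such as `"old-lib"` appearing elsewhere in the file is left alone.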
Yeah. That’s awesome.
Yeah, yeah. This is great. This is like when browsers compete for security and speed, and all these other things. These big libraries are now competing on user experience, and DX more so, actually. Developer experience. So the bar is getting higher, because the stakes are getting higher, and we can start adopting those practices in our own codebases as application developers… And that’s my pitch.
I like that pitch, and I know we have this shared metaphor… And I’m not introducing either of you two to it, but we have this metaphor of technical debt and then this idea that you are taking on debt in order to gain somewhere else, and eventually the debt collector is gonna come, unless you manage that over time. And you know, in finance we have ways out - we can declare bankruptcy. Of course, if we do it like Michael Scott, it doesn’t quite work, where he just walks out and says “BANKRUPTCY!” I don’t know if you saw that episode, but it’s one of my favorites… You can’t just say the word out loud, Michael.
I haven’t seen that…
Yeah, he just goes out into the office and he just declares bankruptcy… And Oscar, the accountant, is like “That’s not how it works. You can’t just declare bankruptcy…” Anyways, off-topic. But you know, we hear a lot of people declaring bankruptcy with their technical debt, that’s where I’m trying to get to… Because maybe it’s part of the tie-in with the Silicon Valley mindset, the startup mindset of like “You have to have a bunch of people spin up new things, and then they die, and then here comes a unicorn out of that. 1,000 failures, here comes one success.” Maybe that mindset is tied in with the technological advances, and we get to this point where it’s like “Well, a new thing has to begin.”
[56:13] I’m with you, very much so, on maintaining legacy code, and that being really the software that provides value over a series of years as de facto legacy. The reason why it’s still around is because it’s providing real value to real people. But is there a point where you’ve come across any code where it’s like, “You know what - you guys didn’t manage the technical debt here…” I like the idea of pushing the thing forward, but sometimes you’re pushing up against a wall. Are there limits to this ideology, or can we refactor all things?
I’m sure there are cases - although I think they are very rare - where I think you have to completely just abandon ship for the entire project… But with the kind of module-by-module approach, the idea here is that you’re taking one vertical segment and replacing it, and then throwing away the code that you don’t want.
Right. Instead of throwing the whole thing out.
Yeah, or you’re refactoring in place. Either one. But I think for me an acknowledgment that we don’t make enough in our industry - and I think you’re totally right about your analysis on “Maybe it’s Silicon Valley culture”, or maybe there’s some kind of culture bleeding over here, with just a race to the top. But we don’t acknowledge – I feel like enterprise code is its own beast in our community. You’re like “It’s either enterprise, versus small/medium, versus the Create React App worlds.” So it’s these three paradigms where nobody wants to be enterprise. I think we even coined the term “enterprise dude” on a team that I was on. The enterprise dude always ruins everything, for everybody… [laughter] The enterprise dude is always relying on the least supported version of something, and is holding back people from being able to upgrade things… Anyways.
So real software, software that’s been out in the wild, and has had multiple developers work on it… Applications at scale have cruft. I have yet to see applications at scale that don’t use multiple languages, that don’t have arcane stories behind why this weird thing exists… It’s like “Alright, when you open this file, you’re gonna have to turn around ten times and tap your nose once.” [laughter]
There be dragons…
It’s just the most hilarious stories. But applications are living, breathing; they have cruft… That’s normal. So I wanna normalize weirdness, because that’s just how applications evolve over time, with multiple people. So it’s okay, there has to be some uncomfortableness in our codebases, because ultimately, you have to have something to be pushing forward as a team. I envy the folks who are really happy about everything; congratulations to them… Maybe this talk isn’t for them. But this talk is for 99% of us that are remaining, that have #realproblems.
[59:48] [laughs] I think Mike Tyson said “Everybody has a plan until they get punched in the face.” That’s when everybody’s plan goes out the window, basically. He knows that pretty well, because he’s punched a lot of people in the face. I think code is like that - we all have beautiful, perfect, pristine code, until it hits production, it hits the real world. Once that happens, stuff hits the fan, and you’ve gotta make changes… So the longer it’s been in the real world, the more craggly it’s gonna look.
I’m looking at this picture on Martin Fowler’s blog, of the StranglerFigApplication, and I’m thinking “That code, that tree –”
Isn’t it cool?
That’s an abstract, some kind of tree.
Yeah.
That tree is crazy-looking.
[laughs] It’s crazy-looking. But yeah, at the very minimum, you always have the CEO button. If your code is perfect, I challenge you to find one decision that wasn’t the CEO button decision, where it’s just like “Just put it there, make it happen.”
Just make it happen. [laughs]
“Ship it now.”
Dang CEOs…
Yeah…
Well, Amal, your talk is the first day at the conference, right? You’re on day one, that’s October 14th. The conference actually happens October 13th through 15th. There’s some workshops etc. going on. If you are planning to go to this conference - which I would suggest you do, because hey, we’re gonna be there…
That’s right.
As a matter of fact, we’re planning to have a live JS Party at All Things Open. Amal might be a future panelist, a future guest panelist on JS Party, so - hopeful there at least.
Yeah.
See Amal day one… But I’m not sure which day our live thing is, but it’s definitely gonna be there, at All Things Open, happening in Raleigh, North Carolina, October 13th through 15th this year.
Come see us.
If you are thinking of registering, I would say that between now and the end of the month their mid-tier pricing is still active. October 1st it goes a little higher… It’s still a very inexpensive conference. Even on its most expensive ticket period, it’s $279. So not a very expensive conference to go to. Amazing speakers… Amal, you’ll be there, of course. Jerod and Kball will be on the stage doing something – I’m not sure, what is the plan, Jerod? Do you have a plan?
The plan will be revealed… When the plan is revealed.
Yeah, it’s a fantastic conference. Just incredible speakers, and lots of – I think it attracts an audience that is really diverse, and also has just an interesting breadth of problems, so I highly recommend it. I’m really excited to be speaking there this year.
I wanna give a quick shout-out too to Todd Lewis, the organizer of that conference. He does such hard work to make that conference happen each year. Every time I talk to him, he’s always moving; he’s always moving, he’s never still. He’s always going. Todd, great work on this conference. Looking forward to being there. Our first time there was in 2016, so we’re glad to be back.
Amal, thank you so much for your time today, and sharing your wisdom. You are welcome back. Thank you so much, it was fun talking to you today.
Thank you so much for having me. It’s been a pleasure.