Go Time – Episode #310

The magic of a trace

with Felix Geisendörfer & Michael Knyszek


Felix Geisendörfer & Michael Knyszek join Natalie to discuss Go execution traces: why they’re awesome, common use cases, how they’ve gotten better of late & more.


Sponsors

Boot.dev – Learn backend development the smart way. Boot.dev is the most captivating, finger-flying, addictive way to learn to code.

Fly.io – The home of Changelog.com — Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.


Chapters

1 00:00 It's Go Time!
2 00:44 Felix Geisendörfer
3 03:16 Michael Knyszek
4 05:33 Fitting the ecosystem
5 13:06 Sponsor: Boot.dev
6 15:13 Changes in best practices
7 23:15 Common trace use cases
8 30:04 Interpreting results
9 39:58 AI (obligatory)
10 43:13 Unpopular Opinions!
11 43:54 Michael's unpop
12 45:19 Felix's unpop
13 46:42 Natalie's unpop
14 48:17 Outro

Transcript



Play the audio to listen along while you enjoy the transcript. 🎧

I am Natalie Pistunovich, and I am joined today by two people with equally complicated last names. Felix Geisendörfer and Michael Knyszek. Felix, you are a senior staff engineer at Datadog, where you work on continuous profiling as a product, and you contribute to the Go runtime. And Michael, you are a software engineer at Google on the Go compiler and runtime team. How are you both doing?

Yeah, doing well.

Nice. Felix, you came here with something to say about something that was told on this podcast in the past…

Yes. So a while ago, specifically December 2022, I was listening to an episode of Go Time, and at that time I was just looking into the execution tracer as a potential solution to a problem I was trying to solve at Datadog. And what we were trying to do was to simplify the process of debugging slow requests, especially without adding additional telemetry - to identify maybe forgotten things like outgoing network calls that weren’t instrumented, and that might explain the latency, but also to deal with sources of latency that are very difficult or impossible to instrument, such as mutex contention or inefficient goroutine communication patterns, and stuff like this. Anyway, so the episode I was referring to, 260 –

“Making Go more efficient.”

Yes, “Making Go more efficient.” During this episode Christian Simon said at some point “I’ve always started execution tracing when I was desperate, but I’ve never found a problem with it.” And while I was listening to this, I was thinking “Yeah, I totally get it. It’s a difficult tool to use”, because I was using it at the time and just getting to know it… But on the other hand, I was also becoming really enamored with the possibilities of that tool, and how it can really give answers to questions that no other tool in the Go ecosystem can answer. And so yeah, I was hoping that one day I could come to the Go Time podcast to kind of challenge this assumption that this tool is only for the desperate, and will not even help them… And also, a lot has happened since then. Michael has recently blogged about improvements to the execution tracer in his “More powerful execution traces” blog post, where he summarized a lot of cool stuff that has happened in Go 1.21 and 1.22. And so hopefully, today we can kind of dispel some myths about the execution tracer, and tell people how to use it efficiently.

So you didn’t just come alone to this episode to talk about this sentence. You actually brought Michael along to back this up - Michael, could you tell us about that blog post?

Yeah, sure. So the blog post was all about the stuff that has happened with execution tracing in the last year, which is a lot. All of that sort of started with Felix and his colleague Nick reaching out to me after I shared some document about possible directions - I don’t remember through which channel. We were thinking about and excited about tracing, and we were trying to figure out what other teams at other companies and in the Go community in general would be interested in seeing. And I remember getting this email from Felix and Nick that’s like “Yes, we want to do this. We’re willing to help.” And I was like “Great, we’re doing it.” And then that started this whole thing. I think this was like December 2022, or something like that.

But the history of execution traces goes back way further, to Go 1.5, where it was initially introduced. That came with a pair of things: an API for the runtime to generate the traces, and a tool, go tool trace, which would let you actually visualize and look at the different things that your program is doing. And that’s what’s sort of unique about execution traces - they’re the only tool that offers a view of what your goroutines are doing at any given point in time. And for different things that goroutines do you can see stack traces, and - this is sort of getting into the use cases a little bit, but the sort of killer feature of this is that you get to see things that aren’t happening. And I think that there’s a sort of magic with that. There’s a lot of assumptions we make about what our programs are doing, and there’s a lot of assumptions we make about how our programs will behave when we write them… But the reality is often different. And whenever I find myself using execution traces to try to solve a problem - from my perspective that’s usually something with the GC, some kind of latency issue where maybe the runtime is the culprit - it is really good at challenging my assumptions about what the runtime is doing.

And how does it fit the ecosystem? It’s one of the tools that come with Go, and… Any information you have on the usage - how often people use it, or any interesting bits of information about this at all, if you have any?

[00:05:49.27] Sure, sure. Well, so yeah, the tool comes with every standard Go distribution. So there’s the go command, like go build, go test, and if you type “go tool trace”, then that will give you the command to actually look at and analyze the traces. If you want to get a trace of your tests, you can use go test -trace. There’s the runtime/trace package for actually generating the traces… There’s net/http/pprof for basically setting up your HTTP server with endpoints that let you grab an execution trace on demand, from production or something… And there’s now a whole bunch of new experimental features that I can talk about later, as we move more into the future of execution traces, and talk about the work that has happened in the last year… Which I’m very excited to share, but I feel like I’ve gotta motivate first.
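For readers following along, here is a minimal sketch of the collection paths Michael lists above - runtime/trace in your own code, the net/http/pprof endpoint, and go test -trace on the command line. The file name, port, and duration below are purely illustrative.

```go
// Sketch: common ways to collect an execution trace.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/trace (among others)
	"os"
	"runtime/trace"
)

func main() {
	// Option 1: trace this process directly via runtime/trace.
	f, err := os.Create("trace.out") // later: go tool trace trace.out
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	// ... the interesting part of your program runs here ...

	// Option 2: with net/http/pprof imported, grab a trace from a running server:
	//   curl -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=5'
	// Option 3: for tests: go test -trace=trace.out ./...
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
```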

[laughs] Well, do motivate us.

Okay, very well. So the execution traces are great when they work, but the problem is, as Felix alluded to earlier, that they can be kind of hard to use. So in that blog post - the main problems were that traces had high overheads. So for example, if you wanted to leave tracing on, you probably couldn’t do so - and for more reasons than just that, which I’ll get to… You couldn’t really trace for more than a few seconds, or maybe use it as an ad hoc tool, because traces tended to have very high overheads. When they were first released, I think it was quite significant - it was like 450%, or something like that… And then over time, that was able to get down to 10% to 20%.

Which is round about now, with 1.21.

Well, actually with 1.21 it’s down to 1% to 2% thanks to work that Felix and Nick did in the 1.21 release.

I thought it was the 1.22 one. Okay.

Yeah, Go 1.21 is, I believe, where the low overhead tracing came in.

Go 1.22 had a whole bunch of other features that – but yeah, yeah, so tracing is now…

We’ll also talk about them. Yes…!

Tracing is now substantially lower overhead. So that was sort of the first problem, is people didn’t really want to use it; you couldn’t really use it at scale, or collect a lot of traces, because –

Or in production much…

Yes, yes. I mean, you could use it in production for like one-off things, but no more than like maybe a few seconds or a minute at a time, or something like that.

So the second big problem was that traces didn’t scale well. If you tried to collect a really long trace, and you tried to analyze it, there was a high likelihood that it would fall over. The Go tool trace command would actually probably crash your system by running out of memory, or something like that, because you’d have to load the entire trace into memory in its expanded form. And it would have to keep it all in memory while you’re analyzing it, and that was just way too much.

So in Go 1.22, what happened is that the underlying trace format changed. I won’t go too deep into the details; there’s a design doc, if you want to go into it. But the format is now scalable. Unfortunately, go tool trace doesn’t actually take advantage of that yet - it still reads the whole thing into memory… But we can now fix that, finally. The new format is still useful even so, and I’ll get to why in just a moment.

The next thing, which I was kind of alluding to - the next big problem, and I imagine this is really the core reason why people say that they can’t find the problem with traces, is that when you take a trace, what you really want to have a trace of is the bad behavior that you’re seeing. You’re reaching for tracing because something’s wrong with your program, and you wanna capture the thing that’s wrong with your program. The problem is, there’s often no way to catch it in the act. You can identify when a bad thing has happened in the past, but if you start tracing at that point, you don’t really get anything out of it. The trace is going to have good behavior, most of the time.

So either you just trace everything, which as we said before was not really feasible, or you just take a whole lot of traces and hope that one of them – [unintelligible 00:09:53.24] small traces and hope that one of them actually catches the issue. There are people that definitely used this technique, and used it well… But it didn’t really help with catching specific issues; it more helped with catching some issue that was happening in production.

[00:10:11.09] So to remedy that, now that the trace format is sort of scalable, and trace overhead is low, we had this idea for what’s called flight recording. So people in the Java ecosystem might be familiar with this idea… But basically, you have tracing on all the time, and you’re not writing out this massive trace, containing all of the information about what your program is doing. Instead, the runtime, or maybe in this case an external package just keeps the most recent trace information. Always. Just like the last five seconds. And every second or so it throws away one second, and stores the last second of information, something like that. And then at any point, you can ask it “Please give me what you have.”

So that means you can do really interesting things, like - say you’re debugging a really, really high latency RPC or request to your web service. You can just put an if statement – you can start this flight recorder, you can put an if statement in your program that says “Did this RPC take more than 500 milliseconds, or something like that? If so, dump the trace.” And what you get out of that trace is exactly what you wanted to know - what happened during those 500 milliseconds that made this particular request slow? So we’re kind of hoping that this is sort of entering a new era of tracing, where you can actually really use it to target problems in a way that you couldn’t before.
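Here is a minimal sketch of that idea using the experimental flight recorder in golang.org/x/exp/trace (the experimental package discussed later in the episode). The endpoint name, the 500-millisecond threshold, the handler, and the output file are illustrative, and since the API is experimental the exact names may have changed.

```go
// Sketch of flight recording with the experimental golang.org/x/exp/trace
// package; this is experimental, so names may differ in your version.
package main

import (
	"bytes"
	"log"
	"net/http"
	"os"
	"sync"
	"time"

	"golang.org/x/exp/trace"
)

func main() {
	// Keep a sliding window of the most recent trace data in memory, always.
	fr := trace.NewFlightRecorder()
	if err := fr.Start(); err != nil {
		log.Fatal(err)
	}

	var once sync.Once
	http.HandleFunc("/my-endpoint", func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		handleRequest(w, r) // hypothetical handler doing the real work

		// Was this request suspiciously slow? Snapshot what just happened.
		if time.Since(start) > 500*time.Millisecond {
			once.Do(func() { // keep it simple: dump at most one snapshot
				var buf bytes.Buffer
				if _, err := fr.WriteTo(&buf); err != nil {
					log.Print(err)
					return
				}
				if err := os.WriteFile("trace.out", buf.Bytes(), 0o644); err != nil {
					log.Print(err)
				}
			})
		}
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}

// handleRequest stands in for the real request handling.
func handleRequest(w http.ResponseWriter, r *http.Request) {
	time.Sleep(10 * time.Millisecond)
	w.WriteHeader(http.StatusOK)
}
```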

And sort of the last big problem with tracing was that traces are great and all, but go tool trace - and Go Trace UI, and I have to shout out Dominik Honnef, who maintains this Go Trace UI tool, which is an alternative to go tool trace… Maintaining it has been hard, because there’s no public API to actually parse traces. And the format, for various reasons, mostly efficiency, is quite complicated. And so it’s not for the faint of heart to just dive right into it, as some people have done. But now, in addition to this experimental flight recording, we also released an experimental API, so that anyone can parse traces and get something out of them. And it’s a streaming-based API, so it no longer requires holding the entire trace in memory… So we’re hoping that this will let people write their own ad hoc analyses, and learn what they want to about their applications. So that’s pretty much a summary of everything. If there’s anything you’d like me to clarify or dive deeper on, or anything, let me know.
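And a minimal sketch of what a custom analysis on top of that experimental parsing API might look like - streaming events from a trace file and counting them by kind. The file name is illustrative, and again, the API lives in golang.org/x/exp/trace and may change.

```go
// Sketch: stream a trace file and count events by kind, using the
// experimental golang.org/x/exp/trace parsing API (subject to change).
package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"golang.org/x/exp/trace"
)

func main() {
	f, err := os.Open("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r, err := trace.NewReader(f)
	if err != nil {
		log.Fatal(err)
	}

	counts := make(map[trace.EventKind]int)
	for {
		// Streaming: we never hold the whole trace in memory.
		ev, err := r.ReadEvent()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		counts[ev.Kind()]++
	}
	for kind, n := range counts {
		fmt.Printf("%v: %d\n", kind, n)
	}
}
```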

Break: [00:13:00.14]

There are some changes in the behavior, and some almost core changes in the way one should be using this… So I guess there were some best practices in the past, and I guess with all these changes, some of the best practices, or the most efficient ways of using this, have changed. What would some of them be?

So I wouldn’t even necessarily say that there were best practices in the past, because I think a lot of people didn’t really know how to use them.

Okay, we can say anti-patterns.

Yeah, sure, sure. There were anti-patterns you could avoid, but there weren’t really patterns for you to follow. Whereas - I agree, now the landscape is very different. Now if someone said they wanted to debug a latency issue, I would say “Try flight recording. Turn it on, write that little bit of instrumentation code to try to catch it in the act”, right? That trace is likely to hold some insights as to why your request took a long time.

The other thing I’d say is, you know, is there something that Go Tool Trace doesn’t support for you? Because it’s just one tool, and it can’t possibly support every single use case. Then maybe dive into the API, and write your own analysis, and publish it in the Go community. It feels like there’s a lot – there’s a much bigger opportunity for traces now than there used to be. And you’re totally right, I think now there are best practices, whereas previously there weren’t.

Are there some best practices you can recommend, or is this more like “Let’s see how everybody uses this, and kind of based on feedback from everyone we will be releasing that”?

Well, one thing I want to say is that a lot of the new stuff is experimental. So the core implementation, I’d say, is pretty solid and ready. There are, of course, things we’re still fixing, because nothing’s perfect… But for the most part, the core implementation is solid. The flight recorder and the trace API for parsing it and doing your own analyses - that’s definitely still experimental. It has not gone through the proposal review process; it’s not in the standard library, it’s in the experimental –

It’s under the X…

Yes, yes. x/exp, for now. And to some extent, we’re looking to see how people use these things, and getting feedback and understanding… And on the other hand, we do definitely have some directions in mind. I think flight recording is one of those things that just really – it’s kind of structured to tell you how to use it, which is that you have a problem, you turn on flight recording, and you make a little change to your source code to figure out when you want to get a trace of that problem. And best practices there - I mean, there’s the web service example I mentioned before, and I guess I don’t actually have another example off the top of my head… But yeah, we are looking to see how people will use it.

We’ll make a part two episode of this, sharing some of the insights, in a couple of months.

I hope so.

[00:18:08.22] Yeah, I have a few thoughts on this as well… When I go back to what maybe was talked about in episode 260, I think the problem with execution traces before all these improvements that Michael just went over was twofold. One was the sort of user experience and how the data was visualized, and I’ll go into how Go Trace UI has significantly improved the situation there… And the other one was the best practices for collecting the data. As Michael said, previously it was you use it locally, or maybe one-off in production, but then it was really hard catching it in the right moment. Now you can basically use it continuously, which is a game-changer for getting the data that you actually want.

But let’s talk about the experience you get when you get some data from this tool. By default, what you see when you type in “go tool trace” and you open the trace is a little web interface, and you have to click a link, and then it takes you to the actual visualizations that you want.

And that visualization, I think, to a lot of people doesn’t come naturally, because it kind of shows the world from the perspective of the Go scheduler, not from the perspective of a Go developer. So what you see is a list of procs, which - Michael is gonna hate me for this, but they’re like CPU cores. [laughter]

Michael’s holding his head in his hands…

Close enough…

Yeah, we can get into Ps and Ms and Gs in a second here, but… I think the first time you look at this, you should just think of each lane that you see - you see several lanes - as a CPU core, and each of the little rectangles or spans that you see rendered is when the scheduler is actually taking a goroutine and executing it on that CPU core. So that is maybe surprising to a lot of people, because they think of a goroutine’s existence as something continuous, and they think of the lifetime of a goroutine over time, including the time it’s on CPU and off CPU. But I think most users don’t really have a strong sense of when a goroutine is actually switching. What they also don’t have a strong sense of is that as the goroutine switches between on and off CPU, the timescales can be crazy. We’re often developing web servers thinking in tens or hundreds of milliseconds, but then you go into scheduling on the runtime and you’re talking microseconds, maybe sometimes nanoseconds… And so some things can be really, really short and tiny.

And also - yeah, the goroutines can migrate between CPU cores or [unintelligible 00:20:27.26] on the runtime. So it’s not what you would expect. What you would expect is one lane per goroutine, and then seeing what that goroutine does over time, including when it’s waiting for something for a long time… Which is honestly where most goroutines spend most of their time, is waiting on networking, or on channel operations, or on timers, or on mutex contention - hopefully not, but that’s exactly the kind of problem you can find.

And so I think the big change for the user experience was, like Michael was hinting at, Go Trace UI, a new third-party tool that’s trying to change this user experience by still showing you that traditional view, which is useful - and we can talk later on why that view is actually really, really good - but also a view where you see each goroutine in the timeline and what it’s doing. And when you record CPU samples at the same time as your execution tracing, you can even press S on the keyboard… I hope that’s still the shortcut; I haven’t tried for a while. Which shows you the stack traces. So it basically becomes a flame chart, where each lane of each goroutine sort of turns into a changing stack trace over time, and you can really see what a goroutine is doing, both on CPU and off CPU. And it’s a really neat way to understand what’s really happening.

And all of this at the end of the day - I think Go Trace UI was already started before the [unintelligible 00:21:42.25] and the parser that’s now available… But that sort of user experience is, I think, what was needed back in late 2022 to actually make the execution tracer more approachable for people, who might then also go back to the traditional view of seeing the [unintelligible 00:22:00.08] lanes and doing useful things with it, but it might just not be the best way to introduce people who are not that into the runtime internals.

I totally agree… Working mostly on the Go team, I’m embarrassed to say I don’t have very much experience with Go Trace UI. Everything I’ve heard about it has been uniformly positive. One thing I definitely agree with Dominik on, with Go Trace UI, is that go tool trace is pretty dated, and it could be better. There’s a lot of useful information in there, but I will say it is not the most user-friendly interface.

Someone on the Go team not long ago added a whole bunch of documentation into that little web UI that it gives you, which has made it substantially more approachable… But it’s basically just like a teeny-tiny guide that it pops open for you before you start looking at it, and it definitely doesn’t cover the full range of features… But it was certainly an improvement. So I definitely appreciate Dominik’s efforts in this direction. We’d like to improve the default experience, for sure, but it’s not something that has been prioritized so far.

Can you talk about some common use cases or scenarios where trace is really what you need, and it’s really useful? Maybe interesting things you saw, or maybe your go-to’s - both are interesting.

I think Felix is going to have the most interesting stories, probably… I spend a lot of time working on the Go runtime, and the Go GC and stuff… And so the things that I’m usually looking for inside traces tend to be different, and I think are going to be less applicable to sort of a general Go-using audience. But I will say, our team has been using traces for a long time to prove and disprove theories about latency issues, and to learn more about how the garbage collector – sorry, I’ve been saying GC; I meant the Go garbage collector. How the Go garbage collector executes… Because it is very much – it has its own goroutines that are doing things, and how that interacts and perturbs the user program, and stuff… And we’ve discovered so many issues; some things become clear as day.

If anybody recalls, in Go 1.14 there was a pretty big change in the Go scheduler, which was that goroutines could now – I’m trying not to get too technical here, but basically goroutines could get…

It’s the right podcast to do this. No worries.

Okay, fair enough. But basically, before Go 1.14 goroutines could only stop executing, or be asked to stop executing by the garbage collector, at the start of a function. Like, when a goroutine entered a function, or when it blocked on a channel, or… There was a whole bunch of situations where it would yield its time. And this would cause latency issues in a lot of applications - if, say, one goroutine didn’t call a function for a while; let’s say it was just encoding or decoding some really big thing, and it would just be in this tight loop, doing arithmetic, and nothing could stop it. So the work to make it possible to stop a goroutine basically at any time - the problems that it was solving were very much visible in an execution trace. And I think at the time Austin was the TL of the Go compiler and runtime team. They used the execution trace view to sort of visualize that “Hey, this isn’t actually just a latency problem, this is a throughput problem.” And you could see that this one goroutine was executing, the garbage collector was trying to stop all goroutines, and it stopped all of them except one. And so what that basically meant was that if your application was normally going along happily at, say, 16 cores, there was a period of time, maybe even milliseconds, where only one core was actually doing any useful work. And that’s not just a latency problem, adding milliseconds of latency to basically everything that’s happening at that point, but it’s also a throughput problem, because nothing else is executing. And that was super – I think the visualization in the execution trace was incredibly powerful to display that particular problem and to motivate fixing it.
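To make that concrete, here is the kind of loop that illustrates the pre-1.14 problem Michael describes: a goroutine making no function calls offered the scheduler no preemption point, so a garbage collector trying to stop the world had to wait for it. This is only an illustration; since Go 1.14, asynchronous preemption can interrupt a loop like this.

```go
// Illustrative: before Go 1.14, a tight loop with no function calls offered
// no preemption points, so a GC stop-the-world had to wait for it to finish.
package main

import "fmt"

func checksum(data []byte) uint64 {
	var sum uint64
	for _, b := range data {
		sum = sum*31 + uint64(b) // pure arithmetic, no calls: unpreemptible pre-1.14
	}
	return sum
}

func main() {
	data := make([]byte, 1<<26) // a large-ish input so the loop runs for a while
	fmt.Println(checksum(data))
}
```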

[00:26:22.05] Super-interesting things.

Yeah, so I think for me the most common case, and why we built it into Datadog, was - you have distributed tracing already, which is now a pretty common way for people to look at their request latency… And at some point you get to a span in your distributed trace where there’s a slow request, but it’s the last leaf span; it’s slow, it’s in your application, and you don’t know what’s going on. And so basically, breaking that down was one of the big motivations for us, and what we’re doing is showing a view similar to Go Trace UI, where we show one lane per goroutine, but we filter all the goroutines to just show the ones related to the request. So that was use case number one.

The second one, that I actually didn’t expect so much, was also throughput-related. It’s actually pretty common for people to have systems that read stuff from queues, like a Kafka queue, and they observe a lack of throughput - there’s a big lag building up, messages in the queue not getting processed, backlogs that they need to crunch through… But then they look at the resource utilization metrics of their application, and what they see is that the CPU utilization is actually low as well. So they’re not making progress on the workload, but their application is not working hard. And they’re like “Well, I pay the cloud provider for all this hardware. Why is it not doing the thing I want it to do?” And this is where execution tracing can really shine, because you look at the classical view of execution tracing, the one that’s scheduling-oriented, and what you’re looking for is all the gaps. Because you want this view to be packed; you want each lane to be full of goroutine execution. And if you see patterns of gaps, sometimes a pattern can tell you a lot. Sometimes you see execution, everything is busy, it’s super, and then you see a big vertical empty space of nothing happening, which means everything had probably piled up on the same thing, and now you can sort of click around and figure out what that thing was, and solve that throughput issue. So that is definitely another use case that I’ve seen quite a lot.

And last but not least, sometimes - again - the goroutine view can bring things home that you could already see through other signals. For example, if you have a goroutine pool that’s not sized correctly. So you have a goroutine pool that’s maybe making some outbound network requests, but that becomes a bottleneck for your application - especially in the scenario that I’ve just mentioned. You can look at the block profile, but honestly, it wasn’t a recommended profile type because of the overhead, which actually is still a problem, because we haven’t ported the fix for the execution tracer’s faster stack traces to the block profiler yet… But basically, what you’re looking for is you’re trying to figure out which goroutine pool, if you have multiple of them, is the bottleneck. And the way you can see it is that on the other side of the bottleneck goroutine pool, the goroutines sending to it will be blocked on send. So you’ll see a lot of contention on send operations on the channels. And on the other side, the outgoing side, you’ll see starvation on receive. And that’s something you can kind of see in the block profile, but it’s really a difficult profile to read, because it often also gets kind of dominated by benign receive waits… So you have maybe another goroutine pool that’s completely irrelevant to the problem you’re trying to debug - it has more goroutines than everything else, and now it’s just going to drown out that profile type with its receive operations.

And so it’s really something else to then look at that data with execution traces, where you can really see each goroutine and what it’s doing over time, and you realize “Hey, I have a pool of 50 goroutines here, but they’re spending 95% of their time waiting on networking, and 5% executing. No wonder I’m not making use of my CPU resources. I need to significantly increase this pool, so that not only can that pool do more work, but all the other pools on the left and the right side can as well.” So I think that’s – yeah, another pretty common thing.
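As a toy illustration of the pattern Felix describes (the pool size, sleep duration, and job count are made up): the undersized pool below spends almost all its time in a slow, network-like wait, so in an execution trace the producer goroutine shows up blocked on channel send while CPU utilization stays low.

```go
// Toy illustration of an undersized worker pool as it would appear in a trace:
// the producer blocks on channel send, the workers spend almost all their time
// off-CPU in a slow (network-like) wait, and overall CPU utilization stays low.
package main

import (
	"sync"
	"time"
)

func main() {
	jobs := make(chan int) // unbuffered: the producer blocks when all workers are busy

	const workers = 2 // deliberately too small for the workload
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				time.Sleep(50 * time.Millisecond) // stand-in for a slow outbound request
			}
		}()
	}

	for i := 0; i < 100; i++ {
		jobs <- i // in a trace: long blocks on chan send right here
	}
	close(jobs)
	wg.Wait()
}
```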

[00:30:05.06] And when you’re looking at results, and you want to go and interpret them, how do you go about this? Are you looking for patterns - you know how it should look, and you’re looking for what pops up as an anomaly? Or is there another way that you go about that?

Yeah, so the patterns definitely, both in the goroutine view, as well as in the scheduling view. Another thing that’s actually kind of neat, that we haven’t mentioned yet, is that there are at least two very interesting time series on top of go tool trace, when you look at the execution traces. The first one shows you the number of goroutines and the number of unscheduled goroutines. So when that graph fills up with the color of unscheduled goroutines, you know that a lot of goroutines want to run - because they’re runnable, but they’re not getting scheduled - so you can kind of look at what’s going on there. Usually it means you actually don’t have enough CPU resources. The question is why.

Sometimes you can actually see another pattern, where a garbage collection comes in, takes over some of the lanes where goroutines want to execute, and suddenly you get all these unscheduled goroutines on top. The other thing is garbage collection, where you get an interesting lane showing you what the current heap-in-use metric is, and what the target is, and then you see the two approaching, and garbage collection cycles kicking off… And to some degree you can see this with metrics, but metrics are often captured at a lower resolution, like tens of seconds or something, and so sometimes you don’t see those violent patterns of GC cycles happening and disturbing the application. I mean, you can still measure it and put it on a graph [unintelligible 00:31:29.09] per second, but I don’t know, something doesn’t quite drive that point home to me as much as actually seeing it in the application, and seeing the havoc that it causes.

I think part of why you don’t get to see that with metrics is I don’t think we – I don’t think there is a metric that exposes the classic GC sawtooth, where you allocate up, and then the GC happens, and then it drops. I don’t think there even is a metric that fully captures that, so that might be why. But the execution trace is [unintelligible 00:31:57.05]

And I wanted to stress the point that you made earlier in terms of patterns - looking at gaps is often much more important in execution traces than looking at the actual things executing. I mean, we put a lot of effort into having all these different kinds of events, and capturing them, and putting them up on display… And they’re incredibly useful, because when you’re looking at the gaps, usually you want to look at the edges of the gaps to see why the gaps are happening. But the gaps end up being the most important, right? Like, if you have a web service, and you think it’s running well, then all of the lanes in this sort of classic scheduler view that Felix was talking about really should be packed full. That’s how you know things are going well. It’s when they’re not packed full that something is wrong. And so looking for those gaps is maybe the most important thing that traces can provide.

What are some other things you will be looking for? Or kind of what we call –

I mean, personally, I always look at the GC metrics, because sometimes it can point at something unexpected. I mean, I’m sure many are familiar with optimizing your application – well, not everybody, but many people are familiar with optimizing your application by trying to reduce the number of heap allocations; basically, the amount of memory that counts toward garbage collection. You try to reduce the overhead of garbage collection.

And the latency impact of garbage collection is often hard to see, or hard to understand, I suppose… But in an execution trace it’s much more visible. I won’t go into the details, but there are mechanisms where goroutines might go off and do some garbage collection work for a little bit, and that is plainly visible in the trace. So you can see “Okay, actually, my latency issue seems to stem from a garbage collection issue. Maybe I should spend some time working on reducing the amount of memory that my program allocates”, and stuff like that. Or perhaps it suggests I should look at GOGC, or [unintelligible 00:34:03.26] and see what I can do there. Or perhaps it suggests I need to resize my instances. Like Felix was saying, if you see a lot of goroutines that are able to run, but are not running, it sort of suggests the server’s overloaded.

[00:34:21.14] Felix, do you have anything to add, or have we kind of covered all the ones that are your go-to’s?

I don’t know if it was mentioned already - system call behavior is something that is pretty interesting…

…because - well, first of all, system calls are implemented in an interesting way in the runtime, so that’s definitely something to watch out for… But yeah, in the execution traces you can see any syscall that’s being made. That has already been true historically, but since the work that Michael’s been doing, they now also show the duration of the system calls. Previously, you’d just get a tiny faint little line that shows a syscall happening, but you had no idea - is it a slow one? Is it a fast one? And it matters to the runtime quite a bit, because if a system call is long, then eventually [unintelligible 00:35:01.12] background goroutine will come along and decide that this operating system thread is now blocked in a system call, and that the runtime should detach it from the scheduler, and let the scheduler create a new thread to run goroutines on. Michael can correct me if I got this wrong so far, but – very interesting things are happening around syscalls and the runtime that are worth studying, and the new execution tracer can show these much better than before.

Yeah, totally agree. As someone that works on the runtime, being able to see that is super-important, because there’s - I don’t want to call it a heuristic, but I will say that handling system calls efficiently is not an exact science, and we’re kind of always thinking about ways that it could be improved, and being able to observe that from a runtime perspective is incredibly useful… But coming back to the thing I said before, about challenging your assumptions about how your programs run - being able to see how much time is being spent in syscalls is incredibly useful. You’ll see that a little bit in CPU profiles and stuff, but here you get kind of a latency-oriented view of the [unintelligible 00:36:06.18] syscalls, which is incredibly useful.

What’s also kind of fun is that the runtime treats cgo calls like sys calls, so you also get a lot of visibility into those kind of accidentally, but it can be very interesting as well, because cgo can often be involved in very interesting performance issues, because it’s not easy to use cgo correctly from Go.

Yeah, that’s one thing actually that - I think Dominik has filed an issue against us to not count cgo calls as syscalls, because there’s obviously some overhead to tracing, and cgo performance is one of those things that is always nice to have faster. You want the bridge between Go and C, and Go and other languages in general, to be as fast as possible. Today it’s okay, because it’s not that fast to begin with, but you could imagine that if one day it gets faster, then tracing could actually be a significant impediment, and we might want a different way to represent that in traces. But for today - yeah, it’s great.

Yeah, but don’t rely on that use case. [laughter]

Yeah. Felix, you have an upcoming talk about this at GopherCon Europe in June, and you will be talking about all those things, and also a little bit about frame pointer unwinding. Do you have some spoilers?

Oh, yeah. So the talk is called How to Win Frames and Influence Pointers, and it’s partially the story that we’re telling today, of sort of the collaboration on making execution traces better, but it’s also very specifically about making stack traces faster, also called stack unwinding… Because that was actually one of the things that was previously causing the 10% to 20% overhead in the execution tracer - 90% of that overhead was essentially taking stack traces every time an interesting event happens in the scheduler. And the way that was optimized was by implementing frame pointer unwinding, which is actually the simplest way to unwind. It relies on the compiler to put some instructions inside of each function so that every time a function is called, you essentially create a linked list that lets you traverse your stack. And along this linked list you find the return addresses, which are the program counters you need for your stack trace…

[00:38:19.03] The Go runtime had supported this for a long time, so my initial idea was “Oh, it’s gonna be great, it’s gonna be so easy. I just have to essentially write a for loop that traverses a linked list.” And actually, that part was really simple, but it turns out that there were some subtle issues on Intel, and some not so subtle issues on ARM, that prevented this from working correctly. Some of them had to do with literally influencing pointers when stacks get copied - when the Go runtime has to grow stacks, it has to adjust pointers on the stack… Because some pointers can be self – well, not self-referential, but they can point back to the same stacks that they are sitting on, and then when you copy those pointers to a new stack, they actually have to be adjusted. And so that is kind of what the second part of the title is hinting at - the work that was done to correct those issues and make frame pointer unwinding actually work well in the runtime.
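Conceptually - and this is a simplified sketch, not the actual runtime code - frame pointer unwinding is just the linked-list walk Felix describes: each frame saves the caller’s frame pointer next to the return address, so following the chain yields the program counters for a stack trace. The struct and addresses below are made up for illustration.

```go
// Conceptual sketch of frame pointer unwinding - not real runtime code.
// Each stack frame is modeled as a node holding the caller's saved frame
// pointer and the return address (a program counter in the caller).
package main

import "fmt"

type frame struct {
	callerFP *frame  // saved frame pointer of the caller ("previous" link)
	retPC    uintptr // return address: where execution resumes in the caller
}

// unwind walks the frame pointer chain and collects return addresses,
// which is essentially all a frame-pointer-based stack trace is.
func unwind(fp *frame, max int) []uintptr {
	var pcs []uintptr
	for fp != nil && len(pcs) < max {
		pcs = append(pcs, fp.retPC)
		fp = fp.callerFP
	}
	return pcs
}

func main() {
	// A fake three-frame stack: main -> a -> b (addresses are made up).
	mainFrame := &frame{callerFP: nil, retPC: 0x401000}
	aFrame := &frame{callerFP: mainFrame, retPC: 0x402000}
	bFrame := &frame{callerFP: aFrame, retPC: 0x403000}

	for _, pc := range unwind(bFrame, 64) {
		fmt.Printf("%#x\n", pc)
	}
}
```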

Generally, for the frame pointer unwinding - isn’t it that in Go everything is stored in registers, like the return address? Or is this with offsets?

Now we’re getting into the weeds…

Just a short question. [laughs]

Okay, the short answer is, generally speaking, most of what you would call a stack trace is stored on the stack; the return addresses should mostly be on the stack. But your first entrypoint is usually in a register, and depending on the architecture - if it’s Intel, or [unintelligible 00:39:39.15] architecture like ARM - then maybe a little bit more sits in another register… But yeah, that’s a little architecture-dependent. For the most part, it’s just on the stack, right at the entry side of the frame. Actually, I think for ARM it’s at the bottom of the frame, depending on how you look at it.

This brings a cross-question in my mind, because we have a future episode planned about how Go is great and easy for cross-compilations to different architectures, and like you’re saying, depending on the architecture… And I really want to ask about that, but we’re also like 45 minutes into the episode. Definitely adding this on the list of questions for part two, right after what are the patterns you have recognized.

But another short question, and probably the last question for this episode before we hop to the unpopular opinions - the previous question was “What do you do when you look at the traces?”, and you said you’re looking for specific patterns, you’re looking for gaps, you’re looking for specific fields and specific values… So I cannot help but ask, out of personal interest: are you doing anything like throwing pictures of the trace output at an AI - whatever your favorite LLM that does vision is - and saying “What would you do differently? What is the problem?” and so on? Any experiment like this, any thought like this?

“Not yet” is a good answer.

I would say not yet. I don’t even know where I would begin on that. I don’t know how one would come up with the corpus of data for this. I feel like every problem I’ve resolved with traces has been so different… And I don’t even know how I would begin with an AI trying to – like with the training set for an AI trying to tell me, or trying to tell the AI like “This was the pattern that caused this problem. And this was the fix.” Well, maybe I just told myself exactly how to do it right there. [laughs]

Yeah. You don’t need to do the training data. Throw it on an existing LLM, see how it behaves. Maybe it knows already something.

I don’t know. Felix, have you got anything?

The after episode conversation is going to be interesting. Yeah. Felix, anything to add?

[00:41:48.18] I have not thrown execution traces at an AI yet, because generally my experience has been that for things I’m very familiar with, the AI’s performance disappoints me, especially around performance work. But I have actually taken pictures of data analysis things I’ve done recently, actually related to profiling, where I was testing something with sample rates, and what a profiler could detect in terms of how often a function could be called… It’s impossible. But it produced a very interesting graph; it was doing a simulation, and it was producing an oscillating curve where the oscillations were getting further apart, and then eventually trended down… And I uploaded that to ChatGPT and I was like “Have you seen this pattern before?”, and it correctly described the pattern that I was seeing; it did not figure out where it had seen it before, but… Hey, 50% of the way there; it already recognizes what’s in the picture. And eventually, it will be able to connect the dots and give me the right connections. Because yeah, sometimes that’s all you need - connecting one area of knowledge with another one, and then you [unintelligible 00:42:45.07] and off you go. So I think the future is interesting.

Was it the free version?

No, no. I paid, and I got nothing. So I got a preview of the future. I got a preview of the future.

Yeah, that’s fair. Yeah. Interesting. Okay, cool. Also something to talk about in part two of this episode - see how the AI has developed since our conversation today. But until then…

Jingle: [00:43:14.07]

Okay, so unpopular opinion is where you say your unpopular opinion. It doesn’t have to be about tech, it can be about anything. And then we will make this into a Twitter poll, an X poll, and if many people disagree with you, then you reach the Hall of Fame. Who would like to start with their unpopular opinion?

Sure, I’ll go first. I was actually – I mentioned Austin’s name earlier, and I actually asked Austin, because I was struggling to come up with an unpopular opinion, and I stole this from them. My unpopular opinion is: let’s deprecate the entire runtime package. And I’m not talking about the garbage collector, or the scheduler - I’m talking about the public API of the runtime package. It has so many warts; so many ways to use it incorrectly. A lot of the APIs have replacements in other packages already. A lot of them should have replacements. I have the list open here, and I could just go off. I could just start talking about – maybe some of the functionality wouldn’t come back. Maybe that’s the unpopular part of this unpopular opinion… That some of it should just not come back. Though some of it is also a necessary evil, like runtime.GC, or runtime.LockOSThread, if anyone’s familiar with them… I think you would agree that they are necessary evils.

[00:45:01.01] So that’s several unpopular opinions. If we are to tweet one of them, which one do you choose? Is it just the first one that you said, that this should be deprecated altogether?

Yes. Let’s go with that.

Alright. Registered on the registry. Felix, you’re up. What is your unpopular opinion?

Yeah, [unintelligible 00:45:19.18] of referencing previous episodes - Matt Boyle recently shared his opinion that Twitter is still a good place for Go developers to hang out… And if your unpopular opinion on Twitter is that Twitter is great, you’re not trying hard enough. So my unpopular opinion is that while this might be true today, I think that LinkedIn is actually going to eat tech Twitter over the next few years, and that everything is going to move over there. I am not happy about it, but this is what I see happening.

Interesting. Why do you think that?

I noticed that engagement on Twitter has been trending down. I am not entirely enamored with the current ownership of Twitter, and I see LinkedIn just on a solid growth trajectory. They’ve been growing tremendously over the past decade, and I think they will continue to do so. It’s a weird place to hang out, but I’ve just seen more and more conversations happening there. I have far fewer people following me there than on Twitter, but when I post stuff there, I get at least a similar amount of engagement. So I just observe the trend. I’m not happy about that trend, because I think LinkedIn is weird, but… I think it’s happening.

Yeah. Can I interest you in a Java job? [laughter] Some embedded role…

You’ll do great on LinkedIn, tech Twitter.

Some of the weird messages people get weekly there… Yeah. Interesting. Let’s see how that plays out. My unpopular opinion I already happened to tweet and check… So I think it’s unpopular, because most people thought otherwise. When you are working from home, when you are alone, it makes sense to listen to music through your earphones, not through your computer speakers.

Why do you think it makes sense to listen to it through your headphones?

I think it’s because I had German neighbors for long enough… I’m now bothered by the thought of bothering other people.

You know, I kind of agree. I used to listen to music and play stuff through the TV just out loud for pretty much all my life. But recently, I’ve started to really enjoy playing games and watching TV in headphones, because I get a stereo experience that I don’t get to hear otherwise.

Yeah, the quality is –

Yeah, the quality ends up actually being a lot better. So I’m with you on that one, actually.

Yeah, I do the same, actually, so I’m not sure how unpopular this is gonna be at the end of the day.

My small sample group of Twitter followers said that it makes more sense on the speakers, but let’s see what this one says. Cool. Thank you very much for joining and talking about the tracing tool. That was interesting. All the updates were interesting. I hope the [unintelligible 00:47:56.12] flight tracing goes well and leaves the experimental stage, and we will speak about it joining the standard library next time. Until then, enjoy your… traces. May you have fewer anomalies. And thanks for joining.

Thanks for hosting.


Our transcripts are open source on GitHub. Improvements are welcome. 💚
