We’re once again exploring hacking in Go through the eyes of security researchers. This time, Natalie & Ian are joined by Ivan Kwiatkowski (a.k.a. Justice Rage)!
Sourcegraph – Transform your code into a queryable database to create customizable visual dashboards in seconds. Sourcegraph recently launched Code Insights — now you can track what really matters to you and your team in your codebase. See how other teams are using this awesome feature at about.sourcegraph.com/code-insights
FireHydrant – The reliability platform for every developer. Incidents impact everyone, not just SREs. FireHydrant gives teams the tools to maintain service catalogs, respond to incidents, communicate through status pages, and learn with retrospectives. Small teams up to 10 people can get started for free with all FireHydrant features included. No credit card required to sign up. Learn more at firehydrant.com/
Honeycomb – Guess less, know more. When production is running slow, it’s hard to know where problems originate: is it your application code, users, or the underlying systems? With Honeycomb you get a fast, unified, and clear understanding of the one thing driving your business: production. Join the swarm and try Honeycomb free today at honeycomb.io/changelog
| Chapter Number | Chapter Start Time | Chapter Title |
|---|---|---|
| 4 | 04:00 | Welcoming Ivan to the show |
| 5 | 04:26 | Getting to know Ivan |
| 6 | 05:20 | Reversing the SolarWinds attack |
| 7 | 07:51 | Famous Go malwarez |
| 8 | 13:21 | How compiled Go is unique |
| 9 | 19:53 | A primer on reverse engineering |
| 10 | 24:40 | Surprises while reversing Go |
| 12 | 32:30 | Do Go's idioms help reversing? |
| 13 | 35:26 | Top languages for developing malware |
| 15 | 41:09 | Tips for writing secure Go code |
| 17 | 46:37 | How Ivan got into Go |
| 18 | 50:40 | Tips for learning to reverse engineer |
| 19 | 55:29 | It's time for Unpopular Opinions! |
| 20 | 56:04 | Ivan's many unpops |
| 21 | 57:11 | Do app stores improve security? |
| 22 | 59:56 | Cyberspace regulation unpop |
| 24 | 1:03:04 | Outro + clip from episode #205 |
Hello, everyone who is joining us today. We’re recording on a Wednesday; we normally record on a Tuesday, but we have a very special guest, so we made a special event of it. Ian is my co-host today. Hi, Ian.
Hey. How are you doing, Natalie?
Good! I’m very excited to have Ivan join us today. Ivan Kwiatkowski, also known on Twitter as @JusticeRage. You are a senior security researcher at Kaspersky.
Yes. Hello, very happy to be here. Indeed. So I work in the threat intelligence field, and my daily work involves looking at malware and writing reports about it. Basically, what I do is try to figure out what the attackers are up to: what kind of tools they’re using, their methodologies, what types of victims they are after. Then we write about it. Our customers read our reports, which allows them to figure out whether a given group is likely to attack them, depending on what type of information that group is after, and if so, how they may defend against those attacks by knowing more about the malware the group uses, the attack vectors it typically favors, and so on. So really, I spend most of my day in IDA Pro, and sometimes I also give reverse-engineering trainings, either at universities or for our customers.
And there’s a very cool two-part video of you reverse-engineering a piece of malware, recorded about a year ago, and that malware was written in Go, actually.
And that was from the SolarWinds attack.
Exactly. This specific example comes from the SolarWinds incident, which I’m pretty sure most listeners will be aware of, because it had such a high media impact. To give a quick summary: a company called – I always get those mixed up; I think the name of the company is SolarWinds, and the product is Orion IT, but maybe it’s the other way around, right? I really get confused about this all the time.
I think the way you have it is right.
Okay, great. That wasn’t really a 50/50 chance there. Anyway, this company got attacked, but not for the information it had, since it was just a software company with little value as an intelligence target in itself. The thing was that it had a large number of high-profile customers: US government entities, and big companies in the field. What the attackers did was compromise the software build chain and insert their own code into the software that was then pushed to customers. Using this, they created a backdoor that was automatically deployed at all SolarWinds customers. This very stealthy attack had a long sleep time: it stayed dormant for maybe two or three weeks, to make sure it would remain unnoticed. After a while, it would start connecting to the command-and-control (C2) server, and the targets the attackers deemed interesting would receive a second-stage payload that allowed the attackers to get into the network and collect intelligence and whatnot.
So the very first stage of the attack was just a modification of the original program’s code, and this part was written in .NET. But the second stage, which is called SUNSHUTTLE, was actually written in Go. So for me it was the first time I got involved in reverse-engineering Go. The learning curve was a little steep, but I used this as a learning experience, and also as an example in later reverse-engineering courses for other people who might be interested in learning how to reverse-engineer Go programs. Also, I think if you are a Go enthusiast, reverse-engineering lets you learn more about how the language actually works under the hood, which is very interesting from a software development point of view too.
So that’s one famous example of Go malware. Are there other famous ones written in Go that you can think of off the top of your head?
[07:58] Yeah. So from the same incident, one of the companies that was breached through SolarWinds was Mandiant, which now belongs to Google. They were the ones who actually detected that something was wrong in their network and reported it… So kudos to them, really; great job figuring out that something was wrong. One of the things the attackers were very interested in was getting access to the toolsets Mandiant used for its own penetration testing and red-teaming engagements. And it so happens that those tools were actually written in Go, which I think is really interesting from an analyst’s perspective; there’s an interesting discussion to have about why they chose this language for their own offensive tools. There are also a number of other projects on GitHub; off the top of my head I can think of one called Stowaway, which has also been reused and modified by some threat actors…
We’ll add a link to that in the show notes. That sounds interesting.
Yeah, sure. It’s a networking tool: it proxies traffic in and out of a network and converts between protocols, that kind of stuff. It’s written in Go, and it’s pretty annoying to reverse-engineer, because it’s a lot of goroutines talking to each other, so it’s very hard to figure out how it’s architected.
And another example I can think of, though I’m not 100% sure: I do believe that a commercial backdoor called Brute Ratel, which is a big competitor, or maybe a new competitor, to Cobalt Strike, and which places enormous emphasis on evading detection and slipping through EDR solutions etc., is also written in Go; but I would have to double-check that. So these are examples of malware families written in Go, and I think that over time we’re going to see more and more of them.
Why do you think we’re going to see more and more? Is there a specific reason? You mentioned that they were hard to reverse-engineer… Is that part of it, or all of it, or…?
Yeah, there are a few reasons. The first reason is probably related to ease of use for the developers. I don’t mean that Go is easier to program in than other languages, but the fact that it generates statically-linked executables, self-contained binaries that do not need any additional libraries, is very convenient for attackers. They create their backdoor, they send it to the victim, or deploy it on the victim’s machine one way or another, and it just works. You don’t have to think about “Is this DLL present on the system, or do I have to pull in additional libraries?” etc.
So this makes running programs very easy on victim machines, where you do not control the environment. A long time ago, maybe 10 years ago, size was kind of a problem, because you could not send binaries that are two or three megabytes big to victims. If your attack vector is an infected PDF or an infected Word document, you could not really email a PDF that ends up being five megabytes, because back in the day it would be rejected, or the victim had a size limit on their mailbox, or a slow connection that would not be able to retrieve the binary. In Europe or the US, in the Western world, it used to be fine, but for victims in countries where internet access is not as good, it used to be a real issue for attackers. Now that internet connectivity is way better in most parts of the world, having backdoors that are 5, 10, maybe 20 megabytes is really not that much of an issue anymore, I think.
Then the second very good reason for using Go as an offensive language is that reverse-engineering it is difficult, which I will get back to, but also that all the standard tools we as defenders use to quickly figure out whether a program is malicious tend to break on Go. The reason for this, and it ties into why reverse-engineering Go is annoying for us, is that Go really does its own thing. The assembly it generates does not look like any other assembly. It’s not like C, C++ or Delphi, which tend to look like distant cousins, or even brothers in some cases. Go really does things its own way. Some automated methods of analyzing code statically, such as signatures, can be recreated for Go, but older tools that try to recognize specific patterns in code are not going to work, because the code generated by Go just looks like nothing you’ve seen before.
So that’s one reason… And the final reason reverse-engineering is really difficult for us is that the constructs generated by the Go compiler are very unfamiliar to us. The learning curve, I wouldn’t say it’s that steep… You mentioned, Natalie, that I released a few videos about it; I think by the end of those videos you can have a rough idea of how to approach these programs. So it’s not an insurmountable obstacle; it’s something you will eventually be able to figure out. But when you’ve been working on C-like code for ten years, learning something new is not something you are going to do easily, because you have your comfort zone, you have to discover something different, and maybe you don’t like doing that. And maybe you have ten easy malware samples waiting in your task list, and you are going to work on those first, because it will allow you to end your day earlier on Friday, right?
So you mentioned that there are assembly differences that make it hard to recognize… Are there any specific things you’ve learned about Go under the hood from that, things that differ from C, like how functions are called in the assembly, or something like that?
Yeah, absolutely. So one of the major differences is not really about the assembly itself, it’s about the static aspect of the executables. All the functions are pulled into the final binary, so you have this big program that’s two or three megabytes just to print “Hello, world”. It’s getting a bit better now. I think IDA Pro has made significant improvements in its later versions… But maybe two or three years ago, when you opened a Go program, nothing at all would be recognized. Maybe you could pull in a few plugins here or there, or Python scripts that may or may not work… And if you were lucky, you might have been able to create signatures for the well-known functions and start from there, but it was really a huge ordeal. Now it’s a bit better.
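To get a feel for how much of the runtime and standard library ends up inside even a small Go binary, one could count function symbols per package. The sketch below is our own illustration (`pkgOf` and `countByPackage` are hypothetical helper names): it reads the ELF symbol table with `debug/elf` and groups function names by their package prefix. It assumes an unstripped Linux binary; builds made with `-ldflags="-s -w"` drop this table, and recovering names then means parsing Go’s own `pclntab` metadata instead, which is out of scope here.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"sort"
	"strings"
)

// pkgOf returns the package path prefix of a Go symbol name such as
// "runtime.mallocgc" or "net/http.(*Server).Serve".
func pkgOf(sym string) string {
	slash := strings.LastIndex(sym, "/")
	dot := strings.Index(sym[slash+1:], ".")
	if dot < 0 {
		return sym
	}
	return sym[:slash+1+dot]
}

// countByPackage tallies symbol names by package prefix.
func countByPackage(names []string) map[string]int {
	counts := make(map[string]int)
	for _, n := range names {
		counts[pkgOf(n)]++
	}
	return counts
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: pkgcount <go-elf-binary>")
		os.Exit(2)
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	// Symbols returns elf.ErrNoSymbols for binaries stripped with -s.
	syms, err := f.Symbols()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	var names []string
	for _, s := range syms {
		if elf.ST_TYPE(s.Info) == elf.STT_FUNC {
			names = append(names, s.Name)
		}
	}
	counts := countByPackage(names)
	pkgs := make([]string, 0, len(counts))
	for p := range counts {
		pkgs = append(pkgs, p)
	}
	// Largest packages first.
	sort.Slice(pkgs, func(i, j int) bool { return counts[pkgs[i]] > counts[pkgs[j]] })
	for _, p := range pkgs {
		fmt.Printf("%6d  %s\n", counts[p], p)
	}
}
```

For a Hello World built with plain `go build`, one would expect packages such as `runtime` to dominate the listing, with `main` contributing only a couple of entries.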
So at least you are now starting to get, pretty reliably, all the references to all the known functions. Beyond this, the Go calling convention is – well, I’m not going to say it’s weird, because it’s as valid as any other one; it’s just not the one we are used to seeing. The main difference is that since Go functions can return multiple values, you cannot use the same system as before. For instance, in a C program the return value goes into EAX, and that’s it. I mean the EAX register of your CPU. In Go, if you have three, four or more return values, typically a value plus an error object, if I’m not mistaken, then you cannot put all that into a single CPU register; it just doesn’t work. So in the past, all the arguments were passed through the stack, not through pushes, but by moving values directly into the stack.
So there were no PUSH instructions, and disassemblers and automated analysis tools just like to see push, push, push, and then call. That’s something that is easy for them to recognize. But Go would just do “Move this onto the stack, at this offset; move that onto the stack, at that offset”, and then, when you get into the other function, it knows, because the compiler knows where the data ends up. So it figures it out. But IDA Pro looks at this and goes, “What the hell is this? This memory has never been initialized before. I cannot show this to you.” And the return values were passed back exactly the same way: the program would move all the return values onto the stack as well, at places it knows how to find later. But when you look at it in IDA Pro, you see values being moved onto the stack, you go back into the calling function, and you see references to the stack there as well, but the offsets are different, because when you return from a function, things have shifted a little bit, so the offsets no longer match… So this is another issue you have to face: figuring out where your return values go. It still is, by the way, a terrible nightmare.
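The multiple-return-value situation is easy to reproduce. Below, `divmod` (a hypothetical example function) returns three results where C would have a single return register; `//go:noinline` keeps the call from being inlined so it stays visible in the output. Compiling with `go build -gcflags=-S` prints the assembly the compiler generates, and `go tool objdump` shows it for a built binary. Under the older stack-based convention the results are written into slots in the caller’s frame, as Ivan describes; with the register-based internal ABI used on amd64 since Go 1.17, the Go ABI specification assigns integer results to RAX, RBX, RCX, RDI and so on, in order. The exact listing varies with Go version and architecture, so treat this as something to experiment with rather than a reference.

```go
package main

import (
	"errors"
	"fmt"
)

var errDivByZero = errors.New("division by zero")

// divmod returns three results: quotient, remainder and an error.
// A C function has one return register (EAX/RAX); Go must hand all
// three values back to the caller somewhere it can find them.
//
//go:noinline
func divmod(a, b int) (q, r int, err error) {
	if b == 0 {
		return 0, 0, errDivByZero
	}
	return a / b, a % b, nil
}

func main() {
	q, r, err := divmod(17, 5)
	fmt.Println(q, r, err) // 3 2 <nil>

	_, _, err = divmod(1, 0)
	fmt.Println(err) // division by zero
}
```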
[16:18] And finally, there is another key difference… Usually the C compiler and similar compilers reserve some space on the stack for specific local variables, and this tends to be very reliable; it doesn’t move much. So when you have a variable in C that gets used in some part of the program, it sits in one place on the stack, and that’s it. If the program needs another local variable later on, another space is allocated for it on the stack. The Go compiler tends to be very smart about these things. If it sees that there used to be a variable at some place on the stack and it’s not used anymore, it considers it totally okay to reuse that space to store something else later, which makes total sense. I mean, do not use more memory than you need to, right? The Go compiler is totally right to do this. But for me, it’s really a problem, because what I do in IDA Pro is figure out where the local variables are on the stack and name those positions: “Okay, this is the error variable, this is the integer that represents an iteration count”, or whatever. I name or rename everything I can, and eventually stuff starts to make sense, because I know what represents what on the stack, what the variables are, etc. But if one position on the stack does not consistently represent a specific variable, then I cannot rename things anymore, right? There’s just no way for me to do this with the tools we have, such as IDA (and I’m pretty sure Ghidra works the same way); they’re not going to let me say “Okay, up to this point this variable should have this name, from there on it should have another name, and then yet another” etc.
So this is a very, very difficult thing for us: tracking down variables, return values, even arguments, is extremely complex… And this is the normal flow of analyzing a program: you try to figure out what the variables are, you look at the functions, how they are called, what they return, and that kind of stuff. Doing those simple things, which should be the basic operations and building blocks of understanding what is going on in any program, becomes an extremely complex operation in itself because of the optimizations performed by the Go compiler.
Now, the last thing I can mention is that since version 1.17 or so, the Go calling convention actually changed, and it does things even smarter now: it passes some arguments through registers instead of the stack. For me, it doesn’t change that much. Actually, it makes things a little bit easier, because at least I know argument one is [unintelligible 00:18:50.27] from memory. It might not be that one, but generally it’s going to be in a fixed register, at least for the first two arguments, so I know where they are. That’s way better. But overall, this doesn’t change the bigger game of renaming things, which is still not possible.
And then there’s the quick and easy mode: getting my super-expensive IDA Pro license that comes with a decompiler, opening a program, pressing F5, and hopefully reading whatever is going on in the program. Well, that just doesn’t work, because the constructs generated by the Go compiler, especially for function calls, are totally alien to IDA, and every time you try to decompile Go code, you end up with something that makes absolutely no sense… Because again, IDA tries to recreate pseudo-C code, and pseudo-C has no way of representing concepts like multiple return values, or that kind of stuff. So this is how Go breaks everything we hold dear in the reverse-engineering world.
[19:53] For anybody who didn’t watch the video or is not familiar with reverse-engineering, I can say in simple words that, roughly, you look at the instructions and try to see: the entry point is usually main, so this is probably function main, this is something that gets returned, and then you try to follow that… Basically, this is what you do when you reverse-engineer.
Yeah. Actually, maybe I can say a few words about what reverse-engineering is for people who might not be familiar with it. The general idea is that we try to understand what a program does even though we do not have access to the source code. This is the typical case for malware, because we cannot call up malware authors and tell them, “Okay, please show me the code, because I don’t really understand what’s going on in there.” We don’t know where they are, they don’t want to be found, and they don’t want to give us their code anyway. So we have no other solution but to look at the program, see what instructions it sends to the CPU, and, based on those CPU-level instructions, try to figure out what the higher-level code that generated them might have been. So it’s not entirely a guessing game; it’s mostly an exact science… But it’s also a very unnatural operation to perform, because this CPU language was really made for machines, and for us humans it’s extremely difficult to understand. It’s really not natural for human beings to read those instructions. They don’t make sense to us, and it takes a lot of effort to figure out what the programmer’s intent was just by looking at them. This is actually why we are looking for reverse engineers. I mean, not just at Kaspersky; the whole industry is looking for people who are able to do this, because it’s something most people find unpleasant, and I have to say, I find it unpleasant most of the time myself… But at the end of the day, when I am able to figure out what was actually happening in the program, I feel very good about myself, and that’s why I still do this job.
But overall, this is kind of a difficult thing to do, and it’s kind of painful, and it takes a lot of time to be able to figure out even the simplest programs.
Especially when the tooling is not even there for you.
Just for some reference, the ratio of lines of, say, Go to lines of assembly: do you know what that ratio is? Just roughly… 1 to 100, 1 to 1,000?
It’s a good question; it would depend on the complexity of the line. In Go I’m pretty sure you can chain function calls together into long lines. I’m not sure if that’s compliant with the official Go style guide… But let’s take it the other way. If you have some normal-looking Go code, like a Hello World or something like this, one line would probably translate into 10 or 15 lines of assembly. So I’d say the default would be about 15 lines of assembly for one line of actual Go code. If you get into more complex lines of code, with multiple return values or function calls, this can get a bit bigger… But it’s still going to be the right ballpark.
Okay. Yeah, that gives me a good idea.
What is it for other languages? Is it a lot more? Is it a lot less? Is it roughly the same?
I would say it’s probably going to be mostly the same. C++ tends to be very [unintelligible 00:23:09.17]; it’s very comparable to Go. C might be a bit more direct; how would I say it in English…? The correspondence between C code and assembly is going to be a bit more direct. That’s it. But otherwise, I would say this is a common ratio across languages. The problem is not that Go generates more assembly; the problem is that the assembly it generates is not what we are used to seeing, and we don’t like that.
It will be interesting to see whether, one or two years from now, it’s better supported, with more pattern recognition working…
Well, that’s the thing, right? It kind of depends on the attackers. If we do end up seeing more and more Go tools in the wild, there’s going to be pressure on the tool authors, like IDA, Ghidra etc., to implement better detection and better support for the language. I’m pretty sure that since the last time I tried using a decompiler on a Go program, IDA has made improvements, and it’s probably not as broken as it used to be. But if we keep seeing offensive tools written in Go, then I’m pretty sure the tools will get better.
[24:16] We will still have to figure out how the Go assembly works, especially if it changes again in the future… But overall, at least the support in the last years has improved tremendously, and I think it will continue to do so also in the future, if there is a need to. And I would guess that Go is only going to become more prevalent when it comes to offensive software.
Because of all the reasons that you mentioned.
Some specific questions… You were kind of thinking out loud about the behavior you saw in IDA Pro when you were looking at the Go binary you had loaded there… So I’m going to describe two things that you mentioned; tell me whether you think they’re good or bad, and how they compare to other languages… This is an interesting point, and it can get too deep, so we’ll try to keep it at a slightly high level for everybody who is hearing about this and is not very familiar with it… For example, you mentioned that stepping to the next CPU instruction lands you in a completely different place in the code.
Yeah, exactly. This was super-surprising to me. When I reverse-engineer programs, we can look at them statically in IDA Pro, which means you display the instructions and read them like a book… Or there is another approach, not really the opposite, more a complement to it, which is to look at the program inside a debugger. Debuggers work exactly the same as in the software development world: you execute the code instruction by instruction, or line by line, and you can see the state of the various variables. Except that we don’t have the source code, so it’s not lines of code, just assembly instructions. But we can still watch them execute one by one and see the CPU registers getting updated etc. When I was doing this with Go programs, I was very surprised to see that sometimes I would step from one instruction to the next and end up at a totally random place somewhere else in the program.
Eventually, by doing some Google searches etc., I figured out what it was. I don’t know if the Go scheduler is involved there too, probably it is, but there is a garbage collector in charge of freeing the variables that are not used anymore. Sometimes it takes priority and starts freeing stuff, and once it’s done running, it takes you back where you were in the program. This is super-jarring for us as reverse-engineers, because we are looking at a very specific place in the program, frowning, looking very concentrated and focused (because we are), looking super-serious. Then we press F7, we step into the [unintelligible 00:26:43.12], and suddenly we end up somewhere totally different, even though we didn’t see any jump instruction. It’s like, “Oh, something is going on. What’s happening with my program? It’s not supposed to just go somewhere else.”
Now, once I figured out what was going on and understood that I just have to get out of this garbage collector function and it will take me back exactly where I was, things were fine. But initially, it was another one of Go’s idiosyncrasies that felt super-alien to me. I wasn’t happy about it at first.
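A small experiment, assuming a recent Go toolchain, that shows the collector as a separate piece of runtime machinery with its own triggers rather than something the program’s own code calls. `debug.SetGCPercent(-1)` (equivalent to running the binary with the environment variable `GOGC=off`) stops automatic collections, so allocating memory no longer advances the `NumGC` counter, while an explicit `runtime.GC()` still runs a cycle. Whether disabling the collector this way is a practical aid when stepping through a sample in a debugger is our assumption, not something discussed in the episode, and other runtime components, such as the scheduler Ivan mentions, can cause similar detours.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink keeps allocations reachable so they are not optimized away.
var sink [][]byte

// numGC returns the number of completed GC cycles so far.
func numGC() uint32 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.NumGC
}

// gcCounts disables automatic collection, allocates, then forces one
// collection, returning the completed-cycle counter at each step.
func gcCounts() (before, afterAlloc, afterForced uint32) {
	old := debug.SetGCPercent(-1) // same effect as GOGC=off
	defer debug.SetGCPercent(old)

	before = numGC()
	for i := 0; i < 64; i++ {
		sink = append(sink, make([]byte, 1<<20)) // 64 MiB in total
	}
	afterAlloc = numGC()

	runtime.GC() // an explicit collection still runs with GC disabled
	afterForced = numGC()
	sink = nil
	return
}

func main() {
	b, a, f := gcCounts()
	fmt.Printf("before=%d afterAlloc=%d afterForced=%d\n", b, a, f)
}
```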
So that means it’s not a behavior that you see often in other languages…
Oh, no, it’s something I had never seen before. I know other languages have their own garbage collectors, but with Java, we don’t really have to look at the instructions, because Java is compiled to bytecode. So we just read the disassembled or maybe decompiled code and get access to something that looks like the source code. It may be obfuscated, which means it has been modified so that the variable names are not there anymore, or it has been specifically engineered to be harder to read… But even then, for .NET or for Java, we never have to worry about CPU instructions, because they are not that relevant to the language. So Go was a big surprise for me on that level, because it was the first time I was debugging a program and got taken somewhere far away without even asking for it. And it happens on a regular basis, too.
[28:05] And then one more question about another peculiar behavior you pointed out… At some point, you had two consecutive function calls using the same variable, and you didn’t see the return value being moved, because one call came right after the other.
I’m not sure I remember exactly the part you’re referring to… But what I noticed is, yeah, this might be another way the Go compiler is being very smart: if you have chained function calls, it turns out, I think, that the place where one function’s return values end up on the stack happens to be exactly where they are expected as arguments for the next function. So you don’t really see the data moving back and forth between the functions; you just have chained calls, and the compiler knows that whatever was returned is already in the right place for the next one, etc.
So this is one of those things we are used to seeing: we see a function call, and we look at what goes in and what comes out; this helps us understand what is going on. With Go, sometimes you just don’t see that, because it’s hidden from you. The complexity is still there, but all these operations are masked by the way the Go compiler lays out the stack… Which, again, is a super-good thing for Go programmers, because it means the program doesn’t perform memory movements that aren’t actually useful. And every memory access in a program takes a lot of time. I mean, not a lot compared to our human existence, but if you look at how a CPU works, the CPU has some storage inside of it, called the registers, and then you have the RAM. When you allocate memory in a C program with malloc or calloc, it goes into RAM. When you move something onto the stack, that is also a region of memory inside the RAM stick of the computer. Every time the CPU has to talk to the RAM, an electrical signal goes from the CPU through a bus on the motherboard, the specific region of data is requested from the RAM sticks, and the response comes back the same way. So it’s pretty fast, of course; it’s probably in the ballpark of nanoseconds… But compared to the CPU just talking to itself, moving stuff inside the physical area of the CPU, or not moving things at all because they are already in the right place, you get performance improvements that I think are pretty significant, especially considering the number of function calls in a program.
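Go even has a source-level form of this chaining: per the language specification, if the results of `g` are equal in number and individually assignable to the parameters of `f`, then `f(g(x))` passes all of them straight through. The sketch below uses two made-up functions, `split` and `joinSwapped`, to show it. In compiled code this is the kind of call pair where the results of one call can flow into the next with little visible data movement; how much shuffling actually remains depends on the ABI and compiler version, so this is illustrative rather than a guaranteed assembly shape.

```go
package main

import (
	"fmt"
	"strings"
)

// split returns two results: the text before and after the first "=".
func split(kv string) (string, string) {
	k, v, _ := strings.Cut(kv, "=")
	return k, v
}

// joinSwapped takes two parameters whose types match split's results.
func joinSwapped(a, b string) string {
	return b + "=" + a
}

func main() {
	// The results of split feed joinSwapped directly, with no named
	// temporaries in the source.
	fmt.Println(joinSwapped(split("user=ivan"))) // ivan=user
}
```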
It’s very interesting to hear about this from the perspective of somebody who’s kind of poking this out from the outside…
No, this makes me want to dive more into reverse-engineering, just to learn more about the internals.
So let’s maybe move to a bit of a higher level now. The Go community is big on consistency; we have linters that keep everything consistent, gofmt keeps everything consistent… Does that actually help with reverse-engineering at all, having only one way to do things? Or, at the level you’re doing reverse-engineering, do you think it doesn’t matter?
It’s a good question. I have to say, I don’t know that much about the linter itself. I have written a bit of Go code myself, when I was looking at assembly code and trying to write Go that would generate the same thing. So that is the extent of my experience with the language, and I really noticed something, which is that the Go language is super-strict. I have, in the past, used an expression - maybe it’s going to make you laugh… I was saying that in Go, if [unintelligible 00:33:13.22] return values, then the compiler complains. If you have unused variables, then it complains again, right? And I was saying that to me, Go feels a bit like fascist Python; it doesn’t let you do anything you want unless it follows the rules very strictly.
For us, it doesn’t matter too much, in the sense that those checks are enforced at the compiler level, right? If the code is not compliant, then you will not get a binary at the end. So it does not add anything extra inside the binary. And also, if there were some variable that is unused inside the program, then as reverse-engineers we would not care, right? We would just consider that it’s not used anymore, or that the programmer doesn’t need it for whatever reason, and we would just move on.
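A small illustration of why these checks leave nothing behind in the binary: they are compile errors, so the source has to satisfy them before any machine code exists. In this sketch (the function is hypothetical), writing `for i, x := range xs` and never using `i` would be rejected with a "declared and not used" error, so the unneeded index is discarded with the blank identifier `_` instead:

```go
package main

import "fmt"

// sumValues is a hypothetical example. range yields an index and a value;
// naming the index and then never using it would fail to compile, so it is
// discarded explicitly with the blank identifier.
func sumValues(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	fmt.Println(sumValues([]int{1, 2, 3})) // prints 6
}
```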
[34:01] So for us, it doesn’t really change that much, although knowing about those guarantees allows us to make more informed guesses about what is going on in the program. For instance, when I see a function that returns multiple values, then, even though I am not a Go developer, I am always going to assume that the last value returned is going to be the object; or the first one, I don’t recall, I will have to check. But I know that this is the normal way people are supposed to write Go code, and that the compiler is going to force people to do it even if they don’t want to, so I can probably base my hypotheses on those conventions, which is actually pretty helpful in that regard.
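For reference, the idiomatic Go convention puts the error last, after any ordinary results, and callers check it before trusting the other values. A minimal sketch with a hypothetical `parsePort` helper:

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parsePort is a hypothetical helper that follows the usual Go convention:
// the ordinary result comes first and the error comes last.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, err
	}
	if n < 1 || n > 65535 {
		return 0, errors.New("port out of range")
	}
	return n, nil
}

func main() {
	// Callers check the last value before trusting the first one.
	if port, err := parsePort("8080"); err == nil {
		fmt.Println("port:", port)
	}
	if _, err := parsePort("70000"); err != nil {
		fmt.Println("error:", err)
	}
}
```

When reading a disassembled caller, this convention is what makes the "second return value is nil or not" branch right after a call a good first guess for error handling.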
So would you say that Go is a good language to pick up for a hacker, or for a researcher in security?
Well, I’m not really in the business of helping attackers be more efficient at developing offensive tools… But if I were, then yes, I would guess that Go is probably a good language to pick up. Basically, anything that is away from the traditional languages is going to be more annoying for us, because we’re less used to it. I think Rust is going to be a good choice as well. I haven’t looked at Rust too much myself. I have a coworker who did, and also recent videos… And from what he’s saying, it’s like C++, but harder, which is kind of a high standard to beat. So yeah, Go and Rust would be my advice there… Although it’s not advice; please, don’t.
So if those are kind of the new school ones, Go and Rust, historically, what languages has everyone used on the hacking side and on the research side?
Well, historically, everything has been used. You know Murphy’s law, which says that if there is a way to misuse something, then it’s going to be misused, right? And programming languages have proven that law time and again. The thing is, we are on the receiving end of whatever the hackers are doing, right? We do not get to choose what we are going to work on. Hackers are going to write their tools, and they’re going to choose whatever language is familiar to them, or whatever language feels comfortable. And this is why we sometimes end up facing the most ridiculous stuff, like malware written in AutoIt; I don’t know if you know about this… It’s a weird scripting language that is used for UI testing, and it basically allows you to simulate keystrokes and mouse clicks. Well, it turns out people write malware with this as well. Anything that has ever been available as a programming language has, one way or the other, eventually been used for malware.
So the thing is, this is our bane as reverse-engineers, which is that we receive malware, and whatever it is, we have to work on it… Because at the end of the day, our job is to figure out what was going on in that specific incident. And so whether it’s C, or C++, or it’s Go, or Delphi, or Pascal, whatever… Erlang maybe… I’m pretty sure there’s an Erlang malware. Whatever we receive, we have to work on, and so we cannot really afford to be picky about what languages we get interested in. We just have to be able to adapt to whatever comes, because everything will come eventually.
So you just mentioned right there, your research is on whatever hackers leave behind, let that be malware, or whatever. What other things do people leave behind? Is it just the actual binaries? Or like, are you digging into logs, and other things?
Yeah, so in a typical incident scenario, you would have people who go into what we call forensics mode; they will collect all the logs, they will collect all the hard drives and try to figure out exactly what happened inside the network. They will collect not just machine logs but DNS logs, they will collect whatever events were generated by the Windows machines, they will collect whatever was saved by the HTTP proxy, and so on… All the NetFlow, if it’s available… Usually, it’s not. Usually, not that much information is actually available in case of an incident. But that’s someone else’s problem. I’m not an incident responder, and I have enough stuff to worry about. What I focus on is the actual malware. We do have information through the Kaspersky antivirus that tells us about the execution context… So we can see that, “Okay, this process launched this process”, etc. So we have this type of information. But in a bigger incident context, you would get a much clearer picture of everything that went on in the victim’s network. And this whole trove of information would allow you to reconstruct the whole timeline of the incident.
[38:23] So you would see that, you know, at this time, you had some suspicious request on some web frontend, and then you’d see that a file was created later on the same web server, and then you would maybe see some weird, suspicious request to the Active Directory server, with a golden ticket forged with Mimikatz or something. Those kinds of lateral movement methods, etc. And at the end of the day, somewhere, some attacker would have to drop some binaries to help them either persist on the victim machine or get further into the network… Because they will try to do whatever they can without deploying anything. Some very careful attackers will not deploy anything on disk, and they will just deploy whatever program they need in memory… Which is very stealthy, but if the machine happens to reboot, then everything that was in memory just goes away… And so if you have no way of coming back onto the victim’s machine, then all the access you have deployed is lost. Some very stealthy attackers will decide that they would rather lose access than leave forensic traces on the hard drive. Most of them, like 90%, 99% of them, feel like they would rather leave some kind of trace, knowing that most people don’t look anyway, and so they leave stuff for us to analyze later, if we figure out that there was an incident and someone goes there, collects everything and sends the binaries back to us.
You said the incident response teams are the ones that collect all that data, and all of that…
Yeah, exactly. So we do have such teams at Kaspersky, but most cybersecurity companies will have either their internal incident responders…
…or a contractor that they work with often, that can be called at any hour of the day or the night, and that will come and just, exactly, swoop in with the big guns if something weird took place. Now, it doesn’t mean that we do not work in direct interaction with those teams. It means that this is their job, and we are more the back-office guys; we get some stuff escalated to us, and then we look into it.
But most of the intelligence that we create doesn’t actually come from incident response cases. I think it would be a good idea if we were able to gain more information from that source as well; I think it’s a very valuable one. But we work mostly on the telemetry collected by our antivirus: all the samples that are suspicious or that are uploaded to the cloud for analysis. And then we can also swoop in, but much more quietly, look at all this data and say, “Okay, this looks interesting, because we’ve never seen this before”, or “It looks like some malware that we saw 10 years ago and haven’t seen since, and it has some modifications.” And then we are interested in what happened since then. So our work tends to be a bit disconnected from the actual incidents, and much more focused on looking at the big data lake that we have and trying to understand what is relevant inside of it.
That’s cool. Thanks for that insight.
From the other side of this equation, what are some tips you can give for writing secure software for people who do Go? Or in general, if it’s not specific to Go, it’s also useful.
Yeah. I think one of the main appeals of Go is that you don’t really need to think about security as much as with other languages. Go is a memory-safe language, unless I’m mistaken, and it is never going to let you do stupid stuff, like create an array that is too small and then write stuff past the end of it. It’s just not possible. So it eliminates a whole class of bugs, which we call memory corruptions; you cannot cause them yourself in Go. And it means that all the old-school buffer overflows that have plagued C and C++ programs for decades are just not going to happen in Go. It doesn’t mean that the program is going to be perfectly safe from any security issues, but the issues are not going to be of the kind “Oh, I made a programming mistake, and this bug in my program is going to be exploited.” They are going to be more related to design issues… A memory-safe language does not help you implement a secure authentication scheme, for instance; it doesn’t help you write a well-thought-out network protocol.
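A minimal sketch of the memory-safety point: where a C program could silently write past the end of a buffer, Go checks every slice index at run time and panics instead. The helper `writeAt` is hypothetical, and it uses `recover` only so the demo can report the failure rather than crash (code that imports `unsafe` can still opt out of these guarantees):

```go
package main

import "fmt"

// writeAt stores v at buf[i]. An out-of-range index triggers a run-time
// panic instead of corrupting neighbouring memory; the deferred recover
// turns that panic into an ordinary error for this demo.
func writeAt(buf []byte, i int, v byte) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("refused write: %v", r)
		}
	}()
	buf[i] = v
	return nil
}

func main() {
	buf := make([]byte, 4)
	fmt.Println(writeAt(buf, 2, 0x41)) // <nil>
	fmt.Println(writeAt(buf, 8, 0x41)) // refused write: runtime error: index out of range [8] with length 4
}
```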
[42:25] I saw that Go really helps you with cryptography. I noticed that it’s very difficult to choose algorithms that are not safe. By default, you can only – I don’t think you can choose the algorithms in Go by – I know you can do AES, for instance, but the cipher mode, or that kind of stuff, tends to be, unless I’m mistaken, selected for you by default, and the defaults are good… So you’re not going to be making those mistakes.
But – oh, yeah, the IV… I was working on some Go code that relied on AES. I was trying to figure out exactly how the IV was generated, and so on. I saw that it was nowhere in the developer’s code, and doing some research, I noticed that it was actually Go that would, by itself, generate an IV for the encryption, this initialization vector, and then append it somewhere in the final encrypted buffer. Usually, in other languages, this is something you have to do on your own, and this is a big avenue for making mistakes. If you choose a stupid IV, like just zeros, or if you do not select one at all, then you’re going to have encryption problems. Go would not let you do this.
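Ivan doesn’t say which package the code he analyzed used, so this is only a sketch of the general pattern he describes, written against the standard `crypto/aes` and `crypto/cipher` packages: a fresh random nonce (GCM’s name for the IV) is generated for every message and prepended to the ciphertext, so the receiver can split it back off. Note that with `cipher.NewGCM` the caller still supplies the nonce; the helper names `seal` and `open` are hypothetical:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// seal encrypts plaintext with AES-GCM under key (16, 24 or 32 bytes) and
// returns nonce || ciphertext, using a fresh random nonce for every call.
func seal(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends the ciphertext (plus auth tag) to its first argument,
	// so passing nonce there prepends the nonce to the output.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open splits the nonce back off, then authenticates and decrypts the rest.
func open(key, data []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(data) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := data[:gcm.NonceSize()], data[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	msg, err := seal(key, []byte("hello"))
	if err != nil {
		panic(err)
	}
	pt, err := open(key, msg)
	fmt.Println(string(pt), err) // prints "hello <nil>"
}
```

Because the nonce is random per message and travels with the ciphertext, the "all-zero IV" and "forgot the IV" mistakes Ivan mentions have no place to creep in.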
So it’s very obvious to me that Go was created with security in mind, not by the developers using it, but by the Go creators. They don’t want you to shoot yourself in the foot, and they are going to make sure that there is no way for you to do it, unless you really, really want to.
Even though you do have all those kinds of protections, cryptography can be misused. If you choose a bad key, then nobody’s going to save you from that. If your protocol doesn’t work, then again, you cannot be protected from it either. But I think it allows people to focus on design flaws, instead of programming flaws. And this is already a huge burden off the shoulders of developers.
That is a very interesting insight.
That’s interesting. I see a lot of complaints outside of the Go community, like on Hacker News, about “Go is choosing your defaults for TLS, or not letting you do certain things…” But that’s one I’m firmly on board with. If I don’t need to think about it, I don’t want to. And I don’t want to make the mistake.
Would you be able to confidently select your defaults for TLS? I mean, I don’t think I would feel comfortable doing this. You have to be very well-versed in cryptography to be able to make those kinds of decisions. So in my opinion, it’s very good that Go is not making you do this.
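As a minimal sketch of what leaning on those defaults looks like in `crypto/tls`: leave `CipherSuites` nil and Go picks the suites for you (TLS 1.3 suites are not configurable at all); the only choice made here is a floor on the protocol version. The `newClientConfig` helper and the TLS 1.2 minimum are assumptions for illustration:

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// newClientConfig is a hypothetical helper. It sets only the server name and
// a minimum protocol version, and leaves CipherSuites nil so crypto/tls uses
// its own default list instead of a hand-picked one.
func newClientConfig(serverName string) *tls.Config {
	return &tls.Config{
		ServerName: serverName,
		MinVersion: tls.VersionTLS12,
	}
}

func main() {
	cfg := newClientConfig("example.com")
	fmt.Println(cfg.MinVersion == tls.VersionTLS12, cfg.CipherSuites == nil) // true true
}
```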
Another interesting thing about your interest in Go: you mentioned that you started using Go because malware was thrown at you, kind of…
Yeah, exactly. So I wouldn’t say that I’ve started using Go; I would say that I was forced to learn Go. Not that I am unhappy about it… I’m not saying it’s a bad thing. What I’m saying is that I’m not really writing Go code myself. What I did was take assembly that was generated by the Go compiler and try to make heads or tails of it. I looked at the assembly, I was like, “Okay, this might be the Go code that generated this assembly”, and then I opened my Go IDE, compiled my code and checked if it was the same on both ends.
Also, when I start to learn about a language I want to reverse-engineer, I think it’s super-useful to write some simple programs, compile them and see how they look at the assembly level. You know, just create a simple, stupid C function. Not a C function, but some function that adds two integers, or something that will allow you to see what types of function calls the program is using, what kind of constructs the language is generating. The thing I had to face there was, again, the Go compiler being way too smart for my purposes… It tends to inline all the function calls that are too simple. What I mean by this is, if you have a simple function that does almost nothing, and you call that function, then the Go compiler will be like, “Oh, this is not worth a function call. What I will do is take the code of this function and put it inside the calling function.” And when you try to look at what a function call looks like in assembly, this is not helping. But the good thing is, I was able to find the right compiler flags to disable all optimizations, and then things kind of worked out for me.
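A minimal version of that experiment. The function names are hypothetical, and Ivan doesn’t name the exact flags he used; the ones below are standard compiler options passed through `go build`: `-gcflags=-m` prints the compiler’s inlining decisions, `-gcflags=-S` dumps the assembly, and `-gcflags=all="-N -l"` disables optimizations (`-N`) and inlining (`-l`) for every package. The per-function `//go:noinline` directive is a narrower alternative:

```go
package main

import "fmt"

// add is small enough that the compiler will normally inline it into its
// caller, so no CALL to it appears in the optimized assembly.
func add(a, b int) int {
	return a + b
}

// addNoInline does the same work, but the directive below keeps it a real
// function call even in an optimized build.
//
//go:noinline
func addNoInline(a, b int) int {
	return a + b
}

func main() {
	// Try: go build -gcflags=-m .          (reports "can inline add")
	//      go build -gcflags=-S .          (dumps the generated assembly)
	//      go build -gcflags=all="-N -l" . (no optimizations, no inlining)
	fmt.Println(add(2, 3), addNoInline(2, 3)) // prints 5 5
}
```

Comparing the `-S` output of the default build against the `-N -l` build shows the call to `add` disappearing and reappearing, which is exactly the effect that made the first experiments confusing.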
You mentioned that IDA, which is the main tool you’re using, and the other tool don’t really support Go. So if anybody wants to try reverse-engineering, to get into that, but also wants to do it with Go, how would you recommend they go about it?
So if you’re going to reverse-engineer Go programs, I still think you don’t have much choice there. You’re still going to have to use either IDA Pro or Ghidra. I want to switch to Ghidra eventually, but I haven’t done so at the moment, so I cannot speak too much about its capabilities. I’m told that it’s being improved at a very rapid pace, so it’s probably a good choice… But when it comes to IDA, it got better. I think that a few months back, maybe a year now, you had my good friend Juan Andrés Guerrero-Saade from SentinelOne on the podcast, and he probably told you about the various plugins that he wrote to help people reverse-engineer Go programs with IDA. I also contributed to his repository myself, with a script that I find useful.
But overall, even though IDA might not be perfect for the job, it’s still one of only two tools available for it. So you still have to work through it, no matter what. The thing is, even though starting with reverse-engineering Go is kind of difficult, it turns out that I find myself liking reverse-engineering Go programs way more than C++ programs, which tend to be extremely complicated, with virtual function tables and the very complex structures that represent classes, and so on… Because when it comes to the Go language, it kind of feels like a scripting language, in the sense that everything ends up being a call to an API function, or a call to some function that comes from the Go standard library. And so if you take a debugger (once you know how to do that) and look at all the arguments of the Go functions, which are documented, by the way, and look at the return values, then the meaning of the program tends to manifest itself, even though you don’t really understand all the instructions in the middle, and you cannot track all the stuff going here and there.
[50:25] So overall, my advice for people that would like to get started with Go reverse-engineering is, okay, it’s going to be very different from what you’re used to, but at the end of the day, I think you’re going to end up liking it more than you would think, because it’s going to be way easier than it looks.
How about those listeners that haven’t done any reverse-engineering, that want to get started? Do you have any good resources out there? I know that you personally have made some videos. Do you want to talk about that a little bit, and anything else that would be helpful?
So yeah, the videos that I put out are just related to the Go language. If you’re going to get into reverse-engineering, I would not advise you to start with Go. Not because it’s going to be harder or anything, but because probably, the basics of reverse-engineering are going to be related to traditional C code, or traditional assembly code generated by C. So this is going to be like your base knowledge of reverse-engineering, and then once you are comfortable with understanding what is going on with the C language, and all the assembly that you see most places, then you can move on to other languages and see how they differ from others etc.
But I think C is always going to be used as a reference for other languages, in the sense that when you look at assembly, first you try to understand it like you would understand C, and then if it’s different, you adapt from that. But if your baseline is going to be the Go language, if the one thing you know is Go and then you try to recognize whatever you learned with Go in another language, then you’re going to be in trouble, because whatever you see next is not going to look like anything you saw in Go.
So we do have a few courses at Kaspersky; people can check them out if they want. There are a few interesting online courses as well. There’s also something free, beginners.re; it’s a website, it used to be free, maybe now it’s behind a paywall, I’m not sure, but it used to be this big, big reverse-engineering course written by some guy, and it was amazing. You have a book called Practical Malware Analysis. It’s a bit old now, but I think it’s still very much up to date. It’s from No Starch Press. I think for beginners it’s going to be a good way to get into the field, because it explains everything that is going on, it provides links to the various tools that you might need, etc. So, a good resource there.
And finally, if you want to approach this from the fun angle, I can actually recommend extremely good Steam games that allow you to get a feel for reverse-engineering. One of them is called Turing Complete. The pitch of this game is that you’re going to build your own computer. So you start with logic gates, like XOR gates, and electric cables, basically, and based on this, you have to build a CPU, component by component. And then you move on, with increasing levels of abstraction.
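To give a flavour of that bottom-up approach, here is a small sketch (not taken from the game) that models a NAND gate as a Go function and builds NOT, AND, OR, XOR and a half adder from nothing but NAND, the kind of step-by-step composition that eventually adds up to a CPU:

```go
package main

import "fmt"

// nand is the only primitive; every other gate below is built from it.
func nand(a, b bool) bool { return !(a && b) }

func not(a bool) bool { return nand(a, a) }

func and(a, b bool) bool { return not(nand(a, b)) }

func or(a, b bool) bool { return nand(not(a), not(b)) }

// xor uses the classic four-NAND construction.
func xor(a, b bool) bool {
	n := nand(a, b)
	return nand(nand(a, n), nand(b, n))
}

// halfAdder adds two bits: the sum bit is XOR, the carry bit is AND.
func halfAdder(a, b bool) (sum, carry bool) {
	return xor(a, b), and(a, b)
}

func main() {
	for _, a := range []bool{false, true} {
		for _, b := range []bool{false, true} {
			s, c := halfAdder(a, b)
			fmt.Println(a, b, "-> sum", s, "carry", c)
		}
	}
}
```

Chaining half adders into full adders, and full adders into a multi-bit adder, is the next rung on the same ladder toward an ALU.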
So it’s really super-helpful to understand how a program works, or how a computer works. It allows you to get this high-level bird’s eye view of how a CPU is constructed, and how it’s supposed to operate. And knowing how CPUs work is then very, very helpful when you are doing reverse-engineering.
And then you have other games, from a developer called Zachtronics. These are weird puzzle games that are really related to computing problems. One of them is called TIS-100, you have another one called EXAPUNKS, and they are dubbed “the assembly games you didn’t know you wanted.” And it’s actually a very apt description, because these games have their own weird and limited assembly language, and you have to solve puzzles with them. You have to program some sort of small machine in order to make it do stuff, and you have to do this with assembly. And it forces you to use the language, which has the super-good side effect of making you learn how CPUs work, or making you more comfortable with handling those weird instructions yourself. So these would be my recommendations for people who want to get into it.
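To make “a weird and limited assembly language” concrete, here is a toy sketch with an instruction set invented for this example (it is not TIS-100’s or EXAPUNKS’): one accumulator register, a program counter, and a fetch-decode-execute loop, the same basic cycle a real CPU runs:

```go
package main

import "fmt"

// instr is one instruction of a made-up accumulator machine.
type instr struct {
	op  string // "MOV", "ADD", "SUB", "JNZ" or "OUT"
	arg int    // immediate value, or jump target for JNZ
}

// run executes prog and returns every value written by OUT.
func run(prog []instr) []int {
	acc, pc := 0, 0
	var out []int
	for pc < len(prog) {
		in := prog[pc] // fetch
		pc++
		switch in.op { // decode and execute
		case "MOV":
			acc = in.arg
		case "ADD":
			acc += in.arg
		case "SUB":
			acc -= in.arg
		case "JNZ":
			if acc != 0 {
				pc = in.arg
			}
		case "OUT":
			out = append(out, acc)
		default:
			panic("unknown op " + in.op)
		}
	}
	return out
}

func main() {
	// Count down from 3 to 1, emitting each value.
	prog := []instr{
		{"MOV", 3},
		{"OUT", 0}, // index 1: loop start
		{"SUB", 1},
		{"JNZ", 1},
	}
	fmt.Println(run(prog)) // prints [3 2 1]
}
```

Even this tiny machine shows why loops in disassembly look like a conditional jump back to an earlier address.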
[54:15] Yeah, I have not thought about games. I’m gonna check those out later, actually.
And actually, if you are at a university, or if you’re a teacher somewhere: Zachtronics, I think, closed its doors not too long ago; I think they are done making games, or they moved on to something else, but they used to have a very extensive education program, where if you were at university doing a computer science degree or something like this, you could just send them an email and they would give you access to all their games, for free, basically, and you could use them to teach, or as teaching aids. I think it’s amazing of them, and also, the games are really, really fun. They are fun if you like assembly, which I think is a pretty biased statement on my end… But I do still recommend them.
A lot of the things you said are like a cheatsheet for reverse-engineering. Lots of useful information, and I have so many more questions about specific things about Go and reverse-engineering; we might have to do another episode about this, because we are running out of time.
Sure. Well, I can come back whenever you like.
We will prepare our questions, we’ll ask you about things like generics…
I will have to prepare those questions as well, I guess… But no problem. [laughter]
Now, it’s time for an unpopular opinion.
So Ivan, what is your unpopular opinion for us?
Oh, my God, I totally forgot about that. But it’s okay. The good thing is I do have many unpopular opinions, so I’m going to give you things off the top of my head, and you can tell me what you want to know more about. For instance, I think that cyberspace is never going to be regulated. I think that NFTs are a scam, I think that there is no political will to limit the sale of cyber offense tools… That kind of stuff. I do have a lot of unpopular political opinions as well, but I don’t think I want to inflict that onto you. You’ve been very nice to me.
What do you think about the European rule about USB-C, standardizing USBs?
Oh, I’m very, very happy about it. I know it puts some pressure on some device manufacturers, but I’ve been carrying lots of different chargers for years, and I’m super-annoyed about this… And knowing that we are going to switch to a single USB-C for every single device makes me extremely, extremely happy.
Another unpopular opinion I have, which you can add to the list, is that I’m not really a big fan of Apple. Like, not at all. I don’t like their ecosystem. And I’m not going to get into this, but one of the things I don’t like is that people have to pay 40 bucks for new chargers, and they change chargers every time they release a new product. And I’m very happy that this is going to cut off this revenue stream for them, because I think this should have never existed in the first place.
What do you think about all the walled gardens, like the Google Play Store, and Apple’s App Store, and the Amazon store? From a security practice perspective, they say it’s safer. Do you agree with that?
Yeah, this is a very good question. I do have very ambiguous feelings about them. I do believe that from the security perspective, it’s kind of a good thing, in the sense that yeah, it’s another one of those safeguards that prevent people from doing stupid stuff with their devices… And having to go to some friends’ places, or more specifically, my mom’s friends’ places, to debug computers, uninstall malware and fix the printers, I’m very happy when there are protections that prevent them from doing that kind of stuff. But then again, they are not a perfect solution either. I think Apple’s App Store is pretty good in terms of security. The Google store, the Play Store, has a bad track record when it comes to hosting malware. I’m not saying that they’re doing a bad job; I think it’s a very, very difficult job. But the fact of the matter is there are a number of apps on the Google Play Store that turned out to maybe not be total malware; some of them are, but a lot of them are just there to collect personal data, or that kind of stuff.
[58:23] So I think a better way of securing those devices is not to control the app stores. Creating protections at the device level is probably where I would work. When you look at both iOS and Android, they are doing, I think, a very good job (or have been doing a very good job, at least in the past years) of making sure that apps are not able to access everything just because the user clicked OK way back when they installed the app. So I think making sure that all that personal information cannot be pulled so easily is going to be a much better way than trying to police all the stores and look at all those thousands of apps that are updated there every day… And I do not think you can realistically ensure that they are always going to be safe.
But the other issue with walled gardens is that, okay, maybe they do provide something for security, but I also feel like they take away some agency from me as a user, right? I really like to own the devices that I use, and having some restrictions that tell me “Oh, you cannot install this app because Google says you can’t”, or “You cannot uninstall this app, also because Google says you can’t” is something that tends to make me extremely, extremely angry.
So you mentioned a lot of unpopular opinions…
The way that Twitter works for our podcast is that we take an unpopular opinion and then we make a vote. So there’s a poll - do people agree with you or not? And then there’s a Hall of Fame for unpopular opinions, and for popular unpopular opinions. So you listed several… Which one would you like us to vote on?
So if I wanted to win the contest, I guess I would go with the NFT one, because I know that this is something very divisive, and I think that a lot of the audience that you are reaching is going to be probably – I’m not going to say that they are necessarily going to be on my side, but I think they’re going to be on a side. But I think a much more interesting question, one I would actually be interested in having the community’s opinion about, is the one about regulation. I do believe that cyberspace is never going to be regulated, and maybe I need to say a bit more about this one, so that people can figure it out for themselves… My opinion on this is that we have a number of high-level discussions taking place at the UN about acceptable norms for behavior in cyberspace, etc. And you have all these discussions between states, where they talk with each other, and they are like, “Okay, what type of offensive operations are legitimate?” Like, for instance, espionage is okay, but destructive attacks are not okay. I mean, I’m not saying this is right, I’m just saying this is probably the kind of discussions that they’re having. And we may have differing opinions on what types of attacks are okay, and what types are not, or even if attacks are okay at all; it doesn’t matter.
The thing is, I do believe that – I don’t think that we will ever reach an agreement there, because, well, states do not have an incentive to regulate cyber offense. I think that they have an interest in having a way, or having some kind of framework that allows them to still conduct operations, because when they conduct operations, they know what they are winning, right? They have intelligence services that gather data, they collect it through cyber means, they take it back, and so they know that they are able to achieve certain results, because they have obtained specific information, and they can quantify that.
On the other hand, when you look at the cost of cyber offense, which means all your companies in your country that have been breached because there are no such norms, it’s something that’s super-hard to quantify. You can never know that you lost some contracts overseas to sell planes, or to sell something else because of cyber means, because it’s very likely that nobody knows that the breach even happened in the first place.
So the thing is, you look at the balance of risk/reward for the decision-makers, and they see “This is what we win with cyber offense”, which is a lot. And what they lose - it’s painless. And also, they have no idea what it is. And so overall, I think that all those discussions that are taking place, that are saying, “Okay, we need to make a safer internet, blah, blah, blah” are actually possibly being conducted in bad faith, because there is no political will to actually stop doing this kind of stuff. This would be my unpopular opinion, especially in the diplomatic circles.
Alright. You will be tagged, and we will be following the results.
I’m interested to see the results on this one.
It’s an interesting way to think about it.
Yeah, I want to know as well.
Cool. Thank you very much for sharing your knowledge, your thoughts and your opinions with us. This was really fascinating. We will be very happy to have you again. Thanks a lot, Ivan.
Well, thank you very much for having me. And yeah, feel free to call me up anytime, and I will be happy to be back.
Thanks, Ian, for joining. It was fun co-hosting together.
Yeah. Thanks to you guys. This was great.