This week on Ship It! Gerhard talks with Alex Koutmos about Elixir observability using PromEx. Why do we need to understand how our setup behaves? What is PromEx and where does PromEx fit in changelog.com?
Bonus! Tune in to our LIVE Friday evening deploy š± of Erlang 24 for changelog.com. Check the show notes for a link on YouTube. šæ
Featuring
- Alex Koutmos - guest
- Gerhard Lazu - host
Sponsors
Render - The Zero DevOps cloud that empowers you to ship faster than your competitors. Render is built for modern applications and offers everything you need out-of-the-box. Learn more at render.com/changelog or email changelog@render.com for a personal introduction and to ask questions about the Render platform.
Cockroach Labs - Scale fast, survive anything, thrive everywhere! CockroachDB is the most highly evolved database on the planet. Build and scale fast with CockroachCloud (CockroachDB hosted as a service), where a team of world-class SREs maintains and manages your database infrastructure, so you can focus less on ops and more on code. Get started for free with their 30-day trial, or try their forever-free tier. Learn more at cockroachlabs.com/changelog.
Linode - Get $100 in free credit to get started on Linode - Linode is our cloud of choice and the home of Changelog.com. Head to linode.com/changelog OR text CHANGELOG to 474747 to get instant access to that $100 in free credit.
Grafana Cloud - Our dashboard of choice. Grafana is the open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres, and many more.
Notes & Links
- prom_ex - an Elixir Prometheus metrics library built on top of Telemetry
- Grafana Agent
- Alexās hex.pm packages
- My.Thoughts v1
- Beam Radio
Transcript
Play the audio to listen along while you enjoy the transcript. š§
Hey, welcome to the show. We have Alex today with us, Alex Koutmos. Some of you may know him from Beam Radio, for those that are listening. Elixir - you have Elixir Tips going; tip #100 landed not long ago, right?
I do indeed, yeah. And then Iām taking a small hiatus from Twitter tips regarding Elixir; but I will be back into it shortly, donāt worry everyone.
Yeah. So Alex has been around the Erlang/Elixir community for some years now; I donāt know how many…
I think itās gotta be like six years now. I read Saša Jurićās book āElixir in Actionā back in 2015, and Iāve been hooked on the Beam since then. Yeah, I guess since 2015 Iāve been working on the Beam.
That sounds awesome. So the way I know you, Alex, is from the work that youāve been doing on the Changelog app, which happens to be an Elixir/Phoenix/Erlang app behind the scenes. Youāve been doing some fantastic optimizations, especially with those N+1 queries. Thank goodness for that, because the website would be much slower without…
Oh, yeah.
Yeah. And those things didnāt happen in a void, right? So you had this amazing library, which you just happen to have; I donāt know how many libraries you have, but Iām sure you have a few… But this is prom_ex, or Prom E-X, as I like to pronounce it, because of that underscore… PromEx - can you tell us a bit more about that, what that is, the library?
[04:14] Sure thing. I guess the elevator pitch for PromEx is that you drop in this one library, you add it to your application supervision tree, and then you do some slight configuration, kind of like in an Ecto repo, where you slightly configure your repo, you slightly configure your PromEx module, and then you say āHey, I want a metrics plugin for Phoenix, a metrics plugin for Ectoā, I also have one for Oban, and LiveView… So you kind of pull in whatever plugins you want that are applicable to your project… And then thatās literally it. Thatās all you have to do. And then you have Prometheus metrics for all the plugins that you configured, and then for every plugin that I write that captures Prometheus metrics thereās also a corresponding Grafana dashboard that PromEx will upload to Grafana for you if you choose to have PromEx do that. Thatās kind of like an end-to-end solution for monitoring. You can set PromEx up and get dashboards and metrics in five minutes.
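For those following along at home, a minimal sketch of that setup, following the shape of the PromEx docs (the app, module, and router names here are placeholders):

```elixir
defmodule MyApp.PromEx do
  # Hypothetical app name; point otp_app at your own application
  use PromEx, otp_app: :my_app

  alias PromEx.Plugins

  @impl true
  def plugins do
    [
      # Pull in only the plugins that apply to your project
      Plugins.Application,
      Plugins.Beam,
      {Plugins.Phoenix, router: MyAppWeb.Router},
      Plugins.Ecto,
      Plugins.Oban
    ]
  end

  @impl true
  def dashboards do
    [
      # Each first-party plugin has a matching Grafana dashboard
      {:prom_ex, "application.json"},
      {:prom_ex, "beam.json"},
      {:prom_ex, "phoenix.json"},
      {:prom_ex, "ecto.json"},
      {:prom_ex, "oban.json"}
    ]
  end
end
```

The module then goes into your application supervision tree, e.g. as the first entry in the `children` list in `application.ex`.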
I really like that part, especially the Grafana dashboards. Sometimes itās just so difficult to integrate it just right, get the correct labels, get the correct things… What happens when thereās an update? Then youād have to update the Grafana dashboard. And the one really interesting thing about PromEx - Iām pronouncing it the way youāre pronouncing it, Alex; itās your library, so youāre the boss here… So PromEx - I like how it manages all aspects of metrics, all the way from the Erlang VM - not just Erlang metrics, but, as you mentioned, all those libraries, all those components of an Elixir/Phoenix app… And end-to-end, including when you have new deploys.
Exactly.
I felt those annotations were so sweet, because it basically owns the entire chain. It will annotate your Grafana dashboards when there are deploys. I felt that was amazing. Like, never mind managing them, which is super-cool, you also got annotations as to who deployed, and which commit was deployed. That was so cool.
Oh, yeah. These have been pain points for me personally probably since like 2017, because Iāve been using Prometheus and Grafana for some time now… And I feel like every project I was doing the same boilerplate every single time, with the annotations and stuff like that. But even after I set up that boilerplate, Iād still have problems where itās like āOh, look, a library maintainer updated their Prometheus packageā and youāve got some slightly different metrics. Now I have to manually know about that and then go pull down their JSON definition for the Grafana dashboard, and then I have to go onto Grafana, copy and paste it… Lo and behold, thereās some slight label discrepancies… This churn all the time - there had to have been a better way.
Iāve been playing around with these ideas for probably a couple years now. PromEx is kind of the materialization of all those ideas. Itās slightly opinionated; I feel like a good tool should have some opinions… If those opinions align with the library consumers, thatās great. Else, maybe look elsewhere and see if some other solutions fit your problems better.
Thatās right. I remember your early days - I would say maybe the beginning of PromEx - when we were trying to figure out what dashboards were missing, and whether we could improve them slightly… So I remember us working together a little bit - it wasnāt a massive amount; just enough to make them nice. The integration was really nice. I remember when you added support for custom dashboards, which we do make use of, by the way… So we have some custom dashboards as well, that PromEx can upload for you. That was a great feature… So now we store our Grafana Cloud dashboards with the app, and PromEx updates them. So we have nice version control going on.
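As a rough illustration: custom dashboards go into the same `dashboards/0` callback as the first-party ones (e.g. a `{:my_app, "dashboards/custom.json"}` entry pointing at JSON in your appās priv directory), and the Grafana upload itself is driven by configuration along these lines (option names follow the PromEx docs; app name, host and token values are placeholders):

```elixir
# config/runtime.exs - a sketch, assuming a hypothetical :my_app application
config :my_app, MyApp.PromEx,
  grafana: [
    host: System.get_env("GRAFANA_HOST"),
    auth_token: System.get_env("GRAFANA_TOKEN"),
    # Push the dashboards from dashboards/0 to Grafana when the app boots
    upload_dashboards_on_start: true,
    folder_name: "My App Dashboards",
    # Annotate the dashboards on application start/stop, i.e. on deploys
    annotate_app_lifecycle: true
  ]
```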
[07:49] And you heard that right, we do use Grafana Cloud. We used to run our own Grafana, but then it was much easier to set up Grafana Agent, scrape all the metrics, scrape all the logs from our apps, from all the pods, from everything; we even have the Node Exporter integration in the Grafana Cloud Agent. We ship all those things to Grafana Cloud, PromEx handles most of the dashboards for us, which is really cool, and we have that nice integration going from our infrastructure, which is running Kubernetes (implementation detail, I suppose). We have a really nice setup, all version-controlled, and PromEx handles a lot of the automation between the Grafana Cloud and our app… Or should I say the other way around - between our app and the Grafana Cloud.
So just to backtrack a little bit, all this was possible - I think the beginning was the application. So Changelog.com, itās publicly available, freely available source code; itās a Phoenix application. That was an excellent idea, Jerod. I donāt wanna say itās one of the best ones youāve had, but it was a genius idea to do that. It was so good. And what that meant is that we were exposed to this whole ecosystem which is Erlang, Elixir, Phoenix, and thereās so many good things happening in it.
So the app, Changelog, is running Phoenix 1.5 right now, Elixir 1.11, but 1.12 came out, so Iām really excited to try that out… And Erlang 23. But as we all know, Erlang 24 got shipped not long ago, and that is an amazing release. What gets you excited about Erlang 24, Alex?
I think the biggest thing is probably the most obvious one, which is the just-in-time compiler that landed in OTP 24. That has some big promises in store for everyone running Elixir and Phoenix applications. I think a few months ago I was actually playing around with the OTP 24 release and I had a dummy Phoenix app… And I just hit it with an HTTP stress tester. It was a very simple app; I donāt even think it had a database backend to it. It was literally just pass some JSON, get a response back. And there were measurable differences between OTP 24 - I think it was release candidate 1 I was running at the time - and OTP 23. I was pretty impressed that even a very simple Hello World-style REST endpoint still saw some pretty big performance gains.
So Iām really curious to see people taking measurements in production with actual live traffic, and see what the performance characteristics look like for applications with the changeover.
Yeah. I mean, Changelog can definitely benefit from that. It would be great to measure by how much; I think thatās one of the plans, to try - now that OTP 24 is properly out, we had the first patch release land, and we also had just today, a few hours ago, thanks to Twitter and thanks to Alex, ARM support. ARM64 support for OTP 24 with the just-in-time compiler.
So for those that have tried it or would like to try it, and are wondering why: the performance increase is between 30% and 50%. So whatever youāre running can be up to 50% faster, simply by upgrading to 24. And yeah, depending on how it was compiled, how your code was compiled, it could be even higher. So it depends on which optimizations youāre picking up from OTP 24.
Okay, so how would someone using PromEx - how would someone figure out what is faster? So you have your app, your Phoenix app or your Elixir app… Iām imagining that PromEx works with Elixir as well; I donāt have to have Phoenix. Is that right?
Yeah. And the idea was to decouple the two. Because you might wanna grab Prometheus metrics on your application, but maybe itās like a queue worker. Thereās not gonna be a Phoenix component there. But as we all know, Prometheus needs to scrape something over HTTP, unless youāre using remote write. Weāll get into that a little bit later.
So PromEx actually does ship with a very lightweight HTTP server, and itāll just serve your metrics for you. So you could very easily run PromEx inside of a queue worker, expose that one endpoint, and have your Prometheus instance come and scrape it at its regular interval.
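Something along these lines should enable that standalone server (a sketch assuming the hypothetical `:my_app` from earlier; option names follow the PromEx docs, and the port is arbitrary):

```elixir
# Serve metrics without Phoenix, via PromEx's built-in lightweight server
config :my_app, MyApp.PromEx,
  metrics_server: [
    port: 4021,
    path: "/metrics",
    auth_strategy: :none
  ]
```

Prometheus can then scrape `http://your-host:4021/metrics` at its regular interval.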
Yeah, thatās right. And you expose metrics. Just metrics.
[12:10] Yeah, for now itās metrics. Earlier you mentioned Grafana Agent, and the idea is to eventually ship that as part of PromEx. It will be like an optional download. So as PromEx is starting, if you configure it to push Prometheus metrics, you can have PromEx download the agent, get it up and running in a supervision tree… Then you donāt even need to have PromEx serve up an HTTP server. You can push metrics directly.
Iāve actually used Grafanaās cloud offering. Itās quite nice, and it makes the observability story super nice, especially if youāre running in Heroku, or Gigalixir, places where maybe you donāt own the infrastructure end-to-end, and itās tough to have a Prometheus instance scraping your stuff over the public internet. So remote write, Grafana Agent - all super-exciting things, and hopefully coming soon to PromEx.
Thatās really interesting. So this is such an amazing piece of information, which I donāt know how Iāve missed, but Iām glad that youāve mentioned this… Because we were thinking a couple of weeks back, āHow can we run the Changelog app on Render and have all the metrics and all the logs ship to Grafana Cloud, without having to set up something else that scrapes the metrics, and tails the logs, and then forwards them?ā
So this is super-exciting, because you have metrics already. I am feature-requesting logs, please, so that we can ship the logs as well using the Grafana Cloud agent, which I know supports them. And then the only thing remaining would be traces, which, by the way, it also supports.
So we have metrics, logs and traces. That is a very special trio. Can you tell us a bit more about that, Alex? What are your thoughts on that special trio?
We could start with the abstract and then we can work down into the technical nitty-gritty. So those three that you mentioned just happen to be the pillars of observability. Itās theorized that if you have all three of these pillars in your app, youāve achieved the coveted observability, and all your SREs and your DevOps people in your organization will come and shake your hand, and all will be well in the world.
But jokes aside, the idea is that these three different types of observability tools yield different benefits for your application. So with logs, if youāre capturing logs in your applications or your services, you can see in very nitty-gritty detail whatās happening on every single request, whatās happening if there are errors, if there are warnings, if youāre having trouble connecting to other services… You get very fine-grained detail as to whatās going on. This is super-awesome, and itās very helpful to have this very in-depth information.
The problem is that you can kind of be inundated by too much information, and itās very difficult to extrapolate higher meaning out of all this nitty-gritty detail. Then, if youāve ever run like an ELK Stack and had to administer that, you know the pains of trying to index all this data.
Then you might say āOkay, letās only log whatās importantā, and Iām sure people with production apps have had their DevOps people come to them and say āHey, letās dial back the logging. Itās a little too much, and Elasticsearch is just keeling over.ā
Then you reach for other tools, like metrics. Metrics eventually find their way into some sort of a time series database, and theyāre usually pretty efficient in comparison to logs, because theyāre more bounded. You have a measurement, you have a timestamp, and you have some labels associated with it. A little asterisk there, because that kind of depends on what your time series database of choice is. But thatās kind of roughly speaking what goes into capturing time series data.
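For a concrete picture, a single sample in the Prometheus text exposition format is just a metric name, a set of labels, a value, and an optional millisecond timestamp (the metric below is made up):

```
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1027 1622808000000
```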
[15:57] So given that youāve pared down what information youāre capturing, you can store it a lot more efficiently, itās a lot easier to query, and you can keep these for way longer periods of time. But the problem there is that youāve now traded off high-fidelity logs for explicit metrics that youāre capturing over time. Again, a trade-off, and there are different tools for the job, and you kind of reach for whatās best at that particular point in time.
And then traces are kind of like a merger of the two, logs and metrics, where you can see how long your application is sitting in different parts of the application; if youāre making external service calls, how long are you waiting for those external service calls… If you have something like Istio set up and you can track requests across services, you can see how long it takes to bounce across services A, B, C and D, and how long it takes to unroll and go all the way back to the original caller… And then again, you get some metadata associated with those traces, and timestamps, and stuff like that.
Again, all three of these are different tools, they have some overlap, but itās really a matter of picking the best tool for the job. Itād be nice if you have all three of those in your company or application, but in the real world it is tough to get all three of these stood up and running efficiently, and running effectively.
I really like the way you think about this, I have to say… There is something pragmatic about it - you can have this within five minutes… But I am also very wary, because Iāve been following Charity Majorsā Honeycomb and those perspectives for many years, and my understanding is that the only thing you should care about is events. And if you have a data store that understands arbitrarily-wide events, something that can query them just in time, at scale, then you donāt have to trade off the cardinality constraints that metrics have, versus the volume of logs that is just too much, and the indexing, and how all that basically happens behind the scenes. So itās the implementation that limits how you use those logs.
So I think that perspective is very interesting, and I will definitely follow up on that some more in the context of this show, of Ship It. But Iām also aware of where we are today - and when I say āweā, I mean the Changelog app - what we have already set up, and that ideal, which is that everything is an event. I think whether we want to or not, I can see how we are going on the journey, maybe some are more frustrated, others are more enlightened, but I can see how events potentially have the answer to all these things. But right now, the reality is that we still have to make this choice between metrics or logs. Traces as well. Theyāre like separate components. And I think that Grafana Cloud is doing a pretty good job with Cortex, which is a Prometheus that scales, basically, Loki, which is for indexing logs, and itās great to derive insights out of that, and Tempo, which I havenāt used yet, which is for traces. But these are the three components in the Grafana Cloud that serve these three different functions.
I think itās very interesting to get to that tool which unifies them all, and Grafana Cloud could be it, but there are others as well. Now, Iām not going to go through all the names, because thatās boring, but what is interesting is that we seem to be going in the same direction. And we may argue between ourselves whether the pillars of observability are a thing, or are just a big joke - different perspectives - but I think ultimately what really matters is being able to understand what is happening in your application, or what is happening with your website, or your service, or whatever. Unknown unknowns. Iām not going to open that can of worms… But the point is, āDo you understand what is happening?ā It may be imperfect, it may be limited, but do you have at least an idea of where to look, where the problems are?
[19:59] And I do know that PromEx helped us - or helped you - with the N+1 queries. It was very obvious: āHey, we have a problem in Ecto, and this is what that problem looks like, and this is how we fix it.ā And yes, we fixed it. āDoes Erlang 24 improve things compared to Erlang 23, and in what way?ā We can answer those questions as well.
So I think that monitoring is not going anywhere, and I think everybody respects it for what it is… But we are also aware that there are better ways, and we should improve this. So with that in mind, where do you see PromEx going? What are the hopes and the goals for the project?
Yeah, sure thing. So Iām gonna first address a couple points that youāve made, and then Iāll answer the question.
Sure.
And this is just my own personal opinion. I donāt see everything rolling up into one solution. I just donāt think itās feasible at the moment. Like, would it be nice if everything was an event, and we could easily search it, and everything is hunky-dory? I think everyone would agree that yes, that would be great. And I think weāve tried this in the past - stuff everything in ELK, write some nice regex expressions, and extrapolate metrics from those regex expressions in your Elasticsearch database. For organizations that have gone down that route, itās extremely painful.
I think for now, for the foreseeable future, having those explicit tools for explicit purposes makes sense, just because theyāre very different problems that are trying to be solved - and trying to have one unifying tool that does all the things, I donāt think, will pan out well.
But I do like the approach that Grafana is taking, and the observability community in general, where theyāre trying to provide bridges from one pillar to another. A perfect example is exemplars in Prometheus, where your Prometheus metrics can have an exemplar tag on them, and itāll effectively say āHey, this metric data point is applicable to this trace.ā And you can kind of jump and say āOkay, something weird is happening here in the metrics. Iām getting a ton of 500s. Let me look at an exemplar for one of those 500s.ā You can click through and kind of shift your focus from metrics and go to traces, but still have the context of that problem where I was having 500s.
So I like that approach better, where you can bounce between the different pillars of observability, but still have the context of āIām trying to solve this problem. What is going on at this moment in time?ā I like that approach. Again, thatās just my personal opinion.
And to that end - and Iāll go back to your original question now - I would like to get PromEx to a point where it does take into account things like traces, and you could use exemplars… And if Grafana Agentās incorporated into PromEx, you could very easily use Syslog and export logs from your application via Syslog to Grafana Agent, and then those find their way to Loki… So I donāt wanna tailor PromEx solely to Grafana, but I do see that Grafana is offering a lot of tooling that is very powerful, and I would love to leverage it. Hopefully that answers the question there.
I think thatās a very interesting perspective. I love that.
That was a really interesting point that youāve made, Alex, just before the break, and I would like to dig into it a little bit more. I would like to hear more about PromEx, the hopes and goals, because I think thereās more to unpack there… But I find it very interesting how the exemplars that you have in metrics link to traces. Youāve mentioned something very interesting about logs, and how a lot of information can be derived from them if the logs are in the right format.
In our Changelog app, just to give that example, we have a lot of logs - actually, most logs are still in the standard, unstructured format. So you have long lines of text, and thatās okay, but thatās where the regexes are needed, to extract meaning from those lines.
So the thing which Iāve found to work a lot better - for example with Ingress NGINX, which we also run - is to use JSON logging. So we put all the different information, which you can think of as metrics, in that one very wide event which is the log line.
For example: status 200, how many bytes, how long it took, what the referrer was, stuff like that. And when that information ends up in Loki, writing LogQL queries, which are very similar to PromQL queries, makes it easy to derive graphs from your logs - the kind of graphs we would typically get from metrics.
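As a sketch of that idea, a LogQL metric query can parse JSON log lines on the fly and aggregate a field the same way a PromQL query would aggregate a label. The query below computes requests per second by HTTP status from JSON access logs; the `{app="ingress-nginx"}` selector is a made-up example, and the `__error__ = ""` filter drops lines the JSON parser could not handle:

```
sum by (status) (
  rate({app="ingress-nginx"} | json | __error__ = "" [5m])
)
```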
So then the boundaries between metrics and logs are blurry. You donāt really know - āWas this a log, or was this a metric?ā Does this really matter? Whatās important is the understanding you get from metrics and logs.
So that makes me wonder, how are logs and metrics different if you use logs as JSON, and you have this arbitrarily wide metric, if you wish - because itās a kind of metric, right? You have all these metrics like status, as I said, bytes, time taken - all those are metrics, and they all appear in a single line. So what is the difference then between the metrics that you get in Prometheus, which have a slightly different format, and the value is at the end, and then you have many metrics that you may put together, like for example for samples or summaries… But in logs theyāre slightly different, and yet the end result is very similar. What are your thoughts on that?
Yeah, I think in the spirit of just-in-time/JIT, thatās effectively what weāre doing with logs when we try to extrapolate the metrics out of them - we throw this event into the ether with a whole bunch of data associated with it. Maybe we donāt know what we wanna do with it at the end, but given that that event is in the database, we can extrapolate some metrics out of it. So weāre just-in-time kind of getting some metrics out of that log. You could go down that route.
I think that for some scenarios that may be your only option. Letās say youāre using an external service, and all itās giving you is structured logs. Thereās no way to tie in maybe an agent inside of there, or get internal events and hook in your own Prometheus exporter… For those scenarios, that may be your only option. And then I think thatās a valid use case. Read the structured logs, and generate some metrics out of them.
But for when you can control those things, I think storing them in a time-series database will be beneficial for the team, because itās less stress on the infrastructure, itāll be far more performant… So thatās, again, a bit of a trade-off there as to what route you go down.
Thatās interesting. Okay. So PromEx - big on metrics. Maybe logs? Are you thinking maybe logs?
[28:07] Perhaps… I think the extent of the log support out of PromEx will be just the shipping mechanism, given that the plan is to have Grafana Agent as part of PromExās optional download. You can target that Grafana Agent for exporting logs to Loki. But I donāt think PromEx will transform into a library where it also provides structured logging mechanisms. I think thereās some good stuff already built into the Elixir logger on that front… But thatās not a problem Iād like to tackle in the PromEx library.
Okay, that makes sense. What about events?
So like traces, for example?
Iām thinking of the events we have from the Erlang libraries and the Erlang ecosystem. Itās very rich, in that it can expose all sorts of events, and I think this is where weāre touching on OpenTelemetry and the sort of things that the Erlang and Elixir ecosystem has going for it - which I think is a very good implementation, a very good story around telemetry.
Yes, yes. So letās rewind a little bit out of PromEx and talk about what youāre hinting at here… So there are a couple projects in the Elixir and Erlang ecosystem. OpenTelemetry, as far as I understand right now, is an implementation of the OpenTelemetry spec. I think itās solely just for tracing. I think even that library, OpenTelemetry, builds upon another Elixir and Erlang library called Telemetry; that lives in a GitHub organization - I think itās beam-telemetry. But that library, Telemetry, offers library authors a way to surface internal library events to whoever is using that library. Itās completely agnostic as to how you structure these things, aside from capturing some measurements associated with that event, and some metadata. Thatās pretty much it.
So every library can surface events, and you as the consumer of that library can say āOkay, I wanna pull out these measurements from the event, and maybe this metadata from the event.ā A perfect example would be the Phoenix web framework, which will surface an event when itās completed a request, when itās serviced a request. And inside of that event itāll have a measurement for how long it took to service that request, so thatāll be your duration… And then the metadata may be the route that the person hit, or the response status code, the length of the response payload, etc. And then if you choose to hook on to that telemetry event, you can use all that data. If you donāt hook on to that event, itās effectively like a no-op. So youāre not losing any performance per se here.
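A minimal sketch of hooking into that exact Phoenix event (the handler id and log format are arbitrary):

```elixir
# Attach to the event Phoenix emits after it has serviced a request
:telemetry.attach(
  "my-request-logger",
  [:phoenix, :endpoint, :stop],
  fn _event, %{duration: duration}, %{conn: conn}, _config ->
    # The duration measurement arrives in native time units; convert it first
    ms = System.convert_time_unit(duration, :native, :millisecond)
    IO.puts("#{conn.method} #{conn.request_path} -> #{conn.status} in #{ms}ms")
  end,
  nil
)
```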
Thatās effectively how PromEx works. All these libraries that I attach to are emitting these telemetry events. I just so happen to hook into all these telemetry events, and then generate Prometheus metrics out of them.
I think the story there in Elixir and Erlang is very unique, because the ecosystem has kind of said, āOkay, weāre all gonna use these foundational building blocks.ā And I think - the last time I looked on hex.pm, I think there were like 140 libraries using telemetry, which means now across the ecosystem we have this ubiquitous language for how we surface internal events in our libraries… Which is very powerful, because now I donāt need to learn how Phoenix exports events, and how Oban exports events, and how Ecto exports events… Itās all the same thing; I just need to hook into an ID for what that event is, and Iām off to the races at that point, and I can capture any information that I like.
[31:45] That explains why PromEx was such a - I wouldnāt say straightforward, but almost like it was obvious how to put it together. It was obvious what users want and need, because you have all these libraries that expose these events; theyāre there, you can consume them. So Ecto this week, Oban next week… Iām simplifying it, a lot, but roughly, thatās how you were able to ship support for all the different libraries, because they all standardized on how they expose events. Is that a fair summary?
Yeah, thatās exactly right. It is quite a bit simplified…
Itās an oversimplification, of course.
Because a lot of times Iāll sit down to write a PromEx plugin, and as Iām writing the plugin, Iām like āHm, I need some more data here.ā So Iāll make a PR to the library author, and say āHey, I think we need some additional metadata here, some additional measurementsā, and then we have to go through that PR cycle, and I have to wait for a new release to get cut, and then I have to make the Grafana dashboard… So thereās a good amount of work. But yeah, effectively, thatās it - see what events that library emits, hook into them, convert them into meaningful Prometheus metrics, make the Grafana dashboard, and then ship it.
Thatās a good one, actually. I like that, especially the last part. Especially the ship it part.
Yeah, I thought youād like that.
Okay. So you have all these events… So Iām wondering - youāre ingesting events, youāre translating them into metrics… Is there a point where you could just expose those events raw, and then something like Honeycomb, for example, which loves events, could just consume them? I think thatās how the Honeycomb agent, in some languages, works. They just expose the raw events.
Iād have to play around with that and see… Some of these events have a lot of metadata associated with them. Again, letās say that Honeycomb is infinitely scalable, and it doesnāt take any compute time - yeah, sure thing; just dump a couple thousand lines of metadata per event into Honeycomb. But yeah, Iād have to play around with Honeycomb specifically to see if thatās even possible.
Iām also fascinated by it, because I think the take is very interesting, and I can see the uniqueness; I would like to understand more how they make that possible, for sure… And the challenges - I mean, if they pulled it off, which apparently they have, thatās impressive. And I think it takes an understanding of how complicated these layers are just to appreciate what a feat that is in itself. So thatās interesting…
So we have telemetry, we have PromEx, and you mentioned plugins… Is there anything specific that you would like to add to PromEx next, anything that users are maybe asking for, anything that you would like to ship, which you know would be a hit?
Yeah, so aside from Grafana Agent, which I think some people are excited about…
I am. Big fan. Please…
[laughs] So one thing I forgot to mention was - in addition to supporting all these first-party plugins and Grafana dashboards (and you kind of hinted at this before), users of PromEx are encouraged to make their own PromEx plugins and their own Grafana dashboards… And those plugins and dashboards are treated identically to the first-party ones. So youāre able to upload those dashboards automatically on application init, your events will be attached automatically… So all those first-party plugins are kind of dogfooding the architecture. I wanted to see how easy it was to create plugins and dashboards and have them all kind of co-exist together.
So the idea is that you use PromEx for all the shared libraries in the ecosystem, and then you write your own plugins and Grafana dashboards for things that are specific to your business, that obviously are not gonna be supported in PromEx. So thatās one thing I forgot to touch on. And then what was the original question?
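A business-specific plugin might look roughly like this, following the shape of the `PromEx.Plugin` behaviour from the docs (the event, metric, and module names are invented for illustration):

```elixir
defmodule MyApp.PromEx.Plugins.Episodes do
  use PromEx.Plugin

  @impl true
  def event_metrics(_opts) do
    Event.build(
      :my_app_episode_event_metrics,
      [
        # Turns :telemetry.execute([:my_app, :episodes, :published], %{count: 1})
        # calls from your own code into a Prometheus counter
        counter(
          [:my_app, :episodes, :published, :total],
          event_name: [:my_app, :episodes, :published],
          description: "Number of episodes published."
        )
      ]
    )
  end
end
```

Adding `MyApp.PromEx.Plugins.Episodes` to the `plugins/0` list then treats it just like the first-party plugins, custom dashboard included.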
I was asking if there are any specific libraries that you are looking to integrate with. And Iām looking at the available plugins list, and I can see which ones are stable. This is, by the way, on github.com/akoutmos/prom_ex. And thereās a list of available plugins. A bunch of them are stable: Phoenix, Oban, Ecto, the Beam, and the application… And then some are coming soon, like Broadway, Absinthe… Iām not sure whether Iām pronouncing that correctly…
Yeah, yeah. Just like the booze.
Right. I donāt know… I really donāt know.
[36:17] Yeah, me neither.
Okay.
So Broadway - that plugin is more or less done. Iāve made some changes to Broadway itself, and those changes were accepted and merged into the Broadway project. I donāt think thereās been a release cut as of us recording right now. So that plugin is kind of on hold until a release gets cut, and then I can kind of say that PromEx depends on this version of Broadway, if you choose to use the Broadway plugin… Because I added some additional telemetry events.
The idea is to get Broadway wrapped up. For those who donāt know what Broadway is - itās a really nifty library where you can drop it into your project and you could read from various queue implementations, and it takes care of a lot of the boilerplate in setting up a concurrent and parallelized worker. So you can read from Rabbit, and you can configure āHey, I want 100 Beam processes reading from Rabbit at the same time and processing the work from there.ā I think it supports Rabbit, Kafka, and I think Redis as well.
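A skeleton of that Rabbit example, using the `broadway_rabbitmq` producer (the queue name and module name are made up):

```elixir
defmodule MyApp.Pipeline do
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [
        # One producer process pulling from a hypothetical "my_queue"
        module: {BroadwayRabbitMQ.Producer, queue: "my_queue"},
        concurrency: 1
      ],
      processors: [
        # The "100 Beam processes" from above, handling messages concurrently
        default: [concurrency: 100]
      ]
    )
  end

  @impl true
  def handle_message(_processor, message, _context) do
    # Do the actual work for a single message here
    message
  end
end
```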
But yeah, Broadway is on the list… And then Absinthe is on the list after that, because thatās the Elixir GraphQL framework. So that seems to be pretty popular. Yeah, after those two are wrapped up, Iām just gonna go on hex.pm, see which one has the most downloads after that, and just - think of that as a priority queue. Whatever libraries have the most downloads and are the most popular, just make plugins for them, as long as they support telemetry.
That makes so much sense. Of course. The way you put it, itās obvious. Whatās the most popular? That thing. Okay… Well, that will have the most users and will be the most successful, and people will find it the most useful. So yeah, that makes perfect sense. I like that. Very sensible.
So one of the things that we wanted to do - I think we were mentioning this towards the beginning of the show… We were saying how Erlang 24 just shipped. It was a few weeks ago, the final 24 release. We have the first patch release… And we wanted to upgrade the Changelog app to use Erlang 24. So hereās the plan… By the time youāre listening to this, either the next day or a few days after, we will be performing a live upgrade on the Changelog.com website, from Erlang 23 to Erlang 24. We have PromEx running, we have all the metrics, and we will see live what difference Erlang 24 makes to Changelog.com.
[39:40] PromEx is obviously instrumental - all the metrics and all the logs get shipped to Grafana Cloud, so thatās how we will be observing things, and we will be commenting on what is different, what is better, what is worse. So with that in mind, Iām wondering if there are any assumptions or expectations that we can set ahead of time. What are you thinking, Alex?
Yeah, so Iāve been thinking about this for a little while… Because measuring things before and after changes - it just excites me, to see that youāve made a change and you have some measurable differences between how it was before and how it is afterwards. So Iāve been thinking about this, and some of my hypotheses are that memory usage will go up slightly, because that interpreted code that was compiled to native needs to be stored somewhere. So memory usage will go up slightly… And then I imagine most things CPU-bound will be sped up. So serializing and deserializing JSON, serializing and deserializing from the Postgres database - for all these things, we should see a considerable change in performance. Those are kind of top of mind at the moment. How about you?
Iām thinking that the end result that the users will see, because of those serialization speed-ups, is a lower latency. So responses will be quicker. Now, if you have listened to the Changelog 2021 setup, you will know that if youāre accessing Changelog, youāre going through the CDN. So every single request now goes through Fastly. And what that means is that the responses are already ten times faster, or maybe faster still. So your responses are served within 50 milliseconds; thatās what the Grafana Cloud probes are telling us.
So the website is already very fast, because itās served from Fastly. What we will see, however - we have probes that also hit the website directly. So expect the response latency, if you go directly to the backend - or to the origin, as the CDN calls it - to be slightly lower. I also expect the PostgreSQL side - maybe not the queries necessarily, but the responses, as you mentioned, Alex, because of the serialization - to be slightly faster. So I would expect the data from the database to load quicker. And that will also result in quicker response times for the end users.
Iām very curious what happens with context switches. Are we going to have fewer context switches, so less work on the CPU, or more? Obviously, context switches are not just about the work the CPU does, but I think there will be a lot less work to do, so fewer context switches. CPU utilization - I think it will go slightly down, but right now we donāt have to worry about that, because we have 32 CPUs. All the AMD EPYCs, the latest ones - thank you, Linode; those are amazing. Everything is so much quicker. And we have the NVMe SSDs… Everything is super-quick. But yeah, for more, listen to the 2021 Changelog setup episode, where we cover some of these. And I think the blog post will come out.
Thatās what I expect to see… So will it make a difference for the users? I donāt think it will, because they have the CDN. So everything is already super-quick, as fast as it can be. You have TLS optimizations, you have data locality, all the good stuff, because the CDN just serves requests from where you are.
For the logged-in users - because obviously we canāt cache those requests - things will be slightly quicker. So for Adam, for Jerod, whoever is working on the admin, those things will be quicker.
Another thing which I do know that we do - we do background processing on some of the S3 files, the logs and stuff like that… So expect those to be quicker. But I donāt know by how much. I think weāre using Oban for that, arenāt we, Alex?
Yeah, weāre using Oban. I think Oban was set up just to send out asynchronous emails. I donāt know if there was any other work being done by Oban. But now that you mention those things, we probably should have metrics in place to capture those S3 processing jobs, see how long they take pre and post OTP 24.
Yeah, thatās right. Thatās a really good one. Thatāll be a great one to add. Okay, Iām really looking forward to that. And if you catch this in time, you can watch it live. And if you donāt, thatās okay; youāll see it on Twitter. We will post about it. Maybe weāll even do a scheduled livestream. Does that make sense for you, Alex? What do you think?
Yeah, it works for me.
[44:06] Okay. So no impromptu. Weāll schedule it, and weāll say āAt this time, on this day, at this hour.ā Okay, I like that. Thatās a great idea, actually. So weāll have at least a few days of heads up, and then you can listen to this, and then you can watch how we do it. Great. That makes me very excited. Okay.
So weāre approaching the end, and I think we need to end on a high… Because itās Friday when weāre recording this, it was a good week, and the weekend is just around the corner… So what do you have planned for this weekend, Alex? Anything fun?
This weekend… I think I have one thing I wanna do in PromEx, but then Iāll be building a garden. So Iāll be outdoors, using the table saw, and the miter saw, and the nailgun, and putting together some nice garden beds.
Okay, well that sounds amazing. You have to balance all the PromEx and all the Erlang/Elixir work somehow, right?
Oh, yeah. You need to find a healthy balance between open source work, the full-time job, and a little bit of fun for yourself.
Yeah, thatās for sure. So building a garden - that sounds amazing. You must be either very good or very brave, Iām not sure which one. Either a great DIYer, or very brave - weāll figure it out. Which one is it?
I donāt wanna be arrogant or anything, but I think Iām a decent DIYer. I also used to tinker around with cars quite a bit before I had a family… When it was okay to be financially irresponsible and buy a $3,000 motor just because I felt like it. Nowadays you canāt do that… [laughter]
Okay, different times… Right?
Yeah, exactly.
Different world.
I could buy a motorcycle anytime I wanted to. I didnāt have to worry about providing for my kiddos. Now I go with safe hobbies, like building garden beds or doing some woodworking.
Okay, that sounds great. So I hope the weather is going to be great, because for me, the weather has been rubbish for the whole week. Windy… I wouldnāt say itās cold, but itās not nice; itās been raining all day every day, we had some downpours as well… So it hasnāt been really great. And right now Iām looking at it like - I was going to do a barbecue; I love barbecuing, the proper charcoal one… But the weather is not good. Maybe we get the parasol out, so it doesnāt rain on my barbecue regardless, maybe… I donāt know. But what we have to do is post the pictures. Because how can people appreciate how good of a DIYer you actually are if they donāt see your work?
Well played, sir. Well played. Iāll have to take some selfies. I usually stay away from the selfies… [laughs]
And videos. Those are very important, because if you donāt take videos, someone else could be doing the work and you just take pictures. No… That would never happen, right? Only in movies. [laughter]
Never, never.
Alright, Alex. Well, itās been a pleasure to have you on the show. I really enjoyed this. Iām looking forward to doing what we said we will do. Thatās super exciting. Shipping Erlang 24 for Changelog.com - thatāll be great. And which version of PromEx are we at now? Do you know which one is the latest?
I donāt remember… I think 1.1.0 is the latest… And I think the Changelog is on 1.0.1.
Right. So not that far behind, but…
Yeah, weāll bump it up.
Thatās great, okay. So weāll ship that. That is exciting. Ship a garden in the meantime as well; maybe a barbecue. Weāll see. This has been tremendous fun. Thank you, Alex. Looking forward to the next time.
Likewise, thank you.
Our transcripts are open source on GitHub. Improvements are welcome. š