The .NET on AWS Show, Featuring Martin Thwaites!
In this episode we are joined by Observability Extraordinaire Martin Thwaites
Brandon Minnick
Amazon Employee
Published Mar 14, 2024
Last Modified Mar 15, 2024
Francois Bouteruche 1:07
Hello, everybody. Thanks for joining, and welcome back to The .NET on AWS Show. As you can see, I'm not Brandon Minnick. Brandon is not here today because he's celebrating Independence Day in the United States. So Happy Independence Day, everyone. It's tomorrow, but he took time off to celebrate with his friends and family. So Happy Independence Day, Brandon, and Happy Independence Day to all the US-based folks. Today I'm your host: I'm Francois Bouteruche, and I'm a developer advocate. And I'm joined by James Eastham. How are you doing, James?
James Eastham 1:51
I am doing well. I seem to have developed a cough in the last five minutes, so if I keep swallowing and coughing, I apologize in advance. But yeah, I'm doing good.
Francois Bouteruche 2:01
Okay, so what's the news you want to share with us this week?
James Eastham 2:08
So there are a couple of things I found that I wanted to share, actually. One of which isn't necessarily new news, but it's a really cool library that I've just discovered in the last couple of weeks, and it's a library called Bogus. I'm sure many people listening, if you're developing something new, building out a system, or running some tests, you need some data to test against, right? You need to generate some data. And sometimes I've found myself manually keying in data, thinking of people's names, going through me, and then my family, and then my dog, and then my family's family, trying to think of all these random bits of data. So Bogus is this really cool library that you can use to automatically generate test data, or random data, and you can use it to populate databases before your tests run, and all manner of different things. So that's one thing I've discovered recently. I don't think it's a new library, but I hadn't discovered it. It's very cool.
Francois Bouteruche 3:05
Yeah, I didn't know the library before you spoke to me about it, and I wish I'd known it earlier, because I've been through many projects where it was: okay, let's create a test data set. And, like you mentioned, you're always guessing: okay, what first name or last name should I use? So yes, definitely a good helper.
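For readers who haven't seen Bogus before, here is a minimal sketch of the kind of usage James describes. The `Faker<T>` fluent API is the library's real surface; the `Customer` class and the specific rules are hypothetical stand-ins:

```csharp
using System;
using System.Collections.Generic;
using Bogus;

public record Customer
{
    public Guid Id { get; init; }
    public string FirstName { get; init; } = "";
    public string LastName { get; init; } = "";
    public string Email { get; init; } = "";
}

public static class TestData
{
    // Realistic fake data instead of "me, my family, my dog".
    public static List<Customer> Customers(int count) =>
        new Faker<Customer>()
            .RuleFor(c => c.Id, f => f.Random.Guid())
            .RuleFor(c => c.FirstName, f => f.Name.FirstName())
            .RuleFor(c => c.LastName, f => f.Name.LastName())
            .RuleFor(c => c.Email, (f, c) => f.Internet.Email(c.FirstName, c.LastName))
            .Generate(count); // e.g. seed a test database before the run
}
```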
James Eastham 3:29
Really neat. And you can have, like, validation; it uses the fluent style of defining things. So yeah, it's a really cool library. And then the second thing I wanted to share: anyone who knows who I am knows I do a lot of things with serverless .NET, particularly with Lambda on AWS. And as of last year, with the release of .NET 7, Native AOT went generally available. We've added support to be able to run Native AOT on Lambda; we've got tooling to do that. But when you start to use Native AOT, there are some, shall we say, limitations and trade-offs you've got to make. And there's a really awesome podcast episode from Bryan Hogan, on the No Dogma podcast, that really dives deep into how Native AOT works, and the history of how Native AOT arrived at where it is today. It's a really good deep dive into that background, and also some of the trade-offs that you need to consider. It's with Andy Gocke, who I believe works on the compiler team; he does a lot of work on Native AOT at Microsoft. So it's a really cool podcast episode if you're doing serverless things or you're looking at adopting Native AOT. With .NET 8 coming later this year, and the support for ASP.NET with Native AOT, it's going to get more and more powerful, especially within the realms of serverless things. So yeah.
Francois Bouteruche 4:53
I know. Let's take two minutes to discuss this. I know you've done a lot of testing around running .NET on AWS Lambda, our serverless compute engine, and especially recently you've tested Native AOT on Lambda. So can you speak a bit more about this?
James Eastham 5:12
Yeah, absolutely. So one of the big benefits of Native AOT is startup time: your application will simply start up faster. And we've run a lot of benchmarks for the different ways of running .NET on Lambda. The way Lambda works, without diving too deep into technical details, is you've got this idea of what's called a cold start. The first time a Lambda function is invoked, the execution environment needs to start up, your code needs to be downloaded, the code needs to be started, and then it's ready to receive that request. And historically, for languages like .NET and Java, the compiled languages, that cold start time takes longer than it would for, say, Node or Python. We ran quite a lot of benchmarks for .NET 6 using the managed runtime, and for the application we used, the cold start time was typically around 600-700 milliseconds at p50, going up to 900 milliseconds at p90 and p99. And that's just standard .NET 6 running on Lambda. Once we enabled Native AOT, that dropped to around 300-320 milliseconds. So that's roughly a 50-60% improvement, give or take, at p50. The percentage difference gets slightly smaller at p99, but at p50 it's much, much faster.
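For context, the entry point of a Native AOT Lambda function looks roughly like the sketch below, which follows the shape of the AWS .NET Lambda tooling James mentions (Amazon.Lambda.RuntimeSupport plus source-generated JSON serialization, since reflection-based serializers don't survive AOT trimming). The handler body here is a hypothetical placeholder, and the project would also enable `PublishAot` in its project file:

```csharp
using System.Threading.Tasks;
using Amazon.Lambda.Core;
using Amazon.Lambda.RuntimeSupport;
using Amazon.Lambda.Serialization.SystemTextJson;
using System.Text.Json.Serialization;

// Hypothetical handler: upper-cases its input.
var handler = static (string input, ILambdaContext context) =>
    Task.FromResult(input.ToUpperInvariant());

// Bootstrap the custom runtime; the source-generated serializer keeps
// System.Text.Json working under Native AOT's trimming.
await LambdaBootstrapBuilder
    .Create(handler, new SourceGeneratorLambdaJsonSerializer<LambdaJsonContext>())
    .Build()
    .RunAsync();

[JsonSerializable(typeof(string))]
public partial class LambdaJsonContext : JsonSerializerContext { }
```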
Francois Bouteruche 6:39
That's pretty awesome. I guess when you reduce the cold start to this very, very small cold start, it covers most use cases. I mean, there are probably some very performance-oriented use cases where it matters even more.
James Eastham 7:02
Yeah, absolutely. If you're doing things asynchronously, you know, you're reading off a queue, or think of an event bus, then typically that cold start performance, if it's a second, maybe, is not going to affect the vast majority of use cases. If you're building synchronous APIs, then of course your performance becomes much more important. And the other thing I will point out, while we're on the subject of cold starts, is that in the same benchmarks I ran, we ran, I think it was, 100 requests a second for 10 minutes. And that led to about 160,000 invokes of the Lambda functions behind the scenes, of which I think around 500 were cold starts. So actually, if your API is under a relatively consistent load, again, that startup time becomes less relevant. I'm not saying it's not important, but it's less relevant, because for the vast majority of requests, .NET is so fast when it's warm: it's 5, 6, 7 millisecond response times. So it becomes less and less important if you've got a steady-state load. The thing I always say with cold starts is: run your tests against some kind of real-world traffic, the traffic pattern that you actually expect your application to receive. Use that when you're doing your tests to see how the performance works, because you may be surprised by what you see compared to just invoking it manually.
Francois Bouteruche 8:29
Okay, that's pretty good. So now it's time to welcome our amazing guest for today. We have an amazing guest. If you've been looking at content around .NET and observability in the last months, you probably haven't missed him. I would like to welcome Martin Thwaites to the stream. Welcome, Martin. Thank you for joining us today.
Martin Thwaites 9:01
Hello, and thank you very much. Yes, I have been a little bit promiscuous in the community over the last sort of six to twelve months. So yes, I would imagine people have seen me around if they've been following any of the observability or OpenTelemetry hashtags on any of the social platforms. But yeah, great to be here and talk about .NET in general.
Francois Bouteruche 9:27
Yeah, thank you, Martin. First of all, for those who don't know you, can you introduce yourself very quickly?
Martin Thwaites 9:40
Yeah, absolutely. So my name is Martin Thwaites. I go by MartinDotNet on the socials, from Mastodon to Twitter; Bluesky is a recent one. First and foremost, I'm an observability advocate. I've been preaching about understanding production systems and telemetry and all of those kinds of things for seven or eight years now, and I recently joined a company called Honeycomb. We build an observability platform that's agnostic of language, and I was brought on as the .NET guy who knows all the .NET things. I'm pretty sure they didn't know what .NET was before I arrived, so, you know, already adding value from day one. But yeah, that's me.
Francois Bouteruche 10:28
Okay. I have a question for you. What we want to share with the community is how our guests came to the tech industry, and to .NET more specifically. So can you share your journey into the tech industry and into .NET?
Martin Thwaites 10:50
Yeah, I mean, I would say mine's a little bit non-standard. I did a university degree in IT when they didn't really know what IT was; to them, it was building circuit boards and supporting people with terminals and stuff like that. I think I was the first person doing something called Information and Computer Control Technology, which is essentially what IT turned out to be. But I left without a degree and started working in hardcore sales, selling credit cards, for my sins. Anybody who I got into debt, I'm really sorry. So yeah, I started doing that, and I always knew I wanted to be in IT. I got a job doing something called Filetab-D, which is a proprietary language all about decision tables for insurance companies, essentially doing insurance calculations and algorithms and that kind of stuff. Incredibly fast, incredibly fast. But, you know, it's not really an OO language by any stretch of the imagination. However, I started to try and write it in an OO way, and everybody was like: what's that? That doesn't look right. But then I got into testing and systems analysis, so designing and essentially doing automated testing, and that's kind of what got me into .NET. I was working at a company in Warrington, and we needed to test the application; I was the quality manager there, and I started writing tests with NUnit back in the day that exercised the API and exercised the website using Selenium. It was like the 0.2 beta of Selenium. And yeah, I just started writing C# code in NUnit and went: oh, this is actually pretty cool. Very quickly I built an application that was testing things under load in production and all this kind of stuff, which was pretty interesting. And that was what got me started. Then I went into designing systems, building systems, then building teams that build systems. My most recent project was building the systems for Manchester Airport Group; I built all the backend systems there, all based on AWS and Lambda and Step Functions and loads of stuff, using Honeycomb at the time as well. And that's kind of where I got to, and then I started doing independent consultancy, and I was doing that until I came to Honeycomb. There you go, that's my life story.
Francois Bouteruche 13:35
An amazing journey, amazing. And in this journey, how did you learn C# and .NET? And more generally, how do you learn a new technology? Because that's always, or often, the difficult part of our job, always learning new things. So how do you learn?
Martin Thwaites 14:00
So now, I learn by Twitter. I learn by people tweeting stuff, and I go: oh, that looks cool, I'll go and play with that. I mean, I do have the luxury that my job is to learn these things, and then learn how to observe them, and then tweet about them and make content about them. Which is a bit of a luxury. But one of the things I find hard is that my hobby is my job, which means my personal time for learning is blurred into my professional time for learning. So I find something new that somebody's talking about and try to work out how I can observe it, because I'm one of these people... I've run production systems, and if I can't observe it, I don't like it. If I don't know that it's going okay, I don't like it. So I take these new technologies, I see somebody tweeting about one, and I go: okay, but what would that look like in a trace? How would I monitor it? What things are important to me about that particular thing that I'm running? So I kind of look at it from the angle of: I try to make it do something it's not meant to do. And that makes me delve into what it can do.
James Eastham 15:10
I think that's actually how I got deeper into observability: who's this person who keeps tweeting about this observability thing? Okay, how will that work with Lambda? That's exactly how I discovered Honeycomb and observability and all that good stuff. So yeah, I like Twitter as a way of learning too.
Martin Thwaites 15:28
But it's like, you know, somebody needs to prompt you. Your prompt might be an RSS feed that you're looking at, it might be a podcast, it might be Twitch streams. But somebody mentions something, and the best way I find to learn is to dig into that thing, and not just sit back and watch somebody do something with it. It's trying to get it into an application, even just a demo app, and seeing what you can do with it, making it do something that everybody else didn't make it do. You know, the Hello World that somebody's done is great. Okay, I've done Hello World, I'm going to blog about it and move on. No, no. Hello World? Great. Now make it count sheep. Make it do something that somebody else didn't make it do, and then you go: oh, it doesn't do that thing. And I really like comparing: oh, I did it this way with this thing, how would I do it with that thing? And as long as that's not something that was literally part of the blog post I read or the getting-started docs, I feel like I learn so much about it.
Francois Bouteruche 16:33
Yeah, so you love to get your hands dirty.
Martin Thwaites 16:40
Oh, 100%. I am the person who was quite happy going on call, because I love production incidents. I don't love the fact that production incidents happen, but I love the investigation, you know, really delving deep. I was always the parachute guy: the sort of, something's going wrong, throw Martin in there. The guy they throw in to go digging.
Francois Bouteruche 17:10
And I can see on your T-shirt that you love doing tests in production, yes? Can you tell us more about it?
Martin Thwaites 17:22
Okay, so it's a bit tongue-in-cheek. It causes a lot of emotions in people. They go: no, no, no, we test in the test environment. It's like: great, does that represent production? Yeah, it's got the same servers. Okay, does it also have 100,000 customers hitting it during the day? Oh, no, it doesn't have the same customer load. Okay, so it doesn't match production. And what we say is that everybody tests in production; it's just that some people ignore the results. Your customers, all day, every day, are using your system, which is basically a test. They're testing it in production for you. Now, if you ignore the data from those tests, then the more fool you. So everybody's testing in production. Everybody, all day, every day, has customers hitting their site, and if you don't observe what's happening and treat those like tests, then you're missing out on such rich data that lets you know how it's actually working. You know, you mentioned Bogus as a library, which is great for doing things in test environments, but I can guarantee you your customers are way better at generating bogus data than that library is.
James Eastham 18:41
All user input is evil, I think, is the mantra someone once told me. Everything that you think the user won't do, they will probably do at some point in the lifetime of your application.
Martin Thwaites 18:49
Oh, I saw a TikTok recently where there was a guy with a settee, or a sofa for the American people. The developer comes in, the settee's there, and he goes and sits down on the settee, and then he lies down on the settee. Then the QA person comes in, takes the cushions off and sits down on it, then tries to stand it on its end. Great. And then the user comes in, takes the settee, throws it against the wall, and jumps up and down on it. And you're like: alright, okay, yeah. But the whole point is, I think the other joke is: everybody has a test environment; it just so happens that some people have a separate production environment as well. But yeah, the key thing is it's not only testing in prod. It's not "I only test in prod", it's "I also test in prod", because you will still do unit testing, you'll do local component integration testing, you'll do deployed integration testing, you'll do UAT testing, you'll do performance testing. All of that testing still exists. But you also need to be testing in prod. And it's another way of talking about observability.
Francois Bouteruche 20:01
Okay, yeah. You've said you need to understand the results of the tests your users are running; you need to have access to those test results, to that data. Because personally, I've worked in many companies where they would lock down access to production and say: no, you can't access production. But I need the logs, I need to understand what's going on in production. So that has been my issue for years: I need to understand what's going on in production.
Martin Thwaites 20:45
And that, to me, is a smell. As a company, if you're stopping your developers from accessing telemetry data, it's an organizational smell. You know, we talk a lot about code smells; that, to me, is an organizational smell. If you're saying that people can't access that data, there are a couple of things that could be going wrong. "You don't trust your developers not to push out PII data" is normally the biggest one. You can't access logs because, well, there's PII in there, so we need somebody who's certified to look at that data. Fine: get your developers certified on handling PII so that they can go and look at it. I'm sorry, that's not an excuse. The other thing is: trust your developers. Put things inside your pipelines, implement code reviews if that's what you want to do, implement PRs, implement analyzers. I mean, we're in a .NET world; writing a Roslyn analyzer is really, really easy now. You can write a Roslyn analyzer that says: oh, right, if you're putting activity.SetTag("first_name", ...) in there, I'm going to put a big warning on it and say, don't do that. It's normally a smell if you're not actively, as an organization, trying to work with that and saying: I will now try to work out how I can give all of my developers access. We talk about developers having the ability to understand production. Well, that's both access, so that they can do it, and also the ability to understand the tools. You know, they're writing queries in CloudWatch to try and grab all the log entries that match a particular thing: that's ability. How are you going to train your developers to do that? That's a core skill they should have; you should spend time on making them understand it. On the Azure side, there's something called Kusto, which is the most horrible language I've ever seen, but you need people to understand it. And if they can't, then you've got a problem. You might be able to give them access, but if they don't have the ability and the knowledge to use that tool, then you've got the same problem; you might as well not give them access, because people won't use it. So it's not really just about that idea of "can I give somebody the keys to go and look at that data". It's also about how we train them up so that they know how to do it. One of the things at the heart of Honeycomb's philosophy is the democratization of production: this idea that everybody should have access, everybody should have the ability, everybody should have the knowledge to understand what's going on in the production environment. I don't care whether you're on call or not, you should be able to understand it. Because if you don't understand what's going on in production, you're at more risk of taking production down by adding this little bit of code. It's that: oh, I'll just delete this one line of code, it's fine, nobody's using it. Well, did you check the production telemetry and see whether that code path was actually being hit? What was the query that you ran to know that it's safe to remove? Oh, I didn't, I just looked through the code and it doesn't look like it's being used. Well, I'm sorry, that's a problem. If you've not got this ability... say I've just deployed some code: how are you going to know whether it's performing within the parameters you expected?

What query are you running against your production telemetry data that tells you this is running well? Which graph do you expect to go like that, or like that, or like this? What effect are you planning to see? If you don't know, then you're doing the developers a disservice. And that's why developers in general hate production: they think it's scary, a wild west where things can just go randomly wrong. And that's why you need this observability mantra around access and knowledge and ability, because then they go: oh, well, of course, it's just part of the team. We've been talking recently about this idea, and I love this idea: you go into the business and you say, right, what I need is a tool. I need this tool that I can ask questions of, that will tell me all about what's happening in production. Or, you know, I need a developer. It's going to cost me, I don't know, tens of thousands of pounds to hire this developer, and their primary job is just to sit there and understand production, so I can ask them questions: is this code path hit? Are we running slower? Are we running faster? Are we slower today than we were yesterday? I just want a developer that I can go and ask. And the business goes: alright, yeah, here's the budget. And you say: right, what I'm going to do is use that budget to buy a tool that I can just query and ask these questions of. Oh, no, no, no, you can't buy the tool. You can hire a developer to do that, but you can't buy a tool.
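To make Martin's analyzer example concrete, here is a rough sketch of a Roslyn analyzer along the lines he describes: it warns when a string literal that looks like PII is passed to a `SetTag` call. The diagnostic ID, the tag denylist, and the matching strategy are all hypothetical choices for illustration, not an existing package:

```csharp
using System.Collections.Immutable;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class PiiTagAnalyzer : DiagnosticAnalyzer
{
    // Hypothetical denylist; a real one would come from team conventions.
    private static readonly ImmutableHashSet<string> PiiTagNames =
        ImmutableHashSet.Create("first_name", "last_name", "email", "phone");

    private static readonly DiagnosticDescriptor Rule = new(
        id: "OBS0001",
        title: "Possible PII in telemetry tag",
        messageFormat: "Tag '{0}' looks like PII; don't emit it to telemetry",
        category: "Observability",
        defaultSeverity: DiagnosticSeverity.Warning,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();
        context.RegisterSyntaxNodeAction(AnalyzeInvocation, SyntaxKind.InvocationExpression);
    }

    private static void AnalyzeInvocation(SyntaxNodeAnalysisContext context)
    {
        var invocation = (InvocationExpressionSyntax)context.Node;
        if (invocation.Expression is not MemberAccessExpressionSyntax { Name.Identifier.ValueText: "SetTag" })
            return;

        // Only flag literal tag names we can see at compile time.
        if (invocation.ArgumentList.Arguments.FirstOrDefault()?.Expression
                is LiteralExpressionSyntax { Token.ValueText: var tagName }
            && PiiTagNames.Contains(tagName))
        {
            context.ReportDiagnostic(Diagnostic.Create(Rule, invocation.GetLocation(), tagName));
        }
    }
}
```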
James Eastham 25:35
I think the first time I heard you talk about the testing-in-production idea, I was a little bit like: that's terrifying, that's really scary. But actually, once you dive into it and you see it in practice, you get this reassuring feeling, shall we say, that you do something, you make a change, you push it out, and then your telemetry goes: hey, look, that code path is being hit. I think what I'm interested in, Martin, is: do you have any more practical advice for .NET developers to bring this into practice? This is all well and good to talk about, and everything you're saying sounds like a wonderful idea, but what are some of the practical things that, as a developer, I can do to make this a reality, I suppose?
Martin Thwaites 26:21
OpenTelemetry. OpenTelemetry all the way. Start ditching logs and start thinking more about tracing, using OpenTelemetry. Forget metrics, they're for infrastructure. Just use distributed tracing technologies like OpenTelemetry, put it into a backend, and start to really understand that backend. Even do that locally. I was literally talking to somebody earlier today who's just getting started with moving a monolith over into Lambdas, and they've got, I think they said, about 50 or 60 different Lambdas. It's a distributed application, and they're running locally and struggling, because they're looking at the logs and trying to, you know, attach to seven different Lambdas at the same time, and playing with SAM and all of that kind of stuff to try and get the whole thing working. And I said: well, what if you were to just add distributed tracing? This was a TypeScript application, and, you know, TypeScript is basically .NET but without the .NET name. So: use the OpenTelemetry SDKs, and start looking at your application in a distributed tracing context, locally, pushing it to a provider, pushing it to Jaeger, pushing it to wherever. Start looking at it from a distributed tracing perspective. We went through and did a few proofs of concept, and it was like ten lines of code in each of the services. They ran it and went: oh, that shouldn't be calling that thing. And you're like: well, it is, because the telemetry tells you it is. But I didn't think it was calling that thing! And then they're starting to see this information, even just locally, as a developer, as an engineer developing a distributed application. Getting used to being able to query these tools and look at a trace view puts them in a really good position to move forward into production, and then understand those things in production. So if you're starting to think about nanoservices or microservices, using Lambda, or even just running stuff in Fargate and EKS, if you're starting to think about running these distributed applications, think about using that distributed telemetry locally. Don't wait for production. Don't wait until you're ready to go live and then say: let's add the distributed tracing, because we need to know what's going on. Doing it really up front (a) makes you understand it better, and (b) makes it easier to go live, because everything you've been doing has been going along with it: oh, I'll add this attribute because it'd be nice, this will be interesting, I'll add this attribute, maybe we should add a span around this thing because it'll make things make sense. Those are all concepts that will be foreign to people who have not done this. But if you start by adding the distributed tracing stuff, literally just the getting-started stuff and the auto-instrumentations, then go in and start going: oh, maybe it'd be interesting, what's this concept called if I want to add this thing here? And the production environment just becomes an extension of your team, because it's something you can ask questions of: you doing alright? Is everything okay? Who hurt you? Where does it hurt?
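Martin's example was TypeScript, but in .NET the "ten lines of code" getting-started step looks roughly like this. It's a minimal sketch using the real OpenTelemetry .NET packages; the service name is a hypothetical placeholder, and the OTLP exporter defaults to a local collector or Jaeger endpoint on localhost:4317:

```csharp
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

// Packages: OpenTelemetry.Extensions.Hosting,
// OpenTelemetry.Instrumentation.AspNetCore,
// OpenTelemetry.Instrumentation.Http,
// OpenTelemetry.Exporter.OpenTelemetryProtocol
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService("checkout-api"))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()   // auto-instrument inbound HTTP
        .AddHttpClientInstrumentation()   // auto-instrument outbound HTTP
        .AddOtlpExporter());              // ship spans to a local collector/Jaeger

var app = builder.Build();
app.MapGet("/", () => "Hello, traced world!");
app.Run();
```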
Francois Bouteruche 29:54
I see a kind of parallel between what you just said and what I experienced when I first started to play with serverless functions and services like AWS Lambda. Because I was always used to building my standard ASP.NET monolith running on my IIS server. I built my ASP.NET monolith and I was fine with this; I did it for years. And then suddenly I came to AWS Lambda, tried to put my monolith into it, and realized: oh, that doesn't seem to be a good idea. So maybe I should split everything and start learning what serverless computing really is. And it's like what you just said: logs were fine for me when I was in a monolithic system, where all the load was on the same server and I could just tail a log, and everything was fine. But now we have distributed systems, and we need to understand how they all interact together. So there is a kind of journey for developers, moving from "okay, this is where we did logs and metrics before" to a new era of distributed tracing, even with microservices, not only with serverless functions. I can see the need for all these containers running in Kubernetes too, I don't know.
Martin Thwaites 31:38
No, you're exactly right. I mean, like you say, this is about distributed systems, and Lambda goes a little step further, in my opinion, than just distributed systems. Because now we're talking about nanoservices, we're talking about things that are deployed to do just one thing. So we're starting to get to this point where the Lambdas could be so distributed, they could be different code bases, they could be different repos. And it becomes even more fragmented than trying to think about one bounded context around one microservice, one thing that you can test independently. Now, all of a sudden, these Lambdas are something you can't test independently; it would be pointless to test them independently, because one doesn't do anything unless I've got five of them together. So actually, no, I need to test all five. And now, all of a sudden, I've got five different things that I'm running, trying to get the logs from five different consoles and work out what's hitting what. And that just seems like hard work, right? If I came to the industry now and that was my experience, I'd be like: I'm going back to Starbucks. That seems easier.
James Eastham 32:47
I think especially when you start adopting services like Lambda, and you start building more event-driven compute, it gets even more important, right? Because if you're doing any kind of publish-subscribe, where you've got potentially tens, hundreds, an infinite number of subscribers that you don't even know exist, having some element of distributed tracing to understand, I guess at a higher level, exactly what's communicating with what, and what triggers what... My system just fired and I didn't know why; oh, it's because this event came from this system, and that's why. So yeah, whenever I talk about serverless, I think distributed tracing is such a vital part of it.
Martin Thwaites 33:29
One of my major annoyances right now is when they do bootcamps. There was an AWS bootcamp recently where we were asked to come and do an observability module: "we've got the observability module, it's module 11, it'll be on this sort of date, can you come and do it?" And I'm like: we'd absolutely love to, but it should be module two. Because the problem is, you teach people how to build services on a cloud platform, using Lambda, using Fargate, using containers, and they build this big distributed application and feel the pain of trying to debug it. And then, just before they're about to finish, you go: and here's the way you could have debugged all of that, and it could have been finished in an hour. And they go: but you made me go through all that pain! Why didn't you tell me this earlier? It just makes things so much easier if you can see the interactions.
Francois Bouteruche 34:22
I can see it's a kind of retaliation: they felt the pain before, so they need to share this pain with you.
Martin Thwaites 34:34
I felt the pain, therefore you will feel the pain too, before you get the nice tools. It's like saying: oh, you're going to do some gardening, right? So here's a rock. You're going to have to chisel that into an axe first, before you can then go and mine the iron to get an iron axe, before you can... No, no. These technologies exist. They're mature. They're really, really easy to use now. And the OpenTelemetry community has spent a lot of time on easy-mode buttons, on getting-started docs. We've spent so much time trying to make it so that people can easily do all this stuff. And, you know, why aren't people using it? Well, it's because they're used to logs. It's Console.WriteLine, it's console.log. That's the first thing we teach people. And maybe there's a better way, a better thing.
James Eastham 35:32
That leads quite nicely on to something else I've heard you talk about before, Martin, which is observability-driven development, or bringing observability into your unit tests. I've done this kind of thing before, after a conversation we had in the past, where you literally write your unit test, spin up a server, export a collection of traces in memory within your unit test, and then use that to determine which code path was taken. And then when you push it out to production, you've got the traces there.
Martin Thwaites 36:04
Yeah. So there was a blog post recently on the Honeycomb blog around what we call ODD, and specifically what ODD is not and what it is, which was my colleague Jessitron, Jessica Kerr, basically having a rant about the D in ODD, because we don't need another DD. Observability doesn't really drive your development; what it does do is influence things. So when we talk about testing our applications, think back to where agile originally came from: extreme programming. And extreme programming was all about feedback loops. It was all about how we get the fastest feedback loop so that we know we're doing the right thing. Now, if you think about the far outer feedback loop, that's your customer telling you things are going wrong, things don't do what I expected. If you think about the innermost feedback loop, the fastest feedback we can get, it's probably the linter telling us that things are wrong; then we've got the compiler; and then we've got really low-level unit tests. And all of those require a lot of work. When I was an independent consultant, I spent a couple of years building a bank from scratch; I was literally commit one on the first service repo that we built. And what we established was that the business cares more about completeness and accuracy than they do about code. I can go and show them 4,000 lines of code and they don't care. What they do care about is the fact that everything goes right. They don't care about the class that's got 4,000 unit tests on it; all they care about is: if I hit the transaction endpoint to create a transaction, the balance goes down. That's the thing they care about. Now, there are certain things inside there that I can't test if I just spin up my Lambda and hit the Lambda execution endpoints. I mean, if we're running something like a web API in .NET, and we're using Lambda to create those functions for each one of those endpoints, I can just use the WebApplicationFactory to test my endpoints. But there are certain things I can't test, and there are also certain things I care about in production, knowing whether things go right. So using observability, and putting observability in my code, means I can actually assert against some of this stuff. One of the examples I use, and I'm presuming people understand what the strategy pattern is: the strategy pattern is the idea that when I make a call to something, it will choose a different class under the hood. A pricing strategy is a really good example. If I go in, it might choose a pricing strategy that is my price plus 50%, or it might choose price plus 10 pounds, or it might choose price divided by two. There are different pricing strategies that might be attached to different categories. Now, if I test from the outside, I don't know which pricing strategy it used. And that's where normally people would write 4,000 unit tests around this pricing strategy code to make sure all of these edge cases work. Ultimately, we can do this from the outside, if we can test those inner bits. But in production, I would also want to know this. In production, if I'm debugging why the price isn't right on a particular product, I want to know, for that particular request, which pricing strategy was used. And how am I going to know that? Well, telemetry is what's going to tell me.

And if I care about that in production, I should probably care about it locally. Therefore, I can actually bring observability into my test flows: inside my unit tests, spin up a WebApplicationFactory, hit the get-products endpoint, and then assert that the add-10-pounds pricing strategy was hit. Because sometimes the price might be the same. You know, let's do the maths: if I've got a 20 pound product, adding 50% to the product and adding 10 pounds are exactly the same thing. From the outside, I still end up with a 30 pound product, but I don't know whether it used the right pricing strategy. So I can use that from an observability standpoint: I care that in production I'm going to be able to see that, so I'm going to bring observability into my local development workflows and start using it that way. That's one way to use observability-driven development. The other way is to bring it in when you've got microservices, or if you're building a distributed monolith. As I was saying to somebody recently, distributed monolith is not a bad term; distributed monoliths are perfectly fine. Building a distributed monolith and saying you've built microservices is a bad thing, but building a distributed monolith isn't. It's hard to test them, though. So bring observability into your distributed monolith, and then when you're running things locally, and you spin up a Docker container with a Docker Compose script with seven other services in it, being able to see how things transition through all of those services locally is really, really useful. The third way you can use this is to actually understand how execution paths work. We had an example recently when we deployed our new query assistant, which is a large language model feature built around observability data and how people query systems, so not just an API call to OpenAI; we actually built our own model around it. And some of the things we noticed were the performance impacts. If somebody typed some things in, it took five seconds to return, and we saw this big massive gap in the telemetry: here's the whole request, then this thing here and this thing here, and a big gap in the middle. We noticed that locally. So we dug into it, added some more spans in the middle of it to work out what was going on, and it turned out it was our tokenization mechanism causing the problem. We ran through that, added those spans, and that allowed us to bring it back and optimize that thing. And then you start using that in your workflow. It's all about using that telemetry data to help accelerate what you do locally. Because more data is better, always. Having more data about what's going on is never going to be a bad thing. It's always going to be a good thing.
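A rough sketch of the kind of test Martin and James describe, assuming a hypothetical `Shop.Pricing` ActivitySource and a `pricing.strategy` tag emitted by the application. `ActivityListener` and `WebApplicationFactory` are the real .NET building blocks; everything else here is illustrative:

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc.Testing;
using Xunit;

public class PricingObservabilityTests
{
    [Fact]
    public async Task GetProduct_uses_the_add_ten_pounds_strategy()
    {
        // Capture every span the app emits from its (hypothetical) ActivitySource.
        var spans = new List<Activity>();
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == "Shop.Pricing",
            Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
                ActivitySamplingResult.AllDataAndRecorded,
            ActivityStopped = activity => spans.Add(activity)
        };
        ActivitySource.AddActivityListener(listener);

        // Exercise the endpoint from the outside, like a real caller would.
        using var factory = new WebApplicationFactory<Program>();
        var client = factory.CreateClient();
        var response = await client.GetAsync("/products/20-pound-product");
        response.EnsureSuccessStatusCode();

        // Assert on telemetry, not internals: which strategy actually ran?
        Assert.Contains(spans, s =>
            (string?)s.GetTagItem("pricing.strategy") == "AddTenPounds");
    }
}
```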
Francois Bouteruche 42:38
I have a question for you, maybe going back to the beginning of our discussion. A few years ago I was working at a crowdlending company; I had just joined, and in my first month it was: okay, we have issues in production, and we would like you to help us understand what's going on. And the first thing I did was just look at the IIS server logs, and I realized that, more often than not, there were 500 errors in the logs. Okay: do you know what's going on there? I asked the team: do you know what's going on there? No. We know they are there, but we don't know what it is. Okay, but if there is a 500 error, probably the end user is seeing something go wrong. And we started to look at it, and we realized it was in the customer acquisition funnel on the website: at some point, some paths were not properly handled, so it generated an exception on the server side, a 500 error. And so we were losing customers. The worst...
Martin Thwaites 44:13
...time to have an error is when you're trying to acquire a customer, right?
Francois Bouteruche 44:20
And so I brought this to the CTO: look at this, we are losing about 5% of all the customers entering the acquisition flow. But my question for you is: how do you educate software developers to take care of their logs, whether it's logs or distributed tracing? My experience is that, it's not that people aren't interested, but not everyone understands the importance of understanding what's going on in the logs. So how do you educate people on this? How do you create awareness about the importance of distributed tracing from a business perspective?
Martin Thwaites 45:07
So the best way I've found is actually to cause pain, once it actually causes them active pain. Because in that scenario, it wasn't causing them any pain at all. Yeah, there are 500 errors in the logs, but, you know, not my problem. As soon as you start alerting somebody, and getting somebody up at two o'clock in the morning to go and fix them because there are too many errors, you'd be surprised how quickly things get fixed. But that's the tongue-in-cheek version. Ultimately, it's all about what we were talking about at the start: the ability and knowledge to see what's going on. If they don't know how to go on to the IIS server and see the IIS logs, if they don't know how to get those logs off into a centralized logging solution and then query that solution, if they don't know how to do telemetry, if they don't know how to do distributed tracing to get all of that in there, then they're not going to do it. It's not that they don't care; it's that it's not something they can do easily. The thing they can do easily is just write some new features. The thing that's harder for them is to query that data, to slice and dice that data, to find where the common things are. And that's where it comes down to making all of that data available to them, easy to query, able to surface anomalies and errors and outliers and all of that kind of stuff really easily. All of a sudden, if you were to tell those same developers: here's a little query you can run against your telemetry data that tells you exactly which line of code is causing that 500 error, I guarantee they would have fixed it a lot earlier. If they could see an SLO firing, saying we're having too many errors, and they could click on it and go: well, the error's in this class here, and they look at the class and go: oh, it's this line here, I'll fix it. It's not that they don't care; it's that they don't know how to query that data, how to find it, how to debug it. You give them that error locally, you give them some repeatable steps to replicate the issue locally, and they'll go and fix it easily. It's not that they don't care; it's likely that they don't have access.
Francois Bouteruche 47:29
So it's all about transparency on what's going on in production, like you said before.
Martin Thwaites 47:35
And I think this is endemic in the .NET community more than in a lot of other places. I've said for quite a while that we've been behind the curve on things like DevOps and production automation of applications. In the .NET community there's been quite a lot of "somebody else goes and deploys it, because IIS is really hard, and somebody else is going to do all the IIS work for me, so I don't really need to care about that." We're starting to move now into a world where people are like: oh, but I need to support my own software. Where's the ops team? Well, the ops team are now part of your team, and you do all of this together. It's like: but can we not just hire somebody to go and do the production stuff, because I just want to type out some features? And ultimately, the teams that succeed in this are the ones where it's really easy for them to support production, because they know where to find the data, and it's really easy for them to deploy and access production to fix it. If you put too many gates in, they'll fight against it. If you make it easy, they'll go: oh, this is great. Because, you know, I used to get bug reports like "Dave's having an error". I've literally had that come through in a ticket. "Dave's having an error", name obfuscated for obvious purposes. But what you really want is: Dave's having an error when he's been online for an hour, he's done these seven steps beforehand, and he then clicks this button here, with this bit of data in this field, on this tenant, with this particular product in his basket. And you go: well, I can replicate that. If you can actually give them all of that data, then people end up wanting to support production, because they can actively go and run a query and say: is it always this product? Is it always on this page? Is it always with these prerequisites? And you can start to narrow it down and go: well, it actually makes my job easier as a developer, because I can spend way less time debugging and trying to find the error, and way more time actually fixing it.
James Eastham 49:46
Yeah, I think from my experience using OpenTelemetry, at least with .NET, it's so easy to start a span. And the other things, you've got the collectors and all the stuff around it; actually adding it to your code is pretty low-lift, really, isn't it? It's just part of your development work, adding activities. It's addictive.
Martin Thwaites 50:05
Super addictive. Adding extra context, adding the extra spans, and going: oh, this is interesting, let's add this bit on there. It gets super addictive.
James Eastham 50:18
I'd love to get your thoughts on this. When I've done that in the past, and I've really gone to town on my annotations, adding additional data to each span, I find it clutters your code up somewhat, shall we say. Because you end up with, you know, three times as many trace annotation lines as you do actual business logic, and you can sometimes struggle to see the wood for the trees, I guess. Extension methods? Do you have any ways to get around that? Maybe it's an education thing. I don't know if you've got any experience in making that a bit easier.
Martin Thwaites 50:52
It's all about extension methods. People don't spend enough time on test frameworks, and they don't spend enough time on telemetry helpers. Write your extension methods: if you want to annotate with product information, create an extension method on Activity, activity.AddProductInformation, pass in the product, and it will add ten attributes on there. This is the way to do things that makes it really interesting. If you want to add common things, use filters, you know, use action filters that allow you to add stuff dynamically, so it's not in the main path of your code. But ultimately, if these things are interesting to you, if these things are important, then they should be visible in the lines of your code. I did do an entire talk, shameless plug, you can find it on YouTube: a practical OpenTelemetry talk, which is all about best practices, specifically in .NET, with where to create activities, how to create activities, all that kind of stuff. We don't have a link, but I'm sure people can find it if you Google me. But yeah, it is a hard thing to get to grips with. And yes, there are best practices, and yes, it does make your code a little bit worse in that respect. But so do log lines. It's no worse.
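A minimal sketch of the extension-method style Martin describes; the `Product` type and the tag names are hypothetical, and in practice you'd follow your own semantic conventions:

```csharp
using System.Diagnostics;

public record Product(string Id, string Category, decimal Price);

public static class ActivityExtensions
{
    // One call site in the business logic, many attributes on the span.
    public static Activity? AddProductInformation(this Activity? activity, Product product)
    {
        activity?.SetTag("product.id", product.Id);
        activity?.SetTag("product.category", product.Category);
        activity?.SetTag("product.price", product.Price);
        return activity;
    }
}

// Usage inside a handler: Activity.Current.AddProductInformation(product);
```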
Francois Bouteruche 52:14
I'm looking... I'm trying to find it very quickly.
James Eastham 52:18
It was at NDC, wasn't it?
Martin Thwaites 52:21
Yes, it was NDC Oslo at the end of last year. I'm also doing it again in Copenhagen this year, in August, and in Porto in October. And the talk is different every time, because best practices evolve. I think at the moment it's an hour-and-20-minute talk. When I did it in Oslo it was 58 minutes; I'd managed to get rid of some stuff. I did it again last week in the Netherlands and it was 49 minutes. I managed to fit it just inside the slot time.
James Eastham 53:03
The point you made about it being no different from a log line is a really interesting one, actually. Because, at least historically, I've been, like probably many developers, obsessed with logs: logger.LogInformation everywhere, log all the things. But I guess adding trace annotations that way, it's no different from logs. And what you just said about the extension methods, that's a really neat, I'd say domain-friendly, way of adding that context, right? Like, AddProductInformation, and you pass in the product. I don't necessarily need to know what that does under the hood if I'm just a developer using it. So I really like that; I'll certainly be using that myself.
Martin Thwaites 53:40
Yeah, I wrote a logger implementation recently which I called Death to Logs, so you can basically keep calling the logger as normal. What it does is, every time you add a log line, it just takes whatever your state is and adds it as properties on the current activity. And if you do a BeginScope, it creates an activity. It's a really nice way of getting people away from using logs, because as far as they're concerned, they're using logs. It's like: fine, you use logs, but we're using tracing.
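Martin's actual implementation isn't shown here, but the idea sketches out roughly like this: an `ILogger` that forwards structured log state onto the current `Activity` as tags, and turns logging scopes into spans. The class name, the source name, and the `log.message` tag are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.Extensions.Logging;

// Sketch: log calls become span annotations instead of log lines.
public sealed class SpanLogger : ILogger
{
    private static readonly ActivitySource Source = new("SpanLogger");

    // A logging scope becomes a span.
    public IDisposable? BeginScope<TState>(TState state) where TState : notnull
        => Source.StartActivity("scope");

    public bool IsEnabled(LogLevel logLevel) => true;

    public void Log<TState>(LogLevel logLevel, EventId eventId, TState state,
        Exception? exception, Func<TState, Exception?, string> formatter)
    {
        var activity = Activity.Current;
        if (activity is null) return;

        // Attach the rendered message and the structured state as tags.
        activity.SetTag("log.message", formatter(state, exception));
        if (state is IReadOnlyList<KeyValuePair<string, object?>> pairs)
            foreach (var (key, value) in pairs)
                activity.SetTag(key, value);
    }
}
```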
Francois Bouteruche 54:11
I have one last question, because we are approaching the end of the show. One last question for you, Martin, about performance. I think you've written a blog post on this: does OpenTelemetry have an impact on your performance?
Martin Thwaites 54:31
The ultimate answer is: if you add anything in a code path, an if statement, the creation of an object, yes, you are getting a performance impact. The reality is that adding an activity is about 400 nanoseconds, and for every annotation tag you add to it, it's something like five nanoseconds. If you're optimizing at that level, don't use .NET. Sorry, but if you're worried about adding, I don't know, half a millisecond to your code paths, then you shouldn't really be using .NET; there are more performant things out there. As James said at the start, about five to six milliseconds is the shortest possible thing you can do in .NET. You add network latency in there, calling out to another service, pushing something to SQS, putting it in DynamoDB, and all bets are off: you're already talking like three, four, five hundred milliseconds. And you're worried about adding half a millisecond on to that? So yes, technically it adds a performance impact, but the benefit far outweighs the impact. And I've done an entire blog post on that as well, so have a look at the Honeycomb blog. I've put loads of stuff on there about .NET and getting started with OpenTelemetry.
James Eastham 55:48
Francois, do we have time for one last quick question? Because I've noticed one coming in on the actual chat on Twitch. So the question is, Martin... there's a lot of love for Honeycomb in the question, but the question is: what structures need to be in place before production testing is considered okay? A question from NZ Laura.
Martin Thwaites 56:06
So, production testing is always okay. When is it okay to rely on it without doing local testing? Never. Always do local testing. Production testing, using all of that production data, is always something that you should do, but it doesn't replace anything you did earlier in the pipeline, whether it's unit testing, whether it's integration testing. I did an entire post around getting rid of unit testing in favor of component testing, but that's not the same as production testing: that's still testing locally, still testing stuff in the pipeline. All of that kind of stuff is still important. But production testing is something that you add on. And that's where you get the real data, that's where you get what customers are actually doing. That's where you get the real bogus data, as a callback to the start, like a comedy set.
Francois Bouteruche 57:02
Yeah, nice, very nice. Okay, thanks, Martin. So we need to close the show. People can find you on Twitter; I'm sure you're open to their questions if they message you. And you're on LinkedIn. Oh, it's...
Martin Thwaites 57:29
It just rolls off the tongue, maybe four, four or 5112120. It just rolls off the...
James Eastham 57:34
...tongue. You know, you remember it when you're thinking: who's that person I need to speak to? Eddie Bravo 5121.
Francois Bouteruche 57:42
And, of course, people can navigate to opentelemetry.io. Maybe you have a few words about the OpenTelemetry project?
Martin Thwaites 57:53
Yeah, it is the de facto standard. It's the number two project in the Cloud Native Computing Foundation. You may have heard of them; they did a small project called Kube... Kuber-something. Something about container ships, it's like shipping stuff, I don't know. But yeah, it's their number two project. It's already the de facto standard for telemetry, and it'll be their number one project by the end of next year.
Francois Bouteruche 58:17
Okay, great. Thank you, Martin, for joining us, and thank you, James, for being our co-host today. I just want to highlight that the .NET Foundation survey is currently running, so take the time to fill in the survey; it will help the .NET Foundation get a better view of the community and better serve us. So thank you, everyone, and see you in two weeks for the next episode of the .NET on AWS Show.
Continue to The .NET on AWS Show, Featuring Laïla Bougriâ!
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.