The .NET on AWS Show, featuring Mauro Servienti!
Today we are joined by Senior Solutions Architect, Mauro Servienti from Particular Software. Join us as we learn about NServiceBus, Messaging, and more!
Brandon Minnick
Amazon Employee
Published Oct 28, 2024
Brandon Minnick 1:13
Hello everybody, and welcome back to another .NET on AWS Show. As always, I'm your host, Brandon Minnick, and with us today is our amazing special guest host, Salih. Salih, welcome back.
Salih Guler 1:26
Hey everyone. So I'm here again. I know you have missed me, so I'm back for more AWS wisdom, you know.
Brandon Minnick 1:36
That's right. Yeah, we loved you so much on the last show, we wanted to bring you back, although you have to leave a little bit early, right? We only get you for 30 minutes today.
Salih Guler 1:45
You know what? I think I should be fine. I made my arrangements. Hopefully everything is all right, so I will be here for the full hour.
Brandon Minnick 1:55
All right, yeah, cancel those other meetings. That's right, exactly. This is time for the .NET on AWS Show. Well, Salih, I'm so glad you're here. How's your week been?
Salih Guler 2:06
Well, it has been crazy. As you know, we call this era pre:Invent. So we're really heads down, focused on the content and everything that we will be delivering to you during re:Invent, and many of the developer advocates are heads down, creating their talks or whatever they are working on. These are, like, redacted, but it's definitely quite busy. How is it going with you?
Brandon Minnick 2:39
Good, good. Yeah, we were joking before the show, because usually this is when we make announcements like, oh, we just released this on AWS today, and we were joking around that there aren't a whole lot of new announcements in the weeks leading up to re:Invent, which is the huge yearly conference that AWS throws in Las Vegas. If you're ever looking for an excuse to go to Vegas and you want your boss to pay for it, come hang out with us at re:Invent. It's where you can find out about all the new features that are coming and all the latest and greatest stuff. You can meet us and hang out with all the folks that you see around the internet in our little tech circles. It'll be a lot of fun, especially if you like Vegas, especially if you love massive conferences that are so big they literally take over the city. But, yeah, no, my week's...
Salih Guler 3:32
The interesting part is, right, like one week before, there's also the Formula One race, and I don't know if anyone likes it. So for two weeks the city is in complete mayhem, a completely crowded area. And last year there was another one, a college football game, as well. And oh my god. But it is definitely an experience to live, honestly, and to enjoy every year, and meeting everyone there is a great opportunity. And for attendees, it's amazing. You can learn a lot, and all the new things coming out are definitely worth checking out.
Brandon Minnick 4:17
Yeah, is that F1 race happening again this year? Because, yeah, last year they blocked off the roads. I couldn't get across the street from my hotel. So many crazy detours. Wow. All right, well, all sorts of fun stuff to look forward to.
Salih Guler 4:35
Watch out for that, you know.
Brandon Minnick 4:39
Oh, man. Well, I'll tell you what, Salih, I wanted to share this with you, just in case you hear it in the background. As for my week, I bought a new network-attached storage, a NAS, for my home, just to back things up. And I thought I'd save a little money by putting hard disk drives into it instead of SSDs, because I was like, it's just for backups, I don't need the speed, it'll be fine. But it's here in my office, and I forgot how loud hard disk drives are. So, this is really embarrassing, but I literally thought that something, like a critter, had crawled into our attic, because I'm here on the second story; you can kind of see the sloped roof in my video. I thought I had a squirrel or something, because I'd just hear this noise. I literally called pest control. Yeah, yeah. The guy comes out, and he goes, "I think that's the problem," and he points at my new NAS. I'm such an idiot. I literally haven't owned an HDD instead of an SSD in so many years that I forgot they made so much noise. So I made it about a day and a half, and then I broke down and bought SSDs. They haven't arrived yet, but if you guys hear clicking or scratching in the background, I promise it's not a squirrel in my walls. That was confirmed by pest control. It's my network-attached storage, which I love. But, man, it is so loud.
Salih Guler 6:14
No, you're good. Right now I think the sound is perfect. So if I hear something, I will definitely let you know.
Brandon Minnick 6:22
That's okay. That's okay. You know, we have such an amazing guest today, we should bring him in. Yeah, we've had guests from Particular Software on before, and they highly, highly recommended bringing our guest Mauro on today. Mauro is a Solution Architect at Particular Software, the makers of NServiceBus. He spends his time helping developers build better .NET systems and leverage service-oriented architecture principles and message-based architectures. Mauro, welcome to the show.
Mauro Servienti 6:58
Thank you very much. Thanks for having me. That's the most important thing. Thank you. On the network-attached storage thing, I used to have one, and that's the main reason why I removed it, because it had four disks, and those four hard drives were constantly moving blocks around to keep the RAID array aligned, and it was, yeah, especially at...
Brandon Minnick 7:28
Night. No joke, I had to put in noise-canceling headphones just to get through the day. But I'm glad you feel my pain. Mauro, for folks who haven't met you yet, who are you and what do you do?
Mauro Servienti 7:43
Well, as you said, I'm a solution architect for Particular Software, or at least that's what my business card says. And I say that's what my business card says because at Particular Software everyone does everything. So as an employee, I get the opportunity to work on marketing-related activities from time to time, write some code, and help customers, in my solution architecture role, design their systems. I do support. I mean, everyone is sort of a jack of all trades in the organization. But from the customer-facing perspective, I present myself as a solution architect, and from time to time I also give talks at conferences, present webinars, or deliver workshops, like I did for a customer a couple of times in the last couple of weeks. And yeah, that's it. That's all about me. In my private life, I enjoy bike riding a lot, and swimming, and spending time with my little six-year-old.
Brandon Minnick 8:55
Love that. We were talking before the show, and Mauro mentioned he's a little under the weather thanks to his six-year-old, but I think you sound great. We appreciate you pushing through it. It's so exciting, because we've had other guests from Particular Software, Daniel and Laila, so if you haven't caught those episodes, make sure to go back in our catalog and listen to those as well. But Mauro is going to share all sorts of goodies with us today from Particular, including NServiceBus, and we've got a whole sample we'll be walking through. But Mauro, before we jump into that, there's one question we love to ask everybody on the show, because we are a .NET show, and that is: how did you get started with .NET?
Mauro Servienti 9:43
Huh, that's a long story that I can summarize very briefly. It was 1999. I'm 51, right? I'm getting old, unfortunately, which is a terrible thing, by the way. But it was 1999, and I was working for a small software house, and we were using classic ASP. And because we hated ourselves, we were using classic ASP with JavaScript instead of the default VBScript for the scripting part. We were subscribed to MSDN Universal, the MSDN Universal subscription from Microsoft. At the time, we were getting CDs, so more or less on a monthly basis we were getting tens or even hundreds of CDs with all the new releases, ranging from Windows operating systems to developer tools to servers and services and whatnot. And all of a sudden we got the .NET 1.0 alpha 1 as a set of CDs, and we started looking at the thing, saying, what is this thing? More or less at the same time, a customer came to us asking us to rewrite their own sort of internal management system. It was something like a hospital, from a high-level perspective, and they were looking for new management software. We were young and crazy, and so we embarked on this journey to rewrite it using what was, at the time, alpha 2 of .NET 1, in which, interestingly, all the namespaces were not System but were still Microsoft dot something. And between, I don't remember if it was beta 1 and beta 2, or beta 2 and beta 3, Microsoft changed all the namespaces, and it was a nightmare to fix all of them, because obviously, at the time, refactoring tools were not as good as they are today, so fixing all those compilation issues was not fun at all. But yeah, that was the way I learned about .NET, and it all started there.
Brandon Minnick 12:01
I never knew that. I never knew that everything used to be Microsoft dot something. That's crazy, because you went out of your way, like, hey, here's a new tool, we'll jump in, we trust it, and then Microsoft immediately breaks it. Which is crazy, because I feel like since then, Microsoft has gone out of their way, certainly with .NET and C#, to not introduce any breaking changes. Like, you know, nullability was introduced as warnings, although in every project I make those errors. There's a new field keyword coming up, I think it's in preview in C# 13, which is also a potential breaking change, and that's going to be opt-in. So, yeah, to hear that .NET or C# broke something is...
Mauro Servienti 12:54
To be honest, it was still a beta kind of release, so it was not an official one. So that was probably expected, right? But we were sort of brave or, if you will, stupid.
Brandon Minnick 13:12
That's, yeah, crazy, right? Yeah, to push out a whole product when...
Mauro Servienti 13:19
The toolchain, yes. And by the way, we made so many mistakes. We made so many mistakes. We were like a group of excited kids looking at the shiny new thing, and we said, we want to use it all. We want to use it all. I remember that the first thing we looked at was, okay, how do you do data access in this .NET thing, right? Instead of using ODBC, which we'd used before, how do we go about it? Oh, what is this new shiny thing called DataSets? And we started using DataSets in the worst possible way, like an in-memory representation of your database, loading all the stuff at application startup time, causing all sorts of issues on servers.
Brandon Minnick 14:11
I feel like that's one of those things that works at first, and then all of a sudden the server just comes to a crawl. And it's like, what happened? I don't know, I didn't change anything. Oh, we got more users.
Mauro Servienti 14:31
Yeah, and if you think about it, it was 1999, so essentially there was no Internet, there was no Stack Overflow, there were no forums, there was nothing. The only option was books, but we were talking about a preview kind of language and SDK, so there was nothing about it. So we were just groping in the dark, trying to understand how to work around it.
Brandon Minnick 14:59
Was Visual Studio with IntelliSense even out for those?
Mauro Servienti 15:04
There was an accompanying beta release of Visual Studio coming out, yeah.
Brandon Minnick 15:10
Wow. So literally just feeling around. What is it? What does IntelliSense show me? What can we try? Oh, that's incredible, Mauro. I feel like for anybody who has any crazy questions, Mauro has probably been through it all, probably so long ago he's forgotten about all of it. But yeah, you've definitely earned a badge of honor. You said you work at Particular Software, and you've got so much cool stuff to show us today. What do you want to jump into first?
Mauro Servienti 15:51
Yeah, so the first thing I'd like to briefly talk about is what NServiceBus is and, in a larger sense, what the Particular Platform is. At Particular Software, we constantly work on a thing called the Particular Platform, which is composed of four things. I don't want to call them products, because they're not really products; they are interconnected with each other. Of those four things, the most famous one is NServiceBus, and the other three are monitoring and operations-related kinds of tools. NServiceBus, for those of you who don't know what it is, is essentially a messaging framework that sits on top of an existing queuing system and adds lots of functionality on top of that queuing system. So, for example, let me use SQS and SNS as examples. One of the things we add on top of SQS and SNS is a thing called routing. We enable the configuration of what we call an NServiceBus endpoint, which represents a service, a business capability kind of thing, to route messages to other services without the developer needing to know the details of the underlying queuing system, because we take care of all of that. We take care of serialization and deserialization of messages, for example, but we also take care of more advanced functionality, like retries and failed messages. Let me talk a little bit about what retries are. When you are receiving a message, one of the common things in cloud environments, or in the distributed systems kind of world, is that one of your dependencies is not available when you need it, right? So you pick up a message from the queue, and then you try to reach out to a third-party web service, or you try to do a SQL query, and the SQL database is too busy, or the third-party service is not available at the time. Those are transient kinds of failures.
So retrying is the most common way to solve those kinds of problems, and NServiceBus takes care of that. For example, what NServiceBus does by default is immediately retry the message five times. If it fails five times, then it backs off for five seconds, and then it tries again five times, and then it backs off for 10 seconds, and then it tries again five times, and then it backs off for 15 seconds. And if it exhausts all these retry options, it moves the message into an error queue, so that the service is not causing a denial of service to itself due to a failing message, but can continue processing the messages that are backing up in the queue in the meantime. At that point, when the message is in the error queue, the second part of the Particular Platform, those monitoring tools I introduced previously, kicks in. There is a tool called ServiceControl that picks up messages from the error queue and stores them in its own database. And then we have visualization tools that allow you to manage failed messages. So, for example, you can go to a tool called ServicePulse, and ServicePulse shows you the failed messages and the reason why those messages failed. In the .NET world, it shows you the stack trace and the exception message. Then you can retry those messages from ServicePulse, and those messages will be replayed into the queue where they originally failed, assuming that the bug or the temporary failure was fixed, at which point it's safe to retry them. And there are various options. One option is even to edit the message before retrying, because you might realize that, oh, there's a bug in the sender, so the sender sent something that contained a typo, for example, let's say, a credit card number without the issue.
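The retry schedule described above can be sketched in a few lines. This is a minimal Python illustration, not NServiceBus code: the attempt count and pauses are the figures quoted in the conversation, the final round of attempts after the 15-second pause is an assumption, and `process_with_retries`, `handler`, and `error_queue` are hypothetical names chosen for the example.

```python
import time
from typing import Any, Callable, List, Tuple

IMMEDIATE_ATTEMPTS = 5         # attempts per round (figure quoted in the episode)
BACKOFF_SECONDS = [5, 10, 15]  # pauses between rounds (figures quoted in the episode)


def process_with_retries(
    message: Any,
    handler: Callable[[Any], None],
    error_queue: List[Tuple[Any, str]],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run handler with immediate and delayed retries.

    Returns True as soon as the handler succeeds. If every attempt fails,
    the message and the last error are appended to error_queue and False
    is returned, so the caller can move on to the next message.
    """
    last_error = ""
    # The first round has no pause; each later round starts after a back-off.
    # Assumption: one more round of attempts follows the last pause.
    for pause in [0] + BACKOFF_SECONDS:
        if pause:
            sleep(pause)
        for _ in range(IMMEDIATE_ATTEMPTS):
            try:
                handler(message)
                return True
            except Exception as exc:  # a real endpoint would also log this
                last_error = repr(exc)
    error_queue.append((message, last_error))
    return False
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without actually waiting; the point is only that a poisoned message ends up parked in the error queue instead of blocking the messages behind it.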
And that never happens, of course. And then you can edit the message, or multiple messages, before retrying them. Another interesting feature that NServiceBus provides is called auditing. You can set up NServiceBus to send to an audit queue every single message that is successfully processed by an endpoint, and again, from that audit queue, ServiceControl will consume all those messages, store them in its own database, and, through a couple of functionalities, provide visualizations on top of your distributed system. To some extent, that's very similar to what X-Ray does on AWS, or to other open source tools like Jaeger, for example, using the OpenTelemetry standard nowadays. However, the main difference is that the ServiceControl visualization style is more logical, so it's higher level than the infrastructure level that X-Ray, for example, or that kind of visualization tool provides, because the only option for those visualization tools is essentially to describe the physical structure of the system, saying, oh, I have a node here, which is a container, and that container sends messages over there, and then there's an HTTP call going to that web service or to a Lambda over there, which, from time to time, is not that interesting to people. If you're looking for higher-level architectural documentation, you need more of a logical view of the system. You don't care about instances, for example. You don't care much about the fact that something is deployed into multiple instances because you're scaling it out. You're more interested in saying, there are messages going to my shipping service, right? How that shipping service is deployed, in that specific scenario, I don't mind that much. So that's the main idea of those visualization tools. And auditing helps keep track of all the successfully processed messages in a system.
So it's sort of an audit trail or an audit log, if you will. Another thing that NServiceBus provides out of the box is a thing called the outbox pattern. If you live in, let me call it, an unreliable kind of environment, an environment where you don't have any transactional option, you're exposed to this kind of problem: you're consuming an incoming message, and at message consumption time you're trying to store something in a database and then send out two messages. Those four operations cannot live in a transaction. So there are chances that you succeed in storing stuff in the database and sending out one of the messages, but then fail in sending out the second one. Now, what do you do? And what about retrying? Because on a retry you might find that, oh, now I failed with the database. What about now? So inconsistency plays a big role in that case. The outbox pattern allows you to design more reliable systems in that sense. NServiceBus provides a transparent implementation of the outbox pattern, where the only thing you need to configure is where to store the outbox. For example, on AWS we support DynamoDB, so you can store the outbox in DynamoDB, or if your business data is in MySQL, in MariaDB somewhere, or in Aurora, then you can use that to store the outbox. It's all set at configuration time, and everything else is transparent to users. So essentially, you move from an exactly-once delivery semantic, which was typical of transaction-based queuing systems, to an exactly-once processing semantic: messages can be delivered multiple times, but from your business code perspective you see the message only once, because the infrastructure takes care of deduplicating the incoming message for you.
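To make the outbox idea above more concrete, here is a small in-memory Python sketch. It is an illustration under stated assumptions, not how NServiceBus implements the pattern: `OutboxEndpoint`, the order-related message names, and the dictionaries standing in for a database are all hypothetical. The key assumption is that the business data and the outbox record are written in one local database transaction, which the sketch simulates with two adjacent assignments.

```python
from typing import Any, Callable, Dict


class OutboxEndpoint:
    """Toy endpoint that stages outgoing messages next to its business data."""

    def __init__(self, dispatch: Callable[[str], None]) -> None:
        self.dispatch = dispatch                   # sends one message to the queue
        self.business_data: Dict[str, str] = {}    # stands in for business tables
        self.outbox: Dict[str, Dict[str, Any]] = {}  # keyed by incoming message id
        self.handled_count = 0                     # how often business logic ran

    def receive(self, message_id: str, order_id: str) -> None:
        record = self.outbox.get(message_id)
        if record is None:
            # First delivery: run the business logic and stage the outgoing
            # messages instead of sending them right away.
            self.handled_count += 1
            # Assumption: in a real store these two writes share ONE
            # database transaction, so they succeed or fail together.
            self.business_data[order_id] = "accepted"
            record = {
                "outgoing": [f"OrderAccepted:{order_id}", f"BillOrder:{order_id}"],
                "dispatched": False,
            }
            self.outbox[message_id] = record
        if not record["dispatched"]:
            # Redeliveries land here and only finish the pending dispatch.
            for outgoing_message in record["outgoing"]:
                self.dispatch(outgoing_message)
            record["dispatched"] = True
```

If dispatching fails halfway, the incoming message is simply delivered again: the business logic is skipped because the outbox record already exists, and the staged messages are sent again. Downstream receivers may therefore see a duplicate, which is why each receiver deduplicates by message id as well.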
And so, for example, thinking about the way SQS works, there's a small chance that under very high throughput, when you have lots of nodes consuming from a single queue, there might be ghost messages: two nodes see the same message at the same time, even if the message was sent only once. The outbox helps protect you from that kind of scenario, right? So that helps in solving those kinds of problems. A further important thing that NServiceBus provides is what we call NServiceBus sagas. It's an implementation of distributed workflows. Essentially, the idea is that messages drive you towards a sort of stateless kind of world.
But unfortunately, the real world is not stateless at all, right? It's easy to think about a distributed system, oh, everything should be stateless, yeah, sure. But then when you try thinking about how to model this business thing in a message-based system, you realize that, well, I need state almost everywhere, right? So sagas, or distributed workflows, are a way to introduce state management into a distributed system. Think about them as a sort of state machine where messages are triggers trying to change the state of the state machine, and if the change succeeds, the state machine can publish other messages, typically in the form of events, saying, my state is now B instead of A, as it was before. And so other parts of the system can react to that. There are tons of different ways to implement those distributed workflows from the architectural perspective, one of which is the saga pattern. But it's not the only option. We call them sagas because the saga pattern is the one we love the most. But it's all code based, so you can implement whatever you want, essentially, because it's C#. You can design your architecture, your distributed workflows, as you like.
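The "state machine driven by messages" idea can be sketched as follows. This is a generic Python illustration rather than the NServiceBus saga API: the `OrderWorkflow` class, its states, and the message names are hypothetical and were chosen only to show a transition succeeding, an event being published, and an unexpected message being ignored.

```python
from typing import Callable, Dict, Tuple

# (current state, incoming message type) -> next state
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("AwaitingPayment", "PaymentReceived"): "AwaitingShipment",
    ("AwaitingShipment", "OrderShipped"): "Completed",
}


class OrderWorkflow:
    """Toy long-running workflow whose state changes only through messages."""

    def __init__(self, order_id: str, publish: Callable[[str], None]) -> None:
        self.order_id = order_id
        self.state = "AwaitingPayment"
        self.publish = publish  # callback used to announce state changes

    def handle(self, message_type: str) -> bool:
        """Try to apply a message; return True if the state changed."""
        next_state = TRANSITIONS.get((self.state, message_type))
        if next_state is None:
            # The message is not valid in the current state: nothing changes
            # and no event is published.
            return False
        previous, self.state = self.state, next_state
        # Tell the rest of the system that the state moved from A to B.
        self.publish(f"{self.order_id}: {previous} -> {next_state}")
        return True
```

In a real system the workflow state would be persisted between messages (the conversation later mentions DynamoDB and SQL databases for that), while here it simply lives in the object.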
Salih Guler 26:43
As the person who has the least C# experience, I have a question. Everything that you talked about also feels quite technology agnostic. It feels really well designed. So my question is: if I'm a C# developer, this is perfect for me, but this also feels like it could be used to solve many of the issues or problems that we might have with our own infrastructures that we built with other languages as well, right?
Mauro Servienti 27:17
Yes, absolutely. NServiceBus is designed for .NET. We started with C#, and we're still a .NET kind of shop. From time to time, we get questions from customers saying, what about the Node.js implementation? What about the Java implementation? What about the Python implementation? But it takes a lot of effort for an organization to build up those skills and that knowledge over time, and it's very difficult. So yeah, of course. And on the agnostic thing, NServiceBus is essentially an abstraction on top of an existing queuing technology, and we try to provide feature parity across different technologies. For example, let me use a couple of well-known examples, right? On AWS, whenever we need to send a message, we can use SQS. Whenever we need to publish an event, it's probably better to use SNS, because that's designed to broadcast events to multiple subscribers, right? From the coding perspective, as a developer you don't want to know, am I dealing with SQS now or SNS now? NServiceBus makes it totally transparent to you, so you just use the API to send a message, or use the API to publish a message, and under the hood NServiceBus takes care of deciding what to do with which service. When it comes to Azure Service Bus on Azure, Azure Service Bus provides both queues and topics. In that case, from the high-level perspective, from your perspective as the developer, NServiceBus behaves in exactly the same way. You don't have to change anything. That doesn't mean, though, that you can move your system easily from one implementation to the other, right? What plays a big role there is the kind of transaction guarantees the infrastructure provides. If you think about, let's say, MSMQ on Windows plus SQL Server on Windows, you can rely on distributed transactions.
So you don't need the outbox at all, right? It will be exactly-once, transactional delivery. There's no way you get duplicate messages, and no way you need to deal with data storage failures or partial failures, that kind of thing, right? But then, can you move that code to, I don't know, RabbitMQ plus MongoDB? The answer is no, because your code will assume that transactions are a thing that guarantees exactly-once delivery. So as soon as you move across infrastructures with different delivery guarantees, you need to be very careful. If your code is completely idempotent, so you can invoke your message handlers multiple times without any kind of side effect, which is an extremely difficult thing to do, then you could move across different infrastructure implementations freely. But I've never seen that happen in real life.
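The send versus publish split described in this answer can be illustrated with a tiny in-memory transport. This is a Python sketch under stated assumptions, not NServiceBus or AWS SDK code: `InMemoryTransport` and the queue and event names are hypothetical, with `send` playing the role the conversation assigns to SQS (one destination) and `publish` the role assigned to SNS (fan-out to subscribers).

```python
from collections import defaultdict
from typing import Dict, List


class InMemoryTransport:
    """Toy transport that hides the queue/topic difference behind two calls."""

    def __init__(self) -> None:
        self.queues: Dict[str, List[str]] = defaultdict(list)
        self.subscriptions: Dict[str, List[str]] = defaultdict(list)

    def subscribe(self, event_type: str, queue_name: str) -> None:
        """Register a queue that wants a copy of every event of this type."""
        self.subscriptions[event_type].append(queue_name)

    def send(self, destination: str, command: str) -> None:
        """Point-to-point: exactly one destination queue receives the command."""
        self.queues[destination].append(command)

    def publish(self, event_type: str, event: str) -> None:
        """Fan-out: every subscribed queue gets its own copy of the event."""
        for queue_name in self.subscriptions[event_type]:
            self.queues[queue_name].append(event)
```

The calling code only ever chooses between `send` and `publish`; which underlying service carries the message is a detail of the transport, which is the transparency the answer above describes.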
Brandon Minnick 30:45
Yeah, it's interesting. As long as your code is 100% idempotent. It's like, is anybody's code? Is that possible? Mauro, we've got so much good feedback coming in from the chat. We've got a comment here asking about this. So, yes, life of disdain, Salih and I work at Amazon, for AWS. But Mauro, how about you? Is Particular an AWS partner?
Mauro Servienti 31:15
We're not AWS partners yet, I guess, is the answer, and I obviously don't work for AWS. We have a lot of contacts in AWS, and so we exchange a lot of opinions and talk a lot with a large group of people inside AWS in order to understand the direction some services are going in, and whatnot about the evolution of those services, in order to understand whether there is anything we need to take care of before it explodes in our faces without us knowing anything about it. Yeah, those kinds of things.
Brandon Minnick 31:59
Of course, right, yeah. One thing I've found since joining AWS is that AWS very much feels like it's made for your cloud architects, your folks that are managing infrastructure. And something I love about Particular is that it kind of abstracts that away and, like you were saying, it's more geared towards developers. You know, Salih and I are both software engineers, but we're specifically mobile engineers. We're used to building front-end mobile apps, and, you know, we interface with the cloud, so I might be more of an expert on mobile than I am on cloud. I can build my APIs and my databases, but am I an infrastructure expert? No. Am I a database expert? No. I'm a software engineer, and I just want it all to work. And I feel like that's what we get with tools like NServiceBus from Particular: you can think less about all the infrastructure and everything it takes to set it up.
Mauro Servienti 33:10
Yeah, absolutely. So, for example, about NServiceBus sagas: as with the outbox, you can configure NServiceBus sagas for DynamoDB or the various flavors of SQL databases AWS provides. We take care of the indexes, of the table layouts or the column layouts in DynamoDB, and whatnot, and we provide a few configuration options. For example, when we designed the DynamoDB persistence logic, we got feedback from customers saying, yep, but we're using the single-table design approach for DynamoDB, which essentially makes it so that the entire system uses a single table in DynamoDB and then different schemas for every row, in order to simplify queries and whatnot. So we opened up the configuration options, primarily for DynamoDB, giving customers more options to say, we want to opt in to the single-table design so that you, NServiceBus, adapt to our storage layout, rather than us adapting to your storage layout. That is less of a request in the relational database world, for example, or even in MongoDB, which provides different options when it comes to queries and whatnot. And on needing to care less about the infrastructure, that's true. That's our value proposition: focus on business code, and we take care of the rest. Even so, on large systems, especially when it comes to storage layout, you need to be aware of what you're doing, because everything works on your machine, and then as soon as you load the system with production data, you realize, oh, that doesn't work.
Brandon Minnick 34:57
Works anymore. No. That one environment, Famous
Salih Guler 35:02
last words,
Mauro Servienti 35:04
exactly, exactly.
Brandon Minnick 35:09
I love it. Well, Mauro, I know you mentioned you had some demos or code to show off. Shall we jump into it?
Mauro Servienti 35:17
Absolutely. Let me briefly share my screen, the entire screen. There we go. So one of the things we did recently was to build a demo. Well, it's more of a showcase than a demo. We started from Gregor Hohpe's loan broker example. The example sounds very simple, and it's a typical business use case. You have a loan broker and clients asking for loans. The loan broker first reaches out to a credit bureau to get your credit score, then reaches out to banks for loan quotes, and then responds to the customer, saying, here is your loan, or no one responded, or no one provided a quote. Those are essentially the three possible outcomes. We started with this sample, and then we said, okay, how are we going to implement this on AWS using NServiceBus? The overall idea is that we have a list of services. We have a client, implemented using containers, sending loan request messages to SQS. Those loan requests go to the loan broker, another service, again implemented as a container. And that thing does primarily two things. First, it reaches out to the credit bureau, a Lambda that in the showcase is implemented using JavaScript, to give more of a polyglot kind of example. Then, when the Lambda returns the credit score, the loan broker sends an internal message to itself, still a message on SQS, saying, okay, you can now kick off the loan broker policy, this long-running kind of workflow, which is time based, right? What the workflow does is publish an event on SQS, essentially telling the banks that a quote was requested, and then set a timer or, as we call it, a timeout, which in the demo is 30 seconds, to wait for the bank responses.
That's to simulate one of the problems that you have in distributed systems, where, when publishing an event, you have no idea who the subscribers are. So you need to find a way to essentially wake yourself up after a certain amount of time to ask: did anyone respond? Where am I now? What should I do next? So it's using this NServiceBus feature that builds a timeout on top of SQS. Timeouts can be seen as a sort of delayed message, or a message for your future self. So what the loan broker is doing is sending a message to SQS, saying to SQS, can you give me this back in 30 seconds? In that case, it uses the delay feature of SQS, which can delay messages up to 15 minutes. If the delay was longer than 15 minutes, then we have a mechanism to build that kind of longer delay still on SQS, without requiring any additional service. So let's say that one of the banks responds within those 30 seconds of timeout. The loan broker picks up the response, still waits for the entire 30 seconds if not everyone has responded, and then sends back a message to the customer. Or, to be more precise, it publishes an event, saying, okay, I got a response, and there are two subscribers. One is the original client, the one that requested the loan, and the other one is an email sender, so something that will notify the customer via email, for example. And the email sender has a bug, so 5% of the time it fails in sending that email. That's to visualize and understand one of the retry capabilities of the platform, of the NServiceBus platform. So if we have a look at the running containers, everything is now running on my machine, and everything is running in Docker. For demo purposes, everything is running against LocalStack, so that you don't need to have an AWS connection in order to be able to run the showcase.
But there are instructions in the demo README on how to configure the entire demo to run against proper AWS services.
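Editor's note: the timeout mechanism described above can be pictured with a small sketch. It assumes only the 15-minute (900-second) SQS delay limit mentioned in the conversation. The function name `plan_delay_hops` is hypothetical, and this is one possible way to chain delays on a single queue, not a description of how NServiceBus implements longer timeouts internally.

```python
# Hypothetical sketch: split a long timeout into SQS-sized delay hops.
# SQS caps the per-message delay at 15 minutes (900 seconds).
SQS_MAX_DELAY_SECONDS = 900


def plan_delay_hops(total_delay_seconds: int) -> list[int]:
    """Return the sequence of delays to apply, one per re-send.

    A delay that fits within the SQS limit needs a single hop. A longer
    delay is split into full 900-second hops followed by the remainder;
    after each hop the message would be re-sent with the next delay.
    """
    if total_delay_seconds < 0:
        raise ValueError("delay must not be negative")

    hops: list[int] = []
    remaining = total_delay_seconds
    while remaining > SQS_MAX_DELAY_SECONDS:
        hops.append(SQS_MAX_DELAY_SECONDS)
        remaining -= SQS_MAX_DELAY_SECONDS
    hops.append(remaining)
    return hops


if __name__ == "__main__":
    print(plan_delay_hops(30))    # the 30-second timeout from the demo
    print(plan_delay_hops(2000))  # a delay longer than 15 minutes
```

The 30-second timeout from the showcase fits in a single hop, so only delays above 15 minutes would need the chaining step.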
Brandon Minnick 40:02
Yeah, just to chime in, you can find all of this code at github.com/ParticularLabs/AwsLoanBrokerSample, and we've dropped that link here in the comments so you can click in and follow along.
Mauro Servienti 40:18
Thanks. Good point. Without diving into the code for now, let me give you an overview of the visualization options that the Particular Platform provides. So NServiceBus has native support for OpenTelemetry. The demo has been running on my machine for the last hour, hour and a half or so, and it's been consuming messages for 90 minutes. If I go to this Grafana dashboard, I can see a few interesting pieces of information. Let me make it full screen for whoever is watching this.
Brandon Minnick 40:56
Zoom in on the text a little bit, too. Yeah, absolutely.
Mm, beautiful, yeah.
Mauro Servienti 41:07
Okay, so yes, this zoom is better. Okay, so the demo contains three main dashboards. One is, let's say, a technical kind of dashboard. The other one is a business kind of dashboard, and the third one is a mix of the two, with just four graphs, for demo purposes again. We're essentially watching four different kinds of metrics. The first one is quote average processing time by bank. It's more of a business kind of metric, where we're saying, how long are banks taking to process our loan requests? So, from beginning to end, for every loan request, how long is it taking? Then we have more of a technical one in the top right corner, which is processing time by message type. So for every message type used by the entire distributed system in this showcase, what's the time it's taking to process? Where process means successfully process, from when it's picked up from the queue to when it's acknowledged back to the queue, saying I got it and I processed it successfully. How long is it taking for each one of the messages to be processed? We can clearly see an outlier over there, because one of the banks is misbehaving, so it's taking a lot of time to process some of the messages. That outlier is also reflected in the bottom left corner graph, which is failure rate by message type. There are messages never failing. The light blue one is failing from time to time, but the yellow one is failing a lot, right? And obviously it's all fake, so those banks are simulating all those failures. Finally, we have an even more technical kind of metric, which is the fetched messages rate per service over five minutes, essentially telling us, do we need to scale out anything? So, for example, the top line, the one that is very high, is behaving very well, but the bottom one, the orange one, is probably taking too much, right?
So it's not fetching enough messages, and there's probably a point in scaling it out to see if we can fetch more messages from the queue. But those are metrics, right? Those are interesting for operations people. There are a lot of things that can be deduced from those metrics, but there might be a different way of visualizing that information. For example, Jaeger allows us to visualize OpenTelemetry traces, and OpenTelemetry traces are essentially what the system is doing from a visual perspective. Let me pick one of these big ones so we can see a bunch of things. So, for example, in this case, we're seeing that there's a client message that comes in. It goes to the loan broker, as we saw on the slides previously, the loan broker does a bunch of things, and then it reaches out to the banks. And one of these banks, bank three, is failing, right? Bank three is failing more than once, and that's clearly visualized by the OpenTelemetry trace in Jaeger. The problem here, obviously, would be that if I were to scale out one of the banks, I would have different instances failing at different times. So, as we said before, these kinds of visualizations are lower level than the logical kind of view. They are based on how the infrastructure is laid out in production, right? And that might not be ideal. As we saw, there are some failing messages in one of the banks, and there might be other failures here and there, because, for example, one of the things that is failing, as I said before, is the email sender, right? How can I visualize those failing messages? I can use this tool called ServicePulse, which is part of the Particular Platform, and I can see here that there are 255 failed messages. As I said, 5% of the messages are continuously failing. In the table, failed messages are grouped into failed message groups.
In this case, they are grouped by exception type, and they are all failing with the same exception, so there will be only one group. Or I can go to all failed messages, and by clicking on one of the messages, I can see the exception and the entire stack trace. If I want, I can retry the message here. So I can hit retry and I get a confirmation box. I won't retry it now, because there's a chance that it fails again. What I will do is go here, click select all, and hit retry for this page. Those messages will be retried. And if we're lucky, some of them will fail again, and we'll see how ServicePulse visualizes those multiple failures when that happens. Let them go through. In the meantime, I go back to the visualization, and we were not lucky, huh?
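Editor's note: the grouping of failed messages by exception type described above amounts to a simple bucketing step. The sketch below is only an illustration of that idea. The function name, the message IDs, and the exception type name are hypothetical and do not come from ServicePulse or the demo.

```python
from collections import defaultdict


def group_failures_by_exception(
    failed_messages: list[tuple[str, str]],
) -> dict[str, list[str]]:
    """Bucket (message_id, exception_type) pairs by exception type.

    Failures that share an exception type end up in the same group,
    so they can be inspected or retried together.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for message_id, exception_type in failed_messages:
        groups[exception_type].append(message_id)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical failures: every message fails with the same exception,
    # so there is a single group, as in the demo.
    failures = [
        ("msg-1", "EmailDeliveryException"),
        ("msg-2", "EmailDeliveryException"),
        ("msg-3", "EmailDeliveryException"),
    ]
    for exception_type, ids in group_failures_by_exception(failures).items():
        print(exception_type, len(ids))
```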
Brandon Minnick 46:53
Mauro, is ServicePulse built in, or is it another product offering from Particular? Is that right? Or is it unique to this?
Mauro Servienti 47:03
Yeah, it's essentially... One thing I haven't said is that NServiceBus and the Particular Platform are open source, so the source code is available on GitHub. But in order to run in production, you need a commercial license, and the commercial license gives you access to all the things I'm showing now. ServicePulse, ServiceInsight, which we'll see in a second, and NServiceBus are all available under the same license. And as you can see here, for example, those three attempts failed again. So what ServicePulse is telling us is, oh, I retried those messages, but they failed again. The same message failed multiple times. It allows you to understand how your system is behaving from a higher-level perspective, with more details related to the messaging infrastructure, which is harder to grasp using OpenTelemetry metrics and OpenTelemetry traces, because those tools are more agnostic to the infrastructure you're using, right? One other thing that ServicePulse provides is a higher-level kind of monitoring tool. Here we see the endpoints in the system. We have the three banks, the client, the email sender, and the loan broker. We see, for the email sender, all those failures. And then we have those five graphs, where we see the queue length, throughput, scheduled retries, processing time, and critical time. Going left to right, queue length is how many messages are sitting in a queue waiting to be processed. That's a clear sign that you probably need to scale out the loan broker, for example, because there are a lot of messages queuing up in the queue, right? Then the throughput is how many messages are consumed per second. Scheduled retries is another interesting metric, because it allows you to understand whether your endpoint is misbehaving somehow. How many times is my endpoint retrying messages, and then maybe failing or succeeding, right?
That's scheduled retries. As we can see, the only one failing is the bank three adapter. From time to time it goes on retrying messages. One could ask why the email sender is never retrying messages. That's because we configured the email sender to never retry, so that failed messages go straight to the error queue, again for demo purposes. Then the processing time is very similar to the processing time we talked about before: how much time is an endpoint taking to consume a message? The last one is an interesting one, because the critical time is the time it takes for a message from when it's sent by the sender to when it's successfully processed by the receiver. So it also includes the infrastructure time. It's telling you that if it is too high, something might be off at the infrastructure level. That might not be the case for cloud environments, because the cloud vendor is taking care of the scaling. But it might be more interesting for an on-prem kind of installation, where it's telling you, yeah, your RabbitMQ cluster is not keeping up, so you might want to have a look into it and try to understand. Or let's say your queuing system has some quotas set, right? That might add to the delivery time, because the queuing system delays delivery of your messages to keep within the quota and not throttle you, and stuff like that. Interestingly, what we can do is click on one of these endpoints and dive into more details about it. We see those three graphs, but then we see the same graphs per message type or per instance. So if these endpoints were deployed in multiple instances, we could see, for every instance deployed, how all those instances are behaving in the system. But again,
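Editor's note: the difference between processing time and critical time, as defined above, can be made concrete with a short sketch. The class and field names are hypothetical and chosen for illustration. The definitions follow the spoken explanation: processing time covers pick-up to successful completion, and critical time covers send to successful completion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageTimings:
    """Timestamps for one message, in seconds (hypothetical field names)."""

    sent_at: float                # when the sender sent the message
    processing_started_at: float  # when the receiver picked it up
    processing_ended_at: float    # when it was successfully processed


def processing_time(t: MessageTimings) -> float:
    """Time the receiving endpoint spent handling the message."""
    return t.processing_ended_at - t.processing_started_at


def critical_time(t: MessageTimings) -> float:
    """Time from send to successful processing, including time in transit."""
    return t.processing_ended_at - t.sent_at


def infrastructure_time(t: MessageTimings) -> float:
    """Portion of critical time not spent in the handler (queueing, delivery)."""
    return critical_time(t) - processing_time(t)


if __name__ == "__main__":
    timings = MessageTimings(
        sent_at=100.0, processing_started_at=104.0, processing_ended_at=104.5
    )
    print(processing_time(timings), critical_time(timings), infrastructure_time(timings))
```

A message that is handled quickly but shows a high critical time spent most of its life waiting in the queue or in delivery, which is the infrastructure-level signal described above.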
Brandon Minnick 51:15
If we go back to the screen with the multiple columns of charts on it, I just want to say what I love about this, and it's almost silly how much this helps. Because as software developers, we can dig through logs, we can try to find out, okay, what's really going on here? And you get these reports back from your customers, like, I never got an email, or this never happened, and you're trying to put the pieces together, but by essentially looking at one very small detail at a time. Whereas what we're looking at here, and I'll try to describe it for the folks listening on the podcast, is multiple columns and multiple rows. Each column shows a specific metric, like the queue length, throughput, scheduled retries. And literally just at a glance, because we're looking at all the different endpoints, I can see anomalies. So in this first column, the last row at the bottom has a much longer queue, because this is our column for queue length. So that's interesting. Something's up here. And in the scheduled retries column, there's only one that keeps retrying. Well, I guess there's a little blip one time in another row, but there's one that consistently has to retry. It's like, ah, what's going on there? So what I love about these visualizations is kind of like we were saying earlier, where you don't necessarily have to understand everything about your infrastructure and how to build it and how to configure it properly, because these tools allow us, even me, the mobile software engineer who's not the back-end expert, to go, oh, well, we've got a problem with our queue length here, and it looks like this is retrying all the time. There's probably a failure there somewhere.
And it gives you that starting point for where to dive in and fix these problems, where otherwise you're getting tickets from support saying the customer didn't get their email, or the customer's transaction didn't go through, and you're trying to work out, why would that happen? But with these awesome charts here in ServicePulse, I could just see it. I would love to have this just on my wall, like a video screen, where immediately I can see what's going on in my system.
Salih Guler 53:38
Do we have some kind of website where people can see this? For the podcast listeners especially, if we can share a link for them, they can just drop in.
Mauro Servienti 53:51
Not yet. What I could do is deploy this on AWS, have it running, and then share the public URL for the running demo, essentially for those three services that are available.
Salih Guler 54:09
Yeah, I think that would be really good for the folks who listen later on.
Mauro Servienti 54:15
Yep, yeah, absolutely. Or they can download the demo, run it by themselves, and see it running on their machine. One of the beauties of the demo is that it only requires .NET and Docker installed, because it's all Docker based. So essentially, once you have .NET and Docker installed, you go to the command prompt and run docker compose up, and everything is configured and starts up on your machine without you having to install and configure anything other than Docker. But Brandon, your comment was on point, right? Especially when you have a distributed system, it's very hard to connect the dots and understand what's going on. Especially if you start from the narrowest possible view, let's say a log, you have no idea what the context of that log was, because most of the time the context is distributed across multiple nodes. So you have to understand, okay, I received the message, and I failed. But why did I receive that message? Where was it coming from? And all those things. So it's probably better to go from a broader to a narrower context when diving into those things, and ServicePulse allows you to do that from the monitoring and metrics perspective. But as I said before, one of the functionalities that NServiceBus provides is auditing, and there are a couple of interesting things when it comes to auditing, in that we can build visualizations on top of it. Messages are connected to each other through two pieces of information. One is called correlation ID, and the other one is called conversation ID. So what we could do is build a visualization like this one. What we're seeing now is a sequence diagram, and it's a sequence diagram deduced from the audited messages of the running demo.
So what's happening is that ServiceControl, as I said before, is consuming messages from the audit queue and is building this nice sequence diagram, telling me, oh, a message has been sent from the client to the loan broker. The loan broker set a timeout for itself and then sent a message to itself, and then published a message to the banks, the quote requested one, and the banks responded, and the quote was sent back to the client, and whatnot, right? The sequence diagram is an overview of the system from the logical perspective. We don't see any endpoint instances. We don't see infrastructure here. We don't see containers and container instances or Kubernetes pods, nothing like that. It's just a logical view. And another interesting view that we could build is the same thing from the workflow perspective. As we said, in NServiceBus, sagas are an implementation of distributed workflows. So what if we put the workflow itself, the state machine, in the center of our visualization? Then we see triggers coming in. On the left, there are messages triggering the workflow, and the messages on the right are the published events that are going to the rest of the world. It's a little bit small in this tool right now, but we can see, for every message that comes in, all the state changes that happened in the workflow. So how the state machine's state changed, what the properties were, and what the previous state and the next state were after the message was successfully handled by the workflow. And that provides you the best of both worlds.
So you can see all the metrics and all the numbers from a high-level or low-level perspective, and all those visualizations from a lower-level kind of perspective in Jaeger, for example, or X-Ray, because the demo already supports X-Ray, so it's just a matter of having the correct environment variables defined so that it connects to X-Ray. Or you can see those more higher-level kinds of visualizations, where you don't really care about what your infrastructure looks like and you want more of a high-level architectural documentation of the system.
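Editor's note: deriving a sequence diagram from audited messages, as described above, can be approximated by grouping messages on their conversation ID and ordering them by time. The sketch below is an illustration under stated assumptions. The `AuditedMessage` fields, the endpoint names, and the message type names are hypothetical, and this is not how ServiceControl or ServiceInsight are implemented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditedMessage:
    """One audited message (hypothetical, simplified shape)."""

    conversation_id: str  # shared by all messages in the same conversation
    sent_at: float        # send timestamp, in seconds
    sender: str           # logical sending endpoint
    receiver: str         # logical receiving endpoint
    message_type: str


def sequence_by_conversation(messages: list[AuditedMessage]) -> dict[str, list[str]]:
    """Group audited messages by conversation and order each group by send time.

    Each entry reads like one arrow of a sequence diagram:
    "sender -> receiver: message type".
    """
    conversations: dict[str, list[AuditedMessage]] = defaultdict(list)
    for message in messages:
        conversations[message.conversation_id].append(message)

    return {
        conversation_id: [
            f"{m.sender} -> {m.receiver}: {m.message_type}"
            for m in sorted(group, key=lambda m: m.sent_at)
        ]
        for conversation_id, group in conversations.items()
    }


if __name__ == "__main__":
    audited = [
        AuditedMessage("conv-1", 2.0, "LoanBroker", "Bank1", "QuoteRequested"),
        AuditedMessage("conv-1", 1.0, "Client", "LoanBroker", "FindBestLoan"),
        AuditedMessage("conv-1", 3.0, "Bank1", "LoanBroker", "QuoteCreated"),
    ]
    for line in sequence_by_conversation(audited)["conv-1"]:
        print(line)
```

Only logical endpoint names appear in the output, which matches the point made above: the view says nothing about instances, containers, or other infrastructure.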
Brandon Minnick 58:39
Yeah, I love it. Mauro, we only have about a minute left. Somehow we're almost out of time again. But I appreciate you so much for coming on the show, and we would love to have you back again to keep diving into this. Like I said, this is so cool, because for lazy software developers like me who don't want all those details, and, you know, I've just got to close this ticket, fix this bug, tools like these are super, super useful, and I can absolutely see how they'll save hours of my life. But Mauro, in the meantime, before we see you again on the .NET on AWS Show, hopefully, where can folks find you online?
Mauro Servienti 59:21
They can find me on Twitter or LinkedIn. My Twitter profile is my full name without spaces, so x.com/ followed by my full name in a single word, and on LinkedIn it goes by [inaudible]. That sums up my online presence on those two social networks.
Brandon Minnick 59:45
Fantastic. Well, thanks again, Mauro, for joining us, and thank you for listening and for hanging out with us on the live stream. We live stream every episode of the .NET on AWS Show every two weeks, so we'll be back in your feed in two Mondays. So don't forget to subscribe to the AWS channel on Twitch, subscribe to the podcast in your favorite podcast feed, and we'll see you again in two weeks.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.