Reduce Stress and Get Your Fridays Back with Observability and OpenTelemetry
An interview and video from AWS Hero Liz Fong-Jones on why observability is a developer's best friend to debug systems, efficiently resolve issues and focus on innovation.
Mark Pergola
Amazon Employee
Published Nov 7, 2023
Last Modified Dec 7, 2023
AWS Hero Liz Fong-Jones is blazing many trails, especially in the field of observability. Liz is a developer advocate, labor and ethics organizer, and site reliability engineer (SRE) with more than 18 years in the field. This unique combination of experiences makes her a force to be reckoned with when it comes to advocating for industry change on a global level.
Liz was also a featured speaker at AWS re:Invent 2023 with her talk Seamless Observability with AWS Distro for OpenTelemetry, featured below. In this brief interview, Liz explains why observability is critical for maintaining reliable, resilient systems, especially in the era of generative AI.
You spoke at re:Invent on OpenTelemetry and observability. Who needs to hear this and what will you help them to do?
“People who are curious about how to best debug their production systems using open standards. I hope people will give OpenTelemetry a try, and if they're already a little familiar, I hope they'll get involved in the user community.
Open standards like OpenTelemetry help developer teams address the complexities of instrumentation, while also giving them the freedom to decide how to store and query the produced telemetry data. It's flexible, so teams can switch backends without a massive code overhaul, enabling them to collect and send telemetry to their preferred backend systems, so they can avoid vendor limitations and lock-in.
Debugging systems with open standards enables interoperability, allowing diverse system components to seamlessly harmonize and function together. Further, the accessibility and collaborative foundation of open standards make it easier for developers, engineers, and admins to understand and work with their systems, which is critical for identifying and resolving issues quickly.”
What does someone need to understand before seeing your talk? Are there community content resources that would help them prepare?
“Attendees will likely benefit if they have at least some exposure to the idea that they should be responsible for running and debugging their software in production instead of leaving it to someone else. Even though it is from 2018, I highly recommend this piece by Charity Majors, CTO of honeycomb.io, on testing in production.
OpenTelemetry focuses on making things simple, understandable, and enjoyable for users. Honeycomb is all-in on OpenTelemetry, and we’re continually working hard to make it even more user-friendly for developers, SREs, and DevOps teams. This project is gaining momentum fast, with more widespread adoption and integration, and we see it becoming a key part of the software world that's here to stay.
For developers, OpenTelemetry can boost their efficiency and save them precious time. With OpenTelemetry, developers can more easily integrate monitoring and tracing into their applications. Streamlining the development and debugging process allows developers to focus on work that drives innovation.”
What do you wish you could've covered in the talk, but couldn't because you only had an hour?
“Demoing everything end-to-end about how to add attributes and custom spans all the way to querying it in an observability backend.”
What's one question you wish someone would ask you about this topic?
“How do I gain sufficient confidence in my systems and their observability to deploy on Fridays?”
How did you become an expert in this area, and why is it an area you are passionate about?
“As one of the first educators around site reliability engineering, I frequently encountered people who were excited about service level objectives (SLOs) but were worried about how they'd debug a top-level user impact statistic all the way down. I realised quickly that observability was a way to help people debug any problem rather than only those they knew to monitor at a lower level.”
How are you seeing GenAI impacting this topic and area of the cloud?
“Oh man, that's a separate talk, but GenAI is proliferating the amount of code that's created without necessarily helping us understand the created code. GenAI systems need observability, and we need to invest in the ability to understand code generated with GenAI or else we'll have systems that have been produced for us that no one knows how to debug.
Let's suppose you get paged at 3am, for errors coming from a system component that didn't exist 3 months ago, but was created mostly with the help of generative AI. The engineer who built it doesn't know how it works, because all they did was feed a specification in and cut-paste the emitted code into a pull request and do some basic functional testing. It turns out there was an edge case not encapsulated in the original requirements.
Generative AI does enable developers to create more code faster, but the onus is still on humans, at least for now, to maintain and debug the code. What happens if there's a bug in the generated code, and no one, not even the engineer who commissioned the AI to generate the code, understands it? How can we figure out where the bugs are and how to solve them, unless we scale up our understanding and observability into the code?”
About Liz
Liz is currently the Field CTO at Honeycomb, and previously was a site reliability engineer working on products ranging from the Google Cloud Load Balancer to Google Flights. She is also the co-author of the book Observability Engineering. Liz shares her time between Vancouver, BC and Sydney, NSW with her wife Elly, partners, and a Samoyed/Golden Retriever mix. As an AWS Hero, Liz is part of a global community of technology leaders who are enthusiastic about knowledge-sharing, mentorship, and pioneering technological innovation. Connect with Liz and her content at linkedin.com/in/efong, cohost.org/lizthegrey, and lizthegrey.com.
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.