{"id":5470,"date":"2021-12-21T18:58:00","date_gmt":"2021-12-21T18:58:00","guid":{"rendered":"http:\/\/www.sumologic.com\/blog\/database-monitoring-with-sumo-logic-and-opentelemetry-powered-distributed-tracing"},"modified":"2025-05-08T19:11:32","modified_gmt":"2025-05-09T03:11:32","slug":"database-monitoring-with-sumo-logic-and-opentelemetry-powered-distributed-tracing","status":"publish","type":"blog","link":"https:\/\/www.sumologic.com\/blog\/database-monitoring-with-sumo-logic-and-opentelemetry-powered-distributed-tracing","title":{"rendered":"Database monitoring with Sumo Logic and OpenTelemetry-powered distributed tracing"},"content":{"rendered":"\n<section class=\"e-stn e-stn-fc090504d8c495877f2d795d392908e318d4e1f9 e-stn--glossary-inner-content e-stn--table-of-content\"><div class=\"container\">\n<div class=\"wp-block-b3rg-row e-row row\">\n<div class=\"wp-block-b3rg-column e-col e-col-5b1830f2290e10551cf9d734cb4bbf0c927d83ba e-col--content-wrapper  col-sm-12 col-lg-12 col-xl-12\">\n<div class=\"e-div e-div-5d2e7848f6f43526612539018ddf29050777aacf e-div--card-btn-link\">\n<h2 class=\"wp-block-heading has-eigengrau-color has-text-color has-link-color wp-elements-bdc95819a72d70d8040b521f0f057369\" id=\"importance_of_database_monitoring\">Importance of database monitoring<\/h2>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-8b4bfe04624437f00b31f6c640118fca\">We are living in a data world. Data describes and controls almost every aspect of our lives, from presidential elections to everyday grocery shopping. Data grows exponentially, and so does the complexity of the applications that manage it. We have all seen the recent shift to microservices and the other sweeping changes in the way we design, develop, deploy, and operate modern applications. 
<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-877675ec9ab9bce7d62ae4cc51001ead\">One thing hasn&#8217;t changed much, though: at the end of the data processing pipeline, there\u2019s always some kind of data warehouse. Obviously, the technology keeps getting more sophisticated and innovative, especially for big-data and NoSQL databases, but the main principle stays the same: the more data the backend can serve in less time, the better. One more thing stays the same: if the database fails, it is a serious problem that is usually difficult to recover from and rarely without long-term consequences, often including data loss. Making sure our data vaults are healthy and operational has never been more important.<\/p>\n\n\n\n<h2 class=\"wp-block-heading has-eigengrau-color has-text-color has-link-color wp-elements-d435a6506c989f345ca174bfcf0ad70c\" id=\"monitoring_database_infrastructure\">Monitoring database infrastructure<\/h2>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-7d558789cf9b223fea2e3a8749b4f227\">Sumo Logic already has many log- and metrics-based applications that help you get started with database monitoring. 
If you know your technology stack and can gather logs and\/or metrics and send them to Sumo, you can start right away with valuable insights gathered directly from the database\u2019s internal observability signals.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-04257252d7a1140465c83377e1d0ad40\">You can gain visibility into the health, performance, and behavior of your databases with KPIs like failed logins, slow queries, connections, or deadlocks for technologies like PostgreSQL, MongoDB, Microsoft SQL Server, and others.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-fb6cf652d98c84085bcf8bcb74880db2\">If you are using a hosted database service such as AWS RDS, data sources like CloudWatch are another useful way to get more insight into the performance of your cloud database infrastructure. <strong>Sumo Logic makes this easier by aggregating data from multiple accounts and namespaces so that you have all your database health in one place.<\/strong><\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-68ae5aa464d2b012c7c226cb3d6a558e\">These are great insights for database admins and an invaluable source of information, but in this article, I want to expand the context a little bit.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-2512fc587b6dfed0e72106f4b54822e9\">Every database is part of a wider application infrastructure, often a shared part, and it is almost always the final step, very important but still only one step, of the multi-tiered transactions executed by users.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-3bbc93737d7bf622f5707d3cb15afdf2\"><strong>Understanding database health and performance in the context of user transactions and overall application performance sheds new light on the ability to observe the system\u2019s internal state, but it is also not a 
straightforward task to complete.<\/strong><\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-a6fa4cf5808616fa05c3eb5cba0832f2\">Fortunately, distributed tracing data can help.<\/p>\n\n\n\n<h2 class=\"wp-block-heading has-eigengrau-color has-text-color has-link-color wp-elements-7d1868b937260de8739062d7e4aa04fb\" id=\"database_from_the_opentelemetry_tracing_client_point_of_view\">Database from the OpenTelemetry tracing client point of view<\/h2>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-dd25c4447ffbd0fdbbfea318b6adbfd7\">Now you may ask: what do you mean, tracing for a database? I\u2019m not going to risk my precious database by installing third-party libraries on it! Besides, you may not even be able to if (as mentioned above) you are using a hosted service for your database infrastructure, right?<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-43e083eceb8f41e467419573eba78727\">Nothing to worry about: application monitoring instrumentation can handle database observability entirely from the client side. After all, the main purpose of a database is to quickly serve client requests and queries. If the client is happy, the database is doing a decent job.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-f5a9c643adb26f40330ded199b719c5c\">As you may already know, <a href=\"https:\/\/www.sumologic.com\/blog\/distributed-tracing\/\">application monitoring is currently trending towards OpenTelemetry<\/a>, and the project, although only two years old, already has a lot to offer in measuring how database performance looks from the client\u2019s point of view. 
Here\u2019s an example of <a href=\"https:\/\/github.com\/open-telemetry\/opentelemetry-java-instrumentation\/blob\/main\/docs\/supported-libraries.md\" target=\"_blank\" rel=\"noopener\">supported technologies (clients) for Java OpenTelemetry auto-instrumentation<\/a> that don\u2019t require any configuration or coding work to start delivering insights into database health right away, with the granularity of a single query. That includes specific technologies like Cassandra or MongoDB, but also more generic database drivers like JDBC. Java is just one example here; other languages like JS\/Node, Python, .NET, and Go have a similar range of supported clients.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-d82720beaf7f7bca7fe6502f8a68f634\"><strong>So, what can we get out of it? Let\u2019s take a look at a few examples from Sumo Logic out-of-the-box dashboards and views.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading has-eigengrau-color has-text-color has-link-color wp-elements-9499f6c4df07102832c96f0b6c7bd198\" id=\"high-level_monitoring_and_alerting\">High-level monitoring and alerting<\/h2>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-ae05d07a9440297c862bace6b265caa5\">Let\u2019s start with some high-level use cases: just show me my databases, which apps they belong to, and which other services they talk to and support.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-499e5675f1eef35abbb32a7814875513\">We can see above that our application consists of several services, such as accounts, payments, and transactions, as well as a MySQL database (called mysql_mobile_banking). 
Another panel shows the dependencies of this database, that is, which other services call it and how healthy they are:<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-470812c38e0faf4b0e43cfc37b9534c0\">Four of the most important operations (groups of queries) are tracked as well. The client instrumentation captures them, and the Sumo Logic backend automatically calculates KPIs for the most used queries. KPIs like requests, latency, and errors can help you quickly understand the current and historical performance and health of the database, both as a whole and per individual query group. As you can see, dynamic parts of queries are automatically masked\/tokenized. No configuration is required.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-b4193a5b14596643ab803b13f988bfe3\">If you are interested in learning more about these dashboards and Service Maps, check out <a href=\"https:\/\/www.sumologic.com\/blog\/service-map-dashboards\/\">one of my previous blog posts discussing related use cases<\/a>.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-d49ab0bfeee6ff4a8d2206d69552afbc\">Each of these out-of-the-box panels is fully customizable: you have full access to the metrics query that drives it, so you can modify the query or change the visualization if that better serves your use case.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-609c2ee40b07cdddac3f0340854d89ba\">If you want proactive notifications about spikes or threshold breaches, you can add a monitor right from the window above. 
This lands you in the \u201cNew Monitor\u201d configuration screen below, where you can, for example, set up an anomaly-based rule that generates an alert and sends a notification to your chosen alerting integration channel whenever a sudden spike occurs.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-28636129bd8338499aa112e1ae665a4a\">If you are interested in more detailed information, for example, KPIs at the operation level, you can also drill down to query group-specific dashboards, where you can track, observe, and set up monitors for KPIs at this very granular level. Below, for example, we can see that our selected operation has quite a long latency of over 95 seconds, which is no different than usual (compared to last week), and also has occasional errors, not exceeding 3% on average.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-b7482a70ab92ba1d5619bd7bd6f9069a\">So, what do we do to learn more about it? How can we investigate example transactions where this query took so long, or the ones where it had errors? What if we got an alert about a sudden spike and want to investigate?<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-12c77c48be48e42319d3e1acbffd545e\">In Sumo Logic dashboards, you can always click on the chart to open the entities panel, which not only recognizes the related entity automatically (here, the database service and its selected operation) but also provides contextual drill-downs to logs and traces for this entity. 
Let\u2019s click \u201cTraces\u201d to learn more.<\/p>\n\n\n\n<h2 class=\"wp-block-heading has-eigengrau-color has-text-color has-link-color wp-elements-f1ed60696499eb741e69da5590ea12bb\" id=\"drill_down_to_a_single_query_level\">Drill down to a single query level<\/h2>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-9cd76d9fc1b57cd4cc1bcaaabebb8142\">We landed on a traces list that shows the end-to-end user transactions that used the previously analyzed query operation. We can already see that their load times (\u201cDuration\u201d column) are not so quick, to say the least, and that the majority of the time is spent in the database service (pink mysql_mobile_banking).<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-9768fa8d1f5ba225512430eed64e3637\">The database is at fault here and has a lot of room for optimization. If we are interested in an even more detailed and granular view, we can drill down into any of these transaction traces to see precisely what the load sequence looked like during the execution of the transaction.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-70671cc0ddad13fc5927bf442ad09b62\">Sometimes a trace reveals problems like large numbers of non-optimal, sequential database calls, which are always candidates for optimization. Here, however, there are just 3 queries from the accounts service (dark brown) to our database, and they take a lot of time (over 32 seconds for the highlighted query). 
That is something for the database developer to look at if we expect these transactions to run faster. To be fair, though, there\u2019s also considerable processing time inside the olive-colored payments-service (middle of the chart), which accounts for 30% of the total end-to-end transaction time.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-c8821609c3ddc6887b639eed1f0bf01a\">An interesting thing about OpenTelemetry spans, which carry information about client calls such as database queries, is the metadata that is automatically added to them to provide wider context. Besides the full database statement, we can also learn about the database connection, name, system, and the user that executed the query. This is also a good place to insert your own custom metadata: anything that can help you troubleshoot faster.<\/p>\n\n\n<div class=\"e-img \">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1054\" height=\"484\" src=\"http:\/\/www.sumologic.com\/wp-content\/uploads\/13-Drill-down-to-a-single-query-level.png\" alt=\"13-Drill down to a single query level\" class=\"wp-image-5469\" title=\"\"><\/figure>\n<\/div>\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-d1a4b69a4c7cca968962aabddf1c4eb1\">It can be any information about the user type, customer profile, or transaction context, even a dollar value if that\u2019s important. 
You can leverage it not only here, in the detailed view, but also during the aggregated ad-hoc analysis that we are going to cover next.<\/p>\n\n\n\n<h2 class=\"wp-block-heading has-eigengrau-color has-text-color has-link-color wp-elements-a4dacd13d76c9eb00dfdbdcf52c08c08\" id=\"ad-hoc_analysis_of_raw_data_via_span_analytics\">Ad-hoc analysis of raw data via Span Analytics<\/h2>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-d863e1c61d63cf2691f039020ac8032c\">Many classic APM tools don&#8217;t give you full access to raw data. It\u2019s just too much for their backends to handle. Sumo Logic is different: we are a true cloud big data platform that lets you analyze all your data in full fidelity, regardless of cardinality, with full details including custom metadata, without any need to pre-configure the schema.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-b2c0371795a5baf10f3c2c089a27899c\">This can be very useful when you are looking for a needle in a haystack, and it also lets you perform any kind of custom analytics on top of that data.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-ba037233d4b62213d5ca370e6faf9fdb\">Aren&#8217;t the out-of-the-box KPIs good enough? Create your own. You can use Sumo Query Language directly for this, or the new Span Analytics interface that <a href=\"https:\/\/www.sumologic.com\/blog\/queryless-span-analytics\/\">makes it easy to achieve even for novice users<\/a>.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-0be47fbaa0b3d3be2a56f505562b0492\">Let\u2019s say that we want to analyze the 98th percentile of duration, but only for database queries that contain the text \u201c<em>price<\/em>\u201d and have finished successfully (statuscode=OK). 
We want to visualize this as a time series with 1-minute granularity, broken down by individual database statement string. Here\u2019s how such an analysis works in the Span Analytics (\u201cSpans\u201d) interface:<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-243f9921916547de5b340083befe6875\">You can access this UI directly from the \u201c+New\u201d menu, or via drill-down navigation by clicking the \u201cOpen in\u201d button and selecting the \u201cSpans\u201d link.<\/p>\n\n\n\n<p class=\"has-delft-blue-color has-text-color has-link-color wp-elements-1ccfb2cb0a576f65da389f9dd46903c3\">As we can see, even without direct instrumentation of the database code itself, we can get a ton of useful details about the performance and health of databases just by looking from the perspective of their clients. This user- and transaction-centric view, combined with additional metrics and logs coming directly from your database instances, gives you full visibility to observe, monitor, and troubleshoot one of the most precious components of your application infrastructure: the home of your valuable 
data.<br><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div><\/section>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":252,"featured_media":0,"template":"","meta":{"_acf_changed":false,"show_custom_date":false,"custom_date":"","featured":false,"featured_image":0,"learn_more_label":"","image_alt_text":"","learn_more_type":"","show_popup":false,"learn_more_link_file":0,"event_date":false,"event_start_date":"","event_end_date":"","place_holder_image_url":"","post_reading_time":"6","notification_enabled":false,"notification_text":"","notification_logo":"","notification_expiration_time":0,"is_enable_transparent_header":false,"selected_taxonomy_terms":{"blog-category":[128,141],"blog-tag":[]},"selected_primary_terms":[],"learn_more_link":[],"featured_page_list":[],"notification_enabled_post_list":[],"_gspb_post_css":"","_relevanssi_hide_post":"","_relevanssi_hide_content":"","_relevanssi_pin_for_all":"","_relevanssi_pin_keywords":"","_relevanssi_unpin_keywords":"","_relevanssi_related_keywords":"","_relevanssi_related_include_ids":"","_relevanssi_related_exclude_ids":"","_relevanssi_related_no_append":"","_relevanssi_related_not_related":"","_relevanssi_related_posts":"4668,71369,71176","_relevanssi_noindex_reason":"","inline_featured_image":false,"footnotes":""},"blog-category":[128,141],"blog-tag":[],"class_list":["post-5470","blog","type-blog","status-publish","hentry","blog-category-application-observability","blog-category-opentelemetry"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.sumologic.com\/wp-json\/wp\/v2\/blog\/5470","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.sumologic.com\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.sumologic.com\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/www.sumologic.com\/wp-json\/wp\/v2\/users\/252"}],"version-history":[{"count":2,"href":"https:\/\/www.sumologic.com\/wp-json\/wp\/v2\/blog\/5470\/revisions"}],"predecessor-version":[{"id":1
7016,"href":"https:\/\/www.sumologic.com\/wp-json\/wp\/v2\/blog\/5470\/revisions\/17016"}],"wp:attachment":[{"href":"https:\/\/www.sumologic.com\/wp-json\/wp\/v2\/media?parent=5470"}],"wp:term":[{"taxonomy":"blog-category","embeddable":true,"href":"https:\/\/www.sumologic.com\/wp-json\/wp\/v2\/blog-category?post=5470"},{"taxonomy":"blog-tag","embeddable":true,"href":"https:\/\/www.sumologic.com\/wp-json\/wp\/v2\/blog-tag?post=5470"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}