In today’s fast-paced technology landscape, ensuring optimal user experience and efficient resource utilization is crucial. Even minor performance bottlenecks can significantly impact user experience and business success. With the rising popularity of ARM architectures like AWS Graviton, benchmarking applications across different architectures is essential for making informed decisions. Comprehensive benchmarking helps evaluate the suitability and potential benefits of ARM-based instances for specific workloads. This blog post demonstrates how to leverage AWS managed open-source services such as Amazon Managed Service for Prometheus and Amazon Managed Grafana, along with the open-source wrk HTTP benchmarking tool, to benchmark Java applications across x86 and ARM architectures. By comparing performance metrics, you can make data-driven decisions, identify bottlenecks, and optimize for better user experience and resource efficiency.
This post provides a step-by-step guide to setting up a benchmarking environment using open-source observability tools: Amazon Managed Service for Prometheus, Amazon Managed Grafana, and the open-source wrk HTTP benchmarking tool. The process begins with creating an Amazon Elastic Kubernetes Service (Amazon EKS) cluster with node groups for both Intel (x86) and Graviton (ARM) instances. A containerized Spring Boot Java application is then deployed to the EKS cluster, running on both x86 and ARM-based instances; feel free to substitute an application of your choice. To monitor the application's performance, Node Exporter is deployed to collect system-level metrics, and the AWS Distro for OpenTelemetry (ADOT) Collector is configured to scrape these metrics and send them to Amazon Managed Service for Prometheus. An Amazon Managed Grafana workspace is then set up and connected to the Prometheus workspace to visualize the collected metrics.

The benchmarking itself is carried out using the wrk HTTP benchmarking tool, deployed as a Kubernetes job. wrk is configured to send HTTP requests to the Java application running on both x86 and ARM instances, simulating a CPU-intensive workload, and the latency and requests per second (RPS) metrics are captured and compared across the two architectures. Because this post focuses on Java applications, Java Flight Recorder (JFR) and JDK Mission Control (JMC) are also used to monitor and profile the application's runtime behavior.
You can clone the following repository from GitHub for the various artifacts required to benchmark Java applications using open-source observability tools. This repository provides a comprehensive set of files and configurations, including Kubernetes manifests for deploying the Java application on both Intel (x86) and Graviton (ARM) instances within an Amazon EKS cluster, as well as monitoring components such as Node Exporter and the OpenTelemetry (OTEL) Collector for collecting system-level metrics and sending them to Amazon Managed Service for Prometheus.
To begin the benchmarking process, you need to create an Amazon EKS cluster. The provided repository includes a YAML file named eks-cluster-config.yaml, which you can use to create the cluster with the following command:
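A minimal sketch of the cluster-creation step, assuming eksctl is installed and your AWS credentials are configured:

```bash
# Create the EKS cluster from the config file in the cloned repository
eksctl create cluster -f eks-cluster-config.yaml
```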
Once the cluster creation is complete, you’ll notice that it has two node groups: one for Intel (x86) instances and another for Graviton (ARM) instances. You can verify this by running the following command:
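One way to verify the node groups; the cluster name and Region are placeholders for the values in your eks-cluster-config.yaml:

```bash
# List the node groups attached to the cluster
eksctl get nodegroup --cluster <your-cluster-name> --region <your-region>

# Alternatively, list the nodes along with their CPU architecture label
kubectl get nodes -L kubernetes.io/arch
```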
As you can see, the m6i-ng node group is for Intel (x86) instances, while the m7g-ng node group is for Graviton (ARM) instances. Having these two separate node groups allows you to deploy and benchmark your Java application on both architectures simultaneously within the same EKS cluster.
With the EKS cluster infrastructure in place, let's create a simple microservice. A Spring Boot microservice was created with a REST endpoint that performs CPU-intensive calculations on random integers to simulate a heavy workload. This application serves as the benchmark target for evaluating performance under load. You can build the application with the following command.
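A minimal build sketch, assuming the sample project uses Maven with the Maven wrapper checked in (adjust if your application uses Gradle):

```bash
# Build the Spring Boot application and produce an executable JAR under target/
./mvnw clean package
```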
If you want to run the application locally, you can do so with the following command.
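A sketch for running the service locally; the JAR filename is a placeholder for whatever your build produces:

```bash
# Run the packaged application locally on the default port 8080
java -jar target/<your-app>.jar

# Or run it directly through the Spring Boot Maven plugin
./mvnw spring-boot:run
```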
To prepare the Java application for deployment, we need to containerize it and push the container image to Amazon Elastic Container Registry (Amazon ECR). Follow these steps.
First, create a new Docker buildx instance and list the available instances:
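A sketch of the builder setup; the builder name is arbitrary:

```bash
# Create a new buildx builder instance and make it the active builder
docker buildx create --name multiarch-builder --use

# List the available builder instances to confirm it is active
docker buildx ls
```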
Build the container image for both AMD64 (x86_64) and ARM64 architectures:
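A sketch of the multi-architecture build; the image URI is a placeholder for your own repository (the image is pushed in the next step):

```bash
# Build the image for both AMD64 (x86_64) and ARM64 from the application's Dockerfile
docker buildx build --platform linux/amd64,linux/arm64 \
  -t public.ecr.aws/<registry-alias>/<repository>:latest .
```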
Enable the experimental Docker CLI features and push the multi-arch container image to the public registry.
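A sketch of the push step, assuming a public Amazon ECR repository (the registry alias and repository name are placeholders):

```bash
# Enable experimental Docker CLI features for multi-arch manifest handling
export DOCKER_CLI_EXPERIMENTAL=enabled

# Authenticate to the Amazon ECR Public registry (public ECR authentication uses us-east-1)
aws ecr-public get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin public.ecr.aws

# Rebuild and push the multi-arch image; --push publishes the manifest list for both platforms
docker buildx build --platform linux/amd64,linux/arm64 \
  -t public.ecr.aws/<registry-alias>/<repository>:latest --push .
```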
To verify the multi-arch build, you can inspect the container image using the following command:
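A sketch of the inspection command; the image URI is a placeholder:

```bash
# Inspect the pushed image manifest to confirm both architectures are present
docker buildx imagetools inspect public.ecr.aws/<registry-alias>/<repository>:latest
```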
This will display information about the container image, including the supported architectures:
As you can see, the build supports both AMD64 (x86_64) and ARM64 architectures, allowing you to deploy the Java application on both Intel and Graviton instances within the EKS cluster.
With the EKS cluster and the containerized Java application ready, we can proceed to deploy the application to both Intel (x86) and Graviton (ARM) instances within the cluster. Let’s start with the deployment targeting the Intel m6i instances:
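A sketch of the deployment step; the manifest filename is a placeholder for the x86 deployment manifest in the cloned repository:

```bash
# Deploy the application and its service to the Intel (x86) node group
kubectl apply -f <java-app-x86-deployment>.yaml
```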
This command will deploy the Java application to the Intel node group and expose it as a Kubernetes service. Next, we’ll deploy the Java application to the Graviton m7g instances using the same approach:
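A sketch of the Graviton deployment; again, the manifest filename is a placeholder for the ARM manifest in the repository:

```bash
# Deploy the application and its service to the Graviton (ARM) node group
kubectl apply -f <java-app-graviton-deployment>.yaml
```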
At this point, the Java application is successfully deployed and running on both Intel (x86) and Graviton (ARM) instances within the EKS cluster, ready for benchmarking and performance evaluation.
Before benchmarking the Java application across x86 and ARM architectures in the EKS cluster, we’ll deploy the Prometheus Node Exporter. Node Exporter is a lightweight tool that collects and exposes system metrics like CPU, memory, disk I/O, and network traffic via an HTTP endpoint for Prometheus to scrape. This enables comprehensive monitoring, troubleshooting, and performance analysis of the host systems running the application during benchmarking. To set up the monitoring and observability components, we need to create an Amazon Managed Service for Prometheus workspace and an Amazon Managed Grafana workspace. Follow these steps:
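A sketch of the workspace creation; the alias is arbitrary, and the returned workspaceId is needed later for the remote-write endpoint and cleanup:

```bash
# Create an Amazon Managed Service for Prometheus workspace and note the workspaceId in the output
aws amp create-workspace --alias java-benchmark --region <your-region>
```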
Once the Amazon Managed Service for Prometheus workspace is created, deploy the Node Exporter to collect system-level metrics:
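The repository includes Node Exporter manifests; below is a sketch with a placeholder filename, plus a generic alternative using the community Helm chart:

```bash
# Option 1: apply the Node Exporter manifests from the cloned repository
kubectl apply -f <node-exporter>.yaml

# Option 2: install Node Exporter from the prometheus-community Helm chart
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install node-exporter prometheus-community/prometheus-node-exporter
```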
Deploy the OpenTelemetry (OTEL) Collector configuration to scrape metrics from Node Exporter and send them to the Amazon Managed Service for Prometheus workspace:
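A sketch with a placeholder filename for the repository's ADOT Collector manifest; the remote-write endpoint of your AMP workspace needs to be substituted into the configuration, for example with envsubst:

```bash
# Point the collector at your AMP workspace (Region and workspace ID are placeholders)
export AMP_ENDPOINT="https://aps-workspaces.<region>.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write"

# Substitute the endpoint into the collector configuration and apply it to the cluster
envsubst < <otel-collector-config>.yaml | kubectl apply -f -
```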
To create an Amazon Managed Grafana workspace, you first need to create an IAM role. This role will be assigned to the workspace and used to access AWS data sources:
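A sketch of the role creation, using a hypothetical role name grafana-workspace-role; attaching the AmazonPrometheusQueryAccess managed policy lets the workspace query AMP:

```bash
# Trust policy allowing the Amazon Managed Grafana service to assume the role
cat > grafana-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "grafana.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create the workspace role (role name is a placeholder)
aws iam create-role \
  --role-name grafana-workspace-role \
  --assume-role-policy-document file://grafana-trust-policy.json

# Allow the role to query Amazon Managed Service for Prometheus
aws iam attach-role-policy \
  --role-name grafana-workspace-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusQueryAccess
```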
Next, create the Amazon Managed Grafana workspace configured to use SSO for authentication and the IAM role created above:
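A sketch of the workspace creation; the workspace name and role ARN are placeholders:

```bash
# Create the Amazon Managed Grafana workspace with IAM Identity Center (SSO) authentication
# and the customer-managed IAM role created above
aws grafana create-workspace \
  --workspace-name java-benchmark \
  --account-access-type CURRENT_ACCOUNT \
  --authentication-providers AWS_SSO \
  --permission-type CUSTOMER_MANAGED \
  --workspace-role-arn arn:aws:iam::<account-id>:role/grafana-workspace-role \
  --workspace-data-sources PROMETHEUS
```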
Once the Amazon Managed Grafana workspace is created, you can add users and log in using AWS IAM Identity Center (formerly AWS SSO) or SAML. After logging in, navigate to Apps → AWS Managed Apps and configure the Amazon Managed Service for Prometheus (AMP) workspace created earlier as a data source.
With these steps, you have set up the monitoring and observability components: Amazon Managed Service for Prometheus for metric collection, Node Exporter for system-level metrics, the ADOT (OTEL) Collector for scraping and sending metrics, and Amazon Managed Grafana for visualizing the collected metrics in a pre-configured dashboard.
With the Java application deployed on both x86 and ARM instances in EKS, and monitoring set up with Node Exporter, it’s time to benchmark the application across architectures. For this, the wrk HTTP benchmarking tool is used. It can generate accurate mixtures of HTTP connections, payloads, and request types beyond simple GETs, making it suitable for benchmarking diverse workloads. Java Flight Recorder (JFR), which is bundled with the JDK, is also used. JFR profiles low-level runtime events, helping diagnose performance issues, memory leaks, and other problems in Java applications. The recorded JFR data is visualized using JDK Mission Control for deeper insights into the application’s behavior during benchmarking across architectures.
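The load test runs as a Kubernetes Job. Below is a minimal sketch reconstructed from the description that follows; the x86 service hostname is a placeholder for the service created by the x86 deployment, and it assumes the image's entrypoint is the wrk binary (if not, set command: ["wrk"] explicitly):

```bash
# Launch wrk as a Kubernetes Job against the x86 service (hostname is a placeholder)
cat <<'EOF' | kubectl apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: wrk-x86
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: wrk
        image: ruslanys/wrk:ubuntu
        # 32 threads, 1024 open connections, 5-minute test duration
        args: ["-t32", "-c1024", "-d300s", "http://<x86-service-name>.default.svc.cluster.local:8080"]
EOF
```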
This code defines a Kubernetes container named “wrk” that runs the wrk HTTP benchmarking tool against a specified service URL. It uses the ruslanys/wrk:ubuntu image, sets 32 threads (-t32), keeps 1024 HTTP connections open (-c1024), and runs the benchmark for 5 minutes (-d300s) against the service URL. This container executes a 5-minute HTTP load test on the given URL, simulating 1024 concurrent connections across 32 threads. Typical output includes metrics such as latency (e.g., 121ms) and requests per second (e.g., 359 RPS), providing insight into the service’s performance under load.
While running the wrk HTTP benchmarking test, you can monitor the Node Exporter dashboard in Grafana to observe the CPU load on the system. The key metric to focus on is the “sysload” or system load average, which represents the demand on the CPU resources by running or waiting processes. The dashboard shows that during the benchmark, the 15-minute system load average reached 264%, indicating a significantly high CPU utilization on the system.
Figure 4: Amazon Managed Grafana dashboard showing different metrics for x86 instance
Let’s visualize the JFR results using JDK Mission Control. As shown below, let's focus on the lock instances. In JDK Mission Control, the “Lock Instances” view displays information about lock contention and blocking events that occur within a Java application. Locks are a fundamental mechanism in Java for managing concurrent access to shared resources, and lock contention can lead to performance issues and synchronization problems. The Total Blocked Time for java.util.TaskQueue and java.lang.Object is 664ms and 501ms respectively.
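To benchmark the Graviton deployment, the same Job definition is reused with only the target URL changed; a sketch:

```bash
# Launch the same wrk Job, this time pointed at the Graviton service
cat <<'EOF' | kubectl apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: wrk-graviton
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: wrk
        image: ruslanys/wrk:ubuntu
        args: ["-t32", "-c1024", "-d300s", "http://corretojdk-svc-graviton.default.svc.cluster.local:8080"]
EOF
```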
This works the same way as above, but this time the final argument, http://corretojdk-svc-graviton.default.svc.cluster.local:8080, is the ARM service URL that wrk will benchmark. This container runs a 5-minute HTTP benchmarking test against that service URL, using 32 threads and keeping 1024 HTTP connections open concurrently. The typical output shows, for example, a latency of 92ms and 573 RPS.
Figure 6: Amazon Managed Grafana dashboard showing different metrics for ARM instance
From the Node Exporter dashboard, the 15-minute system load average is now at 192%, which is lower than the result observed on the x86 instance.
In this case, JDK Mission Control shows the Total Blocked Time for java.util.TaskQueue and java.lang.Object at 148ms and 64ms respectively. Similarly, the distinct thread count and total count are also lower than in the earlier x86 results.
While the sample output shows better latency and requests per second (RPS), a lower system load average on the Grafana dashboard, and fewer lock contention instances in JDK Mission Control for the ARM architecture compared to x86, it’s important to note that these results are specific to the workload used in this particular benchmarking scenario. Benchmark results can vary significantly depending on the nature of the workload, the application’s characteristics, and the underlying hardware and software configurations. This sample application highlights the potential performance of ARM processors, but it should not be considered a definitive or universal comparison. Different types of workloads, such as those involving heavy computational loads, memory-intensive operations, or I/O-bound tasks, may yield different results. It’s crucial to evaluate performance on a case-by-case basis, taking into account the specific requirements and characteristics of the application being benchmarked.

The key takeaway is that while ARM architectures have shown promising performance in certain scenarios, their suitability for a given workload should be carefully assessed through comprehensive benchmarking and testing. This experiment does not aim to prove the superiority of one architecture over the other; the intent is solely to demonstrate the process of benchmarking Java applications across different architectures, without drawing conclusions about the overall superiority of x86 or ARM.
Use the following commands to delete resources created during this post:
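A sketch of the cleanup, assuming the placeholder resource names used earlier in this post; adjust IDs and filenames to match your environment:

```bash
# Delete the wrk benchmark jobs and the application deployments
kubectl delete job wrk-x86 wrk-graviton
kubectl delete -f <java-app-x86-deployment>.yaml
kubectl delete -f <java-app-graviton-deployment>.yaml

# Delete the Amazon Managed Grafana and Amazon Managed Service for Prometheus workspaces
aws grafana delete-workspace --workspace-id <grafana-workspace-id>
aws amp delete-workspace --workspace-id <amp-workspace-id>

# Finally, delete the EKS cluster and its node groups
eksctl delete cluster -f eks-cluster-config.yaml
```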
To optimize application performance and resource utilization, we demonstrated benchmarking a Java application across x86 and ARM (Graviton) architectures on Amazon EKS. Using open-source observability tools, we captured metrics such as latency, requests per second, system load, and lock contention instances. While results varied based on the workload, comprehensive benchmarking proved crucial for evaluating architectural suitability.
As a call to action, you can leverage the power of benchmarking and observability tools to make informed decisions about adopting ARM-based architectures like AWS Graviton for your specific workloads, ensuring optimal performance and resource efficiency.
Siva Guruvareddiar is a Senior Solutions Architect at AWS where he is passionate about helping customers architect highly available systems. He helps speed cloud-native adoption journeys by modernizing platform infrastructure and internal architecture using microservices, containerization, observability, service mesh areas, and cloud migration. Connect on LinkedIn at: linkedin.com/in/sguruvar.
Imaya is a Principal Solutions Architect focused on AWS observability tools including Amazon CloudWatch, AWS X-Ray, Amazon Managed Service for Prometheus, Amazon Managed Grafana, and AWS Distro for OpenTelemetry. He is passionate about monitoring and observability and has a strong application development and architecture background. He likes working on distributed systems and is excited to talk about microservice architecture design. He loves programming in C# and working with containers and serverless technologies. LinkedIn: /imaya.