Back to Posts

Performance: from nothing to something

In this series I’m writing about my little journey on writing high performance web application using Asp.Net (disclaimer: performance of my application is still not high enough yet!) with focus on knowing your targets, understanding the complexity of those targets, and working towards the targets. As a side note, I’m using the terms “high performance” and “highly scalable” interchangeably.

To start with, the most interesting observation is that in order to design and develop high performance application you need to have a clear definition about what kind of performance you want to have. Throughout my career so far, I have gone through a few levels of definitions:

1. I didn’t know/I didn’t care

That was when I just started working as a developer and my only target was to write code that could work!

2. I thought I wrote good code

I applied a few best practices such as using StringBuilder to concatenate many strings, avoiding boxing and unboxing, or using dictionary for fast look up etc. As a result, I could proudly (and wrongly) claim that I knew how to write high performance code and that my application could highly perform.

3. The “perceptive” high performance application

This was when I faced a little bit reality check: Usually a customer came and told me that my application was too slow to load a page or to serve a request and that I needed to make it faster. No problem then! Let fire up my favorite profiler tools to find out some stupid SQL queries or backend code or JavaScript. At the end of the day, code did run faster and the customer was happy.

The problem of all the above cases is that they produced an illusion about performance capacity of my applications as well as my skills. As it turned out, I had absolutely no evidence to prove that performance of my application was good (and in fact, more often than not it sucked). The third case, while had some output value to my customers, was actually very limited because it worked just for those customers in some specific condition. One unfortunate aspect of cases like this is that it creates an illusion about performance capacity of my applications as well as my skills. I have seen many developers, including myself, who designed and claimed to have “high-performance and scalable” systems but only took this kind of performance criteria into account.

Don’t get me wrong. I knew that sometimes perceptions are more important than numbers/metrics and that there is something called perceptual speed index. However, I believe that it is even a higher level of performance because it requires that both metrics and perceptions must be good and thus I consider it out of the scope of this post.

4. The ugly truth or SLAs

In the context of this blog post, I will focus on SLAs that are relevant to performance of web applications. SLA stands for “Service Level Agreement” which defines (again, in the context of this blog post) in exact numbers the level of performance that your applications must satisfy. In my opinion, the rule of thumb when it comes to performance is that you must always work with numbers. For example, do not say “my application is very fast” but say “my application is very fast because it can serve all requests in under 10ms”. The 3 most popular SLAs are: Availability: While availability is not a pure SLA for performance, I mention it here because what is the point of having a high performance application that is not available to use?

Availability 99.9%
Availability 99%

For instance, a week has 10080 minutes which means if SLA is 99.9% I will have 10 minutes down time/week for maintenance or upgrade. I gotta do it quickly!


I learned about the definitions for all concepts below from Why building highly scalable applications is hard It is the best article about writing highly scalable software I have ever read and I just can’t recommend it highly enough.


Throughput is a measure of the rate that work can be completed and is measured in operations per second.

System Throughput Throughput
User account page 100 users/second/node 280 users/second/3 nodes
GET a single user via RESTful API 200 requests/second/node 550 requests/second/3 nodes

The numbers are straightforward: for every 1000 API requests, at least 999 of them must be served within 800ms.


Latency is a measure of how long it takes an operation to complete and is normally measured in seconds.

System Response time Percentage Response time Percentage
User account page 1000ms 90% 2000ms 99.9%
GET a single user via RESTful API 500ms 90% 800ms 99.9%


Concurrency is the number of operations that are being processed at any moment. Concurrency = Latency x Throughput.

So if latency of a GET request is 0.5s and my application can serve 200 requests/second, its concurrency is 0.5 * 200 = 100 concurrent active requests.

Taking all the above into account, my application must be able to:

  • GET user requests/second/node.
  • Serve 200 GET user requests/second/node and each request must take maximum 800ms.
  • Serve 200 GET user requests/second/node and each request must take maximum 800ms during a long period of time without any hiccup: 24 hours or 24/7.
  • Serve 200 GET user requests/second/node and each request must take maximum 2000ms during a long period of time without any hiccup: 24 hours straight or 24/7 when your database has 100.000 users.

As a side note, all SLAs should be defined with a minimum requirements for hardware configurations. By the time of this writing, a popular server may sport 8 cores with 8-16GB memory. For systems that need other hardware such as database servers, you should define minimum configurations for them too.

Read Next

Notes for creating an SSL CSR