Highly Available Web Application Design

by John Kneiling

August 2004


For many reasons, the Web has become the framework of choice for enterprise and departmental applications. Many organizations, however, are wondering whether Web applications will be reliable enough to support important business processes. What can we do to make these applications highly available?

There are three High Availability (HA) factors: availability, reach-ability, and performance. In an HA application system, all necessary system services are operating correctly. All of these services are reach-able, which means that they can be invoked over the network. And all services must perform to minimum standards for network, application, and server resources.

We can’t improve availability unless we monitor it. There are many metrics, but let’s stick with the basics. Overall system availability is first, and that means that all necessary business services are operating. Next is reach-ability: to what extent are these services accessible from designated locations in the organization? There are two ways to measure performance: round-trip response time measures the consumer view, while transaction throughput gives a provider view.
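These basic metrics can be computed from simple probe records. The following is a minimal sketch, not taken from any monitoring product: the `Probe` and `AvailabilityMetrics` classes and their fields are hypothetical names chosen for illustration.

```java
import java.util.List;

/** Hypothetical probe result: one request sent to a service from a designated location. */
class Probe {
    final boolean reachable;     // did the service answer at all?
    final long roundTripMillis;  // consumer view: request sent to response received

    Probe(boolean reachable, long roundTripMillis) {
        this.reachable = reachable;
        this.roundTripMillis = roundTripMillis;
    }
}

class AvailabilityMetrics {
    /** Percentage of probes that reached the service (0 to 100). */
    static double reachabilityPercent(List<Probe> probes) {
        if (probes.isEmpty()) return 0.0;
        int ok = 0;
        for (Probe p : probes) {
            if (p.reachable) ok++;
        }
        return 100.0 * ok / probes.size();
    }

    /** Average round-trip time over the probes that succeeded, in milliseconds. */
    static double averageRoundTripMillis(List<Probe> probes) {
        long total = 0;
        int ok = 0;
        for (Probe p : probes) {
            if (p.reachable) {
                total += p.roundTripMillis;
                ok++;
            }
        }
        return ok == 0 ? 0.0 : (double) total / ok;
    }

    /** Provider view: completed transactions per second over a measurement window. */
    static double throughputPerSecond(long completedTransactions, long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        return completedTransactions * 1000.0 / windowMillis;
    }
}
```

Running the probes from several locations in the organization, and reporting the figures per location, would give the reach-ability picture as well as the response-time picture.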

Redundancy satisfies two goals. It improves availability by removing a single point of failure. If we replicate resources closer to the user, we also reduce path length, and avoid potential bottlenecks. This is called “locality of reference” and is another way of optimizing our data and application resources. The most important tactics for planned redundancy are server fail-over/load balancing, and deploying multiple configurations of our resources.

Load balancing allows web and web application servers to direct requests to available machines. This technology also helps the system cope with crashes, by redirecting requests to “hot standby” resources, such as Database Server 2 in figure 1. Load balancing spreads the load over multiple machines, increasing the capacity beyond the capabilities of any one server. The incremental capacity increase preserves server investments, and provides a low-cost way to achieve high availability by removing a single point of failure.

Load balancing tools use configuration knowledge, such as the current status and latency of each machine in the configuration. Some tools support multiple locations by redirecting each request based on the client’s location in the Internet. These tools are generally embedded in Web, Web Application, and E-Commerce servers.
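To make the idea of routing on status and latency concrete, here is a minimal sketch of one possible selection rule: skip machines that are down, then pick the one with the lowest measured latency. The `ServerStatus` and `LatencyAwareBalancer` names are hypothetical, and real load balancing tools use their own (often more elaborate) policies.

```java
import java.util.List;

/** Hypothetical view of one machine as a load balancer might track it. */
class ServerStatus {
    final String name;
    final boolean up;          // current status: is the machine accepting requests?
    final long latencyMillis;  // most recently measured latency

    ServerStatus(String name, boolean up, long latencyMillis) {
        this.name = name;
        this.up = up;
        this.latencyMillis = latencyMillis;
    }
}

class LatencyAwareBalancer {
    /** Returns the available server with the lowest latency, or null if none is up. */
    static ServerStatus choose(List<ServerStatus> servers) {
        ServerStatus best = null;
        for (ServerStatus s : servers) {
            if (!s.up) {
                continue;  // fail-over: never route to a crashed machine
            }
            if (best == null || s.latencyMillis < best.latencyMillis) {
                best = s;
            }
        }
        return best;
    }
}
```

A caller would refresh the status list from health checks before each decision and treat a `null` result as a total outage.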

Figure 1 – Load Balancing and Fail-Over

Session fail-over is the ability of an application to survive if user sessions are lost. This capability must be designed into the system, and is usually implemented in Web application servers. Having no session fail-over means that all unsaved work in the session is lost; e.g., if a consumer is shopping, the shopping cart and all other information would be destroyed. This data is often called TSD (Transient Session Data).

There are two approaches to this problem: database persistent sessions and memory-to-memory sessions. Both are often called “Session Clustering.” A database persistent session stores and maintains session data in a database during execution. Then, if the server fails, the user’s session is transferred transparently to another web application server. There is a performance penalty: the session data must be logged and retrieved to maintain state. Memory-to-memory sessions are similar, except that the session data is replicated to other servers in the cluster. This approach eliminates the database logging and retrieval overhead.
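The memory-to-memory approach can be pictured with a small sketch. This is only an illustration of the idea, assuming a hypothetical `SessionNode` class in which each server keeps sessions in a map and pushes every update to its peers; real application servers perform this replication internally and over the network.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical cluster node that keeps session data in memory and copies it to its peers. */
class SessionNode {
    private final Map<String, String> sessions = new HashMap<>();
    private final List<SessionNode> peers = new ArrayList<>();

    void addPeer(SessionNode peer) {
        peers.add(peer);
    }

    /** Saves the session locally, then replicates it to every peer (memory-to-memory). */
    void save(String sessionId, String data) {
        sessions.put(sessionId, data);
        for (SessionNode peer : peers) {
            peer.receiveReplica(sessionId, data);
        }
    }

    /** Called by a peer to hand over a copy of a session. */
    void receiveReplica(String sessionId, String data) {
        sessions.put(sessionId, data);
    }

    /** Returns the session data held by this node, or null if it has none. */
    String load(String sessionId) {
        return sessions.get(sessionId);
    }
}
```

If the node that handled the original request fails, any peer can answer `load` from its own memory without a database read, which is where the saving over database persistent sessions comes from.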

Implementing session failover can be confusing if you are using Java. First of all, “Session Clustering” is defined as an open standard in the Java Server API. But it is implemented differently with different products, so don’t count on session clustering across servers that run different Java products.

If you are using J2EE, do not use stateful session EJBs to store TSD. These beans maintain state information as in-memory data to allow data caching, which improves performance. The problem is that if the application fails, all TSD is lost, and the users must start over.

A better alternative is to design stateless session EJBs. These beans do not maintain in-memory session status. Each method in the stateless session executes independently, and does not rely on in-memory data. For instance, a shopping cart can be maintained on a persistent object, and then moved to the DBMS by the application’s data access code. When the user is finished shopping and wants to proceed to checkout, the selected articles are retrieved from the DBMS (again by the data access code), and the transaction is complete. If this stateless session fails, the clustered session fails over to an alternate application server (see figure 1 above). At checkout, the new server reads the shopping cart list from the DBMS.
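The stateless design can be sketched in plain Java, without EJB annotations or deployment descriptors. The names `CartDao`, `InMemoryCartDao`, and `CartService` are hypothetical, and the in-memory map is only a stand-in for the DBMS so that the example runs on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical data access interface; a real one would issue SQL against the DBMS. */
interface CartDao {
    void addItem(String userId, String item);
    List<String> findItems(String userId);
}

/** In-memory stand-in for the DBMS so the sketch runs without a database. */
class InMemoryCartDao implements CartDao {
    private final Map<String, List<String>> rows = new HashMap<>();

    public void addItem(String userId, String item) {
        rows.computeIfAbsent(userId, k -> new ArrayList<>()).add(item);
    }

    public List<String> findItems(String userId) {
        // Return a copy so callers cannot change the stored rows.
        return new ArrayList<>(rows.getOrDefault(userId, new ArrayList<>()));
    }
}

/** Stateless service: no field holds per-user data, so any server instance can take any call. */
class CartService {
    private final CartDao dao;  // shared persistent store, not session state

    CartService(CartDao dao) {
        this.dao = dao;
    }

    void addToCart(String userId, String item) {
        dao.addItem(userId, item);
    }

    /** At checkout the cart is read back through the data access code. */
    List<String> checkout(String userId) {
        return dao.findItems(userId);
    }
}
```

Two `CartService` instances that share one `CartDao` behave like two application servers in front of one DBMS: items added through the first can be checked out through the second after a fail-over.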

To implement this in Java, we can extend the shopping cart object by adding an object called SelectedItems, which is a Java ArrayList of MerchandiseItems. At checkout, the application will retrieve SelectedItems from the object. This assumes that we are using memory-to-memory clustering.
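A minimal sketch of such a cart follows. The fields of `MerchandiseItem` (`sku`, `priceCents`) and the `replicate` helper are assumptions made for illustration; the helper only simulates what a clustered server would do by serializing the cart and rebuilding it, which is why both classes implement `Serializable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

/** One article the shopper has picked; the fields are hypothetical. */
class MerchandiseItem implements Serializable {
    private static final long serialVersionUID = 1L;
    final String sku;
    final int priceCents;

    MerchandiseItem(String sku, int priceCents) {
        this.sku = sku;
        this.priceCents = priceCents;
    }
}

/** Shopping cart extended with the SelectedItems list, kept small so it is cheap to replicate. */
class ShoppingCart implements Serializable {
    private static final long serialVersionUID = 1L;
    private final ArrayList<MerchandiseItem> selectedItems = new ArrayList<>();

    void select(MerchandiseItem item) {
        selectedItems.add(item);
    }

    ArrayList<MerchandiseItem> getSelectedItems() {
        return selectedItems;
    }

    /** Simulates replication: serialize the cart and rebuild it as another server would. */
    static ShoppingCart replicate(ShoppingCart cart) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(cart);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ShoppingCart) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("cart could not be replicated", e);
        }
    }
}
```

In a servlet-based application the cart would typically be stored as a session attribute, and the server product, not application code, would carry out the replication.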

This is how it works: the objects are part of the user’s session, so if a server fails and we are using session clustering, the user’s session will fail over transparently. Performance will improve because SelectedItems is held in a bean (in memory) rather than in the DBMS. This approach works best for small amounts of data. Large amounts of data should not be stored as user session objects. This example also assumes that we are using a development approach that isolates data access code, such as Model-View-Controller (MVC).

To be available, a high performance application must be scalable. Horizontal scaling means that a cloned application runs on two or more servers configured for fail-over. Vertical scaling places multiple instances of an application on a single server. This provides additional capacity and server-level fail-over. In figure 2, App 1 is scaled horizontally and vertically, while App 2 is scaled horizontally only.

The most accurate way to plan redundancy is to perform a full capacity analysis. First, determine the relationship between resources and the server user base. Catalog the total number of users, average and peak concurrent user load, required application random access memory (RAM) per instance, hard disk space, etc. If a system or application fails, there must be enough capacity to support the average user load, including web and application servers, memory, and other resources. Some organizations have determined that servers should not run higher than 40% of capacity under an average load.
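The arithmetic behind such a capacity analysis can be sketched as follows. This is a simplified illustration that treats concurrent users as the only resource; the `CapacityPlanner` class and its parameters are hypothetical, and a real analysis would repeat the calculation for memory, disk, and network.

```java
class CapacityPlanner {
    /**
     * Servers needed so that the average load keeps each server at or below
     * the target utilization (for example 0.40 for the 40% guideline).
     */
    static int serversNeeded(int averageUsers, int usersPerServer, double targetUtilization) {
        if (usersPerServer <= 0 || targetUtilization <= 0 || targetUtilization > 1) {
            throw new IllegalArgumentException("invalid capacity parameters");
        }
        return (int) Math.ceil(averageUsers / (usersPerServer * targetUtilization));
    }

    /** True if the remaining servers can still carry the average load after one server fails. */
    static boolean survivesOneFailure(int servers, int averageUsers, int usersPerServer) {
        return (long) (servers - 1) * usersPerServer >= averageUsers;
    }
}
```

For example, with 900 average concurrent users and servers that can each hold 500, the 40% guideline gives 900 / 200 = 4.5, which rounds up to 5 servers. Losing one of the five still leaves room for the average load.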

Figure 2 – Scaling and Cloning

Developing a Highly Available system is not about technology. It is about designing application systems to meet business requirements by making high performance applications available and reach-able. This means removing single points of failure, providing horizontal scaling, and isolating data code with an approach such as MVC. Other important factors include using session clustering and fail-over, and treating transient data with high-speed recovery in mind. Robust web applications are a challenge, but one that we can meet with the right design approach.