HIGH AVAILABILITY SOLUTIONS
In the context of IT operations, the term High Availability refers to a system (a network, a server array or cluster, etc.) that is designed to avoid loss of service by reducing or managing failures and minimizing planned downtime.
A system is expected to be highly available when life, health, and well-being – including economic well-being – are at stake.
In information technology, system or component availability is expressed as a percentage of yearly uptime. Service Level Agreements (SLAs) generally refer to these availability percentages in order to calculate billing. With 100% availability as the unachievable ideal, the goal of the highest levels of service availability is considered to be “five nines” – 99.999% availability.
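To make these percentages concrete, the following minimal sketch converts an availability target into the downtime it permits per year. It assumes a 365-day year, and the function name `allowed_downtime_minutes` is purely illustrative; a real SLA may define its measurement window differently.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a non-leap, 365-day year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime, in minutes, permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (100 - availability_percent) / 100


if __name__ == "__main__":
    # Print the downtime budget for two through five nines.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.2f} minutes of downtime per year")
```

Under this assumption, "five nines" leaves a budget of roughly 5.26 minutes of downtime per year, which is why it is treated as the top tier of service availability.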
HIGH AVAILABILITY MANAGEMENT
High availability can be achieved only with thorough planning and consistent monitoring.
A good starting point for high availability planning involves the identification of services that must be available for business continuity, and those that should be available.
For each level of service, from must to should, it is also worthwhile to decide how far the organization is willing to go to ensure availability. This decision should be based on budget, staff expertise, and overall tolerance for service outages.
Consistent monitoring should then cover the following areas:
- Network availability: How available is your network, compared to the SLA with your Internet Service Provider (ISP)? Check this with Internet Control Message Protocol (ICMP) echo requests (pings), sent by your network monitoring software.
- Bandwidth usage: How much bandwidth does your system consume, at both peak and idle times? Get this information from managed routers and Internet Information Services (IIS) log analysis. Use it to plan bandwidth allocation for known peaks (end-of-year crushes, key shopping days, etc.), and avoid inadequate bandwidth scenarios.
- HTTP availability and visibility: Are you monitoring system HTTP requests – internally, per ISP, and per geographic location? Problems with internal requests can serve as an early warning of outward-facing problems. Track HTTP requests from ISP networks to determine whether or not users of these networks can access your service, and monitor requests from different geographic locations to ensure users from anywhere in the world are able to use your services.
- System availability: Are you keeping track of abnormal and normal operating system, database, and enterprise server system shutdowns?
- Performance metrics: Do you monitor the number of users who visit your site or use enterprise applications, and compare these numbers to request latency and historical CPU utilization? Have you grouped servers by function, and do you monitor disk capacity and I/O throughput? Do you check Fibre Channel controller and switch bandwidth, and keep an eye on overall system memory usage?
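The network and HTTP availability questions above can be answered from probe results. The sketch below shows one hypothetical way to summarize them: it takes success or failure outcomes from ICMP or HTTP probes, grouped by source (internal, per ISP, per geographic location), and reports which sources fall below an SLA target. The `ProbeResult` type, the source labels, the sample data, and the 99.9% target are all illustrative assumptions; collecting the probes themselves is left to the monitoring software.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ProbeResult:
    source: str      # where the probe ran, e.g. "internal" or an ISP/region label
    succeeded: bool  # True if the ping or HTTP request got a valid response


def availability_by_source(results: Iterable[ProbeResult]) -> Dict[str, float]:
    """Return the percentage of successful probes for each source."""
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for result in results:
        totals[result.source] += 1
        if result.succeeded:
            successes[result.source] += 1
    return {src: 100.0 * successes[src] / totals[src] for src in totals}


def sources_below_target(
    results: Iterable[ProbeResult], target_percent: float
) -> List[str]:
    """Return, in sorted order, the sources whose availability misses the target."""
    availability = availability_by_source(results)
    return sorted(src for src, pct in availability.items() if pct < target_percent)


if __name__ == "__main__":
    # Made-up sample data: four probes from each of three hypothetical sources.
    sample = (
        [ProbeResult("internal", True)] * 4
        + [ProbeResult("isp-a", True)] * 3
        + [ProbeResult("isp-a", False)]
        + [ProbeResult("eu-west", True)] * 4
    )
    print(availability_by_source(sample))
    print("Below 99.9% target:", sources_below_target(sample, 99.9))
```

Keeping the internal source separate from the ISP and regional sources preserves the early-warning value of internal requests: a drop there can be investigated before it shows up in outward-facing numbers.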