Wide Area Networks (WANs) face three bottlenecks that are far less of a problem in other environments: bandwidth, latency and cost. (And in true three-pronged-problem fashion, you can take your pick of two out of three.)
In many cases, data center, campus and branch networking are much easier to deal with than the WAN. The presence of many high-capacity ports and the short distances between equipment and end-users mean that most issues can simply be fixed with an additional cable. Why is this not the case for the WAN?
We detail the three challenges, their impacts and mitigations and then form a conclusion.
Challenges of the three bottlenecks:
Latency
The big challenge of the Wide Area Network is the speed of light. It caps how fast a signal can travel across a network, so latency becomes a function of geographic distance: the further apart two sites are, the higher the latency between them. Latency is typically measured in milliseconds (ms).
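To put rough numbers on this, here is a minimal sketch of propagation delay over fibre. The refractive index and the three path lengths are illustrative assumptions, not measurements, and real paths are longer than the straight-line distance and add equipment delay on top.

```python
# Rough propagation delay over optical fibre.
# Assumptions (not from any specific network): a refractive index of 1.47
# for the fibre, and made-up path lengths for three kinds of link.
SPEED_OF_LIGHT_KM_S = 299_792.458   # in a vacuum
FIBRE_REFRACTIVE_INDEX = 1.47       # assumed typical value


def one_way_delay_ms(path_km: float) -> float:
    """Time for a signal to travel path_km of fibre, in milliseconds."""
    speed_in_fibre_km_s = SPEED_OF_LIGHT_KM_S / FIBRE_REFRACTIVE_INDEX
    return path_km / speed_in_fibre_km_s * 1000


def round_trip_ms(path_km: float) -> float:
    """There and back again: what an application actually waits for."""
    return 2 * one_way_delay_ms(path_km)


for name, km in [("metro", 50), ("national", 1_000), ("intercontinental", 17_000)]:
    print(f"{name:>16}: {km:>6} km, round trip ~{round_trip_ms(km):.1f} ms")
```

No amount of extra bandwidth changes these figures; only a shorter path does.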
Bandwidth
The second challenge is bandwidth. This is the limit that the smallest network link in the chain imposes on the volume of data that can be carried between two sites. It is commonly expressed in bits per second (b/s, Kb/s, Mb/s, Gb/s or, soon, Tb/s) and describes the data throughput of a link.
Within a site, it’s relatively easy to string an extra cable between two switches to provide extra capacity. The challenge of the WAN is that, as well as having to cover the distance (see latency), service providers want to reduce the number of cables in their core networks (see cost).
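The "smallest link in the chain" idea can be sketched in a few lines. The link speeds and file size below are hypothetical, and the transfer time ignores protocol overhead and latency.

```python
# End-to-end bandwidth is set by the slowest link on the path.
# All link speeds here are hypothetical examples.

def path_bandwidth_mbps(link_speeds_mbps):
    """The usable bandwidth of a path is that of its smallest link."""
    return min(link_speeds_mbps)


def transfer_time_s(size_megabytes, bandwidth_mbps):
    """Best-case time to move a file: megabytes -> megabits, then divide."""
    return size_megabytes * 8 / bandwidth_mbps


# Site LAN, WAN access circuit, provider core, remote LAN (Mb/s)
path = [1_000, 100, 10_000, 1_000]

bottleneck = path_bandwidth_mbps(path)
print(f"Bottleneck: {bottleneck} Mb/s")
print(f"500 MB transfer takes at least {transfer_time_s(500, bottleneck):.0f} s")
```

Upgrading any link other than the 100 Mb/s access circuit would make no difference to this transfer.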
Cost
The big element that a lot of network managers don’t have control over is how much they can spend on the links that connect the sites. Cost can be managed either by reducing the bandwidth between sites or by downgrading the type of connection used to reach the network (removing redundancy, reducing QoS, or selecting Internet transport rather than MPLS).
Impacts of the three bottlenecks:
Latency
The impact of the extra latency is seen as the delay in voice conversations with someone overseas, which becomes much more pronounced with a satellite link in the conversation (the so-called satellite delay).
Similarly, data communications are affected by the delay in propagating information from one end of a link to the other. For protocols such as TCP, this can mean a significant reduction in performance, because sending additional data relies on a signal that the previous data has been correctly received.
The end-user of an application will see this as unresponsive applications, delays in updating information displays once data is entered, or a delay at the end of speech in voice connections (the latency effect is not so visible in video).
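A back-of-the-envelope way to see this effect: if a sender may only have one window of unacknowledged data in flight, its throughput cannot exceed window size divided by round-trip time, however fast the link is. The sketch below assumes a classic 65,535-byte window without scaling; the round-trip times are illustrative.

```python
# Upper bound on throughput for a window-limited sender:
#   throughput <= window / round-trip time
# The window size and RTT values are illustrative assumptions.

def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Ceiling on throughput when one window is sent per round trip."""
    bits_per_round_trip = window_bytes * 8
    return bits_per_round_trip / (rtt_ms / 1000) / 1_000_000


WINDOW_BYTES = 65_535  # largest window without window scaling

for label, rtt in [("same city", 2), ("national", 20), ("intercontinental", 300)]:
    ceiling = max_throughput_mbps(WINDOW_BYTES, rtt)
    print(f"{label:>16} ({rtt:>3} ms RTT): at most {ceiling:8.2f} Mb/s")
```

On the 300 ms path the ceiling is under 2 Mb/s, regardless of how much bandwidth has been bought.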
Bandwidth
With video, the impact of not enough bandwidth is seen in a downgrading of the quality of the remote image or a reduction in frame rate. With voice, modern adaptive codecs will adjust the quality downwards, which may result in a more robotic-sounding voice. With data, a lack of bandwidth is seen in the slow transfer of data and in longer application response times, as data is held at various points across the network awaiting space to be sent.
When bandwidth pressure is extreme, packet loss increases: the network has to buffer more and more data in front of the smaller pipe, and once those buffers are full, further packets are dropped.
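The buffering behaviour can be illustrated with a toy queue. This is a deliberately simplified model under assumed numbers (fixed arrival and drain rates, a single tail-drop buffer), not a description of any particular router.

```python
# Toy model of a link buffer: packets arrive at a fixed rate, the link
# drains them at a fixed rate, and anything that does not fit in the
# buffer is dropped. All rates and sizes are made-up illustration values.

def simulate_queue(arrivals_per_tick, drain_per_tick, buffer_size, ticks):
    """Return (packets still queued, packets dropped) after `ticks` steps."""
    queued = 0
    dropped = 0
    for _ in range(ticks):
        queued += arrivals_per_tick
        if queued > buffer_size:          # buffer full: tail-drop the excess
            dropped += queued - buffer_size
            queued = buffer_size
        queued = max(0, queued - drain_per_tick)
    return queued, dropped


print("under capacity:", simulate_queue(8, 10, 50, 100))
print("over capacity: ", simulate_queue(12, 10, 50, 100))
```

In the over-capacity case the buffer first fills (adding queuing delay) and only then starts dropping, which is why users tend to notice slowness before they notice loss.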
Cost
The impact of cost is in the selection of a lower-capability circuit for a site, either in bandwidth (see above), in resiliency (choosing one access instead of two), or in capability (removing QoS, or using an Internet service instead of MPLS, for example).
Mitigations for the three bottlenecks:
Latency
Until ansible technology is invented, latency is something that we have to live with. Even if a direct route through the earth could be achieved, this would only reduce the latency between London and Sydney by approximately 25%.
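The London to Sydney figure can be checked with some geometry: compare the great-circle distance over the surface with the straight chord through the planet. The sketch assumes a spherical earth of radius 6,371 km, approximate city-centre coordinates, and the same signal speed on both routes.

```python
import math

# Surface route vs. a hypothetical tunnel straight through the earth.
# Assumptions: spherical earth, approximate city-centre coordinates.
EARTH_RADIUS_KM = 6_371.0
LONDON = (51.5074, -0.1278)     # latitude, longitude in degrees
SYDNEY = (-33.8688, 151.2093)


def central_angle_rad(a, b):
    """Angle between two points as seen from the earth's centre (haversine)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(h))


def great_circle_km(a, b):
    return EARTH_RADIUS_KM * central_angle_rad(a, b)


def chord_km(a, b):
    return 2 * EARTH_RADIUS_KM * math.sin(central_angle_rad(a, b) / 2)


surface = great_circle_km(LONDON, SYDNEY)
tunnel = chord_km(LONDON, SYDNEY)
saving = 1 - tunnel / surface
print(f"surface {surface:,.0f} km, tunnel {tunnel:,.0f} km, saving {saving:.0%}")
```

Under these assumptions the saving comes out in the same ballpark as the figure quoted above: a lot of digging for a modest gain.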
Over the past few years, the use of very remote servers has led to improvements in the protocols used, which minimize the effects of latency (particularly by reducing the number of conversation turns). It is still possible, however, to deploy poorly written applications that struggle even on low-latency (national) WANs.
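Why conversation turns matter so much can be shown with a simple model: each turn costs one round trip, and the payload then takes its size divided by the bandwidth to send. The turn counts, round-trip time and payload below are hypothetical.

```python
# Simple model of application completion time over a WAN:
#   time = (number of request/response turns x RTT) + (payload / bandwidth)
# All figures are hypothetical.

def completion_time_s(turns, rtt_ms, payload_megabytes, bandwidth_mbps):
    waiting = turns * rtt_ms / 1000                      # time spent on round trips
    sending = payload_megabytes * 8 / bandwidth_mbps     # time spent moving data
    return waiting + sending


RTT_MS = 150
PAYLOAD_MB = 1
LINK_MBPS = 100

chatty = completion_time_s(400, RTT_MS, PAYLOAD_MB, LINK_MBPS)
efficient = completion_time_s(4, RTT_MS, PAYLOAD_MB, LINK_MBPS)
print(f"chatty protocol (400 turns): {chatty:.2f} s")
print(f"efficient protocol (4 turns): {efficient:.2f} s")
```

Both cases move the same megabyte over the same link; almost all of the difference is time spent waiting for round trips.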
TCP has extensions to support larger amounts of traffic in-flight across the network, as well as minimizing the impact of lost packets on these bigger data flows. The ancient SMB protocol has in the last few years become more robust to deal with the smaller block sizes that used to cause problems in file replication between servers. The use of local proxies for some of the services has also mitigated the impact of latency (and bandwidth) by serving content locally.
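As a rough illustration of the first point: to keep a link full, the amount of data in flight has to match the bandwidth-delay product, and the TCP window scale option lets the 16-bit window field be shifted left by up to 14 bits to reach it. The link speeds and round-trip times below are hypothetical.

```python
# How much data must be in flight to fill a link, and what window scale
# shift that implies. Link figures are hypothetical examples.
MAX_UNSCALED_WINDOW = 65_535   # largest value of the 16-bit window field
MAX_WINDOW_SHIFT = 14          # upper limit of the window scale option


def bandwidth_delay_product_bytes(bandwidth_mbps, rtt_ms):
    """Bytes that fit 'in the pipe' during one round trip."""
    return bandwidth_mbps * 1_000_000 * rtt_ms / 1000 / 8


def min_window_shift(bdp_bytes):
    """Smallest scale shift whose maximum window covers the BDP, or None."""
    for shift in range(MAX_WINDOW_SHIFT + 1):
        if (MAX_UNSCALED_WINDOW << shift) >= bdp_bytes:
            return shift
    return None


for mbps, rtt in [(10, 20), (100, 150), (1_000, 150)]:
    bdp = bandwidth_delay_product_bytes(mbps, rtt)
    print(f"{mbps:>5} Mb/s at {rtt:>3} ms: BDP {bdp:>12,.0f} bytes, "
          f"shift {min_window_shift(bdp)}")
```

Operating systems normally negotiate and tune this automatically; the point of the sketch is only to show why fast, long links need windows far beyond 64 KB.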
Bandwidth
Local proxies can minimize the amount of data traversing the connection to a site, by serving content locally. For a long time, Riverbed and similar technologies have provided mechanisms to compress repeated data being sent across the connections, by using patterns stored locally and sending a token representing these across the link.
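The token idea can be sketched as follows. This is a toy model only, not how Riverbed or any other product actually works: it assumes fixed-size chunks, a SHA-256 digest as the token, and a `Deduplicator` class invented for this example, with one instance at each end of the link.

```python
import hashlib

# Toy data deduplication across a link: each end keeps a store of chunks
# it has already seen. Repeated chunks are replaced on the wire by a short
# token (here, the chunk's SHA-256 digest). Fixed-size chunking and the
# class itself are simplifying assumptions for illustration.
CHUNK_SIZE = 1024  # bytes


class Deduplicator:
    def __init__(self):
        self.store = {}  # digest -> chunk bytes

    def encode(self, data: bytes):
        """Sender side: emit raw chunks the first time, tokens afterwards."""
        messages = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            if digest in self.store:
                messages.append(("token", digest))
            else:
                self.store[digest] = chunk
                messages.append(("raw", chunk))
        return messages

    def decode(self, messages) -> bytes:
        """Receiver side: learn raw chunks, expand tokens from the store."""
        parts = []
        for kind, value in messages:
            if kind == "raw":
                self.store[hashlib.sha256(value).digest()] = value
                parts.append(value)
            else:
                parts.append(self.store[value])
        return b"".join(parts)


def wire_bytes(messages):
    """Bytes that actually cross the link (ignoring framing overhead)."""
    return sum(len(value) for _, value in messages)


sender, receiver = Deduplicator(), Deduplicator()
data = bytes(range(256)) * 16          # 4,096 bytes of repetitive content

first = sender.encode(data)
assert receiver.decode(first) == data
second = sender.encode(data)           # same file sent again
assert receiver.decode(second) == data
print(f"original {len(data)} B, first send {wire_bytes(first)} B, "
      f"repeat send {wire_bytes(second)} B")
```

The more often the same content crosses the link, the greater the saving, which is why the approach suits file shares and software distribution rather than encrypted or already-compressed traffic.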
Cost
Using both Internet and MPLS circuits to provide dual connections (and resilience) to a site, while also increasing the available bandwidth, is becoming more common (Internet bandwidth is considerably cheaper than MPLS in most environments). This is especially true with the advent of SD-WAN to manage the more complex topology these environments create.
Examples
In the interests of keeping this individual post short, I’ll put the various examples of latency, bandwidth and cost impact in separate posts, and link them here.
Conclusion
In short, there are many mitigations that can reduce the challenges of the three bottlenecks, but they will vary for each network implementation and site. As SD-WAN becomes more prevalent in the WAN marketplace, it becomes easier to balance the requirements across multiple networks all behaving as one, as well as to provide service chaining to minimize the effects of latency and bandwidth.