Go With The Flow
As competition among communications service providers (CSPs) continues to intensify, they are being asked to do more with less. Forced to invest in their networks to meet subscribers' rising bandwidth requirements, they are at the same time being asked to cut operational costs to meet profitability targets and withstand competitive pricing pressure. Adding fuel to this fire, Universal Service Fund (USF) reform is threatening to replace existing operational subsidies with broadband capital incentives -- creating an even greater need to find new operational efficiencies. The CSP conundrum is clear: more network traffic to manage and more equipment out in the field, yet fewer resources and dollars with which to manage it.
To address this challenge head-on, CSPs are looking for ways to streamline many of their most time- and resource-consuming activities. One such area is network service and management. Historically, the lack of visibility into the way subscribers use the IP/Ethernet network has been one of the biggest causes of growing operational costs, primarily because the only path to resolution has been truck rolls and trial and error. CSPs have had to rely on little more than hunches and intuition to troubleshoot networks. To remain competitive, CSPs cannot sustain this approach.
Thankfully, as broadband infrastructure is upgraded to meet growing service demands, new tools are being introduced that provide more reliable methods of network diagnosis and trouble resolution.
Removing the Guesswork
Flow analysis software is a new category of software tools that removes the guesswork from troubleshooting IP/Ethernet networks. These software products have a number of key features that are essential for understanding CSP network issues, including:
1. Real-time or near real-time display of subscriber information like bit rate and packet rate on a per-application and per-destination-IP-address basis.
2. Sophisticated recording and playback capabilities that enable past network traffic to be displayed with all the same information available in real-time.
3. Automated intelligence that uses historical patterns to perform network surveillance for malicious or questionable activity.
For some time now, CSPs have recognized and taken advantage of the benefits of a unified IP/Ethernet network. But those benefits have come with a major cost: the loss of network control and security. Before the Internet, a voice connection consumed a DS0, and there was nothing in the voice “content” that could take down other DS0s. Consequently, there was no need to protect the voice network from a “malicious” DS0!
By contrast, a skilled hacker today with a 1 Mbps connection can take down an entire 10 Gbps network. In the U.S., the average cost of a security breach is estimated at $7.2 million, according to a 2010 Ponemon Institute (www.ponemon.org) benchmark study sponsored by Symantec. This problem raises awareness that not all content is equal, because the content in a connection can affect the network. While the vast majority of traffic is well-behaved, some can be disruptive through either malicious or careless acts. This fact has been one of the main factors forcing CSPs to realize that visibility into network traffic is imperative.
Just knowing that there is a problem in some portion of the network bandwidth is usually not enough information to take corrective action. The aggressive approach of severing the connection to the Internet from suspicious segments of the network is not a realistic option. The ideal solution would be blocking the problematic bandwidth and allowing the rest to pass, but that requires clear network visibility and the ability to successfully isolate the issue.
To effectively address these problems, the CSP needs tools to analyze specific flows. A flow is defined here as a one-way series of packets that all share the same five characteristics:
1. Source IP address
2. Destination IP address
3. Transport protocol (TCP or UDP)
4. Source port number
5. Destination port number
This flow of packets is the most basic entity used to analyze network traffic.
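As a minimal sketch of the definition above, the five characteristics can be treated as a key under which packets are grouped and counted. The `FlowKey` and `aggregate_flows` names, the packet representation, and the sample addresses and sizes are all illustrative assumptions, not part of any particular flow analysis product.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The five fields that identify a one-way flow."""
    src_ip: str
    dst_ip: str
    protocol: str  # "TCP" or "UDP"
    src_port: int
    dst_port: int


def aggregate_flows(packets):
    """Sum packet and byte counts per flow.

    Each packet is a hypothetical (FlowKey, size_in_bytes) pair.
    """
    totals = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for key, size in packets:
        totals[key]["packets"] += 1
        totals[key]["bytes"] += size
    return dict(totals)


# Illustrative packets: documentation-range addresses, made-up sizes.
outbound = FlowKey("192.0.2.10", "198.51.100.5", "TCP", 50000, 80)
inbound = FlowKey("198.51.100.5", "192.0.2.10", "TCP", 80, 50000)
packets = [
    (outbound, 1500),
    (outbound, 500),
    # The reply travels the other way, so it belongs to a separate flow.
    (inbound, 60),
]
flows = aggregate_flows(packets)
```

Because a flow is one-way, the request and its reply land in two different entries even though they belong to the same conversation.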
Four Flow Factors
Given that flows are the subject of the analysis, what are the necessary features of a flow analysis application needed by CSPs?
Feature 1: Real-Time Reporting. For network problems that are constant or persistent, one only needs to know the long-term average of the flow bandwidths. Often, problems are intermittent, and in those cases, the tools need to be near real-time, with second-by-second reporting capabilities. Updating flow information at 30-minute, 15-minute or even 5-minute intervals is generally not sufficient to diagnose even the most common problems.
Feature 2: Per-Subscriber/Per-Application Granularity. More and more, network problems cannot be diagnosed solely by bandwidth, but require more specific, detailed information. For each flow, one needs both subscriber granularity, like the subscriber IP address, and application granularity, like far endpoint identifications, the application name, the bit rate, and packet rate.
Feature 3: Recording and Playback Capabilities. For transient network problems, it is important that the flow analysis tool continuously records all network flows and retains their full information. This gives an operator the ability to re-play recorded traffic in order to identify a single problem.
Feature 4: Automated Network Surveillance. As all-IP/Ethernet networks become more sophisticated and complex, monitoring for the signatures of conceivable threats becomes increasingly difficult for the network operator. Worse, the thresholds for these signatures are not static, but change with time-of-day, day-of-week, and even with changes in consumer behavior. Thus, it is imperative that the flow analysis tools have the ability to continuously monitor for every signature. Most importantly, the tools should have the ability to learn what is typical behavior so they can automatically set threshold values for alerts and alarms.
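One simple way to picture thresholds that are learned rather than fixed is to keep a separate history for each day-of-week and hour-of-day, and flag a reading only when it sits well above what is typical for that slot. The sketch below assumes a mean-plus-k-standard-deviations rule; the `LearnedThreshold` class, the value of `k`, the minimum sample count, and the sample bit rates are all hypothetical choices for illustration, and commercial tools may learn baselines quite differently.

```python
from collections import defaultdict
from statistics import mean, pstdev


class LearnedThreshold:
    """Per-(weekday, hour) alert threshold learned from past readings."""

    def __init__(self, k=3.0, min_samples=5):
        self.k = k                      # assumed sensitivity factor
        self.min_samples = min_samples  # assumed minimum history size
        self.history = defaultdict(list)

    def observe(self, weekday, hour, bit_rate):
        """Record one reading for the given time slot."""
        self.history[(weekday, hour)].append(bit_rate)

    def threshold(self, weekday, hour):
        """Return mean + k * stdev for the slot, or None if history is thin."""
        samples = self.history.get((weekday, hour), [])
        if len(samples) < self.min_samples:
            return None
        return mean(samples) + self.k * pstdev(samples)

    def is_anomalous(self, weekday, hour, bit_rate):
        """True only when a learned threshold exists and is exceeded."""
        limit = self.threshold(weekday, hour)
        return limit is not None and bit_rate > limit


# Illustrative history: Monday (0) at 20:00, bit rates in Mbps.
model = LearnedThreshold()
for rate in [4.0, 5.0, 6.0, 5.0, 5.0]:
    model.observe(0, 20, rate)

busy_evening_alert = model.is_anomalous(0, 20, 9.0)
normal_evening_alert = model.is_anomalous(0, 20, 6.0)
no_history_alert = model.is_anomalous(0, 3, 9.0)
```

Keeping the slots separate means a bit rate that is ordinary on a weekday evening is not judged against the quieter baseline of the early morning.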
The Solution
The dashboard in Figure 1 showcases the overall status of a CSP network. Overall bit rate, frame rate, total applications and total endpoint connections are displayed on a second-by-second basis. This dashboard view gives the operator a general idea of the status of the network with a quick glimpse. When alerted to the possibility of a problem, the operator can use some of the other screens to quickly ‘drill down’ to the appropriate level in the network. Often this requires going down to the per-subscriber/per-application level of granularity.

Figure 1. Top level display for a typical flow analysis software tool.
Figure 2 shows a subscriber-level view that includes the pertinent flow information for a single subscriber, including a display of the bandwidth being used for each connection based on the endpoint IP address, as well as the bandwidth used for each application. Toward the bottom are displays of bit rate, packet rate, number of connection endpoints and total active applications. These graphs are enormously helpful in debugging subscriber problems.

Figure 2. Typical flow analysis of a single subscriber.
Possible Scenario
Consider a typical situation in which a CSP Help Desk receives a call from a subscriber. This subscriber has a home office, and he is complaining that he isn’t getting the 10 Mbps bandwidth for which he is paying. In what follows, we walk through a typical session between a customer service representative (CSR) and the subscriber.
Upon taking the call, the CSR enters the subscriber’s ID into the flow analysis tool’s search window, which brings up a monitor of the subscriber’s bandwidth usage. (See Figure 3.)

Figure 3. Single subscriber monitor window showing high bit rate and packet rate.
The Bit Rate trend chart on the lower left of the figure reveals that the subscriber is indeed consuming his entire 10 Mbps / 1 Mbps bandwidth. Because these charts are updated in near real-time, the CSR can ask the customer to perform diagnostic steps and quickly see the effects. For example, the subscriber can shut down his computer to see whether it is generating the traffic. Alternatively, he can shut down other computers one at a time, or even disconnect his Wi-Fi to see if a neighbor is using his service.
Let’s assume the subscriber’s own computer is the root of the problem, but the subscriber claims he is not running any applications. The CSR notices that the Packet Rate trend chart, in the lower center of the figure, indicates a relatively high packet rate. By dividing the bit rate by the packet rate, and converting bits to bytes, one sees that the average packet size is roughly 100 bytes. Packets this small, at such high bandwidth usage, are typical of Denial of Service (DoS) attacks. The upper right of the figure indicates that email (SMTP) packets are generating nearly the entire upstream and downstream bandwidth. The CSR can reasonably conclude that the subscriber’s computer is indeed infected with an email DoS virus.
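The packet-size arithmetic can be sketched as follows. The readings in Figure 3 are not reproduced in the text, so the bit rate and packet rate below are hypothetical values chosen only to yield the roughly 100-byte result described above; the function name is likewise an assumption.

```python
def average_packet_size_bytes(bit_rate_bps, packet_rate_pps):
    """Average packet size in bytes from a bit rate and a packet rate.

    bit rate / packet rate gives bits per packet; dividing by 8
    converts that figure to bytes per packet.
    """
    if packet_rate_pps <= 0:
        raise ValueError("packet rate must be positive")
    return bit_rate_bps / 8 / packet_rate_pps


# Hypothetical readings: 10 Mbps carried by 12,500 packets per second.
bit_rate_bps = 10_000_000
packet_rate_pps = 12_500
size = average_packet_size_bytes(bit_rate_bps, packet_rate_pps)
```

With these assumed readings the average works out to 100 bytes per packet, far below the 1,500-byte payloads common on Ethernet when bulk data is being transferred.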
But what if the packet rate is not unusually high, but looks instead as shown in Figure 4? Here, we have additional information from the trend chart on the right showing Endpoint Count. It shows that the number of connections uploading or downloading data is unusually high. This is a signature of peer-to-peer file sharing like BitTorrent. The upper left portion of the figure displays the actual IP addresses for each of these endpoints. The CSR can reasonably conclude that someone at the subscriber’s residence is running a file sharing program.

Figure 4. Single-subscriber monitor window showing high endpoint count.
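The endpoint-count signature can be illustrated with a short sketch that counts the distinct far-end addresses exchanging traffic with one subscriber. The function names, the `endpoint_limit` of 50, and the sample addresses are assumptions made for illustration; a real tool would choose its limit from observed behavior rather than a fixed number.

```python
def remote_endpoints(flows, subscriber_ip):
    """Return the distinct far-end IPs with a flow to or from the subscriber.

    Each flow is a hypothetical (src_ip, dst_ip) pair.
    """
    endpoints = set()
    for src_ip, dst_ip in flows:
        if src_ip == subscriber_ip:
            endpoints.add(dst_ip)
        elif dst_ip == subscriber_ip:
            endpoints.add(src_ip)
    return endpoints


def looks_like_p2p(flows, subscriber_ip, endpoint_limit=50):
    """Flag a subscriber whose endpoint count exceeds an assumed limit."""
    return len(remote_endpoints(flows, subscriber_ip)) > endpoint_limit


subscriber = "192.0.2.10"

# Illustrative case 1: uploads to 200 distinct peers.
many_peers = [(subscriber, f"198.51.100.{i}") for i in range(1, 201)]

# Illustrative case 2: two-way traffic with a single server.
one_server = [(subscriber, "203.0.113.7"), ("203.0.113.7", subscriber)]

p2p_suspected = looks_like_p2p(many_peers, subscriber)
browsing_suspected = looks_like_p2p(one_server, subscriber)
```

Counting distinct addresses rather than flows keeps a busy two-way session with a single server from being mistaken for many peers.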
Indeed, flow analysis software tools have tremendous potential, as CSPs can use them to identify new revenue sources, optimize network resources, or make informed decisions about capital equipment upgrades.
As we continue to invest in broadband infrastructure modernization, these improvements are critical to help CSPs improve key business functions, quickly identify network bottlenecks, and ultimately better service their end-users.
David Cleary is a solutions marketing director at Calix. He has more than 10 years of experience in telecoms and marketing. For more information, visit www.calix.com.
