Deep Packet Inspection
Deep Packet Inspection. Sounds like something out of a James Bond thriller. A secret agent infiltrates an organization, gathers and analyzes its deepest secrets -- and disappears into the night. No one is the wiser.
Come to think of it, this description isn't far off the mark. There is now a means (electronic, of course) of capturing significant bits of information from the headers (or even payload) of packets being transmitted across the Internet. And no one will be the wiser.
We all know that in packet switching a message (data or digitized voice or video) is broken up into a large number of "bundles" of bits. Each of these bundles (i.e., packets) has a header attached, and that header tells the routers in the Internet where the message has come from and where it is going. It is then left to each router to pick a next hop headed in the proper direction and pass the packet along; unlike a telephone call, no circuit is set up or held for the message. Somehow at the destination these packets are put in the proper order and delivered to the recipient.
Clearly, doing anything meaningful with these packets is impossible. Half a word goes by way of Saint Louis, and half by way of Albany, New York. Individually these packets are rather meaningless.
WRONG! They are not meaningless at all, and an organization or government able to make sense of all this could have a very important leg up.
A packet has 2 main parts: the header and the payload. The header provides all the necessary information regarding the packet itself (e.g., source, destination, length, etc.), and the payload is the message being sent.
There are a number of different protocols used for sending messages on a particular link, but the most popular is version 4 of the Internet Protocol. It's called IPv4. We'll deal with it.
There are 14 fields in the header, structured as follows:
1. Version. 4 bits. Identifies which version of the protocol is being used. For IPv4 the value is 4 (hence the name: IPv4).
2. Header Length. 4 bits, and thus a maximum value of 15. The minimum value for this field is 5. Since we are dealing with 32-bit words and 8-bit bytes, the minimum value equates to 5 words x 32 bits = 160 bits = 20 bytes. Being a 4-bit value, the maximum header length is 15 words x 32 bits = 480 bits = 60 bytes.
3. Differentiated Services Code Point. 6 bits. In simplest terms, this describes the type of service (e.g., streaming video, VoIP, etc.).
4. Explicit Congestion Notification. 2 bits. This is an optional feature; if both endpoints support it, it signals network congestion.
5. Total Length. 16 bits. Length of the whole packet, both header and payload. (This is properly called the datagram.) The minimum-length datagram is 20 bytes (a 20-byte header and a 0-byte payload). The maximum length is the largest value a 16-bit field can hold: 65,535 bytes.
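6. Identification. 16 bits. Identifies which original datagram a fragment belongs to, so that the pieces of a subdivided packet can be matched up.
7. Flags. 3 bits. Says whether a packet can be subdivided (fragmented) and, if it has been, whether more fragments follow.
8. Fragment Offset. 13 bits. The position of this fragment within the original datagram, counted in units of 8 bytes.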
9. Time to Live. 8 bits. Prevents a packet from wandering around the network forever. This time-to-live field is specified in seconds. In practice it is somewhat simpler; each router that the datagram goes through decrements the time-to-live count by 1. When this field hits zero (0), the packet is discarded, and a message is sent back to the sender with this information.
10. Protocol. 8 bits. Identifies the protocol used in the data portion. (There is more than one; for example, 6 means TCP and 17 means UDP.)
11. Header Checksum. 16 bits. Used for error checking. As a datagram passes each router, the checksum of the header is recomputed and compared with the value of this field. If there is a mismatch, the packet is discarded.
12. Source address. 32 bits. Address of sender.
13. Destination address. 32 bits. Address of receiver.
14. Options. Available for special uses.
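To make the field list concrete, here is a minimal sketch in Python of how a monitoring tool might pull those fields out of the first bytes of a datagram. The function name `parse_ipv4_header` and the dictionary keys are our own choices for illustration, and the sample bytes are a hand-built 20-byte header, not captured traffic.

```python
import socket
import struct

def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the IPv4 header fields described in the list above."""
    if len(packet) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    # "!" = network byte order; B = 8 bits, H = 16 bits, 4s = 4 raw bytes.
    (ver_ihl, dscp_ecn, total_length, identification, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    ihl_words = ver_ihl & 0x0F          # header length in 32-bit words
    header_bytes = ihl_words * 4        # 5 words -> 20 bytes, 15 words -> 60 bytes
    return {
        "version": ver_ihl >> 4,
        "header_bytes": header_bytes,
        "dscp": dscp_ecn >> 2,
        "ecn": dscp_ecn & 0x03,
        "total_length": total_length,
        "identification": identification,
        "flags": flags_frag >> 13,
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
        "options": packet[20:header_bytes],
    }

# Hand-built sample header: UDP from 192.168.0.1 to 192.168.0.199.
sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
fields = parse_ipv4_header(sample)
print(fields["source"], "->", fields["destination"], "protocol", fields["protocol"])
```

Notice that everything here comes from the header alone; the payload is never touched.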
The need for many of these fields is obvious: certainly the network must know where the message (i.e., the packet) is coming from and going to. It has to know where the header ends and the payload begins, and also where the payload ends. The fragment offset permits the received fragments to be put in the proper sequence. The Time-to-Live field prevents a messed-up message from going around the network forever.
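The checksum and Time-to-Live handling can be sketched in a few lines of Python. This is only an illustration of the per-router step described in the field list, under the assumption that the header arrives as raw bytes; the names `ipv4_checksum` and `forward_one_hop` are hypothetical, and a real router would also send a notice back to the sender when it discards an expired packet.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, then inverted."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                       # fold any carry back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward_one_hop(header: bytes):
    """Return the header as the next router would see it, or None if dropped."""
    # Summing a valid header, checksum field included, yields zero.
    if ipv4_checksum(header) != 0:
        return None                          # mismatch: discard
    ttl = header[8]
    if ttl <= 1:
        return None                          # count would hit zero: discard
    updated = bytearray(header)
    updated[8] = ttl - 1                     # decrement Time to Live
    updated[10:12] = b"\x00\x00"             # clear old checksum before recomputing
    struct.pack_into("!H", updated, 10, ipv4_checksum(bytes(updated)))
    return bytes(updated)

# Hand-built sample header with TTL 64.
sample_header = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
next_hop = forward_one_hop(sample_header)
print("TTL after one hop:", next_hop[8])
```

Because the TTL lives inside the header, the checksum has to be recomputed at every hop, which is why the function rewrites both fields.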
Capturing the data in a packet is, conceptually, no big deal. A piece of hardware, with an input port, an output port, and a monitor port, is inserted in a link. And because Internet Service Providers route all of their customers' traffic, these ISPs are able to monitor web-browsing habits in a very detailed way.
It's interesting to note that the providers of certain browsers are now moving to add no-tracking software to their products. The capture performed at this network tap can be either Deep Packet Capture, in which case an entire packet -- header and payload -- is captured, or Partial Packet Capture, in which case just the header is captured. The use of partial packet capture can reduce storage requirements and avoid legal problems.
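The difference between the two capture modes is easy to see in code. The following Python sketch assumes the tap hands us each datagram as raw bytes; the function name `capture` and the `headers_only` switch are made up for this illustration.

```python
def capture(packet: bytes, headers_only: bool) -> bytes:
    """Return what a tap would store for one IPv4 datagram."""
    # The low 4 bits of the first byte give the header length in 32-bit words.
    header_len = (packet[0] & 0x0F) * 4
    return packet[:header_len] if headers_only else packet

# Hand-built datagram: a 20-byte header followed by a 95-byte dummy payload.
datagram = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7") + b"x" * 95
partial = capture(datagram, headers_only=True)
full = capture(datagram, headers_only=False)
print(len(partial), "bytes stored versus", len(full))
```

For this one datagram, partial capture stores 20 bytes where full capture stores 115, which is where the storage savings come from.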
So what does one do with all this captured data?
- Identify security breaches. Pinpoint the source of an intrusion, and verify that the traffic being moved belongs to authorized people.
- Identify data leakage. It's important to determine what files have been sent out from the network.
- Troubleshooting. Pretty hard to figure out what is happening if you can't monitor what is happening.
- Targeted advertising. For better or worse, someone somewhere is monitoring our Internet habits.
- Lawful intercept. Here's where things get touchy. In the United States, all telcos, Internet Service Providers, and VoIP providers are required, when requested, to provide a means of tapping a particular circuit. This is the CALEA requirement.
- Copyright enforcement. Similar to lawful intercept.
- Detecting and identifying data loss. Credit card numbers were stolen? What credit cards?
- Forensics. Apply triggers to capture certain data. Analyze, in depth, what has taken place.
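As one example of what payload inspection for data loss might look like, here is a rough Python sketch that scans a captured payload for digit strings that look like credit card numbers. It is a simplification for illustration: the pattern, the function names, and the sample payload are all our own assumptions, the number used is a well-known test number rather than a real card, and none of this works when the payload is encrypted.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(rb"\b(?:\d[ -]?){12,18}\d\b")

def luhn_valid(digits: str) -> bool:
    """Luhn check: the checksum that payment card numbers satisfy."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(payload: bytes) -> list[str]:
    """Return digit strings in the payload that pass the Luhn check."""
    hits = []
    for match in CARD_CANDIDATE.finditer(payload):
        digits = re.sub(rb"\D", b"", match.group()).decode("ascii")
        if luhn_valid(digits):
            hits.append(digits)
    return hits

# Made-up payload containing a standard test card number.
payload = b"POST /pay card=4111 1111 1111 1111&cvv=123"
print(find_card_numbers(payload))
```

The Luhn check is there to cut down false alarms; a long order number or phone number will usually fail it.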
We did not mention one other use, for better or worse. And that is an application of Network Neutrality. Should all Internet messages be treated equally? Should a large file being downloaded have the same rights to passage as streaming video? In the first case a delay of a few milliseconds (or even a second) is no big thing. In the second case there is an obviously unpleasant disruption in the transmission. To put it in somewhat technical terms, some applications (streaming video, VoIP) depend on low latency. Other applications do not. But if an ISP applies such logic, there may well be too many opportunities to misuse the capability.
Almost all proponents of Packet Capture are quick to say that they are not at all interested in the payload; that all they want to do is help the network by examining the header. But who can be sure?
The technology of the Internet, and of packet switching, certainly makes any sort of packet capture possible. If treated properly, the network will be more efficient. If treated improperly, a great deal of harm could result.
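To show how little machinery such prioritization needs, here is a toy Python sketch of a link that sends latency-sensitive packets first, judged only by the Differentiated Services Code Point in the header. The class name `PriorityLink` is invented for this example, and the two-level scheme is an assumption; the value 46 is the "Expedited Forwarding" code point commonly used for VoIP.

```python
import heapq
from itertools import count

EXPEDITED_FORWARDING = 46   # DSCP value commonly used for voice traffic

class PriorityLink:
    """Toy outbound queue: expedited packets leave before everything else."""

    def __init__(self):
        self._heap = []
        self._arrival = count()   # tie-breaker keeps first-in, first-out order

    def enqueue(self, packet: bytes) -> None:
        dscp = packet[1] >> 2     # top 6 bits of the second header byte
        priority = 0 if dscp == EXPEDITED_FORWARDING else 1
        heapq.heappush(self._heap, (priority, next(self._arrival), packet))

    def dequeue(self) -> bytes:
        return heapq.heappop(self._heap)[2]

# Two-byte stand-ins for packets; only the DSCP byte matters here.
bulk_a = bytes([0x45, 0x00])
voice = bytes([0x45, EXPEDITED_FORWARDING << 2])
bulk_b = bytes([0x45, 0x00])

link = PriorityLink()
for pkt in (bulk_a, voice, bulk_b):
    link.enqueue(pkt)
order = [link.dequeue() for _ in range(3)]
print([p[1] >> 2 for p in order])
```

The same few lines could just as easily push a competitor's traffic to the back of the queue, which is exactly the worry.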
It would seem that privacy is a thing of the past.
What’s your take on this subject? Leave a comment and get the conversation going.
