Tuesday, 16 January 2007

traceroute - a very useful troubleshooting tool which reveals the bottlenecks on the Internet.

I am sure anyone who is even slightly Internet savvy will be aware that to move data from a point A to another point B across the Internet, it has to pass through a number of intermediary points, say C, D, E and so on. But what many won't know is that your data is not transferred in one piece when it is sent over the net. Rather, it is split into chunks of, say, 1500 bytes each. Each chunk is then enclosed in what is known as a packet, which contains some additional data such as the destination IP address and port number, apart from other details which give the packet its unique identity. Finally, the packet is sent across the net.
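
The splitting step can be sketched roughly as follows. This is only an illustration: the Packet class, its field names and the address 192.0.2.10 are made up for the example, and a real packet header carries many more fields.

```python
from dataclasses import dataclass
from typing import List

CHUNK_SIZE = 1500  # bytes per chunk, the figure used in the text


@dataclass
class Packet:
    # Hypothetical, heavily simplified packet layout.
    dest_ip: str    # destination IP address
    dest_port: int  # destination port number
    seq: int        # position of this chunk in the original data
    payload: bytes  # the chunk itself


def split_into_packets(data: bytes, dest_ip: str, dest_port: int) -> List[Packet]:
    """Cut data into CHUNK_SIZE pieces and wrap each piece in a Packet."""
    return [
        Packet(dest_ip, dest_port, seq, data[offset:offset + CHUNK_SIZE])
        for seq, offset in enumerate(range(0, len(data), CHUNK_SIZE))
    ]


# 4000 bytes do not fit in one chunk, so several packets are produced.
packets = split_into_packets(b"x" * 4000, "192.0.2.10", 80)
print(len(packets), [len(p.payload) for p in packets])
```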

While the packets travel from point A to point B, each packet may take a different path depending on diverse factors. At the receiving end they are reassembled in their original order to recreate the document you sent in the first place.
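
Here is a small sketch of the reassembly idea, assuming each packet carries a sequence number. The (sequence number, payload) tuple format and the chunk size of 8 are chosen only for the example.

```python
import random


def reassemble(packets):
    """Sort (sequence number, payload) pairs by sequence number and join the payloads."""
    return b"".join(payload for _, payload in sorted(packets, key=lambda p: p[0]))


message = b"The quick brown fox jumps over the lazy dog"
chunk = 8  # deliberately tiny chunk size so the example produces several packets

packets = [
    (seq, message[offset:offset + chunk])
    for seq, offset in enumerate(range(0, len(message), chunk))
]
random.shuffle(packets)  # simulate packets arriving out of order

restored = reassemble(packets)
print(restored == message)
```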

The intermediate gateways through which the packets pass before they reach the final destination are known as hops. So for data to travel from point A to point B on the net, it has to go through a number of hops.

Linux & Unix, being network operating systems, have a number of powerful tools which help the network administrator find out a wealth of data about their network and the Internet. One such tool is the ubiquitous traceroute.

The traceroute tool is available on virtually all Unix and Linux distributions and is used to find potential bottlenecks between your computer and a remote computer across the net. Its usage is quite simple:
# traceroute <domain or IP address>
On many distributions traceroute resides in the /usr/sbin directory, which is normally not in an ordinary user's PATH, so it may seem that you have to be root to run it. But if you use the full path, you can run the tool as a normal user as follows:
$ /usr/sbin/traceroute <domain or IP address>
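
If you want to call traceroute from a script, the same path problem applies. Below is a minimal sketch that looks for the binary in the usual PATH plus /usr/sbin and /sbin. The function names find_traceroute and build_command are my own, and the extra directories are an assumption about where your distribution installs the tool.

```python
import os
import shutil


def find_traceroute(extra_dirs=("/usr/sbin", "/sbin")):
    """Return the full path to traceroute, or None if it cannot be found."""
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    dirs.extend(extra_dirs)  # directories a normal user's PATH often lacks
    return shutil.which("traceroute", path=os.pathsep.join(dirs))


def build_command(target, max_hops=30):
    """Build the argument list for a trace to target (-m sets the maximum hops)."""
    binary = find_traceroute() or "traceroute"  # fall back to the bare name
    return [binary, "-m", str(max_hops), target]


print(build_command("www.yahoo.com"))
```

The resulting list can be handed to subprocess.run() to actually perform the trace.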

For example, this is the output I received when I ran a trace on the www.yahoo.com domain from my machine.
$ /usr/sbin/traceroute www.yahoo.com

traceroute to www.yahoo.com (69.147.114.210), 30 hops max, 40 byte packets
1 10.2.71.1 (10.2.71.1) 21.965 ms 22.035 ms 22.111 ms
2 (ISP) (ISP gateway) 22.510 ms 25.716 ms 26.073 ms
3 61.246.224.209 (61.246.224.209) 69.212 ms 59.778 ms 63.334 ms
4 59.145.6.1 (59.145.6.1) 65.632 ms 64.750 ms 64.868 ms
5 59.145.11.69 (59.145.11.69) 63.562 ms 64.219 ms 63.742 ms
6 203.208.143.241 (203.208.143.241) 318.632 ms 307.733 ms 316.650 ms
7 203.208.149.25 (203.208.149.25) 317.534 ms 308.116 ms 307.507 ms
8 203.208.186.10 (203.208.186.10) 245.835 ms 247.878 ms 248.862 ms
9 so-1-1-0.pat1.dce.yahoo.com (216.115.101.129) 286.774 ms 289.702 ms so-1-1-0.pat2.dce.yahoo.com (216.115.101.131) 326.470 ms
10 ge-2-1-0-p141.msr1.re1.yahoo.com (216.115.108.19) 324.044 ms 324.497 ms 326.011 ms
11 ge-1-32.bas-a1.re3.yahoo.com (66.196.112.35) 333.479 ms 333.019 ms ge-1-41.bas-a2.re3.yahoo.com (66.196.112.201) 292.967 ms
12 * * *
13 * * *
14 * * *
15 * * *
.
. //Truncated for brevity
.
29 * * *
30 * * *
As you can see from the output of traceroute, it defaults to a maximum of 30 hops. The first line of the output gives the IP address of the www.yahoo.com domain, which is 69.147.114.210, the maximum number of hops traceroute will probe before giving up on reaching the destination, and the size of the probe packets, which is 40 bytes.
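
As an illustration, the three pieces of information in that first line can be pulled out with a regular expression. The pattern below is written against the exact header shown in the listing above, and other traceroute versions may word the line differently. The name parse_header is just for this example.

```python
import re

# Matches the header line as it appears in the listing above.
HEADER_RE = re.compile(
    r"traceroute to (?P<host>\S+) \((?P<ip>[\d.]+)\), "
    r"(?P<max_hops>\d+) hops max, (?P<size>\d+) byte packets"
)


def parse_header(line):
    """Return host, IP address, hop limit and packet size from the header line."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("not a traceroute header line: %r" % line)
    return {
        "host": match.group("host"),
        "ip": match.group("ip"),
        "max_hops": int(match.group("max_hops")),
        "packet_size": int(match.group("size")),
    }


header = parse_header(
    "traceroute to www.yahoo.com (69.147.114.210), 30 hops max, 40 byte packets"
)
print(header)
```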

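The next 30 or so lines show the IP address or domain name of each gateway through which the packets pass, as well as the time in milliseconds of the ICMP TIME_EXCEEDED response from each gateway along the path to the host. The traceroute program utilizes the IP protocol's time to live (TTL) field. Every gateway that forwards a packet decrements its TTL by one, and the gateway at which the TTL reaches zero discards the packet and sends an ICMP TIME_EXCEEDED message back to the sender. By sending probes with TTL values of 1, 2, 3 and so on, traceroute makes each gateway along the path reveal itself in turn. By default, it starts with a TTL value of 1, but this value can be changed with the -f option.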
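
The TTL trick is easier to see in a toy simulation. The sketch below sends no real packets: it walks a made-up list of router names and reports which one would answer for each TTL value. The function name trace and the router names are purely illustrative.

```python
def trace(path, first_ttl=1, max_hops=30):
    """Simulate a trace along path, a list of node names ending with the destination.

    Returns a list of (ttl, responder) pairs, one per probe round.
    """
    results = []
    for ttl in range(first_ttl, max_hops + 1):
        remaining = ttl
        responder = None
        for index, node in enumerate(path):
            if index == len(path) - 1:
                responder = node  # the probe reached the destination itself
                break
            remaining -= 1        # every gateway decrements the TTL by one
            if remaining == 0:
                responder = node  # TTL expired here: this gateway reports back
                break
        results.append((ttl, responder))
        if responder == path[-1]:
            break                 # destination reached, the trace is complete
    return results


route = ["gateway-1", "gateway-2", "gateway-3", "destination"]
for ttl, responder in trace(route):
    print(ttl, responder)
```

Passing first_ttl=3 mimics the -f option: the first two gateways are skipped because no probe expires at them.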

Now let's take a closer look at the output of traceroute to the yahoo.com domain as shown in the listing above. As you can see, the second hop in my case is my ISP's gateway (I have removed its address from the listing). On the same line, following the IP address, there are three time values in milliseconds. There are three values because traceroute by default sends three probe packets of 40 bytes each to every hop. The three time values are the times taken to send each packet and receive an ICMP TIME_EXCEEDED response from the gateway. Put another way, these three values are the round trip times of the packets. So for the three packets to reach my ISP's gateway and for the replies to come back, it takes 22.510 ms, 25.716 ms and 26.073 ms respectively, as displayed in the line for the 2nd hop.
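
To work with these round trip times in a script, they first have to be extracted from the line. A minimal sketch, using the 3rd hop of the listing above as sample input (the function name round_trip_times is my own):

```python
import re

# A number followed by " ms" is one round trip time.
RTT_RE = re.compile(r"(\d+(?:\.\d+)?) ms")


def round_trip_times(hop_line):
    """Return the round trip times (in milliseconds) found on one hop line."""
    return [float(value) for value in RTT_RE.findall(hop_line)]


line = "3 61.246.224.209 (61.246.224.209) 69.212 ms 59.778 ms 63.334 ms"
times = round_trip_times(line)
average = sum(times) / len(times)
print(times, round(average, 3))
```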

Let's look at the 5th and 6th hops in the output above. If you compare the times, you will find a drastic increase: 63.562 ms for the 5th hop against 318.632 ms for the 6th hop. This is because up to the fifth hop the gateways were within the Indian subcontinent itself, whereas the gateway of the 6th hop is in Singapore, so it takes that much more time to get a reply. Generally, smaller numbers mean better connections.
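
The same comparison can be automated. The sketch below takes the first round trip time of hops 1 to 8 from the listing above and reports every pair of consecutive hops where the time rises sharply. The threshold of 100 ms is an arbitrary value picked for this example, not a standard.

```python
# First-probe round trip time (ms) for hops 1 to 8, copied from the listing above.
first_probe_ms = {
    1: 21.965, 2: 22.510, 3: 69.212, 4: 65.632,
    5: 63.562, 6: 318.632, 7: 317.534, 8: 245.835,
}


def latency_jumps(rtts, threshold_ms=100.0):
    """Return (previous hop, hop, increase in ms) wherever the increase reaches the threshold."""
    hops = sorted(rtts)
    jumps = []
    for prev, cur in zip(hops, hops[1:]):
        increase = rtts[cur] - rtts[prev]
        if increase >= threshold_ms:
            jumps.append((prev, cur, round(increase, 3)))
    return jumps


jumps = latency_jumps(first_probe_ms)
print(jumps)
```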

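Check out the 11th hop. It shows two host names, one for the first two packets and a different one for the third packet. This happens when the probes for the same hop are answered by different routers, typically because traffic is being load-balanced across parallel links. The 9th hop shows the same behaviour.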
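
A hop line like this can be broken down per responding router. The following is a sketch that assumes the "name (address) time ms ..." layout seen in the listing above. The function name routers_for_hop is hypothetical.

```python
import re

# One router entry: "name (ip)" followed by one or more "<number> ms" values.
ROUTER_RE = re.compile(r"(\S+) \(([\d.]+)\)((?: \d+(?:\.\d+)? ms)+)")
TIME_RE = re.compile(r"(\d+(?:\.\d+)?) ms")


def routers_for_hop(hop_line):
    """Return a list of (host name, ip, [round trip times]) for one hop line."""
    routers = []
    for host, ip, times in ROUTER_RE.findall(hop_line):
        routers.append((host, ip, [float(t) for t in TIME_RE.findall(times)]))
    return routers


hop_11 = (
    "11 ge-1-32.bas-a1.re3.yahoo.com (66.196.112.35) 333.479 ms 333.019 ms "
    "ge-1-41.bas-a2.re3.yahoo.com (66.196.112.201) 292.967 ms"
)
for host, ip, times in routers_for_hop(hop_11):
    print(host, ip, times)
```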

And from the 12th hop onwards I get a series of timeouts, as shown by the asterisks. So my trace of the www.yahoo.com domain did not complete. The problem could be one of the following:
  • The network connection between the server on the 11th hop and the one on the 12th hop is broken.
  • The server on the 12th hop is down.
  • Or the server on the 12th hop has been set up in a way that keeps it from answering, for example a firewall that drops the probe packets or the replies to them.
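
Whatever the cause, the place where the trace goes silent is simply the last hop that still answered. A small sketch for finding it in saved traceroute output follows. The function name last_responding_hop is my own, and the sample lines are shortened from the listing above.

```python
def last_responding_hop(lines):
    """Return the number of the last hop with at least one reply, or None."""
    last = None
    for line in lines:
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the header, blank lines and the truncation marker
        if any(field != "*" for field in fields[1:]):
            last = int(fields[0])  # something other than a timeout was printed
    return last


sample = [
    "10 ge-2-1-0-p141.msr1.re1.yahoo.com (216.115.108.19) 324.044 ms 324.497 ms 326.011 ms",
    "11 ge-1-32.bas-a1.re3.yahoo.com (66.196.112.35) 333.479 ms 333.019 ms",
    "12 * * *",
    ". //Truncated for brevity",
    "30 * * *",
]
print(last_responding_hop(sample))
```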
To make sure, I did a ping of the www.yahoo.com domain and as expected, I received 100% packet loss as shown by the ping output below.
$ ping -c 2 www.yahoo.com
PING www.yahoo-ht2.akadns.net (69.147.114.210) 56(84) bytes of data.

--- www.yahoo-ht2.akadns.net ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1009ms
Usually this would suggest that I will not be able to access the concerned domain. But in yahoo.com's case, I was able to access the domain without any problem. In all probability their website is mirrored across a number of servers spread across the world, so if one server is down, the query is re-routed to the next nearest server. It is also common for servers and firewalls to silently discard ping and traceroute probes while still serving web traffic normally.
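
If you run such checks regularly, the summary line of ping is the part worth parsing. Here is a minimal sketch written against the summary format shown above. Other ping implementations may phrase the line differently, and parse_ping_summary is a name made up for the example.

```python
import re

SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) received, (\d+(?:\.\d+)?)% packet loss"
)


def parse_ping_summary(line):
    """Return (transmitted, received, loss percentage), or None if the line does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


summary = parse_ping_summary(
    "2 packets transmitted, 0 received, 100% packet loss, time 1009ms"
)
print(summary)
```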

traceroute is a very useful tool to pinpoint where an error occurs on the Internet. It can also be used to test the responsiveness of a domain or server. For example, if your route to a server is very long (over 25 hops), performance is going to suffer. A long route can be due to a less-than-optimal configuration within some network along the way.

Similarly, if you see a large jump in latency (delay) from one hop to the next in a trace output, that could indicate a problem. It could be a saturated (overused) network link, a slow network link, an overloaded router, or some other problem at that hop. It can also indicate a long hop, such as a cross-country link or one that crosses an ocean (compare the timing of the 5th and 6th hops in the yahoo.com trace output above).
