Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)


Network Troubleshooting Tools and Strategies


The ping, traceroute, ngrep and other network tools are indispensable for troubleshooting networking problems. Most of them are preinstalled on both Solaris and Linux; we will use Solaris as the example below. Network troubleshooting means recognizing and diagnosing networking problems with the goal of keeping your network running optimally. As a network administrator, your primary concern is maintaining connectivity of all devices (a process often called fault management). You may also continually evaluate and improve your network's performance. Because serious networking problems can sometimes begin as performance problems, paying attention to performance can help you address issues before they become serious.

As in any investigation, you need to avoid jumping to conclusions and calmly collect all relevant facts. You can use the famous "How to Solve It" approach. Among the more network-specific issues:

In general, there is no one correct way to determine the root cause of a networking problem. Like any troubleshooting of complex systems, this is more art than science, and success depends both on your IQ and on your level of experience with the environment. However, there are some heuristics that you can follow:

Troubleshooting Commandments

  1. Create a backup of the faulty system before fixing anything. The backup can cover only the configuration files or the complete system. A complete backup is important because troubleshooting is a high-stress activity and it is easy to destroy some files accidentally. Ghost is a great tool for performing quick complete backups, and Ghost 2003 works with Linux ext filesystems. With the current size of USB flash drives, most system partitions can be backed up to a flash drive. Such a backup can also be indispensable if the fault disappears on its own: faults that fix themselves often come back on their own too.
     
  2. Before changing any file, always create a baseline copy of it. That prevents the most typical mistake in troubleshooting: losing the initial configuration.
     
  3. Simplify your environment, if possible. Where possible, remove routers and firewalls from the affected network path. Problems are often introduced by network devices. This is typical, for example, of home environments with cheap routers such as Linksys.

    In an enterprise environment the left hand often does not know what the right hand is doing, and similar effects can arise because someone has upgraded a router's operating system or changed a firewall's rule set.

    Patches are just a special kind of upgrade and can introduce problems too.
     
  4. Have a testing plan. Make sure that you can replicate the reported fault at will. This is important because you should always attempt to re-create the reported fault after effecting any changes. You need to be sure that you are not changing or adding to the problem.
     
  5. Document all steps and results. This is important because you could forget exactly what you did to fix or change the problem. This is especially true when someone interrupts you as you are about to test a configuration change. You can always revert the system to the faulty state if you backed it up as suggested earlier.
     
  6. Where possible, make permanent changes to the configuration settings. Temporary changes may be faster to implement but cause confusion when the system reboots after a power failure months or even years later and the fault occurs again. Nobody will remember what was done by whom.
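Commandments 1 and 2 are easy to forget under pressure, so it can help to script the baseline step. The following is a minimal Python sketch, not a tool the text prescribes: it copies a file to a hypothetical `<name>.orig.<timestamp>` name in the same directory before you edit it, and refuses to overwrite an existing baseline.

```python
import shutil
import time
from pathlib import Path


def make_baseline(path: str) -> Path:
    """Copy `path` to `<name>.orig.<timestamp>` in the same directory.

    The naming scheme is a hypothetical convention. shutil.copy2 keeps the
    permission bits and modification time of the original file.
    """
    src = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.orig.{stamp}")
    if dst.exists():
        # Never overwrite an earlier baseline: it may be the only good copy.
        raise FileExistsError(dst)
    shutil.copy2(src, dst)
    return dst
```

For example, calling make_baseline("/etc/hosts") before editing saves the original next to it, so the initial configuration is never lost.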

Using ping as a Troubleshooting Tool

The ping utility sends ICMP echo request packets to the target host or hosts. When an ICMP echo response is received, ping displays the message target is alive, where target is the hostname of the device that received the ICMP echo requests.

# ping problem.host.com
problem.host.com is alive

The -s option is useful when attempting to connect to a remote host that is down or unavailable: ping keeps sending requests, and no output is produced until an ICMP echo response is received from the target host. The -R (record route) option can be useful if the traceroute utility is not available.

Statistics are displayed when the ping -s command is terminated.

# ping -s problem.host.com

Another useful troubleshooting technique using ping is to send ICMP echo requests to the entire network by using the broadcast address as the target host. Using the -s option with the broadcast address provides good information about which systems are available on the network:

# ping -s 172.20.4.255
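The broadcast address used above follows from an interface's address and netmask as shown by ifconfig (Solaris prints the netmask in hex, for example ffffff00). A small Python sketch using the standard ipaddress module; the function names are hypothetical:

```python
import ipaddress


def netmask_from_hex(hexmask: str) -> str:
    """Convert a Solaris-style hex netmask (e.g. ffffff00) to dotted form."""
    return str(ipaddress.IPv4Address(int(hexmask, 16)))


def broadcast_for(address: str, netmask: str) -> str:
    """Return the directed-broadcast address of the interface's network."""
    iface = ipaddress.IPv4Interface(f"{address}/{netmask}")
    return str(iface.network.broadcast_address)


print(broadcast_for("172.20.4.110", netmask_from_hex("ffffff00")))  # 172.20.4.255
```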

Using ifconfig as a Troubleshooting Tool

The ifconfig utility is useful when troubleshooting networking problems. You can use it to display an interface's current status including the settings for the following:

Be aware that there are two ifconfig commands. The two versions differ in how they use name services. The /sbin/ifconfig version is called by the /etc/rc2.d/S30sysid.net startup script. This version is not affected by the configuration of the /etc/nsswitch.conf file.

The /usr/sbin/ifconfig version is called by the /etc/rc2.d/S69inet and /etc/rc2.d/S72inetsvc startup scripts. This version of the ifconfig command is affected by the name service settings in the /etc/nsswitch.conf file.
Power user tip: use the plumb switch when troubleshooting interfaces that have been manually added and configured. Often an interface reports that it is up and running, yet a snoop session from another host shows that no traffic is flowing out of the suspect interface. Using the plumb switch resolves this misconfiguration.

Using arp as a Troubleshooting Tool

The arp utility can be useful when attempting to locate network problems caused by duplicate IP addresses. First determine the Ethernet address of the target host, either with the banner command at the ok prompt or with the ifconfig utility at a shell prompt on a Sun system. Armed with the Ethernet address (also known as the MAC address), use the ping utility to determine whether the target host can be reached.

Use the arp utility immediately after using the ping utility and verify that the arp table reflects the expected (correct) Ethernet address.

The following example demonstrates this technique.

Working from system three, use the ping and arp utilities to determine whether system one (problem.host.com) is really the system responding to system three.

1. First, determine the Ethernet address of the host called one.


problem.host.com# ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 6
        inet 128.50.2.1 netmask ffffff00 broadcast 128.50.2.255
        ether 8:0:20:76:6:b
problem.host.com#


The ifconfig output shows that the Ethernet address of the hme0 interface is 8:0:20:76:6:b (Solaris omits leading zeros; written out in full it is 08:00:20:76:06:0b). The first half of the address, 08:00:20, shows that the system is a Sun computer. The second half, 76:06:0b, is the unique part of the system's Ethernet address.

Search the Internet to determine the manufacturer of devices with unknown Ethernet addresses.

2. Use the ping utility to send ICMP echo requests from system three to system problem.host.com.


three# ping problem.host.com
problem.host.com is alive

3. View the arp table to determine if the device that sent the ICMP echo response is the correct system, 76:06:0b.


three# arp -a
Net to Media Table: IPv4
Device IP Address           Mask            Flags Phys Addr
------ -------------------- --------------- ----- ---------------
                                                  08:00:20:76:06:0b
                                                  08:00:20:8e:ee:18
                                                  08:00:20:7a:0b:b8
                                                  08:00:20:78:54:90
                                                  00:60:97:7f:4f:dd
                                                  01:00:5e:00:00:00

Because the arp utility attempts to resolve names, its output will appear to hang if name resolution fails. Use netstat -pn to obtain similar output without name resolution.

The table displayed in step 3 proved that the correct device responded. If the wrong system responded, it could have been quickly tracked down by using the Ethernet address. Once located, it can be configured with the correct IP address.

Many hubs and switches will report the Ethernet address of the attached device, making it easier to track down incorrectly configured devices.

The first half of the Ethernet address can also be used to refine the search. The previous example showed a device, presumably a personal computer, that reported an Ethernet address of 00:60:97:7f:4f:dd. A quick search on the Internet reveals that the 00:60:97 vendor code is assigned to 3Com Corporation.
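The comparison in steps 1 to 3, together with the vendor lookup just described, can be sketched in Python. Solaris prints MAC addresses without leading zeros (8:0:20:76:6:b), so the sketch pads each octet before comparing. The vendor table is a hypothetical, hand-maintained excerpt containing only the two vendor codes mentioned in this section; in practice you would consult the IEEE OUI registry.

```python
# Hypothetical excerpt of a vendor (OUI) table; only the two vendor codes
# discussed in the text are included.
OUI_VENDORS = {
    "08:00:20": "Sun Microsystems",
    "00:60:97": "3Com",
}


def _full_mac(mac: str) -> str:
    """Pad every octet to two hex digits so addresses compare reliably."""
    return ":".join(f"{int(o, 16):02x}" for o in mac.split(":"))


def check_arp_entry(observed: str, expected: str) -> str:
    """Compare the MAC address in the arp table with the one you expected."""
    obs, exp = _full_mac(observed), _full_mac(expected)
    if obs == exp:
        return f"OK: {obs} is the expected device"
    # The first three octets (the OUI) identify the manufacturer.
    vendor = OUI_VENDORS.get(obs[:8], "unknown vendor")
    return f"MISMATCH: {obs} ({vendor}) answered instead of {exp}"


print(check_arp_entry("00:60:97:7f:4f:dd", "8:0:20:76:6:b"))
# MISMATCH: 00:60:97:7f:4f:dd (3Com) answered instead of 08:00:20:76:06:0b
```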

Using snoop as a Troubleshooting Tool

The snoop utility can be particularly useful when troubleshooting virtually any networking problem. The traces produced by snoop are most helpful for remote troubleshooting, because an end user (with access to the root password) can capture a snoop trace and email it, or send it by ftp, to a network troubleshooter for remote diagnosis.

You can use the snoop utility to display packets on the fly or to write to a file. Writing to a file using the -o switch is preferable because each packet can be interrogated later.

problem.host.com# snoop -o tracefile
Using device /dev/le (promiscuous mode)

You can view the snoop file by using the -i switch and the filename in any of the standard modes, namely:

Verbose mode is most useful when you are troubleshooting routing, network booting, Trivial File Transfer Protocol (TFTP), and any other network-related problems that require diagnosis at the packet level. Each layer of the packet is clearly delineated by its specific headers.

View the snoop output file in terse mode and locate a packet or range of packets of interest. Use the -p switch to view these packets. For example, if packet two is of interest, type:

problem.host.com# snoop -p2,2 -v -i tracefile

Using ndd as a Troubleshooting Tool

Use extreme caution when using the Solaris ndd utility because the system could be rendered inoperable if you set parameters incorrectly. Use an escaped question mark (\?) to determine which parameters a driver supports. For example, to determine which parameters the 100-Mbit Ethernet (hme) device supports, type:


# ndd /dev/hme \?
?                      (read only)
transceiver_inuse      (read only)
link_status            (read only)
link_speed             (read only)
...
lance_mode             (read and write)
ipg0                   (read and write)
#

Routing/IP Forwarding

Many systems configured as multi-homed hosts or firewalls may have IP forwarding disabled. A fast way to determine the state of IP forwarding is to use the ndd utility.

problem.host.com# ndd /dev/ip ip_forwarding
0

This example shows that the system is not forwarding IP packets between its interfaces. The value for ip_forwarding is 1 when the system is routing or forwarding IP packets.

Interface Speed

The hme (100-Mbit Ethernet) card can operate at two speeds, 10 or 100 Mbits per second. You can use the ndd utility to quickly display the speed at which the interface is running.

# ndd /dev/hme link_speed
1

A one (1) indicates that the interface is running at 100 Mbits per second. A zero (0) indicates that the interface is running at 10 Mbits per second.

Interface Mode

The hme interface can run in either full-duplex or half-duplex mode. Again, the ndd utility provides a fast way to determine the mode of the interface.

# ndd /dev/hme link_mode
 

One (1) indicates that the interface is running in full-duplex mode. A zero (0) indicates that the interface is running in half-duplex mode.
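The three ndd parameters above all return a bare 0 or 1, which is easy to misread. The following Python sketch maps them to the meanings given in this section. The value table is an assumption limited to /dev/ip and the hme driver, and the read_ndd helper simply runs the ndd command through subprocess, so it only works on Solaris (possibly requiring root).

```python
import subprocess

# Value meanings as described in this section for /dev/ip and the hme
# driver. Other drivers may use different parameter names or encodings.
NDD_MEANINGS = {
    ("/dev/ip", "ip_forwarding"): {"0": "not forwarding", "1": "forwarding"},
    ("/dev/hme", "link_speed"): {"0": "10 Mbit/s", "1": "100 Mbit/s"},
    ("/dev/hme", "link_mode"): {"0": "half-duplex", "1": "full-duplex"},
}


def read_ndd(device: str, param: str) -> str:
    """Run `ndd <device> <param>` and return its raw output (Solaris only)."""
    result = subprocess.run(["ndd", device, param],
                            capture_output=True, text=True, check=True)
    return result.stdout


def explain_ndd(device: str, param: str, raw: str) -> str:
    """Translate a raw ndd value into the meaning given in the text."""
    value = raw.strip()
    meanings = NDD_MEANINGS.get((device, param), {})
    return meanings.get(value, f"unrecognized value {value!r}")
```

On a Solaris host, explain_ndd("/dev/hme", "link_speed", read_ndd("/dev/hme", "link_speed")) would return "100 Mbit/s" for the example output above.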

Using netstat as a Troubleshooting Tool

You can use the netstat utility to display the status of the system's network interfaces. Of particular interest when troubleshooting networks are the routing tables of all the systems in question. You can use the -r switch to display a system's routing tables.

# netstat -r

Although interesting, the displayed routing table is not of much use unless you are familiar with the name resolution services, be they /etc/hosts, NIS, or NIS+. The problem is that it is difficult to concentrate on routing issues when any doubt can be cast on the name services. For example, someone could have modified the name service database, and the system msbravo may no longer have the IP address that you expected. Using the -n switch eliminates this uncertainty.

# netstat -rn

This routing table is much easier to translate and troubleshoot, especially when combined with the information from ifconfig -a:

# ifconfig -a
lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
        inet 127.0.0.1 netmask ff000000
hme0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
        inet 129.147.11.59 netmask ffffff00 broadcast 129.147.11.255
hme0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
        inet 172.20.4.110 netmask ffffff00 broadcast 172.20.4.255
#
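Combining netstat -rn with ifconfig -a starts with one question: is the destination on a directly attached network, or must it go through a router? A Python sketch, with the interface list transcribed by hand from the ifconfig -a output above (the function name is hypothetical):

```python
import ipaddress
from typing import Optional, Tuple

# (interface, address, hex netmask) as reported by ifconfig -a.
INTERFACES = [
    ("lo0", "127.0.0.1", "ff000000"),
    ("hme0", "129.147.11.59", "ffffff00"),
    ("hme0", "172.20.4.110", "ffffff00"),
]


def connected_network(destination: str) -> Optional[Tuple[str, str]]:
    """Return (interface, network) if `destination` is on a directly attached
    network, or None if it must be reached through a router."""
    dest = ipaddress.IPv4Address(destination)
    for name, addr, hexmask in INTERFACES:
        mask = ipaddress.IPv4Address(int(hexmask, 16))
        network = ipaddress.IPv4Interface(f"{addr}/{mask}").network
        if dest in network:
            return name, str(network)
    return None


print(connected_network("172.20.4.7"))  # ('hme0', '172.20.4.0/24')
```

A destination such as 129.147.3.249 returns None, so a route in the netstat -rn output must cover it.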


The verbose mode switch, -v, displays additional information, including the MTU size configured for each interface:

# netstat -rnv

Using traceroute as a Troubleshooting Tool

The traceroute utility is useful for network troubleshooting. You can quickly determine whether the expected route is being taken when communicating, or attempting to communicate, with a target network device. As with most network troubleshooting, it is useful to have a baseline against which current traceroute output can be compared. The traceroute output is also a concise way to report network problems to other network troubleshooters. For example, you could say, "Our normal route to a host is from our router called router1-ISP to your routers called rtr-a1 to rtr-c4. Today, however, users are complaining that performance is very slow. Screen refreshes are taking more than 40 seconds when they normally take less than a second. The output from traceroute shows that the route to the host is from our router router1-ISP to your routers called rtr-a1, rtr-d4, rtr-x5, and then to rtr-c4. What is going on?"
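Comparing today's route against a saved baseline is simple list work. The Python sketch below (function name hypothetical) reports hops that appeared or disappeared, using the router names from the example complaint; in practice the hop lists would come from saved traceroute output. It does not flag the same hops appearing in a different order.

```python
from typing import List, Tuple


def route_changes(baseline: List[str], current: List[str]) -> Tuple[List[str], List[str]]:
    """Return (hops added since the baseline, hops missing from the current route)."""
    added = [hop for hop in current if hop not in baseline]
    missing = [hop for hop in baseline if hop not in current]
    return added, missing


normal = ["router1-ISP", "rtr-a1", "rtr-c4"]
today = ["router1-ISP", "rtr-a1", "rtr-d4", "rtr-x5", "rtr-c4"]
print(route_changes(normal, today))  # (['rtr-d4', 'rtr-x5'], [])
```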

The traceroute utility sends probes with increasing IP time-to-live (TTL) values. Each gateway or router along the path that decrements a probe's TTL to zero discards it and returns an ICMP TIME_EXCEEDED message, which identifies that hop. When a probe finally reaches the target host, the target returns a PORT_UNREACHABLE message. With the -I (ICMP ECHO) option, traceroute sends ICMP echo requests instead and expects an ICMP ECHO_REPLY from the target host.
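The TTL mechanism just described can be modeled without touching the network. This Python sketch is a simulation under simplified assumptions (one probe per TTL, every router answers, no load balancing); it shows which device is expected to answer each probe, not how traceroute actually builds or sends packets. The router and host names are hypothetical.

```python
from typing import List, Tuple


def simulate_traceroute(routers: List[str], target: str,
                        use_icmp: bool = False, max_hops: int = 30
                        ) -> List[Tuple[int, str, str]]:
    """Model which device answers each traceroute probe; no packets are sent.

    A probe sent with TTL n expires at the n-th router, which returns ICMP
    TIME_EXCEEDED. Once the TTL is large enough to reach the target, the
    target answers with PORT_UNREACHABLE (default UDP probes) or, with -I,
    ECHO_REPLY.
    """
    final_reply = "ECHO_REPLY" if use_icmp else "PORT_UNREACHABLE"
    hops = []
    for ttl in range(1, max_hops + 1):
        if ttl <= len(routers):
            hops.append((ttl, routers[ttl - 1], "TIME_EXCEEDED"))
        else:
            hops.append((ttl, target, final_reply))
            break  # the target answered; traceroute stops here
    return hops


for hop in simulate_traceroute(["router-1", "router-2"], "target-host"):
    print(hop)
```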
By default, traceroute resolves IP addresses to hostnames, as shown in the following example:

# traceroute 172.20.4.110
traceroute to 172.20.4.110 (172.20.4.110), 30 hops max, 40 byte packets
1 129.147.11.253 (129.147.11.253) 1.037 ms 0.785 ms 0.702 ms
2 129.147.3.249 (129.147.3.249) 1.452 ms 1.569 ms 0.766 ms
3 * dungeon (129.147.11.59) 1.320 ms *

You can display IP addresses instead of hostnames by using the -n switch as shown in the following example. In this example, the hostname dungeon for IP address 129.147.11.59 on line 3 is no longer resolved.


# traceroute -n 172.20.4.110
traceroute to 172.20.4.110 (172.20.4.110), 30 hops max, 40 byte packets
 1  129.147.11.253  0.954 ms  0.657 ms  0.695 ms
 2  129.147.3.249  0.844 ms  0.745 ms  0.771 ms
 3  129.147.11.59  0.534 ms *  0.640 ms
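When you keep traceroute output as a baseline, it helps to turn it into data. A minimal Python parser for the hop lines of traceroute -n output, such as the output above; it assumes the numeric format, and hostname output with parenthesized addresses is not handled. The function name is hypothetical.

```python
import re
from typing import List, Optional, Tuple

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def parse_hop(line: str) -> Tuple[int, Optional[str], List[Optional[float]]]:
    """Parse one hop line of `traceroute -n` output.

    Returns (hop number, responding address, RTTs in ms), with None in the
    RTT list for each '*' (probe with no reply). If different addresses
    answer within one hop, the last one seen is kept.
    """
    match = HOP_LINE.match(line)
    if not match:
        raise ValueError(f"not a hop line: {line!r}")
    hop = int(match.group(1))
    address: Optional[str] = None
    rtts: List[Optional[float]] = []
    tokens = match.group(2).split()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "*":
            rtts.append(None)
        elif i + 1 < len(tokens) and tokens[i + 1] == "ms":
            rtts.append(float(token))
            i += 1  # skip the "ms" unit that follows the number
        else:
            address = token
        i += 1
    return hop, address, rtts


print(parse_hop(" 3  129.147.11.59  0.534 ms *  0.640 ms"))
# (3, '129.147.11.59', [0.534, None, 0.64])
```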

Common Network Problems

Following is a list of some common problems that occur:

The user's statement "My application does not work" is just the tip of the iceberg. Users often do not understand what exactly is not working, and by jumping to conclusions they can mislead you. Never take the user's story at face value. You need to ask the user very specific questions to uncover the real story. Among the questions to consider:

Layers-based troubleshooting

When troubleshooting networks, some people prefer to think in layers, similar to the TCP/IP Model while others prefer to think in terms of functionality.

Using the TCP/IP Model layered approach, you could start at either the Physical or Application layer. Start at either end of the model and test, draw conclusions, move to the next layer and so on.
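The bottom-up variant of this approach can be expressed as a loop that stops at the first layer whose test fails. In this Python sketch each layer's test is a hypothetical callable you supply (for example, checking the link LED, the arp cache, ping, or the application itself); the layer names follow the sections below.

```python
from typing import Callable, Dict, Optional

# TCP/IP model layers from bottom to top, named as in the sections below.
LAYERS = ["Physical", "Network Interface", "Internet/Transport", "Application"]


def first_failing_layer(tests: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run each layer's test from the bottom up and return the first layer
    that fails, or None if every layer passes."""
    for layer in LAYERS:
        if not tests[layer]():
            return layer  # every layer below this one passed, so start here
    return None


# Hypothetical results: the link and ARP look fine, but ping fails.
tests = {
    "Physical": lambda: True,
    "Network Interface": lambda: True,
    "Internet/Transport": lambda: False,
    "Application": lambda: False,
}
print(first_failing_layer(tests))  # Internet/Transport
```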

The Application Layer

A user complains that an application is not functioning. Assuming the application has everything that it needs, such as disk space, name servers, and the like, determine if the Application layer is functional by using another system.

Application layer programs often have diagnostic capabilities and may report that a remote system is not available. Use the snoop command to determine if the application program is receiving and sending the expected data.

The Transport Layer and the Internet Layer

These two layers can be bundled together for the purposes of troubleshooting. Determine whether the systems can communicate with each other. Look for ICMP messages that can provide clues as to where the problem lies. Could this be a router or switching problem? Are the protocols (TFTP, BOOTP) being routed? Are you attempting to use protocols that cannot be routed? Are the hostnames being translated to the correct IP addresses? Are the correct netmask and broadcast addresses being used? Tests between the client and server can include using ping, traceroute, arp, and snoop.

The Network Interface Layer

Use snoop to determine if the network interface is actually functioning. Use the arp command to determine if the arp cache has the expected Ethernet or MAC address. Fourth generation hubs and some switches can be configured to block certain MAC addresses.

When troubleshooting connectivity problems, here are some useful questions to ask:

The Physical Layer

Check that the link status LED is lit, but do not rely on it alone: the link LED can be lit even if the transmit line is damaged, so test with a known working cable. Verify that an MDI-X port or a crossover cable is being used when connecting hub to hub.
 

Selected Troubleshooting Scenarios

Multi-Homed System Acts as Rogue Router

For example, system A can use telnet to contact system B, but system B cannot use telnet to contact system A. Further questioning of the user reveals that this problem appeared shortly after a power failure.

For troubleshooting, use the traceroute utility to show the route that network traffic takes from system B to system A. If the traceroute output reveals a route that goes through an additional system (let's call it system C), you have a rogue router problem.

This often happens because system C was modified by an end user. For example, an additional interface was added, but the user did not add the /etc/notrouter file to the system. In this case, after the system rebooted it came up as a router and started advertising routes, which confuses the core routers and disrupts network traffic patterns.

Faulty Cable

For example, users on network A could not reach hosts on network B even though routers R1 and R2 appeared to be functioning normally.

First, verify that routers R1 and R2 are configured correctly and that their interfaces are up.
Then verify that systems A and B are up and configured correctly.

Next, use the traceroute utility to discover the actual route from system A to system B.

For example, the traceroute output might show that the attempted route from system A on network net-1 goes through router R1 as expected, but the traffic never reaches router R2.

Investigate the router R2 log files. If, for example, they show that the interface to network net-2 is flapping (going up and down at a very high rate) and corrupting routing tables, you can suspect that the cable is the problem.

To solve this problem, replace the network net-2 cable to router R2. If that fixes the problem, the old cable was faulty and was causing intermittent connections.
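Deciding whether an interface is "flapping" amounts to counting up/down transitions in a time window. This Python sketch assumes you have already extracted the transition timestamps (in seconds) from the R2 log files; the 60-second window and the threshold of five transitions are hypothetical defaults, not values from any router vendor.

```python
from typing import Iterable


def is_flapping(transition_times: Iterable[float], window: float = 60.0,
                threshold: int = 5) -> bool:
    """Return True if more than `threshold` up/down transitions fall within
    any `window`-second interval (sliding-window count)."""
    times = sorted(transition_times)
    start = 0
    for end in range(len(times)):
        # Advance the left edge until the window spans at most `window` seconds.
        while times[end] - times[start] > window:
            start += 1
        if end - start + 1 > threshold:
            return True
    return False


print(is_flapping([0, 5, 10, 15, 20, 25]))  # True: six transitions in 25 s
```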

Duplicate IP Address

Reported Problem:  Systems on network net-1 could not use ping past router R1 to a recently configured network, net-2.

You must be root (or have equivalent privileges) to perform some of the troubleshooting steps in the previous examples. Suggested steps:

  1. Verify that the T1 link between the routers R1 and R2 is functioning properly.
  2. Verify that router R1 can use ping to contact router R2.
  3. Verify that system A can use ping to reach the close interface of router R2. System A cannot use ping on the far interface of router R2, though.
  4. Confirm that systems on network net-1 can use ping to reach router R1.
  5. Check that systems on network net-2 can use ping to reach router R2.
  6. Determine that the routers are configured correctly.
  7. Verify that the systems on network net-1 and network net-2 are configured correctly.
  8. Make sure the systems on network net-1 can communicate with each other.
  9. Verify that systems on network net-2 can communicate with each other.
  10. Log on to router R1 and use traceroute to display how the data is routed from router R1 to router R2.
    traceroute reported that the traffic from router R1 to router R2 was going out the network net-1 side interface of the router instead of the network net-2 side as expected. This indicates that the IP address for router R2 may also exist on network net-1.
  11. Check the Ethernet address of router R2, and compare the actual address with the contents of router R1's arp cache. The arp cache revealed that the device was of a different manufacturer than expected.

To solve the problem, track down the device on network net-1 (system C) that has the illegal IP address (the same address as the network-net-1-side interface of router R2). The duplicate address resulted in a routing loop, because the routers had multiple best-case paths to the same location (which were actually at two different sites).

Correct the duplicate IP address problem on system C and make sure communications work as expected.
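A duplicate IP address shows up as one IP address mapped to different MAC addresses over time. Given (IP, MAC) pairs collected from repeated arp -a or netstat -pn runs (the collection step is left out, and MACs are assumed to be written in full two-digit notation), this Python sketch reports every IP address claimed by more than one MAC address. The addresses in the example are hypothetical documentation addresses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_duplicate_ips(observations: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each IP address seen with more than one MAC address to those MACs."""
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical arp snapshots taken a few minutes apart.
seen = [
    ("192.0.2.1", "08:00:20:76:06:0b"),
    ("192.0.2.2", "08:00:20:8e:ee:18"),
    ("192.0.2.1", "00:60:97:7f:4f:dd"),
]
print(find_duplicate_ips(seen))
# {'192.0.2.1': ['00:60:97:7f:4f:dd', '08:00:20:76:06:0b']}
```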

Duplicate MAC Address (Mostly Sun environment problem)

Reported Problem: After adding an additional Ethernet interface to your host, the system performance is very poor.

Troubleshooting (as user root):

Notice from the previous ifconfig output that all the interfaces have the same MAC address. Host C is on a different subnet, so sharing the MAC address with it is not a problem. On qfe0 and qfe1, however, the shared MAC address does cause problems: packets that leave either qfe0 or qfe1 are not guaranteed to receive a response, because both interfaces advertise themselves as the source of those packets.
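The reasoning above, that a shared MAC address matters only between interfaces on the same subnet, can be checked mechanically. In this Python sketch the interface list is hypothetical input you would transcribe from ifconfig -a, with MAC addresses written in the same notation; the function name is also hypothetical.

```python
import ipaddress
from itertools import combinations
from typing import List, Tuple


def shared_mac_conflicts(interfaces: List[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Return pairs of interfaces that share both a MAC address and a subnet.

    Each entry is (name, mac, "address/prefix"). A MAC shared across
    different subnets is not reported.
    """
    conflicts = []
    for (n1, mac1, a1), (n2, mac2, a2) in combinations(interfaces, 2):
        same_mac = mac1.lower() == mac2.lower()
        same_net = (ipaddress.ip_interface(a1).network
                    == ipaddress.ip_interface(a2).network)
        if same_mac and same_net:
            conflicts.append((n1, n2))
    return conflicts


# Hypothetical host: qfe0 and qfe1 on one subnet, hme0 on another.
ifaces = [
    ("qfe0", "8:0:20:76:6:b", "192.0.2.10/24"),
    ("qfe1", "8:0:20:76:6:b", "192.0.2.11/24"),
    ("hme0", "8:0:20:76:6:b", "198.51.100.5/24"),
]
print(shared_mac_conflicts(ifaces))  # [('qfe0', 'qfe1')]
```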



Old News ;-)

[Mar 24, 2007] freshmeat.net Project details for Tcpreplay

Tcpreplay 3.0.beta13 released

Tcpreplay is a set of Unix tools which allows the editing and replaying of captured network traffic in pcap (tcpdump) format. It can be used to test a variety of passive and inline network devices, including IPS's, UTM's, routers, firewalls, and NIDS.

Release focus: Major bugfixes

Changes:
This release fixes some serious regression bugs that prevented tcprewrite from editing most packets on Intel and other little-endian systems. Some smaller bugfixes and tweaks to improve replay performance were made.

Author:
Aaron Turner

[Dec 28 2006] BigAdmin Submitted Article Network Troubleshooting Tips for the Solaris 9 OS by Ross Moffatt


    CHAPTER 11

    Follow these guidelines while troubleshooting an IP network:

    To troubleshoot an IP network

    1. Ping successfully. If you can ping successfully, you have verified IP communications between the network interface layer and the internet layer. The Ping command uses the Address Resolution Protocol (ARP) to resolve the IP address to a hardware address for each echo request and echo reply.
    2. Establish a session with a host. If you can establish a session, you have verified TCP/IP session communications from the network interface layer through the application layer.

    Note: If you are unable to resolve a problem, you may need to use an IP analyzer (such as Microsoft Network Monitor) to view network activity at each layer.

    The first goal in troubleshooting is to make sure you can successfully ping an IP address. Ping a host with its host name only after you can successfully ping the host with its IP address.

    To troubleshoot the network interface and internet layers by using the Ping command

    1. Ping the loopback address to verify that TCP/IP was installed and loaded correctly. If this step is unsuccessful, verify that the system was restarted after TCP/IP was installed and configured.
    2. Ping your IP address to verify that it was configured correctly. If this step is unsuccessful, view the configuration by using the Network application in the Windows NT Control Panel to verify that the address was entered correctly, and verify that the IP address is valid and that it follows addressing guidelines.
    3. Ping the IP address of the default gateway to verify that the gateway is functioning and configured correctly. If this step is unsuccessful, verify that you are using the correct IP address and subnet mask.
    4. Ping the IP address of a remote host to verify the connection to the wide area network. If this step is unsuccessful:
      • Make sure that IP routing is enabled.
      • Verify that the IP address of the default gateway is correct.
      • Make sure that the remote host is functional.
      • Verify that the link between routers is operational.

    After you can successfully ping the IP address, ping the host name to verify that the name is configured correctly in the HOSTS file.
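The ping sequence above stops at the first failing step, and that step points to the likely cause. A Python sketch of that ordering: the `ping` argument is a hypothetical callable you supply (for example, a wrapper around your system's ping command, whose options differ between systems), and the target addresses are placeholders.

```python
from typing import Callable, Dict, Optional

# Steps from the procedure above, in order, with what to check when that
# step is the first to fail.
LADDER = [
    ("loopback", "that TCP/IP is installed and loaded (restart after configuring)"),
    ("own address", "that the local IP address was entered correctly and is valid"),
    ("default gateway", "the gateway IP address and subnet mask, and the gateway itself"),
    ("remote host", "IP routing, the default gateway address, the remote host, router links"),
    ("remote host name", "the host name entry in the HOSTS file"),
]


def run_ladder(targets: Dict[str, str], ping: Callable[[str], bool]) -> Optional[str]:
    """Ping each target in order; report the first failure, or None if all succeed."""
    for step, what_to_check in LADDER:
        if not ping(targets[step]):
            return f"{step} failed: check {what_to_check}"
    return None


# Placeholder targets; replace with real addresses for your network.
targets = {
    "loopback": "127.0.0.1",
    "own address": "192.0.2.10",
    "default gateway": "192.0.2.1",
    "remote host": "198.51.100.7",
    "remote host name": "remote.example.com",
}
reachable = {"127.0.0.1", "192.0.2.10"}  # pretend only these answer
print(run_ladder(targets, lambda t: t in reachable))
```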

    Verifying TCP/IP Session Communications

    The next goal in troubleshooting is to successfully establish a session. Use one of the following methods to verify communications between the network interface layer and the application layer.

    To establish a session with a Windows NT–based computer or other RFC-compliant NetBIOS-based host, make a connection with the Net use or Net view command. If this step is unsuccessful:


    To establish a session with a non-RFC-compliant NetBIOS-based host, use the Telnet or FTP utility to make a connection. If this step is unsuccessful:

    A home directory already exists for this service. Creating a new home directory will cause the existing directory to no longer be a home directory. An alias will be created for the existing home directory.

    This message is a warning only. It appears when the new home directory you are trying to add already exists. The maximum number of home directories allowed is one per virtual root.

    Invalid Server Name

    While trying to connect to a server, you typed an invalid server name. Try to connect again and make sure you type the name correctly.

    More than 1 home directory was found. An automatic alias will be generated instead.

    When getting the directory entries from the server, Internet Service Manager has determined that a duplicate exists. This duplicate may have been added by using the Registry Editor or in some other way.

    No administerable services found.

    While trying to connect to a server, you typed the name of a server that has no installed services that Internet Service Manager can administer. That is, WWW, FTP, and gopher services have not been installed on the computer you connected to.

    The alias you have given is invalid for a non-home directory.
     

    You’re trying to assign the alias ‘/’ to a non-home directory. This alias automatically means home.

    The connection attempt failed because there’s a version conflict between the server and client software.

    This message is an RPC error message. The RPC interface does not match what is expected. This should happen only if you are running a beta administration tool or server. The official error is RPC_S_UNKNOWN_IF.

    The service configuration DLL ‘filename’ failed to load correctly.

    The named service configuration DLL (for example, W3scfg.dll) failed to load. The DLL or one of its dependencies could be missing or corrupted. Generally this is a setup problem. Run the Setup program and select Remove All, then reinstall Microsoft Internet Information Server.

    Unable to connect to target machine.

    This message is an RPC error message that appears while executing an API. The computer could be offline. The system error was EPT_S_NOT_REGISTERED or RPC_S_SERVER_UNAVAILABLE.

    Unable to create directory.

    The directory name or path you typed in the New Directory Name box cannot be created. It could be an invalid path, or a file may already exist that has this name.
[Dec 1, 2006] Troubleshooting TCP-IP Communication Issues by Neil Cashell

    15 May 2000 (support.novell.com) This document addresses communication issues that generate about a third of the support calls coming into the TCP/IP group at Novell Technical Support. We recommend that anyone who is implementing TCP/IP in a NetWare 5.x environment read and understand the information presented here.

    This article is divided into two parts: understanding the concepts behind IP routing, and troubleshooting common TCP/IP problems. A follow-up article will explain some of the TCP/IP tools that are available for use in troubleshooting problems in a TCP/IP environment.

    Concepts Behind TCP/IP Routing

    The majority of connectivity issues involve problems with routing table entries. Every packet being processed by a TCP/IP host has a source and destination IP address. Upon receiving each packet, the IP protocol examines the destination address of the packet, compares it with entries in its local routing table, and then decides what action to take:

    1. If the destination IP address is the host's own address (that is, the packet is for a local application such as GroupWise, BorderManager Proxy Server, etc.), the packet is passed up to a protocol layer above IP.
    2. If the packet is destined for another known network, the packet is forwarded through one of the locally-attached network adapters. (This assumes that the TCP/IP host has multiple interfaces and has routing enabled.)
    3. If neither of the above apply, the packet is discarded.

    The TCP/IP routing table can maintain four different types of routes, listed below in the order that they are searched for a match:

    IP compares the destination IP address of the packet that it is processing with the entries in the table. If IP finds that a host entry exists and matches the destination IP address, it will forward the packet to the next hop associated with that host entry. Host entries are usually found in routing tables when ICMP (Internet Control Message Protocol) has added the entry because of the pathMTU algorithm, or from an "ICMP redirect" call. To check this, load the TCPCON utility at the server console prompt and look at the IP Routing Table option to verify if the protocol associated with that route is ICMP.

    IP has three classes of addresses: Class A, Class B, and Class C. Each class has a default subnet mask (for instance, the Class A default is 255.0.0.0), which applies until a class of addresses is broken into extra networks (that is, subnetted). Once the network is subnetted, the IP address no longer uses the default subnet mask.
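The default (classful) mask can be derived from the leading bits of the first octet. A Python sketch (function name hypothetical) that also makes it easy to spot a subnetted address, such as 129.147.11.59, which is Class B but is configured with netmask ffffff00 earlier on this page:

```python
import ipaddress


def classful_default_mask(address: str) -> str:
    """Return the default subnet mask implied by an address's class."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:   # leading bit 0: Class A
        return "255.0.0.0"
    if first_octet < 192:   # leading bits 10: Class B
        return "255.255.0.0"
    if first_octet < 224:   # leading bits 110: Class C
        return "255.255.255.0"
    raise ValueError(f"{address} is Class D or E; no default subnet mask")


print(classful_default_mask("129.147.11.59"))  # 255.255.0.0
```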

    So if IP doesn't find a host entry, but does find a subnet entry that matches the packet's destination IP address, IP will forward the packet to the next hop associated with that subnet entry. Subnet entries exist when RIP2 (Routing Information Protocol version 2), OSPF (Open Shortest Path First), or static entries have been added to the routing table through a non-default subnet mask.

    If IP doesn't find a subnet entry in the TCP/IP routing table but does find a network entry that matches the destination IP address, IP will forward the packet to the next hop associated with that network entry. (Customers running in default NetWare TCP/IP mode will have network entries.)

    Finally, if IP doesn't find a network entry but does find a default route entry, IP will forward the packet to the next hop associated with that default entry. The default route is most commonly inserted as a static route through NetWare's INETCFG utility at the server console. However, the route may also be learned via RIP or OSPF. Lacking even a default route often leads to communication problems on the network.

    If no matching entry has been found in the TCP/IP routing table at this stage, the packet is dropped and an ICMP "destination unreachable" message is sent to notify the sender that the host or network is unreachable.
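    The four-step search can be modeled in Python as below. This is a minimal sketch, assuming a table of (network, next hop) pairs and classifying each entry by its prefix length (/32 = host, /0 = default, longer than the classful mask = subnet, otherwise network); `lookup`, `route_type` and the sample next hops are hypothetical, and this is not how NetWare's TCP/IP stack is actually implemented. Most modern stacks do a single longest-prefix match, which gives the same answer for tables like this one.

```python
import ipaddress
from typing import List, Optional, Tuple

# A route is (destination network, next hop); next hops are plain labels here.
Route = Tuple[ipaddress.IPv4Network, str]

# The order in which the four route types are searched.
SEARCH_ORDER = ["host", "subnet", "network", "default"]


def classful_prefixlen(addr: ipaddress.IPv4Address) -> int:
    """Classful prefix length: 8 for class A, 16 for B, 24 otherwise (simplified)."""
    first_octet = int(addr) >> 24
    if first_octet < 128:
        return 8
    if first_octet < 192:
        return 16
    return 24


def route_type(net: ipaddress.IPv4Network) -> str:
    if net.prefixlen == 32:
        return "host"
    if net.prefixlen == 0:
        return "default"
    if net.prefixlen > classful_prefixlen(net.network_address):
        return "subnet"   # longer than the classful mask
    return "network"      # classful (or shorter) mask


def lookup(table: List[Route], destination: str) -> Optional[str]:
    """Return the next hop for `destination`, or None if the packet is dropped."""
    dest = ipaddress.IPv4Address(destination)
    for kind in SEARCH_ORDER:
        matches = [(net, hop) for net, hop in table
                   if route_type(net) == kind and dest in net]
        if matches:
            # Within one route type, prefer the most specific entry.
            return max(matches, key=lambda route: route[0].prefixlen)[1]
    return None  # no match: drop it and report ICMP "destination unreachable"


table = [
    (ipaddress.IPv4Network("10.1.2.3/32"), "host-gw"),
    (ipaddress.IPv4Network("10.1.0.0/16"), "subnet-gw"),
    (ipaddress.IPv4Network("10.0.0.0/8"), "net-gw"),
    (ipaddress.IPv4Network("0.0.0.0/0"), "default-gw"),
]
print(lookup(table, "10.1.9.9"))  # subnet-gw
```

    Removing the last entry from `table` makes `lookup(table, "8.8.8.8")` return None, which is the "no default route" situation described above.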

    When a TCP/IP communication problem occurs, the most common reason is that a route entry doesn't exist for the network or host with which you are trying to communicate. When this is the case, you can either add a route entry or try to figure out why the route is missing.

    Troubleshooting Common TCP/IP Problems

    When troubleshooting any networking problem, it is helpful to take a logical approach. Some questions to ask are:

    Troubleshooting a problem "from the bottom up" is often a good way to quickly isolate what's wrong and come up with a solution. The "bottom up" approach from an IP routing perspective is to start by verifying that the problem is not related to the physical layer (cabling, hubs, switches, and so on) or ARP (Address Resolution Protocol). Next, you ensure that the IP routing table is functioning correctly. Finally, you check to see whether the problem is at a generic TCP/UDP or application level.
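    The "bottom up" order amounts to running checks from the lowest layer upward and stopping at the first failure, since everything above a failing layer depends on it. A minimal Python sketch; the layer names follow the paragraph above, while `first_failing_layer` and the placeholder lambdas are hypothetical stand-ins for real checks (link lights, the gateway's ARP entry, a route lookup, a connection to the service).

```python
from typing import Callable, List, Optional, Tuple

# (layer name, check) pairs, ordered from the bottom of the stack upward.
Check = Tuple[str, Callable[[], bool]]


def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Run checks bottom-up and return the first layer that fails, or None."""
    for layer, check in checks:
        if not check():
            return layer  # stop here: the layers above depend on this one
    return None


# Hypothetical placeholders; replace each lambda with a real test.
checks = [
    ("physical", lambda: True),
    ("arp", lambda: True),
    ("ip routing", lambda: False),
    ("tcp/udp", lambda: True),
    ("application", lambda: True),
]
print(first_failing_layer(checks))  # ip routing
```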

    A Problem-Solving Pattern

    A pattern is just that: It is not a firm set of rules—it's a set of guidelines. If you follow a troubleshooting method consistently, it will help you to find solutions more easily. You will be able to zero in on the root cause of the issue and quickly resolve it. One nice thing about this pattern is that it is neither Linux- nor TCP/IP-specific. You can apply it to a variety of problems—I make no promises about in-law problems, though.

    To put this pattern into context, each step is described in its own section. The pattern has nine steps, as shown in Figure 1.

    Figure 1 A nine-step problem-solving pattern

    Step 1: Clearly Describe the Symptoms

    There's no good way to attack a problem until you know what the problem really is. Far too often, system and network administrators hear a rather poor (if not outright misleading) description of the problem. It's then your job to dig in and find out what's really going on.

    As you can probably guess, you'll need some interviewing skills to get a clear description of the symptoms from a user. People don't want to hide the truth from you, but they often have predetermined the problem, coloring their perception of the issues involved.

    It's a good idea to take notes as you're talking with someone, periodically summarizing the problem description as you go. This can help you spot follow-up questions to ask the user. It can also help jog a user's memory for other tidbits.

    Never hesitate to call or email the user back with further questions to clarify the situation. It is certainly better to get all the answers you need up-front, but the reality is that you might not know all the questions that you need to ask until you've gotten your hands dirty working on the problem. If you need more detail, go get it.

    Holding your interview at the customer's location also gives you a chance to say, "Show me." This enables you to see what the user is doing and perhaps to identify some more key points about the problem. Sometimes it will also reveal the problem as one of those transient things that just won't show up when you're there to see it.

    If you run into a problem that you can't reproduce, you have yet another problem on your hands: what to do about it. The best thing is often to set up a monitoring plan with the user. Get all the details that you can, and tell the user to call you back when the problem recurs. Leave the user with a list of questions to try to answer when calling you back. On your end, you should maintain a log so that you can track details about the problem.

    There is no good rule to determine when a problem is clearly stated. This is fairly subjective. If you think it's clear enough, it probably is. If you're not sure, try to describe the problem to someone else. (It really doesn't matter whether that person understands networking. In fact, you could try explaining it to a house plant; it's the process of talking through the problem while describing the symptoms that helps clarify things for you.)

    As you're talking with people about the problem, see if there are other hosts with the same symptoms. If people haven't seen this problem, ask them to try to reproduce it. If there isn't anyone else available, try to reproduce it yourself. Knowing whether this problem affects a single host, a local group of hosts, or all the hosts on a network will help you when you hit Step 2.

    Some key questions that you should know the answers to are listed here:

    What applications or protocols are affected?

    Step 2: Understand the Environment

    When you have a clear description of the symptoms, you must be able to understand the environment that the problem occurs in to effectively troubleshoot it. Gaining this understanding is really a twofold job: It requires both identifying the pieces involved in the problem and understanding how those pieces should act when they are not experiencing the problem.

    The first task typically means creating a subset of your network map, showing the portions of the network that are involved in the problem. Sometimes this new map will be a logical map, and sometimes you will want to draw it out.

    The second task, understanding how things should be behaving, is made much easier if you look at a snapshot of how your network acted before the problem occurred. These snapshots are called baselines and are covered in more detail in Chapter 7 of Networking Linux. In the absence of a baseline, you will need to create a model of the proper behavior of the network from your understanding of its layout, components, and configuration.

    Step 3: List Hypotheses

    Having made a list of the affected systems (in Step 2), we can begin to list potential causes of the problem. It's safe to brainstorm at this stage because we will be narrowing our search later. In fact, it is better to be overly creative here and end up with extra hypotheses than to miss the actual cause and chase blind leads.

    Just like the maps of the problem environment, your list of hypotheses doesn't need to be anything formal. A mental list is normally fine; something scrawled on a piece of scratch paper is even better. Sometimes, though, you'll want a formal document; big network issues affecting lots of people just cry out for formal documents (well, at least the managers involved cry a lot).

    Step 4: Prioritize Hypotheses and Narrow Focus

    This is the step where we stop making work for ourselves and start making our jobs easier. Although we've just made a list of things that could be the problem, we don't want to research every item on the list if we don't have to. Instead, we can prioritize the potential causes and chase down the most likely ones first. Eventually, we'll either solve the problem or run out of possible causes (in which case we need to go back to Step 3).

    As you're prioritizing your list, pay particular attention to recent changes. These are often the source of your problems. Changes meant to improve the environment often have unintended consequences.

    Step 5: Create a Plan of Attack

    Now that you've identified the most likely causes of the problem, it's time to disprove each of them in turn. As each potential cause is eliminated, you narrow your search further. Eventually you will reach a cause that you can't disprove, and your most recent attempt will have corrected the problem.

    One thing you don't want to do is make changes in many areas at once. Making one change at a time, working on only one component per change, ensures that you'll be able to identify the modifications that actually fixed the problem.

    You don't need a hard and fast plan for the follow-up steps to take if a test doesn't solve or identify the problem. However, you should at least think about where you're going to go next. Your prioritized list will be of great help as you make plans for the future. Don't be too surprised if your plans take a slight detour, though; crystal balls are notoriously vague.

    A final step in preparing your plan is to review it with those holding a stake in solving the problem. This probably includes management, the customer suffering the problem, and anyone working with you in troubleshooting.

    Step 6: Act on Your Plan

    With a plan in place and reviewed by those with a stake in solving the problem, you're prepared to act.

    While you're acting on the plan, take good notes and make sure that you keep copies of configuration files that you're changing. Nothing is worse than finishing off a series of tests, finding that they didn't solve the problem, and then discovering that you introduced a new problem and can't easily back out your changes. It can also be disheartening to have insufficient or misleading information to report at the conclusion of your test.
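    Keeping copies of configuration files is easy to automate. A minimal Python sketch, assuming you want a timestamped copy next to the original (for example `resolv.conf.20040301-142233.bak`); the naming scheme and `backup_config` are hypothetical, and a version-control system does the same job with history attached.

```python
import shutil
import time
from pathlib import Path


def backup_config(path: str) -> Path:
    """Copy `path` to '<name>.<timestamp>.bak' beside it and return the copy's path."""
    src = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}.{stamp}.bak")
    counter = 1
    while dest.exists():  # never overwrite a backup taken earlier in the same second
        dest = src.with_name(f"{src.name}.{stamp}.{counter}.bak")
        counter += 1
    shutil.copy2(src, dest)  # copy2 also preserves timestamps and permission bits
    return dest
```

    Call it (for example, `backup_config("/etc/resolv.conf")`) immediately before each edit, and note the returned path in your log so that backing out a change is a single copy.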

    Step 7: Test Results

    You'll never know whether your test has done anything without checking to see if the problem still exists. You'll also never know whether you've introduced new problems with your changes if you don't test. Testing gives you confidence that all is as it should be.

    I recommend that you make it a practice to keep a suite of tests that exercise the main functionality of your network. Each time you run into a problem, add a test or two to check for it as well. Given a suite like this and a system to run all the tests, you can feel confident that your network is solid at the end of the day.
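    Such a suite can start very small. The Python sketch below assumes that "working" can be reduced to "a TCP connection to host:port succeeds", which catches dead daemons, routing problems, and firewall mistakes but not application-level faults; `tcp_port_open`, `run_suite` and the host names in `suite` are hypothetical.

```python
import socket
from typing import Callable, Dict


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failed
        return False


def run_suite(tests: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every test, treating an unexpected exception as a failure."""
    results = {}
    for name, test in tests.items():
        try:
            results[name] = bool(test())
        except Exception:
            results[name] = False
    return results


# Hypothetical suite: one entry per service your users depend on.
suite = {
    "mail relay answers on 25": lambda: tcp_port_open("mail.example.internal", 25),
    "web server answers on 80": lambda: tcp_port_open("www.example.internal", 80),
}
```

    Running `run_suite(suite)` from cron and printing any failures gives a quick end-of-day check; each time you fix a new problem, add an entry for it.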

    Step 8: Apply Results of Testing to Hypotheses

    This is the pay-off step. If your testing has isolated and solved the problem, you're almost done. All that remains is to make the changes introduced in your test a permanent part of the network. If you haven't solved the problem yet, this is where you sit down with your results and your list of hypotheses to see what you've learned.

    If the most recent test solved your problem, this step is unnecessary. You've found the problem and (hopefully) corrected it. If your efforts haven't solved the problem (or if you've created a new one), you need to look at how the data from this test affects your prioritized list of possible causes. Does your prioritization need to change? Are more possibilities pointed out by this test? If the test didn't identify and solve your problem, did it eliminate this possible cause? If not, what further tests are needed to make sure that this possible cause isn't the root of your problem?

    Step 9: Iterate as Needed

    Most often, you won't need to go all the way back to Step 1 or 2. Instead, you'll be able to go back to Step 4 to reprioritize and refocus. You might find that the things you learned in your most recent test point you in a slightly different direction. It is also possible that you will find another possibility in this case, you can jump back to Step 3 and add it to your list.

    If you've completely run out of possible causes or found additional information, you might even want to go all the way back to Step 1 and restate the problem just to make sure that you've not missed the mark completely.
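    Steps 4 through 9 form a loop, which can be modeled in a few lines of Python. This is only a sketch: `Hypothesis`, `prioritize` and `troubleshoot` are hypothetical names, the numeric likelihoods are your own guesses, and each `test` callable stands in for the real work of making one change, testing it, and backing it out if it didn't help.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Hypothesis:
    description: str
    likelihood: float           # your own estimate, 0.0 to 1.0 (Step 4)
    recent_change: bool         # recent changes are often the culprit
    test: Callable[[], bool]    # one change + test; True if the problem is gone


def prioritize(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Recent changes first, then the most likely causes."""
    return sorted(hypotheses, key=lambda h: (not h.recent_change, -h.likelihood))


def troubleshoot(hypotheses: List[Hypothesis]) -> Optional[Hypothesis]:
    for h in prioritize(hypotheses):  # Step 5: work down the prioritized list
        if h.test():                  # Steps 6-7: act on one change, then test
            return h                  # Step 8: make this change permanent
    return None                       # Step 9: back to Step 3 (or Step 1)


hypotheses = [
    Hypothesis("bad patch cable", 0.2, False, lambda: False),
    Hypothesis("new firewall rule", 0.3, True, lambda: True),
    Hypothesis("DNS server down", 0.9, False, lambda: False),
]
print(troubleshoot(hypotheses).description)  # new firewall rule
```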

    This article is excerpted from Networking Linux: A Practical Guide to TCP/IP by Pat Eyler (New Riders Publishing, 2000, ISBN 0735710317). Refer to Chapter 6 of this book for more detailed information on the material covered in this article.

    [Nov 3, 2004] Keep an Eye on Your Linux Systems with Netstat: Using Netstat for Surveillance and Troubleshooting, by Carla Schroder

    Two of the fundamental aspects of Linux system security and troubleshooting are knowing what services are running, and what connections and services are available. We're all familiar with ps for viewing active services. netstat goes a couple of steps further, and displays all available connections, services, and their status. It shows one type of service that ps does not: services run from inetd or xinetd, because inetd/xinetd start them up on demand. If the service is available but not active, such as telnet, all you see in ps is either inetd or xinetd:

    $ ps ax | grep -E 'telnet|inetd'
    520 ?      Ss      0:00 /usr/sbin/inetd

    But netstat shows telnet sitting idly, waiting for a connection:

    $ netstat --inet -a | grep telnet
    tcp      0      0     *:telnet      *:*      LISTEN

    This netstat invocation shows all activity:

    $ netstat -a
    Active Internet connections (servers and established)
    Proto Recv-Q Send-Q Local Address Foreign Address State
    tcp      0      0     *:telnet      *:*      LISTEN
    tcp      0      0     *:ipp      *:*      LISTEN
    tcp      0      0     *:smtp      *:*      LISTEN
    tcp      0      0     192.168.1.5:32851      nest.anthill.echid:ircd     ESTABLISHED
    udp      0      0     *:ipp      *:*
    Active UNIX domain sockets (servers and established)
    Proto RefCnt Flags Type State I-Node Path
    unix 2 [ ACC ] STREAM LISTENING 1065 /tmp/ksocket-carla/klaunchertDCh2b.slave-socket
    unix 2 [ ACC ] STREAM LISTENING 1002 /tmp/ssh-OoMGfFm666/agent.666
    unix 2 [ ACC ] STREAM LISTENING 819 private/smtp

    Your total output will probably run to a couple hundred lines. (A fun and quick way to count lines of output is netstat -a | wc -l.) You can ignore everything under "Active UNIX domain sockets." Those are local inter-process communications, not network connections. To avoid displaying them at all, do this:

    $ netstat --inet -a

    This will display only network connections, both listening and established. Already netstat has earned its keep: both the telnet and smtp services are running. This is bad, because I don't want either a telnet or an smtp server running on this machine. So now I know I need to turn them off and reconfigure my startup files so they won't start at boot.
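    If you check many machines, a short Python filter over saved `netstat --inet -a` output can flag listeners you don't want. A sketch, assuming the column layout shown above (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the `UNWANTED` set is an example for this one machine, not a recommendation, and includes port numbers because `netstat -n` prints ports instead of service names.

```python
from typing import List, Set

# Services this particular host should NOT offer (an assumption; adjust per host).
UNWANTED: Set[str] = {"telnet", "23", "smtp", "25"}


def listening_sockets(netstat_output: str) -> List[str]:
    """Local addresses of TCP sockets in the LISTEN state."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State [...]
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            found.append(fields[3])
    return found


def unwanted_listeners(netstat_output: str, unwanted: Set[str] = UNWANTED) -> List[str]:
    """Listening sockets whose service name or port number is in `unwanted`."""
    return [addr for addr in listening_sockets(netstat_output)
            if addr.rsplit(":", 1)[-1] in unwanted]
```

    The input can come from a saved file or from `subprocess.run(["netstat", "--inet", "-a"], capture_output=True, text=True).stdout`.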

    How do you know what services you want running? That is a mondo subject for another day, and an important one. For example, if your system has been compromised, this is one place to find evidence of a Trojan horse or other malware phoning home. In this example, ipp is Internet Printing Protocol, which belongs to CUPS (Common Unix Printing System.) If you want your printer to work, this needs to be here. The connection on 192.168.1.5:32851 is my active IRC (Internet Relay Chat) connection. Refer to your /etc/services file to learn more about TCP and UDP ports, and the services assigned to them.

    What It Means

    "Proto" is short for protocol, which is either TCP or UDP. "Recv-Q" and "Send-Q" mean receiving queue and sending queue. These should always be zero; if they're not you might have a problem. Packets should not be piling up in either queue, except briefly, as this example shows:

    tcp      0      593   192.168.1.5:34321 venus.euao.com:smtp      ESTABLISHED

    That happened when I hit the "check mail" button in KMail; a brief queuing of outgoing packets is normal behavior. If the receiving queue is consistently jamming up, the local application may not be reading data fast enough, or you might be experiencing a denial-of-service attack. If the sending queue does not clear quickly, you might have an application that is sending data too fast, or the receiver cannot accept it quickly enough.
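    Because a single non-zero queue is normal, what matters is whether it stays non-zero. One way to check is to save several `netstat --inet -an` snapshots a few seconds apart and compare them. The Python sketch below assumes the column layout shown above; `queue_sizes` and `persistent_backlogs` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (local address, foreign address)


def queue_sizes(netstat_output: str) -> Dict[Key, Tuple[int, int]]:
    """Map each TCP/UDP socket pair to its (Recv-Q, Send-Q) byte counts."""
    sizes = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        if (len(fields) >= 5 and fields[0].startswith(("tcp", "udp"))
                and fields[1].isdigit() and fields[2].isdigit()):
            sizes[(fields[3], fields[4])] = (int(fields[1]), int(fields[2]))
    return sizes


def persistent_backlogs(snapshots: List[str]) -> List[Key]:
    """Socket pairs whose Recv-Q or Send-Q is non-zero in EVERY snapshot."""
    if not snapshots:
        return []
    tables = [queue_sizes(s) for s in snapshots]
    common = set(tables[0]).intersection(*tables[1:])
    return sorted(k for k in common if all(max(t[k]) > 0 for t in tables))
```

    With the KMail example above, one snapshot shows 593 bytes in Send-Q and the next shows 0, so nothing is reported; only a queue that never drains is flagged.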

    "Local address" is either your IP and port number, or IP and the name of a service. "Foreign address" is the hostname and service you are connected to. The asterisk is a placeholder for IP addresses, which of course cannot be known until a remote host connects. "State" is the current status of the connection. Any TCP state can be displayed here, but these three are the ones you want to see:

        LISTEN: waiting to receive a connection
        ESTABLISHED: a connection is active
        TIME_WAIT: a recently terminated connection; this should last only a minute or two, after which the entry disappears. The socket pair cannot be reused as long as the TIME_WAIT state persists.

    UDP is stateless, so the "State" column is always blank.

    A socket pair is both sides of a TCP/IP connection, like this example for a locally-attached printer:

    localhost:ipp      localhost:34493      ESTABLISHED

    Or a telnet connection to a remote server:

    192.168.1.5:34437      65.106.57.106.pt:telnet    ESTABLISHED

    A socket is any hostname-port combination, or IP address-port.

    Continuous Capture

    Because all these things change often, how do you capture the changes? Run netstat continuously with the -c flag and record the output:

    $ netstat --inet -a -c > netstat.txt

    Then check email, start and stop services, surf the web, log in to a telnet BBS and play Legend of the Red Dragon; then review your capture file to see what it all looks like.

    Borken DNS

    If netstat is taking too long, or not resolving a hostname at all, give it the -n flag to turn off DNS lookups:

    $ netstat --inet -an

    Checking Interfaces

    netstat can help diagnose NIC problems. Use the -i flag when you're troubleshooting a flaky connection and you suspect your NIC:

    $ netstat -i
    Kernel Interface table
    Iface   MTU Met   RX-OK RX-ERR RX-DRP RX-OVR   TX-OK TX-ERR TX-DRP TX-OVR Flg
    eth0    1500 0     28698      0      0      0   33742      0      0      0 BMRU
    lo      16436 0           14      0      0      0         14      0      0      0 LRU
    
    You should see large numbers in the RX-OK (received OK) and TX-OK (transmitted OK) columns, and very low numbers in all the others. If you are seeing a lot of RX-ERRs or TX-ERRs, suspect the NIC or the patch cable. This is what the flags mean:

    B = broadcast address is set
    L = loopback device
    M = multicast is supported (promiscuous mode is shown as P)
    R = interface is running
    U = interface is up
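    To watch many hosts, the same check can be scripted. The Python sketch below reads the column names from the header line rather than assuming fixed positions, since some net-tools versions print the table without the Met column; the 0.1% error threshold is an arbitrary assumption, and only the U and R flags are tested. Function names are hypothetical.

```python
from typing import Dict, List


def parse_netstat_i(output: str) -> Dict[str, Dict[str, str]]:
    """Parse `netstat -i` output into {interface: {column: value}}."""
    rows: Dict[str, Dict[str, str]] = {}
    header: List[str] = []
    for line in output.splitlines():
        fields = line.split()
        if fields[:1] == ["Iface"]:
            header = fields  # column names come from the header row
        elif header and len(fields) == len(header):
            row = dict(zip(header, fields))
            rows[row["Iface"]] = row
    return rows


def interface_warnings(rows: Dict[str, Dict[str, str]],
                       max_error_ratio: float = 0.001) -> List[str]:
    """Report interfaces that are not up/running or have a high error ratio."""
    warnings = []
    for iface, row in rows.items():
        flags = row.get("Flg", "")
        if "U" not in flags or "R" not in flags:
            warnings.append(f"{iface}: not up and running (flags {flags})")
        for direction in ("RX", "TX"):
            ok = int(row[f"{direction}-OK"])
            err = int(row[f"{direction}-ERR"])
            total = ok + err
            if total and err / total > max_error_ratio:
                warnings.append(f"{iface}: {err} {direction} errors out of {total} packets")
    return warnings
```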

    Resources

    Linux Network Administrator's Guide, by Olaf Kirch & Terry Dawson

    HTTP Performance Overview -- contains several links to NET tools

    CAIDA Measurement Tool Taxonomy

    tcptrace


    Recommended Links



    Tutorials

    IBM Redbooks TCP-IP Tutorial and Technical Overview

    lessons_learned ver1.4

    Implementing, Managing, and Troubleshooting Network Protocols and Services Troubleshooting TCP-IP Connections

    Recommended Papers

    InformIt/What is a Baseline? By Pat Eyler. Article is provided courtesy of New Riders.

    Sometimes when you talk to a seasoned system or network administrator, he'll tell you that he knows that something is wrong when things don't feel right. This isn't an admission of paranormal powers; it's just a shorthand method for explaining that these experts know how their system or network is supposed to behave and that it isn't acting like that now. These administrators have created a baseline for their environment. Not all of them have done it formally, but the ones who have will have gained significant added benefits.

    A Baseline Defined

    Several things make up a baseline, but at its heart, a baseline is merely a snapshot of your network the way it normally acts. The least effective form of a baseline is the "sixth sense" that you develop when you've been around something for a while. It seems to work because you notice aberrations subconsciously: you're used to the way things ought to be. Better baselines are more formal and may include the following components:

    Network traces
    Utilization data
    Work/problem logs
    Network maps
    Equipment records

    Network Traces

    In Chapter 10, "Network Monitoring Tools," we discussed the ethereal network analyzer (since renamed Wireshark). This tool's capability to save capture files (or traces) enables you to maintain a history of your network. If the only traces you have saved were captured while troubleshooting, they show your network only when it is broken, not how it normally behaves.

    You also need to be aware that a lot of things will influence the contents of the traces you collect. Weekend vs. weekday; Monday or Friday vs. the rest of the week; and time of day are all examples of the kinds of factors that will affect your data. Running ethereal (or some other analyzer) at least three times a day, every day, and saving the capture file will give you a much clearer idea of how things normally work.

    Utilization Data

    Several tools can give you a quick look at your network's behavior: netstat, traceroute, ping, and even the contents of your system logs are all good sources of information.

    The netstat tool can show you several important bits of information. Running it with the -M, -i, and -a switches is especially helpful. I typically add the -n switch to netstat as well; this switch turns off name resolution, which is a real boon if DNS is broken or IP addresses don't resolve back to names properly. The -i switch gives you interface-specific information:

    [pate@cherry sgml]$ netstat -i
    Kernel Interface table
    Iface   MTU Met  RX-OK RX-ERR RX-DRP RX-OVR  TX-OK TX-ERR TX-DRP TX-OVR Flg
    eth0   1500   0      0      0      0      0     39      0      0      0 BRU
    lo     3924   0     36      0      0      0     36      0      0      0 LRU
    [pate@cherry sgml]$

    The -M switch gives information pertaining to masqueraded connections:

    [pate@router pate]$ netstat -Mn
    IP masquerading entries
    prot expire   source       destination   ports
    tcp  59:59.96 192.168.1.10 64.28.67.48   1028 -> 80 (61002)
    tcp  58:43.75 192.168.1.10 206.66.240.72 622 -> 22 (61001)
    udp  16:37.72 192.168.1.10 209.244.0.3   1025 -> 53 (61000)
    [pate@router pate]$

    The -a switch gives connection-oriented output (this output has been abbreviated):

    [pate@cherry pate]$ netstat -an
    Active Internet connections (servers and established)
    Proto Recv-Q Send-Q Local Address  Foreign Address State
    tcp        0      0 0.0.0.0:6000   0.0.0.0:*       LISTEN
    tcp        0      0 0.0.0.0:3306   0.0.0.0:*       LISTEN
    tcp        0      0 0.0.0.0:80     0.0.0.0:*       LISTEN
    udp        0      0 0.0.0.0:111    0.0.0.0:*
    raw        0      0 0.0.0.0:1      0.0.0.0:*       7
    raw        0      0 0.0.0.0:6      0.0.0.0:*       7
    Active UNIX domain sockets (servers and established)
    Proto RefCnt Flags Type   State     I-Node Path
    unix  1      [ ]   STREAM CONNECTED 1332   /tmp/.X11-unix/X0
    unix  1      [ ]   STREAM CONNECTED 1330   /tmp/.X11-unix/X0
    unix  0      [ ]   DGRAM            440
    [pate@cherry pate]$

    The traceroute tool is especially important for servers that handle connections from disparate parts of the Internet. Setting up several traceroutes to different remote hosts can give you an indication of remote users' connection speeds to your server.

    The ping tool can help you watch the performance of a local or remote network in much the same way that traceroute does. It does not give as much detail, but it requires less overhead.
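    To turn ping into baseline data, record its summary statistics over time. A Python sketch, assuming the summary format of Linux (iputils) and BSD-style ping shown in the comments; Solaris `ping -s` prints a slightly different round-trip line, which this regular expression does not match. `parse_ping_summary` is a hypothetical name.

```python
import re
from typing import Dict, Optional

# Summary lines as printed by Linux iputils ping (BSD ping is similar):
#   4 packets transmitted, 4 received, 0% packet loss, time 3004ms
#   rtt min/avg/max/mdev = 0.312/0.401/0.523/0.078 ms
LOSS_RE = re.compile(r"([\d.]+)% packet loss")
RTT_RE = re.compile(r"(?:rtt|round-trip) min/avg/max(?:/\w+)? = ([\d.]+)/([\d.]+)/([\d.]+)")


def parse_ping_summary(output: str) -> Optional[Dict[str, float]]:
    """Extract packet loss and min/avg/max round-trip times (ms) from ping output."""
    loss = LOSS_RE.search(output)
    if not loss:
        return None
    result = {"loss_pct": float(loss.group(1))}
    rtt = RTT_RE.search(output)
    if rtt:  # absent when every packet was lost
        result.update(min_ms=float(rtt.group(1)),
                      avg_ms=float(rtt.group(2)),
                      max_ms=float(rtt.group(3)))
    return result
```

    Appending one result per run, with a timestamp, to a file gives you the weekday and time-of-day history that a baseline needs.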

    When users connect to services on your hosts, they leave a trail through your log files. If you use a central logging host and a log reader to grab important entries, you can build a history of how often services are used and when they are most heavily utilized.
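    A crude version of that history can be built from the logs themselves. The Python sketch below assumes classic BSD-style syslog lines such as `Nov  3 10:15:32 cherry sshd[1234]: ...` (timestamp, host, program, optional PID) and counts entries per program and hour of day; `usage_by_hour` is a hypothetical name, and the log file location varies by system.

```python
import re
from collections import Counter
from typing import Iterable

# "Mon DD HH:MM:SS host program[pid]: message"  (the [pid] part is optional)
SYSLOG_RE = re.compile(r"^\w{3}\s+\d{1,2} (\d{2}):\d{2}:\d{2} \S+ ([^\s\[:]+)(?:\[\d+\])?:")


def usage_by_hour(lines: Iterable[str]) -> Counter:
    """Count log entries per (program, hour of day); unparseable lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        m = SYSLOG_RE.match(line)
        if m:
            counts[(m.group(2), int(m.group(1)))] += 1
    return counts
```

    For example, passing an open log file to `usage_by_hour` and printing `.most_common(10)` lists the busiest program/hour pairs.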

    Work/Problem Logs

    You will likely find yourself touching a lot of the equipment on your network, so it is important that you keep good records of what you do. Even seemingly blind trails in troubleshooting may lead you to discover information about your network. In addition, you will find that your documentation will be an invaluable aid the next time you need to troubleshoot a similar problem.

    Some people like to carry around a paper notebook to keep their records in; others prefer to keep things online. Both camps have good points, many related to information access. If you keep everything in a notebook and don't have it handy, it does you no good. Similarly, if everything is online and the network is down, you're in bad shape.

    My preference is to keep things online, but in a cvs repository. Then you can keep it on a central server or two while also keeping a copy on your laptop/PC/palmtop. If you like, you can even grab printouts. A nice benefit to this is that several people can make updates to documentation and then commit their changes back to the cvs repository when they've finished.

    I won't get into the Web vs. flatfile vs. database vs. XML vs. whatever conflict. They all have benefits. Choose the right option for your organization, and stick to it. The important bit is that you have the data, right?

    Network Maps

    A roundly ignored set of baseline information is the network map. If you have more than two systems in your network and don't have a map, set down this book for 20 minutes and sketch something out. It doesn't have to be pretty, just reasonably accurate. Are you back? Good. Now that you have a map showing what is where, we can get back to work.

    Most people want to deal with two kinds of maps. The first is a topological/physical map, which shows what equipment is where and how it is connected. The second is a logical map. This shows what services are provided and what user communities are supported by which servers. If you can combine these two maps, so much the better; color coding, numeric coding, and outlined boxes are all mechanisms that can help with this. A sample map is shown in Figure 1.

    Figure 1 A sample network map

    As with the information discussed above, I recommend that you keep your maps online and in a couple of places. (cvs can be a good solution here as well.) Nicely done maps also look good on your wall, not to mention that this is a convenient place to find them when a problem breaks out and you need to start troubleshooting.

    Equipment Records

    You should also have accurate records of the hardware and software in your network. At a minimum, you should have a hardware listing of each box on the network, a list of system and application levels (showing currently installed versions and patches), and configurations of the same. If you keep this in cvs, you'll also have a nice mechanism for looking at your history.

    If you decide to keep these records, it is vital that they be kept up-to-date. Every time you make a change, you should edit the appropriate file and commit it to cvs. If you fall behind, you'll miss something, and then you'll really be stuck.

    This article is excerpted from Networking Linux: A Practical Guide to TCP/IP by Pat Eyler (New Riders Publishing, 2000, ISBN 0735710317).



    Copyright © 1996-2009 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Site uses AdSense so you need to be aware of Google privacy policy. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

    Disclaimer:

    Last modified: August 09, 2009