Essential Tools for Good Network Management
Author: Eduardo Barasal Morales – Coordinator of the autonomous systems training area at CEPTRO.br/NIC.br
Keeping a network operational and stable is essential for the services that depend on it. Network administrators therefore need to monitor their network continuously and pay close attention to its performance, mainly because changes in communications occur naturally on the Internet. One minute everything is stable and packets are following the correct path; the next, a simple change in route propagation occurs and packets start following other paths. Sometimes these changes have no impact, but in other cases they can be very harmful to the institutions that manage these networks.
Knowing which tools are available and how to use them is a great help in the daily work of network administrators and in the proper operation of their networks. There are many tools on the market, many of which are free and open source. They make it possible to understand how each network fits into the Internet (its interactions with other Autonomous Systems), check which networks are connected to a given Internet Exchange Point (IXP), examine how BGP announcements are reaching the Internet, detect potential IP address hijacks as they happen, analyze delays in network communication, and gain additional insights.
This article presents a small set of simple, free, and often little-known tools that help network administrators identify and solve some of the most common communication problems in their networks.
To do so, we will describe some typical problem situations and then show how these tools can help.
Problem Caused by Third Parties
Network administrators often encounter problems on their networks that are caused by third parties. Sometimes these are intentional, such as a denial-of-service attack which floods the targeted links and overloads the systems, or a prefix hijack that causes the loss of communication (Figure 1).
Other times, they are caused by configuration errors, such as a route leak that routes packets through unintended paths and can slow down communication (Figure 2).
Identifying these problems is not easy, especially because they often originate outside the administrator's own network and typically last only a short time. This is why external tools that store information over time are so useful: they help network administrators understand what happened and provide the data needed to contact the third parties involved and solve the problem.
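As a rough illustration of the kind of check that such historical data makes possible, the sketch below compares an announcement observed on the Internet with the one a network expects to make. The function name classify_announcement is hypothetical, and the prefix 203.0.113.0/24 and AS numbers 64500 and 64511 are documentation values chosen only for this example.

```python
import ipaddress


def classify_announcement(expected_prefix, expected_origin, seen_prefix, seen_origin):
    """Compare one observed BGP announcement with the expected one.

    Illustrative only: real tools also look at the full AS path,
    timing, and many vantage points.
    """
    expected = ipaddress.ip_network(expected_prefix)
    seen = ipaddress.ip_network(seen_prefix)

    # Only announcements equal to, or more specific than, our prefix matter here.
    if expected.version != seen.version or not seen.subnet_of(expected):
        return "not covered by the expected prefix"

    # Someone else is originating our address space.
    if seen_origin != expected_origin:
        return "possible hijack (unexpected origin AS)"

    # Same origin, but a longer prefix than the one we meant to announce.
    if seen != expected:
        return "unexpected more-specific announcement"

    return "expected"
```

For example, classify_announcement("203.0.113.0/24", 64500, "203.0.113.0/25", 64511) reports a possible hijack. A route leak usually keeps the correct origin AS, so detecting it requires examining the whole AS path, which this sketch does not attempt.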
Hurricane Electric BGP Toolkit – https://bgp.he.net/
Hurricane Electric BGP Toolkit is a set of web tools that network administrators can use to extract relevant information about their own autonomous system and other autonomous systems across the Internet. It allows querying Whois (a service for looking up the registered holders of Internet resources) and the Internet Routing Registry (IRR, a routing information database), and viewing graphs of the connectivity between autonomous systems, charts of prefix announcements and peer counts, the status of Resource Public Key Infrastructure (RPKI, a framework that helps prevent prefix hijacks), IXPs, and much more.
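To make the RPKI check mentioned above more concrete, here is a minimal sketch of route origin validation, following the valid / invalid / not-found outcomes described in RFC 6811 as we understand them. It is not how the toolkit itself is implemented. The function name validate_origin and the ROA entries are hypothetical and use documentation prefixes and AS numbers.

```python
import ipaddress


def validate_origin(prefix, origin_asn, roas):
    """Classify an announcement against a list of ROA entries.

    roas: iterable of (roa_prefix, max_length, asn) tuples.
    Returns "valid", "invalid" or "not-found".
    """
    route = ipaddress.ip_network(prefix)
    covered = False

    for roa_prefix, max_length, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # A ROA is relevant only if it covers the announced prefix.
        if roa_net.version != route.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # A match needs the same origin AS and a prefix no longer than maxLength.
        if asn == origin_asn and route.prefixlen <= max_length:
            return "valid"

    # Covered by some ROA but matched by none: invalid. Not covered at all: not-found.
    return "invalid" if covered else "not-found"


# Hypothetical ROA data for illustration.
EXAMPLE_ROAS = [
    ("203.0.113.0/24", 24, 64500),
    ("2001:db8::/32", 48, 64500),
]
```

With this data, an announcement of 203.0.113.0/24 from AS64511 is classified as invalid, which is the signal that lets other networks reject a hijacked route.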
The toolkit is intuitive, easy to use, and does not require registration. A network administrator only needs to go to the website and search for an IP address, a prefix, an ASN or a domain name. The images below show some of the information that can be explored with this tool.
The first image (Figure 3) shows the information related to an ASN (AS22548): its originated and announced prefixes (IPv4 and IPv6), whether everything is OK with its use of RPKI (IPv4 and IPv6), observed BGP peers (IPv4 and IPv6), the number of IXPs to which it is connected, and other data. This information is extremely important for network administrators because, in addition to letting them check the data for other Autonomous Systems, it gives them a macro view of how others see their own announcements.
This second image (Figure 4), a continuation of the screen shown in Figure 3, shows peer count charts as well as originated and announced prefixes (IPv4 and IPv6) over time, among other data.
The third image (Figure 5) shows the connectivity of IPv4 routes propagated between nearby Autonomous Systems. The fourth image (Figure 6) shows the same graph, in this case for IPv6 routes. Together, these two images reveal an interesting situation: IPv4 and IPv6 routes do not always propagate along the same paths. One reason is that not all transit providers (upstreams) have deployed IPv6 in their networks as they have IPv4. While Google's measurements indicate that more than 40% of its users already reach it over IPv6, many Autonomous Systems still do not use IPv6, which makes the IPv6 paths differ from the IPv4 paths.
The fifth image (Figure 7) shows the Whois information. It presents the prefixes assigned to the AS, the name of the company and the person responsible for the AS, its routing policies (if any), and the point of contact.
This last image (Figure 8) contains information related to IRR entries. Here we can see the routing policies specified by the AS.
While the tool provides additional elements, we have only highlighted a few to exemplify the importance of the application. However, we suggest that network administrators take a close look at the entire set of data, as each piece of information can provide them with knowledge that can help them solve various problems.
BGPlay – https://bgplay.massimocandela.com/
BGPlay visualizes how routing between Autonomous Systems changes over time. To use it, all that a network administrator needs to do is go to the application and select the resource (for example, a prefix or an ASN) and the time interval they wish to study. The tool then generates an animation that intuitively shows the evolution of routing between Autonomous Systems over that interval. A screenshot of BGPlay has been included below (Figure 9) to help readers better understand how the application works.
Some famous prefix hijacks have been recorded with this tool and published on YouTube for later reference, such as the 2008 hijack of YouTube's own prefix by Pakistan Telecom, which is available at https://youtu.be/IzLPKuAOe50.
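The idea of replaying routing history can be pictured with a small sketch: given a list of timestamped BGP updates, it rebuilds the AS path each observation point was using at a chosen instant. This is only an illustration of the concept, not BGPlay's actual data model. The update format, the peer names, and the function paths_at are all hypothetical.

```python
def paths_at(updates, instant):
    """Return {peer: as_path} as seen at `instant`.

    updates: iterable of (timestamp, peer, kind, as_path), where kind is
    "A" (announcement) or "W" (withdrawal) and as_path is a tuple of AS
    numbers (None for withdrawals).
    """
    state = {}
    for timestamp, peer, kind, as_path in sorted(updates, key=lambda u: u[0]):
        if timestamp > instant:
            break
        if kind == "A":
            state[peer] = as_path      # newest announcement replaces the old path
        else:
            state.pop(peer, None)      # withdrawal removes the route for this peer
    return state


# Hypothetical history for one prefix; the origin AS is the last hop of each path.
EXAMPLE_UPDATES = [
    (0, "peer-a", "A", (64496, 64500)),
    (0, "peer-b", "A", (64497, 64498, 64500)),
    (60, "peer-b", "A", (64497, 64511)),          # origin changes: worth a closer look
    (120, "peer-b", "A", (64497, 64498, 64500)),  # original path is restored
    (180, "peer-a", "W", None),
]
```

Calling paths_at(EXAMPLE_UPDATES, 90) shows peer-b reaching the prefix through AS64511 instead of the usual origin AS64500, the kind of short-lived change that is easy to miss without a historical view.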
Problems in Third-Party Networks
It is common for network administrators to receive requests from customers complaining that they are unable to access a certain website or web service on another network. In this case, the network administrator must determine whether the problem is on the customer's side, in the administrator's own network, or in the service the customer wishes to access, as shown in Figure 10.
One way to determine the cause is to use “Downdetector” and “Down for Everyone or Just Me.”
Downdetector – https://downdetector.com/
Downdetector collects status reports from a series of sources, including Twitter and reports submitted on its websites and mobile apps, then validates and analyzes these reports in real time to detect service disruptions. To ensure the reliability of the data, Downdetector only reports an outage when multiple incidents are identified over the same period of time.
Thus, a network administrator simply needs to check whether the service their customer is trying to access is experiencing an outage that is also affecting other users. If so, the issue is with the service that the customer is trying to access, not with the administrator's network.
The following images (Figures 11, 12 and 13) show how the tool works, using a search for the Pokémon Go service as an example:
Figure 11 shows a graph of the number of incidents reported in a 24-hour period. It is important to note that Downdetector calculates a baseline (dashed line) and only reports an outage when the number of reports is significantly higher than the baseline for the time of day.
Figure 12 shows the problems most commonly identified by Downdetector related to the service in question (in this case, Pokémon Go) at the time. This information may even be shared with users to help them understand what is happening with the service.
Figure 13 shows that the tool itself allows reporting problems with a particular service (in this case Pokémon Go) to contribute to the analyses.
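The baseline comparison described for Figure 11 can be pictured with a short sketch. Downdetector's actual algorithm and thresholds are not described in this article, so the factor of 3 and the minimum of 10 reports below are assumptions chosen for illustration, as are the function names.

```python
def is_outage(reports, baseline, factor=3.0, min_reports=10):
    """Flag an outage when reports clearly exceed the usual level.

    factor and min_reports are illustrative assumptions: the count must be
    well above the baseline for that time of day and large enough not to
    be noise.
    """
    return reports >= min_reports and reports > factor * baseline


def flag_outages(hourly_reports, hourly_baseline, factor=3.0, min_reports=10):
    """Return the hours (list indexes) whose report count stands out."""
    return [
        hour
        for hour, (reports, baseline) in enumerate(zip(hourly_reports, hourly_baseline))
        if is_outage(reports, baseline, factor, min_reports)
    ]
```

With a baseline of 5 reports per hour, a count of 12 is not flagged, while a spike to 120 is. Requiring a minimum number of reports avoids declaring an outage for a rarely reported service after a handful of complaints.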
Down for Everyone or Just Me – https://downforeveryoneorjustme.com/
This tool checks whether a website is inaccessible for everyone or just for you. By entering a domain name, the network administrator triggers a series of tests from multiple servers spread across different networks. If these servers also have difficulty accessing the website, the tool concludes that the problem lies with the server that hosts the website and not with the user, as seen in Figure 14 below:
Conversely, if the servers can access the website, the conclusion is that the problem is on the user's side, as shown in Figure 15 below.
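The reasoning behind these two outcomes can be sketched as follows. This is not the site's implementation: the vantage point names and the functions can_reach and verdict are hypothetical, and the results from remote servers are passed in as plain data because this sketch has no way to run tests from other networks.

```python
import urllib.error
import urllib.request


def can_reach(url, timeout=5.0):
    """Return True if an HTTP request from this machine gets a usable answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as exc:
        # The server answered; treat only 5xx responses as "down".
        return exc.code < 500
    except OSError:
        # DNS failure, refused connection, timeout, and similar errors.
        return False


def verdict(local_ok, remote_results):
    """Combine the local test with results from other networks.

    remote_results: {vantage_point_name: True if it reached the site}.
    """
    if local_ok:
        return "reachable from here"
    reachable = sum(1 for ok in remote_results.values() if ok)
    if reachable == 0:
        return "down for everyone"
    if reachable == len(remote_results):
        return "just you"
    return "partially reachable"
```

For example, verdict(False, {"vp-1": True, "vp-2": True}) returns "just you", which points the investigation toward the customer or the administrator's own network rather than the remote service.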
It is also worth noting that this tool works for both IPv6 and IPv4, prioritizing IPv6 if the queried domain has both AAAA and A records. In the future, the developers intend to separate the tests and allow network administrators to select the protocol they wish to check (this information is not available on the website but was provided to us via email by their support staff).
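The protocol preference described above can be illustrated with a few lines using the standard resolver. This is only a sketch of the selection rule, not the tool's code. The function names are hypothetical, and the result of address_families depends on what the local resolver returns for the queried name.

```python
import socket


def address_families(hostname):
    """Return the address families the local resolver finds for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, 443, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return set()
    # The first field of each result is the address family (AF_INET or AF_INET6).
    return {info[0] for info in infos}


def choose_protocol(families):
    """Prefer IPv6 when both families are available, as described above."""
    if socket.AF_INET6 in families:
        return "IPv6"
    if socket.AF_INET in families:
        return "IPv4"
    return None
```

A call such as choose_protocol(address_families("example.org")) would indicate which protocol a test following this rule is likely to exercise, which matters when a problem affects only one of the two protocols.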
This article presented four free and easy-to-use tools. However, many other tools that are very important for network administrators are also available, such as PeeringDB, Looking Glass, BGPmon, RIPEstat, and BGP Alerter.
For more information on this subject, I recommend two presentations available on YouTube. The first is a complete tutorial prepared for NIC.br Training Week available at https://youtu.be/W-02c1g9Ltk (in Portuguese); the second is a short talk given during LACNIC 37 which is available at https://youtu.be/ud3v-a_kRaY (in Portuguese), https://youtu.be/OIZaWCCbmuM (in Spanish) and https://youtu.be/xa9isOWD5ks (in English).