Title: IPv6 Fix
Author(s): CHO Kenjiro, JINMEI Tatuya, KUNITAKE Koichi, SHIMOJO Toshio, TAKEUCHI Sohgo, YAMAMOTO Kazuhiko
Created: 2005/02/07
Modified: 2005/03/11

Abstract

Although most of the IPv6 specifications, implementations, and operations work well, a few flaws can leave users with a bad impression. For instance, if a user starts using IPv6 and finds it much less comfortable than IPv4 because of these flaws, the user will go back to IPv4. This actually happens. The WIDE project therefore started an activity to fix the specification flaws, implementation flaws, and operational flaws of IPv6. The working group that handles this activity is called IPv6 Fix (v6fix). Its aims are as follows:

- If the specification has flaws, we will fix them.
- If implementations have flaws, we will fix some of them where possible and report the others to the vendors.
- If operations have flaws, we will report them to the operators. We will also establish a system for tracking the status of each flaw (fixed or not yet fixed).
- We will write documents describing the above in both Japanese and English, and make them public.

Table of Contents

1. Background and overview                        YAMAMOTO Kazuhiko
2. Harmful Effects of the "On-link Assumption"    KUNITAKE Koichi
3. DNS servers and resolvers                      JINMEI Tatuya, TAKEUCHI Sohgo
4. TCP connection establishment                   CHO Kenjiro
5. Quality of the IPv6 Internet                   CHO Kenjiro
6. Firewalls                                      SHIMOJO Toshio

Section 1 Motivation and Overview

1.1 Background

IPv6 Fix is motivated by several incidents that left users with a bad impression. The following are examples of such incidents.

1) A user staying in a hotel tried to connect a Windows XP machine, on which IPv6 had been enabled with "ipv6 install", to the network, but failed. He asked a member of the hotel staff, who answered, "Please type 'ipv6 uninstall'". He did so and was then able to connect. This happened because the hotel's system did not implement IPv6 functionality properly.
2) When an ISP installed BIND 9 for one of its services, users complained that accessing the network with their web browsers had become uncomfortable. This is because BIND 9 tries to use IPv6 transport even when it is not usable. [Note: this is a well-known issue, and will be fixed in BIND 9.2.5 and 9.3.1.]

1.2 Overview

When an application starts communication, it takes the following steps:

1) Looking up DNS with a given name
   Destination addresses are obtained.
2) Trying to establish a connection
   A pair of destination address and source address is found.
3) Exchanging data
   Parameters, including the path MTU, are negotiated.

This procedure should progress smoothly. If no usable IPv6 pair of destination and source addresses is found, the fallback to IPv4 should be carried out quickly. We explain the flaws found in each step below.

1.2.1 Looking up DNS with a given name

When an application can use both IPv4 and IPv6, it should resolve both A RRs and AAAA RRs for the given name. A mis-implemented resolver resolves AAAA RRs even if IPv6 is not available. (Since IPv6 cannot be used, resolving AAAA RRs is wasteful. It also gives applications more chances to make mistakes.)

Causes of problems can also be found on the DNS server side. For instance, some DNS servers reply with a wrong answer when they receive a query for a AAAA RR. Also, some DNS servers try to use IPv6 transport even when it is not available. The BIND 9 case above is an example of this.

1.2.2 Trying to establish a connection

Next, a usable pair of destination and source addresses must be found. A list of destination addresses has already been obtained from DNS at this point. For each destination address in the list, a proper source address is selected and a connection is attempted. If the attempt succeeds, the search is finished. Otherwise, the same procedure is repeated for the next destination.

A flaw in the IPv6 specification has been pointed out in this procedure. It is called the "on-link assumption" and is defined in Section 5.2 of RFC 2461.
The specification states the following: if a default route does not exist and the destination address is an IPv6 global address, the peer should be assumed to be on the same link. In practice, the peer is not on the same link in many cases, and connection establishment stalls until TCP times out (see Section 2).

Another problem is found in operation. Suppose a server provides both a mail service and a web service, and the mail service is ready for IPv6 while the web service is not. In this situation, an operator is likely to configure DNS with CNAMEs as follows:

    server.example.com.  IN A      192.0.2.1
                         IN AAAA   2001:DB8::1
    www.example.com.     IN CNAME  server.example.com.
    mail.example.com.    IN CNAME  server.example.com.

The server is reachable over IPv6. But since the web service is not provided over IPv6, a connection attempt to it has to wait for a TCP RST to come back. Because this interval is short, we need not take this problem too seriously. However, a more serious problem also occurs: a connection cannot be established from inside an IPv6-to-IPv4 translator. In this case the following configuration is appropriate:

    www.example.com.     IN A      192.0.2.1
    mail.example.com.    IN A      192.0.2.1
                         IN AAAA   2001:DB8::1

1.2.3 Exchanging data

After a connection is established, it is important for smooth communication that a proper route is used and that ICMPv6 messages can be received. If an IPv6-in-IPv4 tunnel is deployed improperly, a slow tunnel is likely to be used instead of a better route (see Section 5). If a firewall is implemented or configured to drop ICMPv6, the path MTU cannot be negotiated (see Section 6).

2. Harmful Effects of the "On-link Assumption"

Although the IETF has discussed the protocol specifications carefully so that we can migrate smoothly from IPv4 to IPv6, undesirable behavior has sometimes been found in the specifications through actual deployment.
A typical example is the "on-link assumption".

2.1 On-link assumption overview

According to Section 5.2 of RFC 2461, if there is no default route, IPv6 nodes assume that all destinations are on-link. This feature is useful when two nodes manually configured with addresses of different prefixes try to connect to each other using those addresses.

2.2 The problem of on-link assumption

As mentioned in Section 2.1, the on-link assumption was thought to be useful; however, it turned out to be harmful.

For example, when dual-stack clients access a server that has both A and AAAA RRs, no problem occurs if they have IPv6 connectivity. In many cases, however, they have IPv4 connectivity only. According to the specification, IPv6 nodes should still assume that all destinations are on-link if there is no default route, and thus the clients try, unsuccessfully, to make an IPv6 connection. This results in an undesirable delay while the client tries to connect to an off-link host and then falls back from IPv6 to IPv4. The delay includes at least the neighbor discovery timeout, which is typically 3 seconds, but in practice it often takes much longer (see Chapter 4).

2.3 Implementation

In this section, we summarize how different implementations treat the on-link assumption and discuss the effect.

2.3.1 Windows XP SP1 and SP2

The on-link assumption is implemented in Windows XP. However, when IPv6 connectivity is not provided, this implementation tries to connect in the following order:

1) By using IPv4 connectivity
2) By using IPv6 with the on-link assumption if the first attempt fails

Therefore, it does not cause the delay described above.

2.3.2 Linux

Linux kernels 2.4.x and 2.6.1 through 2.6.8 implement the on-link assumption and have the problem described in Section 2.2. There is a workaround, however: the problem can be avoided by deleting the "::/0" route for every interface when there is no IPv6 connectivity.
Linux kernel 2.6.9 removed the on-link assumption based on the latest clarification in [ID-NDBIS], and future versions will not implement it either.

2.3.3 BSD variants

BSD variants generally implement the on-link assumption, but it is disabled by default precisely because of the problem described above. Thus, the problem does not occur in typical cases. KAME snapshots and some recent versions of BSD variants have even removed this assumption altogether, based on the latest clarification in [ID-NDBIS].

2.3.4 Mac OS X

The problem does not occur.

2.3.5 Solaris 9/10

Solaris 9 and 10 implement the on-link assumption, and the problem can occur on them. However, the getaddrinfo() library (at least on Solaris 10) apparently prefers IPv4 addresses over IPv6 ones when the system does not have a global IPv6 address (it prefers IPv6 when the system has one). Thus, the problem should occur only in atypical situations (e.g., when the network administrator advertises IPv6 global prefixes without providing IPv6 connectivity).

2.4 Resolution

This is in essence a flaw in the specification. The problem has already been identified in the IETF, which is now revising the specification. When RFC 2461 is updated, the on-link assumption will be deprecated [ID-NDBIS].

The workaround at the DNS resolver described in Section 3.4.1 can also help mitigate the problem in the typical case, i.e., when a host does not have a global IPv6 address but tries to connect to a global IPv6 destination.

3. DNS servers and resolvers

3.1 The problem

Many network applications now support IPv6, and most of them send DNS queries for A and AAAA RRs through the getaddrinfo() library function. The applications then try to connect to the returned addresses one by one, and communicate with the remote node once a connection is established with one of the addresses. The typical implementation of this procedure tries the IPv6 addresses first when the returned addresses contain both IPv4 and IPv6 addresses.
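
The connect-one-by-one procedure described in this subsection can be sketched as follows. This is only a minimal illustration in Python, not the code of any particular application: the function name connect_in_order and its injectable "connect" parameter are assumptions introduced here so that the fallback logic can be exercised without a network.

```python
import socket


def connect_in_order(addrinfos, connect=None, timeout=5.0):
    """Try each getaddrinfo()-style entry in turn, IPv6 entries first.

    Returns (result, errors). `result` is whatever `connect` returned for the
    first entry that succeeded (a connected socket by default), or None if
    every attempt failed. `errors` lists (sockaddr, exception) per failure.
    """

    def default_connect(family, socktype, proto, sockaddr):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
        except OSError:
            sock.close()
            raise
        return sock

    connect = connect or default_connect
    errors = []
    # Stable sort: IPv6 entries move to the front, order within a family is kept.
    ordered = sorted(addrinfos, key=lambda ai: ai[0] != socket.AF_INET6)
    for family, socktype, proto, _canonname, sockaddr in ordered:
        try:
            return connect(family, socktype, proto, sockaddr), errors
        except OSError as exc:
            # Fall back to the next candidate address.
            errors.append((sockaddr, exc))
    return None, errors
```

With real name resolution, the function could be called as connect_in_order(socket.getaddrinfo("www.example.com", 80, type=socket.SOCK_STREAM)). The time spent in each failed attempt before the loop moves on is exactly the delay discussed in this document.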
This connect-in-sequence behavior assumes that the DNS queries are handled properly and that a connection attempt to an unreachable address fails immediately, so that the application quickly falls back to a successful connection setup with a reachable address. In actual operation, however, these assumptions are not necessarily met, because of non-compliant DNS servers (described below), a harmful behavior in the IPv6 protocol specification (see Chapter 2), or characteristics of the TCP specification and implementation (see Chapter 4). In such cases, the application (and its users) may need to wait for a long period before establishing a connection even with a reachable address. The application may also fail to make a connection that should actually be established. In the case of web browsing, this means a long delay in loading a web page, or failure to browse a page that would be accessible in an IPv4-only network.

Details of DNS server misbehavior on AAAA queries are described in [ID-AAAA], and are briefly summarized below in Sections 3.2 and 3.3. The essential solution to this problem is to fix these servers. At the same time, it is possible to avoid or mitigate the problems with workarounds at the DNS clients (resolvers), including the getaddrinfo() library. We discuss this possibility in Section 3.4.

Note: technically, "DNS servers" are categorized into "authoritative servers" and "caching servers" based on their functionality (this categorization is also used in [ID-AAAA]). In this chapter, "DNS servers" means "authoritative servers" unless explicitly noted otherwise.

3.2 Some cases of misbehaving DNS servers

This section describes problematic cases of misbehaving DNS servers. Details are described in [ID-AAAA].

When a host name in a DNS zone has an A RR but does not have a AAAA RR, the authoritative DNS server(s) of the zone should return the following response to a AAAA RR query:
- the response code (RCODE) is 0 (meaning no error)
- the answer section is empty

A caching server or resolver that receives this response can proceed smoothly. As noted above, however, this is not always the case. The following are problematic cases in which misbehaving DNS servers return non-compliant responses.

A) Ignore queries for a AAAA RR

   Most resolvers supporting IPv6 have a fallback mechanism in which they first send queries for a AAAA RR, and then send queries for an A RR if the AAAA RR cannot be resolved. If the AAAA queries are ignored, the resolver has to wait for a long timeout before it can resolve the name, annoying end users who rely on the resolver through network applications.

B) Return RCODE 3 ("Name Error", also known as "NXDOMAIN")

   "Name Error" indicates that the queried DNS server does not have any RRs of any type for the queried name. With this response, a resolver may immediately give up and never fall back. As a result, an application cannot communicate with the peer even if communication over IPv4 is available.

C) Return other error codes (other than RCODE 3)

   Some authoritative DNS servers return a response with RCODE 4 ("Not Implemented"), RCODE 2 ("Server Failure"), or RCODE 1 ("Format Error") to the AAAA query. This is not correct, but most resolvers fall back to an A query for the same name in this case, and thus this behavior is relatively harmless. However, caching servers do not cache the fact that the queried host name has no AAAA RR; they send a AAAA query to the authoritative server whenever they receive a query for the name from a resolver, which increases the load on the authoritative server and wastes network bandwidth.

D) Return a broken response

   In some cases, the response to a AAAA query contains an IPv4 address in the RDATA field, which should contain an IPv6 address.
   The behavior of caching servers for this type of response varies: BIND8 caching servers pass the broken response to the resolver transparently, while BIND9 caching servers discard the broken response and return an error of RCODE 2 ("Server Failure") to the resolver. The same problem as in the previous case can occur with the latter type of caching server.

E) Make Lame Delegation

   This type of DNS server returns an authoritative response to A queries but a non-authoritative response to AAAA queries (i.e., the "AA" bit is not set in the response), causing a "lame delegation" situation. Some older versions of BIND8 caching servers suffer from this behavior, since they stop using a remote DNS server for some period once they detect that the server is "lame". Additionally, throughout that period these BIND8 caching servers simply return an error of RCODE 2 ("Server Failure") for queries in the zone managed by the "lame" DNS server. In this case, the resolver will never be able to get the correct response even if it falls back from AAAA to A after receiving the error.

3.3 A survey of misbehaving DNS servers

As described in the previous section, misbehaving DNS servers have bad effects on users, and can block smooth migration to IPv6. The IPv6 Fix project aims to remove such DNS servers from the Internet and to create a sound IPv6 environment. In this section, we describe our activities towards this goal.

We take the following measures to reduce the number of misbehaving DNS servers:

1. develop a tool to find misbehaving DNS servers
2. perform surveys on the net using the tool, and identify misbehaving sites
3. obtain vendor and version information of the DNS servers from the system administrators of the sites
4. report the results to the vendors and ask them to fix the implementation

We developed a survey tool and surveyed the domains under ".JP". This survey is a joint effort with Japan Registry Services (JPRS).
* Tool for the survey

This tool is a Perl script which sends probing DNS queries to the DNS servers, analyzes the returned packets, and outputs the results. In this survey, we repeated the following steps for the names provided by JPRS:

1. determine a domain name to survey

      example.jp

2. determine the authoritative DNS servers for the domain

      ns.example.jp, ns.wide.ad.jp

3. determine a host name to use in the survey queries. Our tool first tries some "well-known" host names, generated by adding "www" or "ftp" to the domain, to see whether such a host name exists.

      send A RR queries for www.example.jp to ns.example.jp

4. if the tool finds an existing host name in step 3, it checks the behavior of the authoritative DNS servers found in step 2 by sending a AAAA RR query for that host name. It then outputs the results.

      a AAAA RR query for www.example.jp to ns.example.jp
      a AAAA RR query for www.example.jp to ns.wide.ad.jp

* Surveys of ".JP" domains

We performed the survey on the JP domains using the domain list of .JP as of November 2004 and a server computer provided by JPRS. The survey took 5 days. The results are shown below (only statistics are given, for privacy reasons):

                           domain    DNS server
    some problems found     0.04%       0.11%
    no problems found      82.16%      84.39%
    unknown                17.80%      15.50%

As shown in the above table, we collected the results on both a per-domain basis and a per-server basis. Using the example of "example.jp", the "domain" result is the result for the "example.jp" domain, and the "DNS server" results are the results for the "ns.example.jp" and "ns.wide.ad.jp" servers.

There can be several reasons for the "unknown" case. For instance, the queried DNS server may simply have been down temporarily. It is also possible that the queried domain does not have the host name that the tool guessed in step 3 above.
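
To make the problem categories of Section 3.2 concrete, the following Python sketch shows one way the reply to a AAAA probe could be mapped to cases A) through E). It is not the survey tool itself (which is a Perl script whose internals are not described here): the AaaaReply record, its field names, and the order of the checks are assumptions made for illustration. It also assumes, as in step 3 above, that the probed host name is known to have an A RR.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AaaaReply:
    """Hypothetical summary of a reply to a AAAA query."""
    rcode: int                      # DNS response code (0 = no error)
    authoritative: bool             # True if the AA bit is set
    aaaa_rdata: List[bytes] = field(default_factory=list)  # RDATA of AAAA answers


def classify_aaaa_reply(reply: Optional[AaaaReply]) -> str:
    """Return "A".."E" for the cases of Section 3.2, or "ok"."""
    if reply is None:
        return "A"   # no reply at all: the query was ignored
    if reply.rcode == 3:
        return "B"   # Name Error for a name that has an A RR
    if reply.rcode != 0:
        return "C"   # any other error code
    if any(len(rdata) != 16 for rdata in reply.aaaa_rdata):
        return "D"   # AAAA RDATA is not a 16-byte IPv6 address
    if not reply.authoritative:
        return "E"   # non-authoritative answer: lame delegation
    return "ok"      # RCODE 0 from an authoritative server


# Example: RCODE 0, AA set, empty answer section is the compliant reply.
print(classify_aaaa_reply(AaaaReply(rcode=0, authoritative=True)))
```

A real probe would additionally have to distinguish an ignored query from a server that is temporarily down, which is one source of the "unknown" results in the table above.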
The following list categorizes the misbehaving DNS servers by problem type, with the ratio of each type:

A) ignore queries for a AAAA RR:              4.7%
B) return RCODE 3 ("Name Error"):             4.7%
C) return an RCODE other than "Name Error":   8.5%
D) return a broken response:                  0.0%
E) lame delegation:                          82.1%

Next, we plan to contact the sites' system administrators to obtain vendor and version information for the misbehaving DNS servers, and to report the results to the vendors. We are also planning to perform surveys on other domains. We will discuss how to manage that (e.g., it may require collaboration with other organizations).

3.4 Workarounds at the resolver side

The typical resolver behavior described in Section 3.1 should not cause a problem as long as DNS servers behave compliantly and the fallback from IPv6 to IPv4 works smoothly. However, as shown in the above sections and in other chapters of this document, these assumptions do not necessarily hold. As a result, the resolver's behavior can be a source of various problems. Moreover, these problems are blocking smooth migration to IPv6.

In this section, we discuss workarounds to these problems at the resolver side. Specifically, we propose the following two approaches:

a. limiting AAAA queries to the case where they are really necessary
b. shortening the timeout period for AAAA queries under a certain condition

We describe the details of these proposals and a reference implementation in the following subsections.

3.4.1 Limit AAAA queries

It is possible to mitigate or avoid most DNS-related problems if we limit DNS queries for AAAA RRs to the case where those queries are really necessary. For instance, the ability to communicate over IPv6 is not needed in an environment where IPv6 connectivity is not provided in the first place, and an ordinary user in such an environment does not need AAAA DNS queries, either.
We can completely avoid the problems described so far in such a network by sending only A queries and communicating over the IPv4 address(es) returned.

Thus, we propose to define "the case where AAAA queries are necessary" as "when the resolver node has IPv6 addresses other than link-local addresses (*)". In practice, this means the case where the node has a global IPv6 address. [(*) Note: for simplicity, we regard the loopback address as a link-local address in this discussion.]

When a host does not have an IPv6 address other than link-local ones, it does not make much sense for normal network applications to try to communicate over IPv6. It is therefore effective to decide whether AAAA queries are necessary based on the existence (or non-existence) of non-link-local IPv6 addresses. In fact, for a non-IPv6 user who runs an IPv6-capable operating system, the typical network configuration is that the only available IPv6 addresses are link-local. If we can provide those users with a workaround that avoids the problems caused by AAAA DNS queries, we can also avoid a hasty negative reaction to IPv6 deployment from such users, thereby implicitly promoting smooth deployment of IPv6.

For the standard getaddrinfo() library, we can implement the proposed approach by suppressing AAAA queries when the AF_UNSPEC address family is specified in its third argument, the "hints" structure, and the host calling getaddrinfo() does not have a non-link-local IPv6 address.

Suppressing AAAA queries this way differs considerably from the behavior of deployed getaddrinfo() implementations. However, it still conforms to the standard: according to Section 6 of RFC3493 [RFC-V6API], AF_UNSPEC in the hints structure simply means that the caller can accept any type of address, and does not require anything on the callee side.
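
The decision rule proposed above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the KAME implementation: the function names needs_aaaa_queries and query_types are hypothetical, and the list of local addresses is assumed to be supplied by the caller, since enumerating interface addresses is platform specific.

```python
import ipaddress
import socket


def needs_aaaa_queries(local_addresses):
    """True if the host has an IPv6 address other than link-local ones.

    `local_addresses` is an iterable of textual addresses assigned to the
    host. The loopback address is treated like a link-local address, as in
    the note above.
    """
    for text in local_addresses:
        # Drop a zone identifier such as "%eth0" before parsing.
        addr = ipaddress.ip_address(text.split("%", 1)[0])
        if addr.version != 6:
            continue
        if addr.is_link_local or addr.is_loopback:
            continue
        return True
    return False


def query_types(family, local_addresses):
    """RR types to query for a given hints family (order is not significant)."""
    if family == socket.AF_INET:
        return ("A",)
    if family == socket.AF_INET6:
        return ("AAAA",)
    # AF_UNSPEC: suppress AAAA unless a non-link-local IPv6 address exists.
    if needs_aaaa_queries(local_addresses):
        return ("A", "AAAA")
    return ("A",)


# A host with only IPv4, link-local IPv6, and loopback addresses:
print(query_types(socket.AF_UNSPEC, ["192.0.2.10", "fe80::1%eth0", "::1"]))
```

Note that an explicit AF_INET6 request is left untouched in this sketch; only the AF_UNSPEC case is affected, which matches the proposal.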
Meanwhile, the proposed behavior relies on the typical usage in which the result of a getaddrinfo() call is used specifically as the destination address for further communication by the calling application. On the other hand, if the application assumes that both A and AAAA queries are sent when it specifies the AF_UNSPEC family in "hints", which might be the case for a DNS operation or diagnostic tool, the proposed behavior would break the expected result. Strictly speaking, however, the API specification does not guarantee this assumption about the result of getaddrinfo() when AF_UNSPEC is specified in the hints structure. That is, if the application wants to make sure that both A and AAAA queries are issued, it has to call getaddrinfo() twice, once with AF_INET and once with AF_INET6 in the hints. In other words, if an application that makes this assumption encounters a problem because of the mismatch, it is the application, not the library function, that should be fixed, particularly when the application is expected to be portable. Additionally, although a DNS diagnostic tool might make such an assumption, existing implementations typically use a lower-level interface to DNS rather than a high-level function such as getaddrinfo(). Thus, the proposed change should not cause a real problem for existing or future applications using getaddrinfo().

3.4.2 Shorten the timeout for AAAA queries

The workaround shown in the previous subsection should be effective, but it cannot be applied to networks that actually have IPv6 connectivity, where DNS queries for AAAA RRs are always necessary. In this subsection, we consider a workaround for the problem caused in this environment by DNS servers that ignore AAAA queries.

Widely deployed resolver implementations based on ISC BIND [BIND] first send queries for AAAA RRs. When the resolver gets a response, or when the timer waiting for the response expires, it sends queries for A RRs.
Similarly, when it gets a response or the timer expires, the resolver combines all the responses, if any, and returns the combined result to the calling application. If the responding DNS server ignores the AAAA query, this behavior suffers from a long timeout of about 1 minute (*) in the first part, and ends up with only A RRs as the final result. [(*) Recent versions of BIND9 caching servers reduce the maximum timeout to 30 seconds when they cannot get any response from authoritative servers, in which case BIND9 returns a response with an error of RCODE 2 ("Server Failure"). If the getaddrinfo() library sends its queries to a BIND9 caching server behaving this way, the maximum waiting period is reduced accordingly.]

To mitigate this scenario, we propose the following modification, which reduces the useless timeout:

1. the resolver first sends A queries.
2. it then sends AAAA queries. If it got a successful response to the A queries in step 1, it sets the timeout period to a smaller value, so that it will not have to wait for a long period even if the responding DNS server ignores AAAA queries.

This way, the resolver gets a quick response to the A query from a DNS server that responds to A queries correctly but ignores AAAA queries, which is the typical misbehaving DNS server. The timer for the subsequent AAAA query is then shortened, and the resolver avoids the unnecessarily long timeout.

3.4.3 Implementation

We have implemented the proposed approaches in the getaddrinfo() library function of KAME's weekly snapshots for FreeBSD. The implementation is available in snapshots of 20041129 and later.

In this implementation, we use "max(1 second, T * 2)" for the shortened timeout period described in Section 3.4.2 (where T is the round-trip time of the preceding A query when it succeeds). This should provide a reasonable timeout period even when the remote DNS server is located far from the resolver and the round-trip time is relatively large.
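
The timeout rule above can be written down as a small Python sketch. The formula max(1 second, T * 2) is taken from the text; the function name aaaa_timeout and the default_timeout value used when the A query did not succeed are assumptions for illustration only and do not reflect the actual KAME code.

```python
def aaaa_timeout(a_query_rtt, default_timeout=5.0):
    """Timeout in seconds to use for the AAAA query.

    a_query_rtt: round-trip time in seconds of the preceding A query, or
    None if that query did not get a successful response.
    default_timeout: hypothetical unshortened per-query timeout.
    """
    if a_query_rtt is None:
        # No successful A response: keep the normal timeout.
        return default_timeout
    # Shortened timeout from Section 3.4.2: max(1 second, T * 2).
    return max(1.0, 2.0 * a_query_rtt)


# Nearby server (50 ms RTT): the 1-second floor applies.
print(aaaa_timeout(0.05))
# Distant server (750 ms RTT): twice the RTT is used instead.
print(aaaa_timeout(0.75))
```

The 1-second floor keeps the timeout from becoming unreasonably short for nearby servers, while the T * 2 term scales it for distant ones.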
In other words, we can expect to get both A and AAAA responses from a server that is compliant but has a long round-trip time.

As a side note, the getaddrinfo() library is generally expected to return IPv6 addresses first, followed by IPv4 addresses, in order to realize the default address selection rule specified in RFC3484. Even though the proposed approach sends A queries (for IPv4 addresses) first, the implementation then sorts all the addresses so that the resulting list follows the ordering specified in RFC3484.

4. TCP Connection Establishment

In TCP connection attempts, it sometimes takes a long time to fall back from IPv6 to IPv4. As described in Section 3, IPv6-capable applications typically call getaddrinfo() to obtain the IP addresses of the host to communicate with and, if multiple addresses are available, try to connect to them one by one until a connection is successfully established. Many implementations prefer IPv6 addresses to IPv4 addresses when the peer is dual-stack, so they first try to connect over IPv6 and fall back to IPv4 only after the IPv6 connection attempt fails. If the peer has multiple IPv6 addresses, each of them is tried in sequence.

When a RST or an ICMP hard error is received, connection establishment fails immediately, so it takes only one RTT to move on to the next available address. However, when a connection attempt fails by exceeding the retransmission count threshold or by timeout, it takes much longer to fall back.

There are two major cases of slow fallback during IPv6 TCP connection attempts.

(1) Retransmission count exceeds the threshold

This happens when ICMPv6 soft errors are returned. The TCP specification [RFC-TCP] states that TCP must not abort the connection when receiving ICMP soft errors, since soft errors are considered to indicate transient network failures. Therefore, even when TCP receives ICMPv6 soft errors, it ignores them and continues to retransmit packets.
TCP implementations derived from 4.4BSD have a threshold on the retransmission count during the connection establishment phase, so a connection attempt typically fails after about 10 seconds, when the retransmission count exceeds the threshold.

There are 3 soft errors defined in the specification: ICMP Destination Unreachable messages with code 0 (network unreachable), code 1 (host unreachable), and code 5 (source route failed). Although ICMPv6 did not exist when the specification [RFC-TCP] was written, the corresponding ICMPv6 Destination Unreachable messages are code 0 (no route to destination) and code 3 (address unreachable).

(2) Timeouts

A timeout occurs when TCP receives no response during the connection establishment phase. TCP implementations derived from 4.4BSD have a separate timer for connection establishment, and the timer expires after 75 seconds by default.

The intrinsic problem is the difficulty of realizing a transparent fallback mechanism that tries to connect to one of multiple addresses. Nowadays a similar problem exists even in the IPv4-only world, since it has become common for one DNS host name to be mapped to multiple mirror server addresses. However, in the current dual-stack Internet, the number of hosts that register IPv6 addresses in DNS but are not reachable over IPv6 is not negligible. Accordingly, users get the impression that connections take longer to establish when IPv6 is enabled.

In principle, all addresses registered in DNS should be reachable, and the right action is to fix unreachable hosts and networks. We discuss this issue in Section 5. Nonetheless, it is difficult to solve problems on the other side of the communication, so in practice we need to alleviate the problems by expediting the fallback mechanism of TCP.

4.2 Counter Measures

Expediting the TCP fallback mechanism is just a temporary measure against reachability problems.
Since our goal is to lower the hurdles to IPv6 deployment, we focus on the fallback from IPv6 to IPv4.

(1) Retransmission count exceeds the threshold

There is a proposal [ID-SOFTERROR] to speed up TCP's fallback by treating soft errors as hard errors, only during the connection establishment phase. When the soft error specification was defined, normal hosts had only one IP address, so even when ICMP errors indicated that the destination was not reachable, TCP had no choice but to keep retransmitting packets in the hope that the failure would recover. Now that it is normal for a host to have both IPv4 and IPv6 addresses, it is possible to fall back immediately to another address if one address is detected as unreachable by ICMP soft errors.

The issue is to find a good balance between the possibility that the failure recovers in the near future and the possibility of a successful connection by falling back to the next address, which depends on the communication environment. For example, in a dual-stack network with fragile IPv6 support, it is effective to fall back to IPv4 immediately when receiving ICMPv6 soft errors. On the other hand, in a network with unstable conditions in the physical layer, it is more effective to keep retransmitting. TCP derived from 4.4BSD employs an intermediate solution, in which ICMP soft errors cause the connection to be aborted earlier during the connection establishment phase.

Considering the drastic changes in the communication environment over the last decade, it is reasonable to adopt more aggressive fallback strategies. At the same time, we should be careful not to break the features of TCP, especially its ability to work under extremely unstable conditions. We also need to think about the flexibility to cope with future changes in the IPv6 connectivity situation.

(2) Timeouts

When TCP does not receive any response to its SYN and then times out, there is no effective measure. There is no event other than the timeout, and the timeout period cannot be radically shortened.
Shortening the timeout period slightly is not effective.

Alternatively, it is possible to connect to multiple addresses in parallel. However, parallel connection attempts waste resources and require drastic changes to the existing code base, so this approach is not very attractive.

Another possible measure is to cache the address information of unreachable hosts and avoid using those addresses. The first access to a problematic host takes time as before, but the unreachable addresses are excluded from further accesses. However, the effects do not seem to be worth the required changes.

To summarize, we do not have a good measure for the case where TCP receives nothing from the network and then times out. For the TCP timeout case, it seems better to solve the cause of the problem.

4.3 Standardization and Implementation Status

A proposal to treat ICMP soft errors as hard errors during the connection establishment phase has been under discussion within the IETF TCP Maintenance and Minor Extensions (tcpm) working group. Many people in the IPv6 community believe that it is necessary for IPv6 deployment and support this proposal. On the other hand, some of the TCP folks are conservative about changing the TCP behavior. At this writing (December 2005), the working group has reached consensus to document it as a possible implementation and not to put it on the standards track, in an effort to balance both opinions. The document will be an Informational RFC that describes the problems and advantages in a neutral manner.

As for implementations, Linux has aborted the connection attempt immediately on ICMP soft errors during the connection establishment phase since kernel version 2.0.0, released in 1996. KAME has also employed this behavior since December 2004.

4.4 Security Consideration

The proposed quick fallback by ICMP soft errors does not weaken TCP's resistance to attacks.
Current TCP already aborts a connection on ICMP hard errors, so an attacker can already try to inject false ICMP hard errors. The proposed behavior does not make such an attack any easier.

5. The Quality of the IPv6 Internet

For users to migrate from IPv4 to IPv6, the quality of the IPv6 Internet must be better than, or at least equal to, that of the IPv4 Internet. If the IPv6 Internet is worse than the current IPv4 Internet, average users will not spend the time and effort to migrate to IPv6.

In reality, however, the quality of the IPv6 Internet falls far behind that of the IPv4 Internet. When an IPv6 user encounters one of the problematic IPv6 sites, the frustrated user tends to conclude hastily that the problem lies with IPv6 itself. This situation has become a serious hurdle for deploying IPv6.

There are several causes, such as experimental operation, a lack of peering, and poorly configured tunnels. IPv6 deployment faces a dilemma: casual IPv6 connectivity has been offered to promote IPv6, but it often increases experimental use of IPv6 and, as a result, degrades the overall quality of the IPv6 Internet.

Moreover, the transparent design of IPv6 makes it harder to detect and solve problems. IPv6 is designed to coexist with IPv4 and to fall back to IPv4 automatically if there is a problem with IPv6. As a result, users can communicate through IPv4 without noticing problems with IPv6, which leaves the IPv6 problems unsolved.

Those who use IPv6 for daily work know by intuition that most IPv6 sites are fine and only a small number of IPv6 sites have problems. If this is the case, we can improve the quality of the IPv6 Internet by fixing these problematic sites. To verify this assumption, we have been investigating the quality of the IPv6 Internet since January 2004 by measuring it and comparing it with that of the IPv4 Internet. The details of our research can be found in [DUALSTACK].
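The automatic fallback described above is what hides broken IPv6 from users. As an illustration only, the following Python sketch tries destination addresses in order and records every address that failed before the one that worked, so that a silent IPv6-to-IPv4 fallback becomes visible. The names connect_and_report, candidates_for and dial are hypothetical and do not come from any implementation discussed in this document; the sketch also assumes that the order returned by the resolver is the order to try.

```python
import socket


def dial(family, sockaddr, timeout):
    """Open one TCP connection; raises OSError on failure or timeout."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock


def candidates_for(host, port):
    """Resolve host into an ordered list of (family, sockaddr) pairs."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(family, sockaddr) for family, _, _, _, sockaddr in infos]


def connect_and_report(candidates, timeout=5.0, dial=dial):
    """Try each (family, sockaddr) in order; return (conn, failures).

    failures lists every address that was tried and failed before the
    successful one. The dial argument is an assumption of this sketch:
    it lets a caller substitute another connect routine.
    """
    failures = []
    for family, sockaddr in candidates:
        try:
            conn = dial(family, sockaddr, timeout)
        except OSError as exc:
            failures.append((family, sockaddr, str(exc)))
            continue
        return conn, failures
    raise OSError("all addresses failed: %r" % (failures,))
```

A caller could run connect_and_report(candidates_for(host, 80)) and log any entry in failures whose family is socket.AF_INET6. Such a log is one possible way for a site to notice IPv6 problems that fallback would otherwise mask.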
5.1 Problems often found in leaf sites

- Experimental use of IPv6
  These sites introduced IPv6 experimentally but did not make it fully operational. They do not use IPv6 for their daily work, so they are unaware of problems with their IPv6 networks. In addition, these sites often leave unused IPv6 addresses registered in their DNS.

- Poorly configured tunnels
  Tunnels are necessary to introduce IPv6 into existing networks but, at the same time, they are easily misused. One of the major IPv6 network quality problems is caused by tunnels that disregard the underlying physical topology. Poorly configured tunnels perform poorly because of reduced available bandwidth and increased response time. We have observed many poorly configured tunnels, especially among old 6bone tunnels. There are cases where domestic communication goes through an IPv6 tunnel broker on another continent. Tunnels are often left untended after infrastructural changes. In addition, it is difficult to diagnose problems in tunneled networks. As a result, the use of tunnels often degrades network quality.

5.2 Problems often found in backbone networks

Our research shows that problems in backbone networks are less serious than those in leaf networks, but there is room for improvement when compared with the IPv4 Internet.

- A lack of IPv6 peering and IPv6 paths
  IPv6 packets often travel farther than IPv4 packets to the same destination, since the number of IPv6-capable IXes is limited and the number of IPv6 peerings at those IXes is still low. IPv6-capable paths are also limited, partly because the network equipment in use is not fully IPv6 capable. Detecting quality problems is more difficult than detecting reachability problems, so network operators often overlook issues with network quality.
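Because quality problems are harder to spot than reachability problems, it can help to compare response times per destination. The sketch below is an illustration only and is not the measurement method of [DUALSTACK]: it assumes that round-trip times in milliseconds have already been collected over both IPv4 and IPv6 for each dual-stack site, and it flags sites whose median IPv6 time is much larger than the IPv4 one, as would happen with a tunnel that detours through another continent. The function name flag_slow_ipv6 and the thresholds ratio and margin_ms are hypothetical.

```python
from statistics import median


def flag_slow_ipv6(samples, ratio=2.0, margin_ms=50.0):
    """Return [(site, median_v4, median_v6)] for suspicious sites.

    samples maps a site name to a pair (v4_rtts, v6_rtts), each a list
    of round-trip times in milliseconds. A site is flagged only when
    its median IPv6 RTT exceeds the IPv4 median both by the given ratio
    and by the given absolute margin (both thresholds are assumptions).
    """
    flagged = []
    for site, (v4_rtts, v6_rtts) in sorted(samples.items()):
        if not v4_rtts or not v6_rtts:
            continue  # cannot compare without data for both protocols
        m4, m6 = median(v4_rtts), median(v6_rtts)
        if m6 > ratio * m4 and m6 - m4 > margin_ms:
            flagged.append((site, m4, m6))
    return flagged
```

Using the median rather than the mean keeps a single lost or delayed probe from flagging an otherwise healthy site. The absolute margin avoids flagging nearby sites where both times are only a few milliseconds.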
Our next step is to develop a better method for feeding the results of our IPv6 network quality checks back to the responsible parties, in order to improve the quality of the IPv6 Internet as a whole.

6. Firewalls

In today's IPv4 Internet, a site is often secured by a firewall located at the edge of the site. To protect client nodes within the site, a firewall generally prohibits traffic from outside to inside, except for traffic permitted by the site administrators.

In IPv6, a site is also secured by a firewall. However, a badly configured or badly implemented firewall can hinder the deployment of IPv6.

1) ICMPv6
   In a security-conscious site, a firewall is also installed to defend servers that are publicly open to the Internet. Such firewalls are often configured to drop ICMPv4 packets to prevent ICMPv4 DoS attacks. If all ICMPv6 packets are dropped at the firewall in the same manner as in IPv4, even permitted communication may fail because the control information carried by ICMPv6 is missing.

2) DNS
   Some firewalls do not pass DNS packets larger than 512 bytes, since DNS packets are normally limited to that size. However, DNS packets can exceed 512 bytes when IPv6 is used; EDNS0 [RFC-EDNS0] is a typical case. In that case, such a firewall obstructs IPv6 communication.

6.1 Problems owing to firewalls

The following ICMPv6 packets should not be discarded by a firewall.

* Destination Unreachable (Type = 1)
* Packet Too Big (Type = 2)
* Time Exceeded (Type = 3)

When one of the above ICMPv6 packets is discarded in transit, IPv6 communication problems arise for the following reasons.

- Destination Unreachable
  If this ICMPv6 packet is discarded, the sender cannot tell what happened to the packet it sent: for example, whether it reached the destination, was discarded in transit, could not be forwarded for lack of a route, or could not be delivered because no application was listening on the target host.
  The lack of this information at the sender delays TCP connection establishment, because the sender has to wait for a TCP timeout before it can take other measures to cope with the trouble (see 4.1).

- Packet Too Big
  If this ICMPv6 packet is discarded, the sender cannot learn the path MTU. Communication then fails when the sender transmits a large packet, because the sender does not fragment it appropriately.

- Time Exceeded
  If this ICMPv6 packet is discarded, communication sometimes fails when the actual number of hops is greater than the hop limit in the packet, because the sender cannot tell whether it should adjust the hop limit of the packets it transmits. For instance, traceroute does not work beyond a firewall, since it expects this ICMPv6 packet to return from every router on the path.

- EDNS0
  There is a similar problem with DNS packets. A DNS resolver cannot resolve a hostname via IPv6 when the DNS packet exceeds 512 bytes using the EDNS0 option and a firewall cannot forward such DNS packets. In a BIND 9.3-based resolver (e.g. FreeBSD 5.3-RELEASE), this can be avoided by setting the "edns-udp-size" option. This is, however, an ad hoc workaround; the essential solution is to correct the firewall implementation.

6.2 Firewall inspection tool

We are now developing a firewall inspection tool to investigate the above-mentioned problems and to find out where they exist in the IPv6 Internet. The tool will be freely available to everyone who would like to test specific firewall devices.

The inspection tool generates artificial data packets that elicit the above-mentioned ICMPv6 or DNS packets from behind a firewall. It then checks whether the expected packet returns from the target host or the target firewall. The details of the test tool specification are as follows.
- Destination Unreachable
  Two kinds of this ICMPv6 error are inspected: Port Unreachable and Address Unreachable. Each is inspected in a different way.
  For the Port Unreachable inspection, the tool sends a packet to the target host behind a firewall, using a port number that is not in use on that host. It then checks whether an ICMPv6 error returns from the firewall or the target host.
  For the Address Unreachable inspection, the tool sends a packet to a non-existent address in a reachable prefix behind a firewall. It then checks whether an ICMPv6 error returns from the firewall or the target host.

- Packet Too Big
  The tool sends a packet whose size is greater than the path MTU to the target host behind a firewall. It then checks whether an ICMPv6 error packet returns from the firewall or the target host.

- Time Exceeded
  The tool sends a packet whose hop limit is smaller than the actual number of hops to the target host behind a firewall. It then checks whether an ICMPv6 error packet returns from the firewall or the target host.

- EDNS0
  The tool sends a DNS query to the target DNS server behind the firewall and inspects the response from the DNS server or the firewall.

6.3 Future Work

We are implementing the firewall inspection tool on FreeBSD and will make it publicly available. We will also port it to Windows to make it available to many more people. Using this tool, we will investigate firewalls ourselves or ask firewall vendors to do so, summarize the results, and publish them.

REFERENCES

[ID-NDBIS] T. Narten et al., "Neighbor Discovery for IP version 6 (IPv6)", Internet-Draft, draft-ietf-ipv6-2461bis-01.txt, October 2004.

[ID-AAAA] Y. Morishita and T. Jinmei, "Common Misbehavior against DNS Queries for IPv6 Addresses", Internet-Draft, draft-ietf-dnsop-misbehavior-against-aaaa-02.txt, October 2004.

[RFC-V6API] R. Gilligan et al., "Basic Socket Interface Extensions for IPv6", RFC 3493, February 2003.

[BIND] ISC BIND web page. http://www.isc.org/

[RFC-TCP] R. Braden, "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989.

[ID-SOFTERROR] F. Gont, "TCP's Reaction to Soft Errors", Internet-Draft, draft-gont-tcpm-tcp-soft-errors-02.txt, September 2005.

[DUALSTACK] Kenjiro Cho, Matthew Luckie and Bradley Huffaker, "Identifying IPv6 Network Problems in the Dual-Stack World", SIGCOMM'04 Network Troubleshooting Workshop, Portland, Oregon, September 2004.

[RFC-EDNS0] P. Vixie, "Extension Mechanisms for DNS (EDNS0)", RFC 2671, August 1999.

Copyright Notice

Copyright (C) WIDE Project (2005). All Rights Reserved.

;;; Local Variables:
;;; mode: text
;;; End: