Revision as of 02:35, 8 December 2012
Flag AI_ADDRCONFIG considered harmful
As far as I know, AI_ADDRCONFIG was added for the following reasons:
- Some buggy DNS servers would be confused by AAAA requests
- Optimization of DNS queries to only ask for useful addresses
Currently, I'm aware of several documents that define AI_ADDRCONFIG:
- POSIX.1-2008: useless but harmless
- RFC 3493 (informational): useless but (partially) breaks IPv4/IPv6 localhost
- RFC 2553 (obsolete informational): useless but hopefully harmless
- man getaddrinfo: like RFC 3493
The current glibc getaddrinfo() code doesn't behave strictly according to any of these definitions, including its own manual page. Under some conditions it fails to translate literal addresses and non-DNS names like localhost, localhost4, localhost6, and any other names you put in /etc/hosts (e.g. the hostname). In Fedora, there was a patch that further broke link-local IPv6 addresses, but it has been removed recently.
I first ran into this on a laptop with virtualization, but you can hit this problem very easily even as an ordinary user. The symptom is an unexpected failure of software that uses node-local TCP/IP communication.
Problem statement
The choice to use AI_ADDRCONFIG is made by the developers of software that uses TCP/IP networking. Those developers cannot always anticipate whether the software will be used for node-local networking, link-local networking or global networking, nor whether IPv4 or IPv6 will be used.
This is usually up to the configuration of the software. For example, the mount command can be used to connect external NFS mounts, but it can also be used to connect node-local (or localhost) NFS mounts. Another critical piece of software is an SSH client (and server, of course). You may use the SSH client to connect to remote hosts, but sometimes you need to connect to a link-local address of a local server whose connectivity is misconfigured.
There is a huge number of services that can be accessed globally, through a link-local IPv6 address or through the localhost addresses. If localhost is broken, you never know what else breaks.
AI_ADDRCONFIG is only really useful if:
1) The usual processing of all node-local and link-local names and addresses is preserved as long as the respective addresses are present.
2) The global name resolution is not affected by the existence or nonexistence of non-routable addresses.
Unfortunately, the current implementation of getaddrinfo() mostly follows the informational RFC 3493, which fails in both #1 and #2. It mistakenly divides addresses into two groups (IPv4 and IPv6) and explicitly ignores loopback addresses. That cannot lead to good results. In reality, from the network connectivity perspective, one would have to distinguish at least:
1a) Node-local IPv4 (127.0.0.1)
1b) Node-local IPv6 (::1)
2a) Link-local IPv4 (169.254.0.0/16)
2b) Link-local IPv6 (fe80::/10)
3a) Global IPv4
3b) Global IPv6
4) Unique Local Addresses
But even that is not perfect. It's just a heuristic about the surrounding network. It is much better to use the routing decision to choose whether an address is reachable or not. Failing that, you would at least have to distinguish global and non-global addresses.
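The classes listed above can be sketched with Python's standard ipaddress module. This is only an illustration of the heuristic, not what glibc does; the function name classify and the label strings are made up for this example, and everything that is neither loopback, link-local nor unique local is lumped together as "global".

```python
import ipaddress

# Unique Local Addresses as defined for IPv6.
ULA_NETWORK = ipaddress.ip_network("fc00::/7")


def classify(address):
    """Return a coarse scope label for an IPv4 or IPv6 literal."""
    # Drop an IPv6 zone index such as "%eth0" before parsing.
    ip = ipaddress.ip_address(address.split("%", 1)[0])
    family = "IPv4" if ip.version == 4 else "IPv6"
    if ip.is_loopback:
        return "node-local " + family
    if ip.is_link_local:
        return "link-local " + family
    if ip.version == 6 and ip in ULA_NETWORK:
        return "unique local IPv6"
    # Simplification: everything else is treated as global.
    return "global " + family
```

Even this small function shows how much guessing is involved: it knows nothing about which interface an address is assigned to or whether a route actually exists.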
But then what happens if you assign a global address to the loopback interface? This is a very common use case for well-known addresses (e.g. those you put in DNS) that are assigned to a multi-network host that uses one device as a backup for the other, with transport (or link-local) addresses on the physical links.
Should a global address assigned to a loopback interface be considered a loopback address? Definitely not, as it's not a node-local address, but does everyone read the standards this way?
AI_ADDRCONFIG is all about heuristics, about avoiding both false negatives and false positives. Only the routing decision can be used as a test of whether a particular host is reachable.
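One way to ask the kernel for its routing decision from user space is to connect() a UDP socket, which sends no packets but fails when there is no route. The sketch below assumes this behavior (it holds on Linux); the helper name has_route and the port number are arbitrary choices for the example. It does not handle link-local IPv6 addresses, which additionally need a scope id.

```python
import ipaddress
import socket


def has_route(address, port=9):
    """Return True if the kernel finds a route to the literal address."""
    ip = ipaddress.ip_address(address)
    family = socket.AF_INET if ip.version == 4 else socket.AF_INET6
    try:
        # connect() on a UDP socket transmits nothing; it only makes the
        # kernel select a route and a source address for the destination.
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            sock.connect((str(ip), port))
    except OSError:
        # No route, no usable source address, or family not supported.
        return False
    return True
```

A positive answer only means that a route exists, not that the remote host answers, so this is still a test of local configuration rather than of end-to-end reachability.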
Potential benefits
The potential benefits of AI_ADDRCONFIG are more than questionable. If it hasn't been a problem that AI_ADDRCONFIG doesn't even work in most cases (see the tests), why should it be a problem if it's just ignored?
If the benefit of not querying DNS records you don't need is important enough to have a special flag for it, it should really *not* do anything other than that. It should not do any filtering of non-DNS results, otherwise you can be sure you'll get into problems.
Not querying IPv6 records is only really useful in an IPv4-only network and vice versa. The (recommended) behavior of such a flag should be precisely specified and should be done exactly the way it's described in the documentation.
I don't see *any* benefits at all in filtering non-DNS results. Applications using getaddrinfo() cycle through all the results and try to connect() to each address until it succeeds (or tries all of them). This works for both TCP and UDP. For unreachable hosts, connect() just fails.
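The loop described above looks roughly like this in Python. It is a minimal TCP-only sketch; the function name connect_any and the timeout value are assumptions for the example, and the standard library's socket.create_connection() does much the same thing.

```python
import socket


def connect_any(host, port, timeout=5.0):
    """Try every address returned for host; return the first connected socket."""
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as error:
            # Unreachable or refused: remember the error, try the next address.
            last_error = error
            if sock is not None:
                sock.close()
    raise last_error or OSError("no addresses returned for {!r}".format(host))
```

Because unreachable addresses are simply skipped, pre-filtering literals and /etc/hosts names in the resolver saves the application nothing.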
Tests
Tested with glibc 2.16.0.
    #!/usr/bin/python3
    # Resolve each test host with AI_ADDRCONFIG set and print the
    # resulting addresses, or the resolver error if the lookup fails.

    import sys
    from socket import *

    hosts = [
        None,
        "localhost",
        "127.0.0.1",
        "localhost4",
        "::1",
        "localhost6",
        "195.47.235.3",
        "2a02:38::1001",
        "info.nix.cz",
        "www.google.com",
    ]

    for host in hosts:
        print("getaddrinfo host=\"{}\" hints.ai_flags=AI_ADDRCONFIG:".format(host))
        try:
            for item in getaddrinfo(host, "http", AF_UNSPEC, SOCK_STREAM, SOL_TCP, AI_ADDRCONFIG):
                # item[4] is the socket address tuple; its first element is the address.
                print("  {}".format(item[4][0]))
        except gaierror as error:
            print("  !! {} !!".format(error))
The desired result may not be well defined in this case. For now I'm using a simple definition that says:
1) Don't break non-DNS results. You never know when you need them.
2) Filter DNS results based on the presence of global IPv4 and global IPv6 addresses (with a simplified definition of global that means not node-local and not link-local).
Feel free to offer better definitions of what constitutes a desired result.
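The simple definition above can be written down as a small filter. This is a sketch of the desired behavior only, not of any existing implementation; the (address, from_dns) tuples and the function names are hypothetical, and the example addresses come from documentation ranges.

```python
import ipaddress


def _parse(address):
    # Drop an IPv6 zone index such as "%eth0" before parsing.
    return ipaddress.ip_address(address.split("%", 1)[0])


def is_global(address):
    """Simplified definition from the text: not node-local and not link-local."""
    ip = _parse(address)
    return not (ip.is_loopback or ip.is_link_local)


def desired_filter(results, local_addresses):
    """Filter (address, from_dns) pairs according to rules 1 and 2."""
    global_versions = {_parse(a).version for a in local_addresses if is_global(a)}
    kept = []
    for address, from_dns in results:
        # Rule 1: never drop non-DNS results (literals, /etc/hosts names).
        # Rule 2: keep DNS results only for families with a global address.
        if not from_dns or _parse(address).version in global_versions:
            kept.append(address)
    return kept
```

For a host with a global IPv4 address and only a link-local IPv6 address, this keeps 127.0.0.1 and ::1 and drops only the AAAA results that came from DNS.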
The documented result is what follows from the manual page. Note that the definition in the getaddrinfo() manual page is roughly the same as RFC 3493 but substantially different from POSIX.1-2008.
Host with only 127.0.0.1 and ::1 addresses
Desired result: All addresses and all non-DNS names should work.
Documented result: Nothing should work.
Actual result: Same as desired result, different from documented result.
Broken addresses: None (127.0.0.1, ::1 according to documentation).
Host with 127.0.0.1, ::1 and at least one link-local IPv6 address
Desired result: All addresses and all non-DNS names should work.
Documented result: Only IPv6 addresses should work. Non-DNS names should only give IPv6 addresses.
Actual result: Same as documented result, different from desired result.
Broken addresses: 127.0.0.1
Host with global IPv4, link-local IPv6 (and DNS)
Desired result: All addresses and all non-DNS names should work. DNS names should only give IPv4 addresses.
Documented result: Unlimited address resolution (like without AI_ADDRCONFIG).
Actual result: Same as documented, different from desired.
Host with global IPv4 (and DNS), without link-local IPv6 (like non-ethernet links)
Desired result: All addresses and all non-DNS names should work. DNS names should only give IPv4 addresses.
Documented result: Only IPv4 addresses should work. Both non-DNS and DNS names should only give IPv4 addresses.
Actual result: Same as documented, different from desired.
Broken addresses: ::1
Host with global IPv6 (and DNS)
Desired result: All addresses and all non-DNS names should work. DNS name should only give IPv6 addresses.
Documented result: Only IPv6 addresses should work. Both non-DNS and DNS names should only give IPv6 addresses.
Actual result: Same as documented result, different from desired result.
Broken addresses: 127.0.0.1
Host with both IPv4 and IPv6 addresses (and DNS, of course)
Desired and documented result: Unlimited address resolution (like without AI_ADDRCONFIG).
Actual result: Same as desired and documented. Everything works.
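To see which inputs the flag affects on a particular host, it can help to run each lookup twice, with and without AI_ADDRCONFIG. The helper names resolve and compare below are made up for this sketch, and a numeric port is used so that the result does not depend on /etc/services.

```python
import socket


def resolve(host, flags=0):
    """Return the sorted addresses for host, or the error text on failure."""
    try:
        infos = socket.getaddrinfo(host, 80, socket.AF_UNSPEC,
                                   socket.SOCK_STREAM, 0, flags)
    except socket.gaierror as error:
        return "error: {}".format(error)
    # info[4] is the socket address tuple; its first element is the address.
    return sorted({info[4][0] for info in infos})


def compare(host):
    """Return (without_flag, with_flag) so that differences are easy to spot."""
    return resolve(host), resolve(host, socket.AI_ADDRCONFIG)
```

Any host for which the two values differ is one whose resolution depends on the local address configuration.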
Making AI_ADDRCONFIG useful
A possible solution for the first problem (that AI_ADDRCONFIG is useless) is to treat link-local addresses the same as loopback (or node-local) addresses. But this is even more harmful.
Fedora's glibc was patched to do exactly that. The consequence was that even link-local IPv6 stopped working when a global IPv6 address was absent. And what would we have link-local addresses for if they didn't work without global addresses? This patch has already been reverted.
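The difference between the two rules can be modeled in a few lines. This is a toy model of the address-family decision as described in RFC 3493 and in the paragraph above, not glibc code; the function name allowed_families and its ignore_link_local parameter are assumptions for the example.

```python
import ipaddress


def allowed_families(local_addresses, ignore_link_local=False):
    """Model which address families AI_ADDRCONFIG lets through.

    ignore_link_local=False: RFC 3493 rule, loopback does not count.
    ignore_link_local=True:  variant where link-local does not count either.
    """
    families = set()
    for address in local_addresses:
        # Drop an IPv6 zone index such as "%eth0" before parsing.
        ip = ipaddress.ip_address(address.split("%", 1)[0])
        if ip.is_loopback:
            continue
        if ignore_link_local and ip.is_link_local:
            continue
        families.add("IPv4" if ip.version == 4 else "IPv6")
    return families
```

For a host with only 127.0.0.1, ::1 and a link-local IPv6 address, the first rule yields only IPv6 (so 127.0.0.1 breaks), while the second yields nothing at all, which is why link-local IPv6 stopped working too.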
Conclusion
The whole idea of filtering out non-DNS addresses is flawed and breaks many things, including IPv4 and IPv6 literals. There is no reason to filter them out.
Proposed solutions:
1) Make getaddrinfo() ignore AI_ADDRCONFIG. It has not been working for years and nobody cared enough to fix it, so there is a substantial probability that it's not needed. Remove the code that implements it (patch).
1b) Make getaddrinfo() ignore AI_ADDRCONFIG only when filtering the results, but keep its behavior for the gethostbyname* function selection, which affects DNS results. The resulting behavior is something between #1 and #3.
2) Patch all software to avoid using AI_ADDRCONFIG. Follow new development, and prevent/reject modifications that add it. This is impractical.
3) Only process AI_ADDRCONFIG in the nsswitch DNS plugin. This requires implementing getaddrinfo() in nsswitch, which is required for zeroconf networking anyway. Use solution (1) as a temporary fix. Locally assigned addresses looked up through local DNS would still fail.
Notes: Solution #2 is advocated by Michal Kubeček from SUSE. The third solution is an output of long discussions between me (Pavel Šimerda) and Tore Anderson, who explained to me the original purpose of AI_ADDRCONFIG. I would have no problem with just doing #1.
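Until the library is changed, an individual application can get the effect of ignoring the flag by masking it out before the call. The wrapper below is a hypothetical application-level helper written for this page, not the glibc patch mentioned in solution 1.

```python
import socket


def getaddrinfo_without_addrconfig(host, port, family=0, type=0, proto=0, flags=0):
    """Call socket.getaddrinfo() with AI_ADDRCONFIG cleared from flags."""
    # All other flags (AI_NUMERICHOST, AI_PASSIVE, ...) are passed through.
    return socket.getaddrinfo(host, port, family, type, proto,
                              flags & ~socket.AI_ADDRCONFIG)
```

This only helps code that you control, which is exactly why patching every application is impractical as a general solution.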
More resources:
- IPv4: getaddrinfo("127.0.0.1", ...) fails with some AI_ADDRCONFIG configurations
- IPv6: Fedora 808147 - getaddrinfo("::1", ...) fails with some configurations of AI_ADDRCONFIG
- IPv6: getaddrinfo("fe80::1234:56ff:fe78:90%eth0", ...) also fails as above
- IPv6: GLIBC's nsswitch doesn't support overriding getaddrinfo, which is required to resolve link-local IPv6 addresses
Examples of software using AI_ADDRCONFIG
- Mozilla (patch adding AI_ADDRCONFIG with comments)
- GLIB (lines with AI_ADDRCONFIG)
- Apache (patch adding AI_ADDRCONFIG with comments)