From Fedora Project Wiki

Revision as of 13:23, 27 November 2012

Name resolution

Resolving using getaddrinfo() in applications

The getaddrinfo() function is a dualstack-friendly API for name resolution. Applications use it to translate host and service names to a linked list of struct addrinfo objects. It has its own manual page, getaddrinfo(3), in the Linux Programmer's Manual.

Running getaddrinfo()

An example of a getaddrinfo() call:

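#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

const char *node = "www.fedoraproject.org";
const char *service = "http";
struct addrinfo hints = {
    .ai_family = AF_UNSPEC,       /* both IPv4 and IPv6 */
    .ai_socktype = SOCK_STREAM,
    .ai_protocol = IPPROTO_TCP,
    .ai_flags = 0,
    .ai_canonname = NULL,
    .ai_addr = NULL,
    .ai_next = NULL
};
struct addrinfo *result;
int error;

error = getaddrinfo(node, service, &hints, &result);
if (error != 0)
    fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(error));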

The input of getaddrinfo() consists of a node specification, a service specification and further hints.

  • node: literal IPv4 or IPv6 address, or a hostname to be resolved
  • service: numeric port number or a symbolic service name
  • hints.ai_family: AF_UNSPEC for dual-protocol, AF_INET for IPv4-only or AF_INET6 for IPv6-only queries
  • hints.ai_socktype: select socket type
  • hints.ai_protocol: select transport protocol

Socktype and protocol are somewhat redundant for the TCP/IP stack, which has just TCP and UDP: SOCK_STREAM implies TCP and SOCK_DGRAM implies UDP. getaddrinfo() can be further tweaked with hints.ai_flags. The other attributes (ai_canonname, ai_addr and ai_next) are not supposed to be set in hints.
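Python's socket.getaddrinfo() calls the same C function, which makes the socktype/protocol redundancy easy to observe. A minimal sketch, assuming a glibc-based system such as Fedora; protocols_for() is a hypothetical helper, and the numeric host 127.0.0.1 with AI_NUMERICHOST avoids any DNS lookup:

```python
import socket

def protocols_for(socktype):
    """Hypothetical helper: return the ai_protocol values getaddrinfo()
    fills in when only the socket type is given (protocol 0)."""
    result = socket.getaddrinfo("127.0.0.1", 80, socket.AF_INET, socktype, 0,
                                socket.AI_NUMERICHOST)
    return {proto for _family, _type, proto, _canon, _addr in result}

print(protocols_for(socket.SOCK_STREAM))  # {6}, i.e. IPPROTO_TCP
print(protocols_for(socket.SOCK_DGRAM))   # {17}, i.e. IPPROTO_UDP

# An inconsistent combination is rejected instead of being guessed at.
try:
    socket.getaddrinfo("127.0.0.1", 80, socket.AF_INET, socket.SOCK_DGRAM,
                       socket.IPPROTO_TCP, socket.AI_NUMERICHOST)
except socket.gaierror as e:
    print("SOCK_DGRAM + IPPROTO_TCP:", e)
```

So setting only ai_socktype is enough; setting both merely has to be consistent.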

On success, getaddrinfo() returns 0, so error is set to 0, and result points to a linked list of one or more struct addrinfo objects. On failure, error holds a nonzero EAI_* code that gai_strerror() converts to a message.

Never assume that getaddrinfo() returns only one result or that the first result actually works!

Using getaddrinfo() results

It is necessary to try the results in order until one successfully connects. This works perfectly for TCP connections, as they can fail gracefully at this stage.

struct addrinfo *item;
int sock = -1;

/* Try each result in order until one of them connects. */
for (item = result; item; item = item->ai_next) {
    sock = socket(item->ai_family, item->ai_socktype, item->ai_protocol);

    if (sock == -1)
        continue;

    if (connect(sock, item->ai_addr, item->ai_addrlen) != -1) {
        fprintf(stderr, "Connected successfully.\n");
        break;
    }

    close(sock);
    sock = -1;
}

if (sock == -1)
    fprintf(stderr, "Failed to connect.\n");

For UDP, connect() succeeds without contacting the other side (if you use connect() with UDP at all). Therefore, you might want to perform additional actions (such as sending a message and receiving a reply) before crying out "success!".
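A minimal Python sketch of that extra step, assuming the peer answers any datagram (for example an echo service). udp_peer_answers() and its probe payload are hypothetical; a real protocol would send one of its own requests:

```python
import socket

def udp_peer_answers(host, port, payload=b"ping", timeout=1.0):
    """Hypothetical helper: return True if some address of host:port
    replies to a UDP probe. connect() on a UDP socket only records the
    peer address, so success is decided by a request/reply round trip."""
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_DGRAM):
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            try:
                sock.connect(sockaddr)  # sends nothing, succeeds even if nobody listens
                sock.send(payload)
                sock.recv(1024)         # only a reply proves the peer is there
                return True
            except OSError:             # timeout, ICMP port unreachable, no route...
                continue
    return False
```

On Linux, a probe to a closed local port typically fails quickly with ConnectionRefusedError (reported on the connected socket), while an unreachable remote peer simply runs into the timeout.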

Freeing getaddrinfo() results

When we're done with the results, we'll free the linked list.

freeaddrinfo(result);

Using getaddrinfo() in Python

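Python's socket.getaddrinfo() API tries to be a little bit more sane than the C API: it returns a list of (family, type, proto, canonname, sockaddr) tuples, raises socket.gaierror on failure and needs no freeaddrinfo().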

#!/usr/bin/python3

import sys, socket

host = "www.fedoraproject.org"
service = "http"
family = socket.AF_UNSPEC
socktype = socket.SOCK_STREAM
protocol = socket.IPPROTO_TCP
flags = 0

try:
    result = socket.getaddrinfo(host, service, family, socktype, protocol, flags)
except socket.gaierror as e:
    print("Name resolution failed: {}".format(e), file=sys.stderr)
    sys.exit(1)

sock = None
# Each result is a (family, type, proto, canonname, sockaddr) tuple.
for af, st, proto, canonname, sockaddr in result:
    try:
        sock = socket.socket(af, st, proto)
    except OSError:
        continue
    try:
        sock.connect(sockaddr)
        print("Successfully connected to: {}".format(sockaddr))
    except OSError:
        sock.close()
        sock = None
        continue
    break

if sock is None:
    print("Failed to connect.", file=sys.stderr)
    sys.exit(1)
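The standard library also wraps this whole resolve-and-try loop: socket.create_connection() calls getaddrinfo() for SOCK_STREAM and tries each returned address in turn, raising the error from the last attempt if none of them connects. A minimal sketch; connect_first() is a hypothetical name:

```python
import socket

def connect_first(host, port, timeout=5.0):
    """Hypothetical wrapper: return a TCP socket connected to the first
    address of host:port that accepts the connection."""
    # create_connection() runs getaddrinfo() and the try-each-result loop.
    return socket.create_connection((host, port), timeout=timeout)

# Usage (needs network access):
#     with connect_first("www.fedoraproject.org", 80) as sock:
#         print("Successfully connected to: {}".format(sock.getpeername()))
```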

Tweaking getaddrinfo() flags

  • AI_NUMERICHOST: use literal address, don't perform host resolution
  • AI_PASSIVE: return socket addresses suitable for bind() instead of connect(), sendto() and sendmsg()
  • AI_NUMERICSERV: use numeric service, don't perform service resolution
  • AI_CANONNAME: store the canonical name in ai_canonname of the first result
  • AI_ADDRCONFIG: this never really worked, as far as I know
  • AI_V4MAPPED+AI_ALL: only with AF_INET6, return IPv6 addresses together with IPv4 addresses mapped into the IPv6 space
  • AI_V4MAPPED: I don't see any real use for this; it only returns mapped IPv4 addresses if there are no IPv6 addresses
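Some of these flags can be observed offline with numeric hosts and ports. A minimal sketch, assuming glibc's behavior for a NULL host (None in Python); first_sockaddr() is a hypothetical helper restricted to AF_INET so the output is predictable:

```python
import socket

def first_sockaddr(host, port, flags):
    """Hypothetical helper: sockaddr of the first IPv4 TCP result."""
    result = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM,
                                0, flags)
    return result[0][4]

# AI_PASSIVE: a NULL (None) host yields the wildcard address, meant for bind().
print(first_sockaddr(None, 8080, socket.AI_PASSIVE))  # ('0.0.0.0', 8080)
# Without AI_PASSIVE, a NULL host yields the loopback address, meant for connect().
print(first_sockaddr(None, 8080, 0))                  # ('127.0.0.1', 8080)

# AI_NUMERICHOST: a literal is accepted, a name is rejected without any lookup.
print(first_sockaddr("192.0.2.1", 8080, socket.AI_NUMERICHOST))  # ('192.0.2.1', 8080)
try:
    first_sockaddr("localhost", 8080, socket.AI_NUMERICHOST)
except socket.gaierror as e:
    print("AI_NUMERICHOST rejected a name:", e)
```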

Using getaddrinfo() for multiple transport channels

Some applications need to open several TCP or UDP channels to the same host. The classic usage of getaddrinfo() returns a linked list of addrinfo objects for just one channel.

Solutions:

1) Run getaddrinfo() once per channel. This may for example cause multiple DNS requests for the same information.

2) Run getaddrinfo() only for the main (or first) channel. When connection succeeds, reuse the addrinfo structure for the other channels. It is usually safe to assume that when one channel succeeds, the machine is available for the other channels, too.

An example of such an application is Spice. Thanks to David Jaša for information on this subject.
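Solution 2 could look like this in Python. This is a sketch under stated assumptions: open_channels() is a hypothetical helper, all channels are TCP, and the other channels are assumed to live on the same address as the main one, differing only in the port:

```python
import socket

def open_channels(host, ports):
    """Hypothetical helper: run getaddrinfo() once for the main (first)
    port, then reuse the address that worked for the remaining ports."""
    main_port, *other_ports = ports
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, main_port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        main = socket.socket(family, socktype, proto)
        try:
            main.connect(sockaddr)
        except OSError:
            main.close()
            continue
        channels = [main]
        try:
            for port in other_ports:
                extra = socket.socket(family, socktype, proto)
                channels.append(extra)
                # Same address (plus flowinfo/scope_id for IPv6), different port.
                extra.connect((sockaddr[0], port) + tuple(sockaddr[2:]))
        except OSError:
            for sock in channels:
                sock.close()
            raise
        return channels
    raise OSError("cannot connect to any address of {!r}".format(host))
```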

Binding to addresses using getaddrinfo()

According to the manual page, you should use the AI_PASSIVE flag when you use getaddrinfo() to retrieve addresses to bind() to. Those addresses usually come from user configuration, with NULL as the default value.

getaddrinfo() returns a linked list of addrinfo structures. Developers should not assume that it returns only one address, nor that the first address is the best and only one to bind() to.

The general idea is that one would loop through the getaddrinfo() results and bind() one socket to each of them. But this doesn't work in general. Read on.

Binding to the INADDR_ANY and/or in6addr_any addresses

This is the most common option that doesn't limit the service to a particular set of local addresses.

getaddrinfo(NULL, ...) with AI_PASSIVE returns two addresses, 0.0.0.0 and ::, in this order. If you use the general rule above, the resulting actions would look like:

sock1 = socket(AF_INET, SOCK_STREAM, SOL_TCP)
bind(sock1, INADDR_ANY, sizeof (INADDR_ANY))
...
sock2 = socket(AF_INET6, SOCK_STREAM, SOL_TCP)
bind(sock2, in6addr_any, sizeof (in6addr_any))
...

This won't work. The first bind() will successfully bind to 0.0.0.0, while the second tries to bind() in a dualstack manner, taking both :: and 0.0.0.0 (unless sysctl net.ipv6.bindv6only is enabled) and therefore it will fail as 0.0.0.0 is already taken.
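The conflict can be reproduced on Linux. A minimal sketch, assuming nothing else uses the chosen port; dualstack_bind_conflicts() is hypothetical and explicitly clears IPV6_V6ONLY, so it gets the dualstack behavior regardless of net.ipv6.bindv6only:

```python
import errno
import socket

def dualstack_bind_conflicts(port):
    """Hypothetical helper. Returns True if a dualstack bind() to [::]:port
    fails with EADDRINUSE because 0.0.0.0:port is already bound, False if
    it succeeds, and None if IPv6 sockets are unusable on this system."""
    sock4 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock4.bind(("0.0.0.0", port))          # first result: 0.0.0.0
        try:
            sock6 = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
        except OSError:
            return None
        with sock6:
            # Request the dualstack behaviour explicitly (as with bindv6only = 0).
            sock6.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
            try:
                sock6.bind(("::", port))       # second result: ::
            except OSError as e:
                if e.errno == errno.EADDRINUSE:
                    return True
                if e.errno == errno.EADDRNOTAVAIL:
                    return None
                raise
            return False
    finally:
        sock4.close()
```

On a Linux system with working IPv6 sockets, calling it with a free port is expected to return True.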

The addresses are obviously returned in a different order than usual (IPv4 first), and that should probably be fixed in glibc. But even if it were fixed, it would only change the order of the actions:

sock1 = socket(AF_INET6, SOCK_STREAM, SOL_TCP)
bind(sock1, in6addr_any, sizeof (in6addr_any))
...
sock2 = socket(AF_INET, SOCK_STREAM, SOL_TCP)
bind(sock2, INADDR_ANY, sizeof (INADDR_ANY))
...

The first socket succeeds and binds to both addresses. The second one fails. If the application ignores (or only warns about) the second failure, it works without problems.

The correct™ way to handle this with getaddrinfo() would be to prevent the Linux kernel from using the dualstack hack by setting IPV6_V6ONLY on the IPv6 socket before calling bind():

sock1 = socket(AF_INET6, SOCK_STREAM, SOL_TCP)
setsockopt(sock1, IPPROTO_IPV6, IPV6_V6ONLY, &yes, sizeof(int))
bind(sock1, in6addr_any, sizeof (in6addr_any))
...
sock2 = socket(AF_INET, SOCK_STREAM, SOL_TCP)
bind(sock2, INADDR_ANY, sizeof (INADDR_ANY))
...
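A runnable version of the pseudocode above, as a Python sketch; listen_on_wildcards() is a hypothetical name and the port stands for whatever the application is configured with:

```python
import socket

def listen_on_wildcards(port):
    """Hypothetical helper: one IPv6-only socket on :: and one IPv4 socket
    on 0.0.0.0, both listening on the same TCP port."""
    sock1 = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, socket.IPPROTO_TCP)
    sock2 = None
    try:
        # Turn off the dualstack hack so :: does not also claim 0.0.0.0.
        sock1.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
        sock1.bind(("::", port))
        sock1.listen()

        sock2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
        sock2.bind(("0.0.0.0", port))
        sock2.listen()
    except OSError:
        sock1.close()
        if sock2 is not None:
            sock2.close()
        raise
    return sock1, sock2
```

With IPV6_V6ONLY set, the order of the two bind() calls no longer matters.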

After writing this, I found a resource (http://funcptr.net/2012/08/07/ipv6----getaddrinfo%28%29-and-bind%28%29-ordering-with-v6only/) from Bert JW Regeer on the same subject.

...

When using getaddrinfo(), the host argument can be an IPv4 or IPv6 address or even a name to be resolved. Typically, getaddrinfo() only returns multiple items when used to resolve a name. One exception is when it is called with a NULL host.

The following test shows how it works:

getaddrinfo-test-ai-passive.py:

#!/usr/bin/python3
from socket import *
hosts = [
    None,
    "localhost",
    "info.nix.cz",
    "www.google.com",
]
for host in hosts:
    print(host)
    for item in getaddrinfo(host, "http", AF_UNSPEC, SOCK_STREAM, SOL_TCP, AI_PASSIVE):
        print("  ", item)

Running the script gives:

# ./getaddrinfo-test-ai-passive.py
None
   (2, 1, 6, '', ('0.0.0.0', 80))
   (10, 1, 6, '', ('::', 80, 0, 0))
localhost
   (10, 1, 6, '', ('::1', 80, 0, 0))
   (2, 1, 6, '', ('127.0.0.1', 80))
info.nix.cz
   (10, 1, 6, '', ('2a02:38::1001', 80, 0, 0))
   (2, 1, 6, '', ('195.47.235.3', 80))
www.google.com
   (10, 1, 6, '', ('2a00:1450:4016:801::1012', 80, 0, 0))
   (2, 1, 6, '', ('173.194.35.144', 80))
   (2, 1, 6, '', ('173.194.35.145', 80))
   (2, 1, 6, '', ('173.194.35.147', 80))
   (2, 1, 6, '', ('173.194.35.148', 80))
   (2, 1, 6, '', ('173.194.35.146', 80))

In the first case, the returned addresses are 0.0.0.0 and :: (in this order). On a Linux system (with sysctl net.ipv6.bindv6only = 0), binding to :: implies also binding to 0.0.0.0.

If you want to bind() to the wildcard address in a dualstack manner (the first case), you have to choose one of the following solutions:

1) Never supply NULL to getaddrinfo() when using the result for bind(); the AI_PASSIVE flag would then lose its meaning. bind() to the first returned address.

2) bind() to the first returned address. Fix getaddrinfo() so that it returns in6addr_any first. Then rely on the dualstack feature provided by the kernel.

The "localhost" case works with either of them. But if you look at the third case, there's no trick like v6only and you have to bind() to both of the addresses. The next solutions cover it:

3) Call getaddrinfo() separately for AF_INET and AF_INET6 and bind() separately.

But this doesn't help when you have multiple addresses per family (the fourth case). The only real solution to that is:

4) Bind to each address returned by getaddrinfo(). Use IPV6_V6ONLY to avoid collisions between dualstack and singlestack sockets.

Of course, there is still the possibility to support only numeric listening addresses, either with or without IPV6_V6ONLY, as well as the possibility to not support listening addresses at all.
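Solution 4 could be sketched as follows. bind_all() is a hypothetical helper; hosts stands in for the user-configured listening addresses (None meaning the wildcard), and IPV6_V6ONLY is set on every IPv6 socket so that :: and 0.0.0.0 can coexist. The sketch skips address families the system cannot create sockets for, but gives up on any bind() failure:

```python
import socket

def bind_all(hosts, port):
    """Hypothetical helper: bind one listening TCP socket to every address
    getaddrinfo() returns for each configured host (None = wildcard)."""
    sockets = []
    for host in hosts:
        for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
                host, port, socket.AF_UNSPEC, socket.SOCK_STREAM, 0,
                socket.AI_PASSIVE):
            try:
                sock = socket.socket(family, socktype, proto)
            except OSError:
                continue            # address family not supported here
            if family == socket.AF_INET6:
                # Keep :: from also claiming 0.0.0.0 (the dualstack hack).
                sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
            try:
                sock.bind(sockaddr)
                sock.listen()
            except OSError:
                sock.close()
                for s in sockets:
                    s.close()
                raise
            sockets.append(sock)
    return sockets
```

For example, bind_all([None], 8080) would be expected to return two sockets on a dualstack host, and bind_all(["127.0.0.1", "::1"], 8080) one per loopback address.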

Flag AI_ADDRCONFIG considered harmful

As far as I know, AI_ADDRCONFIG was added for the following reasons:

  • Some buggy DNS servers would be confused by AAAA requests
  • Optimization of the number of DNS queries

Currently, I'm aware of several documents that define AI_ADDRCONFIG:

  • POSIX.1-2008: useless but harmless
  • RFC 3493 (informational): useless but (partially) breaks IPv4/IPv6 localhost
  • RFC 2553 (obsolete informational): useless but hopefully harmless
  • GLIBC getaddrinfo(3): like RFC 3493

Actual GLIBC getaddrinfo() behavior differs from the manual page.

Problem statement

Currently, each of the definitions above prevents AI_ADDRCONFIG from filtering out IPv6 addresses when a link-local IPv6 address is present. These addresses are automatically added to interfaces that are otherwise only configured for IPv4. Therefore, on a typical Linux system, AI_ADDRCONFIG cannot meet its goals and is effectively useless.

Moreover, it builds on a false assumption: that no IPv4 (or IPv6) communication is feasible without a non-loopback address of that protocol. Why would we have a loopback address if we couldn't use it for node-local communication? AI_ADDRCONFIG breaks localhost, localhost4, localhost6, 127.0.0.1, ::1 and more if there's no non-loopback address of the respective protocol.

This can happen when the computer is connected to an IPv4-only or an IPv6-only network, when it loses IPv4 or IPv6 connectivity, and when it's used offline.

Making AI_ADDRCONFIG useful

A possible solution for the first problem (that AI_ADDRCONFIG is useless) is to treat link-local addresses the same as loopback (or node-local) addresses. But this is even more harmful.

Fedora's GLIBC was patched to do exactly that. As a consequence, even link-local IPv6 stopped working when a global IPv6 address was absent. And why would we have link-local addresses if they didn't work without global addresses? This patch has already been reverted.

Conclusion

The whole idea of filtering out non-DNS addresses is flawed and breaks many things, including IPv4 and IPv6 literals. There is no reason to filter them out.

Proposed solutions:

1) Make getaddrinfo() ignore AI_ADDRCONFIG. Since it has not been working for years and nobody cared enough to fix it, there is a substantial probability that it's not needed. Remove the code that implements it (patch).

2) Patch all software to avoid using AI_ADDRCONFIG. Follow new development, and prevent/reject modifications that add it.

3) Only process AI_ADDRCONFIG in the nsswitch DNS plugin. This requires implementing getaddrinfo() in nsswitch, which is required for zeroconf networking anyway. Use solution (1) as a temporary fix.

The first solution is trivial. The second is rather political. The third solution has so far been the most acceptable one. It is the outcome of long discussions between me (Pavel Šimerda) and Tore Anderson, who explained the original purpose of AI_ADDRCONFIG to me.

More resources:

Comments and discussion

Please send any remarks and questions to psimerda-at-redhat-dot-com or use Talk:Networking. Edit with care.