This post is a living document. Updates made since the original publish date are noted inline as such.
Update 1/11/2011: Nearly all of this article is bunk now, as it turns out the GNU libc developers consider getent (the basis of everything below) to be a debugging tool only. As such, it does abnormal things. I suggested the man page should indicate as much. Anyway, here it is…
Let’s examine hostname resolution on a RHEL 5.5 box on a Sunday night. I was inspired by reading Down the ‘ls’ Rabbit Hole two weeks ago. I suspect any other modern Linux distro will produce nearly identical results.
The short summary is:
- Read /etc/resolv.conf
- Try to use nscd
- Try to use nscd again
- Read /etc/nsswitch.conf
- Load libnss_files.so
- Read /etc/host.conf
- Try to find IPv6 address in /etc/hosts
- Load libnss_dns.so
- Load libresolv.so
- Perform DNS IPv6 ‘AAAA’ query
- Try to find IPv4 address in /etc/hosts
- Perform DNS IPv4 ‘A’ query
Read on for the full trace with commentary.
    strace -f getent hosts www.puppetlabs.com
    ...
    open("/etc/resolv.conf", O_RDONLY) = 3
    ...
    close(3) = 0
Looking at the source for GNU libc 2.5 (which is what is installed on this box), it appears that /etc/resolv.conf is loaded in resolv/res_init.c and the explanation is given as:
    /*
     * Resolver state default settings.
     */

    /*
     * Set up default settings. If the configuration file exist, the values
     * there will have precedence. Otherwise, the server address is set to
     * INADDR_ANY and the default domain name comes from the gethostname().
     *
     * An interrim version of this code (BIND 4.9, pre-4.4BSD) used 127.0.0.1
     * rather than INADDR_ANY ("0.0.0.0") as the default name server address
     * since it was noted that INADDR_ANY actually meant ``the first interface
     * you "ifconfig"'d at boot time'' and if this was a SLIP or PPP interface,
     * it had to be "up" in order for you to reach your own name server. It
     * was later decided that since the recommended practice is to always
     * install local static routes through 127.0.0.1 for all your network
     * interfaces, that we could solve this problem without a code change.
     *
     * The configuration file should always be used, since it is the only way
     * to specify a default domain. If you are running a server on your local
     * machine, you should say "nameserver 0.0.0.0" or "nameserver 127.0.0.1"
     * in the configuration file.
     *
     * Return 0 if completes successfully, -1 on error
     */
Okay. I guess. Let’s move on.
    ...
    socket(PF_FILE, SOCK_STREAM, 0) = 3
    fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"...}, 110) = -1 ENOENT (No such file or directory)
    close(3) = 0
    socket(PF_FILE, SOCK_STREAM, 0) = 3
    fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"...}, 110) = -1 ENOENT (No such file or directory)
    close(3) = 0
Why did you check nscd twice?
GNU libc nscd/nscd_helper.c is the only place with a connect() call referencing /var/run/nscd/socket (aka _PATH_NSCDSOCKET as defined in nscd/nscd-client.h). The connect() is in open_socket(), which is referenced in two places:
One:
    /* Try to get a file descriptor for the shared memory segment
       containing the database. */
    static struct mapped_database *
    get_mapping (request_type type, const char *key,
                 struct mapped_database **mappedp)
Two:
    /* Create a socket connected to a name. */
    int
    __nscd_open_socket (const char *key, size_t keylen, request_type type,
                        void *response, size_t responselen)
Here I took it upon myself to try to build the GNU libc code I was referencing. I figured I’d build it with debug symbols and then run getent again under gdb. The build with CFLAGS=-g spit out an error saying that it must be built with optimization. So much for that, but I did at least throw in some syslog() calls. They show that the two attempts to connect to the nscd socket above come from the two referenced functions, one attempt from each.
Update 1/11/2011: This shows my lack of gdb knowledge. One doesn’t need to build in debugging symbols to see what I am trying to see. Commenter Dave W. shows that below with his traces.
    Jan 3 03:57:33 new-host-2 getent: get_mapping() trying to open nscd socket with open_socket()
    Jan 3 03:57:33 new-host-2 getent: __nscd_open_socket() trying to open nscd socket with open_socket() with open_socket()
Is that correct behavior? Could it be better? Beats me. I’m only taking it that far, but it doesn’t seem ideal.
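To see the same ENOENT outside of strace, here is a minimal Python sketch that attempts one connect() to a Unix stream socket and reports the errno name. The helper name `probe_unix_socket` is made up for illustration, and unlike the trace it uses a plain blocking connect rather than setting O_NONBLOCK first, so treat it as an approximation of what glibc does, not a copy of it.

```python
import errno
import socket

# Path taken from the connect() calls in the trace above.
NSCD_SOCKET = "/var/run/nscd/socket"


def probe_unix_socket(path):
    """Try one connect() to a Unix stream socket.

    Returns "OK" on success, otherwise the symbolic errno name
    (for example "ENOENT" when the socket file does not exist).
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return "OK"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    print(NSCD_SOCKET, "->", probe_unix_socket(NSCD_SOCKET))
```

On a box where nscd is not running, the result should match the ENOENT seen in the trace. Where nscd is running, the connect may succeed instead.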
    ...
    open("/etc/nsswitch.conf", O_RDONLY) = 3
    ...
    close(3) = 0
Now we actually get somewhere. At least we’re reading the right configuration file at this point.
This is generated from GNU libc nss/nsswitch.c
    int
    __nss_database_lookup (const char *database, const char *alternate_name,
                           const char *defconfig, service_user **ni)
    {
      ...
      service_table = nss_parse_file (_PATH_NSSWITCH_CONF);
Fine, moving on.
    open("/lib64/libnss_files.so.2", O_RDONLY) = 3
    ...
    close(3) = 0
This is due to “files” being first in /etc/nsswitch.conf. Fine.
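The link between the hosts line and the library that gets loaded can be sketched in a few lines of Python. This is a simplified reading of the file format, not glibc's parser: the function names are invented for illustration, the sample text assumes a "hosts: files dns" line, and the library naming simply follows the libnss_files.so.2 and libnss_dns.so.2 paths seen in the trace.

```python
import re
from typing import List


def hosts_sources(nsswitch_text: str) -> List[str]:
    """Return the sources named on the 'hosts:' line, in lookup order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            # Drop bracketed action items such as [NOTFOUND=return].
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            return rest.split()
    return []


def library_name(source: str) -> str:
    """Map a source name to the shared object name seen in the trace."""
    return "libnss_{}.so.2".format(source)


if __name__ == "__main__":
    sample = "#hosts: db files nisplus nis dns\nhosts: files dns\n"
    for source in hosts_sources(sample):
        print(source, "->", library_name(source))
```

With "files" listed first, libnss_files.so.2 is the first backend the sketch reports, which lines up with the order of the open() calls in the trace.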
    ...
    open("/etc/host.conf", O_RDONLY) = 3
    ...
    close(3) = 0
The hell? You already found a valid /etc/nsswitch.conf. Why would you query this stupid old legacy file?
nss/getXXbyYY_r.c causes this read of /etc/host.conf
    #ifdef NEED__RES_HCONF
      if (!_res_hconf.initialized)
        _res_hconf_init ();
    #endif /* need _res_hconf */
Turns out this is hardcoded and not managed or overridden in any way by configure.
    [jblaine@new-host-2 glibc-2.5]$ grep "#define NEED__RES_HCONF" */*
    inet/gethstbyad_r.c:#define NEED__RES_HCONF 1
    inet/gethstbynm2_r.c:#define NEED__RES_HCONF 1
    inet/gethstbynm_r.c:#define NEED__RES_HCONF 1
??? – feel free to provide a comment on this below. I don’t understand the need for this nowadays when we have /etc/nsswitch.conf.
    ...
    open("/etc/hosts", O_RDONLY) = 3
    ...
    close(3) = 0
Makes sense finally, at least if this was the result of doing what our /etc/nsswitch.conf said (“files dns”).
Update 1/11/2011: Oddly, this first opening of /etc/hosts is due to trying to resolve www.puppetlabs.com via an IPv6 address.
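A per-family lookup in /etc/hosts can be modeled roughly as below. This is a simplified sketch and not how glibc's files backend is actually written: `lookup_hosts` is a hypothetical helper, the sample file content is invented, and real glibc has extra behavior this ignores. It only shows why one name can mean two passes over the same file, one per address family.

```python
import ipaddress
from typing import List


def lookup_hosts(hosts_text: str, name: str, version: int) -> List[str]:
    """Return addresses of one IP version (4 or 6) listed for a name."""
    wanted = name.lower()
    found = []
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split columns
        if len(fields) < 2:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first column is not a plain IP address
        names = [f.lower() for f in fields[1:]]
        if addr.version == version and wanted in names:
            found.append(fields[0])
    return found


if __name__ == "__main__":
    # Invented sample content, standing in for a real /etc/hosts.
    sample = (
        "127.0.0.1 localhost\n"
        "::1 localhost ip6-localhost\n"
        "192.0.2.10 www.example.test www  # documentation address\n"
    )
    print(lookup_hosts(sample, "www.example.test", 6))  # IPv6 pass
    print(lookup_hosts(sample, "www.example.test", 4))  # IPv4 pass
```

In this model the IPv6 pass finds nothing for a name that has only an IPv4 entry, so a second pass for IPv4 is still needed.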
    open("/lib64/libnss_dns.so.2", O_RDONLY) = 3
    ...
    close(3) = 0
    ...
    open("/lib64/libresolv.so.2", O_RDONLY) = 3
    ...
    close(3) = 0
Fine.
    socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
    connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.1.1")}, 28) = 0
    ...
    sendto(3, "uw\1\0\0\1\0\0\0\0\0\0\3www\npuppetlabs\3com\0"..., 36, MSG_NOSIGNAL, NULL, 0) = 36
    ...
    recvfrom(3, "uw\201\200\0\1\0\1\0\1\0\0\3www\npuppetlabs\3com\0"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.1.1")}, [16]) = 120
    close(3) = 0
DNS traffic, finally.
Update 1/10/2011: After coming back to this little exercise tonight armed with Wireshark, I’ve found that this DNS request is for an IPv6 “AAAA” record. Commenter Dave W. confirmed this below. Again, it seems odd to me that it would try IPv6 first.
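The record type is not visible in the strace output because the sendto() string is cut off right after the encoded name, and the query type sits in the bytes that follow. As a rough illustration, the Python sketch below builds a one-question query in the standard DNS wire format (RFC 1035, with type 28 for AAAA from RFC 3596) and reads the type back out. The function names are invented, and the transaction ID 0x7577 is simply the "uw" seen at the start of the traced payload.

```python
import struct

QTYPE_A = 1      # IPv4 address record (RFC 1035)
QTYPE_AAAA = 28  # IPv6 address record (RFC 3596)
QCLASS_IN = 1    # Internet class


def build_query(name, qtype, txid):
    """Build a minimal one-question DNS query with recursion desired."""
    # Header: ID, flags (0x0100 = RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, QCLASS_IN)


def query_type(packet):
    """Return the QTYPE of the first question (no compression handling)."""
    pos = 12                    # skip the fixed 12-byte header
    while packet[pos] != 0:     # walk the length-prefixed labels
        pos += packet[pos] + 1
    return struct.unpack("!H", packet[pos + 1:pos + 3])[0]


if __name__ == "__main__":
    query = build_query("www.puppetlabs.com", QTYPE_AAAA, txid=0x7577)
    print(len(query), query_type(query))
```

The packet built this way is 36 bytes long, the same length the sendto() call reports, and its first 32 bytes match the string strace printed. Only the last four bytes, type and class, tell an AAAA query from an A query.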
    open("/etc/hosts", O_RDONLY) = 3
    ...
    close(3) = 0
Why? What did this and what is the reason?
Update 1/11/2011: This is the attempt to look it up as an IPv4 address. The following lack of expected syslog() output is still a bit mysterious though.
Opening /etc/hosts happens in 2 GNU libc functions:
One:
    void
    _sethtent(f)
        int f;
    {
        if (!hostf)
            hostf = fopen(_PATH_HOSTS, "r" );
        else
            rewind(hostf);
        stayopen = f;
    }
Two:
    struct hostent *
    _gethtent()
    {
        ...
        if (!hostf && !(hostf = fopen(_PATH_HOSTS, "r" ))) {
            __set_h_errno (NETDB_INTERNAL);
            return (NULL);
        }
        ...
Let’s assume our “problem” is _gethtent(). It’s referenced in three places:
One:
    struct hostent *
    _gethtbyname2(name, af)
        const char *name;
        int af;
Two:
    struct hostent *
    _gethtbyaddr(addr, len, af)
        const char *addr;
        size_t len;
        int af;
Three:
    struct hostent *
    gethostent()
Oddly, even with plenty of syslog() calls in _sethtent() and _gethtent() around the fopen() of /etc/hosts, none of them is ever reached. This odd opening of /etc/hosts remains a mystery.
Moving on.
    ...
    socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
    connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.1.1")}, 28) = 0
    ...
    sendto(3, "\256\261\1\0\0\1\0\0\0\0\0\0\3www\npuppetlabs\3com\0"..., 36, MSG_NOSIGNAL, NULL, 0) = 36
    ...
    recvfrom(3, "\256\261\201\200\0\1\0\2\0\0\0\0\3www\npuppetlabs\3com\0"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.1.1")}, [16]) = 66
    close(3) = 0
    ...
    write(1, "74.207.250.144 puppetlabs.com w"..., 5074.207.250.144 puppetlabs.com www.puppetlabs.com) = 50
    exit_group(0) = ?
Another DNS query before we get our screen output and getent exits. Why?
Update 1/11/2011: This is the IPv4 query of an “A” record finally.
Feel free to chime in.
Jeff: stumbled into your blog when debugging your little puzzle ;-) Hope you don’t mind if I post the answer:
on the first fopen of /etc/hosts you get the following trace:
#4 0x00007f3e3e901b07 in gethostbyname2 (name=0x7fff62c898cd "www.google.com", af=0xa)
at ../nss/getXXbyYY.c:117
*note the af=0xa*
second fopen backtrace:
#4 0x00007f3e3e901b07 in gethostbyname2 (name=0x7fff62c898cd "www.google.com", af=0x2)
at ../nss/getXXbyYY.c:117
*note the af=0x2*
Now we check:
$ grep -Ri AF_INET *
bits/socket.h:#define AF_INET PF_INET
bits/socket.h:#define AF_INET6 PF_INET6
and we follow up with:
bits/socket.h:#define PF_INET 2 /* IP protocol family. */
bits/socket.h:#define PF_INET6 10 /* IP version 6. */
which corresponds with 0x2 and 0xa in the backtraces.
Thanks Dave
Hi Jeff,
Thanks for leaving this up as a framework, even if getent has changed over the years (pretty sure it is still relevant in LDAP and NIS config/troubleshooting.)
I wanted to mention something about the use of /etc/hosts when doing these lookups. Perhaps you have already found this out, but the reason is that /etc/nsswitch.conf controls the order of name resolution.
Here’s an example excerpt (unmodified from stock) :
#####
#hosts: db files nisplus nis dns
hosts: files dns myhostname
#####
The first line is a stock example, showing the available options for the hosts line (the format is the same across the rest of the file as you’ll see).
The second line says that the first source for looking up any hostname will be a file (in this case, /etc/hosts). If /etc/hosts does not resolve the hostname, the configured DNS server (set manually or via DHCP) is tried next, and finally the system’s own hostname is checked (this method is rarely referenced in my experience.)
If you want to make it so your system never checks /etc/hosts (which I would personally avoid doing) then you can remove ‘files’ from that line. Alternatively, you could put ‘dns’ before ‘files’ on that line, so that /etc/hosts is only consulted when DNS does not return an answer:
#####
hosts: dns files myhostname
#####
Hope this helps, someone, somewhere, sometime :)
Regards,
Armin C