On a beautiful autumn day in early November, we received a concerning GuardDuty alert, Backdoor:EC2/C&CActivity.B!DNS, which means that an EC2 instance queried a domain name associated with a known C2 (command and control) server. Here is what we knew based on the alert alone:

  1. The activity took place in the middle of the night, which didn’t exactly scream normal.
  2. The EC2 instance in question was actually an EKS (Elastic Kubernetes Service) worker node, so we were dealing with a Kubernetes workload here. Let’s call this instance Node E.

Now, there were services that had been known to trigger this alert before, so we were not quite jumpy yet, but we started an investigation to understand what was going on.

In order to determine all the services that could have been running on Node E, and thus be responsible for the DNS query, we did the following, among many other things:

  1. Looked at the pods currently running on Node E -> nothing suspicious there.
  2. Checked the kube audit logs for any pods deleted on Node E between the time of the alert and a few days prior -> only one potential service, but as soon as we talked to the service owner, it was clear that the service couldn’t be responsible for the lookup.
  3. Checked the Falco logs for any alert that might signal a compromise -> nothing there either.

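For step 2, the audit-log check boils down to filtering delete events on pods. Here is a minimal sketch of that filter against a hypothetical sample file (the names and namespaces are made up for illustration; real audit events follow the audit.k8s.io/v1 schema, and in practice you would read your cluster’s configured audit log backend):

```shell
# Hypothetical sample of Kubernetes audit events (audit.k8s.io/v1 schema),
# written as JSON lines. Real events carry many more fields.
cat > /tmp/audit-sample.jsonl <<'EOF'
{"kind":"Event","apiVersion":"audit.k8s.io/v1","verb":"delete","objectRef":{"resource":"pods","namespace":"media","name":"uploader-7d4f"}}
{"kind":"Event","apiVersion":"audit.k8s.io/v1","verb":"get","objectRef":{"resource":"pods","namespace":"media","name":"uploader-7d4f"}}
{"kind":"Event","apiVersion":"audit.k8s.io/v1","verb":"delete","objectRef":{"resource":"configmaps","namespace":"media","name":"settings"}}
EOF

# Keep only pod deletions (grep as a quick stand-in for a proper jq query).
grep '"verb":"delete"' /tmp/audit-sample.jsonl | grep '"resource":"pods"'
```

Only the deleted pod survives the filter; cross-referencing its node assignment against Node E is then a second lookup.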
Almost two hours into the investigation, we were at a loss. We knew all the services running on Node E at the time of the event, but we could not tie the action to a single specific pod. Just when we were about to lose all hope, the SRE engineer who was helping us with the investigation yelled out the answer.


As it turned out, the explanation was quite simple, but it requires a basic understanding of how DNS resolution works in Kubernetes, and specifically in EKS, so here goes.

If you exec into a pod running in EKS and view the contents of the /etc/resolv.conf file, it may look like this:

nameserver <cluster IP> # you may see a different IP of course
search yournamespace.svc.cluster.local svc.cluster.local cluster.local ec2.internal
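Side note: pods also typically get `options ndots:5` in their resolv.conf (not shown above), which means any short name is tried against every search suffix before being tried as-is. A rough sketch of the candidate names a resolver would generate (`example.com` is just an illustrative name):

```shell
# Candidate FQDNs a resolver tries for a name with fewer than ndots (5) dots,
# given the search list from the resolv.conf above.
name="example.com"
search="yournamespace.svc.cluster.local svc.cluster.local cluster.local ec2.internal"

for suffix in $search; do
  echo "try: ${name}.${suffix}"
done
echo "try: ${name}."   # finally, the name as originally given
```

So even a single external lookup first generates several cluster-internal queries, all of which flow through the cluster DNS service.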

Hmmm, what is that DNS server running on <cluster IP>?

>> kubectl get svc -n kube-system
NAME       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE
kube-dns   ClusterIP   <cluster IP>   <none>        53/UDP,53/TCP   53d

So <cluster IP> is the ClusterIP of the CoreDNS service, which has been the default kube-dns implementation since version v1.12 of Kubernetes. Pods running inside EKS use the CoreDNS service’s ClusterIP as the default name server for querying both internal and external DNS records. Where does it point to, though?

>> kubectl describe svc kube-dns -n kube-system
Name:              kube-dns
Namespace:         kube-system
Selector:          k8s-app=kube-dns
Type:              ClusterIP
>> kubectl get deploy -n kube-system -l k8s-app=kube-dns
NAME      READY   UP-TO-DATE   AVAILABLE   AGE
coredns   5/5     5            5           53d
>> kubectl get pods -n kube-system -o wide -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
coredns-5644d7b6d9-r9pw7   1/1     Running   1          53d   <IP A>        <Node A>   <none>           <none>
coredns-5644d7b6d9-w65kz   1/1     Running   1          53d   <IP B>        <Node B>   <none>           <none>
coredns-5644d7b6d9-x9gnf   1/1     Running   1          53d   <IP C>        <Node C>   <none>           <none>
coredns-5644d7b6d9-zjpjm   1/1     Running   1          53d   <IP D>        <Node D>   <none>           <none>
coredns-5644d7b6d9-tkz2q   1/1     Running   1          53d   <IP E>        <Node E>   <none>           <none>

What this means is that when your pod makes a DNS query, it sends the query to CoreDNS’s ClusterIP, which then forwards it to any one of the CoreDNS pods (which most likely run on different nodes!). If the query is for an external domain (one outside the Kubernetes cluster), it is forwarded to predefined resolvers (usually those in /etc/resolv.conf on the host/worker node; use kubectl -n kube-system get configmap coredns -o yaml to confirm).
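For reference, the Corefile in that configmap typically looks roughly like this on an EKS cluster (a sketch of the default; yours may differ, which is exactly why it is worth checking):

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf   # external names go to the node's own resolvers
    cache 30
    loop
    reload
    loadbalance
}
```

The `forward . /etc/resolv.conf` line is the key bit: anything the `kubernetes` plugin doesn’t answer gets sent upstream from whichever node the CoreDNS pod happens to be running on.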

Now, back to our case: what probably happened was that a service running on, say, worker Node A let a user upload media from a domain that was flagged as suspicious by GuardDuty’s threat intelligence data. The service sent the DNS query for that domain to the CoreDNS service’s ClusterIP, which happened to forward it to the CoreDNS pod running on Node E. Since the query was for an external domain, that CoreDNS pod forwarded the query to the Amazon DNS server from Node E, where it was running.

As GuardDuty monitors DNS logs from the instance’s perspective, all it saw was a DNS query made from Node E, so when the domain matched the threat list, an alert was created and sent us on a wild goose chase that fine day. It wasn’t really GuardDuty’s fault; we only had our own lack of understanding of DNS resolution in Kubernetes to blame. However, it would be nice if GuardDuty could monitor CoreDNS logs too ;)
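In the meantime, if you want per-pod attribution yourself, CoreDNS ships a `log` plugin that prints every query along with the client IP, i.e. the pod that actually asked. A sketch of the one-line change to the coredns configmap (the ellipsis stands for the rest of your existing Corefile):

```
.:53 {
    log    # logs each query together with the source (pod) IP
    errors
    ...
    forward . /etc/resolv.conf
}
```

With that in place, a future node-level alert could be matched against CoreDNS query logs to find the originating pod IP, instead of two hours of guesswork.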

Oh well, another day another lesson!