On a beautiful autumn day in early November, we received a concerning GuardDuty alert:
Backdoor:EC2/C&CActivity.B!DNS, which basically means that an EC2 instance queried a domain name associated with a known C2 (command and control) server. Here is what we knew based on the alert alone:
- The activity took place in the middle of the night, which didn’t exactly scream normal.
- The EC2 instance in question was actually an EKS (Elastic Kubernetes Service) worker node, so we were dealing with a Kubernetes workload here. Let’s call this instance Node E.
Now, there were services that had been known to trigger this kind of alert before, so we were not quite jumpy yet, but we started the investigation to understand what was going on.
In order to determine all the services that could have been running on Node E, and thus be responsible for the DNS query, we did the following among many other things:
- Look at the current pods running on Node E -> nothing suspicious there
- Check kube audit logs to see if any pods had been deleted on Node E in the past few days -> only one candidate service, but as soon as we talked to the service owner, it was clear that service couldn’t be responsible for the lookup.
- Check Falco logs for any alert that might signal a compromise -> nothing there either
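The audit-log step above can be sketched roughly like this. Note that the file name and the log lines are fabricated for illustration (real audit events carry many more fields), and in practice you would query your logging platform or use jq rather than grep:

```shell
# Fabricated sample of exported kube audit events (JSON, one event per line).
cat > /tmp/audit.sample <<'EOF'
{"verb":"delete","objectRef":{"resource":"pods","name":"old-svc-abc"},"stageTimestamp":"2019-11-05T02:10:00Z"}
{"verb":"get","objectRef":{"resource":"pods","name":"web-1"},"stageTimestamp":"2019-11-05T02:11:00Z"}
EOF

# Crude filter: keep only pod deletion events.
grep '"verb":"delete"' /tmp/audit.sample | grep '"resource":"pods"'
```

From there you would cross-check each deleted pod against the node it was scheduled on and talk to the owners, which is exactly what we did.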
Almost two hours into the investigation, we were at a loss. We knew all the services running on Node E at the time of the event, but we could not tie the action to a single specific pod. Just when we were about to lose all hope, the SRE who was helping us with the investigation had a eureka moment.
As it turned out, the explanation was quite simple, but it requires a basic understanding of how DNS resolution works in Kubernetes, and specifically in EKS, so here goes.
If you go to a pod running in EKS and view the contents of the /etc/resolv.conf file, it may look like this:

nameserver 10.96.0.10   # you may see a different IP of course
search yournamespace.svc.cluster.local svc.cluster.local cluster.local ec2.internal
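As a side note, that search line is why pods can resolve short service names: a name with fewer dots than the resolver’s ndots threshold (5 by default in Kubernetes) is tried with each search suffix before being queried as-is. A minimal sketch of that expansion, using a hypothetical service name "backend":

```shell
# A short name ("backend" here, fewer dots than ndots:5) is tried against
# each search suffix in order before the name is finally queried as typed.
short_name="backend"
search_list="yournamespace.svc.cluster.local svc.cluster.local cluster.local ec2.internal"

for suffix in $search_list; do
  echo "query: ${short_name}.${suffix}"
done
echo "query: ${short_name}."   # last resort: the name as typed (absolute)
```

Every one of those candidate queries goes to the nameserver above, which matters for the rest of this story.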
Hmmm, what is that DNS server?
>> kubectl get svc -n kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP   53d
[TRUNCATED]
10.96.0.10 is the cluster IP of the CoreDNS service, which has been the default DNS server (still exposed under the kube-dns name) since Kubernetes v1.12. Pods running inside EKS use the CoreDNS service’s cluster IP as the default name server for querying both internal and external DNS records. Where does it point to, though?
>> kubectl describe svc kube-dns -n kube-system
Name:              kube-dns
Namespace:         kube-system
[TRUNCATED]
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.96.0.10
[TRUNCATED]

>> kubectl get deploy -n kube-system -l k8s-app=kube-dns
NAME      READY   UP-TO-DATE   AVAILABLE   AGE
coredns   5/5     5            5           53d

>> kubectl get pods -n kube-system -o wide -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE   IP       NODE       NOMINATED NODE   READINESS GATES
coredns-5644d7b6d9-r9pw7   1/1     Running   1          53d   <IP A>   <Node A>   <none>           <none>
coredns-5644d7b6d9-w65kz   1/1     Running   1          53d   <IP B>   <Node B>   <none>           <none>
coredns-5644d7b6d9-x9gnf   1/1     Running   1          53d   <IP C>   <Node C>   <none>           <none>
coredns-5644d7b6d9-zjpjm   1/1     Running   1          53d   <IP D>   <Node D>   <none>           <none>
coredns-5644d7b6d9-tkz2q   1/1     Running   1          53d   <IP E>   <Node E>   <none>           <none>
What this means is that when your pod makes a DNS query, it sends the query to CoreDNS’s ClusterIP, which then forwards it to any one of the CoreDNS pods (which most likely run on different nodes!). If the query is for an external domain (not within the Kubernetes cluster), it will be forwarded to predefined resolvers, usually those in /etc/resolv.conf on the host/worker node. Run kubectl -n kube-system get configmap coredns -o yaml to confirm.
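For reference, that forwarding behaviour is configured by the forward plugin in the Corefile held in that configmap. On EKS the default Corefile looks roughly like this (illustrative; your cluster’s version may differ, hence the configmap command above):

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
      pods insecure
      fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf   # external queries leave via the node's resolvers
    cache 30
    loop
    reload
    loadbalance
}

The forward . /etc/resolv.conf line is the key bit for this story: anything CoreDNS cannot answer from the cluster zone exits through whichever node the serving CoreDNS pod happens to run on.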
Now, back to our case. What probably happened was that a service running on, say, worker Node A let a user upload media from a domain that was deemed suspicious by GuardDuty’s threat intelligence. The service sent the domain query to the CoreDNS service’s ClusterIP, which then happened to forward the query to the CoreDNS pod running on Node E. Since the query was for an external domain, said CoreDNS pod forwarded the query to the Amazon DNS server from Node E, where it was running.
As GuardDuty monitors DNS logs from the instances’ perspective, all it saw was a DNS query made from Node E, so when the domain matched the threat list, an alert was created and sent us on a wild goose chase that fine day. It wasn’t really GuardDuty’s fault; we only had our own lack of understanding of DNS resolution in Kubernetes to blame. Still, it would be nice if GuardDuty could monitor CoreDNS logs too ;)
Oh well, another day another lesson!