This guide was originally published in the Tetragon documentation. It was first part of the getting started guides, but was later replaced and move to the tutorials section, that was later removed. Since we don’t want to maintain this guide anymore, it would have a good end of life on a blog if it can be useful to some people.
Note that this guide is not a tutorial on how to deploy Tetragon standalone (i.e. without Kubernetes), you can see the container deployment and package deployment guides for that. This is just a walkthrough to try and experiment Tetragon for the first time.
This guide has been tested on Ubuntu 22.04 and 22.10 with respectively kernel
5.15.0
and 5.19.0
on amd64 and arm64 but any recent
distribution
shipping with a relatively recent kernel should work. See the FAQ for further
details on the recommended kernel
versions.
Note that you cannot run Tetragon using Docker Desktop on macOS because of a limitation of the Docker Desktop Linux virtual machine. Learn more about this issue and how to run Tetragon on a Mac computer in this section of the FAQ page.
Start Tetragon
The easiest way to start experimenting with Tetragon is to run it via Docker using the released container images. If you prefer running directly the binaries without Docker, please refer to the development setup to see how to build the binaries.
docker run --name tetragon-container --rm --pull always \
--pid=host --cgroupns=host --privileged \
-v /sys/kernel/btf/vmlinux:/var/lib/tetragon/btf \
quay.io/cilium/tetragon:v1.0.2
Let’s break down the previous command options:
Option | Description |
---|---|
--name tetragon-container | Names our container so that we can refer to it later via a simple name. |
--rm | Deletes the container once it exits. |
--pull always | Forces Docker to download the latest existing image corresponding to the latest mutable tag. |
--pid=host | Needed to read the procfs for gathering context on processes started before Tetragon. |
--cgroupns=host | Needed to detect container feature. |
--privileged | Needed for Tetragon to load its BPF programs. |
-v /sys/kernel/btf/vmlinux:/var/lib/tetragon/btf | Mounts the BTF file inside the Tetragon container filesystem. Tetragon needs a BTF file corresponding to your kernel version to load its BPF programs. |
quay.io/cilium/tetragon:v1.0.2 | Version v1.0.2 of Tetragon. |
Observe Tetragon base events
With this default configuration, Tetragon already loaded its base sensors to perform process lifecycle observability.
To quickly see the events, you can use the tetra
CLI already shipped in the
Tetragon container that was just started, it will connect to the Tetragon gRPC
server listening on localhost:54321
. In another terminal, type the following
command:
docker exec tetragon-container tetra getevents -o compact
In a different terminal, you can start a shell like sh
and start executing
programs. Here, the commands executed in the shell where ls
, uname -a
,
whoami
, and exit
, the output should be similar to this:
🚀 process /usr/bin/sh
🚀 process /usr/bin/ls
💥 exit /usr/bin/ls 0
🚀 process /usr/bin/uname -a
💥 exit /usr/bin/uname -a 0
🚀 process /usr/bin/whoami
💥 exit /usr/bin/whoami 0
💥 exit /usr/bin/sh 0
You can see the start of sh
first, then the execution of the multiple
programs, ls
, uname
and whoami
, and finally the exit of sh
.
Note that, since Tetragon monitors all processes on the host, you may observe events unrelated to what was typed in the shell session. Indeed, Tetragon is monitoring every process start and exit in your environment.
Write a TracingPolicy to extend Tetragon events
Tetragon by default observes only process lifecycles. More advanced use-cases, require configuring Tetragon via TracingPolicies. A TracingPolicy is a user-configurable Kubernetes custom resource (CR) that allows users to access additional functionality, such as tracing arbitrary events in the kernel and, optionally, define actions to take on a match. These actions can be, for example, used to implement enforcement.
For more details about TracingPolicies and how to write them, refer to the TracingPolicy documentation.
Observability with TracingPolicy
Let’s start with a simple policy:
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
name: "cat-open"
spec:
kprobes:
- call: "sys_openat"
syscall: true
args:
- index: 0
type: "int"
- index: 1
type: "string"
- index: 2
type: "int"
selectors:
- matchBinaries:
- operator: In
values:
- "/usr/bin/cat"
Copy this policy in a file named tracing_policy.yaml
in your current working
directory and restart Tetragon with the following command:
docker run --name tetragon-container --rm --pull always \
--pid=host --cgroupns=host --privileged \
-v $PWD/tracing_policy.yaml:/tracing_policy.yaml \
-v /sys/kernel/btf/vmlinux:/var/lib/tetragon/btf \
quay.io/cilium/tetragon:v1.0.2 \
--tracing-policy /tracing_policy.yaml
We added two flags to our commands:
-v $(pwd)/tracing_policy.yaml:/tracing_policy.yaml
mounts the created local file into the container filesystem.--tracing-policy /tracing_policy.yaml
indicates Tetragon to load the policy in the provided file.
Now again, in another terminal, open the tetra
CLI to see the new events
generated by Tetragon, monitoring the openat
system call.
docker exec tetragon-container tetra getevents -o compact
And in the third terminal, just read the tracing_policy.yaml
file we created
using cat with:
cat tracing_policy.yaml
The output from the tetra
CLI should be similar to:
🚀 process /usr/bin/cat tracingpolicy.yaml
📬️ openat /usr/bin/cat /etc/ld.so.cache
📬️ openat /usr/bin/cat /lib/aarch64-linux-gnu/libc.so.6
📬️ openat /usr/bin/cat tracingpolicy.yaml
💥 exit /usr/bin/cat tracingpolicy.yaml 0
Note that, the syscall we choose is openat
and not open
because the glibc
and most of the implementations of the standard C library use the openat
syscall for their open
wrapper. So cat
on Linux is using openat
.
We can see that during the execution of cat
, because it is dynamically
linked, it used the openat
syscall to open:
- the cache file for shared libraries
ld.so.cache
; - the
libc.so.6
library; - the file we actually wanted to read
tracing_policy.yaml
.
This is the kind of information we could obtain by executing the program with
strace
for example, but here Tetragon obtains this information directly from
the kernel, transparently for the executed program.
Please note that tetra
CLI, with the -o compact
output format,
automatically tried to decode the second argument as a string and printed it
because it has the hardcoded knowledge that the openat
syscall has an
interesting string as second argument.
You can use tetra getevents
to retrieve the whole JSON events and see the
complete fields and values retrieved from the event. In our situation the
output should be similar to the following.
{
"process_kprobe": {
"process": {
"exec_id": "OjEwMTgzOTUyNjg1NjcwNDoyNjIzNw==",
"pid": 26237,
"uid": 1000,
"cwd": "/home/ubuntu",
"binary": "/usr/bin/cat",
"arguments": "tracing_policy.yaml",
"flags": "execve clone",
"start_time": "2023-04-06T18:30:03.405730761Z",
"auid": 1000,
"parent_exec_id": "OjEwMTczMDEyNTQ3MjUxMDoyNjIwOA==",
"refcnt": 1
},
"parent": {
"exec_id": "OjEwMTczMDEyNTQ3MjUxMDoyNjIwOA==",
"pid": 26208,
"uid": 1000,
"cwd": "/home/ubuntu",
"binary": "/bin/bash",
"flags": "execve clone",
"start_time": "2023-04-06T18:28:14.004347210Z",
"auid": 1000,
"parent_exec_id": "OjEwMTczMDEwMjc2NTg0NjoyNjIwNw==",
"refcnt": 2
},
"function_name": "__x64_sys_openat",
"args": [
{
"int_arg": -100
},
{
"string_arg": "tracing_policy.yaml"
},
{
"int_arg": 0
}
],
"action": "KPROBE_ACTION_POST"
},
"time": "2023-04-06T18:30:03.406205829Z"
}
You can see that Tetragon collected information on the process itself, with for
example, the uid, the binary name, the start_time and much more. It also
indicates process parent, /bin/bash
in this case. This is the common base of
information we also get from the process lifecycle sensors. Finally, we can see
that Tetragon hooked the function __x64_sys_openat
(for an arm64 host it
would be __arm64_sys_openat
) and that it also collected the arguments’ value.
Enforcement with TracingPolicy
Let’s see how we can modify the previous TracingPolicy to do enforcement with Tetragon.
Please note that this example is just an illustration and should not be used in
a production environment. There are many challenges in building a
production-grade policy, that go beyond the scope of this guide. For example,
filtering an argument for security can be difficult, especially with file paths.
The openat
syscall can take relative paths as arguments and even absolute path
filtering can often be bypassed by pseudo filesystems like procfs, for example
using a prefix like /proc/self/root/
to access /
.
Overriding return values
First, let’s say we don’t want to stop any program that would try to open the
tracing_policy.yaml
but just stop them from actually using the syscall to
open the file, as if they were not unauthorized to access the file. We could
write the following TracingPolicy.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
name: "file-access"
spec:
kprobes:
- call: "sys_openat"
syscall: true
args:
- index: 0
type: "int"
- index: 1
type: "string"
selectors:
- matchBinaries:
- operator: In
values:
- "/usr/bin/cat"
matchArgs:
- index: 1
operator: "Equal"
values:
- "tracing_policy.yaml"
matchActions:
- action: Override
argError: -1
In the above TracingPolicy, we removed the third argument because we don’t
need to extract it specifically, and we added two filters in our existing
selector. One matchArgs
to filter on the value of the index 1 argument, which
contains the pathname to access, and one matchActions
to perform on action on
a such match.
Replace the content of the policy in the tracing_policy.yaml
file with the
above and restart Tetragon with the same command as before. If we open tetra
with docker exec tetragon-container tetra getevents -o compact
and perform a
cat tracing_policy.yaml
on the side, we should now observe an output similar
to:
🚀 process /usr/bin/cat tracing_policy.yaml
📬️ openat /usr/bin/cat tracing_policy.yaml
💥 exit /usr/bin/cat tracing_policy.yaml 1
And the cat tracing_policy.yaml
should return the following error:
cat: tracing_policy.yaml: Operation not permitted
Sending a SIGKILL to the process
Now, let’s say that accessing this file in this manner is critical and we want
to stop any process that tries to perform such action immediately. We just need
to modify the action
in the matchActions
filter from Override
and add
Sigkill
.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
name: "file-access"
spec:
kprobes:
- call: "sys_openat"
syscall: true
args:
- index: 0
type: "int"
- index: 1
type: "string"
selectors:
- matchBinaries:
- operator: In
values:
- "/usr/bin/cat"
matchArgs:
- index: 1
operator: "Equal"
values:
- "tracing_policy.yaml"
matchActions:
- action: Override
argError: -1
- action: Sigkill
Again, replace the content of the policy in the tracing_policy.yaml
file with
the above and restart Tetragon with the same command as before. If we open
tetra with docker exec tetragon-container tetra getevents -o compact
and
perform a cat tracing_policy.yaml
on the side, we should now observe an
output similar to:
🚀 process /usr/bin/cat tracing_policy.yaml
📬️ openat /usr/bin/cat tracing_policy.yaml
💥 exit /usr/bin/cat tracing_policy.yaml SIGKILL
And the cat tracing_policy.yaml
should return an error (that is actually
returned by the shell that invoked the program):
Killed
Now the process will not have the time to catch the error and will be synchronously terminated as soon as he triggers the kernel function we hooked with our specified argument values.
Again, the specific filtering is only given as an example, just reading the
file using cat ./tracing_policy.yaml
will bypass the policies presented here.