What Is Seccomp Security?

Linux has three well-known mechanisms for confining a program: seccomp (Secure Computing), AppArmor (Application Armor) and SELinux (Security-Enhanced Linux). If you are just starting out, seccomp is the most beginner-friendly of the three. That said, using one does not make the others unnecessary. AppArmor and SELinux decide, for example, which files a program may or may not access, whereas seccomp decides which syscalls a program may call. So you get the strongest protection by using them together. Today I'll walk through the basic ideas behind seccomp.

Once a program (process) puts itself into seccomp mode (the original strict mode, SECCOMP_MODE_STRICT), it can make only four syscalls: read (read data from a file descriptor), write (write data to a file descriptor), _exit (terminate the process) and sigreturn (return from a signal handler to the interrupted execution state).
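
Here is a minimal sketch of strict mode, assuming Linux with glibc. The function name strict_mode_outcome is made up for this illustration. It forks a child so that the calling program itself is not locked down. The child enters strict mode with prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT), writes one line, and then either exits through the raw exit syscall or first calls getpid, which is not one of the four permitted calls.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs a child in seccomp strict mode.
 * Returns the signal that terminated the child, 0 if it exited cleanly,
 * or -1 if something went wrong while setting up. */
int strict_mode_outcome(int misbehave) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* From here on only read, write, exit and sigreturn are permitted. */
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0) {
            _exit(1);
        }
        static const char msg[] = "inside strict mode\n";
        ssize_t n = write(STDOUT_FILENO, msg, sizeof msg - 1); /* permitted */
        (void)n;
        if (misbehave) {
            syscall(SYS_getpid); /* not permitted: the kernel kills the child */
        }
        /* Raw exit syscall: glibc's _exit() uses exit_group, which strict
         * mode does not permit. */
        syscall(SYS_exit, 0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    if (WIFSIGNALED(status)) {
        return WTERMSIG(status);
    }
    return WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

Under these assumptions, strict_mode_outcome(0) should report a clean exit, and strict_mode_outcome(1) should report that the child died from SIGKILL.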

Of these four, read, write and _exit are easy to understand, but sigreturn needs a little explanation. Suppose your program has registered a handler with signal() to catch the interrupt signal, SIGINT. When SIGINT is sent to that process, the kernel pauses the program and saves its current execution context: the CPU register state, the program counter, the stack pointer and so on. It then sets up a signal handler trampoline so that control can jump to the handler the program registered earlier. When the handler returns, sigreturn is called and the kernel restores the context it saved, so the program carries on from the point where it was interrupted. That is how you can write programs that don't stop even when you press Ctrl + C.

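#include <signal.h>
#include <unistd.h>

/* Runs when SIGINT (Ctrl + C) arrives.
 * write() is async-signal-safe; printf() is not, so it is avoided here. */
static void handler(int sig) {
    (void)sig;
    static const char msg[] = "Caught SIGINT! Not exiting\n";
    write(STDOUT_FILENO, msg, sizeof msg - 1);
}

int main(void) {
    signal(SIGINT, handler); /* register the handler for SIGINT */
    for (;;) {
        pause();             /* sleep until a signal arrives instead of spinning */
    }
}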

Signals like SIGKILL, of course, cannot be caught this way so that the program keeps running as if nothing happened.
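
A small sketch that makes this visible, assuming a POSIX system. The helper name can_catch is hypothetical. It tries to install a do-nothing handler with sigaction() and reports whether the kernel accepted it; for SIGKILL and SIGSTOP the call is refused.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

/* Handler that does nothing; it only needs to exist. */
static void noop_handler(int sig) {
    (void)sig;
}

/* Hypothetical helper: returns 1 if a handler can be installed for signo,
 * 0 if the kernel refuses (as it does for SIGKILL and SIGSTOP). */
int can_catch(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL) == 0;
}
```

Calling can_catch(SIGINT) should succeed, while can_catch(SIGKILL) should fail with errno set to EINVAL.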

Back to seccomp: when a process in this mode calls anything beyond those four syscalls, the kernel kills the calling thread. Later implementations can also kill the whole process that owns the thread (SECCOMP_RET_KILL_PROCESS, available since Linux 4.14). In practice, though, those four syscalls are rarely enough for a program to do its job properly. So a newer implementation called seccomp-bpf came along. With seccomp-bpf, whenever a syscall arrives from a process that has enabled seccomp, the kernel runs the BPF program attached to that process. The value the BPF program returns decides what happens to the syscall: allow it, log it, simply deny it with an error, kill the caller, and so on.
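
To make the "return value decides" idea concrete, here is a rough sketch of a tiny seccomp-bpf filter written with the raw kernel macros, assuming Linux on x86_64 or aarch64. The names NATIVE_ARCH, install_filter and errno_from_filtered_getppid are invented for this example. The filter returns SECCOMP_RET_ERRNO with EPERM for getppid and SECCOMP_RET_ALLOW for everything else. Real projects usually build filters with libseccomp rather than by hand.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define NATIVE_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define NATIVE_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64 and aarch64"
#endif

/* Hypothetical helper: getppid fails with EPERM, everything else is allowed. */
static int install_filter(void) {
    struct sock_filter insns[] = {
        /* Check the architecture first; kill on a mismatch. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NATIVE_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* Load the syscall number and decide. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof insns / sizeof insns[0]),
        .filter = insns,
    };
    /* Needed so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
        return -1;
    }
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Hypothetical helper: runs getppid in a filtered child and returns the errno
 * the child saw (0 if the call succeeded, -1 if setup failed). */
int errno_from_filtered_getppid(void) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        if (install_filter() != 0) {
            _exit(255);
        }
        errno = 0;
        long rc = syscall(SYS_getppid);
        _exit(rc == -1 ? errno : 0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    int code = WEXITSTATUS(status);
    return code == 255 ? -1 : code;
}
```

The filter is installed in a forked child so the parent stays unrestricted. With this filter in place the child should see getppid fail with EPERM instead of being killed.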

Writing seccomp profiles can be hard unless you have a solid understanding of the roughly 350-400 Linux syscalls, but there are plenty of ready-made profiles on the internet. On Kubernetes, the RuntimeDefault profile type applies the container runtime's default profile out of the box, and the Localhost type lets you point to a profile file of your own on the node.

The main thing is to kill the caller when it makes a high-risk syscall. Examples: kexec_load, which loads a new kernel image into memory so the system can switch straight to the new kernel without going through a reboot, the BIOS or a bootloader; reboot, which can restart the system; bpf, which can load and run eBPF programs; the kernel module calls init_module and delete_module; and mount and umount2, which can end up giving access to the host filesystem. One caveat: with CNI plugins that implement eBPF routing, their agents (the pods of a DaemonSet) need high-risk syscalls such as clone, bpf and perf_event_open. So if you plan to run them on nodes you have specially hardened, you will need to write the seccomp profile carefully.
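
As a rough illustration of that denylist, here is a sketch that kills the process on the syscalls named above, assuming Linux 4.14 or later on x86_64 or aarch64. NATIVE_ARCH, KILL_ON, install_denylist and fate_of_syscall are names invented for this example. It is not a production profile: it does not handle the x32 ABI on x86_64, and a real deployment would more likely use an allowlist built with libseccomp or a JSON profile loaded by the container runtime.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define NATIVE_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define NATIVE_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64 and aarch64"
#endif

/* If the loaded syscall number equals nr, kill the process; else move on. */
#define KILL_ON(nr) \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS)

/* Hypothetical helper: installs the denylist for the calling process. */
static int install_denylist(void) {
    struct sock_filter insns[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NATIVE_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        KILL_ON(SYS_kexec_load),
        KILL_ON(SYS_reboot),
        KILL_ON(SYS_bpf),
        KILL_ON(SYS_init_module),
        KILL_ON(SYS_delete_module),
        KILL_ON(SYS_mount),
        KILL_ON(SYS_umount2),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW), /* everything else */
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof insns / sizeof insns[0]),
        .filter = insns,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
        return -1;
    }
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Hypothetical helper: calls syscall nr (all arguments zero) in a filtered
 * child. Returns the signal that killed the child, 0 if it exited cleanly,
 * or -1 if setup failed. */
int fate_of_syscall(long nr) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        if (install_denylist() != 0) {
            _exit(1);
        }
        syscall(nr, 0L, 0L, 0L, 0L, 0L, 0L);
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    if (WIFSIGNALED(status)) {
        return WTERMSIG(status);
    }
    return WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

A child that tries delete_module should be terminated with SIGSYS, while a harmless call such as getpid should go through and the child should exit normally.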