
Uni Machine

We have liaised with backstage to have them provide us a machine, which runs on university equipment.

The credentials can be found by BOSS committee members under “Uni Server”.

We use this server for hosting any basic small service we can’t easily host for free, such as the froom backend. Note that this is in a bit of a grey zone with funding, so we must back up everything to external storage (which we do pay for); even though we have 100GB of storage, we therefore want to use as little as possible.

Setup

We use k3s for deployment of a kubernetes cluster. The full install/update process is described below.

Machine configuration

We created the file /etc/sysctl.d/90-kubelet.conf with:

vm.panic_on_oom=0
vm.overcommit_memory=1
kernel.panic=10
kernel.panic_on_oops=1

Then run

sudo sysctl -p /etc/sysctl.d/90-kubelet.conf
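These are the kernel settings k3s’ protect-kernel-defaults option expects. As a quick sanity check (a sketch, not part of our setup), you could compare the expected values against what the kernel currently reports under /proc/sys:

```python
# Sketch: compare the expected kubelet sysctl values against live ones.
# The comparison is separated from the /proc/sys reading so the logic can
# be exercised without root or a Linux host.
from pathlib import Path

EXPECTED = {
    "vm.panic_on_oom": "0",
    "vm.overcommit_memory": "1",
    "kernel.panic": "10",
    "kernel.panic_on_oops": "1",
}

def find_mismatches(expected, current):
    """Return {key: actual_value} for settings whose current value differs."""
    return {k: current.get(k) for k, v in expected.items() if current.get(k) != v}

def read_current(keys, proc_root="/proc/sys"):
    """Read live values, e.g. vm.panic_on_oom -> /proc/sys/vm/panic_on_oom."""
    return {k: Path(proc_root, k.replace(".", "/")).read_text().strip() for k in keys}
```

On the server, `find_mismatches(EXPECTED, read_current(EXPECTED))` should return an empty dict once the sysctl file has been applied.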

RESOLVE MADNESS

Originally, when setting up the kubernetes config, we had an issue where requesting any domain (e.g. https://google.co.uk) from within the cluster returned a self-signed certificate, and a 404 page if you clicked through the warning. On the host machine itself it was fine.

Upon investigation, we found that su-srv-01.bath.ac.uk was in our search list in resolvectl, so unqualified names were being resolved against that domain (if you requested the fully-qualified https://google.co.uk. instead, it responded correctly).

You can see the search parameters by running:

resolvectl status

So we had to change this to localdomain:

sudo resolvectl domain enp5s0 localdomain

And once that was done everything worked perfectly.
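Note that resolvectl changes made this way are runtime-only and will not survive a reboot. If the interface is managed by systemd-networkd, the persistent equivalent would be a drop-in along these lines (a sketch; the path is hypothetical, and a Netplan-managed interface would need the equivalent setting in the netplan YAML instead):

```ini
# /etc/systemd/network/enp5s0.network (hypothetical path)
[Match]
Name=enp5s0

[Network]
Domains=localdomain
```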

SSH Security

We are currently allowing password authentication on the machine. This is not a recommended practice, and should change once we have our own hardware. To help mitigate this issue, we have set up fail2ban and use the default settings and sshd jail.

To set this up, you can do the following:

# install fail2ban
sudo apt install fail2ban
# verify install, you should get a big help text
fail2ban-client -h
# verify default jail exists, you should see the `sshd` jail is enabled
cat /etc/fail2ban/jail.d/defaults-debian.conf
# enable the fail2ban service
sudo systemctl enable --now fail2ban
# verify service is running
sudo systemctl status fail2ban

This should mean repeated SSH authentication failures result in an IP ban.

For configuration, see the man pages for fail2ban-client(1), fail2ban(1), and jail.conf(5).
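We stick with the defaults, but if they ever need tightening, overrides belong in a jail.local file rather than editing the packaged config. A hedged example (the values shown are illustrative, not our configuration):

```ini
# /etc/fail2ban/jail.local (hypothetical override file)
[sshd]
enabled = true
maxretry = 5
bantime = 1h
findtime = 10m
```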

We also disable root login over SSH by setting the following in /etc/ssh/sshd_config:

PermitRootLogin no

K3S config

We also created the k3s config file early (before running the installer) at /etc/rancher/k3s/config.yaml:

protect-kernel-defaults: true
secrets-encryption: true
kube-apiserver-arg:
- "enable-admission-plugins=NodeRestriction,EventRateLimit"
- "admission-control-config-file=/var/lib/rancher/k3s/server/psa.yaml"
- "audit-log-path=/var/lib/rancher/k3s/server/logs/audit.log"
- "audit-policy-file=/var/lib/rancher/k3s/server/audit.yaml"
- "audit-log-maxage=30"
- "audit-log-maxbackup=10"
- "audit-log-maxsize=100"
- "service-account-extend-token-expiration=false"
kube-controller-manager-arg:
- "terminated-pod-gc-threshold=100"
kubelet-arg:
- "streaming-connection-idle-timeout=5m"
- "tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305"
tls-san:
- <INSERT INTERNAL IP>
- k8s.bathcs.com

Network Policies

CIS requires that all namespaces have a network policy applied that reasonably limits traffic into namespaces and pods.

So, in /var/lib/rancher/k3s/server/manifests we set a basic network policy for kube-system; all other policies are controlled by our terraform deployment.

/var/lib/rancher/k3s/server/manifests/kube-system.yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: intra-namespace
  namespace: kube-system
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
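For reference, the kind of baseline CIS expects in every other namespace is a default-deny ingress policy; a minimal, hypothetical example (the namespace name is a placeholder, and our real policies come from terraform) looks like:

```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny-ingress
  namespace: my-app    # hypothetical namespace
spec:
  podSelector: {}      # selects every pod in the namespace
  policyTypes:
    - Ingress          # no ingress rules listed, so all ingress is denied
```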

Pod security configuration

As defined in the configuration above, we must create /var/lib/rancher/k3s/server/psa.yaml, which stores the pod security admission configuration. Ours is:

apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: PodSecurity
    configuration:
      apiVersion: pod-security.admission.config.k8s.io/v1beta1
      kind: PodSecurityConfiguration
      defaults:
        enforce: "restricted"
        enforce-version: "latest"
        audit: "restricted"
        audit-version: "latest"
        warn: "restricted"
        warn-version: "latest"
      exemptions:
        usernames: []
        runtimeClasses: []
        namespaces: [kube-system, cis-operator-system]
  - name: EventRateLimit
    path: /var/lib/rancher/k3s/server/event.yaml
With /var/lib/rancher/k3s/server/event.yaml being:

apiVersion: eventratelimit.admission.k8s.io/v1alpha1
kind: Configuration
limits:
  - type: Namespace
    qps: 50
    burst: 100
    cacheSize: 2000
  - type: User
    qps: 10
    burst: 50

Audit config

As defined in the config file, we need logging for the API, so we need to define /var/lib/rancher/k3s/server/audit.yaml:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
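The Metadata level records who did what to which resource, but not request or response bodies. Each line of the resulting audit.log is one JSON event; a sketch for summarising a line looks like this (the sample event is illustrative, not taken from the real server):

```python
# Sketch: summarise one line of the apiserver audit log (JSON-lines format).
# The field names below follow the audit.k8s.io/v1 Event schema; the sample
# values are made up for illustration.
import json

SAMPLE = ('{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata",'
          '"stage":"ResponseComplete",'
          '"requestURI":"/api/v1/namespaces/kube-system/pods","verb":"list",'
          '"user":{"username":"system:admin"},"responseStatus":{"code":200}}')

def summarise(line: str) -> str:
    """Turn one audit event into a short 'verb uri by user -> code' string."""
    event = json.loads(line)
    user = event.get("user", {}).get("username", "?")
    return f'{event["verb"]} {event["requestURI"]} by {user} -> {event["responseStatus"]["code"]}'
```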

Installation command

Then finally to install k3s we use the command:

curl -sfL https://get.k3s.io | sh -s - server --disable=traefik

Explanation

  • We don’t need to support IPv6 (sorry if we ever want to make that transition; please see my blog for how to do so)
  • We want to enable ALL optional security stuff (see k3s’ hardening guide)
  • Don’t need to setup --cluster-init because we only have 1 server
  • SELinux is not installed
  • We want to manage traefik ourselves, therefore we won’t have k3s deploy it

Updates

K3S

K3s updates are handled automatically, using the k3s-upgrade system, therefore you shouldn’t need to touch it at all; it should automatically pick up the latest releases.

There is risk in this, for example, k3s versions have broken niche use cases in the past, but hopefully we are not using those use cases.

Server

We also use Ubuntu’s automatic updates for the server to reduce any reliance on human intervention. This again does come with some risks, but we believe these risks are worthwhile so that students are not required to run commands every few weeks.

This is configured in /etc/apt/apt.conf.d/50unattended-upgrades (the configuration for how upgrades should be run) and /etc/apt/apt.conf.d/20auto-upgrades (for actually enabling it).
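For reference, enabling it in 20auto-upgrades amounts to the standard two lines:

```
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
```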

We have also enabled automatic reboots if necessary with:

Unattended-Upgrade::Automatic-Reboot "true";

As well as allow downgrades in case there was a security incident with a package:

Unattended-Upgrade::Allow-downgrade "true";

Note: we have not setup automatic emails if something does happen, so this could be a good exploration for anyone who has time.
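For whoever picks this up: unattended-upgrades has built-in mail support, which would be the obvious starting point. It needs a working MTA on the machine, and the address below is a placeholder:

```
Unattended-Upgrade::Mail "committee@example.org";
Unattended-Upgrade::MailReport "only-on-error";
```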

Deploying infrastructure

Please see the page on terraform to see how we deploy our infrastructure and any relevant links.

Backups

Backups are handled per persistent volume on an opt-in basis using k8up. Our namespace module has a way of defining whether the volumes in that namespace should be backed up.

By default this creates a daily backup, a weekly check, and weekly pruning, using restic behind the scenes (so all backups are encrypted). The encryption password cannot just be stored in vaulttub (since vaulttub is itself part of what is backed up), therefore it is printed out and stored within the BCSS locker as a backup.

These then go to an S3 bucket managed by Scaleway.
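A sketch of what a k8up Schedule for one namespace might look like (the names, bucket, endpoint, and retention values here are all hypothetical; our real schedules are generated by the terraform namespace module):

```yaml
apiVersion: k8up.io/v1
kind: Schedule
metadata:
  name: backup-schedule     # hypothetical name
  namespace: my-app         # hypothetical namespace
spec:
  backend:
    repoPasswordSecretRef:  # the restic encryption password
      name: backup-repo
      key: password
    s3:
      endpoint: https://s3.fr-par.scw.cloud   # example Scaleway endpoint
      bucket: my-backup-bucket
      accessKeyIDSecretRef:
        name: backup-s3-credentials
        key: username
      secretAccessKeySecretRef:
        name: backup-s3-credentials
        key: password
  backup:
    schedule: '@daily-random'    # daily backup, jittered by k8up
  check:
    schedule: '@weekly-random'   # weekly restic repository check
  prune:
    schedule: '@weekly-random'   # weekly pruning of old snapshots
    retention:
      keepDaily: 7
      keepWeekly: 4
```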

Restorations

I will just refer to k8up’s documentation on restoration, due to the potential of it changing in the future.