Uni Machine
We have liaised with Backstage to have them provide us with a machine, which runs on university equipment.
The credentials can be found by BOSS committee members under “Uni Server”.
We use this server for hosting any basic small service we can’t easily host for free, such as the froom backend. Note that this is in a bit of a grey zone with funding, so we must back everything up to external storage (which we do pay for). Even though we have 100GB of storage, we therefore want to use as little as possible.
Setup
We use k3s to deploy a Kubernetes cluster. The full install/update procedure is below.
Machine configuration
Created the file /etc/sysctl.d/90-kubelet.conf with:

```
vm.panic_on_oom=0
vm.overcommit_memory=1
kernel.panic=10
kernel.panic_on_oops=1
```

Then run:

```
sysctl -p /etc/sysctl.d/90-kubelet.conf
```

RESOLVE MADNESS
Originally, when setting up the Kubernetes config, we had an issue where requesting any domain (e.g. https://google.co.uk) returned a self-signed certificate, and a 404 page if you clicked through the warning. On the host machine itself, however, everything was fine.
Upon investigation, we found that su-srv-01.bath.ac.uk was in our search list in resolvectl (meaning that if you requested https://google.co.uk. with the trailing dot, it responded correctly).
You can see the search parameters with:

```
resolvectl status
```

So we had to change this to localdomain:

```
sudo resolvectl domain enp5s0 localdomain
```

Once that was done, everything worked perfectly.
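Note that resolvectl changes runtime state, so this setting may not survive a reboot. If it ever needs to be made persistent, a systemd-resolved drop-in along these lines is one option (an untested sketch, not something we have applied; if the bath.ac.uk search domain is coming from DHCP, it may also need overriding in the network configuration for enp5s0):

```
# /etc/systemd/resolved.conf.d/10-localdomain.conf
[Resolve]
Domains=localdomain
```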
SSH Security
We currently allow password authentication on the machine. This is not recommended practice, and should change once we have our own hardware. To help mitigate the risk, we have set up fail2ban with the default settings and the sshd jail.
To set this up, you can do the following:
```
# install fail2ban
sudo apt install fail2ban

# verify the install, you should get a big help text
fail2ban-client -h

# verify the default jail exists, you should see that the `sshd` jail is enabled
cat /etc/fail2ban/jail.d/defaults-debian.conf

# enable the fail2ban service
sudo systemctl enable --now fail2ban

# verify the service is running
sudo systemctl status fail2ban
```

This should mean repeated SSH authentication failures result in an IP ban.
For configuration, see the man pages for fail2ban-client(1), fail2ban(1), and jail.conf(5).
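If the defaults ever need tightening, overrides belong in /etc/fail2ban/jail.local rather than in the packaged files (which upgrades may overwrite). A minimal sketch, with illustrative values rather than what is currently deployed:

```
[sshd]
enabled  = true
maxretry = 5
findtime = 10m
bantime  = 1h
```

After editing, the fail2ban service needs a restart or reload to pick up the changes.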
We also disable root login over SSH by setting the following in /etc/ssh/sshd_config:
```
PermitRootLogin no
```

K3S config
We also create the k3s config file, before installation, at /etc/rancher/k3s/config.yaml:
```yaml
protect-kernel-defaults: true
secrets-encryption: true
kube-apiserver-arg:
  - "enable-admission-plugins=NodeRestriction,EventRateLimit"
  - "admission-control-config-file=/var/lib/rancher/k3s/server/psa.yaml"
  - "audit-log-path=/var/lib/rancher/k3s/server/logs/audit.log"
  - "audit-policy-file=/var/lib/rancher/k3s/server/audit.yaml"
  - "audit-log-maxage=30"
  - "audit-log-maxbackup=10"
  - "audit-log-maxsize=100"
  - "service-account-extend-token-expiration=false"
kube-controller-manager-arg:
  - "terminated-pod-gc-threshold=100"
kubelet-arg:
  - "streaming-connection-idle-timeout=5m"
  - "tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305"
tls-san:
  - <INSERT INTERNAL IP>
  - k8s.bathcs.com
```

Network Policies
CIS requires that all namespaces have a network policy applied that reasonably limits traffic into namespaces and pods.
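As an illustration of what satisfies this, the simplest such policy is a default-deny on ingress (the namespace name here is a placeholder, not part of our actual config):

```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny-ingress
  namespace: example-namespace   # placeholder namespace
spec:
  podSelector: {}                # selects every pod in the namespace
  policyTypes:
    - Ingress                    # no ingress rules listed, so all ingress is denied
```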
In /var/lib/rancher/k3s/server/manifests we set a basic network policy for kube-system; all other policies are controlled by our terraform deployment.
```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: intra-namespace
  namespace: kube-system
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
```

Pod security configuration
As defined in the configuration above, we must create /var/lib/rancher/k3s/server/psa.yaml, which stores the pod security admission configuration. Ours is:
```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: PodSecurity
    configuration:
      apiVersion: pod-security.admission.config.k8s.io/v1beta1
      kind: PodSecurityConfiguration
      defaults:
        enforce: "restricted"
        enforce-version: "latest"
        audit: "restricted"
        audit-version: "latest"
        warn: "restricted"
        warn-version: "latest"
      exemptions:
        usernames: []
        runtimeClasses: []
        namespaces: [kube-system, cis-operator-system]
  - name: EventRateLimit
    path: /var/lib/rancher/k3s/server/event.yaml
```

With /var/lib/rancher/k3s/server/event.yaml being:
```yaml
apiVersion: eventratelimit.admission.k8s.io/v1alpha1
kind: Configuration
limits:
  - type: Namespace
    qps: 50
    burst: 100
    cacheSize: 2000
  - type: User
    qps: 10
    burst: 50
```

Audit config
As defined in the config file, we need audit logging for the API, so we define /var/lib/rancher/k3s/server/audit.yaml:
```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
```

Installation command
Then finally to install k3s we use the command:
```
curl -sfL https://get.k3s.io | sh -s - server --disable=traefik
```

Explanation
- We don’t need to support IPv6 (sorry if we ever want to make that transition; please see my blog for how to do so)
- We want to enable ALL optional security features (see k3s’ hardening guide)
- We don’t need to set up `--cluster-init` because we only have 1 server
- SELinux is not installed
- We want to manage traefik manually, therefore we don’t have k3s deploy it
Updates
K3S
K3s updates are handled automatically, using the k3s-upgrade system, so you shouldn’t need to touch it at all; it should automatically pick up the latest releases.
There is risk in this, for example, k3s versions have broken niche use cases in the past, but hopefully we are not using those use cases.
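For context, the k3s-upgrade system is Rancher’s system-upgrade-controller, which is driven by Plan resources. A Plan tracking the stable channel looks roughly like this (a sketch of the general shape, not a dump of our live config):

```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true                   # drain-style safety before upgrading a node
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: In
        values: ["true"]
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  channel: https://update.k3s.io/v1-release/channels/stable
```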
Server
We also use Ubuntu’s automatic updates on the server to reduce reliance on human intervention. This again comes with some risks, but we believe they are worthwhile so that students are not required to run commands every few weeks.
This is configured in /etc/apt/apt.conf.d/50unattended-upgrades (which configures how upgrades are run) and /etc/apt/apt.conf.d/20auto-upgrades (which actually enables them).
We have also enabled automatic reboots when necessary with:

```
Unattended-Upgrade::Automatic-Reboot "true";
```

As well as allowing downgrades in case there is a security incident with a package:

```
Unattended-Upgrade::Allow-downgrade "true";
```

Note: we have not set up automatic emails if something does happen, so this could be a good exploration for anyone who has time.
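For reference, enabling in 20auto-upgrades is typically just these two standard lines (shown as the usual defaults, not copied from the server):

```
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
```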
Deploying infrastructure
Please see the page on terraform for how we deploy our infrastructure and any relevant links.
Backups
Backups are handled per persistent volume on an opt-in basis using k8up. Our namespace module has a way of defining whether the volumes in that namespace should be backed up.
By default this creates a daily backup, a weekly check, and weekly pruning, using restic behind the scenes (so all backups are encrypted). The encryption password cannot just be stored in vaulttub (since vaulttub is itself part of what gets backed up), so it is printed out and stored within the BCSS locker as a backup.
These then go to an S3 bucket managed by Scaleway.
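To give a feel for what k8up generates from this, a Schedule resource of the shape described above looks roughly like the following (names, bucket, and region are illustrative placeholders, not our actual values):

```yaml
apiVersion: k8up.io/v1
kind: Schedule
metadata:
  name: backups
  namespace: example-namespace            # placeholder
spec:
  backend:
    repoPasswordSecretRef:                # the restic encryption password
      name: backup-repo
      key: password
    s3:
      endpoint: https://s3.fr-par.scw.cloud   # Scaleway endpoint; region is a guess
      bucket: example-bucket
      accessKeyIDSecretRef:
        name: backup-credentials
        key: access-key
      secretAccessKeySecretRef:
        name: backup-credentials
        key: secret-key
  backup:
    schedule: '@daily-random'             # daily backup
  check:
    schedule: '@weekly-random'            # weekly repository check
  prune:
    schedule: '@weekly-random'            # weekly pruning
    retention:
      keepDaily: 7
```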
Restorations
For restoration, I will just refer to k8up’s documentation, since the process may change in the future.