[Tutorial] Deploying Enterprise-Grade High-Availability K8S Clusters Based on Ansible
This is a development document intended for developers and AI, reposted from my documentation site. Original address:
The development environment for this article is Linux, with files edited in the micro CLI editor. Please adjust according to your own system environment.
Basic Concepts
About Ansible
Ansible is an agentless automation tool that writes configurations and changes as clear, repeatable tasks. It excels at consistent configuration across multiple hosts and is also suitable for application deployment and batch operations. When used with load balancers, it can break down complex changes into controllable rolling steps.
Ansible is also an excellent fit for deploying and managing HAProxy.

About Kubernetes and RKE2
Kubernetes (K8s) is a container orchestration system responsible for core capabilities such as scheduling, service discovery, rolling updates, and self-healing. Its goal is to standardize the way distributed applications run, making O&M processes more controllable.
RKE2 (also known as RKE Government) is a conformant Kubernetes distribution provided by Rancher. Its defaults lean toward security and compliance, making it well suited to production environments.

About Rocky Linux and SELinux
Rocky Linux is an open-source enterprise-grade operating system aimed at maintaining bug-for-bug compatibility with RHEL. It has a stable lifecycle and is suitable for long-running production clusters.

SELinux is a Mandatory Access Control (MAC) mechanism used to finely restrict the access boundaries of processes and resources. Rocky Linux enables it by default in enforcing mode; it is recommended to configure it according to policies rather than disabling it.
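To inspect the current mode on a node, for example:
getenforce
sestatus
getenforce prints the current mode (Enforcing on a default Rocky Linux install); sestatus additionally shows the loaded policy and whether SELinux is enabled.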

Getting Started
Installing Ansible
Install Ansible (using yay as an example):
yay -S ansible
Run ansible --version to view version information.
yun@yun ~/V/a/yunzaixi-dev (main)> ansible --version
ansible [core 2.20.0]
config file = None
configured module search path = ['/home/yun/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.13/site-packages/ansible
ansible collection location = /home/yun/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/bin/ansible
python version = 3.13.7 (main, Aug 16 2025, 15:55:01) [GCC 15.2.1 20250813] (/usr/bin/python)
jinja version = 3.1.6
pyyaml version = 6.0.3 (with libyaml v0.2.5)
Ansible is implemented in Python, so make sure Python is available in your environment before installing Ansible.
The lablabs.rke2 role depends on the netaddr Python package, which needs to be installed separately. On Arch Linux, use sudo pacman -S python-netaddr.
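On other distributions, a sketch (the distro package name may vary; pip works wherever Python does):
sudo dnf install python3-netaddr
python3 -m pip install --user netaddr
Install it on the machine that runs Ansible (the control node), not on the cluster hosts.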
Installing Version Management Tools
Install git, gh (using yay as an example):
yay -S git github-cli
Run git version and gh version to view version information.
yun@yun ~/V/a/yunzaixi-dev (main)> git version
git version 2.52.0
yun@yun ~/V/a/yunzaixi-dev (main)> gh version
gh version 2.83.1 (2025-11-13)
https://github.com/cli/cli/releases/tag/v2.83.1
Log in to GitHub:
gh auth login --scopes workflow
Follow the prompts to proceed.
Preparing Cloud Servers
Before anything else, we need to prepare the cloud servers that will host the cluster. A minimum viable production-grade HA control plane (with embedded etcd) usually consists of 3 rke2-server nodes plus at least one rke2-agent, so at least 4 cloud servers are required; this tutorial uses 3 servers and 2 agents.
For ease of O&M, all nodes run the same OS: Rocky Linux.
Reason for choosing Rocky Linux: it is an open-source, free, enterprise-grade operating system, 100% compatible with RHEL, and within the RKE2 support matrix.
RKE2 is very lightweight, but it has some minimum requirements:
- No two RKE2 nodes may have the same node name. By default, the node name is taken from the machine's hostname, so the cloud servers' hostnames must all differ.
- Each cloud server should have at least 2 CPU cores, 4 GB of RAM, and an SSD for storage.
- The specific firewall ports RKE2 requires must be open (see the sketch after this list).
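Note that the RKE2 documentation lists firewalld as a known conflict with the default Canal CNI, so many deployments simply disable it. If you do keep firewalld, the following sketch opens the commonly documented inbound ports; verify against the RKE2 network requirements for your version and CNI:
sudo firewall-cmd --permanent --add-port=6443/tcp        # Kubernetes API server
sudo firewall-cmd --permanent --add-port=9345/tcp        # RKE2 supervisor API
sudo firewall-cmd --permanent --add-port=10250/tcp       # kubelet
sudo firewall-cmd --permanent --add-port=2379-2380/tcp   # etcd (server nodes only)
sudo firewall-cmd --permanent --add-port=8472/udp        # VXLAN overlay (CNI-dependent)
sudo firewall-cmd --permanent --add-port=30000-32767/tcp # NodePort range
sudo firewall-cmd --reload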
Configuring SSH Config
Add the following code to your system SSH Config (fill in the public IP address of the cloud server at HostName):
Host rke2-server1
    HostName <Your Public IP Address 1>
    User root

Host rke2-server2
    HostName <Your Public IP Address 2>
    User root

Host rke2-server3
    HostName <Your Public IP Address 3>
    User root

Host rke2-agent1
    HostName <Your Public IP Address 4>
    User root

Host rke2-agent2
    HostName <Your Public IP Address 5>
    User root
The above code configures SSH aliases for all cloud servers, which greatly simplifies future O&M operations. Next, upload the SSH public key to the target servers:
ssh-copy-id rke2-server1
ssh-copy-id rke2-server2
ssh-copy-id rke2-server3
ssh-copy-id rke2-agent1
ssh-copy-id rke2-agent2
If you have reinstalled the system before, you might need to clean the SSH fingerprints first:
ssh-keygen -R rke2-server1
ssh-keygen -R rke2-server2
ssh-keygen -R rke2-server3
ssh-keygen -R rke2-agent1
ssh-keygen -R rke2-agent2
Follow the prompts to proceed.
Once completed, you can log in to all cloud servers without a password:
ssh rke2-server1
ssh rke2-server2
ssh rke2-server3
ssh rke2-agent1
ssh rke2-agent2
After login you may see a warning that the connection is not using a post-quantum key exchange algorithm and may be vulnerable in the future (very forward-looking); it can be ignored:
** WARNING: connection is not using a post-quantum key exchange algorithm.
** This session may be vulnerable to "store now, decrypt later" attacks.
** The server may need to be upgraded. See https://openssh.com/pq.html
Last failed login: ~~ from ~~ on ssh:notty There were 31 failed login attempts since the last successful login.
Initializing Ansible Project
Initializing Repository
First, create a folder; assume the project name is rke2-ansible.
yun@yun ~/V/a/y/p/ansible (main)> mkdir rke2-ansible
yun@yun ~/V/a/y/p/ansible (main)> ls
rke2-ansible/
Enter the project repository, initialize git, and create a GitHub private repository:
cd rke2-ansible
git init
echo "# rke2-ansible" > README.md
git add .
git commit -m "chore: initial commit"
gh repo create rke2-ansible --private --source=. --remote=origin --push
The following code block is optional and is used to declare the newly created code repository as a submodule:
cd ..
rm -rf rke2-ansible/
git submodule add https://github.com/yunzaixi-dev/rke2-ansible.git ./rke2-ansible
Planning Directory Structure
Next, plan the project structure:
mkdir -p inventories/prod \
group_vars \
host_vars \
playbooks \
roles
Create empty files:
touch ansible.cfg \
requirements.yml \
inventories/prod/hosts.yml \
group_vars/all.yml \
group_vars/rke2_servers.yml \
group_vars/rke2_agents.yml \
host_vars/rke2-server1.yml \
playbooks/site.yml \
playbooks/ping.yml \
playbooks/update-packages.yml \
playbooks/set-hostname.yml \
playbooks/disable-ssh-password.yml
The directory structure is as follows:
yun@yun ~/V/a/y/p/a/rke2-ansible (master)> tree
.
├── ansible.cfg
├── group_vars
│   ├── all.yml
│   ├── rke2_agents.yml
│   └── rke2_servers.yml
├── host_vars
│   └── rke2-server1.yml
├── inventories
│   └── prod
│       └── hosts.yml
├── playbooks
│   ├── disable-ssh-password.yml
│   ├── ping.yml
│   ├── set-hostname.yml
│   ├── site.yml
│   └── update-packages.yml
├── README.md
├── requirements.yml
└── roles
Description of each directory and file:
- ansible.cfg: Ansible global configuration; specifies the inventory and roles_path.
- requirements.yml: Galaxy dependency list, used to install the lablabs.rke2 role.
- inventories/prod/hosts.yml: Production host inventory and grouping.
- group_vars/*.yml: Host-group variables, for cluster-wide parameters and for the server/agent groups respectively.
- host_vars/rke2-server1.yml: Per-host variables, used to declare the first control plane as the initialization node.
- playbooks/site.yml: Deployment entry point, covering system preparation and the RKE2 installation process.
- playbooks/ping.yml: Connectivity-check playbook, used to verify host reachability.
- playbooks/update-packages.yml: Batch-update playbook, used to upgrade system packages.
- playbooks/set-hostname.yml: Batch-sets hostnames, preserving hyphens and stripping illegal characters.
- playbooks/disable-ssh-password.yml: Disables SSH password login, allowing key-based login only.
- roles/: Directory for roles downloaded from Galaxy.
Installing Galaxy Role
micro requirements.yml :
roles:
  - name: lablabs.rke2
    version: "1.49.1"
lablabs.rke2 is a community-maintained RKE2 role (GitHub repository: https://github.com/lablabs/ansible-role-rke2). It wraps the official installation scripts and service-management logic. Pinning it to 1.49.1 keeps the deployment reproducible and reduces uncertainty from upstream updates.
Install dependencies:
ansible-galaxy role install -r requirements.yml -p roles
yun@yun ~/V/a/y/p/a/rke2-ansible (master)> ansible-galaxy role install -r requirements.yml -p roles
Starting galaxy role install process
- downloading role 'rke2', owned by lablabs
- downloading role from https://github.com/lablabs/ansible-role-rke2/archive/1.49.1.tar.gz
- extracting lablabs.rke2 to /home/yun/Vaults/admin/yunzaixi-dev/project/ansible/rke2-ansible/roles/lablabs.rke2
- lablabs.rke2 (1.49.1) was installed successfully
Configuring Ansible
micro ansible.cfg (interpreter_python path should be adjusted according to your own situation):
[defaults]
inventory = inventories/prod/hosts.yml
remote_user = root
host_key_checking = False
roles_path = ./roles
forks = 10
timeout = 30
deprecation_warnings = False
stdout_callback = default
result_format = yaml
interpreter_python = /usr/bin/python3
Writing Inventory
micro inventories/prod/hosts.yml :
all:
  children:
    rke2_servers:
      hosts:
        rke2-server1:
        rke2-server2:
        rke2-server3:
    rke2_agents:
      hosts:
        rke2-agent1:
        rke2-agent2:
    rke2_cluster:
      children:
        rke2_servers:
        rke2_agents:
Since the SSH Config was configured earlier, the host aliases can be used directly here; there is no need to set ansible_host as well.
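Before moving on, you can ask Ansible to render the inventory and confirm the grouping:
ansible-inventory --graph
This reads the inventory path from ansible.cfg and prints the group tree, so mistakes in hosts.yml surface before any playbook runs.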
Connectivity Check
micro playbooks/ping.yml :
- name: Ping all hosts
  hosts: all
  gather_facts: false
  tasks:
    - name: Ping
      ansible.builtin.ping:
Execute:
ansible-playbook playbooks/ping.yml
The output is as follows:
yun@yun ~/V/a/y/p/a/rke2-ansible (master)> ansible-playbook playbooks/ping.yml
PLAY [Ping all hosts] ***********************************************************************
TASK [Ping] *********************************************************************************
ok: [rke2-agent1]
ok: [rke2-agent2]
ok: [rke2-server2]
ok: [rke2-server1]
ok: [rke2-server3]
PLAY RECAP **********************************************************************************
rke2-agent1 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-agent2 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server1 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server2 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server3 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
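The same connectivity check can also be run ad hoc, without a playbook:
ansible all -m ansible.builtin.ping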
Batch Setting Hostnames
Hostnames cannot contain underscores (_), so the playbook derives each hostname from the SSH alias and strips any illegal characters.
micro playbooks/set-hostname.yml :
- name: Set hostname from SSH alias
  hosts: all
  become: true
  vars:
    raw_hostname: "{{ inventory_hostname | lower }}"
    hostname_from_alias: "{{ raw_hostname | regex_replace('[^a-z0-9-]', '') | regex_replace('^-+', '') | regex_replace('-+$', '') }}"
  tasks:
    - name: Ensure hostname is not empty
      ansible.builtin.assert:
        that:
          - hostname_from_alias | length > 0
        fail_msg: "Derived hostname is empty. Check inventory_hostname: {{ inventory_hostname }}"

    - name: Set hostname
      ansible.builtin.hostname:
        name: "{{ hostname_from_alias }}"
Execute:
ansible-playbook playbooks/set-hostname.yml
The results are as follows:
yun@yun ~/V/a/y/p/a/rke2-ansible (master)> ansible-playbook playbooks/set-hostname.yml
PLAY [Set hostname from SSH alias] **********************************************************
TASK [Gathering Facts] **********************************************************************
ok: [rke2-server3]
ok: [rke2-server2]
ok: [rke2-server1]
ok: [rke2-agent2]
ok: [rke2-agent1]
TASK [Ensure hostname is not empty] *********************************************************
ok: [rke2-server1] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [rke2-server2] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [rke2-server3] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [rke2-agent1] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [rke2-agent2] => {
"changed": false,
"msg": "All assertions passed"
}
TASK [Set hostname] *************************************************************************
changed: [rke2-agent1]
changed: [rke2-server1]
changed: [rke2-server3]
changed: [rke2-server2]
changed: [rke2-agent2]
PLAY RECAP **********************************************************************************
rke2-agent1 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-agent2 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server1 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server2 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server3 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
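You can confirm the result with an ad-hoc command:
ansible all -m ansible.builtin.command -a "hostnamectl --static"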
Disabling SSH Password Login (Optional)
Before execution, please confirm that key login has been configured to avoid being locked out of the server.
micro playbooks/disable-ssh-password.yml :
- name: Disable SSH password authentication
  hosts: all
  become: true
  tasks:
    - name: Write SSH hardening config
      ansible.builtin.copy:
        dest: /etc/ssh/sshd_config.d/99-disable-password.conf
        mode: "0644"
        content: |
          PasswordAuthentication no
          KbdInteractiveAuthentication no
          ChallengeResponseAuthentication no
      notify: Restart sshd

    - name: Validate sshd config
      ansible.builtin.command: sshd -t
      changed_when: false

  handlers:
    - name: Restart sshd
      ansible.builtin.service:
        name: sshd
        state: restarted
Execute:
ansible-playbook playbooks/disable-ssh-password.yml
The output is as follows:
yun@yun ~/V/a/y/p/a/rke2-ansible (master)> ansible-playbook playbooks/disable-ssh-password.yml
PLAY [Disable SSH password authentication] **************************************************
TASK [Gathering Facts] **********************************************************************
ok: [rke2-agent1]
ok: [rke2-server3]
ok: [rke2-agent2]
ok: [rke2-server1]
ok: [rke2-server2]
TASK [Write SSH hardening config] ***********************************************************
changed: [rke2-server3]
changed: [rke2-agent1]
changed: [rke2-server2]
changed: [rke2-server1]
changed: [rke2-agent2]
TASK [Validate sshd config] *****************************************************************
ok: [rke2-server3]
ok: [rke2-agent1]
ok: [rke2-server2]
ok: [rke2-agent2]
ok: [rke2-server1]
RUNNING HANDLER [Restart sshd] **************************************************************
changed: [rke2-server2]
changed: [rke2-server3]
changed: [rke2-server1]
changed: [rke2-agent2]
changed: [rke2-agent1]
PLAY RECAP **********************************************************************************
rke2-agent1 : ok=4 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-agent2 : ok=4 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server1 : ok=4 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server2 : ok=4 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server3 : ok=4 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
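To verify from your workstation that password authentication is now refused (keep an existing session open in case you need to roll back):
ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no rke2-server1
The connection should fail with Permission denied rather than prompting for a password.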
Batch Updating System Packages and Rebooting (Recommended)
This applies when the hosts are already on Rocky Linux 9 and only the system packages need updating. If no reboot is required, set reboot_after_update to false.
micro playbooks/update-packages.yml :
- name: Update Rocky Linux packages
  hosts: all
  become: true
  serial: 1
  vars:
    reboot_after_update: true
  tasks:
    - name: Update package metadata
      ansible.builtin.dnf:
        update_cache: true

    - name: Upgrade all packages
      ansible.builtin.dnf:
        name: "*"
        state: latest

    - name: Remove unneeded packages
      ansible.builtin.dnf:
        autoremove: true

    - name: Clean package cache
      ansible.builtin.command: dnf clean all
      changed_when: false

    - name: Reboot after update (optional)
      ansible.builtin.reboot:
        reboot_timeout: 3600
      when: reboot_after_update
Execute:
ansible-playbook playbooks/update-packages.yml
The output is as follows:
yun@yun ~/V/a/y/p/a/rke2-ansible (master)> ansible-playbook playbooks/update-packages.yml
PLAY [Update Rocky Linux packages] **********************************************************
TASK [Gathering Facts] **********************************************************************
ok: [rke2-server1]
TASK [Update package metadata] **************************************************************
ok: [rke2-server1]
TASK [Upgrade all packages] *****************************************************************
ok: [rke2-server1]
TASK [Remove unneeded packages] *************************************************************
ok: [rke2-server1]
TASK [Clean package cache] ******************************************************************
ok: [rke2-server1]
TASK [Reboot after update (optional)] *******************************************************
changed: [rke2-server1]
PLAY [Update Rocky Linux packages] **********************************************************
TASK [Gathering Facts] **********************************************************************
ok: [rke2-server2]
TASK [Update package metadata] **************************************************************
ok: [rke2-server2]
TASK [Upgrade all packages] *****************************************************************
changed: [rke2-server2]
TASK [Remove unneeded packages] *************************************************************
ok: [rke2-server2]
TASK [Clean package cache] ******************************************************************
ok: [rke2-server2]
TASK [Reboot after update (optional)] *******************************************************
changed: [rke2-server2]
PLAY [Update Rocky Linux packages] **********************************************************
TASK [Gathering Facts] **********************************************************************
ok: [rke2-server3]
TASK [Update package metadata] **************************************************************
ok: [rke2-server3]
TASK [Upgrade all packages] *****************************************************************
changed: [rke2-server3]
TASK [Remove unneeded packages] *************************************************************
ok: [rke2-server3]
TASK [Clean package cache] ******************************************************************
ok: [rke2-server3]
TASK [Reboot after update (optional)] *******************************************************
changed: [rke2-server3]
PLAY [Update Rocky Linux packages] **********************************************************
TASK [Gathering Facts] **********************************************************************
ok: [rke2-agent1]
TASK [Update package metadata] **************************************************************
ok: [rke2-agent1]
TASK [Upgrade all packages] *****************************************************************
changed: [rke2-agent1]
TASK [Remove unneeded packages] *************************************************************
ok: [rke2-agent1]
TASK [Clean package cache] ******************************************************************
ok: [rke2-agent1]
TASK [Reboot after update (optional)] *******************************************************
changed: [rke2-agent1]
PLAY [Update Rocky Linux packages] **********************************************************
TASK [Gathering Facts] **********************************************************************
ok: [rke2-agent2]
TASK [Update package metadata] **************************************************************
ok: [rke2-agent2]
TASK [Upgrade all packages] *****************************************************************
changed: [rke2-agent2]
TASK [Remove unneeded packages] *************************************************************
ok: [rke2-agent2]
TASK [Clean package cache] ******************************************************************
ok: [rke2-agent2]
TASK [Reboot after update (optional)] *******************************************************
changed: [rke2-agent2]
PLAY RECAP **********************************************************************************
rke2-agent1 : ok=6 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-agent2 : ok=6 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server1 : ok=6 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server2 : ok=6 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server3 : ok=6 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
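Because the play uses serial: 1, hosts are updated one at a time. You can additionally restrict a run to one group, or override the reboot flag, from the command line:
ansible-playbook playbooks/update-packages.yml --limit rke2_agents
ansible-playbook playbooks/update-packages.yml -e reboot_after_update=false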
Deploying RKE2
Writing RKE2 Variables
The rke2_config variable of lablabs.rke2 is a template path (default templates/config.yaml.j2); do not write it as a dictionary. Parameters that need to end up in config.yaml belong in rke2_server_options / rke2_agent_options.
micro group_vars/all.yml :
rke2_cluster_group_name: "rke2_cluster"
rke2_servers_group_name: "rke2_servers"
rke2_agents_group_name: "rke2_agents"
rke2_channel: "latest"
rke2_version: "v1.34.2+rke2r1"
rke2_token: "CHANGE_ME"
rke2_api_ip: "<LB or server1>"
rke2_additional_sans:
  - "<LB or server1>"
rke2_selinux: true
rke2_cni:
  - cilium
- rke2_token is a shared secret used for cluster registration; it must be identical on all nodes. It can be generated with openssl rand -base64 32.
- rke2_api_ip is the control-plane entry address. If you have an LB/VIP, use its IP or domain name. If not, and each machine only has a fixed single IP, you can use the IP/domain of the first control plane (e.g., rke2-server1) and add the same value to rke2_additional_sans; this effectively pins the API to a single node, so the control-plane entry point is not highly available. An LB/VIP is recommended for production.
- Rocky Linux enables SELinux by default, so be sure to set rke2_selinux: true and ensure container-selinux is installed.
- To use Cilium, point rke2_cni to cilium.
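A sketch for generating a strong token and keeping it out of plain text with Ansible Vault (adapt to your own secret management):
ansible-vault encrypt_string "$(openssl rand -base64 32)" --name rke2_token
Paste the resulting vaulted block into group_vars/all.yml in place of rke2_token: "CHANGE_ME", then run playbooks with --ask-vault-pass or a vault password file.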
micro group_vars/rke2_servers.yml :
rke2_server_options:
  - write-kubeconfig-mode: "0644"
micro group_vars/rke2_agents.yml :
rke2_agent_options:
  - node-ip: "{{ ansible_default_ipv4.address }}"
Mark the first control plane as the initialization node, micro host_vars/rke2-server1.yml :
rke2_server_options:
  - write-kubeconfig-mode: "0644"
  - cluster-init: true
Writing Playbook
micro playbooks/site.yml :
- name: Base setup
  hosts: all
  become: true
  tasks:
    - name: Install base packages
      ansible.builtin.package:
        name:
          - curl
          - tar
          - socat
          - conntrack
          - iptables
          - container-selinux
        state: present

    - name: Disable swap
      ansible.builtin.command: swapoff -a
      when: ansible_swaptotal_mb | int > 0
      changed_when: false

    - name: Remove swap from fstab
      ansible.builtin.replace:
        path: /etc/fstab
        regexp: '^(.*\sswap\s.*)$'
        replace: '# \1'

    - name: Load br_netfilter
      ansible.builtin.modprobe:
        name: br_netfilter
        state: present

    - name: Enable sysctl for Kubernetes
      ansible.builtin.sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        state: present
        reload: true
      loop:
        - { name: net.bridge.bridge-nf-call-iptables, value: 1 }
        - { name: net.bridge.bridge-nf-call-ip6tables, value: 1 }
        - { name: net.ipv4.ip_forward, value: 1 }

- name: RKE2 servers
  hosts: rke2_servers
  become: true
  serial: 1
  roles:
    - role: lablabs.rke2

- name: RKE2 agents
  hosts: rke2_agents
  become: true
  roles:
    - role: lablabs.rke2
Deployment and Verification
Executing Deployment
Perform a syntax check first:
ansible-playbook playbooks/site.yml --syntax-check
Execute deployment:
ansible-playbook playbooks/site.yml
Getting kubeconfig
Log in to any control plane node and use the kubectl binary that RKE2 ships with:
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
/var/lib/rancher/rke2/bin/kubectl get nodes -o wide
If using kubectl locally, you can copy the kubeconfig:
mkdir -p ~/.kube
scp rke2-server1:/etc/rancher/rke2/rke2.yaml ~/.kube/rke2.yaml
sed -i 's/127.0.0.1/<LB or server1>/g' ~/.kube/rke2.yaml
export KUBECONFIG=~/.kube/rke2.yaml
kubectl get nodes -o wide
At this point, the deployment of the minimum high-availability RKE2 cluster is complete.
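As a final smoke test, you can schedule a trivial workload and watch it land on the agent nodes (illustrative; any image works):
kubectl create deployment hello --image=nginx --replicas=2
kubectl get pods -o wide
kubectl delete deployment hello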