Traefik Load Balanced VMware Cloud Director 10.x

I have a lab environment running VMware Cloud Director 10.x and wanted to use my existing Traefik server for SSL termination, reverse proxying, and load balancing across my three server nodes. This article steps through my configuration and provides a working (anonymized) example of a file-based dynamic Traefik configuration.

Overview

The lab environment consists of the following:

  • PhotonOS VM (Docker updated, docker-compose installed)
  • Firewall rule allowing ports 443 and 8443
  • NAT rule forwarding ports 443 and 8443 to the internal IP address of the PhotonOS VM that runs Traefik
  • 3-node VMware Cloud Director 10.x deployment using a shared, internally issued SSL certificate and connected to NFS for transfer storage

Traefik v2 Server details

I like to run Traefik on a Docker host since it is super easy and fast to recover if anything goes wrong. All I need is the info in this blog post to get the server back up and running. Going from my PhotonOS template to a fully configured, running reverse proxy takes minutes.

I use the Traefik static config to specify my entrypoints, enable the API and dashboard, set log file paths, enable the health check URL, enable the file and Docker providers, configure my Let’s Encrypt resolver, and skip certificate verification for back-end connections (i.e., don’t object to self-signed or internal CA issued SSL certificates).

File System

Here’s what my folder tree looks like for this setup:

❯ tree
.
├── config
│   ├── acme.json        <--- generated file
│   ├── logs
│   │   ├── access.log   <--- generated file
│   │   └── traefik.log  <--- generated file
│   ├── traefik.d
│   │   └── 10-vcd.yml
│   └── traefik.yaml
├── docker-compose.yml
└── env
    └── aws.env

Note that a few files are generated. The others are files you must create. The generated files are:

  • config/acme.json - contains the SSL certificates issued by Let’s Encrypt
  • config/logs/access.log - Traefik web server access log
  • config/logs/traefik.log - Traefik server log

The files you must create are listed here; sample contents for each follow in the sections below:

  • docker-compose.yml
  • env/aws.env
  • config/traefik.yaml
  • config/traefik.d/10-vcd.yml
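
If you are starting from an empty folder, the layout above can be scaffolded with a few commands. This is only a minimal sketch: it assumes you run it from the project root and it creates the four hand-written files empty, ready to be filled in with the contents shown in the following sections. The chmod on the env file is my own assumption about sensible permissions, not something Traefik requires.

```shell
# Create the folders that docker-compose and Traefik expect (run from the project root)
mkdir -p config/traefik.d config/logs env

# Create empty placeholders for the four files you have to write yourself
touch docker-compose.yml env/aws.env config/traefik.yaml config/traefik.d/10-vcd.yml

# Assumption: lock down the env file, since it will hold AWS credentials
chmod 600 env/aws.env
```

The generated files (acme.json and the two logs) are left out on purpose, since Traefik writes them itself on first start.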

docker-compose.yml

Getting the right combination of switches and parameters can be confusing, even though there is quite a bit of documentation. Hopefully this cleaned-up version of my docker-compose.yml file can help you out.

version: '3'

services:
  reverse-proxy:
    # The official v2 Traefik docker image
    image: traefik:v2.5.5
    # If you don't specify an appropriate restart value, then upon host reboot, your container will not auto-start
    restart: always
    # Provide a friendly name for the container to allow for easier troubleshooting
    container_name: traefik
    # Since I use Route53 for DNS, I load up a restricted API account via environment variables and store them in a config file
    env_file: ./env/aws.env
    networks:
      - proxy
    ports:
      # The HTTPS port for VMware Cloud Director and 8443 for the VM Console Proxy
      - "443:443"
      - "8443:8443"
      # The Web UI (enabled by api.insecure: true in traefik.yaml); port 8080 is NOT permitted through the firewall
      - "8080:8080"
    labels:
      - traefik.enable=true
    volumes:
      # So that Traefik can listen to the Docker events - helpful if you run additional containers
      - /var/run/docker.sock:/var/run/docker.sock
      - "./config:/etc/traefik"

networks:
  proxy:
    external: true
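
Because the compose file declares the proxy network as external, docker-compose will not create it, so it has to exist before the first start. A minimal sketch of bringing the stack up, assuming Docker and docker-compose are installed as described in the overview and that you run this from the folder containing docker-compose.yml:

```shell
# Create the external "proxy" network only if it does not exist yet
docker network inspect proxy >/dev/null 2>&1 || docker network create proxy

# Start Traefik in the background, then confirm the container is running
docker-compose up -d
docker-compose ps
```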

Environment file containing DNS Keys (env/aws.env)

The aws.env file holds the credentials and settings for the route53 DNS provider, so that Traefik can complete the DNS-01 challenge that Let’s Encrypt requires for issuing wildcard SSL certificates.

AWS_ACCESS_KEY_ID=(insert your AWS Access key ID here)
AWS_SECRET_ACCESS_KEY=(Insert your AWS Secret Access Key here)
AWS_PROPAGATION_TIMEOUT=300
AWS_POLLING_INTERVAL=60

Static Traefik config (config/traefik.yaml)

Be sure to review the comments within the following YAML, and adjust it to meet the needs of your environment.

global:
  checkNewVersion: true
  sendAnonymousUsage: true

serversTransport:
  # Allow self-signed/Internal CA Issued Certs to be used easily for back-end connections
  insecureSkipVerify: true

api:
  insecure: true
  dashboard: true
  debug: false

log:
  filePath: "/etc/traefik/logs/traefik.log"
  level: INFO
  format: common
  # format: json

accessLog:
  filePath: "/etc/traefik/logs/access.log"
  format: json
  filters:
    statusCodes:
      - "200"
      - "300-302"
    retryAttempts: true
    minDuration: "10ms"

ping:
  entryPoint: "traefik"

entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: secureweb
          scheme: https
    proxyProtocol:
      insecure: true
    forwardedHeaders:
      insecure: true
  secureweb:
    address: ":443"
  vcd-console:
    address: ":8443"

providers:
  # Enable the file provider to define routers / middlewares / services in file
  # Each *.yml file placed in this directory will be dynamically read and applied by Traefik!
  file:
    directory: /etc/traefik/traefik.d
    watch: true
    debugLogGeneratedTemplate: true
  docker:
    exposedByDefault: false
    network: proxy

certificatesResolvers:
  cert-resolver:
    acme:
      email: "your-email@example.com"
      storage: "/etc/traefik/acme.json"
      # Staging Server: Keep the staging server enabled until your config is validated and working. Then comment it and uncomment the production server
      caServer: "https://acme-staging-v02.api.letsencrypt.org/directory"
      # Production Server: Keep this commented until you have validated with the staging server above
      # caServer: https://acme-v02.api.letsencrypt.org/directory
      # For wildcard cert, must use dnsChallenge
      dnsChallenge:
        provider: "route53"
        delayBeforeCheck: 60
        disablePropagationCheck: false
        # Check your DNS settings to get the authoritative
        # DNS Server IPs for your domain and fill in the IP
        # Addresses here
        resolvers:
          - "dns.server.1.ip:53"
          - "dns.server.2.ip:53"
          - "dns.server.3.ip:53"
          - "dns.server.4.ip:53"
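
With the static config in place and the container started, the ping endpoint gives a quick way to confirm Traefik is alive. A minimal sketch, assuming you run it on the Docker host itself (port 8080 is published by the compose file but not allowed through the firewall) and from the folder that contains the config directory:

```shell
# The ping endpoint lives on the built-in "traefik" entry point (port 8080)
# and should answer with OK when Traefik is healthy
curl -fsS http://localhost:8080/ping && echo

# Look for configuration or ACME problems in the Traefik log
grep -iE 'error|acme' config/logs/traefik.log | tail -n 20
```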

Full Dynamic Config for VMware Cloud Director (config/traefik.d/10-vcd.yml)

Once you have the static configuration set up, you can move on to the file-based dynamic configuration. I like this option because I can add new config files to create routers for additional internal sites as needed, without restarting Traefik :)

Be sure to review the comments within the following YAML, and adjust it to meet the needs of your environment.

# tcp routing section - needed for the VMware Cloud Director Console Proxy
tcp:
  routers:
    tcp-console:
      entryPoints: ["vcd-console"]
      rule: "HostSNI(`vcd.example.com`)"
      # The vCD console proxy must terminate TLS itself with its own certificate,
      # so Traefik passes the TLS connection through untouched.
      tls:
        passthrough: true
      service: "vcd-console"
  services:
    vcd-console:
      loadBalancer:
        servers:
        - address: "192.168.110.156:8443"
        - address: "192.168.110.158:8443"
        - address: "192.168.110.160:8443"
# http routing section
http:
  routers:
    vcd-ssl:
      entryPoints: ["secureweb"]
      # Routers with a higher priority are evaluated first. A catch-all router
      # in another config file can be set to priority 1 so that specific
      # routers like this one (priority greater than 1) are matched before it.
      priority: 10
      # Host() is used because this is an exact hostname, not a pattern
      rule: "Host(`vcd.example.com`)"
      tls:
        certResolver: cert-resolver
        # Note that the credentials provided in env/aws.env must have the necessary permissions to create DNS records in the domain specified below.
        domains:
          - main: "*.example.com"
            sans: ["example.com"]
      service: "vcloud-director"

  services:
    vcloud-director:
      loadBalancer:
        servers:
        - url: "https://192.168.110.156"
        - url: "https://192.168.110.158"
        - url: "https://192.168.110.160"
        healthCheck:
          path: /cloud/server_status
          port: 443
          interval: "10s"
          timeout: "3s"
        passHostHeader: true
        responseForwarding:
          flushInterval: "3s"
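
Once Traefik has picked up the dynamic config, both paths can be checked from any machine that resolves vcd.example.com to the proxy. This is a rough sketch rather than a full test plan; it assumes curl and openssl are available on that machine and uses the anonymized hostname from the config above.

```shell
# Web UI/API path: request the health check URL through Traefik and print the HTTP status.
# -k is only needed while the Let's Encrypt staging certificate is still in use.
curl -k -sS -o /dev/null -w '%{http_code}\n' https://vcd.example.com/cloud/server_status

# Console proxy path: TLS is passed through, so the certificate shown here should be
# the one installed on the Cloud Director nodes, not the Let's Encrypt one.
openssl s_client -connect vcd.example.com:8443 -servername vcd.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer
```

If the second command shows the Let’s Encrypt issuer instead of your internal CA, the request is most likely not hitting the passthrough TCP router.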

As time permits, I will update/revise this article to make it flow a bit better. I hope you found the article helpful!

As a follow-on to setting this up, I would highly encourage you to also learn How to enable logrotate on PhotonOS.