Wednesday, March 9, 2022

Analyzing Istio Performance

Based on the instructions from this link, one can gather some Istio performance information. The following two things help when running the tool on a server which does not have a browser. 1. Specify an IP address which can be reached from outside of the machine. For example, the original command looks like this:
go tool pprof -http=:8888 localhost:8080/debug/pprof/heap
One can use a specific IP address to allow access from outside of the machine which is running the tool:
go tool pprof -http=192.168.56.32:8888 localhost:8080/debug/pprof/heap
2. When running in a server environment, the -no_browser option is nice to have, to avoid the warning messages from the process:
go tool pprof -no_browser -http=192.168.56.32:8888 localhost:8080/debug/pprof/heap
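If the process being profiled is istiod running inside a cluster, a port-forward can expose its debug endpoint so that localhost:8080 above resolves. A minimal sketch, assuming istiod runs in the istio-system namespace and serves /debug/pprof on port 8080:

# forward istiod's http debug port to the local machine, then profile it
kubectl -n istio-system port-forward deploy/istiod 8080:8080 &
go tool pprof -no_browser -http=192.168.56.32:8888 localhost:8080/debug/pprof/heap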

Monday, February 21, 2022

Run Istio Integration test in debug mode in VSCode

Assuming that you've set up your Istio project in VSCode and have also run go mod vendor, you can do the following to debug step-by-step in VSCode.


1. Create a kind k8s cluster
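For example, assuming the kind CLI is installed (the cluster name is just an example; what matters is that your kubeconfig ends up pointing at the cluster):

# creates a local cluster and merges its credentials into ~/.kube/config
kind create cluster --name istio-testing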

2. Create an Istio integration test cluster topology file named single.json, like this:

[
  {
    "kind": "Kubernetes",
    "clusterName": "istio-testing",
    "network": "istio-testing",
    "meta": {
      "kubeconfig": "/home/ubuntu/.kube/config"
    }
  }
]

Notice the kubeconfig field: its value should be the path to the kube config file for the cluster created in step 1.

3. Now in VSCode, make sure that you have the following in your settings.

    "go.buildTags": "integ",
    "go.testFlags": ["-args", "--istio.test.kube.topology=/home/ubuntu/test/single.json", "--istio.test.skipVM"],

Now if you navigate to an integration test Go file in VSCode, you should be able to click on the codelens `debug test` to start debugging your code.
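Under the hood, the run/debug test codelens just runs the package's tests with those settings applied; the rough command-line equivalent (the test name and package path here are placeholders, not from the original post) looks like:

# illustrative only: substitute a real test name and test directory
go test -tags=integ -run '^TestSomething$' ./tests/integration/pilot/ \
  -args --istio.test.kube.topology=/home/ubuntu/test/single.json --istio.test.skipVM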

=================================================

For multi-cluster integration tests, the following items need to be taken care of:

1. Create multiple clusters using this script with a topology json file 

2. To use the code that you just built as a developer (such as istioctl, or docker images such as pilot and proxyv2), you will need to make sure that these images are preloaded into the clusters created in step #1 above. If you use the scripts described in that step, the newly built images should be loaded onto the clusters automatically.
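If you ever need to load the images manually into a kind cluster, something like the following works (the image names, tag, and cluster name are illustrative; match them to your own build and topology):

# repeat for each image and for each cluster in the topology
kind load docker-image istio/pilot:1.15-dev --name istio-testing
kind load docker-image istio/proxyv2:1.15-dev --name istio-testing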

3. In each test setup (most likely in the TestMain method), you will need to set the tag and hub so that the process uses these images correctly. Otherwise, the process will use the public images, not the ones that you just built.

   To do this, you most likely will need to do the following:

   a. Create a new method like this:

// enableMCSServiceDiscovery points the test at the locally built images and
// enables the MCS service discovery feature flags in istiod.
func enableMCSServiceDiscovery(t resource.Context, cfg *istio.Config) {
    cfg.Values["global.tag"] = "1.15-dev"
    cfg.Values["global.imagePullPolicy"] = "IfNotPresent"
    cfg.Values["global.hub"] = "istio"
    cfg.ControlPlaneValues = fmt.Sprintf(`
values:
  pilot:
    env:
      PILOT_USE_ENDPOINT_SLICE: "true"
      ENABLE_MCS_SERVICE_DISCOVERY: "true"
      ENABLE_MCS_HOST: "true"
      ENABLE_MCS_CLUSTER_LOCAL: "true"
      MCS_API_GROUP: %s
      MCS_API_VERSION: %s`,
        common.KubeSettings(t).MCSAPIGroup,
        common.KubeSettings(t).MCSAPIVersion)
}


   b. Then call that method in the TestMain method's Setup chain, like this. Notice that one of the Setup calls uses the enableMCSServiceDiscovery method defined above.

func TestMain(m *testing.M) {
    framework.
        NewSuite(m).
        Label(label.CustomSetup).
        RequireMinVersion(17).
        RequireMinClusters(2).
        Setup(common.InstallMCSCRDs).
        Setup(istio.Setup(&i, enableMCSServiceDiscovery)).
        Setup(common.DeployEchosFunc("mcs", &echos)).
        Run()
}


4. Make sure that the VSCode settings file contains the go.buildTags and go.testFlags settings, like the following:

"go.buildTags": "integ",
"go.testFlags": ["-args", "--istio.test.kube.topology=/tmp/work/topology.json", "--istio.test.skipVM"],

5. Once the above steps are done, you can simply click on the debug test button (the codelens) above the TestMain method.

====================================================

If you want to run the tests locally (not via the VSCode codelens), you can do the following:

1. Find a PR from the Istio project on GitHub which ran successfully; it should have many integration test jobs among its checks.

Notice that any check whose name starts with integ is an integration test. You can pick any of them, then go to its raw build-log.txt file; in this file, you should be able to find the cluster topology file content. Then create a json file with content like below:


[
  {
    "kind": "Kubernetes",
    "clusterName": "config",
    "podSubnet": "10.20.0.0/16",
    "svcSubnet": "10.255.20.0/24",
    "network": "network-1",
    "primaryClusterName": "external",
    "configClusterName": "config",
    "meta": {
      "kubeconfig": "/tmp/work/config"
    }
  },
  {
    "kind": "Kubernetes",
    "clusterName": "remote",
    "podSubnet": "10.30.0.0/16",
    "svcSubnet": "10.255.30.0/24",
    "network": "network-2",
    "primaryClusterName": "external",
    "configClusterName": "config",
    "meta": {
      "fakeVM": false,
      "kubeconfig": "/tmp/work/remote"
    }
  },
  {
    "kind": "Kubernetes",
    "clusterName": "external",
    "podSubnet": "10.10.0.0/16",
    "svcSubnet": "10.255.10.0/24",
    "network": "network-1",
    "primaryClusterName": "external",
    "configClusterName": "config",
    "meta": {
      "fakeVM": false,
      "kubeconfig": "/tmp/work/external"
    }
  }
]

Save the above content into a file such as topology.json
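As a very rough sketch of where the three kubeconfig files referenced above could come from (the real Istio test scripts also configure the pod/service subnets and cross-cluster networking, which a plain kind create cluster does not):

# bare-bones illustration only; subnets and cross-cluster networking not handled
for c in config remote external; do
  kind create cluster --name "$c" --kubeconfig "/tmp/work/$c"
done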

2. Then, in the same build log, you should be able to find a command like the following:

go test -p 1 -v -count=1 -tags=integ -vet=off ./tests/integration/pilot/... \
-timeout 30m --istio.test.skipVM --istio.test.ci --istio.test.pullpolicy=IfNotPresent \
--istio.test.work_dir=/tmp/work --istio.test.hub=istio --istio.test.tag=1.15-dev \
--istio.test.kube.topology=/tmp/work/topology.json "--istio.test.select=,-postsubmit"

3. Change the parameters of the above command to fit your own environment. Pay special attention to parameters like istio.test.tag and istio.test.hub, making changes based on your own build. For the above command, I built the Istio images locally, tagged them like the public Istio images, and preloaded them into the k8s clusters, so that everything was ready to go.

4. The parameter ./tests/integration/pilot/... indicates which tests will be run; it must be a directory from the source tree. It will normally contain multiple TestMain methods, and each TestMain method is considered a test suite. When the run starts, you should see something like the following:

2022-04-29T14:47:53.132024Z info tf === DONE: Building clusters ===
2022-04-29T14:47:53.132029Z info tf === BEGIN: Setup: 'pilot_analysis' ===
2022-04-29T14:47:53.132083Z info tf === BEGIN: Deploy Istio [Suite=pilot_analysis] ===

That should give you a clear indication of which test suite is running. If there is any error, you can find that test suite in the source code and start debugging it in VSCode. Notice that one integration test run normally contains many test suites, that is, as stated above, many TestMain methods in that directory or its subdirectories.
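To narrow a local run down to a single suite, point the package path at that suite's directory; the directory below is only an illustration of mapping a suite name from the log back to the source tree:

# illustrative: run only the suite(s) under one sub-directory
go test -p 1 -v -count=1 -tags=integ -vet=off ./tests/integration/pilot/analysis/... \
  -timeout 30m --istio.test.skipVM --istio.test.hub=istio --istio.test.tag=1.15-dev \
  --istio.test.kube.topology=/tmp/work/topology.json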

Friday, February 18, 2022

Check disk space usage

Run the following command to show which folders use more than a gigabyte of space:

 du -h ~ 2>/dev/null | grep '[0-9\.]\+G'
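A related trick, assuming GNU coreutils, is to summarize one level of directories and sort them by size:

# per-directory totals one level deep, largest last
du -h --max-depth=1 ~ 2>/dev/null | sort -h | tail -20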

 

 

Friday, January 28, 2022

Envoy configurations

 

Envoy can use a set of APIs to update configurations, without any downtime or restart. Envoy only needs a simple bootstrap configuration file, which directs configurations to the proper discovery service API. Other settings are dynamically configured. Envoy's dynamic configuration APIs are called xDS services and they include:

  • LDS (Listener): This allows Envoy to query the full set of listeners. By calling this API, you can dynamically add, modify, and delete known listeners. Each listener must have a unique name. Envoy creates a universally unique identifier (UUID) for any unnamed listener.
  • RDS (Route): This allows Envoy to dynamically retrieve route configurations. Route configurations include HTTP header modifications, virtual host configurations, and the individual routing rules contained in each virtual host. Each HTTP connection manager can retrieve its own route configurations independently through an API. The RDS configuration, a subset of the LDS, specifies when to use static and dynamic configurations and which route to use.
  • CDS (Cluster): This is an optional API that Envoy calls to dynamically retrieve cluster-managed members. Envoy coordinates cluster management based on API responses, and adds, modifies, and deletes known clusters as needed. Clusters statically defined in the Envoy configuration cannot be modified or deleted through the CDS API.
  • EDS (Endpoint): This is a gRPC- or REST/JSON-based API that allows Envoy to retrieve cluster members. It is a subset of the CDS. In Envoy, cluster members are called endpoints. Envoy uses discovery services to retrieve the endpoints in each cluster. EDS is the preferred discovery service.
  • SDS (Secret): This is an API used to distribute certificates. It simplifies certificate management. In a non-SDS Kubernetes deployment, certificates must be created as secrets and mounted into the Envoy containers. If a certificate expires, the secret must be updated and the Envoy container must be redeployed. When SDS is used, the SDS server pushes certificates to all Envoy instances. If a certificate expires, the SDS server only needs to push the new certificate to the Envoy instances, which then apply it immediately without redeployment.
  • ADS (Aggregated): This is used to retrieve all the changes made by the preceding APIs in order from a serialized stream. In essence, the ADS is not an xDS service. Rather, it implements synchronous access to multiple xDS services in a single stream.

You can use one or more xDS services for configuration. Envoy's xDS APIs are designed for eventual consistency, and proper configurations are eventually converged. For example, Envoy may eventually use a new route to retrieve RDS updates, and this route may forward traffic to clusters that have not yet been updated in the CDS. As a result, the routing process may produce routing errors until the CDS is updated. Envoy introduces the ADS to solve this problem. Istio also implements the ADS, which can be used to modify proxy configurations.
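In an Istio mesh, a convenient way to see what each of these xDS layers has actually delivered to a particular sidecar is istioctl proxy-config (the pod name and namespace below are placeholders):

# listeners (LDS), routes (RDS), clusters (CDS), endpoints (EDS), secrets (SDS)
istioctl proxy-config listeners <pod-name> -n <namespace>
istioctl proxy-config routes <pod-name> -n <namespace>
istioctl proxy-config clusters <pod-name> -n <namespace>
istioctl proxy-config endpoints <pod-name> -n <namespace>
istioctl proxy-config secret <pod-name> -n <namespace>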

 

The definition of service mesh

A service mesh is a distributed application infrastructure that is responsible for handling network traffic on behalf of the application in a transparent, out of process manner.

 Data Plane and Control Plane

The service proxies form the "data plane" through which all traffic is handled and observed. The data plane is responsible for establishing, securing, and controlling the traffic through the mesh. The management components that instruct the data plane how to behave are known as the "control plane". The control plane is the brains of the mesh and exposes an API for operators to manipulate the network behaviors. Together, the data plane and the control plane provide important capabilities necessary in any cloud-native architecture, such as:

  • Service resilience
  • Observability signals
  • Traffic control capabilities
  • Security
  • Policy enforcement

 

Figure 1.9. Service mesh architecture with co-located application-layer proxies (data plane) and management components (control plane)

 


With a service proxy next to each application instance, applications no longer need language-specific resilience libraries for circuit breaking, timeouts, retries, service discovery, load balancing, and so on. Moreover, the service proxy also handles metric collection, distributed tracing, and log collection.

 

 

Friday, January 14, 2022

Istio component categories and their elements

 

============= CustomResourceDefinition//authorizationpolicies.security.istio.io

- Processing resources for Istio core.
============= CustomResourceDefinition//destinationrules.networking.istio.io
============= CustomResourceDefinition//envoyfilters.networking.istio.io
============= CustomResourceDefinition//gateways.networking.istio.io
============= CustomResourceDefinition//istiooperators.install.istio.io
============= CustomResourceDefinition//peerauthentications.security.istio.io
============= CustomResourceDefinition//proxyconfigs.networking.istio.io
============= CustomResourceDefinition//requestauthentications.security.istio.io
============= CustomResourceDefinition//serviceentries.networking.istio.io
============= CustomResourceDefinition//sidecars.networking.istio.io
============= CustomResourceDefinition//telemetries.telemetry.istio.io
============= CustomResourceDefinition//virtualservices.networking.istio.io
============= CustomResourceDefinition//wasmplugins.extensions.istio.io
============= CustomResourceDefinition//workloadentries.networking.istio.io
============= CustomResourceDefinition//workloadgroups.networking.istio.io
============= ServiceAccount/external-istiod/istio-reader-service-account
============= ServiceAccount/external-istiod/istiod-service-account
============= ClusterRole//istio-reader-external-istiod
============= ClusterRole//istiod-external-istiod
============= ClusterRoleBinding//istio-reader-external-istiod
============= ClusterRoleBinding//istiod-external-istiod
============= Role/external-istiod/istiod-external-istiod
============= RoleBinding/external-istiod/istiod-external-istiod

✔ Istio core installed
============= ServiceAccount/external-istiod/istiod

- Processing resources for Istiod.
============= ClusterRole//istio-reader-clusterrole-external-istiod
============= ClusterRole//istiod-clusterrole-external-istiod
============= ClusterRole//istiod-gateway-controller-external-istiod
============= ClusterRoleBinding//istio-reader-clusterrole-external-istiod
============= ClusterRoleBinding//istiod-clusterrole-external-istiod
============= ClusterRoleBinding//istiod-gateway-controller-external-istiod
============= ValidatingWebhookConfiguration//istio-validator-external-istiod
============= EnvoyFilter/external-istiod/stats-filter-1.11
============= EnvoyFilter/external-istiod/stats-filter-1.12
============= EnvoyFilter/external-istiod/stats-filter-1.13
============= EnvoyFilter/external-istiod/tcp-stats-filter-1.11
============= EnvoyFilter/external-istiod/tcp-stats-filter-1.12
============= EnvoyFilter/external-istiod/tcp-stats-filter-1.13
============= ConfigMap/external-istiod/istio
============= ConfigMap/external-istiod/istio-sidecar-injector
============= Deployment/external-istiod/istiod
============= PodDisruptionBudget/external-istiod/istiod
============= Role/external-istiod/istiod
============= RoleBinding/external-istiod/istiod
============= HorizontalPodAutoscaler/external-istiod/istiod
============= Service/external-istiod/istiod


Wednesday, January 5, 2022

Access the K8S REST API using curl

 To get all the pods from a namespace,

curl -k --cacert ca.crt -H "Authorization: Bearer <The token>" https://172.19.0.3:6443/api/v1/namespaces/metallb-system/pods


The IP address and port should be the k8s API server's IP and port, and the URL should always follow the naming convention

/api/<version>/namespaces/<namespace>/<resourcetype>

In the example above, the version is v1, the namespace is metallb-system, and we are trying to get all the pods.

Use --cacert to specify a CA certificate file, and use -k to allow insecure server connections when using SSL.
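The pieces of that curl command can be gathered with kubectl. A sketch, assuming the current kubeconfig context points at the cluster and, for the token, Kubernetes 1.24+ plus a service account with permission to list pods:

# API server address of the current context
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'

# CA certificate of the current context
kubectl config view --minify --raw \
  -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' | base64 -d > ca.crt

# short-lived bearer token for a service account (Kubernetes 1.24+)
kubectl -n metallb-system create token default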

Tuesday, January 4, 2022

How does kubernetes_sd_configs actually work?

 When a job is configured like the following:

- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (.+)
 
This is hard to figure out at first. It turns out that this job basically creates a new scrape URL for each target based on the formula

   ${__scheme__}://${__address__}${__metrics_path__}

In many cases __scheme__, __address__, and __metrics_path__ all use their default values, which makes things even more confusing. For example, __scheme__ is normally http if not specified, __metrics_path__ normally defaults to /metrics, and __address__ defaults to the discovered object's address (for the pod role, the pod IP plus each declared container port). So for the pod role we are looking at something like http://${POD_IP}:<container port>/metrics by default.

It is therefore up to the person who configures this job to build up the parts of the URL using the various relabel actions. The first rule (action: keep) means that only pods annotated with prometheus.io/scrape: "true" become targets at all. The rule with target_label: __metrics_path__ replaces the path with the pod annotation prometheus.io/path, if the pod has such an annotation; if it does not, the regex (.+) does not match the empty value, the replacement is skipped, and __metrics_path__ keeps its default of /metrics.

For target_label __scheme__ in the above example, the action is also replace, so the scheme will basically be whatever the annotation prometheus.io/scheme indicates, or stay at the default http when that annotation is absent.

Finally, __address__ is assembled from two parts by the regular expression: the host portion of the existing __address__ (the pod IP, with any existing port stripped) and the pod annotation prometheus.io/port, if the pod indeed has that annotation. If nothing gets changed, __address__ remains the default pod address.
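As a worked example (the pod IP and annotation values are hypothetical), a pod annotated with prometheus.io/scrape: "true", prometheus.io/path: "/stats/prometheus", and prometheus.io/port: "15020" would be scraped at the URL that the rules above assemble:

# the relabel rules resolve the scrape target for that pod to:
curl http://10.244.1.7:15020/stats/prometheus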