Implementing a general-purpose policy-based control across the stack

Jun 7, 2020

What is the need?

In any organization, big or small, you need policies to regulate access to and usage of your services. Take a platform like Kubernetes as an example. Your developers have full access to the cluster because they need to deploy new code and debug existing code during incidents. That means they can run vulnerable images in a pod and cause a security issue, create a conflicting ingress hostname and cause a service outage, or create resources with no resource limits and cause a resource overrun. Governing policies are needed to mitigate such issues.

Many developers still implement these policies by hard-coding them inside the service code, which brings a lot of challenges. What will you do when the policies are revised? Will you update the service code and plan a new release of the component? What if the revised policies affect many components? Will you update all of that code? Your policies might also depend on external resources such as databases. How will you incorporate that? Will you give your component access to these databases?

So there is clearly a need for an external policy agent which could govern the operation of different users and services. Not just that, we need an agent which is compatible with all the components of the services in your architecture.

The solution

My go-to solution for policy-based control is Open Policy Agent (OPA), which unifies policy enforcement across the stack. As a developer who loves to tinker with many languages, I love its compatibility with a really vast ecosystem, which is listed on the OPA website. The secret behind this extensive compatibility is simplicity: OPA accepts any JSON as input and can return any JSON as output, which makes it usable across many domains.

So how does it work?

The architecture of OPA is created such that the policy enforcement stays decoupled from the decision-making. The application services can query the OPA component and receive a JSON response from it stating whether the request abides by the policy or not.

If you are working with Go, you can use OPA just like any other Go library. If you are writing your service in any other language, it is advisable to run OPA as a sidecar or as a host-level daemon. The idea is to keep OPA close to your services in order to reduce latency and minimize the impact on performance. Furthermore, OPA keeps the policy and data in memory, so it does not need to call any other service unless it is configured to do so.
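
As a rough illustration of the query step, the sketch below shows how a service written in another language might ask a local OPA sidecar for a decision over OPA's REST Data API. The policy path myapp/allow, the input fields and the helper names are hypothetical, and the URL assumes OPA's default listen address of localhost:8181.

```python
import json
import urllib.request

# Assumption: OPA runs next to the service on its default port.
OPA_URL = "http://localhost:8181"


def build_query(policy_path, input_doc):
    """Return the URL and JSON body for a query against OPA's Data API."""
    url = f"{OPA_URL}/v1/data/{policy_path}"
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return url, body


def parse_decision(response_body):
    """Extract a boolean decision; a missing "result" key means undefined."""
    return json.loads(response_body).get("result", False) is True


def is_allowed(policy_path, input_doc):
    """Ask OPA for a decision (needs a running OPA with the policy loaded)."""
    url, body = build_query(policy_path, input_doc)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=2) as response:
        return parse_decision(response.read())
```

The service only builds a JSON document and reads a JSON answer, so the policy itself can change without touching this code. Treating an undefined result as a denial is a design choice of this sketch, not something OPA enforces.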

This does not mean that we keep the policy only in memory and hope that the setup never dies. A golden source holds the up-to-date policies, and OPA interacts with it and with related services through a set of management APIs. These APIs include:

Bundle service API: lets OPA download the latest version of the policies and data.
Status service API: lets OPA report its status, such as whether it is running the latest version of a policy or hit an error while loading it.
Decision log service API: collects audit logs of all the decisions made by OPA. Every decision is recorded, and batches of these decisions are periodically uploaded to the decision log service, where they can be used for audit, debugging and testing purposes.

How do I create a policy in OPA?

OPA policies are written in Rego, a high-level declarative language designed for OPA. Rego lets us write policy as code rather than as a flat set of statements. Like any other language, it comes with tooling, IDE integrations and its own testing framework. Let us write a policy in Rego. To follow along, I recommend opening the Rego Playground on the side.

Let’s take a use case where a person may view his/her own salary and the salaries of the people who report to them. All other access should be denied. We start by declaring a package

package play

Now we will create an allow policy so that a person can view his/her own salary

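# Pre-1.0 Rego syntax; OPA 1.0 and later expects "allow if {" here.
allow = true {
 # Only read requests are considered.
 input.method == "GET"
 # Match the path and bind its second element to employee_id.
 input.path = ["salary", employee_id]
 # The requester must be the employee in the path.
 input.user == employee_id
}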

This rule matches requests whose method is GET, whose path consists of salary followed by the person whose salary is being fetched, and whose user is the person making the request. By setting allow = true here, we allow any request where the user and the person in the path are the same.

You can test this code by providing the following input. It returns true when both names are the same

{
 "method": "GET",
 "path": ["salary", "dinesh"],
 "user": "dinesh"
}

If the names differ, the result is undefined rather than false, because no rule in the policy covers that case. To get an explicit false, we set a default value like this

default allow = false

Next, we need a rule that allows users to view their subordinates’ salaries. For this, we need to describe the reporting hierarchy as data.

managers = {
 "gilfoyle": {"richard", "erlich"},
 "richard": {"erlich"},
}

Each key is an employee and each value is the set of that employee’s managers: Richard and Erlich are managers of Gilfoyle, and Erlich is the manager of Richard. For larger or frequently changing hierarchies, this information can be loaded into OPA as external data instead of being written into the policy. Now, we create an allow rule for this case too

allow = true {
 input.method == "GET"
 input.path = ["salary", employee_id]
 # True when input.user is in the manager set of employee_id.
 managers[employee_id][input.user]
}

And so our policies are ready. You can test the policy by providing the following inputs

{
 "method": "GET",
 "path": ["salary", "gilfoyle"],
 "user": "richard"
}

The above-mentioned input will return true. For testing a false case, try the following input

{
 "method": "GET",
 "path": ["salary", "erlich"],
 "user": "richard"
}
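
To trace why the first input is allowed and the second is not, here is a plain-Python model of the same decision logic. It is only an illustration of what the two allow rules and the default express, not how OPA evaluates Rego; the function name and structure are my own.

```python
# Same hierarchy as the managers data in the policy:
# each key is an employee, each value is the set of that employee's managers.
MANAGERS = {
    "gilfoyle": {"richard", "erlich"},
    "richard": {"erlich"},
}


def allow(request):
    """Mirror the two allow rules, falling back to False like the default."""
    path = request.get("path", [])
    if request.get("method") != "GET":
        return False
    if len(path) != 2 or path[0] != "salary":
        return False
    employee_id = path[1]
    user = request.get("user")
    # Rule 1: people may read their own salary.
    if user == employee_id:
        return True
    # Rule 2: managers may read the salary of anyone who reports to them.
    return user in MANAGERS.get(employee_id, set())
```

Richard can read Gilfoyle's salary because he appears in Gilfoyle's manager set. He cannot read Erlich's, because Erlich has no entry in the hierarchy and neither rule matches.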

As mentioned before, you can also add test cases to check the correctness of your policy like this

test_allow {
 allow with input as {"method": "GET", "user": "richard", "path": ["salary", "gilfoyle"]}
}

Put together, the complete policy consists of the package declaration, the default rule, the two allow rules, the managers data and the test.

Note: These policies can be shipped as they are or compiled to WebAssembly with the build command of the OPA CLI (opa build).

Let’s try another piece of policy…

This one is a pre-written Kubernetes policy from the Rego Playground, which you can find under the Examples tab. Breaking it down…

deny[msg]

deny is the key under which this policy returns its output. A rule written as just deny would produce a boolean. Writing deny[msg] makes deny a set instead: every msg for which the rule body holds is added to it, so the response can carry custom messages.

some i

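The some keyword declares i as a local variable. When i is later used as an index into the containers array, OPA tries every value of i, which is how the rule iterates over all containers. Since a Kubernetes admission request is sent as input, we follow its hierarchy to reach individual fields. So, to access the value of kind in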

"request": {
  "kind": {
    "kind": "Pod"

we will use

 input.request.kind.kind == "Pod"

This checks whether the resource type is Pod.

Now, to check whether the image is authorized (that is, whether it starts with hooli.com/), we assign the image name to a variable called image and test it with the startswith function. If the image does not start with that prefix, we assign a custom message to the msg variable that we declared in deny[msg]

 image := input.request.object.spec.containers[i].image
 not startswith(image, "hooli.com/")
 msg := sprintf("Image '%v' comes from untrusted registry", [image])
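
To make the set-building behaviour of deny[msg] concrete, here is a small Python model of the same check. It is a sketch of the logic only, with a hypothetical function name; the loop stands in for the implicit iteration over i.

```python
def deny(admission_review):
    """Return one message per container whose image is not from hooli.com/."""
    request = admission_review.get("request", {})
    # Only Pod resources are checked, as in the policy.
    if request.get("kind", {}).get("kind") != "Pod":
        return set()
    containers = request.get("object", {}).get("spec", {}).get("containers", [])
    messages = set()
    for container in containers:  # plays the role of the index i
        if "image" not in container:
            continue  # nothing to check, so no message is produced
        image = container["image"]
        if not image.startswith("hooli.com/"):
            messages.add(f"Image '{image}' comes from untrusted registry")
    return messages
```

An empty set means nothing was denied. A Pod with several untrusted images produces one message per image, which is why deny is a set rather than a single boolean.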

And with that, our policy and this article are completed! Go to the Rego Playground and check out their pre-configured examples for more.