Reward Function

Now that you have set up your agent and connected it to Augento, we can start preparing for Reinforcement Learning.

For this we need a reward function that grades, during training, how good or bad the model's outputs are.

For example, to teach a model to write correct code in a programming language, the reward function could simply return a binary value based on whether the model's code output compiles.

example_reward.py


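from pl import compiler  # `pl` stands in for your target language's compiler bindings


async def reward(completion: str) -> float:
    """Return 1.0 if the completion compiles, otherwise 0.0."""
    try:
        compiler.compile(completion)
        return 1.0
    except Exception:
        # Any compiler error counts as a failed completion.
        return 0.0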
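Because `pl` above is only a stand-in, here is a minimal runnable variant of the same idea, assuming the model is supposed to produce Python source. It uses Python's built-in `compile()` as the "compiler", which checks syntax only; the function name `syntax_reward` is made up for this illustration.

```python
def syntax_reward(completion: str) -> float:
    """Binary reward: 1.0 if the completion parses as Python source, else 0.0."""
    try:
        # The built-in compile() parses the source without executing it.
        compile(completion, "<completion>", "exec")
        return 1.0
    except (SyntaxError, ValueError):
        # SyntaxError: invalid code; ValueError: e.g. null bytes on older Pythons.
        return 0.0
```

A syntax check is a weak signal, since code can parse and still be wrong, so a real reward function would likely combine it with tests or other checks.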

Set Up a Reward Function Server

On the Augento Platform, a reward function is simply a REST endpoint with a POST route that takes the model's completion as input and returns a scalar reward as output.

You can host it on your own machines (preferable when the reward function needs access to your environment) or on a compute platform (we recommend fly.io).

To get you started quickly, we provide templates in Python and TypeScript:

Python
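The official template is not reproduced here. As a rough stand-in, the following sketch shows what such a server might look like using only the Python standard library. The names `score`, `handle_payload`, `RewardHandler`, and `run_server`, the port, and the placeholder scoring rule are all assumptions for illustration; the handler also accepts POST on any path, since the exact route is up to you.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def score(prompt_messages: list, completion: str, extra_data: dict) -> float:
    """Hypothetical placeholder reward: 1.0 for a non-empty completion, else 0.0."""
    return 1.0 if completion.strip() else 0.0


def handle_payload(raw_body: bytes) -> Tuple[int, dict]:
    """Turn a raw JSON request body into (HTTP status, response dict)."""
    try:
        body = json.loads(raw_body)
        completion = body["completion"]
        if not isinstance(completion, str):
            raise TypeError("completion must be a string")
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with a 'completion' string"}
    reward = score(body.get("prompt_messages", []), completion, body.get("extra_data", {}))
    return 200, {"reward": float(reward)}


class RewardHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_payload(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def run_server(port: int = 8000) -> None:
    """Serve the reward endpoint until interrupted."""
    HTTPServer(("0.0.0.0", port), RewardHandler).serve_forever()
```

Calling `run_server()` starts the endpoint. Keeping the scoring logic in `handle_payload` rather than in the HTTP handler makes it easy to unit-test without opening a socket.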

Reward Function Endpoint

The interface of the POST route on the reward function server has to adhere to the following specification.

Request body

prompt_messages	The previous conversation, in exactly the message format specified by the OpenAI API
completion	The completion generated by the model during training, which the reward function server is expected to score
extra_data	Additional data that gives the reward function context for verifying the completion
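The three fields above could be parsed and checked on the server roughly as follows. This is a sketch under assumptions: the class name `RewardRequest` is invented here, and treating `extra_data` as an optional JSON object is a guess, since the specification does not say whether it may be omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RewardRequest:
    prompt_messages: List[Dict[str, Any]]  # OpenAI-style chat messages
    completion: str                        # model output to be scored
    extra_data: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, body: Dict[str, Any]) -> "RewardRequest":
        """Validate a decoded JSON body and build a RewardRequest from it."""
        messages = body.get("prompt_messages")
        if not isinstance(messages, list):
            raise ValueError("prompt_messages must be a list of message objects")
        for message in messages:
            if not isinstance(message, dict) or "role" not in message:
                raise ValueError("each prompt message must be an object with a 'role'")
        completion = body.get("completion")
        if not isinstance(completion, str):
            raise ValueError("completion must be a string")
        # Assumption: a missing or null extra_data is treated as an empty object.
        extra = body.get("extra_data") or {}
        if not isinstance(extra, dict):
            raise ValueError("extra_data must be an object")
        return cls(messages, completion, extra)
```

Rejecting malformed bodies early keeps the scoring code free of defensive checks.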

exampleRequest


{
    "prompt_messages": [
      {
        "role": "developer",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "completion": "Hello! I am a useful language model",
    "extra_data": {
      "prompt_id": "example-001",
      "notes": "Sample request for grading"
    }
}

Returns

reward

A scalar reward that grades the completion of the model

exampleResponse


{
    "reward": 0.5
}
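A small helper can make sure the response body is always valid JSON with a single `reward` field. The sketch below assumes the reward is meant to be sent as a finite JSON number rather than a string; the helper name `build_reward_response` is hypothetical.

```python
import json
import math


def build_reward_response(reward: float) -> str:
    """Serialize a scalar reward into the JSON response body."""
    value = float(reward)
    if not math.isfinite(value):
        # NaN and infinity are not valid JSON numbers.
        raise ValueError("reward must be a finite number")
    return json.dumps({"reward": value})
```

Serializing through `json.dumps` avoids hand-written JSON mistakes such as trailing commas.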