Reward Function

Now that you have set up your agent and connected it to Augento, we can start preparing for Reinforcement Learning.

For this we need a reward function that, during training, grades how good or bad the model's outputs are.

For example, to teach a model to write correct code in a programming language, the reward function could simply return a binary value based on whether the model's code output compiles.

example_reward.py
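from pl import compiler


async def reward(completion: Completion) -> float:
    try:
        # Reward 1.0 when the model's code compiles.
        compiler.compile(completion)
        return 1.0
    except Exception:
        # Any compilation failure yields zero reward.
        return 0.0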
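As a self-contained variant of the example above, the following sketch applies the same binary-reward idea to Python source, using the built-in `compile()` as the checker. The function name `python_compiles_reward` and the plain-string completion are assumptions made for illustration; they are not taken from the Augento templates.

```python
def python_compiles_reward(completion: str) -> float:
    """Return 1.0 if `completion` compiles as Python source, else 0.0."""
    try:
        # compile() parses and byte-compiles the source; it does not run it.
        compile(completion, "<completion>", "exec")
        return 1.0
    except (SyntaxError, ValueError):
        # SyntaxError covers invalid code; some Python versions raise
        # ValueError instead when the source contains null bytes.
        return 0.0
```

A binary signal like this is easy to verify but coarse: code that compiles and does the wrong thing still receives the full reward.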

Set Up A Reward Function Server

On the Augento Platform, a reward function is simply a REST endpoint with a POST route that takes the model's completion as input and returns a scalar reward as output.

You can host it on your own machines (preferable when the reward function needs access to your environment) or on a compute platform (we recommend fly.io).

To get you started quickly, we provide templates in Python and TypeScript.

Reward Function Endpoint

The interface of the POST route on the reward function server has to adhere to the following specification.

Request body

prompt_messages: The previous conversation, in the same format as specified by the OpenAI API.
completion: The completion output by the model during training, for which the reward function server is expected to return a reward.
extra_data: Additional data that adds context for the verification function.

Example request
{
  "prompt_messages": [
    { "role": "developer", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "completion": "Hello! I am a useful language model",
  "extra_data": {
    "prompt_id": "example-001",
    "notes": "Sample request for grading"
  }
}
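On the server side, the request body can be parsed and checked before it reaches the reward logic. The sketch below is one way to do that with the standard library only. The names `RewardRequest` and `parse_reward_request` are hypothetical, and treating `extra_data` as optional and `completion` as a plain string are assumptions based on the example request, not documented guarantees.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RewardRequest:
    """The three request fields described above."""
    prompt_messages: list[dict[str, Any]]
    completion: str
    extra_data: dict[str, Any] = field(default_factory=dict)


def parse_reward_request(raw: str | bytes) -> RewardRequest:
    """Parse a JSON request body, raising ValueError if it is malformed."""
    body = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(body, dict):
        raise ValueError("request body must be a JSON object")
    messages = body.get("prompt_messages")
    if not isinstance(messages, list):
        raise ValueError("prompt_messages must be a list")
    completion = body.get("completion")
    if not isinstance(completion, str):
        raise ValueError("completion must be a string")
    # Assumption: a missing or null extra_data is treated as empty.
    extra_data = body.get("extra_data") or {}
    if not isinstance(extra_data, dict):
        raise ValueError("extra_data must be an object")
    return RewardRequest(messages, completion, extra_data)
```

Rejecting malformed bodies early keeps the reward function itself free of defensive checks.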

Returns

reward: A scalar reward that grades the completion of the model.

Example response
{ "reward": "0.5" }
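Putting the request and response shapes together, a reward function server can be sketched with Python's standard `http.server` module. This is a minimal illustration, not the official template: `grade`, `RewardHandler`, `make_server`, the port, and the choice to answer POST requests on any path are all assumptions. The example response above quotes the reward as a string, while this sketch emits a JSON number; which form the platform accepts is not confirmed here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def grade(prompt_messages, completion, extra_data) -> float:
    """Placeholder scoring rule (hypothetical): reward non-empty completions."""
    return 1.0 if completion.strip() else 0.0


class RewardHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length))
            reward = grade(
                body.get("prompt_messages", []),
                body["completion"],
                body.get("extra_data", {}),
            )
        except (ValueError, KeyError, AttributeError, TypeError):
            # Malformed JSON, missing "completion", or wrong field types.
            self.send_error(400, "invalid request body")
            return
        payload = json.dumps({"reward": reward}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # Silence the default per-request logging.


def make_server(host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Build the server; call .serve_forever() on the result to start it."""
    return HTTPServer((host, port), RewardHandler)
```

Calling `make_server().serve_forever()` starts the server locally. For a real deployment, replace `grade` with your own reward logic and consider a production-grade web framework instead of `http.server`.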