ishish.io

What is now proved
was once only imagin'd

- William Blake

Revolutionizing Risk Assessment with LLMs — Part 3: Initial prompt and tracing

In the previous episode we created an interface and the context for interacting with the Claude 2.x LLM via AWS Bedrock. We performed an initial interaction with the model, asking it to create a joke about Julius Caesar. In this episode we will create and refine the prompt for building the actual risk register.

We will start working with minimal versions of the Asset Inventory and Risk Scenario Register shown below.

Go ahead and create these examples in Google Sheets or a similar application and export them as CSV files.
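
If you don’t have the originals handy, a minimal loading sketch could look like the one below; the column names are assumptions, not the exact ones from the sheets, and the file names are arbitrary.

import pandas as pd

# Hypothetical minimal inputs; adjust the column names to your own sheets.
# assets.csv:    Name, Description, Impact
# scenarios.csv: Name, Description, Probability
assets = pd.read_csv('assets.csv')
scenarios = pd.read_csv('scenarios.csv')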

Now, we want Claude to connect assets to scenarios and evaluate the risk according to a quantitative formula:

Risk = Probability + Impact

The resulting table should contain a list of risks. Let’s start with the following prompt:

human_str = """
Please connect the following assets:
{assets}
to the following scenarios:
{scenarios}
Please only connect scenarios that are relevant to the assets.
Please calculate the risk score for each connection based on formula:
Risk = Probability + Impact
Please output the result in the following format:
<risks>
 <risk num="1">
   <asset>Sample asset 1</asset>
   <scenario>Sample scenario 1</scenario>
   <risk_score>3</risk_score>
 </risk>
 <risk num="2">
   <asset>Sample asset 2</asset>
   <scenario>Sample scenario 2</scenario>
   <risk_score>4</risk_score>
 </risk>
 <risk num="3">
   <asset>Sample asset 3</asset>
   <scenario>Sample scenario 3</scenario>
   <risk_score>1</risk_score>
 </risk>
 ... // more risks
</risks>
"""

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("human", human_str),
])
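
For completeness, the remaining pieces of the chain can be sketched as below; the Bedrock client was configured in the previous episode, so the model id here is an assumption.

from langchain_community.chat_models import BedrockChat
from langchain_core.output_parsers import XMLOutputParser

# Claude 2.x via AWS Bedrock, as set up in the previous episode (model id assumed)
llm = BedrockChat(model_id='anthropic.claude-v2:1')

# parses the <risks>...</risks> response back into a Python dict
parser = XMLOutputParser()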

When we execute the prompt by pressing the “Create Risk Register” button, we get the following error:

In order to understand what went wrong, it would be helpful to examine the interaction in LangSmith. Let’s create a project, generate an API key, and add the necessary environment variables to the .env file (and properly export them, of course).
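
A minimal sketch of that configuration, assuming the standard LangSmith tracing variables (the key and project name are placeholders):

import os

# These normally live in the .env file and are exported before the app starts.
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = '<your-langsmith-api-key>'
os.environ['LANGCHAIN_PROJECT'] = '<your-project-name>'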

After we execute the chain again, we can access and explore the recorded traces arriving in the LangSmith project:

We can see that there is a problem with parsing the output with the XMLOutputParser. Let’s examine the Chat interaction to inspect the verbatim representation of the inputs and outputs.

As you can see, Claude is returning the requested output, but my guess is that the XML declaration at the top of the model’s response is stopping the XML parser from properly parsing it back into a Python object. Why is Claude adding this even though we didn’t ask for it? In the absence of more specific examples, it is following the pattern established by the provided inputs, which also include the XML declaration.
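
This is easy to reproduce: pandas emits the declaration by default when serializing a DataFrame to XML (a minimal sketch with made-up data):

import pandas as pd

df = pd.DataFrame({'Name': ['Sample asset 1'], 'Impact': [3]})

# By default, to_xml() prepends an XML declaration:
print(df.to_xml(root_name='Assets', row_name='Asset'))
# <?xml version='1.0' encoding='utf-8'?>
# <Assets>
#   <Asset>
#     ...
#   </Asset>
# </Assets>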

Let’s adjust:

chain = (prompt | llm | parser)

output = chain.invoke({
    'assets': assets.to_xml(root_name='Assets', row_name='Asset', xml_declaration=False),
    'scenarios': scenarios.to_xml(root_name='Scenarios', row_name='Scenario', xml_declaration=False),
})

Then we’ll flatten the output so we can use it to construct the output DataFrame:

import pandas as pd

# XMLOutputParser returns a nested structure like
# {'risks': [{'risk': [{'asset': ...}, {'scenario': ...}, {'risk_score': ...}]}, ...]};
# merge each risk's single-key dicts into one flat row.
flattened_data = []
for item in output['risks']:
    flattened_dict = {}
    for entry in item['risk']:
        flattened_dict.update(entry)
    flattened_data.append(flattened_dict)

risk_register = pd.DataFrame(flattened_data)

Now with the sample inputs we get an example output:

Let’s verify that the values follow our requested formula:

Risk 1: 3 + 2 = 5 (correct)
Risk 2: 3 + 3 = 6 (correct)
Risk 3: 3 + 2 = 5 (correct)
Risk 4: 3 + 2 = 5 (correct)
Risk 5: 3 + 3 = 5 (incorrect, should be 6)
Risk 6: 3 + 3 = 6 (correct)

So, what went wrong?

The answer is that while adding two natural numbers is easy for any traditional algorithm, it is difficult for an LLM. Let’s remember what an LLM is: a huge neural network whose primary task at every step of text generation is to calculate the most probable next token. This is far from an efficient method for mathematical calculation.

We can solve this problem in a number of ways, e.g.:

  1. just have the LLM connect the assets with scenarios, and calculate the score with a traditional algorithm (see the sketch after this list)
  2. use LLM function calling for the calculations
  3. break the complex task down into a number of smaller, easier ones
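
For illustration, the first option could look roughly like the sketch below: we would ask the model only for the asset/scenario pairs together with the component scores, and do the addition ourselves. The column names are assumptions.

# Hypothetical: the LLM returns probability and impact per connection,
# and the score is computed deterministically instead of by the model.
connections = pd.DataFrame(flattened_data)
connections['risk_score'] = (
    connections['probability'].astype(int) + connections['impact'].astype(int)
)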

We will focus on the last option, breaking the complex task down into several smaller ones. It’s arguably not optimal, but it will allow us to exercise some prompt engineering techniques.

Summary

In this episode, we created and executed an initial prompt for risk calculation according to a defined formula, based on test data. We encountered a problem with the output parser and diagnosed it using the LangSmith tracing functionality. Then we ran into another problem: the model was unable to perform a simple arithmetic operation.

In the following episode, we’ll adjust our prompt using various prompt engineering techniques to improve the quality of the model output.

ishish.io Copyright 2024