Google has built-in response safety features. These can be adjusted by adding a specific section to the LLM Service body definition for Google vendor services defined in LLMAsAService.io.
How to configure in a service
For the request body template, add the following, setting the levels using the values shown below. In this case, the configuration relaxes the policy so that it triggers only at high detection thresholds -
{
  "safetySettings" : [
    {
      "category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"
    },
    {
      "category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_ONLY_HIGH"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"
    }
  ]
}
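For context, the safetySettings section sits alongside the other fields in Google's generateContent request body, such as contents and generationConfig. The minimal sketch below is illustrative only; the prompt text and temperature are placeholders, and any other fields already in your body definition should be kept as they are -
{
  "contents": [
    { "parts": [ { "text": "example prompt text" } ] }
  ],
  "generationConfig": { "temperature": 0.7 },
  "safetySettings": [
    { "category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH" },
    { "category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH" },
    { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_ONLY_HIGH" },
    { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH" }
  ]
}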
Categories
The categories available for response analysis use the following keys -
HARM_CATEGORY_HARASSMENT - Negative or harmful comments targeting identity and/or protected attributes.
HARM_CATEGORY_HATE_SPEECH - Content that is rude, disrespectful, or profane.
HARM_CATEGORY_SEXUALLY_EXPLICIT - Contains references to sexual acts or other lewd content.
HARM_CATEGORY_DANGEROUS_CONTENT - Promotes, facilitates, or encourages harmful acts.
Levels
The detection levels are negligible, low, medium, and high. In our testing, the default when not specified is medium. We have had business cases where that was too limiting for our context (we weren't taking user prompts; we were asking for silly business ideas, and those sometimes triggered the harassment or dangerous content categories at medium).
The supported levels are specified for each category as one of the following -
BLOCK_LOW_AND_ABOVE
BLOCK_MEDIUM_AND_ABOVE
BLOCK_ONLY_HIGH
BLOCK_NONE
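Thresholds do not need to match across categories; each entry in safetySettings takes its own threshold value. As an illustrative sketch (this particular mix of categories and levels is an assumption, not a recommendation), a template could block hate speech at low confidence and above while only blocking dangerous content at high confidence -
{
  "safetySettings" : [
    { "category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_LOW_AND_ABOVE" },
    { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH" }
  ]
}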