Synthesizer
Quick Summary
DeepEval's Synthesizer offers a fast and easy way to get started with testing your LLM by automatically generating high-quality evaluation datasets (inputs, expected outputs, and contexts) from scratch.
The Synthesizer class is a synthetic data generator that first uses an LLM to generate a series of inputs, before evolving each input to make it more complex and realistic. These evolved inputs are then used to create a list of synthetic Goldens, which make up your synthetic EvaluationDataset.
deepeval's Synthesizer uses the data evolution method to generate large volumes of data across various complexity levels to make synthetic data more realistic. This method was originally introduced by the developers of Evol-Instruct and WizardLM.
For those interested, here is a great article on how deepeval's synthesizer was built.
Synthesizer Methods Overview
There are three ways deepeval's Synthesizer can generate synthetic Goldens:
- Generating synthetic Goldens using context extracted from documents.
- Generating synthetic Goldens from a list of provided contexts.
- Generating synthetic Goldens from scratch.
1. Generating Goldens from documents
If your application is a Retrieval-Augmented Generation (RAG) system, generating Goldens from documents can be particularly useful, especially if you already have access to the documents that make up your knowledge base. By simply providing these documents, the Synthesizer will automatically handle generating the relevant contexts needed for synthesizing test Goldens.
To generate contexts from user-provided documents, the documents are first split into chunks and embedded to form a collection of nodes. Random nodes are then selected, and for each selected node, similar nodes are retrieved and grouped together to create contexts. These contexts are then used to generate synthetic inputs and outputs, which are evolved to form your Golden test dataset.
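The grouping step described above can be sketched in a few lines. This is only an illustration of the idea, not deepeval's implementation: the bag-of-words `embed` function stands in for a real embedding model, and the names `build_contexts`, `num_contexts`, and `context_size` are hypothetical.

```python
import math
import random
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption: stands in for a real embedder)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_contexts(chunks, num_contexts, context_size, seed=0):
    """Pick random chunks, then group each with its most similar chunks."""
    rng = random.Random(seed)
    vectors = [embed(chunk) for chunk in chunks]
    picked = rng.sample(range(len(chunks)), k=min(num_contexts, len(chunks)))
    contexts = []
    for i in picked:
        others = [j for j in range(len(chunks)) if j != i]
        # Most similar chunks first.
        others.sort(key=lambda j: cosine_similarity(vectors[i], vectors[j]), reverse=True)
        contexts.append([chunks[i]] + [chunks[j] for j in others[: context_size - 1]])
    return contexts


chunks = [
    "The Earth revolves around the Sun.",
    "The Sun is a star at the centre of the solar system.",
    "Water freezes at 0 degrees Celsius.",
    "Water boils at 100 degrees Celsius.",
]
contexts = build_contexts(chunks, num_contexts=2, context_size=2)
```

Each resulting context is a small list of related chunks, which is the shape the Synthesizer expects when contexts are supplied manually.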
When generating contexts from document chunks, you'll need to consider parameters such as chunk_size and chunk_overlap, as well as which embedding models to use, and how you'll want to qualify the selected chunks. For more details, see this guide: Using the Synthesizer.
2. Generating Goldens from contexts
If you already have access to prepared contexts, you can skip the document processing step entirely. Instead, simply provide these contexts to the Synthesizer, and it will handle generating the Goldens directly without the need to process documents.
3. Generating Goldens from scratch
You can also generate synthetic Goldens from scratch, without needing any documents or contexts. Simply provide the task (the goal of your application), the scenario (the environment or context in which a user interacts), and the desired input format (e.g. web search, text-to-SQL query, conversation). This way, you can customize the Goldens to reflect exactly how you want users to interact with your system.
This approach is particularly useful if your LLM application doesn’t rely on RAG or if you want to test your LLM on queries beyond the existing knowledge base.
Imagine you want to test how well your LLM chatbot can handle text-to-SQL requests. A great place to start is by generating examples from scratch. In this case, you'd want to generate Goldens where the inputs are natural language questions that need to be converted into SQL queries using the following parameters:
- task: answering text-to-SQL queries by accessing a database and returning the results
- scenario: non-technical users trying to query the database using plain English
- input_format: plain English text-to-SQL queries
Note that you can also specify the scenario, task, input format, and expected output format when generating from documents or contexts. However, unlike those methods, you won’t be able to generate the expected output when starting from scratch, as there is no context to serve as the ground truth reference.
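As a rough sketch, the text-to-SQL parameters above could be collected and passed to the Synthesizer as follows. The value of `num_initial_goldens` is an arbitrary illustrative choice, and the helper name `generate_text_to_sql_goldens` is hypothetical; the function is defined but not called here because it needs deepeval installed and an LLM configured.

```python
# Parameters taken from the text-to-SQL example; num_initial_goldens is an
# illustrative value, not a recommendation.
text_to_sql_params = {
    "task": "Answering text-to-SQL queries by accessing a database and returning the results",
    "scenario": "Non-technical users trying to query the database using plain English",
    "input_format": "Plain English text-to-SQL queries",
    "num_initial_goldens": 10,
}


def generate_text_to_sql_goldens():
    """Run generation from scratch (requires deepeval and a configured LLM)."""
    from deepeval.synthesizer import Synthesizer

    synthesizer = Synthesizer()
    return synthesizer.generate_goldens_from_scratch(**text_to_sql_params)
```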
Creating a Synthesizer
DeepEval's synthesizer can be used standalone or within an EvaluationDataset. To start generating Goldens with a standalone synthesizer, begin by creating a Synthesizer object:
from deepeval.synthesizer import Synthesizer
synthesizer = Synthesizer()
There are nine optional parameters when creating a Synthesizer:
- [Optional] async_mode: a boolean which, when set to True, enables concurrent generation of goldens. Defaulted to True.
Model Parameters
These parameters allow you to specify which model to use for generating synthetic inputs from contexts and evolving them (model), which model to use for filtering during context generation and synthetic input generation (critic_model), and which model to use for transforming your document chunks (embedder).
from deepeval.models import DeepEvalBaseLLM
from deepeval.synthesizer import Synthesizer

class CustomSynthesizerModel(DeepEvalBaseLLM):
    ...

synthesizer = Synthesizer(
    model=CustomSynthesizerModel(),
    critic_model="gpt-4o",
    embedder="text-embedding-ada-002",
)
- [Optional] model: a string specifying which of OpenAI's GPT models to use, OR any custom LLM model of type DeepEvalBaseLLM. Defaulted to gpt-4o.
- [Optional] critic_model: a string specifying which of OpenAI's GPT models to use for quality filtering, OR any custom LLM model of type DeepEvalBaseLLM. Defaulted to gpt-4o.
- [Optional] embedder: a string specifying which of OpenAI's embedding models to use, OR any custom embedding model of type DeepEvalBaseEmbeddingModel. Defaulted to text-embedding-3-small.
As you'll learn later, an embedding model is only used by the generate_goldens_from_docs() method, so don't worry about the embedder parameter too much unless you're looking to use your own embedding model.
Filtering Parameters
These parameters allow you to control the quality of both context generation and synthetic input generation by configuring quality thresholds and max retry settings.
from deepeval.synthesizer import Synthesizer
synthesizer = Synthesizer(
    context_quality_threshold=0.6,
    context_similarity_threshold=0.7,
    context_max_retries=4,
    synthetic_input_quality_threshold=0.8,
    synthetic_input_max_retries=5,
)
Context Filtering:
Context filtering is only applicable when using the generate_goldens_from_docs method.
- [Optional] context_quality_threshold: a float representing the minimum quality threshold for context selection. If the context quality is lower than this threshold, the context will be rejected. Defaulted to 0.5.
- [Optional] context_similarity_threshold: a float representing the minimum similarity score required for context matching (via cosine similarity). Contexts with similarity scores lower than this threshold will not be used. Defaulted to 0.5.
- [Optional] context_max_retries: an integer that specifies the number of times to retry context generation if it does not meet the required quality or similarity threshold. Defaulted to 3.
Synthetic Input Filtering:
- [Optional] synthetic_input_quality_threshold: a float representing the minimum quality threshold for synthetic input generation. Inputs with lower quality will be rejected. Defaulted to 0.5.
- [Optional] synthetic_input_max_retries: an integer that specifies the number of times to retry synthetic input generation if it does not meet the required quality. Defaulted to 3.
It's important to note that if the quality scores for both the input and context do not meet the required thresholds after the allowed maximum retries, the most recent generation will be used. As a result, you may find generated Goldens that do not meet the minimum thresholds in your final evaluation dataset.
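The retry-then-fall-back behaviour described above can be pictured with a small sketch. It is not deepeval's code: the function `generate_with_retries` and its `generate` and `score` callables are hypothetical, and it assumes that "max retries" counts attempts made after the first generation.

```python
from typing import Callable, Tuple


def generate_with_retries(
    generate: Callable[[], str],
    score: Callable[[str], float],
    quality_threshold: float = 0.5,
    max_retries: int = 3,
) -> Tuple[str, float, bool]:
    """Regenerate until the score meets the threshold or retries run out.

    Returns (candidate, score, passed). If no attempt passes, the most
    recent candidate is returned rather than nothing.
    """
    candidate = generate()
    quality = score(candidate)
    retries = 0
    while quality < quality_threshold and retries < max_retries:
        candidate = generate()
        quality = score(candidate)
        retries += 1
    return candidate, quality, quality >= quality_threshold


# Scripted generator whose drafts never reach the 0.8 threshold.
attempts = iter(["draft one", "draft two", "draft three", "draft four"])
scores = {"draft one": 0.2, "draft two": 0.3, "draft three": 0.4, "draft four": 0.45}
result = generate_with_retries(lambda: next(attempts), scores.__getitem__, 0.8, 3)
# result is ("draft four", 0.45, False): the last attempt is kept even though it failed
```

Keeping the `passed` flag alongside the result makes it easy to spot below-threshold Goldens later, for example by checking the quality columns in the exported data.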
To learn more about how scores for context quality and input quality are generated, see this guide.
Generating Goldens
Once you've created a Synthesizer object with the desired filtering parameters and models, you can begin generating Goldens.
1. Generating From Documents
You must install chromadb v0.5.3 as an additional dependency when generating from documents. The use of a vector database allows for faster indexing and retrieval of chunks during generation.
pip install chromadb==0.5.3
To generate synthetic Goldens from documents, simply provide a list of document paths:
from deepeval.synthesizer import Synthesizer
synthesizer = Synthesizer()
synthesizer.generate_goldens_from_docs(
    document_paths=['example.txt', 'example.docx', 'example.pdf'],
)
There is one mandatory and eleven optional parameters when using the generate_goldens_from_docs method:
- document_paths: a list of strings, representing the paths to the documents from which contexts will be extracted. Supported document types include: .txt, .docx, and .pdf.
- [Optional] include_expected_output: a boolean which, when set to True, will additionally generate an expected_output for each synthetic Golden. Defaulted to False.
- [Optional] max_goldens_per_context: the maximum number of goldens to be generated per context. Defaulted to 2.
- [Optional] max_contexts_per_document: the maximum number of contexts to be generated per document. Defaulted to 3.
- [Optional] chunk_size: specifies the size of text chunks (in characters) to be considered for context extraction within each document. Defaulted to 1024.
- [Optional] chunk_overlap: an int that determines the overlap size between consecutive text chunks during context extraction. Defaulted to 0.
- [Optional] num_evolutions: the number of evolution steps to apply to each generated input. This parameter controls the complexity and diversity of the generated dataset by iteratively refining and evolving the initial inputs. Defaulted to 1.
- [Optional] evolutions: a dict with Evolution keys and sampling probability values, specifying the distribution of data evolutions to be used. Defaulted to all Evolutions with equal probability.
- [Optional] scenario: a string, specifying the context or setting in which the inputs are fed into the target LLM.
- [Optional] task: a string, representing the task or purpose of the target LLM.
- [Optional] input_format: a string, representing the ideal structure and format of the inputs to be generated.
- [Optional] expected_output_format: a string, representing the ideal structure and format of the expected outputs to be generated.
Context Generation:
A token-based text splitter is used to manage document chunking, meaning the chunk_size and chunk_overlap parameters do not guarantee exact context sizes. This approach ensures contexts are meaningful and coherent, but might lead to variations in the expected size of each context.
The synthesizer will raise an error if chunk_size is too large to generate n=max_contexts_per_document unique contexts.
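To see why a large chunk_size can leave too few chunks to draw contexts from, consider a minimal sliding-window splitter. This sketch is an assumption-laden stand-in: it chunks a plain list of tokens, the function name `split_into_chunks` is hypothetical, and deepeval's actual splitter may behave differently.

```python
def split_into_chunks(tokens, chunk_size, chunk_overlap=0):
    """Sliding-window chunking over a list of tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start : start + chunk_size])
        # Stop once a window has reached the end of the document.
        if start + chunk_size >= len(tokens):
            break
    return chunks


tokens = [f"t{i}" for i in range(10)]  # a ten-token "document"
small = split_into_chunks(tokens, chunk_size=4, chunk_overlap=1)
large = split_into_chunks(tokens, chunk_size=8, chunk_overlap=0)
# len(small) is 3 and len(large) is 2
```

With a chunk size of 8, this ten-token document yields only two chunks, which is fewer than the default of three contexts per document; lowering chunk_size or max_contexts_per_document avoids that situation.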
Evolutions:
Evolution is an enum that specifies the different data evolution techniques you wish to employ to make synthetic Goldens more realistic. DeepEval's synthesizer supports 7 types of evolutions, which are randomly sampled based on a defined distribution. You can apply multiple evolutions to each Golden, and later access the evolution sequence through the Golden's additional metadata field.
from deepeval.synthesizer import Evolution
available_evolutions = {
    Evolution.REASONING: 1/7,
    Evolution.MULTICONTEXT: 1/7,
    Evolution.CONCRETIZING: 1/7,
    Evolution.CONSTRAINED: 1/7,
    Evolution.COMPARATIVE: 1/7,
    Evolution.HYPOTHETICAL: 1/7,
    Evolution.IN_BREADTH: 1/7,
}
For example, when num_evolutions is set to 2 and evolutions is a dictionary containing Evolution.IN_BREADTH, Evolution.COMPARATIVE, and Evolution.REASONING with sampling probabilities of 0.4, 0.2, and 0.4, respectively, each of a Golden's two evolution steps is sampled from that distribution, so one possible route is IN_BREADTH followed by REASONING.
For those interested in what these evolutions mean, you can read this article here.
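The sampling of an evolution route can be illustrated with Python's standard library. This is a sketch only: the local `Evolution` enum is a stand-in for deepeval.synthesizer.Evolution, the helper `sample_evolution_route` is hypothetical, and it assumes each step is drawn independently (with replacement) from the configured distribution.

```python
import random
from enum import Enum


class Evolution(Enum):  # local stand-in for deepeval.synthesizer.Evolution
    IN_BREADTH = "In-Breadth"
    COMPARATIVE = "Comparative"
    REASONING = "Reasoning"


def sample_evolution_route(evolutions, num_evolutions, rng):
    """Draw one evolution per step according to the sampling probabilities."""
    kinds = list(evolutions)
    weights = list(evolutions.values())
    return rng.choices(kinds, weights=weights, k=num_evolutions)


evolutions = {
    Evolution.IN_BREADTH: 0.4,
    Evolution.COMPARATIVE: 0.2,
    Evolution.REASONING: 0.4,
}
route = sample_evolution_route(evolutions, num_evolutions=2, rng=random.Random(42))
```

Here `route` is a list of two Evolution members; because each step is sampled independently, the same evolution may appear twice in a route.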
Prompt Customization:
You can also customize the input and expected output of each Golden: the scenario, task, and input_format parameters control the synthetic input, while the expected_output_format parameter controls the synthetic expected output.
The following example demonstrates how to use the Synthesizer to generate synthetic Goldens for a text-to-SQL use case by leveraging the table schema from the context:
synthesizer.generate_goldens_from_docs(
    document_paths=['example.txt', 'example.docx', 'example.pdf'],
    scenario="Non-technical users trying to query a database using plain English instructions",
    task="Answering text-to-SQL-related queries by querying a database and returning the results to users",
    input_format="English instructions for retrieving information from a database",
    expected_output_format="SQL query based on the given input"
)
The scenario, task, and input format parameters are used to enforce the input structure during generation and after all the evolutions have been performed. The output is then generated from the final evolved input, with the output format enforced.
2. Generating From Provided Contexts
deepeval also allows you to generate synthetic Goldens from a manually provided list of contexts instead of generating directly from your documents.
This is especially helpful if you already have an embedded knowledge base. For example, if you already have documents parsed and stored in an existing vector database, you may consider handling the logic to retrieve text chunks yourself.
from deepeval.synthesizer import Synthesizer
synthesizer = Synthesizer()
synthesizer.generate_goldens(
    # Provide a list of contexts for synthetic data generation
    contexts=[
        ["The Earth revolves around the Sun.", "Planets are celestial bodies."],
        ["Water freezes at 0 degrees Celsius.", "The chemical formula for water is H2O."],
    ]
)
There is one mandatory and eight optional parameters when using the generate_goldens method:
- contexts: a list of contexts, where each context is itself a list of strings, ideally sharing a common theme or subject area.
- [Optional] include_expected_output: a boolean which, when set to True, will additionally generate an expected_output for each synthetic Golden. Defaulted to False.
- [Optional] max_goldens_per_context: the maximum number of goldens to be generated per context. Defaulted to 2.
- [Optional] num_evolutions: the number of evolution steps to apply to each generated input. This parameter controls the complexity and diversity of the generated dataset by iteratively refining and evolving the initial inputs. Defaulted to 1.
- [Optional] evolutions: a dict with Evolution keys and sampling probability values, specifying the distribution of data evolutions to be used. Defaulted to all Evolutions with equal probability.
- [Optional] scenario: a string, specifying the context or setting in which the inputs are fed into the target LLM.
- [Optional] task: a string, representing the task or purpose of the target LLM.
- [Optional] input_format: a string, representing the ideal structure and format of the inputs to be generated.
- [Optional] expected_output_format: a string, representing the ideal structure and format of the expected outputs to be generated.
3. Generating From Scratch
If you do not have a list of example prompts, or wish to rely solely on LLM generation for synthesis, you can also generate synthetic Goldens simply by specifying the scenario, task, and input format you wish your synthetic inputs to follow.
from deepeval.synthesizer import Synthesizer
synthesizer = Synthesizer()
synthesizer.generate_goldens_from_scratch(
    scenario="Security engineers generating harmful and toxic prompts, focusing on dark humor, to red-team the LLM's ability to respond safely.",
    task="Provide safe and responsible responses as an LLM chatbot to user inputs.",
    input_format="Two-sentence prompt string.",
    num_initial_goldens=25,
    num_evolutions=2
)
The scenario, task, and input format parameters are inserted into a pre-defined prompt template and will need to be iterated upon for optimal results.
There are four mandatory and two optional parameters when using the generate_goldens_from_scratch method:
- scenario: a string, specifying the context or setting in which the inputs are fed into the target LLM.
- task: a string, representing the task or purpose of the target LLM.
- input_format: a string, representing the ideal structure and format of the inputs to be generated.
- num_initial_goldens: the number of initial goldens generated before subsequent evolutions.
- [Optional] num_evolutions: the number of evolution steps to apply to each generated prompt. This parameter controls the complexity and diversity of the generated dataset by iteratively refining and evolving the initial inputs. Defaulted to 1.
- [Optional] evolutions: a dict with Evolution keys and sampling probability values, specifying the distribution of data evolutions to be used. Defaulted to all Evolutions (except MULTICONTEXT) with equal probability.
Convert to DataFrame
To convert your synthetically generated goldens into a DataFrame, simply call the to_pandas method on the synthesizer object:
dataframe = synthesizer.to_pandas()
print(dataframe)
Sample Dataframe
Here’s an example of what the resulting DataFrame might look like:
input | actual_output | expected_output | context | retrieval_context | n_chunks_per_context | context_length | context_quality | synthetic_input_quality | evolutions | source_file |
---|---|---|---|---|---|---|---|---|---|---|
Who wrote the novel "1984"? | None | George Orwell | ["1984 is a dystopian novel published in 1949 by George Orwell."] | None | 1 | 60 | 0.5 | 0.6 | None | file1.txt |
What is the boiling point of water in Celsius? | None | 100°C | ["Water boils at 100°C (212°F) under standard atmospheric pressure."] | None | 1 | 55 | 0.4 | 0.9 | None | file2.txt |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Saving Generated Goldens
To avoid accidentally losing any generated synthetic Goldens, you can use the save_as() method:
synthesizer.save_as(
    file_type='json',  # or 'csv'
    directory="./synthetic_data"
)
Using Synthesizer Within An Evaluation Dataset
An EvaluationDataset also has the generate_goldens_from_docs and generate_goldens methods, which under the hood are powered by the Synthesizer's implementation.
Except for an additional option to accept a custom Synthesizer as an argument, the generate_goldens_from_docs and generate_goldens methods in an EvaluationDataset accept the exact same arguments as those on a Synthesizer.
You can optionally specify a custom Synthesizer when calling generate_goldens_from_docs and generate_goldens through the EvaluationDataset interface if, for example, you wish to use a custom LLM to generate synthetic data. If no Synthesizer is provided, the default Synthesizer configuration is used.
To begin, optionally create a custom Synthesizer:
from deepeval.synthesizer import Synthesizer
synthesizer = Synthesizer(model="gpt-3.5-turbo")
Then, provide it as an argument to generate_goldens_from_docs:
from deepeval.dataset import EvaluationDataset
...
dataset = EvaluationDataset()
dataset.generate_goldens_from_docs(
    synthesizer=synthesizer,
    document_paths=['example.pdf'],
)
Or, to generate_goldens:
...
dataset.generate_goldens(
    synthesizer=synthesizer,
    contexts=[
        ["The Earth revolves around the Sun.", "Planets are celestial bodies."],
        ["Water freezes at 0 degrees Celsius.", "The chemical formula for water is H2O."],
    ]
)
Lastly, don't forget to call save_as() to preserve any generated synthetic Goldens:
saved_path = dataset.save_as(
    file_type='json',  # or 'csv'
    directory="./synthetic_data"
)
The save_as() method returns the path the dataset was saved to as a string, in case you need to use it in code later on.
Using a Custom Embedding Model
Under the hood, only the generate_goldens_from_docs() method uses an embedding model. This is because, in order to generate goldens from documents, the Synthesizer uses cosine similarity to generate the relevant context needed for data synthesis.
Using Azure OpenAI
You can use Azure's OpenAI embedding models by running the following commands in the CLI:
deepeval set-azure-openai --openai-endpoint=<endpoint> \
--openai-api-key=<api_key> \
--deployment-name=<deployment_name> \
--openai-api-version=<openai_api_version> \
--model-version=<model_version>
Then, run this to set the Azure OpenAI embedder:
deepeval set-azure-openai-embedding --embedding-deployment-name=<embedding_deployment_name>
The first command configures deepeval to use Azure OpenAI LLMs globally, while the second command configures deepeval to use Azure OpenAI's embedding models globally.
Using local LLM models
There are several local LLM providers that offer an OpenAI API compatible endpoint, like Ollama or LM Studio. You can use them with deepeval by setting several parameters from the CLI. To configure any of those providers, you need to supply the base URL where the service is running. These are some of the most popular alternatives for base URLs:
- Ollama: http://localhost:11434/v1/
- LM Studio: http://localhost:1234/v1/
For example, to use a local model served by Ollama, use the following command:
deepeval set-local-model --model-name=<model_name> \
--base-url="http://localhost:11434/v1/" \
--api-key="ollama"
Where model_name is one of the LLMs listed when executing ollama list.
If you ever wish to stop using your local LLM model and move back to regular OpenAI, simply run:
deepeval unset-local-model
Then, run this to set the local Embeddings model:
deepeval set-local-embeddings --model-name=<embedding_model_name> \
--base-url="http://localhost:11434/v1/" \
--api-key="ollama"
To revert to the default OpenAI embeddings, run:
deepeval unset-local-embeddings
For additional instructions about LLM model and embeddings model availability and base URLs, consult the provider's documentation.
Using Any Custom Model
Alternatively, you can also create a custom embedding model in code by inheriting the base DeepEvalBaseEmbeddingModel class. Here is an example of the same Azure OpenAI embedding model, this time created in code using langchain's langchain_openai module:
from typing import List
from langchain_openai import AzureOpenAIEmbeddings
from deepeval.models import DeepEvalBaseEmbeddingModel

class CustomEmbeddingModel(DeepEvalBaseEmbeddingModel):
    def __init__(self):
        pass

    def load_model(self):
        # Returns the underlying embedding client
        return AzureOpenAIEmbeddings(
            openai_api_version="...",
            azure_deployment="...",
            azure_endpoint="...",
            openai_api_key="...",
        )

    def embed_text(self, text: str) -> List[float]:
        embedding_model = self.load_model()
        return embedding_model.embed_query(text)

    def embed_texts(self, texts: List[str]) -> List[List[float]]:
        embedding_model = self.load_model()
        return embedding_model.embed_documents(texts)

    async def a_embed_text(self, text: str) -> List[float]:
        embedding_model = self.load_model()
        return await embedding_model.aembed_query(text)

    async def a_embed_texts(self, texts: List[str]) -> List[List[float]]:
        embedding_model = self.load_model()
        return await embedding_model.aembed_documents(texts)

    def get_model_name(self):
        # The original snippet omitted the return statement
        return "Custom Azure Embedding Model"
When creating a custom embedding model, you should ALWAYS:
- inherit DeepEvalBaseEmbeddingModel.
- implement the get_model_name() method, which simply returns a string representing your custom model name.
- implement the load_model() method, which will be responsible for returning the model object instance.
- implement the embed_text() method with one and only one parameter of type str as the text to be embedded, and return a vector of type List[float]. We called embedding_model.embed_query(text) to access the embedded text in this particular example, but this could be different depending on the implementation of your custom model object.
- implement the embed_texts() method with one and only one parameter of type List[str] as the list of strings to be embedded, and return a list of vectors of type List[List[float]].
- implement the asynchronous a_embed_text() and a_embed_texts() methods, with the same function signatures as their respective synchronous versions. Since these are asynchronous methods, remember to use async/await.
If an asynchronous version of your embedding model does not exist, simply reuse the synchronous implementation:
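class CustomEmbeddingModel(DeepEvalBaseEmbeddingModel):
    ...
    async def a_embed_text(self, text: str) -> List[float]:
        return self.embed_text(text)

    async def a_embed_texts(self, texts: List[str]) -> List[List[float]]:
        return self.embed_texts(texts)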
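Reusing the synchronous method directly is the simplest option, but the call then blocks the event loop while it runs. One possible alternative, sketched below under the assumption of Python 3.9 or newer, is to offload the blocking call to a worker thread with asyncio.to_thread. The `BlockingEmbedder` class is a hypothetical stand-in and does not inherit from DeepEvalBaseEmbeddingModel so that the snippet runs on its own.

```python
import asyncio
from typing import List


class BlockingEmbedder:  # hypothetical stand-in for a custom embedding model
    def embed_text(self, text: str) -> List[float]:
        # Placeholder "embedding"; a real model would call an API or local model.
        return [float(len(text)), float(text.count(" "))]

    async def a_embed_text(self, text: str) -> List[float]:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.embed_text, text)


async def main() -> List[List[float]]:
    embedder = BlockingEmbedder()
    # Both embeddings are requested concurrently.
    return await asyncio.gather(
        embedder.a_embed_text("hello world"),
        embedder.a_embed_text("hi"),
    )


vectors = asyncio.run(main())
```

Whether this is worthwhile depends on how slow the synchronous call is; for fast local models the plain reuse shown earlier is usually sufficient.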
Lastly, provide the custom embedding model through the embedder parameter when creating a Synthesizer:
from deepeval.synthesizer import Synthesizer
...
synthesizer = Synthesizer(embedder=CustomEmbeddingModel())
If you run into invalid JSON errors using custom models, you may want to consult this guide on using custom LLMs for evaluation, as synthetic data generation also supports pydantic confinement for custom models.