Settings for semantic document chunking for a data source. Semantic
chunking splits a document into smaller documents based on groups of
similar content derived from the text with natural language
processing.

With semantic chunking, each sentence is compared to the next to
determine how similar they are. You specify a threshold in the form of
a percentile, where adjacent sentences that are less similar than that
percentage of sentence pairs are divided into separate chunks. For
example, if you set the threshold to 90, then the 10 percent of
sentence pairs that are least similar are split. So if you have 101
sentences, 100 sentence pairs are compared, and the 10 with the least
similarity are split, creating 11 chunks. These chunks are further
split if they exceed the max token size.

You must also specify a buffer size, which determines whether
sentences are compared in isolation, or within a moving context window
that includes the previous and following sentence. For example, if you
set the buffer size to `1`, the embedding for sentence 10 is derived
from sentences 9, 10, and 11 combined.

@!attribute [rw] breakpoint_percentile_threshold
  The dissimilarity threshold for splitting chunks.
  @return [Integer]

@!attribute [rw] buffer_size
  The buffer size to use when evaluating sentence similarity.
  @return [Integer]

@!attribute [rw] max_tokens
  The maximum number of tokens that a chunk can contain.
  @return [Integer]

@see docs.aws.amazon.com/goto/WebAPI/bedrock-agent-2023-06-05/SemanticChunkingConfiguration AWS API Documentation

class Aws::BedrockAgent::Types::SemanticChunkingConfiguration
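The percentile split and buffer-size window described above can be sketched in Ruby. This is a toy illustration of the behavior the documentation describes, not the Bedrock service's implementation; the helper names `window_text` and `breakpoints` are invented for this sketch.

```ruby
# Toy sketch of semantic chunking mechanics; illustration only, not
# the Bedrock service's implementation.

# With buffer_size n, each sentence is embedded together with its n
# preceding and n following sentences (clamped at the document edges).
def window_text(sentences, index, buffer_size)
  lo = [index - buffer_size, 0].max
  hi = [index + buffer_size, sentences.length - 1].min
  sentences[lo..hi].join(' ')
end

# Adjacent sentence pairs whose similarity falls in the bottom
# (100 - percentile) percent become chunk boundaries.
def breakpoints(pair_similarities, percentile)
  cutoff_count = ((100 - percentile) * pair_similarities.length / 100.0).round
  pair_similarities.each_with_index
                   .sort_by { |score, _| score }
                   .first(cutoff_count)
                   .map { |_, index| index }
                   .sort
end

sentences = (1..101).map { |i| "S#{i}" }

# With buffer size 1, sentence 10 (index 9) is embedded from
# sentences 9, 10, and 11 combined.
window_text(sentences, 9, 1)  # => "S9 S10 S11"

# 101 sentences yield 100 adjacent-pair scores; a threshold of 90
# splits the 10 least-similar pairs, producing 11 chunks.
scores = (1..100).map { |i| i / 100.0 }
splits = breakpoints(scores, 90)
splits.length                 # => 10
```

Chunks produced this way are then further split if they exceed `max_tokens`, as noted above.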