class Aws::BedrockAgent::Types::WebCrawlerConfiguration
  The configuration of web URLs that you want to crawl. You should be
  authorized to crawl the URLs.

  @!attribute [rw] crawler_limits
    The configuration of crawl limits for the web URLs.
    @return [Types::WebCrawlerLimits]

  @!attribute [rw] exclusion_filters
    A list of one or more exclusion regular expression patterns to
    exclude certain URLs. If you specify an inclusion and exclusion
    filter/pattern and both match a URL, the exclusion filter takes
    precedence and the web content of the URL isn't crawled.
    @return [Array&lt;String&gt;]

  @!attribute [rw] inclusion_filters
    A list of one or more inclusion regular expression patterns to
    include certain URLs. If you specify an inclusion and exclusion
    filter/pattern and both match a URL, the exclusion filter takes
    precedence and the web content of the URL isn't crawled.
    @return [Array&lt;String&gt;]

  @!attribute [rw] scope
    The scope of what is crawled for your URLs. You can choose to crawl
    only web pages that belong to the same host or primary domain. For
    example, only web pages that contain the seed URL
    `docs.aws.amazon.com/bedrock/latest/userguide/` and no other
    domains. You can also choose to include subdomains in addition to
    the host or primary domain. For example, web pages that contain
    `aws.amazon.com` can also include the subdomain
    `docs.aws.amazon.com`.
    @return [String]

  @!attribute [rw] user_agent
    Returns the user agent suffix for your web crawler.
    @return [String]

  @!attribute [rw] user_agent_header
    A string used for identifying the crawler or bot when it accesses a
    web server. The user agent header value consists of `bedrockbot`, a
    UUID, and a user agent suffix for your crawler (if one is provided).
    By default, it is set to `bedrockbot_UUID`. You can optionally
    append a custom suffix to `bedrockbot_UUID` to allowlist a specific
    user agent permitted to access your source URLs.
    @return [String]

  @see docs.aws.amazon.com/goto/WebAPI/bedrock-agent-2023-06-05/WebCrawlerConfiguration AWS API Documentation
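In the Ruby SDK, types like this are typically supplied as plain hashes with snake_case keys. The following is a minimal sketch of such a hash; the limit values, the filter patterns, and the `"HOST_ONLY"` scope value are illustrative assumptions, not defaults:

```ruby
# Sketch of a web crawler configuration hash (illustrative values).
web_crawler_configuration = {
  crawler_limits: {
    rate_limit: 50,   # assumed example value
    max_pages: 1_000  # assumed example value
  },
  # Restrict crawling to the seed URL's host; a broader scope value
  # would also include subdomains of the primary domain.
  scope: "HOST_ONLY",
  inclusion_filters: ["^https://docs\\.aws\\.amazon\\.com/bedrock/.*"],
  exclusion_filters: [".*\\.pdf$"],
  # Custom suffix appended to the default `bedrockbot_UUID` user agent.
  user_agent: "my-crawler-suffix"
}

# When an inclusion and an exclusion pattern both match a URL,
# the exclusion filter takes precedence and the URL is not crawled.
url = "https://docs.aws.amazon.com/bedrock/latest/userguide/guide.pdf"
included = web_crawler_configuration[:inclusion_filters].any? { |p| url.match?(/#{p}/) }
excluded = web_crawler_configuration[:exclusion_filters].any? { |p| url.match?(/#{p}/) }
crawled = included && !excluded
```

Here `crawled` is `false`: the URL matches both filter lists, so the exclusion pattern wins.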