Headless Browser
Chrome DevTools Protocol integration for JavaScript-rendering engines
The Google, Baidu, and Bing China engines require JavaScript rendering. A3S Search provides a BrowserPool that manages one shared headless Chrome instance and caps how many tabs can be open at the same time.
Feature Gate
Headless browser support is enabled by default via the headless Cargo feature:
# Enabled by default (9 engines)
[dependencies]
a3s-search = "0.8"
# Disable headless support for a smaller binary (6 engines)
[dependencies]
a3s-search = { version = "0.8", default-features = false }
BrowserPool
BrowserPool manages a single Chrome process with a tab semaphore for concurrency control:
use a3s_search::browser::{BrowserPool, BrowserPoolConfig};
use std::sync::Arc;
let pool = Arc::new(BrowserPool::new(BrowserPoolConfig {
max_tabs: 4,
headless: true,
chrome_path: None,
proxy_url: None,
launch_args: vec![],
}));
Configuration
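BrowserPoolConfig has the fields used in the example above:
- max_tabs: maximum number of tabs open at once (the size of the tab semaphore)
- headless: whether Chrome runs without a visible window
- chrome_path: path to the Chrome binary; None enables auto-detection (see Chrome Auto-Detection)
- proxy_url: optional proxy for the browser's traffic
- launch_args: extra command-line arguments passed to Chrome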
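The max_tabs limit behaves like a counting semaphore. Each open tab holds a permit, and a caller that finds no free permit waits until another tab closes. Below is a minimal blocking sketch of that idea that uses only std::sync::{Mutex, Condvar}. TabLimiter and TabGuard are hypothetical names. The crate's real BrowserPool is async and may be implemented differently.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical counting semaphore for tab slots (blocking, std-only sketch).
pub struct TabLimiter {
    available: Mutex<usize>, // number of free tab slots
    freed: Condvar,          // signalled whenever a slot is returned
}

/// Holding a guard means holding one tab slot; dropping it frees the slot.
pub struct TabGuard<'a> {
    limiter: &'a TabLimiter,
}

impl TabLimiter {
    pub fn new(max_tabs: usize) -> Self {
        Self { available: Mutex::new(max_tabs), freed: Condvar::new() }
    }

    /// Block the current thread until a slot is free, then claim it.
    pub fn acquire(&self) -> TabGuard<'_> {
        let mut available = self.available.lock().unwrap();
        while *available == 0 {
            available = self.freed.wait(available).unwrap();
        }
        *available -= 1;
        TabGuard { limiter: self }
    }

    /// Claim a slot only if one is free right now.
    pub fn try_acquire(&self) -> Option<TabGuard<'_>> {
        let mut available = self.available.lock().unwrap();
        if *available == 0 {
            return None;
        }
        *available -= 1;
        Some(TabGuard { limiter: self })
    }

    /// Number of slots currently free.
    pub fn available(&self) -> usize {
        *self.available.lock().unwrap()
    }
}

impl Drop for TabGuard<'_> {
    fn drop(&mut self) {
        // Return the slot, then wake one thread blocked in `acquire`.
        *self.limiter.available.lock().unwrap() += 1;
        self.limiter.freed.notify_one();
    }
}
```

The permit is released in Drop, so a tab slot is returned even when the code using it exits early through `return` or `?`.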
Lifecycle
The browser is lazily initialized on the first acquire_browser() call:
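// The first call launches Chrome; later calls reuse the same shared instance
let browser = pool.acquire_browser().await?;
// Shut down the Chrome process when you are done with the pool
pool.shutdown().await;
BrowserFetcher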
BrowserFetcher implements the PageFetcher trait using BrowserPool:
use a3s_search::browser::BrowserFetcher;
use a3s_search::WaitStrategy;
let fetcher = Arc::new(
BrowserFetcher::new(pool.clone())
.with_wait(WaitStrategy::Selector {
css: "div.g".into(),
timeout_ms: 5000,
})
.with_user_agent("Mozilla/5.0 ...")
);
Wait Strategies
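Wait strategies control when a page is considered loaded. WaitStrategy::Selector, used above, waits up to timeout_ms milliseconds for an element matching css to appear.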
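Each built-in headless engine uses a strategy suited to its result page. For example, the Using Headless Engines example below waits for div.g on Google and for div.c-container on Baidu.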
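A Selector wait comes down to polling the rendered page until the selector matches or the deadline passes. The sketch below shows that polling loop using only std. wait_until and wait_for_selector are hypothetical helpers. The selector check is passed in as a closure because a real DOM query would go through the DevTools Protocol. The 50 ms poll interval is an arbitrary assumption.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Poll `ready` until it returns true or `timeout` elapses (hypothetical helper).
/// Returns true if the condition was met before the deadline.
pub fn wait_until<F: FnMut() -> bool>(mut ready: F, timeout: Duration, poll_every: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if ready() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(poll_every);
    }
}

/// Selector-style wait: `selector_present` stands in for a DOM query over CDP.
pub fn wait_for_selector<F: FnMut(&str) -> bool>(css: &str, timeout_ms: u64, mut selector_present: F) -> bool {
    wait_until(
        || selector_present(css),
        Duration::from_millis(timeout_ms),
        Duration::from_millis(50), // assumed poll interval
    )
}
```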
Chrome Auto-Detection
When chrome_path is None, A3S Search looks for Chrome in this order:
- The CHROME environment variable
- The system PATH (google-chrome, chromium, chrome, etc.)
- Well-known install paths (macOS /Applications/..., Linux /usr/bin/..., Windows C:\Program Files\...)
- Auto-download of Chrome for Testing from the Google CDN, cached in ~/.a3s/chromium/
Supported platforms: macOS (arm64, x64), Linux (x64), and Windows (x64, x86). If no local Chrome is found, the first run may download about 150 MB.
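The lookup order above is a first-match search over candidate locations. This std-only sketch takes the environment and a filesystem check as parameters, so it can be tested without a real Chrome install. resolve_chrome and ChromeSource are hypothetical. Several details are assumptions: whether the CHROME path is checked for existence, whether PATH directories or binary names form the outer loop, and how Windows .exe suffixes are handled.

```rust
use std::path::{Path, PathBuf};

/// Where the (hypothetical) resolver found Chrome.
#[derive(Debug, PartialEq)]
pub enum ChromeSource {
    EnvVar(PathBuf),
    OnPath(PathBuf),
    WellKnown(PathBuf),
    /// Nothing found locally: download Chrome for Testing into this cache directory.
    Download { cache_dir: PathBuf },
}

pub fn resolve_chrome(
    env_chrome: Option<PathBuf>,
    path_dirs: &[PathBuf],
    binary_names: &[&str],
    well_known: &[PathBuf],
    home: &Path,
    exists: impl Fn(&Path) -> bool,
) -> ChromeSource {
    // 1. An explicit CHROME environment variable wins (assumed to be checked for existence).
    if let Some(p) = env_chrome {
        if exists(p.as_path()) {
            return ChromeSource::EnvVar(p);
        }
    }
    // 2. Search each PATH directory for each candidate binary name.
    for dir in path_dirs {
        for name in binary_names {
            let candidate = dir.join(name);
            if exists(candidate.as_path()) {
                return ChromeSource::OnPath(candidate);
            }
        }
    }
    // 3. Fall back to well-known install locations.
    for p in well_known {
        if exists(p.as_path()) {
            return ChromeSource::WellKnown(p.clone());
        }
    }
    // 4. Otherwise download and cache under ~/.a3s/chromium/.
    ChromeSource::Download { cache_dir: home.join(".a3s").join("chromium") }
}
```

In a real program, the inputs could come from `std::env::var_os("CHROME")`, `std::env::split_paths` over `PATH`, and `Path::is_file` as the existence check.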
Using Headless Engines
use a3s_search::{Search, SearchQuery};
use a3s_search::browser::{BrowserPool, BrowserPoolConfig, BrowserFetcher};
use a3s_search::engines::{Google, Baidu, BingChina, DuckDuckGo};
use a3s_search::WaitStrategy;
use std::sync::Arc;
let pool = Arc::new(BrowserPool::new(BrowserPoolConfig {
max_tabs: 4,
headless: true,
chrome_path: None,
proxy_url: None,
launch_args: vec![],
}));
let google_fetcher = Arc::new(
BrowserFetcher::new(pool.clone())
.with_wait(WaitStrategy::Selector {
css: "div.g".into(),
timeout_ms: 5000,
})
);
let baidu_fetcher = Arc::new(
BrowserFetcher::new(pool.clone())
.with_wait(WaitStrategy::Selector {
css: "div.c-container".into(),
timeout_ms: 5000,
})
);
let mut search = Search::new();
search.add_engine(DuckDuckGo::new());
search.add_engine(Google::new(google_fetcher));
search.add_engine(Baidu::new(baidu_fetcher));
let results = search.search(SearchQuery::new("rust programming")).await?;
pool.shutdown().await;
PageFetcher Trait
All fetchers implement this trait:
#[async_trait]
pub trait PageFetcher: Send + Sync {
async fn fetch(&self, url: &str) -> Result<String>;
}
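PageFetcher is a small trait, so tests can swap in a fetcher that serves canned HTML instead of launching Chrome. The sketch below mirrors the trait's shape but makes several substitutions:
- It uses native `async fn` in traits (Rust 1.75+) instead of `#[async_trait]`.
- It uses `String` as a stand-in error type.
- It drives the future with a minimal std-only `block_on`.

StaticFetcher and block_on are hypothetical helpers. The local trait is a stand-in with the same shape, not a3s_search's actual PageFetcher.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Local stand-in with the same shape as a3s_search's PageFetcher.
trait PageFetcher: Send + Sync {
    async fn fetch(&self, url: &str) -> Result<String, String>;
}

/// Hypothetical test fetcher that serves canned HTML keyed by URL.
struct StaticFetcher {
    pages: HashMap<String, String>,
}

impl StaticFetcher {
    fn with_page(url: &str, html: &str) -> Self {
        let mut pages = HashMap::new();
        pages.insert(url.to_string(), html.to_string());
        Self { pages }
    }
}

impl PageFetcher for StaticFetcher {
    async fn fetch(&self, url: &str) -> Result<String, String> {
        self.pages
            .get(url)
            .cloned()
            .ok_or_else(|| format!("no page cached for {url}"))
    }
}

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor: poll, park until woken, repeat.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}
```

One likely reason for using `#[async_trait]` in the real trait is that it keeps the trait usable as a trait object. A trait with native `async fn` methods cannot currently be used as `dyn PageFetcher`.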