Three-Way Empirical Comparison

The same prompt, a Cross-Pacific AI Infra Stock-Investment Thesis (38 stocks / 6 countries / 10 layers), was produced three ways and measured. The numbers below come from actual runs, not estimates.

Methodology. Three independent production paths for the same investment thesis prompt:
(1) Agentic Sciences — proprietary multi-agent orchestration pipeline. A primary orchestrator agent designs the workflow and dispatches specialized subagents in parallel; each subagent is routed to the most appropriate underlying model for its subtask. Built on top of a custom-curated primary-source corpus.
(2) Google Deep Research — single-model deep research mode with web search.
(3) Native single agent (no corpus) — a general-purpose AI agent with web + code tools, but explicitly without access to the Agentic Sciences proprietary corpus. Single-agent execution; no orchestration layer.
Three differentiators, compounding:
(1) Multi-model ensemble — best of every model. Each subagent is routed to the model that is empirically strongest for its subtask: one class for structural reasoning and tool orchestration; another optimized for cost-per-token on high-throughput structured extraction at scale; a third with extended deliberation budget for long-context multi-stage synthesis; specialized subagents for parallel research and quality control. No single model is best at everything — we use each where it wins.
(2) Proprietary primary-source corpus. Curated earnings-call transcripts, MD&A filings, and corporate-event databases that general web tools cannot reach behind paywalls. Every claim ties back to a verbatim quote with date attribution.
(3) Domain-expert judgment in the loop. A quant-economics researcher with 14 years of experience (Cornell PhD · AFA 2026) sets the framework, audits the synthesis, and red-teams the conclusions, turning agent output into an investable thesis rather than a content artifact.

Neither single-model Deep Research nor single-agent baselines can match this compound effect.
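The routing idea in differentiator (1) can be illustrated with a minimal sketch. Everything named here is hypothetical: the task types, the model-class labels, and the `run_subagent` stand-in are assumptions made for illustration, not the actual pipeline, and no real model API is called.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical routing table: task type -> model class label.
# The labels are illustrative placeholders, not real model names.
ROUTES = {
    "orchestration": "agentic-reasoning",
    "bulk_extraction": "fast-cheap",
    "multilingual_extraction": "bilingual-structured",
    "deep_synthesis": "long-context-deliberate",
    "quality_control": "independent-reviewer",
}


def route(task_type: str) -> str:
    """Return the model class for a task type; fail loudly on unknown types."""
    try:
        return ROUTES[task_type]
    except KeyError:
        raise ValueError(f"no route for task type: {task_type!r}") from None


def run_subagent(task: dict) -> dict:
    """Stand-in for a real model call: records which class would handle the task."""
    return {"id": task["id"], "model_class": route(task["type"])}


def dispatch(tasks: list[dict], max_workers: int = 4) -> list[dict]:
    """Run subagents in parallel; results are returned in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_subagent, tasks))


tasks = [
    {"id": 1, "type": "bulk_extraction"},
    {"id": 2, "type": "multilingual_extraction"},
    {"id": 3, "type": "deep_synthesis"},
    {"id": 4, "type": "quality_control"},
]
results = dispatch(tasks)
for result in results:
    print(result)
```

Keeping the routing table as plain data means a model class can be swapped for one subtask without touching the dispatch logic, which is the property the ensemble argument relies on.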

① Agentic Sciences Pipeline

Multi-agent orchestration: primary orchestrator → parallel specialized subagents (extraction, synthesis, QC) → cost-routed across model classes. 26,288 words / 64 pages / 150+ dated verbatim quotes. Cross-corpus statistics + direct A-share MD&A reading. Each model used where it is best.
📄 Download thesis · free (67pp EN) · Chinese edition (54pp ZH)
Educational research only · Not investment advice.

② Google Deep Research

7,475 words / 28 pages / ~12 quotes (6 self-flagged "UNVERIFIED proxy"). 61 web sources, ~13% primary. Strong on macro framing.
📄 Download Google Deep Research raw output (.docx)

③ Native single agent (web tools + code)

8,706 words / 51 URL citations / 5 UNVERIFIED tags. Real-time web search + web fetch + code execution. Comparable to Deep Research in coverage, with a lower hallucination rate and a similar runtime (~14 minutes vs Deep Research's ~10-15 minutes).
📄 Read the native-agent raw output (.md)

Agentic Sciences — Orchestration Stages

Stage | What it does | Why orchestration matters here
Pipeline orchestration | Primary agent designs the corpus-build workflow, manages data acquisition, dispatches downstream subagents | Structural reasoning and tool use are best handled by an agent class optimized for agentic decision-making
Parallel research subagents | Spawned in parallel for gap analysis, fact-checking, comparison audits, focused deep-dives | Independent context per subagent: parallel work without polluting the orchestrator's reasoning trace
Bulk structured extraction | Thousands of source documents converted into structured records (sentiment, quotes, forward statements, risk mentions) | Cost-optimized routing: extraction is high-throughput / low-complexity, best handled by a fast cheap model class
Multilingual extraction | Chinese-language filings → structured English digests with explicit field extraction (chip partners, capex, AI revenue %) | Bilingual capability + structured-output reliability are domain-specific strengths to route to
Multi-stage deep synthesis | Final thesis sections generated independently with full deliberation budget on long-context inputs | Long-context + extended deep-thinking is a specialized capability deserving its own subagent class
Cross-section quality control | Reconciles long/short positions across independently-generated sections; flags contradictions; applies methodology caveats | Red-team / consistency check is a separate task type, best done by a different agent than the one that wrote the content
Bilingual rendering | Markdown → styled HTML → PDF with cover, ToC, multilingual font fallback | Print-quality typesetting is a deterministic stage, handled by classical tools rather than AI
Why this architecture matters. No single model is best at all of: cheap-fast bulk digesting, deep multi-stage reasoning, agentic tool use, structural design, code generation, multilingual extraction, parallel research dispatch, quality control. The orchestration layer routes each subtask to the most appropriate specialized subagent for that stage. The compound effect — many specialized agents coordinating in parallel — is what neither single-call Deep Research nor single-agent baselines can match.
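As an illustration of the cross-section quality control stage, the consistency check can be sketched as a search for tickers that carry opposite stances in independently generated sections. This is a minimal sketch under stated assumptions: the section names, the placeholder tickers (AAA, BBB, CCC), and the `find_conflicts` helper are all hypothetical and are not taken from the actual pipeline.

```python
from collections import defaultdict


def find_conflicts(sections: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Map each ticker with conflicting stances to {stance: [section names]}."""
    seen = defaultdict(lambda: defaultdict(list))
    for section_name, positions in sections.items():
        for ticker, stance in positions.items():
            seen[ticker][stance].append(section_name)
    # A ticker is a conflict only if more than one distinct stance was recorded.
    return {
        ticker: dict(stances)
        for ticker, stances in seen.items()
        if len(stances) > 1
    }


# Hypothetical section outputs: ticker -> stance ("long" or "short").
sections = {
    "compute_layer": {"AAA": "long", "BBB": "short"},
    "memory_layer": {"AAA": "short", "CCC": "long"},
    "pair_trades": {"BBB": "short", "CCC": "long"},
}

conflicts = find_conflicts(sections)
for ticker, stances in sorted(conflicts.items()):
    print(ticker, stances)
```

In this toy input only AAA is flagged, because it is long in one section and short in another. A check like this is deterministic, so it can run before any reviewer agent is asked to adjudicate the contradiction.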

Empirical Output Measurements

Metric | Agentic Sciences | Google Deep Research | Native single agent
Total words | 26,288 | 7,475 | 8,706
Top long candidates | 8 | 8 | 10
Pair trades | 5 + anti-pair | 5 + anti-pair | 6 + anti-pair
Watchlist events | 15 | 15 | 25
Source citations | corpus-scope (1,590 calls + 26 MD&A + 1,445 disclosure events) | 61 web URLs | 51 unique web URLs
Dated verbatim quotes | ~150+ | ~12 (~6 "UNVERIFIED proxy") | ~30-40
Self-flagged UNVERIFIED tags | ~5 | ~6+ | 5
Production time | Multi-day pipeline build + ~30 min synthesis | ~10-15 minutes | ~14 minutes (47 tool calls)
Output reproducibility | High (corpus is grep-able) | Medium (re-search web) | Medium (re-fetch URLs; live web)
Surprise finding. The native single agent with tools produced output of comparable size and quality to Google Deep Research, and with a lower hallucination rate: 5 UNVERIFIED tags versus Deep Research's 6+ "UNVERIFIED proxy" tags plus 1 outright product hallucination (Cambricon Siyuan 690 deployment claims). The original assumption that "a single agent without our corpus would produce thin general-knowledge output" was wrong: given web access and code tools, it can build a credible thesis from scratch in ~15 minutes.
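Raw quote counts are hard to compare across outputs of very different length, so one way to read the measurements is to normalize dated verbatim quotes per 1,000 words. The sketch below uses only the word and quote counts reported above. Treating "~150+" as a lower bound of 150 and "~30-40" as a range are assumptions, and the normalization itself is illustrative rather than part of the original measurement.

```python
# Word counts and dated-verbatim-quote counts as reported in the measurements.
# Quotes are stored as (low, high); "~150+" is treated as a lower bound of 150.
runs = {
    "Agentic Sciences": {"words": 26288, "quotes": (150, 150)},
    "Google Deep Research": {"words": 7475, "quotes": (12, 12)},
    "Native single agent": {"words": 8706, "quotes": (30, 40)},
}


def per_thousand_words(count: int, words: int) -> float:
    """Quotes per 1,000 words of output."""
    return 1000 * count / words


density = {
    name: tuple(round(per_thousand_words(q, run["words"]), 2) for q in run["quotes"])
    for name, run in runs.items()
}

for name, (low, high) in density.items():
    label = f"{low}" if low == high else f"{low} to {high}"
    print(f"{name}: {label} quotes per 1,000 words")
```

On this normalization the ordering of the three outputs is unchanged, but the gap between the corpus pipeline and the native agent is narrower than the raw counts suggest.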

Capability Matrix (post-experiment)

Capability | Agentic Sciences | Google Deep Research | Native single agent
Read paywalled primary transcripts (CIQ / FactSet) | ✅ 1,590 calls | ❌ blocked by paywall | ❌ blocked by paywall
Read original Chinese A-share annual reports / MD&A PDFs | ✅ 26 PDFs digested | ❌ relies on aggregator translations | ⚠️ partial (can fetch English IR pages, not Chinese MD&A directly)
Real-time web search | ❌ corpus snapshot | |
Iterative tool use (search → fetch → parse → re-search) | N/A (corpus pre-built) | limited single-pass research | ✅ 47 tool calls in the comparison run
Run code to compute aggregates / parse HTML | N/A (Python pipeline pre-run) | | ✅ code execution + Python
Cross-corpus statistical computation (mention rates etc.) | ✅ measured across 1,590 calls | ❌ asserts without measurement | ❌ no corpus to measure
Citation discipline (source per claim) | corpus + ticker + date | 61 sources, ~13% primary | 51 URLs, mostly news + IR pages
Hallucination rate | ~0 (corpus-bound) | 1 outright + 1 timeline-conflated | 0 outright (5 UNVERIFIED self-flags)
Coverage of A-share niche names | deep (MD&A direct read) | medium (via aggregators) | medium (via web; some thinness on iFlytek/Sugon)
Compliance defensibility | Pass | Marginal | Marginal (URLs may decay)

Test Cases — Same Question, Three Empirical Answers

Q1: What is iFlytek's primary AI chip partner? Provide source.
Agentic Sciences: Huawei Ascend. Source: iFlytek 2025-06-30 interim MD&A (PDF read directly, machine-translated). Spark large-model training on Ascend processors + Atlas SuperPoD. Verdict: verbatim from primary filing.
Google Deep Research: "Huawei Ascend 950 / Atlas 950 SuperPoD". Source: TrendForce + Chinese aggregators. The 950PR only entered mass production in 2026-04, and the 950DT/SuperPoD is scheduled for Q4 2026; Deep Research conflated the roadmap with shipped product. Verdict: partial (timeline-conflated).
Native single agent: Huawei Ascend. Source: web search result citing Liu Qingfeng public statements + secondary news. Did not name a chip generation, so it avoided the timeline error. Verdict: correct, conservative.
Q2: 2026 aggregate hyperscaler capex. Specific number.
Agentic Sciences: ~$585B floor (per-company disclosed minimums from earnings calls). Conservative; anchored on actual management guidance. Verdict: disclosed floor.
Google Deep Research: $725B from a Tom's Hardware article citing an analyst aggregate estimate. Verdict: analyst aggregate.
Native single agent: $665-740B. Built bottom-up from individual company guides fetched live: GOOGL $180-190B (CNBC 2026-04-29), MSFT ~$190B, META $125-145B, AMZN ~$200B (TheNextWeb), ORCL ~$50B. Each line has a source URL. Verdict: most defensible (bottom-up).
Q3: Tencent's exact quote on GPU rationing for external cloud customers?
Agentic Sciences: 2026-03-18: "Tencent Cloud continued to face revenue headwinds due to limited availability of GPU for external customers as we prioritize our internal needs." Plus a 2025-03-19 quote on internal allocation. Both verbatim from transcripts. Verdict: verbatim + dated.
Google Deep Research: Paraphrased the substance correctly but gave no exact verbatim line. Verdict: paraphrase.
Native single agent: Did not obtain this quote in this run (web search did not surface it; doing so would require finding the actual transcript text on a free-tier IR page). The thesis cites Tencent's general GPU constraint context with secondary-source attribution. Verdict: general only.
Q4: HBM mention rate in Chinese cloud earnings calls?
Agentic Sciences: 0% across the self-disclosure layer; 4% if analyst-question references are included. Measured across 4 cloud companies × 28 calls. A methodology footnote in the document reconciles the two figures. Verdict: measured statistic.
Google Deep Research: Asserts "exactly 0%" without showing methodology. Likely picked up the conclusion from secondary commentary. Verdict: borrowed assertion.
Native single agent: Discusses bifurcation directionally but does not produce the mention-rate statistic. Acknowledges in the body that it cannot run mention-rate analysis without a transcript corpus. Verdict: honest gap.
Q5: Cambricon 2025 financial milestone (specific number)?
Agentic Sciences: +4,347.82% YoY H1 2025 revenue growth to CNY 2.88B. Source: Cambricon 2025-06-30 interim MD&A (direct PDF read). Verdict: interim filing.
Google Deep Research: General "Cambricon revenue ramp" narrative, less granular. Verdict: directional.
Native single agent: RMB 6.5B 2025 revenue / RMB 2.06B net profit (first profitable year ever). Source: Fortune 2025-08-27 article. A different but valid datapoint: it uses the 2025 full-year figure, found via web search. Verdict: verified via news.
Q6: A-share AI infra company with US patent infringement case?
Agentic Sciences: Eoptolink (300502). AOI filed N.D. California 3:24-cv-08165, 2024-11-19. Source: public corporate disclosure event records. Verdict: specific case + court.
Google Deep Research: Likely captured given web news coverage. Verdict: probably correct.
Native single agent: Did not surface this specific case. It is niche enough to require disclosure-event-grade coverage, which web search does not return cleanly. Verdict: missed.
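The mention-rate statistic in Q4 depends on keeping management disclosure separate from analyst questions. A minimal sketch of that split follows. The `Call` structure, the regular expression, and the synthetic transcripts are all assumptions made for illustration: the toy data is constructed so that exactly one of 28 calls has an analyst-side mention, which yields roughly 4%, but the source does not state the raw counts behind its figures.

```python
import re
from dataclasses import dataclass


@dataclass
class Call:
    company: str
    management_text: str  # prepared remarks and management answers
    analyst_text: str     # analyst questions only


# Hypothetical pattern for HBM references; a real study may need more variants.
HBM_PATTERN = re.compile(r"\bHBM\b|high[- ]bandwidth memory", re.IGNORECASE)


def mention_rate(calls: list[Call], include_analysts: bool = False) -> float:
    """Share of calls with at least one HBM mention in the selected text."""
    if not calls:
        raise ValueError("no calls to measure")
    hits = 0
    for call in calls:
        text = call.management_text
        if include_analysts:
            text = f"{text} {call.analyst_text}"
        if HBM_PATTERN.search(text):
            hits += 1
    return hits / len(calls)


# Synthetic corpus: 28 calls, one of which has an analyst question about HBM.
calls = [
    Call("CloudCo", "We expanded GPU capacity.", "What is the capex outlook?")
    for _ in range(27)
]
calls.append(Call("CloudCo", "Supply remains tight.", "Are you seeing HBM constraints?"))

self_disclosure_rate = mention_rate(calls)
with_analysts_rate = mention_rate(calls, include_analysts=True)
print(f"self-disclosure: {self_disclosure_rate:.0%}")
print(f"including analyst questions: {with_analysts_rate:.0%}")
```

Counting at the call level (at least one mention per call) rather than at the token level is a design choice; it keeps the two rates directly comparable and makes the 0% versus 4% gap attributable to who raised the topic.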

Where Each Genuinely Wins

Agentic Sciences wins decisively at:

Native single agent wins at:

Google Deep Research wins at:

Failure Modes (empirical)

Agentic Sciences thesis fails when:

Google Deep Research fails when:

Native single agent fails when:

Cost / Effort Matrix

Tool | One-time setup cost | Marginal cost per thesis | Best for
Agentic Sciences | High (build corpus pipeline: research-platform access + scraper + cleaner + summarizer + indexer; days of engineering) | Low (corpus is reusable; re-running the thesis is a single orchestration call) | Repeated investment processes where the same corpus is queried many times; compliance-grade defensibility
Google Deep Research | None (just open the deep research tool) | Free or marginal cost (consumer AI subscription) | One-shot thought-leadership; macro framing; quick on-demand topics
Native single agent | None (general-purpose AI agent) | Per-task tool calls (~$1-3 per thesis at typical pricing) | Ad-hoc deep-dives where iterative tool use beats single-pass; bottom-up data building

Optimal Workflow Across All Three

The full stack:
① Use the native single agent first to design the corpus and analytical framework. It excels at structural reasoning, methodology design, and red-team auditing.
② Once the corpus pipeline is built, use it to produce the Agentic Sciences-grade thesis as the auditable trading book: compliance-defensible, fact-checked, with cross-corpus statistics.
③ Run Google Deep Research monthly as a macro / regulatory news refresh layer to compensate for corpus snapshot lag.
④ For one-off questions or fast-turnaround pitches, the native single agent with tools is sufficient on its own: empirically, it produces 8,700+ word theses in about 15 minutes with citation discipline competitive with Deep Research.