Research Strategy & Sources

How we're approaching the IT/web-security thematic bet — separated from the candidate universe so the methodology can evolve independently of the data.

Two-node split — why

01 · Initial Research List — the what. The universe of names, with current data, refreshed periodically.
02 · Research Strategy & Sources (this node) — the how and where. Methodology, intel sources, pipeline plan, known gaps.

When a new name surfaces (e.g. via a SACR post or a fresh IPO), the universe gets updated. When the process changes (we add a new screening layer, swap in a paid data feed, etc.), this node updates. Different change cadences.

How the candidate universe was built (in layers)

Listed in build order; edge contribution generally rises down the table:

Layer	Done?	Edge	Notes
1. Training-data recall	✅	None	Got the obvious 18 from memory. Common knowledge, no advantage.
2. Targeted web search	✅	Some	Surfaced NTSK (Sept-2025 IPO) and SAIL (Feb-2025 re-IPO) — names ETFs hadn't fully absorbed yet. Plus NTCT, CVLT, CLBT.
3. ETF holdings cross-ref	⏸	Low	CIBR/HACK/BUG holdings would catch 80% of $200M+ pure-plays. Confirms what we have, may add 1-2 we missed.
4. FMP screener pass	⏸	Medium	Filter Technology → Application/Systems Software → grep description for security keywords. Catches names we don't know exist.
5. Specialist substack reading	⏸	High (ongoing)	The single best ongoing source — practitioners + investors writing weekly on the AI-cyber thesis. See Intel Sources below.
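
The layer 4 keyword pass could be sketched roughly as below. This is a minimal sketch that assumes the screener results have already been fetched into a list of dicts; the `ticker` and `description` field names and the keyword list are illustrative assumptions, not the FMP response schema.

```python
import re

# Hypothetical keyword list; tune it against real company descriptions.
SECURITY_KEYWORDS = (
    "cybersecurity",
    "cyber security",
    "threat detection",
    "endpoint protection",
    "zero trust",
    "firewall",
)


def build_pattern(keywords):
    """Compile one case-insensitive, whole-word pattern for all keywords."""
    escaped = (re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def screen_new_names(companies, known_tickers, keywords=SECURITY_KEYWORDS):
    """Return tickers whose description matches a keyword and that are
    not already in the candidate universe."""
    pattern = build_pattern(keywords)
    hits = []
    for company in companies:
        if company["ticker"] in known_tickers:
            continue  # already in the universe, nothing new to learn
        if pattern.search(company.get("description", "")):
            hits.append(company["ticker"])
    return hits
```

The point of filtering out `known_tickers` is that this layer only earns its keep when it returns names the earlier layers did not.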

Honest assessment: layers 1-2 are done; the universe today is "common knowledge plus 5 names from web search, 2 of them recent IPOs." That's the floor, not the ceiling. Real edge is in layers 3-5.

Where actual edge could come from

The thesis itself ("AI accelerates cyber attack → cyber demand structurally up") has been crowded for 2+ years. ETFs already encode it. Sell-side covers every $1B+ name. So the edge can't be the thesis — it has to be in how we operationalize it.

Layer 1 — Underfollowed names ETFs miss

CIBR has a $200M market cap floor. ETF weights follow index rules and passive flows, so small caps are either excluded or held at negligible weight. Pricing inefficiency lives here. Status: partially captured (NTSK, SAIL added).

Layer 2 — AI-defense leadership signals (the X-factor)

The thesis demands AI-driven defense, but no screen captures who's actually leading. Specific signals:

Threat research output — Falcon OverWatch, Unit 42, Mandiant. Frequency × depth × originality.
AI-product velocity — count + depth of AI-feature launches per quarter. CRWD shipped Charlotte AI; PANW shipped Precision AI; many others are slideware.
CVE response time — public CVE → patch ship date.
Patent activity — USPTO API is free; ML/anomaly-detection filings.
Conference presence — Black Hat / DEF CON / RSA accepted-talk count per vendor.
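
As one concrete example, the CVE response-time signal reduces to a date difference per CVE, summarized per vendor. A minimal sketch, assuming the publish and patch dates have already been collected as pairs; the record shape and the choice of median as the summary are assumptions.

```python
from datetime import date
from statistics import median


def cve_response_days(records):
    """records: iterable of (published, patched) date pairs.
    CVEs with no patch yet (patched is None) are skipped."""
    return [
        (patched - published).days
        for published, patched in records
        if patched is not None
    ]


def median_response_days(records):
    """Median days from public CVE to patch ship date, or None if no data."""
    days = cve_response_days(records)
    return median(days) if days else None
```

Median rather than mean keeps one very slow patch from dominating a vendor's number; the unpatched count is worth tracking separately, since skipping those CVEs flatters slow vendors.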

Status: not yet built. Highest leverage / least cost. Recommended first feature.

Layer 3 — Customer telemetry

Forward-looking signals that can lead reported earnings by a quarter (90+ days) or more. Three free sources to start:

1. Earnings call transcripts (FMP, already paid): grep the transcripts of the top-200 customer-side companies for vendor mentions. When 3 Fortune 500 banks mention CRWD in the same quarter, that's a leading indicator.
2. CISA KEV + NVD CVE data (free public APIs): vendor's own CVE count, patch time, threat-research-output volume.
3. USAspending federal contracts (free public API): federal cyber spend by vendor; large lumpy awards land before revenue. DoD spends ~$15B/yr.

Status: planned, not built. ~2.5 days of work for all three.
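
The transcript check in source 1 could look something like this. It is a sketch only: it assumes transcripts are already downloaded into dicts with hypothetical `company`, `quarter`, and `text` fields, and the 3-company threshold simply mirrors the example above.

```python
import re
from collections import defaultdict


def vendor_mentions(transcripts, vendor_aliases):
    """Map each quarter to the set of customer companies whose
    transcript mentions any of the vendor's aliases."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in vendor_aliases) + r")\b",
        re.IGNORECASE,
    )
    mentions = defaultdict(set)
    for t in transcripts:
        if pattern.search(t["text"]):
            mentions[t["quarter"]].add(t["company"])
    return mentions


def flagged_quarters(mentions, min_companies=3):
    """Quarters in which at least min_companies distinct customers
    mentioned the vendor."""
    return sorted(q for q, cos in mentions.items() if len(cos) >= min_companies)
```

Counting distinct companies rather than raw mentions avoids one talkative management team triggering the flag on its own.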

Layer 4 — Quantitative edges screens miss

NRR (Net Revenue Retention): arguably the most important metric for SaaS-shaped businesses; companies disclose it, but standard screeners don't capture it.
SBC drag: stock-based comp as % of revenue. Many cyber names look profitable until adjusted for it.
FCF / Revenue ratio: separates real businesses from accounting illusions.
Customer concentration: a risk most screens don't surface.

Status: most are derivable from existing FMP fundamentals — modest extension to the candidates table.
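
The ratios above are simple arithmetic once the inputs are in hand. A minimal sketch using the conventional definitions; the parameter names are illustrative and are not FMP field names, and capex is assumed to be passed as a positive number.

```python
def sbc_pct_of_revenue(sbc, revenue):
    """Stock-based compensation as a fraction of revenue."""
    return sbc / revenue


def fcf_margin(operating_cash_flow, capex, revenue):
    """Free cash flow (operating cash flow minus capex) over revenue."""
    return (operating_cash_flow - capex) / revenue


def sbc_adjusted_fcf_margin(operating_cash_flow, capex, sbc, revenue):
    """FCF margin after treating SBC as if it were a cash expense."""
    return (operating_cash_flow - capex - sbc) / revenue


def net_revenue_retention(starting_arr, expansion, contraction, churn):
    """NRR for a customer cohort over a period:
    (starting ARR + expansion - contraction - churn) / starting ARR."""
    return (starting_arr + expansion - contraction - churn) / starting_arr
```

For example, a company with 300 of operating cash flow, 50 of capex, 200 of SBC, and 1,000 of revenue shows a 25% FCF margin but only 5% after the SBC adjustment. NRR inputs usually have to be read from filings or earnings materials, since companies report the final percentage rather than the components.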

Down-selection process

Universe is the starting point, not the deliverable. Apply value-investing rigor (Section B criteria from the strategy) to surface 2-3 stand-outs:

1. Consistent market-share gains
2. Fair-to-affordable valuation
3. Uniquely positioned for the AI-attack era (this is where the Layer 2 AI-defense leadership signals feed in)
4. Nimble specialist vs. platform incumbent, decided per bucket
5. Real fundamentals (FCF positive or near-term path)

The combination that's hardest to find: high AI-leadership × reasonable valuation × already drawn-down. That's what we're hunting for.
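
That three-way combination can be expressed as a plain filter. The sketch below is illustrative only: the `ai_score`, `ev_sales`, and `drawdown` fields and all three thresholds are hypothetical placeholders, not criteria taken from the strategy.

```python
def hunt_list(candidates, min_ai_score=0.7, max_ev_sales=8.0, min_drawdown=0.30):
    """Keep names that clear all three bars at once, best AI score first.

    ai_score: 0-1 composite from the AI-leadership signals (hypothetical)
    ev_sales: EV / sales multiple as a valuation proxy (hypothetical)
    drawdown: fraction below the 52-week high, e.g. 0.35 = 35% off (hypothetical)
    """
    picks = [
        c for c in candidates
        if c["ai_score"] >= min_ai_score
        and c["ev_sales"] <= max_ev_sales
        and c["drawdown"] >= min_drawdown
    ]
    return sorted(picks, key=lambda c: c["ai_score"], reverse=True)
```

Requiring all three conditions together is deliberate: any one of them alone returns a long list, and the intersection is expected to be short or empty most of the time.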

Pipeline workflow — using existing infrastructure

We have the 18-step deep-analysis pipeline already (drives company-detail pages). For thematic candidates:

1. Triage: each candidate gets a fast deep / watch / skip verdict before we commit to a full analysis.
2. Deep dive: names that clear triage run the full 18-step pipeline (foundation → valuation → signals → final synthesis).
3. Surface: results auto-appear on company-detail pages. We can link from this tree's per-ticker nodes (when we add them) directly to the analysis.

Cost: triage is cheap (sub-$1/ticker). Full pipeline is expensive (~$2-5/ticker at full tier). Triage first, then deep-dive only the qualifying names.
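
To make the cost trade-off concrete, here is a back-of-envelope sketch using the per-ticker figures quoted above, with $1 taken as the upper bound for triage. The candidate and deep-dive counts in the usage note are illustrative, not a forecast.

```python
def pipeline_cost(n_candidates, n_deep, triage_cost=1.0, deep_cost_range=(2.0, 5.0)):
    """Return (low, high) total cost in dollars for triaging every
    candidate and deep-diving only n_deep of them."""
    if n_deep > n_candidates:
        raise ValueError("cannot deep-dive more names than were triaged")
    triage_total = n_candidates * triage_cost
    low = triage_total + n_deep * deep_cost_range[0]
    high = triage_total + n_deep * deep_cost_range[1]
    return low, high


def all_deep_cost(n_candidates, deep_cost_range=(2.0, 5.0)):
    """Return (low, high) cost of skipping triage and deep-diving everything."""
    return n_candidates * deep_cost_range[0], n_candidates * deep_cost_range[1]
```

With 23 candidates and 5 deep dives, triage-first costs at most about $33-48, against $46-115 for running the full pipeline on every name. The saving shrinks as the share of names clearing triage rises.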

Intel sources (substacks + sites we monitor)

The cybersecurity-investing community has converged on a small set of high-signal publications. These are where new names + thesis-aligned thinking surface before sell-side picks them up:

Source	Author	Why it matters
Software Analyst Cyber Research (SACR)	Francis Odum	Single best deep-research source on cyber investing. Tracks AI/cyber intersection, VC, M&A.
Venture in Security	Ross Haleliuk	Cyber business models, key players (public + private). Runs a syndicate.
Resilient Cyber	Chris Hughes	Practitioner-side perspective on the actual security stack.
Strategy of Security	—	Deep dives on specific cyber companies (e.g. SailPoint S-1 breakdown).
Mostly Metrics	—	S-1 / financial breakdowns of newly public companies.
Help Net Security	—	Daily cyber news + M&A flow.

These are the people doing the same thesis work we're doing — often with better access (sources, S-1 prep, VC dealflow). Reading them weekly is the cheapest edge available.

Known gaps (with-budget upgrades)

Things we'd add if data budget grew:

Gap	Tool	Cost	Edge added
Job-posting / vendor-skill tracking at scale	Coresignal or Revelio Labs	$1-2k/mo	Real-time deployment-growth proxy
Expert call transcripts	Tegus or AlphaSense	$20-50k/yr	Closest thing to ground truth on customer behavior
LinkedIn workforce data	Coresignal or Revelio Labs (LinkedIn itself is locked)	$1-2k/mo	Hiring trends as growth signal
Real-time CVE / threat feed	Recorded Future / Dragos	$$$$	Catches breach narratives before headlines

None of these are needed to start. The plan is to ship the AI-defense leadership signals (Layer 2) and customer telemetry (Layer 3) from free sources first, see if signal exists, then upgrade.

What changes when (update cadence)

Universe (sibling node 01) — refresh the data roughly weekly. Add new names as they surface.
This node — update when methodology shifts (new layer added, signal proven, paid data feed turned on, etc.). Don't update for every data refresh.
Per-name nodes (future) — created under this tree when a candidate makes it to the deep verdict. Will hold full thesis writeup + pipeline output link.