Methodology | The Hydrogen Index

What is the H-Score?

The H-Score is a 0–100 composite authenticity index. A score of 100 means every measurable forensic signal points to genuine, organic reviews. A score of 0 means every signal points to coordinated manipulation.

No brand is expected to score 100. Real review populations always contain some noise, and our measurement instruments are imperfect. The question we are answering is not “are all these reviews fake?” but rather “does this review corpus look statistically consistent with organic customer behaviour?”

Six forensic signals feed the composite score. Each is computed independently and then combined via a weighted sum. The weights below reflect the relative predictive power of each signal in our validation set (historical cases in which manipulation was later confirmed by platform delistings or FTC enforcement).
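The weighted sum can be sketched as follows. This is a minimal illustration, not our production code: it assumes each signal has already been normalised to an authenticity sub-score between 0.0 (manipulated) and 1.0 (organic), and the names WEIGHTS and h_score are hypothetical. Only the weights themselves come from this page.

```python
# Weights as published in this methodology (they sum to 100%).
WEIGHTS = {
    "linguistic_clustering": 0.29,
    "reviewer_history_sparsity": 0.22,
    "posting_cadence": 0.18,
    "incentive_disclosure": 0.14,
    "image_exif_reuse": 0.09,
    "cross_pollination": 0.08,
}


def h_score(sub_scores: dict[str, float]) -> float:
    """Combine per-signal sub-scores (assumed to lie in [0, 1]) into a 0-100 composite."""
    if set(sub_scores) != set(WEIGHTS):
        raise ValueError("expected exactly one sub-score per signal")
    for name, value in sub_scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Weighted sum of the six signals, rescaled to the 0-100 range.
    return 100.0 * sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)
```

Under this assumption, a brand that looks fully organic on every signal scores 100, and one that fails only the linguistic-clustering signal loses 29 points.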

Signal 1 · Linguistic clustering

Linguistic clustering · n-gram cosine similarity

Weight: 29%

What we measure. We embed every review as a bigram and trigram frequency vector and compute pairwise cosine similarities across the brand’s entire corpus. This produces a similarity matrix whose mean we call the “linguistic clustering score.”

Why it matters. In an authentic corpus, reviewers use varied language, different vocabulary, different levels of enthusiasm, and different sentence structures. Fake reviews, whether generated by LLMs or written by review mills, tend to cluster tightly because they are produced from a limited set of templates or by a small number of writers working to a brief.

Example. The phrase “game changer for my energy levels” appeared in 34% of one brand’s Amazon reviews in a single calendar month. In an organic corpus of 200+ reviews, the most common exact phrase rarely exceeds 2–3%.

Signal threshold: mean pairwise similarity > 0.42 = flagged. Organic baseline: 0.11–0.19.
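A rough sketch of the computation, using only the Python standard library. The tokeniser (lower-cased word tokens) and the function names are assumptions made for illustration; the bigram and trigram counts, the pairwise cosine similarity, and the 0.42 threshold follow the description above.

```python
import math
import re
from collections import Counter
from itertools import combinations

FLAG_THRESHOLD = 0.42  # mean pairwise similarity above this is flagged


def ngram_vector(text, sizes=(2, 3)):
    """Count word bigrams and trigrams in one review (tokenisation is an assumption)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    if not a or not b:
        return 0.0
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


def linguistic_clustering_score(reviews):
    """Mean cosine similarity over every unordered pair of reviews."""
    vectors = [ngram_vector(r) for r in reviews]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

The pairwise loop is quadratic in the number of reviews, which is fine for a sketch but would need batching or sparse matrix operations on a large corpus.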

Signal 2 · Reviewer-history sparsity

Reviewer-history sparsity

Weight: 22%

What we measure. For each Amazon reviewer, we examine their full review history: how many reviews they have left, for what price range of products, and over what time period. We compute a “sparsity ratio” for the brand: the fraction of reviewers whose lifetime review count is ≤2.

Why it matters. Fake reviewer accounts are created specifically for manipulation campaigns. They tend to have sparse histories, often concentrated in a short time window, and typically review only cheap products (<$10) before leaving a suspiciously detailed five-star review of a $100+ hydrogen bottle.

Signal threshold: >40% of recent reviewers with ≤2 lifetime reviews = flagged. Organic baseline for established Amazon products: 12–18%.
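The sparsity ratio itself is simple arithmetic. The sketch below assumes the lifetime review count for each reviewer has already been collected; the function names are hypothetical.

```python
def sparsity_ratio(lifetime_review_counts, max_sparse=2):
    """Fraction of reviewers whose lifetime review count is at most max_sparse."""
    if not lifetime_review_counts:
        return 0.0
    sparse = sum(1 for count in lifetime_review_counts if count <= max_sparse)
    return sparse / len(lifetime_review_counts)


def is_sparsity_flagged(lifetime_review_counts, threshold=0.40):
    """Apply the published threshold: more than 40% sparse reviewers is flagged."""
    return sparsity_ratio(lifetime_review_counts) > threshold
```

For example, a brand whose four reviewers have 1, 2, 3 and 10 lifetime reviews has a ratio of 0.5 and would be flagged.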

Signal 3 · Posting-cadence anomaly

Posting-cadence anomaly · burstiness

Weight: 18%

What we measure. We model the expected Poisson arrival rate of reviews based on each brand’s 90-day historical baseline. We then run a sliding-window test on the current 30-day window, looking for statistically significant departures, known as “burst events.”

Why it matters. Real customers buy and review over time. Coordinated campaigns produce review bursts, spikes of 50–300 reviews in a 2–6 hour window. These bursts occur preferentially overnight UTC (01:00–05:00), matching incentivised-review fulfilment schedules in Southeast Asia.

What a burst looks like. PUREPEBRIX H8000 produced 142 site reviews in a 38-minute window on May 23, 2026. Its trailing 30-day average was 11 reviews per day, so that single window held 12.9× a typical full day’s volume, a 5-sigma event.

Signal threshold: observed rate > 5× 30-day baseline for ≥90 minutes = burst event. Brands with ≥3 burst events in the trailing 30 days are flagged.
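One way to read the threshold is as a count test on a 90-minute window: scale the daily baseline down to 90 minutes, multiply by 5, and flag any window that holds more reviews than that. The sketch below takes that reading, which is an assumption, and counts non-overlapping windows so a single spike is counted once. The function names are hypothetical, and the formal Poisson significance test is omitted.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def count_burst_events(timestamps, baseline_per_day,
                       window=timedelta(minutes=90), multiplier=5.0):
    """Count non-overlapping windows holding more than multiplier x the expected reviews."""
    if baseline_per_day <= 0:
        raise ValueError("baseline_per_day must be positive")
    times = sorted(timestamps)
    # Expected reviews per window at the baseline rate, scaled by the multiplier.
    limit = multiplier * baseline_per_day * (window / timedelta(days=1))
    bursts, i = 0, 0
    while i < len(times):
        # j is the first index whose timestamp falls after the window opened at times[i].
        j = bisect_right(times, times[i] + window)
        if j - i > limit:
            bursts += 1
            i = j  # skip the rest of this window so one spike counts once
        else:
            i += 1
    return bursts


def is_cadence_flagged(burst_count, min_bursts=3):
    """Brands with three or more burst events in the trailing 30 days are flagged."""
    return burst_count >= min_bursts
```

With a baseline of 11 reviews per day, the 90-minute limit works out to 5 × 11 × 90/1440 ≈ 3.4 reviews, so 142 reviews arriving within 38 minutes register as one burst event.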

Signal 4 · Incentive-disclosure leakage

Incentive-disclosure leakage

Weight: 14%

What we measure. We scan review text for positive and negative disclosure signals: phrases like “received this product for free,” “discount code,” “sent to me for testing,” and their synthetic equivalents, including vague hedges (“I was given the opportunity”) that reviewers use to technically comply while burying the disclosure.

Why it matters. Honest incentivized reviewers are required by the FTC (16 CFR Part 255) and Amazon’s Community Guidelines to disclose material connections. When reviews clearly describe gifted or discounted experiences but lack disclosures, this is both a regulatory violation and a signal of a structured incentive programme.

Note on coverage. This signal can only detect what reviewers accidentally reveal. It is a lower-bound estimate of incentivized content and should be read alongside the other signals.

Signal threshold: >8% of reviews contain implicit incentive language without FTC-compliant disclosure.
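A toy version of the scan is shown below. Both phrase lists are hypothetical stand-ins built from the examples on this page; the full lexicon and the exact test for an FTC-compliant disclosure are not reproduced here, so treat the split between "implicit" and "clear" phrasing as an assumption.

```python
import re

# Hypothetical phrase lists for illustration only.
IMPLICIT_INCENTIVE = re.compile(
    r"given the opportunity|discount code|sent to me for testing", re.IGNORECASE
)
CLEAR_DISCLOSURE = re.compile(
    r"received this product for free|in exchange for (?:an? )?(?:honest )?review",
    re.IGNORECASE,
)


def leakage_rate(reviews):
    """Fraction of reviews with incentive language but no clear disclosure."""
    if not reviews:
        return 0.0
    leaked = sum(
        1 for text in reviews
        if IMPLICIT_INCENTIVE.search(text) and not CLEAR_DISCLOSURE.search(text)
    )
    return leaked / len(reviews)


def is_leakage_flagged(reviews, threshold=0.08):
    """Apply the published threshold: more than 8% of reviews is flagged."""
    return leakage_rate(reviews) > threshold
```

Simple phrase matching will miss paraphrases, which is consistent with reading this signal as a lower bound.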

Signal 5 · Image/EXIF fingerprinting

Image & EXIF fingerprint reuse

Weight: 9%

What we measure. For reviews that include photos, we compute perceptual hashes (pHash) and compare them across reviewers and brands. We also extract and compare EXIF metadata where available (device model, GPS coordinates, creation timestamps).

Why it matters. Review mills frequently reuse the same set of stock or recycled photos across different accounts. An image appearing in four or more reviews from supposedly different “verified buyers” is a strong manipulation signal, especially when the EXIF timestamps precede the product’s launch date.

Signal threshold: identical pHash appearing across ≥4 distinct reviewer accounts.
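Once each photo has a perceptual hash, the threshold check reduces to grouping hashes by reviewer account. The sketch below assumes the hashes have already been computed elsewhere and arrive as (reviewer_id, phash_hex) pairs; that record shape and the function name are hypothetical.

```python
from collections import defaultdict


def reused_image_hashes(photo_records, min_accounts=4):
    """Return hashes that appear under at least min_accounts distinct reviewer accounts.

    photo_records is an iterable of (reviewer_id, phash_hex) pairs.
    """
    accounts_by_hash = defaultdict(set)
    for reviewer_id, phash in photo_records:
        accounts_by_hash[phash].add(reviewer_id)
    return {
        phash: accounts
        for phash, accounts in accounts_by_hash.items()
        if len(accounts) >= min_accounts
    }
```

Using a set per hash means one account posting the same photo several times still counts as a single account.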

Signal 6 · Platform cross-pollination

Platform cross-pollination ratio

Weight: 8%

What we measure. We check whether reviewers on one platform (e.g. Amazon) also appear organically discussing the same brand on other platforms (Reddit, Trustpilot, review aggregators). We also measure whether the brand itself has any neutral third-party review presence.

Why it matters. Genuine customers often mention a product across platforms organically. Coordinated campaigns tend to stay within a single platform because managing multi-platform fake identities is costly and risky. A brand with zero cross-platform reviewer overlap, and especially one with zero Trustpilot presence, scores poorly here.

Key sector finding. Only 6 of 15 brands in our current corpus have any Trustpilot footprint, and of those, two score below 2.0 stars. The remaining 9 have zero independent presence. Trustpilot uses independent verification and actively removes fake reviews; its absence correlates strongly with brands that cannot survive neutral scrutiny.

Signal threshold: zero verified cross-platform reviewer overlap + no Trustpilot presence = maximum penalty.

Score thresholds

H-Scores are classified into three risk bands. These thresholds were calibrated against our validation set of brands with confirmed manipulation history.

70–100

◆ Legit · authentic review pattern

Forensic signals are consistent with organic customer behaviour. The review corpus shows appropriate linguistic diversity, reviewer history depth, and posting cadence. Note: this validates the review signal, not the product claims themselves.

40–69

◐ Mixed signals · proceed with caution

Some forensic signals are within acceptable bounds; others are anomalous. The most common cause is a mix of genuine Amazon reviews and heavily curated brand-site reviews. Weight third-party platform reviews more heavily when evaluating these brands.

0–39

▼ High-risk · strong manipulation signals

Multiple forensic signals indicate coordinated review manipulation. Independent product testing and return-window verification are strongly recommended before purchase. Consider whether the claimed features can be independently verified.
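The three bands map onto a simple lookup. The sketch below assumes fractional scores are cut at 70 and 40 (the bands above are stated in whole numbers), and the function name is hypothetical.

```python
def risk_band(score):
    """Map a 0-100 H-Score to one of the three published risk bands."""
    if not 0 <= score <= 100:
        raise ValueError("H-Score must be between 0 and 100")
    if score >= 70:
        return "Legit"
    if score >= 40:
        return "Mixed signals"
    return "High-risk"
```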

Data sources

We ingest publicly available review data from the following platforms:

Amazon US · primary corpus; reviewer history, verified purchase flags, posting timestamps, review photos.
Reddit · r/hydrogen, r/biohacking, r/Supplements, and product-mention scraping across relevant subreddits. Used for sentiment cross-check and shill detection.
Brand websites · structured data and review widget scraping. All brands rely heavily on this source; it carries the highest manipulation risk.
YouTube · comment sections of brand-sponsored and organic review videos. Used for linguistic cross-pollination analysis.
TikTok · hashtag-associated reviews and unboxing content. Primarily used for image fingerprint and shill-account detection.

We do not currently ingest Trustpilot data for this sector because no tracked brand has a Trustpilot presence. This is itself a forensic signal.

Update cadence

Scores are recalculated every 6 hours. Each recalculation ingests the trailing 30-day review window. The “burst events” counter resets to zero at the start of each 30-day window. Historical scores are archived and accessible via the API.

The “Live flag feed” shown on the main Index page reflects events detected in the current 6-hour window, presented in reverse chronological order.

Limitations & caveats

Sophisticated operators. Brands that purchase reviews from high-end mills (aged accounts, human-written text, varied devices) will have scores higher than warranted. Our signals catch the majority of campaigns but not all.
Verified purchase inference. We cannot access Amazon’s verified-purchase database directly. We infer it from reviewer history patterns and stated disclosures. Our verified-buyer ratios are estimates, not ground truth.
PPB claims. We flag unverified high PPB claims but we do not test products. If you require independent hydrogen concentration data, seek out academic lab reports.
Geographic coverage. Our primary coverage is the US Amazon marketplace. AU and EU listings may differ significantly in review pattern and should be evaluated separately.
Score lag. A brand can change its review practices after our last recalculation. Scores reflect a 30-day trailing window, not real-time activity.
No medical claims assessment. We assess review authenticity only. We make no assessment of the health claims made by any brand.

Appeals

If you represent a brand and believe your H-Score is materially inaccurate, you may submit an appeal to methodology@hydrogenindex.review. Appeals must include:

Your brand name and current H-Score
A specific claim about which signal(s) you believe are miscalculated
Supporting evidence (e.g. independent platform audit reports, lab certificates, reviewer correspondence that contradicts our burst-event data)

We review all appeals within 14 business days. If an appeal results in a score change, we publish a correction notice. We do not accept payment in connection with appeals. An appeal has no effect on our editorial independence.

API access

We offer a read-only REST API for researchers, journalists, and consumer-protection organisations. The API returns current H-Scores, factor breakdowns, and 30-day burst event logs in JSON format.

API access is free for academic and non-commercial use. Commercial licensing is available. Contact api@hydrogenindex.review with a description of your intended use.
