v1.0, April 2026

Methodology

Every number StackSpeed shows is traceable. Here is exactly where it comes from, how it is calculated, and what it cannot tell you.

Core Transparency Principles

Every data point has a source. No metric is displayed in the UI without a source badge indicating where it came from. If we do not know where a number originated, we do not show it.
Every methodology is versioned. When our methodology changes, existing scores are not silently updated. Old scores are flagged with their methodology version. Users can see what changed and why.
AI estimation is always disclosed. When a score is AI-estimated rather than lab-measured, the UI displays this clearly with an “AI Estimated” badge and confidence level. We never present an estimate as a measurement.
Corrections are welcome. We publish a corrections process. If a developer can demonstrate our data is wrong, we fix it and credit them. See the Corrections Process section.
We publish our own limitations. This document includes a limitations section. We do not hide the weaknesses of our methodology. See the Limitations section.

Data Sources

1. WordPress.org Public API

We use the WordPress.org public JSON API at https://api.wordpress.org/plugins/info/1.2/ to retrieve plugin metadata. We cache responses for 24 hours.
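
The 24-hour caching described above could look roughly like the following. This is a minimal sketch, not StackSpeed's actual implementation: the `PluginInfoCache` class, the in-memory dictionary, the injectable `fetch` and `clock` arguments, and the `action` / `request[slug]` query parameters are all assumptions made for illustration.

```python
import json
import time
import urllib.parse
import urllib.request

API_URL = "https://api.wordpress.org/plugins/info/1.2/"
CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, as stated in this section


def fetch_plugin_info(slug):
    """Fetch plugin metadata over HTTP (query parameters are an assumption)."""
    query = urllib.parse.urlencode(
        {"action": "plugin_information", "request[slug]": slug}
    )
    with urllib.request.urlopen(f"{API_URL}?{query}", timeout=10) as response:
        return json.load(response)


class PluginInfoCache:
    """Hypothetical in-memory cache that reuses a response for 24 hours."""

    def __init__(self, fetch=fetch_plugin_info, clock=time.time):
        self._fetch = fetch      # injectable so the cache can be tested offline
        self._clock = clock      # injectable time source, in seconds
        self._entries = {}       # slug -> (fetched_at, data)

    def get(self, slug):
        now = self._clock()
        entry = self._entries.get(slug)
        if entry is not None and now - entry[0] < CACHE_TTL_SECONDS:
            return entry[1]      # still fresh: serve the cached response
        data = self._fetch(slug)  # missing or expired: fetch again
        self._entries[slug] = (now, data)
        return data
```

Calling `PluginInfoCache().get("some-plugin-slug")` twice within a day would issue one network request under these assumptions; a production cache would more likely live in a database or key-value store than in process memory.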

What we use it for:

Plugin name, description, author, category, tags
Active install counts, user ratings, review counts
Version history, last updated date
PHP and WordPress version requirements
Download counts, file size

Limitations:

Active install counts are rounded (“1+ million”), not exact
Ratings are user-submitted and can be gamed
File size is the zip archive, not the installed footprint

2. StackSpeed Curated Benchmarks

For the MVP, our team manually researched and compiled benchmark data for the top 300 most-installed WordPress plugins. Data was sourced from:

Published research from the WordPress performance community
Performance reports from managed hosting providers (Kinsta, WP Engine, etc.)
Our own controlled lab testing where no third-party data was available

Every curated benchmark row includes: source_name, source_url, measured_at, and methodology_version. No benchmark row is stored without all four.
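
One way to enforce the "all four fields or no row" rule is to validate at construction time. The sketch below assumes a hypothetical `BenchmarkRow` type with string fields; only the four field names come from this document, and the example values are made up.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BenchmarkRow:
    """Hypothetical record carrying the four required provenance fields."""

    source_name: str
    source_url: str
    measured_at: str          # date string; the exact format is an assumption
    methodology_version: str

    def __post_init__(self):
        # Reject the row if any provenance field is missing or blank.
        missing = [
            f.name
            for f in fields(self)
            if not str(getattr(self, f.name) or "").strip()
        ]
        if missing:
            raise ValueError(f"benchmark row missing: {', '.join(missing)}")
```

With this shape, `BenchmarkRow("Example Host report", "https://example.com/report", "2026-03-01", "v1.0")` is accepted, while a row with an empty `source_url` raises `ValueError` before it can be stored.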

3. AI-Estimated Combination Scores

AI combination estimates are approximations, not measurements. Always verify performance on a staging environment before production deployment.

When a user builds a stack combining multiple plugins, the combined impact is not simply the sum of individual impacts. Plugins can conflict, share libraries, or have other interaction effects. For combinations that have not been lab-tested, we use the Anthropic Claude API to estimate the interaction coefficient.

How it works:

Start with the sum of individual plugin metrics as a mathematical baseline
Send individual metrics, plugin categories, and known conflicts to Claude
The model estimates an interaction coefficient (multiplicative factor)
We apply the coefficient and bound the result:

Combined score cannot exceed the best individual plugin's score
Combined score cannot be more than 3× worse than the sum of individual scores
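
The steps above can be sketched for a single cost metric where lower is better, such as page load impact in milliseconds. This is one possible reading of the two bounds, not a definitive implementation: the function name is hypothetical, and it assumes "cannot exceed the best individual plugin" means the combined cost is never lower than the cheapest single plugin, and "3× worse than the sum" means the combined cost is capped at three times the additive baseline.

```python
def bounded_combined_cost(individual_costs, coefficient):
    """Apply an interaction coefficient to the additive baseline, then clamp.

    individual_costs: per-plugin cost for one metric (lower is better).
    coefficient: multiplicative interaction factor estimated by the model.
    """
    if not individual_costs:
        raise ValueError("at least one plugin is required")
    baseline = sum(individual_costs)      # step 1: additive baseline
    estimate = baseline * coefficient     # steps 3-4: apply the coefficient
    floor = min(individual_costs)         # assumed: no better than best plugin
    ceiling = 3 * baseline                # assumed: at most 3x the baseline
    return max(floor, min(estimate, ceiling))
```

For costs of 100, 200, and 300 ms, a coefficient of 5 is capped at 1,800 ms, and a coefficient of 0.1 is raised to the 100 ms floor under these assumptions.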

Metric Definitions

Performance Score (0–100)

A composite score modeled on Google Lighthouse's scoring methodology. Higher is better. Calculated as a weighted average of normalized metrics:

Metric	Weight	Max (score = 0 at max)
Page load impact	35%	2,000 ms
JavaScript payload	20%	500 KB
HTTP requests	15%	50 requests
Database queries	15%	50 queries
PHP execution time	10%	200 ms
CSS payload	5%	200 KB

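Score formula: score = Σ weight × (100 − 100 × min(value, max) / max), summed over the six metrics in the table. A metric at 0 contributes its full weight; a metric at or above its max contributes nothing.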

Score bands:

90–100 Excellent
70–89 Good
50–69 Needs improvement
0–49 Poor
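
As a worked example, the weights, caps, and bands above can be expressed in a few lines. This is a sketch only: the metric key names are invented for illustration, and rounding to the nearest integer before choosing a band is an assumption, since the bands as listed do not say how a value such as 89.5 is handled.

```python
# (weight, max) per metric, taken from the table above; key names are hypothetical.
METRICS = {
    "page_load_ms": (0.35, 2000),
    "js_kb": (0.20, 500),
    "http_requests": (0.15, 50),
    "db_queries": (0.15, 50),
    "php_ms": (0.10, 200),
    "css_kb": (0.05, 200),
}


def performance_score(values):
    """Weighted average of normalized metrics; higher is better (0-100)."""
    total = 0.0
    for name, (weight, cap) in METRICS.items():
        value = max(0.0, values[name])           # negative input treated as 0
        total += weight * (100 - min(value, cap) / cap * 100)
    return total


def score_band(score):
    """Map a score to its band (rounding first is an assumption)."""
    rounded = round(score)
    if rounded >= 90:
        return "Excellent"
    if rounded >= 70:
        return "Good"
    if rounded >= 50:
        return "Needs improvement"
    return "Poor"
```

A plugin that sits at 10% of every cap (200 ms, 50 KB of JavaScript, 5 requests, 5 queries, 20 ms of PHP, 20 KB of CSS) loses 10 points on each normalized metric and therefore scores about 90.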

Security Score (0–100)

Higher = safer. Composite of five factors:

Factor	Weight	Source
CVE history (past 3 years)	30%	WPScan Vulnerability DB
Days since last update	25%	WordPress.org API
Closed support thread ratio	15%	WordPress.org API
Active install count (trust signal)	15%	WordPress.org API
Public code audit status	15%	Manual research

CVE scoring: each unpatched CVE in the past 12 months = −20 points. Each patched CVE = −5 points. No CVEs in the past 3 years = full points for this factor.
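
The composite and the CVE deductions can be sketched as follows. Several details are assumptions, because this document does not specify them: each factor is treated as a 0–100 sub-score, the CVE factor starts at 100 and is floored at 0, and the key names are hypothetical. How the other four factors are converted to sub-scores is not shown here.

```python
# Factor weights from the table above; key names are hypothetical.
SECURITY_WEIGHTS = {
    "cve_history": 0.30,
    "days_since_update": 0.25,
    "closed_support_ratio": 0.15,
    "active_installs": 0.15,
    "code_audit": 0.15,
}


def cve_factor(unpatched_last_12_months, patched):
    """CVE sub-score: -20 per recent unpatched CVE, -5 per patched CVE."""
    penalty = 20 * unpatched_last_12_months + 5 * patched
    return max(0, 100 - penalty)   # flooring at 0 is an assumption


def security_score(factor_scores):
    """Weighted composite of five 0-100 factor sub-scores; higher is safer."""
    return sum(
        weight * factor_scores[name]
        for name, weight in SECURITY_WEIGHTS.items()
    )
```

For instance, one unpatched CVE and two patched CVEs give a CVE sub-score of 70; if every other factor were at 100, the composite would be about 91.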

Update Frequency Grade (A–F)

Grade	Criteria
A	Released update within 60 days, 3+ releases in past year
B	Released update within 90 days, 2+ releases in past year
C	Released update within 180 days
D	Released update within 365 days
F	No release in over 365 days (effectively abandoned)
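
The grade table can be read as a cascade: a plugin receives the first grade whose criteria it fully meets. The sketch below assumes that reading, and also assumes "within N days" includes day N; the function name is hypothetical.

```python
def update_frequency_grade(days_since_last_release, releases_past_year):
    """Return the A-F grade by checking criteria from strictest to loosest."""
    if days_since_last_release <= 60 and releases_past_year >= 3:
        return "A"
    if days_since_last_release <= 90 and releases_past_year >= 2:
        return "B"
    if days_since_last_release <= 180:
        return "C"
    if days_since_last_release <= 365:
        return "D"
    return "F"  # no release in over 365 days
```

Under this reading, a plugin updated 30 days ago with only one release in the past year meets neither the A nor the B release count and is graded C.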

Other Metrics

Page Load Impact (ms): Estimated additional time added to page load (TTFB + FCP) by this plugin on a standard page with no caching active. Median of 5 test runs, outliers discarded.
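
This document does not state which rule is used to discard outliers. Purely as an illustration, the sketch below uses a median absolute deviation (MAD) rule, dropping runs that sit more than three MADs from the median before taking the median of what remains. The function name, the MAD rule, and the multiplier are all assumptions.

```python
from statistics import median


def robust_median(runs_ms, mad_multiplier=3.0):
    """Median of the runs after discarding outliers (MAD rule is assumed)."""
    if not runs_ms:
        raise ValueError("at least one run is required")
    center = median(runs_ms)
    # Median absolute deviation: the typical distance of a run from the center.
    mad = median(abs(x - center) for x in runs_ms)
    if mad == 0:
        return center  # runs are essentially identical; nothing to discard
    kept = [x for x in runs_ms if abs(x - center) <= mad_multiplier * mad]
    return median(kept)
```

For five runs of 410, 420, 415, 405, and 900 ms, the 900 ms run is discarded and the reported value is 412.5 ms under this rule.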

HTTP Requests: Count of additional HTTP requests on a standard page load. Includes scripts, stylesheets, images, fonts, and on-load AJAX calls. Excludes user-triggered AJAX.

JavaScript Payload (KB): Total uncompressed JavaScript added by the plugin. Does not account for browser caching on repeat visits.

CSS Payload (KB): Total uncompressed CSS added by the plugin. Same methodology as JS payload.

Database Queries per Page Load: Measured using the Query Monitor plugin in the test environment.

PHP Execution Time (ms): Additional PHP server-side processing time attributable to the plugin, measured using Xdebug profiling.

Combination Estimation

Plugin combinations are not purely additive. A stack of three plugins does not simply add the performance costs of each plugin individually. Plugins can:

Share JavaScript libraries, reducing combined JS payload
Conflict with each other, multiplying resource usage
Have database query overlap or cascading query patterns
Have caching interactions that amplify or reduce impact

Confidence levels:

High - We have actual lab-tested co-installation data for this combination.
Medium - The combination follows a known pattern (e.g., two caching plugins).
Low - Limited signal; treat as a rough approximation only.

AI model used: Anthropic Claude (model version displayed in the UI alongside each estimate and stored in the database for auditability).
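
The confidence levels and the stored model version might be represented along these lines. The type names, field names, and the two boolean inputs are hypothetical; only the three levels, their meanings, and the fact that the model version is stored with each estimate come from this document.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    HIGH = "High"      # lab-tested co-installation data exists
    MEDIUM = "Medium"  # combination follows a known pattern
    LOW = "Low"        # limited signal; rough approximation only


def confidence_for(lab_tested, known_pattern):
    """Pick a confidence level from two hypothetical signals."""
    if lab_tested:
        return Confidence.HIGH
    if known_pattern:
        return Confidence.MEDIUM
    return Confidence.LOW


@dataclass(frozen=True)
class CombinationEstimate:
    """Hypothetical stored record for one AI-estimated combination."""

    plugin_slugs: tuple
    interaction_coefficient: float
    confidence: Confidence
    model_version: str  # kept with the estimate for auditability
```

Keeping `model_version` on the record itself, rather than in a global setting, means an old estimate can still be traced to the model that produced it after the model is upgraded.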

“Combination estimates are AI-assisted approximations based on individual plugin benchmarks and known interaction patterns. They are not lab-measured values. Confidence level indicates our certainty in this estimate. Always verify performance on a staging environment before production deployment.”

Test Environment (StackSpeed Lab v1.0)

When StackSpeed measures data, this is the standardized environment:

Parameter	Value
WordPress version	Latest stable at time of measurement
PHP version	8.2 (documented per measurement)
Web server	Nginx 1.24
Database	MySQL 8.0
Server	DigitalOcean Droplet, 2 GB RAM / 1 vCPU
Theme	Twenty Twenty-Four (default, unmodified)
Content	10 sample posts, 1 page, 5 categories, 3 tags
Other plugins	None (plugin under test only)
Caching	None active during measurement
Measurement tool	Lighthouse CI 12.x, headless Chrome
Runs per measurement	5 (median reported, outliers discarded)

All parameters are stored with each benchmark row and displayed on plugin detail pages.

Limitations

This is the most important section of this document. Read it before making production decisions.

We do not claim our scores predict your specific site's performance.
Combination estimates are approximations, not measurements.
Third-party published data may use different test environments than ours.
Published data may be outdated relative to current plugin versions.
We cannot verify test conditions of third-party measurements.
The AI model may not have current knowledge of recent plugin updates.
Low-confidence estimates can be significantly wrong.
Plugin behavior is highly dependent on server environment, content volume, and other installed plugins not in the simulation.
Our security score does not catch all vulnerabilities.
Data freshness dates are shown with each data point; older data may not reflect the current plugin version.

StackSpeed is a research and planning tool. It helps you make better-informed decisions. It is not a substitute for testing on your own staging environment.

Corrections Process

We actively want to be corrected when our data is wrong. Here is how:

Navigate to any plugin's detail page
Click “Challenge this data” (visible on every benchmark row)
Submit your correction with: the metric you believe is wrong, your evidence (link to test, your measurement methodology, etc.)
Our team reviews within 7 business days
If verified, we update the data and credit you in the changelog

For systematic issues with our methodology, open a public discussion at GitHub Discussions (link coming at launch).

Methodology Version History

v1.0 (April 2026): Initial release

• Initial methodology published
• Manual/curated benchmark data for top 300 plugins
• AI combination estimation via Anthropic Claude API
• Security score composite formula introduced
• Update frequency grade (A–F) introduced

Future versions will be documented here with: what changed, why it changed, how existing scores are affected, and whether existing scores will be recalculated.
