The incident
On March 3, 2026, the client's production site began returning HTTP
429 errors to customers. New Relic showed repeated Lock wait timeout exceeded exceptions from 15:21 UTC, with the incident becoming customer-facing
by 19:00 UTC.
The failures were not confined to one feature. Category pages slowed down, cron jobs failed, the blog indexer timed out, message queue work backed up, and admin saves started failing. The error text was useful, but the timing was more useful.
Investigation: following the lock contention
New Relic traces were matched against server activity until the pattern became clear. Each lock-wait spike lined up with the price indexer, and the indexer had several entry points:
- A Firebear import job for contract prices
- Three admin product saves at 14:08 UTC
-
A
ContractPricesauto-import cron running every 15 minutes
Any one of those actions could start a price reindex. On this catalog, that turned ordinary operational work into a site-wide database contention problem.
The price index had 125 million rows
The production database showed the size of the problem:
SELECT COUNT(*) FROM catalog_product_index_price;
-- Result: 125,281,340 For a few thousand products and a small number of customer groups, this table normally stays in the tens of thousands of rows. This site had 125 million.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- Result: 18,874,368,000 (~18 GB)
SELECT
table_name,
ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.tables
WHERE table_name = 'catalog_product_index_price';
-- Result: catalog_product_index_price — 18,205 MB The price index table alone was larger than the entire InnoDB buffer pool. There was no practical buffer pool headroom left for catalog data, session state, cron flags, or queue tables. During a reindex, even unrelated requests were competing with a table too large to stay in memory.
Root cause: customer groups scaled 1:1 with companies
The Magento price indexer builds rows across products, customer groups, and websites:
index_rows = products × customer_groups × websites
This is intentional. It pre-computes final prices for each product-group-website combination so category pages can read prices quickly. With 3 customer groups and 10,000 products, the table has 30,000 rows.
This site had 11,787 customer groups.
The Company module was creating one Magento customer group
per B2B company, using the company's ERP identifier as the group code. Every
company with contract prices got its own customer group. The intent was
understandable: reuse Magento's native tier price mechanism, which filters
by customer_group_id. The cost was that customer groups
had become a company dimension.
With 11,787 groups, 10,629 indexable products, and 1 website:
10,629 × 11,787 × 1 = 125,281,340 rows The formula was checked against three environments: staging with 436 groups, staging with 5,900 groups, and production. In each case, it matched the real row count to within 99.99%.
Secondary finding: broken environment variables
The investigation also found that 5 of the 6 performance-related environment variables recommended by Adobe Support were not being applied. One example:
# Wrong — would never be read by the indexer config system
CONFIG__INDEXER__BATCH_SIZE__CATALOG_PRODUCT_PRICE__DEFAULT=100
# Correct — maps to env.php via MAGENTO_DC_ prefix
MAGENTO_DC_INDEXER__BATCH_SIZE__CATALOG_PRODUCT_PRICE__DEFAULT=100
Adobe Commerce has two separate config paths with different prefixes. CONFIG__ writes to core_config_data; MAGENTO_DC_ maps
to env.php. The batch size tuning the team expected was
not active. Fixing it reduced some pressure, but it did not change the
row-count problem.
Strategy options
Option A — consolidate to 3 customer groups Chosen
Stop using customer groups as the company-specific pricing dimension. Store the ERP company ID directly on the tier price table and filter by it at runtime. For indexing, all B2B companies share one "Wholesale (B2B)" group.
- Index rows after migration: ~31,887 (3 groups × 10,629 products)
- Reduction: 99.97%
- Risk: Moderate — requires refactoring ~10 files across the pricing plugin chain
Option B — customer group dimension mode Rejected
Magento can split the price index into one table per customer group. With 11,787 groups, that means 11,787 database tables. It moves the problem into file descriptors, DDL time, backup behavior, and operational maintenance.
Option C — custom contract price table
Move contract prices to a fully custom table and bypass Magento's
native tier price mechanism. The existing catalog_product_entity_tier_price table already had 5 custom columns from the client module, and the import
pipeline was already receiving the ERP company ID before discarding it.
Adding one column to preserve that ID was lower risk than a full table migration.
Option A kept the useful part of this idea without replacing the pricing
storage model.
Phase 0: immediate pressure reduction Completed Mar 17, 2026
While Strategy A was being scoped, the product set exposed a faster
way to reduce load. The price indexer was processing 15,182 products, but only 3,601 were visible and enabled. The remaining 11,582 had visibility = "Not Visible Individually". They supported configurable product children, bundles, and
quick-order lookup behavior, but they were still occupying rows in the
price index.
Quick-order behavior was reviewed with the client first. Once it was clear those products were not needed for that workflow, the 11,582 non-visible products were disabled.
| Metric | Before | After | Change |
|---|---|---|---|
| Price index rows | 121,610,967 | 35,181,327 | −71.1% |
| Full reindex time | 1h 43m | ~1h 07m | −35% |
| Buffer pool pressure | Saturated (>100%) | ~51% of pool | Breathing room |
Phase 0 did not fix the architecture. It reduced the row count by 71.1%, lowered reindex time, and gave the client room to plan the structural change outside the active incident cycle.
Strategy A: structural correction Designed & Scoped
Strategy A was designed and scoped, but not deployed during this engagement. The architecture below is the proposed structural correction.
The fix
The import pipeline sends an ERP company ID and immediately discards it:
// ContractPriceFactory.php
public function retrieveGroupValue(string $navId): int
Meanwhile, the customer group table stores that same ERP ID as the group code:
SELECT customer_group_id, customer_group_code AS nav_company_id
FROM customer_group WHERE customer_group_id = 118467;
-- 118467 | 5022741
The fix was to preserve that identifier. Add nav_company_id on catalog_product_entity_tier_price, backfill it from
the group code, write it on every new import, and refactor the pricing
plugins to filter by nav_company_id instead of customer_group_id. Once pricing no longer depends on per-company groups, those groups
can be consolidated.
Files to change
| File | Change |
|---|---|
ContractPrices/etc/db_schema.xml | Add nav_company_id VARCHAR(64) column |
ContractPrices/Setup/Patch/Data/BackfillNavCompanyId.php | Backfill from customer_group.customer_group_code |
ContractPrices/Model/ContractPriceFactory.php | Write nav_company_id on insert (+1 line) |
ContractPrices/Model/Validation/ContractPriceValidator.php | Stop calling getOrCreateCustomerGroupByCode |
Catalog/Plugin/Product/AddCustomerGroupFilterToTierPriceSelectPlugin.php | Filter by nav_company_id via CurrentCompanyContext |
ContractPrices/Plugin/Catalog/UpdateBasePrice.php | Query by nav_company_id |
ContractPrices/Plugin/Catalog/UpdateRegularPrice.php | Query by nav_company_id |
ContractPrices/Plugin/Catalog/RemoveSlashPricing.php | Replace raw SQL customer_group_id filter |
Company/Service/CustomerGroupManager.php | Neutralise — stop creating new groups |
ContractPrices/Plugin/CompanyRepository/ValidateGroupChange.php | Delete — entirely obsolete |
Ebizcharge/Service/GetCustomerIdentifier.php | Read nav_company_id from company, not group code |
ContractPrices/Plugin/Order/AddCustomerGroupCodeToOrderBeforeOrderPlacePlugin.php | Read from CompanyManagement |
Migration sequence
Once Phase 1 (schema + import) and Phase 2 (plugin refactor) are deployed and validated on staging:
-
Reassign all
company.customer_group_idrecords to Wholesale (ID 2) -
Reassign all
customer.group_idrecords for B2B customers to Wholesale (ID 2) - Delete all per-company customer groups (IDs > 100)
- Run full price reindex — projected completion: ~3 minutes
Additional finding: 51.7% of companies on wrong prices
The group-per-company model had already created a pricing correctness problem. Of 24,386 companies, 12,604 (51.7%) were assigned to the default "General" group and would see non-contract pricing. The available evidence pointed to group creation falling behind company registration or failing silently. Strategy A was designed to remove that failure mode by pricing against the ERP company ID instead of the assigned customer group.
Results
| Metric | Before | After Phase 0 | Strategy A target |
|---|---|---|---|
| Price index rows | 125,281,340 | 35,181,327 | ~31,887 |
| Full reindex time | >2 hours | ~1h 07m | ~3 minutes |
| Buffer pool pressure | >100% — saturated | ~51% | <1% |
| Lock wait outages | Recurring (weekly) | Significantly reduced | Primary trigger removed |
| Companies on wrong prices | 12,604 (51.7%) | 12,604 (51.7%) | Group-assignment failure mode removed |
Technical takeaways
1. The price indexer's cross-join is a hard scaling constraint, not a guideline.
Magento's default price indexing works with a small number of customer
groups. If customer groups become company-specific, row count grows
with products × groups × websites. That math needs to be
checked before a B2B pricing architecture goes live. On this site,
11,787 customer groups turned a normal price index into a table larger
than the buffer pool.
2. Verify that environment variables are actually being applied.
Both CONFIG__ and MAGENTO_DC_ prefixes exist in
Adobe Commerce, but they map to different config systems. Tuning the wrong
one has no effect. Verify the active value with bin/magento config:show, env.php, or the platform console instead of trusting
the variable name.
3. Query the live system before scoping a fix.
The initial estimate from an older database dump was 174M rows. Production showed 125M, a 28% gap, because the indexer pruned non-salable products after each batch. That difference mattered for both the Phase 0 forecast and the risk profile of Strategy A.
4. Separate the quick win from the structural fix.
Phase 0 took hours to validate and deploy. Strategy A required a broader pricing refactor. Keeping those phases separate gave the client a measurable reduction in lock contention before committing to the riskier migration path. On a B2B system with 24,386 companies, that sequencing mattered.
Adobe Commerce Cloud 2.4.7-p8 · New Relic APM · MySQL 8 · Redis · Fastly