Why lower the confidence threshold if I'm trying to reduce false positives?

Raising the confidence gate suppresses noise and real low-stock facings together, cratering recall. The strategy is to keep the gate low at 0.38 to 0.42 so borderline facings survive, then earn precision back with geometric and cross-modal filters that reject phantom boxes on structure and identity rather than score alone.

When should I use Weighted Box Fusion instead of standard NMS?

Use WBF on dense shelves where one product triggers several overlapping predictions. Hard NMS keeps only the top-scoring box in each overlap group, so it can delete a tightly packed neighbour outright; WBF averages overlapping same-class boxes by confidence weight, producing one accurately placed box per physical unit. Start with an IoU threshold of 0.55 for packed facings. Boxes fuse only when their IoU meets the threshold, so raise it if neighbours start merging, and lower it toward 0.45 only if a single unit still carries duplicate boxes.

How do I stop price tags and promotional signage from being detected as SKUs?

Combine geometry and cross-modal checks. The aspect-ratio band and minimum-area filter remove most price rails and dividers, and the OCR gate rejects any surviving crop containing pricing glyphs, promo keywords, or bare unit strings with no catalog brand token. A decoded barcode that matches the catalog is treated as authoritative.

What happens to a box that fails both OCR and barcode validation?

It is routed to an async review queue with a reason code, not silently discarded. Preserving ambiguous boxes keeps an audit trail for category managers and supplies labelled hard examples for the next retraining cycle, so suppression never becomes silent data loss.

How does the pipeline keep suppression thresholds correct as stores change?

A circuit breaker tracks the false-positive rate over a rolling 100-frame window. When it exceeds 8% the confidence gate tightens by 0.05; when it falls below 3% the gate relaxes, both within fixed bounds. Every suppressed box is logged with score, coordinates, and reason code, which feeds drift monitoring and retraining.

Reducing False Positives in SKU Bounding Boxes

This walkthrough sits under Bounding Box Extraction & SKU Localization and solves one precise failure mode: a detector that keeps emitting boxes on things that are not stock — price rails, shelf talkers, promotional cardboard, empty facings, and glare blooms. Each phantom box is expensive downstream. It inflates share-of-shelf, fires a false out-of-stock or restock trigger, and corrupts the misplaced_sku_list and price_tag_mismatch_count that the compliance record carries into reporting. A category manager who catches two bad numbers stops trusting the dashboard entirely. Suppressing these detections is not a single threshold tweak; it is a short, ordered pipeline — confidence and box-fusion calibration, hard geometric constraints, a cross-modal identity check, and a feedback loop that recalibrates when the false-positive rate drifts — applied after detection and before the boxes leave the stage normalized. This page builds exactly that, step by step, and each step is independently verifiable.

Prerequisites & Context

Before applying this page, confirm the following are already in place. This procedure runs on the raw detections produced upstream; the detector variant that emits them is chosen by Vision Model Routing for Shelf Detection, and if your recall is collapsing rather than your precision, fix detection first via Optimizing YOLOv8 for Grocery Shelf Detection before reaching for suppression.

Runtime: Python 3.11+ with torch, torchvision, opencv-python, and numpy on the inference host.
Detector output contract: each frame yields parallel tensors of boxes in xyxy pixel coordinates, per-box scores, and integer class labels — the standard output before any post-processing.
Shelf ROI: a per-fixture region-of-interest polygon (from one-time store calibration) that bounds the valid shelf plane, so floor clutter and ceiling signage fall outside it.
SKU catalog: the master item table keyed on sku, with expected packaging aspect ratios, so geometric filters can be checked against catalog specifications rather than guessed.
Telemetry sink: somewhere to log suppressed boxes with their original score, coordinates, and a reason code — the same store the drift workflow in Debugging Vision Model Drift in Retail Environments reads from.

A note on counting: a false positive here is a box that survives to the compliance record but corresponds to no real facing. The goal is to drive that to near zero without dropping legitimate borderline facings, so every step below is paired with a recall guardrail.
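
The counting rule above can be sketched as a small matcher. This is an illustrative sketch, not the pipeline's own scorer: the `count_false_positives` helper, the greedy one-box-per-facing matching, and the 0.5 matching IoU are all assumptions introduced here.

```python
from typing import Sequence

Box = tuple[float, float, float, float]  # xyxy pixel coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two xyxy boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_false_positives(
    surviving: Sequence[Box],
    facings: Sequence[Box],
    match_iou: float = 0.5,  # assumed matching threshold, not from the pipeline
) -> int:
    """Count surviving boxes that match no labelled facing."""
    unmatched = list(facings)
    false_positives = 0
    for box in surviving:
        # Greedy: pair the box with its best still-unmatched facing.
        best = max(unmatched, key=lambda f: iou(box, f), default=None)
        if best is not None and iou(box, best) >= match_iou:
            unmatched.remove(best)  # each facing can justify only one box
        else:
            false_positives += 1
    return false_positives
```

Because each facing can justify only one box, a duplicate box on a real product also counts as a false positive, which is the behaviour the fusion step is meant to prevent.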

Step 1 — Calibrate the Confidence Gate and Replace Hard NMS

Many deployments gate detections at a confidence threshold of 0.50, which is a poor fit for dense shelving, where background texture mimics product edges and a higher gate suppresses noise and real low-stock facings together. Lower the initial gate to 0.38–0.42 so borderline facings survive into post-processing, then earn precision back with the filters that follow rather than by raising this number. Pair the gate with better duplicate collapse: standard Non-Maximum Suppression at an IoU of 0.50 either deletes one of two tightly packed neighbours (undercounting facings) or leaves a single physical unit wearing two boxes. Replace it with Weighted Box Fusion, which averages overlapping coordinates by confidence weight instead of hard-deleting the loser, so a real product that triggered three near-identical predictions becomes one well-placed box.

import torch
from typing import Tuple


def _pairwise_iou(boxes: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    """IoU of every row in `boxes` against a single `ref` box."""
    x1 = torch.max(boxes[:, 0], ref[0])
    y1 = torch.max(boxes[:, 1], ref[1])
    x2 = torch.min(boxes[:, 2], ref[2])
    y2 = torch.min(boxes[:, 3], ref[3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_r = (ref[2] - ref[0]) * (ref[3] - ref[1])
    union = area_b + area_r - inter
    return torch.where(union > 0, inter / union, torch.zeros_like(inter))


def weighted_box_fusion(
    boxes: torch.Tensor,
    scores: torch.Tensor,
    labels: torch.Tensor,
    iou_threshold: float = 0.55,
    score_threshold: float = 0.38,
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    """
    Production-ready WBF for dense shelf SKU localization. Replaces hard NMS
    with confidence-weighted coordinate averaging, fusing only same-class boxes.
    """
    if len(boxes) == 0:
        return torch.empty(0, 4), torch.empty(0), torch.empty(0, dtype=labels.dtype)

    # Gate on confidence first so noise never anchors a fusion group.
    valid_mask = scores >= score_threshold
    boxes, scores, labels = boxes[valid_mask], scores[valid_mask], labels[valid_mask]
    if len(boxes) == 0:
        return torch.empty(0, 4), torch.empty(0), torch.empty(0, dtype=labels.dtype)

    # Highest-scoring box anchors each fusion group.
    order = scores.argsort(descending=True)
    boxes, scores, labels = boxes[order], scores[order], labels[order]

    keep_boxes, keep_scores, keep_labels = [], [], []
    suppressed = torch.zeros(len(boxes), dtype=torch.bool)

    for i in range(len(boxes)):
        if suppressed[i]:
            continue
        same_label = labels == labels[i]
        ious = _pairwise_iou(boxes, boxes[i])
        group = torch.where(same_label & (ious >= iou_threshold) & (~suppressed))[0]

        w_boxes, w_scores = boxes[group], scores[group]
        w_sum = w_scores.sum()
        fused_box = (w_boxes * w_scores.unsqueeze(1)).sum(dim=0) / w_sum
        fused_score = w_sum / len(w_boxes)

        keep_boxes.append(fused_box)
        keep_scores.append(fused_score)
        keep_labels.append(labels[i])
        suppressed[group] = True

    # Stack the 0-dim score tensors so dtype and device match the inputs.
    return torch.stack(keep_boxes), torch.stack(keep_scores), torch.stack(keep_labels)

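Start with iou_threshold at 0.55 for tightly packed facings. Boxes fuse only when their IoU meets the threshold, so raise it if you see adjacent products merging, and lower it toward 0.45 only if a single unit still carries duplicate boxes. These are the same density bands used when Validating Shelf Position Tolerances in Retail checks slot occupancy downstream.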
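
To make the confidence weighting concrete, here is a worked example of the fusion arithmetic in plain Python. It is a sketch of the averaging step only; the `fuse_group` helper and the three boxes are hypothetical values chosen for illustration.

```python
def fuse_group(boxes, scores):
    """Confidence-weighted average of xyxy boxes, plus the mean score."""
    total = sum(scores)
    fused = [
        sum(box[k] * s for box, s in zip(boxes, scores)) / total
        for k in range(4)
    ]
    return fused, total / len(scores)


# Three hypothetical predictions on the same facing (xyxy pixels).
boxes = [(100, 50, 200, 250), (104, 52, 204, 252), (98, 48, 198, 248)]
scores = [0.9, 0.6, 0.5]

fused_box, fused_score = fuse_group(boxes, scores)
```

The fused box comes out at roughly (100.7, 50.1, 200.7, 250.1): it sits between the inputs and closest to the 0.9-score box. The fused score is the group mean, about 0.67.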

Step 2 — Enforce Geometric and Spatial Constraints

Fusion fixes duplicates; it does nothing about a crisp, high-confidence box drawn around a price tag. That is what geometry is for. Apply hard constraints the instant boxes leave fusion, before any downstream consumer sees them. Three filters catch the bulk of structural false positives: an aspect-ratio band (real packaging rarely deviates beyond ±15% of its catalog ratio, so price rails and dividers fall out), a minimum area ratio (sub-0.5%-of-frame boxes are almost always labels or specks), and ROI containment (the box centroid must land inside the calibrated shelf polygon). For tilted captures, recover the fronto-parallel shelf plane with a homography first so the ROI test is meaningful — the warp math is the same one detailed in the parent component’s normalization pass.

import cv2
import numpy as np


def enforce_spatial_constraints(
    boxes: np.ndarray,
    frame_shape: tuple[int, int],
    shelf_roi: np.ndarray,
    min_aspect: float = 0.6,
    max_aspect: float = 2.2,
    min_area_ratio: float = 0.005,
) -> np.ndarray:
    """
    Filter boxes by aspect ratio, minimum area relative to the frame, and
    containment of the box centroid inside the shelf ROI polygon.
    """
    if len(boxes) == 0:
        return boxes
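    img_area = float(frame_shape[0] * frame_shape[1])
    # cv2.pointPolygonTest accepts only int32 or float32 contours.
    shelf_roi = np.asarray(shelf_roi, dtype=np.float32)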
    keep = []
    for idx, (x1, y1, x2, y2) in enumerate(boxes):
        w, h = x2 - x1, y2 - y1
        if h <= 0 or w <= 0:
            continue
        if not (min_aspect <= (w / h) <= max_aspect):
            continue  # price tags, dividers, promo cutouts
        if (w * h) / img_area < min_area_ratio:
            continue  # labels and specks
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        if cv2.pointPolygonTest(shelf_roi, (float(cx), float(cy)), False) < 0:
            continue  # outside the valid shelf plane
        keep.append(idx)
    return boxes[keep]

Derive min_aspect/max_aspect per category from the catalog rather than hard-coding one global band — endcap multipacks and single cans have very different envelopes.
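
One way to derive those per-category bands is to apply the ±15% tolerance from this step to each catalog ratio. The sketch below assumes the catalog stores a width/height ratio under a field called `aspect_ratio`; that field name, the `aspect_band` helper, and the sample rows are hypothetical.

```python
def aspect_band(catalog_ratio: float, tolerance: float = 0.15) -> tuple[float, float]:
    """Return (min_aspect, max_aspect) as the catalog w/h ratio +/- tolerance."""
    if catalog_ratio <= 0:
        raise ValueError("catalog_ratio must be positive")
    return catalog_ratio * (1 - tolerance), catalog_ratio * (1 + tolerance)


# Hypothetical catalog rows; the `aspect_ratio` field name is an assumption.
catalog = {
    "can-330ml": {"aspect_ratio": 0.55},
    "multipack-12": {"aspect_ratio": 1.60},
}
bands = {sku: aspect_band(row["aspect_ratio"]) for sku, row in catalog.items()}
```

With these sample rows the can's band is roughly 0.47 to 0.63, so a can drawn exactly at its catalog ratio of 0.55 would fail the global default `min_aspect` of 0.6. Each band can be passed as `min_aspect` and `max_aspect` when filtering boxes of that category.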

Step 3 — Add a Cross-Modal Identity Gate

Geometry rejects the wrong shape; it cannot tell a product box from a same-shaped promotional sleeve. Cross-modal validation closes that gap by reading a second signal out of the surviving region. Run lightweight OCR on each box: if the crop contains pricing glyphs ($, ¢), promo keywords (SALE, BOGO), or bare unit strings (oz, ml) with no catalog brand token, it is signage, not stock. Where a barcode or QR region overlaps the box, decode it and verify the payload against the catalog; a decoded barcode that matches is authoritative. Critically, a box that neither signal can confirm is routed to an async review queue, not silently dropped, so you keep an audit trail for retraining instead of losing data.

import re
from dataclasses import dataclass
from typing import Callable, Optional

_PROMO_RE = re.compile(r"(\$|¢|\bSALE\b|\bBOGO\b|\b\d+\s?(oz|ml|g|lb)\b)", re.IGNORECASE)


@dataclass(frozen=True)
class CrossModalResult:
    sku: Optional[str]
    decision: str  # "accept" | "reject" | "review"
    reason: str


def cross_modal_gate(
    crop: "np.ndarray",
    ocr_fn: Callable[["np.ndarray"], str],
    barcode_fn: Callable[["np.ndarray"], Optional[str]],
    catalog: dict[str, dict],
) -> CrossModalResult:
    """Confirm or reject a surviving box using OCR text and barcode payload."""
    barcode = barcode_fn(crop)
    if barcode and barcode in catalog:
        return CrossModalResult(sku=barcode, decision="accept", reason="barcode_verified")

    text = ocr_fn(crop) or ""
    # Catalog keys stand in for brand tokens here; compare case-insensitively.
    lowered = text.lower()
    if _PROMO_RE.search(text) and not any(key.lower() in lowered for key in catalog):
        return CrossModalResult(sku=None, decision="reject", reason="promo_or_price_tag")

    if barcode is None and not text.strip():
        # No corroborating signal at all — preserve for human review, do not drop.
        return CrossModalResult(sku=None, decision="review", reason="no_secondary_signal")

    return CrossModalResult(sku=None, decision="review", reason="unverified")

OCR is the most glare-sensitive step here, so wrap ocr_fn with the retry and dead-letter handling described in Error Handling in Computer Vision Pipelines rather than letting a failed decode block the frame.
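
A minimal sketch of such a wrapper follows, assuming a plain list is an acceptable stand-in for the dead-letter store. The `with_retry` name, the retry count, and the backoff parameter are hypothetical, and this is not the handling defined in the referenced page.

```python
import time
from typing import Callable


def with_retry(
    fn: Callable[[object], str],
    dead_letter: list,
    attempts: int = 3,
    backoff_s: float = 0.0,
) -> Callable[[object], str]:
    """Wrap an OCR callable so failures retry, then dead-letter and return ''."""

    def wrapped(crop: object) -> str:
        last_error = None
        for attempt in range(attempts):
            try:
                return fn(crop)
            except Exception as exc:  # OCR backends raise assorted error types
                last_error = exc
                time.sleep(backoff_s * (attempt + 1))  # linear backoff
        # Retries exhausted: record the failure and let the frame continue.
        dead_letter.append({"error": repr(last_error), "attempts": attempts})
        return ""

    return wrapped
```

Returning an empty string on exhaustion means the gate sees no OCR text, so a box without a verified barcode falls through to a review decision instead of blocking the frame.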

Step 4 — Close the Loop With an FPR Circuit Breaker

Static thresholds drift as stores re-light, packaging refreshes, and seasons change. Make suppression self-correcting: track the false-positive rate over a rolling window and recalibrate when it breaches budget. If FPR exceeds 8% over the last 100 frames, tighten the confidence gate by 0.05; when it recovers below 3%, relax it so recall is not left stranded. Every suppressed box is logged with its original score, coordinates, and reason code, which is exactly the telemetry the drift workflow consumes — so this step both protects today’s run and feeds tomorrow’s retraining set.

from collections import deque


class FPRCircuitBreaker:
    """Rolling false-positive-rate monitor that nudges the confidence gate."""

    def __init__(
        self,
        window: int = 100,
        high_water: float = 0.08,
        low_water: float = 0.03,
        base_conf: float = 0.40,
        step: float = 0.05,
        bounds: tuple[float, float] = (0.30, 0.60),
    ) -> None:
        self._flags: deque[int] = deque(maxlen=window)
        self.high_water, self.low_water = high_water, low_water
        self.conf, self.step, self.bounds = base_conf, step, bounds

    def record(self, was_false_positive: bool) -> float:
        """Log one reviewed detection; return the (possibly adjusted) gate."""
        self._flags.append(1 if was_false_positive else 0)
        if len(self._flags) < self._flags.maxlen:
            return self.conf
        fpr = sum(self._flags) / len(self._flags)
        lo, hi = self.bounds
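        if fpr > self.high_water:
            self.conf = min(hi, round(self.conf + self.step, 2))
            # Start a fresh window so one bad window moves the gate only once.
            self._flags.clear()
        elif fpr < self.low_water:
            self.conf = max(lo, round(self.conf - self.step, 2))
            self._flags.clear()
        return self.conf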
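
The suppression log this step relies on could take a shape like the following. It is a sketch under assumptions: the `SuppressedBox` record, its field names (including `frame_id`), and the JSON-lines encoding are hypothetical, since the telemetry schema is not specified here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SuppressedBox:
    frame_id: str                             # hypothetical source-frame identifier
    xyxy: tuple[float, float, float, float]   # pixel coordinates of the box
    score: float                              # original detector score, before gating
    reason: str                               # e.g. "promo_or_price_tag", "outside_roi"


def to_log_line(record: SuppressedBox) -> str:
    """Serialize one suppressed box as a JSON line for the telemetry sink."""
    return json.dumps(asdict(record), sort_keys=True)


line = to_log_line(SuppressedBox("frame-0001", (10.0, 20.0, 60.0, 90.0), 0.41, "outside_roi"))
```

Keeping the original score alongside the reason code lets a later analysis separate boxes that were marginal from boxes the detector was confident about but the filters rejected.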

Run the whole chain off the inference path. While the GPU detects the next batch, a CPU pool executes fusion, geometry, OCR, and the breaker update — the async batching pattern in Async Image Batching for High-Volume Stores is what keeps this post-processing from becoming the throughput ceiling.
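
The overlap described above can be sketched with a thread pool: while detection runs on the next batch, the previous batch's frames are post-processed in worker threads. The `run_off_path` function and its callables are hypothetical stand-ins, and this is not the batching implementation from the referenced page.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator


def run_off_path(
    batches: Iterable[list],
    detect: Callable[[list], list],
    postprocess_frame: Callable[[object], object],
    workers: int = 4,
) -> Iterator[list]:
    """Yield post-processed batches in order while detection runs ahead."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = None
        for batch in batches:
            raw = detect(batch)  # stands in for the GPU forward pass
            if pending is not None:
                # The previous batch was post-processed during detect().
                yield [f.result() for f in pending]
            pending = [pool.submit(postprocess_frame, dets) for dets in raw]
        if pending is not None:
            yield [f.result() for f in pending]
```

Threads help most when the heavy work happens inside native libraries that release the GIL, as many OpenCV and NumPy calls do; for pure-Python post-processing a process pool is the usual alternative.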

Verification & Testing

Confirm each stage deterministically rather than eyeballing a dashboard:

Fusion collapses duplicates, not neighbours. Feed three boxes with IoU 0.9 and one disjoint box; assert weighted_box_fusion returns exactly 2 boxes and that the fused box sits between the inputs, weighted toward the highest score.
Geometry rejects the right shapes. Pass a synthetic price-tag box (aspect 4.0, area 0.2%) and a valid facing; assert only the facing survives enforce_spatial_constraints, and that a box whose centroid is outside shelf_roi is dropped.
Cross-modal routes, never silently drops. Stub ocr_fn to return "SALE $3.99" and assert decision == "reject"; stub both signals empty and assert decision == "review" so the frame lands in the queue, not the void.
Breaker moves the gate. Record 12 false positives in a 100-window and assert conf rose by exactly 0.05; record a clean window and assert it relaxes — and never escapes bounds.
Recall guardrail. On a labelled validation set, assert post-suppression recall stays within 2% of the raw detector’s recall. If recall craters, the confidence gate is too high — the false positives are a detection problem, not a suppression one.
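
The recall guardrail in the last check can be expressed as a small assertion helper. This sketch reads "within 2%" as two percentage points of absolute recall, which is an assumption; the function names are hypothetical.

```python
def recall(true_positives: int, total_facings: int) -> float:
    """Fraction of labelled facings that were detected."""
    return true_positives / total_facings if total_facings else 1.0


def check_recall_guardrail(
    raw_tp: int,
    post_tp: int,
    total_facings: int,
    max_drop: float = 0.02,  # read as 2 percentage points (an assumption)
) -> bool:
    """True when suppression cost at most `max_drop` of absolute recall."""
    drop = recall(raw_tp, total_facings) - recall(post_tp, total_facings)
    return drop <= max_drop + 1e-9  # small epsilon for float rounding
```

For example, with 1000 labelled facings, going from 960 matched facings before suppression to 945 after is a 1.5-point drop and passes, while falling to 930 is a 3-point drop and fails.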

A healthy run shows the suppressed-box log dominated by promo_or_price_tag and outside_roi reason codes, a stable FPR under 8%, and a review queue that drains rather than grows.
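
Part of that health picture can be checked mechanically from the suppressed-box log. The sketch below covers only the reason-code mix and the FPR budget, not whether the review queue is draining; `looks_healthy` and its top-two heuristic are assumptions made for illustration.

```python
from collections import Counter


def summarize_suppressions(reasons: list[str], top_n: int = 2) -> list[str]:
    """Return the `top_n` most frequent reason codes in the suppressed-box log."""
    return [code for code, _ in Counter(reasons).most_common(top_n)]


def looks_healthy(reasons: list[str], fpr: float, fpr_budget: float = 0.08) -> bool:
    """Heuristic: expected codes dominate the log and FPR is under budget."""
    expected = {"promo_or_price_tag", "outside_roi"}
    return set(summarize_suppressions(reasons)) <= expected and fpr < fpr_budget
```

A log whose most frequent code is something like `unverified` fails this check, which is a hint that boxes are piling up in review rather than being rejected for a clear structural reason.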

Troubleshooting

Symptom	Likely root cause	Remediation
Price tags and shelf talkers still scored as SKUs	Geometry band too loose, or OCR gate never reached because confidence gate already passed them	Tighten the per-category aspect band and ensure the cross-modal gate runs on every surviving box; assert `promo_or_price_tag` rejections appear in the log
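Adjacent facings merged into one box (facings undercounted)	WBF `iou_threshold` too low for the density	Raise `iou_threshold` in small steps until neighbours separate, and reserve `0.45` for duplicate boxes on a single unit; this is an IoU setting, not a confidence problem, so raising the confidence gate will not fix it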
Real low-stock facings disappear after suppression	Confidence gate raised too far, or breaker stuck at upper bound	Lower the gate toward `0.38` and check the recall guardrail; treat thin recall as a detection fix, not more filtering
Glare blooms pass geometry and OCR returns garbage	Specular highlight saturates the crop so neither signal is reliable	Apply CLAHE before OCR and route `no_secondary_signal` boxes to review rather than accepting them
FPR swings wildly batch to batch	Window too small or breaker `step` too large	Widen the rolling window and shrink `step` to `0.02`; the breaker should nudge, not oscillate

Related

Bounding Box Extraction & SKU Localization — the parent component and the localized-SKU record this suppression protects
Vision Model Routing for Shelf Detection — how the detector feeding these boxes is selected per fixture
Position Validation Algorithms for Planograms — the downstream consumer that turns clean boxes into compliance verdicts
