
Advanced

These modules are for building and calibrating your own data packs and evaluating accuracy. Most users never import them. They require the [data] and [calibration] extras.

Why

The core Resolver loads pre-built, bundled data. The builder and calibration modules are for data engineers who want to generate custom entity packs, train calibrators on labeled examples, or run reproducible accuracy benchmarks.

Data builder

resolvekit.builder

Build tooling for module-oriented data generation and packaging.

BuildOptions

Bases: BaseModel

Execution options for builder runs.

registry_path property

registry_path

Registry file path for successful releases.

runs_root property

runs_root

Directory where per-run state is stored.

shared_geo_root property

shared_geo_root

Directory for persistent shared geo staging store.

BuildOutcome

Bases: BaseModel

Outcome returned by build/resume commands.

BuildPlan

Bases: BaseModel

Plan containing recipes and execution settings.

unique_recipe_ids classmethod

unique_recipe_ids(value)

Ensure recipe IDs are unique.
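The uniqueness rule can be illustrated with a plain-Python check. This is a minimal sketch, not the library's validator. The real method is presumably a pydantic validator whose exact input shape is not documented here, so the sketch assumes it receives a list of recipe ID strings and raises ValueError (which pydantic reports as a validation error) on duplicates. The name check_unique_recipe_ids is hypothetical.

```python
from collections import Counter


def check_unique_recipe_ids(recipe_ids: list[str]) -> list[str]:
    """Hypothetical stand-in for BuildPlan.unique_recipe_ids.

    Returns the IDs unchanged when they are unique; otherwise raises
    ValueError naming every duplicated ID.
    """
    counts = Counter(recipe_ids)
    duplicates = sorted(rid for rid, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate recipe IDs: {', '.join(duplicates)}")
    return recipe_ids
```

Reporting all duplicates at once, rather than stopping at the first, saves a round trip when a plan file has several copy-paste mistakes.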

BuildStatus

Bases: StrEnum

Build status values.

DiscoveredEntityFacts

Bases: BaseModel

Inspection facts for one discovered source entity.

DomainInspection

Bases: BaseModel

Inspection report for one build domain.

EntityClassificationSummary

Bases: BaseModel

Aggregated classification counts for one inspected domain.

EntityFilter

Bases: BaseModel

Filtering options for one emitted module/domain artifact.

InspectionOutcome

Bases: BaseModel

Outcome returned by inspect().

ModuleRecipe

Bases: BaseModel

Definition of one installable module artifact.

QualityPolicy

Bases: BaseModel

Release quality thresholds.

ReleaseRecord

Bases: BaseModel

Registered successful release entry.

build

build(plan, *, adapter_builder=None)

Run a full build for the provided plan.

Caller errors (BuildPlan validation, missing run dir) raise; content-level failures collect into outcome.errors.

adapter_builder: optional factory overriding the default registry (used for testing/injection).
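The split between raised caller errors and collected content errors is a common pattern. The sketch below illustrates it with hypothetical names (Outcome, run_build, and the step callables are not part of resolvekit.builder): invalid input raises immediately, while a failure inside an individual step is recorded and the remaining steps still run.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Outcome:
    """Hypothetical result object; `errors` plays the role of outcome.errors."""
    completed: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def run_build(steps: dict[str, Callable[[], None]]) -> Outcome:
    # Caller error: an empty plan is rejected up front by raising.
    if not steps:
        raise ValueError("plan contains no steps")
    outcome = Outcome()
    for name, step in steps.items():
        try:
            step()
        except Exception as exc:  # content-level failure: record it and keep going
            outcome.errors.append(f"{name}: {exc}")
        else:
            outcome.completed.append(name)
    return outcome
```

With this split, callers wrap the call in try/except only for programming mistakes and inspect the returned errors list for data problems.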

inspect

inspect(plan, *, adapter_builder=None)

Inspect source coverage for the provided plan.

Caller errors (BuildPlan validation, missing run dir) raise; content-level failures collect into outcome.errors.

list_releases

list_releases(module_id=None, options=None)

List successful releases from registry.

resume

resume(run_id, options=None, *, adapter_builder=None)

Resume a previously failed or interrupted run.

Caller errors (BuildPlan validation, missing run dir) raise; content-level failures collect into outcome.errors.

adapter_builder: optional factory overriding the default registry (used for testing/injection).

Calibration

resolvekit.calibration

Calibration pipeline for ResolveKit scorers.

Provides:

- PlattCalibrator and IsotonicCalibrator models
- Fitting routines (fit_platt, fit_isotonic)
- Evaluation metrics (brier_score, expected_calibration_error, evaluate_calibration)
- load_calibrator / save_calibrator helpers
- Dataset models and labeling utilities
- LogisticScoringModel and related helpers
- train_model / ModelTrainResult for end-to-end ML model training

Training-only symbols (train_calibrator, train_model, run_adapters, ModelTrainResult, TrainResult) are imported lazily on first attribute access so that merely loading a calibrator JSON at runtime does not pull in pandas / gecko / scikit-learn.
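Lazy attribute loading of this kind is typically implemented with a module-level __getattr__ (PEP 562). Whether resolvekit.calibration uses exactly this mechanism is an assumption. To stay self-contained, the sketch builds a module object at runtime and uses standard-library modules (statistics, json) as stand-ins for the heavy training dependencies. The _LAZY mapping and make_lazy_module are hypothetical.

```python
import importlib
import types

# Hypothetical mapping: public attribute name -> module that defines it.
# Standard-library modules stand in for pandas / scikit-learn here.
_LAZY = {"median": "statistics", "dumps": "json"}


def make_lazy_module(name: str, lazy: dict[str, str]) -> types.ModuleType:
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        # Called only when normal lookup fails, i.e. on first access.
        if attr not in lazy:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(lazy[attr]), attr)
        setattr(module, attr, value)  # cache so later lookups skip __getattr__
        return value

    module.__getattr__ = __getattr__  # PEP 562 hook lives in the module dict
    return module


calibration = make_lazy_module("calibration_demo", _LAZY)
```

In a real package the same function is simply defined at the top level of `__init__.py`; nothing is imported until code touches one of the mapped names.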

CalibrationBin

Bases: BaseModel

Statistics for a single calibration bin.

CalibrationDataset

Bases: BaseModel

Collection of labeled examples.

labeled_examples property

labeled_examples

Only examples with scores and labels populated.

CalibrationMetrics

Bases: BaseModel

Summary calibration metrics.

IsotonicCalibrator

Bases: BaseModel

Isotonic regression calibrator using piecewise linear interpolation.

predict

predict(raw_score, *, query_len=None)

Piecewise linear interpolation with boundary clamping.
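Piecewise linear interpolation with boundary clamping works as follows. Scores below the first fitted threshold map to the first fitted probability, scores above the last map to the last, and anything in between is interpolated along the segment that contains it. The sketch assumes the fitted calibrator stores two sorted, equal-length lists; the names xs and ys are hypothetical, and query_len is omitted because a non-stratified calibrator has no use for it.

```python
from bisect import bisect_right


def isotonic_predict(raw_score: float, xs: list[float], ys: list[float]) -> float:
    """Interpolate raw_score on the fitted (xs, ys) curve, clamping at both ends.

    xs must be sorted ascending and have the same length as ys.
    """
    if raw_score <= xs[0]:
        return ys[0]
    if raw_score >= xs[-1]:
        return ys[-1]
    # First index whose threshold is strictly greater than raw_score, so
    # xs[i - 1] <= raw_score < xs[i] and the segment has nonzero width.
    i = bisect_right(xs, raw_score)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    t = (raw_score - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)
```

If numpy is available, `numpy.interp(raw_score, xs, ys)` has the same clamping behaviour by default.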

LabeledExample

Bases: BaseModel

One labeled example for calibration training.

LogisticScoringModel

Bases: BaseModel

Trained logistic regression scoring model.

Stores feature names, weights, and bias extracted from a fitted sklearn LogisticRegression. At predict time it applies the dot product + sigmoid in pure Python — no sklearn dependency.

Sigmoid convention: 1 / (1 + exp(-logit)) — identical to sklearn's default, not the negated Platt convention.

model_version property

model_version

Version string for trace metadata.

predict

predict(features)

Return calibrated probability for the given feature vector.

Parameters:

- features (FeatureVector, required): Any object satisfying the FeatureVector protocol (must have a to_dict() method).

Returns:

- float: Probability in [0, 1].

predict_dict

predict_dict(features)

Return calibrated probability for a raw feature dict keyed by name.

Parameters:

- features (dict[str, float], required): Feature values keyed by feature name (e.g. a stored features_dict). Feature names missing from the dict are treated as 0.

Returns:

- float: Probability in [0, 1].
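The documented behaviour of predict_dict (a dot product plus bias, the 1 / (1 + exp(-logit)) sigmoid, and missing names treated as 0) fits in a few lines of pure Python. The argument names feature_names, weights, and bias are assumptions about what the model stores; the real class keeps them as fields rather than taking them as arguments.

```python
import math


def logistic_predict_dict(features: dict[str, float], feature_names: list[str],
                          weights: list[float], bias: float) -> float:
    # Vectorize in the model's feature order; absent names contribute 0.
    # Keys not in feature_names are ignored.
    logit = bias + sum(w * features.get(name, 0.0)
                       for name, w in zip(feature_names, weights))
    # Numerically stable form of 1 / (1 + exp(-logit)).
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)
```

Branching on the sign of the logit avoids `math.exp` overflowing for large negative logits while computing the same value.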

ModelTrainResult dataclass

Result of a scoring model training run.

PlattCalibrator

Bases: BaseModel

Platt scaling calibrator: maps raw scores to probabilities via sigmoid.

predict

predict(raw_score, *, query_len=None)

Apply Platt scaling: sigmoid(a * raw_score + b).
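Platt scaling is a single sigmoid over the raw score; the sign convention is the detail to watch. The LogisticScoringModel note above contrasts its own 1 / (1 + exp(-logit)) with "the negated Platt convention", which suggests the Platt calibrator may compute 1 / (1 + exp(a * raw_score + b)), as in Platt's original formulation. Because that is an inference rather than something the docstring states, the sketch exposes it as a hypothetical negated flag. Flipping the signs of a and b converts one form into the other.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def platt_predict(raw_score: float, a: float, b: float, *, negated: bool = True) -> float:
    z = a * raw_score + b
    # negated=True:  Platt's original form 1 / (1 + exp(a*s + b)).
    # negated=False: the standard logistic form 1 / (1 + exp(-(a*s + b))).
    return _sigmoid(-z) if negated else _sigmoid(z)
```

Under the negated form, a fitted `a` is typically negative so that higher raw scores map to higher probabilities.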

StratifiedCalibrator

Bases: BaseModel

Calibrator that dispatches to sub-calibrators based on query length.

Addresses non-monotonic score-accuracy relationships by training separate calibrators for short vs long queries, where within each group the relationship is more monotonic.

predict

predict(raw_score, *, query_len=None)

Predict calibrated score, dispatching by query length.

Falls back to long_calibrator when query_len is unknown (None), since long queries are the majority case.
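Dispatch by query length reduces to a threshold check with a default branch. The threshold (short_max_len, default 10) and the inclusive boundary are hypothetical, since the docstring does not say where "short" ends; the sub-calibrators are represented as plain callables.

```python
from typing import Callable, Optional

Calibrate = Callable[[float], float]


def stratified_predict(raw_score: float, *, query_len: Optional[int],
                       short_calibrator: Calibrate, long_calibrator: Calibrate,
                       short_max_len: int = 10) -> float:
    # Unknown length: fall back to the long calibrator (the majority case).
    if query_len is None:
        return long_calibrator(raw_score)
    # Hypothetical boundary: lengths up to short_max_len count as "short".
    if query_len <= short_max_len:
        return short_calibrator(raw_score)
    return long_calibrator(raw_score)
```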

TrainResult dataclass

Result of a calibration training run.

brier_score

brier_score(predicted, actual)

Compute Brier score: mean squared error between predicted probs and labels.
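The Brier score is the mean of (p - y)² over examples. Lower is better, 0 is perfect, and always predicting 0.5 scores 0.25. The following is a standalone re-implementation, not an import from resolvekit.calibration; its input validation is an assumption rather than documented library behaviour.

```python
def brier_score(predicted: list[float], actual: list[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, actual)) / len(predicted)
```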

calibration_curve_data

calibration_curve_data(predicted, actual, n_bins=10)

Build calibration curve data grouped into equal-width bins.

evaluate_calibration

evaluate_calibration(predicted, actual, n_bins=10)

Compute full calibration metrics including Brier score, ECE, and bin data.

expected_calibration_error

expected_calibration_error(predicted, actual, n_bins=10)

Compute Expected Calibration Error (ECE) using equal-width bins.
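Equal-width binning and ECE fit together: split [0, 1] into n_bins intervals, compare each bin's mean predicted probability with its observed positive rate, and weight the gap by the bin's share of examples. The sketch below is a standalone re-implementation. The tuple returned per bin is a simplified stand-in for CalibrationBin, whose fields are not listed here, and placing a prediction of exactly 1.0 in the last bin is an assumed edge-case rule.

```python
def calibration_bins(predicted: list[float], actual: list[int], n_bins: int = 10):
    """Return (count, mean_predicted, fraction_positive) for each non-empty bin."""
    sums = [[0, 0.0, 0] for _ in range(n_bins)]  # count, sum of p, sum of labels
    for p, y in zip(predicted, actual):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 lands in the last bin
        sums[idx][0] += 1
        sums[idx][1] += p
        sums[idx][2] += y
    return [(n, sp / n, sy / n) for n, sp, sy in sums if n > 0]


def expected_calibration_error(predicted: list[float], actual: list[int],
                               n_bins: int = 10) -> float:
    total = len(predicted)
    # Weighted mean of |confidence - accuracy| over non-empty bins.
    return sum(n / total * abs(mean_p - frac_pos)
               for n, mean_p, frac_pos in calibration_bins(predicted, actual, n_bins))
```

Empty bins are skipped, so they neither add error nor distort the weights.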

label_examples

label_examples(examples, resolver, *, include_features=False, context_enrichment_rate=0.0)

Run unlabeled examples through ResolveKit and auto-label.

For each example:

1. Run it through the resolver pipeline via resolve_detailed() to get candidates.
2. Extract the raw (pre-calibration) score from the top candidate.
3. Set label = 1 if top_entity_id == expected_entity_id, else 0.

Uses the runner directly to access scores.raw_score (pre-calibration). If the resolver has a calibrator loaded, result.confidence would be the calibrated score — using raw_score avoids training on already-calibrated data.

Each example is resolved against its declared domain's pack runner directly, bypassing the router. This prevents AUTO-mode routing from sending examples to the wrong pack and contaminating domain-specific calibration data.
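The labeling rule in the steps above can be sketched without a resolver. Each example's candidates are given here as (entity_id, raw_score) pairs sorted best-first; that shape and the entity IDs in the usage are hypothetical. Returning None when there are no candidates, leaving the example unlabeled, is also an assumption, chosen because CalibrationDataset.labeled_examples implies some examples may lack a score or label. The real function obtains candidates through resolve_detailed() and reads scores.raw_score.

```python
from typing import Optional


def auto_label(candidates: list[tuple[str, float]],
               expected_entity_id: str) -> Optional[tuple[float, int]]:
    """Return (raw_score, label) for one example, or None if unresolved.

    candidates: (entity_id, raw_score) pairs, best first (hypothetical shape).
    """
    if not candidates:
        return None  # assumption: no candidate leaves the example unlabeled
    top_entity_id, raw_score = candidates[0]
    label = 1 if top_entity_id == expected_entity_id else 0
    return raw_score, label
```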

Parameters:

- examples (list[LabeledExample], required): Unlabeled examples to label.
- resolver (Resolver, required): Loaded Resolver instance.
- include_features (bool, default False): If True, capture raw feature dicts from candidates.
- context_enrichment_rate (float, default 0.0): Fraction of examples (0.0-1.0) that will receive geographic context (a country hint) derived from the expected entity's country. Defaults to 0.0 (no enrichment). When > 0, a seeded RNG (seed=42) deterministically selects which examples are enriched. Enrichment is best-effort: examples whose country cannot be determined are resolved without context.
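Deterministic enrichment selection can be sketched with a seeded random.Random. The docstring fixes seed=42 but not the sampling scheme; this sketch assumes one independent draw per example, so the enriched fraction is approximately, not exactly, the rate. The helper name select_for_enrichment is hypothetical.

```python
import random


def select_for_enrichment(n_examples: int, rate: float, seed: int = 42) -> list[bool]:
    """Return one flag per example: True if it should get a country hint."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0.0, 1.0]")
    if rate == 0.0:
        return [False] * n_examples  # default: no enrichment, RNG unused
    rng = random.Random(seed)  # same seed -> same selection on every run
    # rng.random() is in [0, 1), so rate=1.0 selects every example.
    return [rng.random() < rate for _ in range(n_examples)]
```

Using a private `random.Random` instance instead of the module-level functions keeps the selection reproducible even if other code reseeds the global generator.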

load_calibrator

load_calibrator(path)

Load a calibrator from a JSON file, dispatching on the 'method' field.
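Dispatching on a 'method' field amounts to a registry lookup. In the sketch below, the JSON layout (a 'method' key next to the fitted parameters), the parameter keys, and the dict-based "calibrators" are hypothetical; the real function presumably returns PlattCalibrator, IsotonicCalibrator, or StratifiedCalibrator instances.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Callable


def _load_platt(data: dict[str, Any]) -> dict[str, Any]:
    return {"kind": "platt", "a": float(data["a"]), "b": float(data["b"])}


def _load_isotonic(data: dict[str, Any]) -> dict[str, Any]:
    return {"kind": "isotonic", "xs": list(data["xs"]), "ys": list(data["ys"])}


# Hypothetical registry: 'method' value -> constructor.
_LOADERS: dict[str, Callable[[dict[str, Any]], dict[str, Any]]] = {
    "platt": _load_platt,
    "isotonic": _load_isotonic,
}


def calibrator_from_dict(data: dict[str, Any]) -> dict[str, Any]:
    method = data.get("method")
    try:
        loader = _LOADERS[method]
    except KeyError:
        raise ValueError(f"unknown calibrator method: {method!r}") from None
    return loader(data)


def load_calibrator_sketch(path: str | Path) -> dict[str, Any]:
    # Parse the file, then dispatch on its 'method' field.
    return calibrator_from_dict(json.loads(Path(path).read_text(encoding="utf-8")))
```

Keeping the dispatch in a dict means supporting a new method is one registry entry rather than another if/elif branch.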

load_examples_jsonl

load_examples_jsonl(path)

Load examples from JSONL.

load_scoring_model

load_scoring_model(path)

Load a LogisticScoringModel from a JSON file.

run_adapters

run_adapters(domain, adapter_names, store, *, limit_per_adapter=None, cache_dir=None)

Run calibration adapters and return deduplicated pairs.
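Deduplication across adapters usually means keeping the first occurrence of each pair while preserving order. What counts as a duplicate in run_adapters is not documented. This sketch assumes a pair is a (query text, expected entity ID) tuple and that queries are compared after whitespace normalization and case-folding; both the pair shape and the normalization rule are hypothetical.

```python
def dedupe_pairs(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop repeated (query, entity_id) pairs, keeping first-seen order."""
    seen: set[tuple[str, str]] = set()
    unique = []
    for query, entity_id in pairs:
        # Collapse runs of whitespace and ignore case when comparing queries.
        key = (" ".join(query.split()).casefold(), entity_id)
        if key not in seen:
            seen.add(key)
            unique.append((query, entity_id))
    return unique
```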

save_calibrator

save_calibrator(calibrator, path)

Save a calibrator to a JSON file.

save_examples_jsonl

save_examples_jsonl(examples, path)

Save examples as JSONL.
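JSONL stores one JSON object per line. The round trip below is a standalone sketch using plain dicts; the real helpers serialize LabeledExample models, whose field names are not listed here, so the keys in the usage are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_jsonl(records: list[dict], path) -> None:
    with Path(path).open("w", encoding="utf-8") as fh:
        for record in records:
            # One object per line; keep non-ASCII characters readable.
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_jsonl(path) -> list[dict]:
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]  # skip blank lines


# Usage: write and read back one example in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "examples.jsonl"
    save_jsonl([{"query": "São Paulo", "expected_entity_id": "geo:sao-paulo"}], target)
    round_trip = load_jsonl(target)
```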

save_scoring_model

save_scoring_model(model, path)

Save a LogisticScoringModel to a JSON file.

train_calibrator

train_calibrator(resolver, domain, *, adapter_names=None, method='platt', limit_per_adapter=None, eval_split=0.2, cache_dir=None, output_path=None, examples_output=None, seed=42)

Train a calibrator end-to-end: adapters -> label -> fit -> evaluate.

Parameters:

- resolver (Resolver, required): A loaded Resolver instance.
- domain (str, required): Domain to calibrate ("geo" or "org").
- adapter_names (list[str] | None, default None): Which data adapters to run (default: all adapters for the domain).
- method (Literal['platt', 'isotonic', 'stratified'], default 'platt'): Calibration method: "platt", "isotonic", or "stratified".
- limit_per_adapter (int | None, default None): Maximum number of examples per adapter.
- eval_split (float, default 0.2): Fraction of examples held out for evaluation (0-1).
- cache_dir (str | Path | None, default None): Directory for caching adapter downloads.
- output_path (str | Path | None, default None): If set, save the fitted calibrator JSON to this path.
- examples_output (str | Path | None, default None): If set, save the labeled examples as JSONL to this path.
- seed (int, default 42): Random seed for reproducibility.

Returns:

- TrainResult: TrainResult with the calibrator, metrics, and counts.
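The eval_split and seed parameters imply a seeded hold-out split before fitting. A minimal sketch of such a split follows. Shuffling a copy with random.Random(seed), rounding the held-out count down, and rejecting eval_split=1.0 (which would leave nothing to fit) are assumptions about details the docstring leaves open; split_examples is a hypothetical name.

```python
import random


def split_examples(examples: list, eval_split: float = 0.2, seed: int = 42):
    """Return (train, eval) lists; the same seed always yields the same split."""
    if not 0.0 <= eval_split < 1.0:
        raise ValueError("eval_split must be in [0, 1)")
    shuffled = list(examples)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_split)  # round the held-out count down
    return shuffled[n_eval:], shuffled[:n_eval]
```

Holding out examples before fitting keeps the reported Brier score and ECE honest: they are measured on data the calibrator never saw.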

train_model

train_model(resolver, domain, *, adapter_names=None, regularization=1.0, limit_per_adapter=None, eval_split=0.2, cache_dir=None, output_path=None, examples_output=None, seed=42)

Train a logistic scoring model end-to-end.

Mirrors train_calibrator but uses include_features=True and fit_scoring_model() instead of a calibrator fitter.

Parameters:

Name Type Description Default
resolver Resolver

A loaded Resolver instance.

required
domain str

Domain to train for ("geo" or "org").

required
adapter_names list[str] | None

Which data adapters to run (default: all for domain).

None
regularization float

LogisticRegression C parameter.

1.0
limit_per_adapter int | None

Max examples per adapter.

None
eval_split float

Fraction held out for evaluation (0-1).

0.2
cache_dir str | Path | None

Directory for caching adapter downloads.

None
output_path str | Path | None

Save fitted model JSON to this path.

None
examples_output str | Path | None

Save labeled examples as JSONL.

None
seed int

Random seed for reproducibility.

42

Returns:

Type Description
ModelTrainResult

ModelTrainResult with model, metrics, and counts.


Next

Module-level API — The convenience functions (rk.resolve, rk.bulk, rk.entity, etc.) for day-to-day resolution without touching builder or calibration.

Confidence and calibration — How scores are produced, what the calibrator parameters mean, and why confidence is pack-specific.