Advanced

These modules are for building and calibrating your own data packs and for evaluating accuracy. Most users never import them. They require the `[data]` and `[calibration]` extras.

Why

The core Resolver loads pre-built, bundled data. The builder and calibration modules are for data engineers who want to generate custom entity packs, train calibrators on labeled examples, or run reproducible accuracy benchmarks.

Data builder

resolvekit.builder

Build tooling for module-oriented data generation and packaging.

BuildOptions

Bases: BaseModel

Execution options for builder runs.

registry_path `property`

registry_path

Registry file path for successful releases.

runs_root `property`

runs_root

Directory where per-run state is stored.

shared_geo_root `property`

shared_geo_root

Directory for the persistent shared geo staging store.

BuildOutcome

Bases: BaseModel

Outcome returned by `build()` and `resume()`.

BuildPlan

Bases: BaseModel

Plan containing recipes and execution settings.

unique_recipe_ids `classmethod`

unique_recipe_ids(value)

Ensure recipe IDs are unique.
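The uniqueness rule behind `unique_recipe_ids` can be sketched in plain Python. This is an illustration, not the library's validator: `ensure_unique_ids` is a hypothetical helper, and it assumes recipe IDs are plain strings and that a duplicate should raise `ValueError` (inside a Pydantic validator, a `ValueError` is reported as a validation error).

```python
from collections import Counter


def ensure_unique_ids(recipe_ids: list[str]) -> list[str]:
    """Hypothetical stand-in for BuildPlan.unique_recipe_ids.

    Returns the IDs unchanged when they are unique; otherwise raises
    ValueError naming every duplicated ID.
    """
    counts = Counter(recipe_ids)
    duplicates = sorted(rid for rid, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate recipe IDs: {', '.join(duplicates)}")
    return recipe_ids


# Usage: unique IDs pass through, duplicates are rejected.
print(ensure_unique_ids(["geo-core", "org-core"]))
try:
    ensure_unique_ids(["geo-core", "org-core", "geo-core"])
except ValueError as exc:
    print(exc)  # duplicate recipe IDs: geo-core
```

Reporting every duplicate at once, rather than stopping at the first, saves a round trip when a hand-edited plan has several collisions.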

BuildStatus

Bases: StrEnum

Build status values.

DiscoveredEntityFacts

Bases: BaseModel

Inspection facts for one discovered source entity.

DomainInspection

Bases: BaseModel

Inspection report for one build domain.

EntityClassificationSummary

Bases: BaseModel

Aggregated classification counts for one inspected domain.

EntityFilter

Bases: BaseModel

Filtering options for one emitted module/domain artifact.

InspectionOutcome

Bases: BaseModel

Outcome returned by inspect().

ModuleRecipe

Bases: BaseModel

Definition of one installable module artifact.

QualityPolicy

Bases: BaseModel

Release quality thresholds.

ReleaseRecord

Bases: BaseModel

Registered successful release entry.

build

build(plan, *, adapter_builder=None)

Run a full build for the provided plan.

Caller errors (BuildPlan validation failures, a missing run directory) are raised as exceptions; content-level failures are collected into `outcome.errors`.

`adapter_builder`: optional factory that overrides the default registry (used for testing and dependency injection).
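The error contract above (raise for caller errors, collect content-level failures) and the `adapter_builder` hook can be illustrated with a self-contained sketch. Everything in it is hypothetical: `Outcome`, `default_adapters`, and `build_sketch` are stand-ins rather than ResolveKit's `BuildOutcome` or `build()`, and the adapter shape (a no-argument callable per recipe ID) is assumed only to keep the example short.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical adapter shape: a no-argument callable that fetches source records.
Adapter = Callable[[], list]


@dataclass
class Outcome:
    """Hypothetical stand-in for BuildOutcome; only the errors list is modeled."""

    errors: list[str] = field(default_factory=list)


def default_adapters() -> dict[str, Adapter]:
    """Hypothetical default registry mapping recipe IDs to adapters."""
    return {"demo": lambda: ["record-1", "record-2"]}


def build_sketch(
    recipe_ids: list[str],
    *,
    adapter_builder: Callable[[], dict[str, Adapter]] | None = None,
) -> Outcome:
    # Caller error: the plan itself is unusable, so raise instead of collecting.
    if not recipe_ids:
        raise ValueError("plan must contain at least one recipe")
    # The injected factory, when given, replaces the default registry.
    adapters = (adapter_builder or default_adapters)()
    outcome = Outcome()
    for rid in recipe_ids:
        adapter = adapters.get(rid)
        if adapter is None:
            outcome.errors.append(f"{rid}: no adapter registered")
            continue
        try:
            adapter()
        except Exception as exc:  # content-level failure: record it and keep going
            outcome.errors.append(f"{rid}: {exc}")
    return outcome


# Usage: inject a failing adapter, as a test might.
def failing_adapters() -> dict[str, Adapter]:
    def unavailable() -> list:
        raise RuntimeError("source unavailable")

    return {"demo": unavailable}


print(build_sketch(["demo"]).errors)  # []
print(build_sketch(["demo"], adapter_builder=failing_adapters).errors)  # ['demo: source unavailable']
```

Splitting errors this way lets a caller use `try`/`except` only for programming mistakes, while a partially failed build still returns an outcome that can be inspected or resumed.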

inspect

inspect(plan, *, adapter_builder=None)

Inspect source coverage for the provided plan.

Caller errors (BuildPlan validation failures, a missing run directory) are raised as exceptions; content-level failures are collected into `outcome.errors`.

list_releases

list_releases(module_id=None, options=None)

List successful releases recorded in the registry.

resume

resume(run_id, options=None, *, adapter_builder=None)

Resume a previously failed or interrupted run.

Caller errors (BuildPlan validation failures, a missing run directory) are raised as exceptions; content-level failures are collected into `outcome.errors`.

`adapter_builder`: optional factory that overrides the default registry (used for testing and dependency injection).

Calibration

resolvekit.calibration

Calibration pipeline for ResolveKit scorers.

Provides:

- PlattCalibrator and IsotonicCalibrator models
- Fitting routines (fit_platt, fit_isotonic)
- Evaluation metrics (brier_score, expected_calibration_error, evaluate_calibration)
- load_calibrator / save_calibrator helpers
- Dataset models and labeling utilities
- LogisticScoringModel and related helpers
- train_model / ModelTrainResult for end-to-end ML model training

Training-only symbols (train_calibrator, train_model, run_adapters, ModelTrainResult, TrainResult) are imported lazily on first attribute access, so merely loading a calibrator JSON at runtime does not pull in pandas / gecko / scikit-learn.
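Lazy loading of this kind is commonly implemented with a module-level `__getattr__` (PEP 562). The sketch below is an assumption about the mechanism, not ResolveKit's code: `make_lazy_module` and `_LAZY_SYMBOLS` are hypothetical, and stdlib modules (`json`, `fractions`) stand in for the heavy training dependencies. In a real package the same `__getattr__` would live in `__init__.py` and cache via `globals()`.

```python
import importlib
import types

# Hypothetical name -> providing-module map. In a real package the values would
# be submodules that import pandas / scikit-learn; stdlib modules stand in here.
_LAZY_SYMBOLS = {
    "dumps": "json",
    "Fraction": "fractions",
}


def make_lazy_module(name: str) -> types.ModuleType:
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        # Only called when normal attribute lookup fails (PEP 562).
        provider = _LAZY_SYMBOLS.get(attr)
        if provider is None:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(provider), attr)
        setattr(module, attr, value)  # cache so later lookups skip __getattr__
        return value

    module.__getattr__ = __getattr__
    return module


lazy = make_lazy_module("calibration_demo")
print("dumps" in vars(lazy))  # False: nothing imported yet
print(lazy.dumps({"a": 1}))   # first access triggers the import
print("dumps" in vars(lazy))  # True: cached on the module
```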

CalibrationBin

Bases: BaseModel

Statistics for a single calibration bin.

CalibrationDataset

Bases: BaseModel

Collection of labeled examples.

labeled_examples `property`

labeled_examples

Only the examples whose score and label are both populated.

CalibrationMetrics

Bases: BaseModel

Summary calibration metrics.

IsotonicCalibrator

Bases: BaseModel

Isotonic regression calibrator using piecewise linear interpolation.

predict

predict(raw_score, *, query_len=None)

Piecewise linear interpolation with boundary clamping.
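Piecewise linear interpolation with boundary clamping can be sketched with `bisect`. The breakpoint field names the real model stores are not documented here, so `xs`/`ys` and `isotonic_predict` are hypothetical; the sketch assumes `xs` is sorted ascending and `ys` holds the fitted, non-decreasing values.

```python
from bisect import bisect_right


def isotonic_predict(raw_score: float, xs: list[float], ys: list[float]) -> float:
    """Interpolate between fitted breakpoints, clamping outside [xs[0], xs[-1]]."""
    if raw_score <= xs[0]:
        return ys[0]
    if raw_score >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, raw_score)  # xs[i - 1] <= raw_score < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    t = (raw_score - x0) / (x1 - x0)  # x1 > x0 strictly, so no division by zero
    return y0 + t * (y1 - y0)


xs = [0.0, 0.5, 1.0]
ys = [0.1, 0.4, 0.9]
print(isotonic_predict(-2.0, xs, ys))  # 0.1 (clamped to the first breakpoint)
print(isotonic_predict(0.75, xs, ys))  # ~0.65, halfway between 0.4 and 0.9
```

Clamping keeps scores outside the training range from extrapolating past the fitted probabilities.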

LabeledExample

Bases: BaseModel

One labeled example for calibration training.

LogisticScoringModel

Bases: BaseModel

Trained logistic regression scoring model.

Stores the feature names, weights, and bias extracted from a fitted scikit-learn LogisticRegression. At predict time it computes the weighted sum of the features plus the bias and applies a sigmoid in pure Python, so scikit-learn is not needed at runtime.

Sigmoid convention: 1 / (1 + exp(-logit)), identical to scikit-learn's default, not the negated Platt convention.

model_version `property`

model_version

Version string for trace metadata.

predict

predict(features)

Return the calibrated probability for the given feature vector.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `features` | `FeatureVector` | Any object satisfying the FeatureVector protocol (must have a `to_dict()` method). | required |

Returns:

| Type | Description |
| --- | --- |
| `float` | Probability in [0, 1]. |

predict_dict

predict_dict(features)

Return the calibrated probability for a raw feature dict keyed by name.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `features` | `dict[str, float]` | Feature values keyed by feature name (e.g. a stored `features_dict`). Missing names are vectorized as 0. | required |

Returns:

| Type | Description |
| --- | --- |
| `float` | Probability in [0, 1]. |
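The pure-Python prediction path described above can be sketched as follows. `TinyScoringModel` is a hypothetical stand-in, its field names (`feature_names`, `weights`, `bias`) are assumed, and the feature names `name_sim` and `pop_log` are invented for the example. The sigmoid follows the documented convention `1 / (1 + exp(-logit))`, rearranged for negative logits so `math.exp` cannot overflow.

```python
import math
from dataclasses import dataclass


def sigmoid(logit: float) -> float:
    """1 / (1 + exp(-logit)), arranged so exp() never overflows."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


@dataclass
class TinyScoringModel:
    """Hypothetical stand-in for LogisticScoringModel (field names assumed)."""

    feature_names: list[str]
    weights: list[float]
    bias: float

    def predict_dict(self, features: dict[str, float]) -> float:
        # Missing feature names contribute 0, matching the documented behavior.
        logit = self.bias + sum(
            w * features.get(name, 0.0)
            for name, w in zip(self.feature_names, self.weights)
        )
        return sigmoid(logit)

    def predict(self, features) -> float:
        # Any object with a to_dict() method satisfies the FeatureVector protocol.
        return self.predict_dict(features.to_dict())


model = TinyScoringModel(["name_sim", "pop_log"], [2.0, 0.5], bias=-1.0)
print(model.predict_dict({"name_sim": 0.5}))  # logit 0.0 -> 0.5
```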

ModelTrainResult `dataclass`

Result of a scoring model training run.

PlattCalibrator

Bases: BaseModel

Platt scaling calibrator: maps raw scores to probabilities via sigmoid.

predict

predict(raw_score, *, query_len=None)

Apply Platt scaling: sigmoid(a * raw_score + b).
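A minimal sketch of Platt scaling, assuming the documented form `sigmoid(a * raw_score + b)`. The fitting routine here is a swapped-in technique: plain batch gradient descent on log loss, not necessarily what `fit_platt` does (Platt's original procedure uses a second-order optimizer and smoothed targets). `platt_predict`, `fit_platt_gd`, and the toy scores are hypothetical.

```python
import math


def platt_predict(raw_score: float, a: float, b: float) -> float:
    """sigmoid(a * raw_score + b), computed without overflow for large |z|."""
    z = a * raw_score + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt_gd(scores, labels, *, lr=1.0, steps=5000):
    """Fit (a, b) by batch gradient descent on log loss (illustrative substitute)."""
    a = b = 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = platt_predict(s, a, b) - y  # d(log loss)/dz for one example
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
a, b = fit_platt_gd(scores, labels)
print(a > 0, platt_predict(0.9, a, b) > 0.5)  # True True
```

With this sign convention, a positive `a` means higher raw scores map to higher probabilities.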

StratifiedCalibrator

Bases: BaseModel

Calibrator that dispatches to sub-calibrators based on query length.

Addresses non-monotonic score/accuracy relationships by training separate calibrators for short and long queries; within each group the relationship is closer to monotonic.

predict

predict(raw_score, *, query_len=None)

Predict a calibrated score, dispatching by query length.

Falls back to `long_calibrator` when `query_len` is unknown (`None`), since long queries are the majority case.
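The dispatch rule can be sketched with plain callables standing in for sub-calibrators. Only `long_calibrator` and the `None` fallback come from the description above; `short_calibrator`, the `short_max_len` cutoff and its default of 5, and the unit of `query_len` (characters or tokens) are assumptions, and `StratifiedSketch` is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Calibrate = Callable[[float], float]


@dataclass
class StratifiedSketch:
    """Hypothetical stand-in for StratifiedCalibrator."""

    short_calibrator: Calibrate  # assumed name, by analogy with long_calibrator
    long_calibrator: Calibrate
    short_max_len: int = 5  # assumed cutoff: lengths up to this count as "short"

    def predict(self, raw_score: float, *, query_len: Optional[int] = None) -> float:
        # Unknown length falls back to the long calibrator (the majority case).
        if query_len is None or query_len > self.short_max_len:
            return self.long_calibrator(raw_score)
        return self.short_calibrator(raw_score)


cal = StratifiedSketch(short_calibrator=lambda s: s * 0.5, long_calibrator=lambda s: s)
print(cal.predict(0.8, query_len=3))  # 0.4 (short query)
print(cal.predict(0.8))               # 0.8 (length unknown, long calibrator)
```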

TrainResult `dataclass`

Result of a calibration training run.

brier_score

brier_score(predicted, actual)

Compute the Brier score: the mean squared error between predicted probabilities and labels.

calibration_curve_data

calibration_curve_data(predicted, actual, n_bins=10)

Build calibration curve data grouped into equal-width bins.

evaluate_calibration

evaluate_calibration(predicted, actual, n_bins=10)

Compute full calibration metrics, including the Brier score, ECE, and per-bin data.

expected_calibration_error

expected_calibration_error(predicted, actual, n_bins=10)

Compute the Expected Calibration Error (ECE) using equal-width bins.
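The metrics above follow standard definitions, sketched below for 0/1 labels. These are illustrative re-implementations: the library's functions return its own models (such as `CalibrationBin`) and may treat edge cases differently, and `equal_width_bins` is a hypothetical helper playing the role of `calibration_curve_data`. ECE is the count-weighted mean, over non-empty bins, of the gap between mean predicted probability and observed positive rate.

```python
def brier_score(predicted: list[float], actual: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(predicted, actual)) / len(predicted)


def equal_width_bins(predicted, actual, n_bins=10):
    """Return (count, mean_predicted, observed_rate) for each non-empty bin over [0, 1]."""
    totals = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, sum of probs, sum of labels
    for p, y in zip(predicted, actual):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        totals[idx][0] += 1
        totals[idx][1] += p
        totals[idx][2] += y
    return [(c, sp / c, sy / c) for c, sp, sy in totals if c]


def expected_calibration_error(predicted, actual, n_bins=10):
    """Count-weighted mean of |mean predicted - observed rate| across bins."""
    n = len(predicted)
    return sum(
        c / n * abs(conf - rate)
        for c, conf, rate in equal_width_bins(predicted, actual, n_bins)
    )


preds = [0.9, 0.9, 0.2, 0.2]
labels = [1, 0, 0, 0]
print(brier_score(preds, labels))                 # about 0.225
print(expected_calibration_error(preds, labels))  # about 0.3 (gaps 0.4 and 0.2, each weighted 0.5)
```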

label_examples

label_examples(examples, resolver, *, include_features=False, context_enrichment_rate=0.0)

Run unlabeled examples through ResolveKit and auto-label them.

For each example:

1. Run it through the resolver pipeline via resolve_detailed() to get candidates.
2. Extract the raw (pre-calibration) score from the top candidate.
3. Set label = 1 if top_entity_id == expected_entity_id, else 0.

Uses the runner directly to access scores.raw_score (pre-calibration). If the resolver has a calibrator loaded, result.confidence would be the calibrated score; using raw_score avoids training on already-calibrated data.

Each example is resolved directly against its declared domain's pack runner, bypassing the router. This prevents AUTO-mode routing from sending examples to the wrong pack and contaminating domain-specific calibration data.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `examples` | `list[LabeledExample]` | Unlabeled examples to label. | required |
| `resolver` | `Resolver` | Loaded Resolver instance. | required |
| `include_features` | `bool` | If True, capture raw feature dicts from candidates. | `False` |
| `context_enrichment_rate` | `float` | Fraction of examples (0.0-1.0) that receive geographic context (a country hint) derived from the expected entity's country. Defaults to 0.0 (no enrichment). When > 0, a seeded RNG (seed=42) deterministically selects which examples are enriched. Enrichment is best-effort: examples whose country cannot be determined are resolved without context. | `0.0` |
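The labeling rule and the seeded enrichment can be sketched with a stand-in for the pack runner. All names here (`Example`, `label_sketch`, `fake_resolve`, the entity IDs) are hypothetical, and the per-example `rng.random() < rate` draw is an assumption about how the seeded selection works; the real function calls `resolve_detailed()` on the domain's pack runner.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Example:
    """Hypothetical minimal record; the real LabeledExample carries more fields."""

    query: str
    expected_entity_id: str
    raw_score: Optional[float] = None
    label: Optional[int] = None


# Hypothetical runner stand-in: (query, country_hint) -> (top_entity_id, raw_score).
Resolve = Callable[[str, Optional[str]], tuple]


def label_sketch(
    examples: list[Example],
    resolve: Resolve,
    country_of: Callable[[str], Optional[str]],
    *,
    context_enrichment_rate: float = 0.0,
    seed: int = 42,
) -> list[Example]:
    rng = random.Random(seed)  # seeded, so the enriched subset is reproducible
    for ex in examples:
        hint = None
        if context_enrichment_rate > 0 and rng.random() < context_enrichment_rate:
            hint = country_of(ex.expected_entity_id)  # best effort: may stay None
        top_id, raw = resolve(ex.query, hint)
        ex.raw_score = raw  # raw (pre-calibration) score, not a calibrated confidence
        ex.label = 1 if top_id == ex.expected_entity_id else 0
    return examples


def fake_resolve(query: str, hint: Optional[str]) -> tuple:
    # Always returns the same top candidate, so the Texas example gets label 0.
    return "geo:paris-fr", 0.83


exs = [Example("Paris", "geo:paris-fr"), Example("Paris, Texas", "geo:paris-tx")]
label_sketch(exs, fake_resolve, country_of=lambda entity_id: None)
print([ex.label for ex in exs])  # [1, 0]
```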

load_calibrator

load_calibrator(path)

Load a calibrator from a JSON file, dispatching on the 'method' field.
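Dispatching on a `method` field is a small lookup-table pattern, sketched below with dataclasses. The JSON layout (a flat object whose `method` is `"platt"` or `"isotonic"`), the parameter names `a`, `b`, `x`, `y`, and the helpers `save_params`/`load_params` are all hypothetical; the real file format is not documented in this reference.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class PlattParams:
    a: float
    b: float
    method: str = "platt"


@dataclass
class IsotonicParams:
    x: list[float]
    y: list[float]
    method: str = "isotonic"


# Dispatch table keyed by the JSON "method" field.
_BY_METHOD = {"platt": PlattParams, "isotonic": IsotonicParams}


def save_params(params, path: Path) -> None:
    Path(path).write_text(json.dumps(asdict(params)), encoding="utf-8")


def load_params(path: Path):
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    cls = _BY_METHOD.get(data.get("method"))
    if cls is None:
        raise ValueError(f"unknown calibrator method: {data.get('method')!r}")
    return cls(**data)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "calibrator.json"
    save_params(PlattParams(a=4.2, b=-2.1), path)
    print(load_params(path))  # PlattParams(a=4.2, b=-2.1, method='platt')
```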

load_examples_jsonl

load_examples_jsonl(path)

Load examples from JSONL.

load_scoring_model

load_scoring_model(path)

Load a LogisticScoringModel from a JSON file.

run_adapters

run_adapters(domain, adapter_names, store, *, limit_per_adapter=None, cache_dir=None)

Run calibration adapters and return deduplicated pairs.
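Order-preserving deduplication across adapters can be sketched as follows. The adapter shape (a callable yielding `(query, expected_entity_id)` tuples), exact-match deduplication, and applying `limit_per_adapter` before deduplication are all assumptions; `run_adapters_sketch` is hypothetical and ignores the real function's `domain`, `store`, and `cache_dir` parameters.

```python
from itertools import islice
from typing import Callable, Iterable, Optional

# Hypothetical adapter shape: a callable yielding (query, expected_entity_id) pairs.
Adapter = Callable[[], Iterable[tuple]]


def run_adapters_sketch(
    adapters: dict[str, Adapter], *, limit_per_adapter: Optional[int] = None
) -> list[tuple]:
    seen: set[tuple] = set()
    pairs: list[tuple] = []
    for adapter in adapters.values():
        # islice(..., None) means "no limit"; the cap applies before deduplication.
        for pair in islice(adapter(), limit_per_adapter):
            if pair not in seen:  # keep the first occurrence, preserve order
                seen.add(pair)
                pairs.append(pair)
    return pairs


adapters = {
    "source_a": lambda: [("Paris", "geo:paris"), ("Lyon", "geo:lyon")],
    "source_b": lambda: [("Paris", "geo:paris"), ("Nice", "geo:nice")],
}
print(run_adapters_sketch(adapters))
# [('Paris', 'geo:paris'), ('Lyon', 'geo:lyon'), ('Nice', 'geo:nice')]
print(run_adapters_sketch(adapters, limit_per_adapter=1))
# [('Paris', 'geo:paris')]
```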

save_calibrator

save_calibrator(calibrator, path)

Save a calibrator to a JSON file.

save_examples_jsonl

save_examples_jsonl(examples, path)

Save examples as JSONL.
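JSONL stores one JSON object per line, which makes large example sets easy to stream and append. The sketch works on plain dicts; the real helpers take `LabeledExample` models and presumably convert them to and from dicts, and the record fields shown are hypothetical. `save_jsonl` and `load_jsonl` are illustrative names, not the library's functions.

```python
import json
import tempfile
from pathlib import Path


def save_jsonl(records: list[dict], path: Path) -> None:
    """Write one JSON object per line (UTF-8, non-ASCII kept readable)."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_jsonl(path: Path) -> list[dict]:
    """Read one JSON object per non-blank line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


examples = [
    {"query": "Zürich", "expected_entity_id": "geo:zurich"},
    {"query": "Zurich, IL", "expected_entity_id": "geo:zurich-il"},
]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "examples.jsonl"
    save_jsonl(examples, path)
    print(load_jsonl(path) == examples)  # True
```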

save_scoring_model

save_scoring_model(model, path)

Save a LogisticScoringModel to a JSON file.

train_calibrator

train_calibrator(resolver, domain, *, adapter_names=None, method='platt', limit_per_adapter=None, eval_split=0.2, cache_dir=None, output_path=None, examples_output=None, seed=42)

Train a calibrator end-to-end: adapters -> label -> fit -> evaluate.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `resolver` | `Resolver` | A loaded Resolver instance. | required |
| `domain` | `str` | Domain to calibrate ("geo" or "org"). | required |
| `adapter_names` | `list[str] \| None` | Which data adapters to run (default: all adapters for the domain). | `None` |
| `method` | `Literal['platt', 'isotonic', 'stratified']` | Calibration method: "platt", "isotonic", or "stratified". | `'platt'` |
| `limit_per_adapter` | `int \| None` | Max examples per adapter. | `None` |
| `eval_split` | `float` | Fraction held out for evaluation (0-1). | `0.2` |
| `cache_dir` | `str \| Path \| None` | Directory for caching adapter downloads. | `None` |
| `output_path` | `str \| Path \| None` | Save the fitted calibrator JSON to this path. | `None` |
| `examples_output` | `str \| Path \| None` | Save labeled examples as JSONL. | `None` |
| `seed` | `int` | Random seed for reproducibility. | `42` |

Returns:

| Type | Description |
| --- | --- |
| `TrainResult` | TrainResult with calibrator, metrics, and counts. |
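The `eval_split` and `seed` parameters suggest a reproducible hold-out split. One common way to do it is sketched below; `split_examples` is hypothetical, and whether the library shuffles, stratifies by label, or rounds the hold-out size differently is not stated here.

```python
import random


def split_examples(examples: list, eval_split: float = 0.2, seed: int = 42):
    """Shuffle a copy with a seeded RNG, then hold out the first eval_split fraction."""
    shuffled = list(examples)  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_split)
    return shuffled[n_eval:], shuffled[:n_eval]  # (train, eval)


train, held_out = split_examples(list(range(10)))
print(len(train), len(held_out))  # 8 2
```

Using a private `random.Random(seed)` instead of the global RNG keeps the split reproducible even if other code draws random numbers in between.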

train_model

train_model(resolver, domain, *, adapter_names=None, regularization=1.0, limit_per_adapter=None, eval_split=0.2, cache_dir=None, output_path=None, examples_output=None, seed=42)

Train a logistic scoring model end-to-end.

Mirrors train_calibrator but uses include_features=True and fit_scoring_model() instead of a calibrator fitter.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `resolver` | `Resolver` | A loaded Resolver instance. | required |
| `domain` | `str` | Domain to train for ("geo" or "org"). | required |
| `adapter_names` | `list[str] \| None` | Which data adapters to run (default: all adapters for the domain). | `None` |
| `regularization` | `float` | LogisticRegression `C` parameter (inverse regularization strength: smaller values regularize more strongly). | `1.0` |
| `limit_per_adapter` | `int \| None` | Max examples per adapter. | `None` |
| `eval_split` | `float` | Fraction held out for evaluation (0-1). | `0.2` |
| `cache_dir` | `str \| Path \| None` | Directory for caching adapter downloads. | `None` |
| `output_path` | `str \| Path \| None` | Save the fitted model JSON to this path. | `None` |
| `examples_output` | `str \| Path \| None` | Save labeled examples as JSONL. | `None` |
| `seed` | `int` | Random seed for reproducibility. | `42` |

Returns:

| Type | Description |
| --- | --- |
| `ModelTrainResult` | ModelTrainResult with model, metrics, and counts. |

Next

Module-level API: the convenience functions (rk.resolve, rk.bulk, rk.entity, etc.) for day-to-day resolution without touching the builder or calibration modules.

Confidence and calibration: how scores are produced, what the calibrator parameters mean, and why confidence is pack-specific.
