Advanced¶
These modules are for building and calibrating your own data packs and evaluating accuracy. Most users never import them. They require the [data] and [calibration] extras.
Why
The core Resolver loads pre-built, bundled data. The builder and calibration modules are for data engineers who want to generate custom entity packs, train calibrators on labeled examples, or run reproducible accuracy benchmarks.
Data builder¶
resolvekit.builder ¶
Build tooling for module-oriented data generation and packaging.
BuildOptions ¶
Bases: BaseModel
Execution options for builder runs.
BuildOutcome ¶
Bases: BaseModel
Outcome returned by build/resume commands.
BuildPlan ¶
Bases: BaseModel
Plan containing recipes and execution settings.
DiscoveredEntityFacts ¶
Bases: BaseModel
Inspection facts for one discovered source entity.
DomainInspection ¶
Bases: BaseModel
Inspection report for one build domain.
EntityClassificationSummary ¶
Bases: BaseModel
Aggregated classification counts for one inspected domain.
EntityFilter ¶
Bases: BaseModel
Filtering options for one emitted module/domain artifact.
InspectionOutcome ¶
Bases: BaseModel
Outcome returned by inspect().
ModuleRecipe ¶
Bases: BaseModel
Definition of one installable module artifact.
QualityPolicy ¶
Bases: BaseModel
Release quality thresholds.
ReleaseRecord ¶
Bases: BaseModel
Registered successful release entry.
build ¶
Run a full build for the provided plan.
Caller errors (BuildPlan validation, missing run dir) raise; content-level failures collect into outcome.errors.
adapter_builder: optional factory overriding the default registry (used for testing/injection).
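The split between raised caller errors and collected content-level failures can be illustrated with a minimal sketch. SketchOutcome and sketch_build are hypothetical stand-ins, not the resolvekit.builder API; only the error-handling shape is taken from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchOutcome:
    """Hypothetical stand-in for BuildOutcome; only the error list is modeled."""

    built: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def sketch_build(recipes: dict[str, list[str]] | None) -> SketchOutcome:
    # Caller error: an unusable plan is rejected immediately with an exception.
    if not recipes:
        raise ValueError("plan must contain at least one recipe")

    outcome = SketchOutcome()
    for name, rows in recipes.items():
        # Content-level failure: recorded on the outcome, and the run continues.
        if not rows:
            outcome.errors.append(f"{name}: no source rows")
            continue
        outcome.built.append(name)
    return outcome


outcome = sketch_build({"geo": ["Paris", "Lyon"], "org": []})
print(outcome.built)   # ['geo']
print(outcome.errors)  # ['org: no source rows']
```

Under this pattern a caller checks outcome.errors after a run instead of wrapping the whole call in a try block for content problems.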
inspect ¶
Inspect source coverage for the provided plan.
Caller errors (BuildPlan validation, missing run dir) raise; content-level failures collect into outcome.errors.
resume ¶
Resume a previously failed or interrupted run.
Caller errors (BuildPlan validation, missing run dir) raise; content-level failures collect into outcome.errors.
adapter_builder: optional factory overriding the default registry (used for testing/injection).
Calibration¶
resolvekit.calibration ¶
Calibration pipeline for ResolveKit scorers.
Provides:
- PlattCalibrator and IsotonicCalibrator models
- Fitting routines (fit_platt, fit_isotonic)
- Evaluation metrics (brier_score, expected_calibration_error, evaluate_calibration)
- load_calibrator / save_calibrator helpers
- Dataset models and labeling utilities
- LogisticScoringModel and related helpers
- train_model / ModelTrainResult for end-to-end ML model training
Training-only symbols (train_calibrator, train_model, run_adapters, ModelTrainResult, TrainResult) are imported lazily on first attribute access so that merely loading a calibrator JSON at runtime does not pull in pandas / gecko / scikit-learn.
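One common way to get this behavior is a module-level `__getattr__` hook (PEP 562). The sketch below is an assumption about the general technique, not the actual resolvekit.calibration source: demo_pkg, _LAZY, and the standard-library targets are hypothetical stand-ins for the heavy training modules.

```python
import importlib
import types

# Hypothetical mapping: public name -> (module to import, attribute to fetch).
# Standard-library targets stand in for the heavy training modules.
_LAZY = {"sqrt": ("math", "sqrt"), "dumps": ("json", "dumps")}


def _lazy_getattr(name: str):
    """Called only when normal attribute lookup on the module fails."""
    try:
        module_name, attr = _LAZY[name]
    except KeyError:
        raise AttributeError(f"module 'demo_pkg' has no attribute {name!r}") from None
    value = getattr(importlib.import_module(module_name), attr)
    setattr(demo_pkg, name, value)  # cache so later lookups skip this hook
    return value


# In a real package this would be `def __getattr__(name)` in __init__.py.
demo_pkg = types.ModuleType("demo_pkg")
demo_pkg.__getattr__ = _lazy_getattr

print("sqrt" in vars(demo_pkg))  # False: nothing resolved yet
print(demo_pkg.sqrt(9.0))        # 3.0
print("sqrt" in vars(demo_pkg))  # True: cached after first access
```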
CalibrationBin ¶
Bases: BaseModel
Statistics for a single calibration bin.
CalibrationDataset ¶
Bases: BaseModel
Collection of labeled examples.
CalibrationMetrics ¶
Bases: BaseModel
Summary calibration metrics.
IsotonicCalibrator ¶
Bases: BaseModel
Isotonic regression calibrator using piecewise linear interpolation.
predict ¶
Piecewise linear interpolation with boundary clamping.
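As a rough illustration of piecewise linear interpolation with boundary clamping, the sketch below interpolates between sorted knot points and returns the nearest endpoint value outside their range. The function name and the xs/ys arguments are hypothetical; the fields stored by IsotonicCalibrator are not shown in this reference.

```python
from bisect import bisect_right
from typing import Sequence


def interpolate_clamped(xs: Sequence[float], ys: Sequence[float], score: float) -> float:
    """Interpolate over knots (xs sorted ascending); clamp outside [xs[0], xs[-1]]."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    if score <= xs[0]:
        return ys[0]   # clamp below the first knot
    if score >= xs[-1]:
        return ys[-1]  # clamp above the last knot
    i = bisect_right(xs, score)  # guarantees xs[i - 1] <= score < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (score - x0) / (x1 - x0)


knots_x = [0.0, 0.5, 1.0]
knots_y = [0.1, 0.4, 0.9]
print(interpolate_clamped(knots_x, knots_y, -1.0))  # 0.1 (clamped)
print(interpolate_clamped(knots_x, knots_y, 2.0))   # 0.9 (clamped)
print(interpolate_clamped(knots_x, knots_y, 0.25))  # halfway between 0.1 and 0.4
```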
LabeledExample ¶
Bases: BaseModel
One labeled example for calibration training.
LogisticScoringModel ¶
Bases: BaseModel
Trained logistic regression scoring model.
Stores feature names, weights, and bias extracted from a fitted sklearn LogisticRegression. At predict time it applies the dot product + sigmoid in pure Python — no sklearn dependency.
Sigmoid convention: 1 / (1 + exp(-logit)) — identical to
sklearn's default, not the negated Platt convention.
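The dot product plus sigmoid step can be sketched in a few lines of pure Python. The function names, the example feature names, and the choice to treat a missing feature as 0.0 are assumptions made for illustration; they are not taken from LogisticScoringModel itself.

```python
import math
from typing import Mapping, Sequence


def sigmoid(logit: float) -> float:
    """1 / (1 + exp(-logit)), written to avoid overflow for large |logit|."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def score_features(
    feature_names: Sequence[str],
    weights: Sequence[float],
    bias: float,
    features: Mapping[str, float],
) -> float:
    # Assumption: a feature absent from the input contributes 0.0.
    dot = sum(w * features.get(name, 0.0) for name, w in zip(feature_names, weights))
    return sigmoid(bias + dot)


prob = score_features(
    feature_names=["name_similarity", "is_ambiguous"],  # hypothetical features
    weights=[2.0, -1.0],
    bias=-0.5,
    features={"name_similarity": 0.75, "is_ambiguous": 1.0},
)
print(prob)  # 0.5, because the logit is -0.5 + 1.5 - 1.0 = 0.0
```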
predict ¶
Return calibrated probability for the given feature vector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | FeatureVector | Any object satisfying the FeatureVector protocol (must have a | required |
Returns:
| Type | Description |
|---|---|
| float | Probability in [0, 1]. |
predict_dict ¶
Return calibrated probability for a raw feature dict keyed by name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | dict[str, float] | Feature values keyed by feature name (e.g. a stored | required |
Returns:
| Type | Description |
|---|---|
| float | Probability in [0, 1]. |
ModelTrainResult dataclass ¶
Result of a scoring model training run.
PlattCalibrator ¶
Bases: BaseModel
Platt scaling calibrator: maps raw scores to probabilities via sigmoid.
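For orientation, classic Platt scaling fits two parameters A and B and maps a raw score s to 1 / (1 + exp(A * s + B)). The sketch below assumes that classic form; the parameter names a and b and the example values are hypothetical, and the fields PlattCalibrator actually stores are not listed in this reference.

```python
import math


def platt_probability(raw_score: float, a: float, b: float) -> float:
    """Classic Platt form: 1 / (1 + exp(a * s + b)); a is typically negative."""
    z = a * raw_score + b
    # Evaluate in an overflow-safe way for large |z|.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


# Hypothetical fitted parameters: higher raw scores map to higher probabilities.
A, B = -4.0, 2.0
print(platt_probability(0.5, A, B))  # 0.5, because A * 0.5 + B == 0.0
print(platt_probability(1.0, A, B))  # above 0.5
print(platt_probability(0.0, A, B))  # below 0.5
```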
StratifiedCalibrator ¶
Bases: BaseModel
Calibrator that dispatches to sub-calibrators based on query length.
Addresses non-monotonic score-accuracy relationships by training separate calibrators for short vs long queries, where within each group the relationship is more monotonic.
predict ¶
Predict calibrated score, dispatching by query length.
Falls back to long_calibrator when query_len is unknown (None), since long queries are the majority case.
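The dispatch rule can be sketched as a small closure. The max_short_len threshold, its value, and the choice of an inclusive boundary are assumptions for illustration; only the fallback to the long calibrator when the query length is unknown comes from the description above.

```python
from __future__ import annotations

from typing import Callable

Calibrator = Callable[[float], float]


def make_stratified(
    short_cal: Calibrator, long_cal: Calibrator, max_short_len: int
) -> Callable[..., float]:
    def predict(raw_score: float, query_len: int | None = None) -> float:
        # Unknown length falls back to the long-query calibrator.
        if query_len is None or query_len > max_short_len:
            return long_cal(raw_score)
        return short_cal(raw_score)

    return predict


# Toy sub-calibrators: short queries are trusted less than long ones.
predict = make_stratified(lambda s: s / 2, lambda s: s, max_short_len=5)
print(predict(0.8, query_len=3))   # 0.4 (short calibrator)
print(predict(0.8, query_len=12))  # 0.8 (long calibrator)
print(predict(0.8))                # 0.8 (unknown length, long calibrator)
```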
TrainResult dataclass ¶
Result of a calibration training run.
brier_score ¶
Compute Brier score: mean squared error between predicted probs and labels.
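As a reference for the metric itself, here is a minimal standalone sketch of the Brier score. It is not the library function; the name brier_score_sketch and the input validation are assumptions.

```python
from typing import Sequence


def brier_score_sketch(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of (p - y)^2 over all examples; lower is better, 0.0 is perfect."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


# Squared errors are 0.01, 0.04, and 0.16, so the mean is about 0.07.
example_brier = brier_score_sketch([0.9, 0.2, 0.6], [1, 0, 1])
print(round(example_brier, 4))
```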
calibration_curve_data ¶
Build calibration curve data grouped into equal-width bins.
evaluate_calibration ¶
Compute full calibration metrics including Brier score, ECE, and bin data.
expected_calibration_error ¶
Compute Expected Calibration Error (ECE) using equal-width bins.
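A minimal sketch of ECE with equal-width bins follows: predictions are grouped by probability, and each non-empty bin contributes the gap between its mean confidence and its observed accuracy, weighted by its share of the examples. The function name, the n_bins default, and the handling of p = 1.0 are assumptions, not the library's implementation.

```python
from typing import Sequence


def ece_sketch(probs: Sequence[float], labels: Sequence[int], n_bins: int = 10) -> float:
    """Weighted average of |accuracy - mean confidence| over equal-width bins."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and the same length")
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    error = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        confidence = sum(p for p, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        error += (len(members) / total) * abs(accuracy - confidence)
    return error


# Two bins: [0, 0.5) has confidence 0.1 and accuracy 0.0 (gap 0.1);
# [0.5, 1] has confidence 0.9 and accuracy 0.5 (gap 0.4). ECE is about 0.25.
example_ece = ece_sketch([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 0], n_bins=2)
print(round(example_ece, 4))
```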
label_examples ¶
Run unlabeled examples through ResolveKit and auto-label.
For each example:
1. Run through the resolver pipeline via resolve_detailed() to get candidates
2. Extract raw (pre-calibration) score from the top candidate
3. label = 1 if top_entity_id == expected_entity_id, else 0
Uses the runner directly to access scores.raw_score (pre-calibration). If the resolver has a calibrator loaded, result.confidence would be the calibrated score — using raw_score avoids training on already-calibrated data.
Each example is resolved against its declared domain's pack runner directly, bypassing the router. This prevents AUTO-mode routing from sending examples to the wrong pack and contaminating domain-specific calibration data.
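The labeling rule from the steps above can be sketched independently of the resolver. The candidate shape (a best-first list of (entity_id, raw_score) pairs) and the decision to return None when there are no candidates are assumptions for illustration; the real function works with resolver result objects.

```python
from __future__ import annotations


def label_from_candidates(
    candidates: list[tuple[str, float]], expected_entity_id: str
) -> tuple[float, int] | None:
    """Return (raw_score, label) for the top candidate, or None if there is none."""
    if not candidates:
        return None  # assumption: nothing to label without a top candidate
    top_entity_id, raw_score = candidates[0]
    label = 1 if top_entity_id == expected_entity_id else 0
    return raw_score, label


# Hypothetical entity IDs and pre-calibration scores.
ranked = [("geo:paris-fr", 0.91), ("geo:paris-tx", 0.42)]
print(label_from_candidates(ranked, "geo:paris-fr"))  # (0.91, 1)
print(label_from_candidates(ranked, "geo:paris-tx"))  # (0.91, 0)
```

Note that the second call is labeled 0 even though the expected entity appears in the list: only the top candidate counts, which is what makes the raw score a usable training signal for confidence.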
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| examples | list[LabeledExample] | Unlabeled examples to label. | required |
| resolver | Resolver | Loaded Resolver instance. | required |
| include_features | bool | If True, capture raw feature dicts from candidates. | False |
| context_enrichment_rate | float | Fraction of examples (0.0-1.0) that will receive geographic context (country hint) derived from the expected entity's country. Defaults to 0.0 (no enrichment). When > 0 a seeded RNG (seed=42) deterministically selects which examples are enriched. Enrichment is best-effort: examples whose country cannot be determined are resolved without context. | 0.0 |
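To show how a seeded RNG makes the enrichment selection repeatable, here is a minimal sketch. Drawing one random number per example and comparing it to the rate is an assumption about the selection scheme; the reference only states that a seeded RNG (seed=42) picks the examples deterministically.

```python
import random


def pick_enriched(n_examples: int, rate: float, seed: int = 42) -> list[bool]:
    """Return one flag per example; True means the example gets a country hint."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    rng = random.Random(seed)  # private generator, global random state untouched
    return [rng.random() < rate for _ in range(n_examples)]


first = pick_enriched(1000, rate=0.3)
second = pick_enriched(1000, rate=0.3)
print(first == second)          # True: same seed, same selection
print(sum(first) / len(first))  # close to 0.3, not exactly 0.3
```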
load_calibrator ¶
Load a calibrator from a JSON file, dispatching on the 'method' field.
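Dispatching on a 'method' field usually means reading the JSON, looking up a constructor by that field, and failing loudly on unknown values. The sketch below shows that pattern with hypothetical stand-in classes and field names (a, b, xs, ys); the real JSON schema is not documented on this page.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SketchPlatt:  # hypothetical stand-in, not PlattCalibrator
    a: float
    b: float


@dataclass
class SketchIsotonic:  # hypothetical stand-in, not IsotonicCalibrator
    xs: list
    ys: list


_LOADERS = {
    "platt": lambda d: SketchPlatt(a=d["a"], b=d["b"]),
    "isotonic": lambda d: SketchIsotonic(xs=d["xs"], ys=d["ys"]),
}


def load_calibrator_sketch(path):
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    method = payload.get("method")
    if method not in _LOADERS:
        raise ValueError(f"unknown calibrator method: {method!r}")
    return _LOADERS[method](payload)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "calibrator.json"
    path.write_text(json.dumps({"method": "platt", "a": -4.0, "b": 2.0}), encoding="utf-8")
    loaded = load_calibrator_sketch(path)

print(loaded)  # SketchPlatt(a=-4.0, b=2.0)
```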
run_adapters ¶
Run calibration adapters and return deduplicated pairs.
train_calibrator ¶
train_calibrator(resolver, domain, *, adapter_names=None, method='platt', limit_per_adapter=None, eval_split=0.2, cache_dir=None, output_path=None, examples_output=None, seed=42)
Train a calibrator end-to-end: adapters -> label -> fit -> evaluate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| resolver | Resolver | A loaded Resolver instance. | required |
| domain | str | Domain to calibrate ("geo" or "org"). | required |
| adapter_names | list[str] \| None | Which data adapters to run (default: all for domain). | None |
| method | Literal['platt', 'isotonic', 'stratified'] | "platt", "isotonic", or "stratified". | 'platt' |
| limit_per_adapter | int \| None | Max examples per adapter. | None |
| eval_split | float | Fraction held out for evaluation (0-1). | 0.2 |
| cache_dir | str \| Path \| None | Directory for caching adapter downloads. | None |
| output_path | str \| Path \| None | Save fitted calibrator JSON to this path. | None |
| examples_output | str \| Path \| None | Save labeled examples as JSONL. | None |
| seed | int | Random seed for reproducibility. | 42 |
Returns:
| Type | Description |
|---|---|
| TrainResult | TrainResult with calibrator, metrics, and counts. |
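The interaction between eval_split and seed can be pictured as a seeded shuffle followed by a cut. This is a sketch of one plausible scheme, not the function's actual splitting logic, which this reference does not describe; split_examples and its rounding rule are hypothetical.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_examples(
    examples: Sequence[T], eval_split: float = 0.2, seed: int = 42
) -> tuple[list[T], list[T]]:
    """Shuffle with a private seeded RNG, then hold out a fraction for evaluation."""
    if not 0.0 <= eval_split <= 1.0:
        raise ValueError("eval_split must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_eval = round(len(shuffled) * eval_split)
    return shuffled[n_eval:], shuffled[:n_eval]  # (train, eval)


train, held_out = split_examples(list(range(10)))
print(len(train), len(held_out))  # 8 2
```

Keeping the seed fixed means two runs over the same examples fit and evaluate on identical subsets, which is what makes metric comparisons between runs meaningful.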
train_model ¶
train_model(resolver, domain, *, adapter_names=None, regularization=1.0, limit_per_adapter=None, eval_split=0.2, cache_dir=None, output_path=None, examples_output=None, seed=42)
Train a logistic scoring model end-to-end.
Mirrors train_calibrator but uses include_features=True and fit_scoring_model() instead of a calibrator fitter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| resolver | Resolver | A loaded Resolver instance. | required |
| domain | str | Domain to train for ("geo" or "org"). | required |
| adapter_names | list[str] \| None | Which data adapters to run (default: all for domain). | None |
| regularization | float | LogisticRegression C parameter. | 1.0 |
| limit_per_adapter | int \| None | Max examples per adapter. | None |
| eval_split | float | Fraction held out for evaluation (0-1). | 0.2 |
| cache_dir | str \| Path \| None | Directory for caching adapter downloads. | None |
| output_path | str \| Path \| None | Save fitted model JSON to this path. | None |
| examples_output | str \| Path \| None | Save labeled examples as JSONL. | None |
| seed | int | Random seed for reproducibility. | 42 |
Returns:
| Type | Description |
|---|---|
| ModelTrainResult | ModelTrainResult with model, metrics, and counts. |
Next¶
Module-level API — The convenience functions (rk.resolve, rk.bulk, rk.entity, etc.) for day-to-day resolution without touching builder or calibration.
Confidence and calibration — How scores are produced, what the calibrator parameters mean, and why confidence is pack-specific.