Module-level API¶

As of v0.1.

The functions below are the primary entry points for most use cases. They share a singleton Resolver created on first call, so you don't instantiate anything. Import convention throughout this reference:

import resolvekit as rk

Heads up

These module-level functions run on a shared Resolver built with the default AUTO routing mode, where the per-call domain= argument raises ValueError. To restrict resolution to one domain, build a resolver with only the modules you need — Resolver.from_modules(module_ids=["geo.countries"]) or Resolver.auto(domains=["geo"]) — or construct one with routing_mode=RoutingMode.EXPLICIT to enable per-call domain=. See the Resolver reference.

Note

within(), members_of(), is_member(), related(), and known_groups() are not module-level functions. Call them on a Resolver instance: rk.default().within(...), or build one with Resolver.auto(). See the Resolver reference.

Functions¶

`resolve`¶

rk.resolve(
    text: str,
    *,
    to: str | None = UNSET,
    as_result: bool = False,
    domain: str | list[str] | None = None,
    context: ResolutionContext | None = None,
    from_system: str | None = None,
    include_entity: bool = True,
    timeout: float | None = None,
) -> ResolutionResult | Any

Resolve a text string or code against all loaded modules.

Parameters

Name	Meaning
`text`	The string to resolve. Can be a name, spelling variant, or code.
`to`	Pivot the resolved entity to a specific representation. Omit (default) to use the `default_to` configured via `configure()`, or return a raw `ResolutionResult` when no default is set. Pass `None` to always return a `ResolutionResult`. When set to a code system name (`"iso3"`, `"iso2"`, `"name"`, `"flag"`, `"aliases"`, `"dcid"`, etc.), returns the pivot value directly. A per-call `to=` overrides the configured default.
`as_result`	Return the full `ResolutionResult` even when a `default_to` is configured — equivalent to passing `to=None`. Cannot be combined with an explicit `to=`.
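`domain`	Restrict resolution to one or more domains (`"geo"`, `"org"`, or a list). On the default shared resolver (AUTO routing mode) passing this argument raises `ValueError`; see the heads-up at the top of this page.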
`context`	A `ResolutionContext` with hints (entity type, parent, country, language).
`from_system`	Treat `text` as a code in this system (e.g. `"iso2"`, `"iso3"`, `"dcid"`, `"wikidata"`). Skips name-matching.
`include_entity`	Populate `result.entity`. Defaults to `True` at the module level for notebook ergonomics. Set to `False` in pipelines where you don't need the full entity.
`timeout`	Maximum seconds before the pipeline is cut short. `None` = no limit.

Returns

A ResolutionResult when to=None or as_result=True, or when no default_to is configured and to is omitted.
The pivot value directly (typically str | None) when to is set or a default_to is active.
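
The interplay between an omitted `to`, an explicit `to=None`, and `as_result` can be sketched with a sentinel object. This is an illustrative stand-in rather than resolvekit's implementation: `_UNSET`, `effective_to`, and the use of `ValueError` are hypothetical, and only the rules stated in the table above are encoded.

```python
from typing import Optional

# Hypothetical sentinel standing in for the UNSET default shown in the signature.
_UNSET = object()


def effective_to(to=_UNSET, *, as_result=False, default_to=None) -> Optional[str]:
    """Return the output spec to pivot to, or None for a full ResolutionResult."""
    if as_result:
        if to is not _UNSET:
            # Assumption: the real error type is not documented here;
            # ValueError is only a placeholder.
            raise ValueError("as_result=True cannot be combined with an explicit to=")
        return None
    if to is _UNSET:
        # Omitted: fall back to the configured default (None means "no default").
        return default_to
    # An explicit value, including None, overrides the configured default.
    return to


print(effective_to(default_to="iso3"))          # iso3
print(effective_to(None, default_to="iso3"))    # None
print(effective_to("flag", default_to="iso3"))  # flag
```

The sentinel is what lets "argument omitted" and "argument explicitly `None`" mean different things, which a plain `None` default could not express.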

Example — full result

>>> rk.resolve("Germany")
ResolutionResult(status='resolved', entity_id='country/DEU', confidence=≈0.91, pack_id='geo')

Example — pivot

>>> rk.resolve("DE", from_system="iso2", to="flag")
'🇩🇪'

`resolve_id`¶

rk.resolve_id(
    text: str,
    *,
    on_ambiguous: Literal["raise", "null", "best"] = "raise",
    from_system: str | None = None,
    domain: str | list[str] | None = None,
    context: ResolutionContext | None = None,
    timeout: float | None = None,
) -> str | None

Resolve text and return the entity ID string, or None on no match.

Parameters

Name	Meaning
`text`	Text or code to resolve.
`on_ambiguous`	What to do when multiple entities match. `"raise"` (default) raises `AmbiguousResolutionError`; `"null"` returns `None`; `"best"` returns the top candidate's ID.
`from_system`	Force input to be interpreted as a code in this system.
`domain`	Restrict to one or more domains.
`context`	Resolution hints.
`timeout`	Maximum seconds.

Returns str | None — entity ID, or None when no match (or ambiguous with on_ambiguous="null").

Raises AmbiguousResolutionError when on_ambiguous="raise" and the query is ambiguous.

Example

>>> rk.resolve_id("United States")
'country/USA'
>>> rk.resolve_id("Cote dIvoire")
'country/CIV'
>>> rk.resolve_id("Congo", on_ambiguous="null")   # ambiguous → None
None
>>> rk.resolve_id("Congo", on_ambiguous="best")
'country/COD'
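
The three `on_ambiguous` policies can be pictured as a small dispatch over a candidate list. The sketch below is a toy model under stated assumptions: `AmbiguousMatch` is a hypothetical stand-in for `AmbiguousResolutionError`, "ambiguous" is simplified to "more than one candidate", and the scores are invented for illustration.

```python
from typing import List, Optional, Tuple


class AmbiguousMatch(Exception):
    """Hypothetical stand-in for AmbiguousResolutionError."""


def pick_entity_id(
    candidates: List[Tuple[str, float]],
    on_ambiguous: str = "raise",
) -> Optional[str]:
    """Pick an entity ID from (entity_id, score) pairs under a policy."""
    if not candidates:
        return None  # no match at all
    if len(candidates) == 1:
        return candidates[0][0]  # unambiguous
    if on_ambiguous == "raise":
        raise AmbiguousMatch(f"{len(candidates)} candidates matched")
    if on_ambiguous == "null":
        return None
    if on_ambiguous == "best":
        # Highest score wins.
        return max(candidates, key=lambda pair: pair[1])[0]
    raise ValueError(f"unknown policy: {on_ambiguous!r}")


# Invented scores, chosen only so that "best" has a clear winner.
congo = [("country/COD", 0.6), ("country/COG", 0.5)]
print(pick_entity_id(congo, on_ambiguous="null"))  # None
print(pick_entity_id(congo, on_ambiguous="best"))  # country/COD
```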

`bulk`¶

rk.bulk(
    *,
    values: list | tuple | dict | pd.Series | pl.Series | np.ndarray,
    to: str | None = UNSET,
    on_missing: Literal["raise", "null", "auto"] = UNSET,
    domain: str | list[str] | None = None,
    context: ResolutionContext | None = None,
    output: Literal["series", "record", "frame"] = "series",
    from_system: str | None = None,
    not_found: str = "null",
    on_error: Literal["raise", "null", "keep"] = "raise",
    on_ambiguous: Literal["null", "raise", "best"] = "null",
    crosswalk: Crosswalk | None = None,
) -> Any

Resolve a collection of values. Identical inputs are deduplicated automatically before the pipeline runs, so repeated values don't multiply the work.
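
The deduplication described above follows a common pattern: resolve each distinct input once, then broadcast the answer back to every row. A minimal sketch of that pattern, assuming a toy lookup in place of the real pipeline (`resolve_deduplicated`, `toy_resolver`, and the lookup table are hypothetical names, not part of resolvekit):

```python
from typing import Callable, Dict, Hashable, List, Optional, Sequence


def resolve_deduplicated(
    values: Sequence[Hashable],
    resolve_one: Callable[[Hashable], Optional[str]],
) -> List[Optional[str]]:
    """Call resolve_one once per distinct value, preserving row order."""
    cache: Dict[Hashable, Optional[str]] = {}
    for value in values:
        if value not in cache:
            cache[value] = resolve_one(value)
    # Broadcast cached answers back to the original row positions.
    return [cache[value] for value in values]


calls: List[str] = []
_TOY_LOOKUP = {"France": "FRA", "Germany": "DEU"}


def toy_resolver(value):
    """Toy stand-in for the resolution pipeline; records each call."""
    calls.append(value)
    return _TOY_LOOKUP.get(value)


result = resolve_deduplicated(["France", "France", "Germany", "France"], toy_resolver)
print(result)  # ['FRA', 'FRA', 'DEU', 'FRA']
print(calls)   # ['France', 'Germany']
```

Four rows cost two resolver calls here; on a column with many repeated values the saving grows with the repetition rate.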

See how to clean a DataFrame column for the automatic path, and how to reconcile a column with a review for the Crosswalk round-trip.

Parameters

Name	Meaning
`values`	Input collection. Accepts a list, tuple, `dict`, pandas `Series`, polars `Series`, or NumPy array. A `dict` resolves its values and returns a same-keyed dict. `Added in v0.1.`
`to`	Pivot each resolved entity. Omit to use the `default_to` configured via `configure()`. Pass `None` to always return a `BulkResult`. When set to a code system name, returns the native input shape (e.g. `pd.Series`) of pivot values; unresolved rows become `None`.
`on_missing`	Miss policy override for the configured output chain. Omit to inherit the resolver's `on_missing` policy. `"auto"` = null per row with `UserWarning` for bulk; `"raise"` = raises `OutputMissingError` on the first resolved-but-missing entity; `"null"` = returns `None` per row silently. Only relevant when `to` is omitted and a `default_to` is configured.
`domain`	Domain filter, broadcast to every row.
`context`	Context hints, broadcast to every row.
`output`	Shape of the returned object when `to=None`: `"series"` (default) — series of values; `"record"` — series of structs; `"frame"` — DataFrame. Ignored when `to` is set.
`from_system`	Treat every value as a code in this system.
`not_found`	What fills unresolved rows in the output. `"null"` (default) → `None`; `"raise"` → raises; any other string → used as a literal sentinel value.
`on_error`	`"raise"` (default), `"null"`, or `"keep"` (pass the original value through).
`on_ambiguous`	`"null"` (default), `"raise"`, or `"best"`.
`crosswalk`	A `Crosswalk` of pre-decided `value → entity_id` overrides. A value in the crosswalk short-circuits resolution — it bypasses `from_system`, `on_ambiguous`, and `not_found`; an `IGNORE` entry yields `None`. Values absent from the crosswalk resolve normally. `Added in v0.1.`

Returns

When to is set: the native shape (e.g. pd.Series, or a same-keyed dict for dict input) of pivot values.
When to=None: a BulkResult.

Raises CrosswalkError when a crosswalk (built with strict=True, the default) maps a value to an entity ID that no loaded pack carries.

Example — pandas Series with pivot

>>> import pandas as pd
>>> rk.bulk(
...     values=pd.Series(["United States", "Brasil", "Cote dIvoire", "zzznotacountry"]),
...     to="iso3",
... )
0    USA
1    BRA
2    CIV
3    None
dtype: object

Example — list with custom not-found sentinel

>>> rk.bulk(values=["Germany", "zzz"], to="iso3", not_found="UNKNOWN")
['DEU', 'UNKNOWN']

Example — dict input (same keys back)

>>> rk.bulk(values={"hq": "France", "branch": "France", "other": "Germany"}, to="iso3")
{'hq': 'FRA', 'branch': 'FRA', 'other': 'DEU'}

Example — crosswalk overrides

>>> cw = rk.Crosswalk.from_dict({"Congo": "country/COG", "Atlantis": rk.IGNORE})
>>> rk.bulk(values=["Congo", "Atlantis", "France"], to="iso3", crosswalk=cw)
['COG', None, 'FRA']
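
The short-circuit behaviour documented for `crosswalk` (override first, bypassing `not_found`; everything else resolves normally) can be modelled with plain dictionaries. This is a sketch under assumptions: `apply_crosswalk` and the toy catalog are hypothetical, it returns entity IDs rather than pivoted codes, and `None` plays the role of `IGNORE`.

```python
from typing import Callable, Dict, List, Optional


def apply_crosswalk(
    values: List[str],
    crosswalk: Dict[str, Optional[str]],
    resolve_id: Callable[[str], Optional[str]],
    not_found: Optional[str] = None,
) -> List[Optional[str]]:
    """Crosswalk entries win; other values go through resolve_id."""
    out: List[Optional[str]] = []
    for value in values:
        if value in crosswalk:
            # Short-circuit: an ignored entry stays None and is NOT
            # replaced by the not_found sentinel.
            out.append(crosswalk[value])
        else:
            entity_id = resolve_id(value)
            out.append(entity_id if entity_id is not None else not_found)
    return out


toy_catalog = {"France": "country/FRA"}                       # toy resolver data
overrides = {"Congo": "country/COG", "Atlantis": None}        # None stands in for IGNORE

rows = apply_crosswalk(
    ["Congo", "Atlantis", "France", "zzz"],
    overrides,
    toy_catalog.get,
    not_found="UNKNOWN",
)
print(rows)  # ['country/COG', None, 'country/FRA', 'UNKNOWN']
```

Note the difference between the second and fourth rows: an ignored value yields `None`, while a genuinely unresolved value receives the sentinel.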

Note

All parameters to bulk() are keyword-only. There is no positional values shortcut.

`snap`¶

rk.snap(
    *,
    query: str,
    candidates: list[str],
    max_distance: float = 0.5,
    to: Any = None,
    domain: str | list[str] | None = None,
    context: ResolutionContext | None = None,
) -> Any

Return the closest matching candidate from a caller-supplied list, or None when nothing clears the threshold.

snap is for constrained matching: you already know the valid options and want to map a messy input onto one of them. It differs from resolve, which searches the full installed catalog.

Parameters

Name	Meaning
`query`	The string to match.
`candidates`	Entity IDs or names to match against (e.g. `["country/TZA", "country/ZMB"]`).
`max_distance`	Despite the name, a confidence floor: candidates scoring below this threshold are rejected. Default `0.5`.
`to`	Pivot the matched entity (same semantics as `resolve`).
`domain`	Domain filter.
`context`	Resolution hints.

Returns The best matching candidate (entity ID or pivot value), or None when below threshold.

Example

>>> rk.snap(query="Tanzanya", candidates=["country/TZA", "country/ZMB", "country/KEN"])
'country/TZA'
>>> rk.snap(query="Zzzzzzz", candidates=["country/TZA", "country/ZMB", "country/KEN"])
None

Heads up

snap works reliably when candidates contains entity IDs (e.g. "country/TZA"). Passing plain name strings (e.g. "Tanzania") will likely return None at the default threshold — the resolver resolves each candidate name first, and near-miss names don't always clear 0.5 confidence. Use entity IDs for predictable results.
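
The general idea of constrained matching (pick the best option from a closed list and reject anything under a floor) can be illustrated with the standard library's `difflib`. This is a swapped-in technique for illustration only: resolvekit resolves candidates through its own pipeline and calibrated confidence, which `difflib`'s similarity ratio does not reproduce, and `snap_by_similarity` is a hypothetical helper.

```python
from difflib import get_close_matches
from typing import List, Optional


def snap_by_similarity(query: str, options: List[str], floor: float = 0.5) -> Optional[str]:
    """Return the most similar option, or None when nothing reaches the floor."""
    matches = get_close_matches(query, options, n=1, cutoff=floor)
    return matches[0] if matches else None


options = ["Tanzania", "Zambia", "Kenya"]
print(snap_by_similarity("Tanzanya", options))  # Tanzania
print(snap_by_similarity("Zzzzzzz", options))   # None
```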

`entity`¶

rk.entity(
    text_or_id: str | None = None,
    *,
    alpha_2: str | None = None,
    alpha_3: str | None = None,
    numeric: str | None = None,
    iso2: str | None = None,
    iso3: str | None = None,
    dcid: str | None = None,
    domain: str | list[str] | None = None,
    **code_kwargs: str,
) -> EntityRecord | None

Look up a fully hydrated EntityRecord by name, entity ID, or code. Returns None when no match is found.

Parameters

Name	Meaning
`text_or_id`	Name or entity ID (e.g. `"France"` or `"country/FRA"`).
`iso2`	ISO 3166-1 alpha-2 code.
`iso3`	ISO 3166-1 alpha-3 code.
`dcid`	Data Commons entity ID.
`alpha_2`, `alpha_3`, `numeric`	pycountry-compatible aliases for `iso2`, `iso3`, and the numeric code.
`**code_kwargs`	Any other code system by name (e.g. `wikidata="Q30"`).
`domain`	Domain filter.

Pass exactly one lookup — text_or_id or one code kwarg. Passing two code kwargs raises ValueError.

Returns EntityRecord or None.

Example

>>> rk.entity("France").entity_id
'country/FRA'
>>> rk.entity(iso2="JP").canonical_name
'Japan'
>>> rk.entity(wikidata="Q30").entity_id
'country/USA'

`modules`¶

rk.modules() -> list[ModuleInfo]

Return the full module catalog, sorted by module_id.

Each entry carries identity metadata and cache state. Bundled modules are always is_available=True. Remote modules are is_available=True only when their data is on disk.

Note

On a fresh pip install, all six remote geo modules (geo.admin1 through geo.cities) report is_available=False until you call download.

Example

>>> [(m.module_id, m.distribution, m.is_available) for m in rk.modules()]
[
  ('geo.admin1', 'remote', False),
  ('geo.admin2', 'remote', False),
  ...
  ('geo.countries', 'bundled', True),
  ('org.companies', 'bundled', True),
  ...
]

ModuleInfo fields

Field	Type	Meaning
`module_id`	`str`	Dot-separated identifier, e.g. `"geo.countries"`.
`domain`	`str`	Domain pack ID, e.g. `"geo"`.
`entity_types`	`tuple[str, ...]`	Entity types this module covers.
`distribution`	`"bundled" \| "remote"`	How the data ships.
`is_available`	`bool`	Whether data is usable now without a download.
`size_mb`	`float \| None`	Uncompressed on-disk size. `None` for uncached remote modules.
`download_size_mb`	`float \| None`	Compressed download size. `None` for bundled modules.
`remote_url`	`str \| None`	Download URL.
`data_version`	`str \| None`	CalVer string for the module's data, e.g. `"2026.06"`.
`cache_path`	`Path \| None`	On-disk path when cached; `None` otherwise.

`download`¶

rk.download(target: str, *, force: bool = False) -> dict[str, Path]

Download remote module data to the local cache.

Parameters

Name	Meaning
`target`	Module ID (`"geo.cities"`) or domain (`"geo"`) to download all modules in that domain.
`force`	Re-download even if already cached. Default `False`.

Returns dict[str, Path] mapping module_id → cache_path for each downloaded module.

See managing data packs for download patterns and offline configuration.

`download_all`¶

rk.download_all(*, force: bool = False) -> dict[str, Path]

Download all installed remote modules.

Parameters

Name	Meaning
`force`	Re-download even if already cached. Default `False`.

Returns dict[str, Path] mapping module_id → cache_path.

`configure`¶

rk.configure(
    *,
    auto_download: bool | None = None,
    cache_dir: str | Path | None = None,
    default_to: str | list[str] | None = None,
    on_missing: Literal["raise", "null", "auto"] = UNSET,
) -> None

Set runtime defaults and discard the singleton resolver so the next call rebuilds with the new configuration.

Parameters

Name	Meaning
`auto_download`	When `True`, remote packs are downloaded automatically on first use. Default `False`.
`cache_dir`	Custom directory for cached remote data.
`default_to`	Default output code system or name variant applied to every subsequent `resolve()`, `bulk()`, and `snap()` call. A string (`"iso3"`) or a list of strings for a fallback chain (`["iso3", "name"]`). `None` clears the default, restoring the legacy `ResolutionResult` return.
`on_missing`	Miss policy when the configured output chain has no value for a resolved entity. Omitting leaves any previously configured policy unchanged. `"auto"` = raises `OutputMissingError` for scalar `resolve()`/`snap()`, returns `None` with a `UserWarning` for `bulk()`; `"raise"` = always raise; `"null"` = always return `None`.

Raises

UnknownOutputError — When default_to contains a malformed "name:" grammar token. Also raised immediately when a singleton resolver already exists and default_to names an unknown code system (otherwise the check is deferred to the next resolver build).

Example

import resolvekit as rk

rk.configure(default_to="iso3")
rk.resolve("France")         # → "FRA"
rk.bulk(values=["France", "Germany"], to="name")  # per-call to= overrides default

rk.configure(default_to=["iso3", "name"], on_missing="null")
rk.resolve("France")         # → "FRA"
rk.resolve("zzznotacountry") # → None  (no raise)

rk.configure(default_to=None)  # clear — resolves return ResolutionResult again
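
How a fallback chain and the `on_missing` policy combine can be sketched as "first non-empty output wins, otherwise apply the policy". The code below is a toy model, not resolvekit's implementation: `OutputMissing` is a hypothetical stand-in for `OutputMissingError`, and the entity is reduced to a plain dict of outputs.

```python
import warnings
from typing import Dict, List, Optional


class OutputMissing(LookupError):
    """Hypothetical stand-in for OutputMissingError."""


def pivot_with_fallback(
    outputs: Dict[str, str],
    chain: List[str],
    *,
    on_missing: str = "auto",
    bulk: bool = False,
) -> Optional[str]:
    """Return the first available output in chain, else apply on_missing."""
    for key in chain:
        value = outputs.get(key)
        if value is not None:
            return value
    # Nothing in the chain produced a value for this resolved entity.
    if on_missing == "raise" or (on_missing == "auto" and not bulk):
        raise OutputMissing(f"no value for any of {chain}")
    if on_missing == "auto":
        # "auto" in bulk mode: null per row, with a warning.
        warnings.warn(f"no value for any of {chain}", UserWarning)
    return None


print(pivot_with_fallback({"iso3": "FRA", "name": "France"}, ["iso3", "name"]))  # FRA
print(pivot_with_fallback({"name": "Example Land"}, ["iso3", "name"]))           # Example Land
print(pivot_with_fallback({}, ["iso3"], on_missing="null"))                      # None
```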

`to`¶

rk.to(
    output: str | list[str],
    *,
    on_missing: Literal["raise", "null", "auto"] = "auto",
) -> OutputView

Return an OutputView bound to the given output spec, using the singleton default resolver.

All resolution methods on the returned view apply output automatically — no need to pass to= on every call. The view is a lightweight forwarding object; it holds a reference to the same underlying resolver.

Parameters

Name	Meaning
`output`	Target code system or name variant (e.g. `"iso3"`, `["iso3", "name"]`, `"name:fr"`). The name grammar accepts `name`, `name:<lang>`, `name:<kind>`, and optionally `name:<kind>:<script>`. `<kind>` ∈ {`canonical`, `alias`, `endonym`, `exonym`, `acronym`} (`abbr` is accepted as an alias for `acronym`). Kind tokens resolve only for packs that carry the corresponding name kind; the bundled packs provide canonical names, per-language names (`en`, `fr`, `es`, `de`, `ru`, `ja`, `it`, `pt`, `zh`, `ar`), and aliases.
`on_missing`	Miss policy for the output chain. `"auto"` (default) = raise for scalar `resolve()`/`snap()`, null + `UserWarning` for `bulk()`; `"raise"` = always raise `OutputMissingError`; `"null"` = always return `None`.

Returns — OutputView.

Raises

UnknownOutputError — When output contains a malformed token or names an unknown code system.

Example

import resolvekit as rk

iso3 = rk.to("iso3")
iso3.resolve("France")          # → "FRA"
iso3.resolve_id("France")       # → "country/FRA"  (entity ID, not pivoted)
iso3.bulk(values=["France", "Germany"])  # → ["FRA", "DEU"]
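
To make the name grammar accepted by `output` more concrete, here is a hypothetical parser for the four documented shapes. It is a sketch under assumptions, not the library's parser: `NameSpec` and `parse_name_spec` are invented names, any second token that is not a known kind is assumed to be a language tag, and `"Latn"` is only an example script value.

```python
from typing import NamedTuple, Optional

KINDS = {"canonical", "alias", "endonym", "exonym", "acronym"}


class NameSpec(NamedTuple):
    lang: Optional[str] = None
    kind: Optional[str] = None
    script: Optional[str] = None


def parse_name_spec(spec: str) -> NameSpec:
    """Parse name, name:<lang>, name:<kind>, or name:<kind>:<script>."""
    head, *rest = spec.split(":")
    if head != "name" or len(rest) > 2 or any(not part for part in rest):
        raise ValueError(f"malformed name spec: {spec!r}")
    if not rest:
        return NameSpec()
    # "abbr" is documented as an alias for "acronym".
    first = "acronym" if rest[0] == "abbr" else rest[0]
    if len(rest) == 1:
        if first in KINDS:
            return NameSpec(kind=first)
        # Assumption: anything else in this slot is a language tag.
        return NameSpec(lang=first)
    if first not in KINDS:
        # A script is only documented after a kind.
        raise ValueError(f"malformed name spec: {spec!r}")
    return NameSpec(kind=first, script=rest[1])


print(parse_name_spec("name:fr"))            # NameSpec(lang='fr', kind=None, script=None)
print(parse_name_spec("name:abbr"))          # NameSpec(lang=None, kind='acronym', script=None)
print(parse_name_spec("name:endonym:Latn"))  # NameSpec(lang=None, kind='endonym', script='Latn')
```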

`clear_cache`¶

rk.clear_cache(target: str | None = None) -> None

Remove cached module data from disk.

Parameters

Name	Meaning
`target`	Module ID to clear, or `None` to clear all.

`reset`¶

rk.reset() -> None

Close and discard the singleton resolver. The next call to any resolution function constructs a fresh one via Resolver.auto().

`default`¶

rk.default() -> Resolver

Return the singleton Resolver instance, creating it on first call. The same object is reused until reset() or configure() is called.
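
The lifecycle described for `default()` and `reset()` is the familiar lazy-singleton pattern. A minimal sketch, assuming a toy class in place of the real `Resolver` (`ToyResolver` and the module-level `_singleton` variable are hypothetical):

```python
from typing import Optional


class ToyResolver:
    """Toy stand-in for Resolver; only tracks whether it was closed."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


_singleton: Optional[ToyResolver] = None


def default() -> ToyResolver:
    """Create the shared instance on first call, then keep returning it."""
    global _singleton
    if _singleton is None:
        _singleton = ToyResolver()
    return _singleton


def reset() -> None:
    """Close and discard the shared instance; the next default() rebuilds."""
    global _singleton
    if _singleton is not None:
        _singleton.close()
        _singleton = None


first = default()
print(default() is first)  # True
reset()
print(default() is first)  # False
```

One practical consequence: a reference obtained from `default()` before a `reset()` or `configure()` call points at a discarded resolver afterwards, so it is safer to call `default()` again than to hold the object long-term.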

`parse`¶

rk.parse(
    text: str,
    *,
    to: str | list[str] | None = None,
    include_nil: bool = False,
    domain: str | list[str] | None = None,
    context: ResolutionContext | None = None,
    confidence_threshold: float | None = None,
    timeout: float | None = None,
) -> ParseResult

Extract and link every pack-known entity mention in free text, returning character offsets and calibrated confidence for each span. Detection is dictionary-first over the loaded packs.

Heads up

Requires the [parsing] extra: pip install 'resolvekit[parsing]'. Without it, calling parse() raises ImportError.

Parameters

Name	Meaning
`text`	Free-text input to scan.
`to`	Pivot each resolved entity to a code system (e.g. `"iso3"`). The pivot value is stored in `ParsedEntity.output`.
`include_nil`	When `True`, below-threshold detected spans are included in the result with `status="no_match"` instead of going to `dropped_spans`. Default `False`.
`domain`	Restrict entity matching to one or more domains.
`context`	Resolution hints broadcast to every candidate span.
`confidence_threshold`	Minimum calibrated confidence to accept a match. `None` uses each pack's built-in threshold.
`timeout`	Soft per-call time budget in seconds. `None` = no limit.

Returns ParseResult.

Raises ImportError when the [parsing] extra is not installed.

Example

import resolvekit as rk   # pip install 'resolvekit[parsing]'

result = rk.parse("The summit in Nairobi gathered leaders from Kenya, Uganda and the United States.")
for e in result:
    if e.entity_id:
        print(f"{e.surface!r} [{e.start}:{e.end}] -> {e.entity_id} ({e.entity_type}) {e.confidence:.2f}")
# 'Kenya' [44:49] -> country/KEN (geo.country) 0.91
# 'Uganda' [51:57] -> country/UGA (geo.country) 0.91
# 'the United States' [62:79] -> country/USA (geo.country) 0.91

# Nairobi is NOT detected on a fresh install — cities are a remote pack, not bundled.
[(d.surface, d.reason) for d in result.dropped_spans]   # e.g. [('and', 'code_case_mismatch'), ('in', 'deny_list')]

# Pivot each linked entity to a code:
[(e.surface, e.output) for e in rk.parse("Travel from France to Brazil", to="iso3")]
# [('France', 'FRA'), ('Brazil', 'BRA')]
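
The offsets in the example output are half-open (`start` inclusive, `end` exclusive), so they can be used directly as Python slice bounds. The check below uses only the sentence and the spans printed above; no resolvekit call is involved.

```python
text = "The summit in Nairobi gathered leaders from Kenya, Uganda and the United States."

# (start, end) pairs copied from the example output above.
spans = [(44, 49), (51, 57), (62, 79)]

# Slicing with half-open offsets recovers each surface string exactly.
surfaces = [text[start:end] for start, end in spans]
print(surfaces)  # ['Kenya', 'Uganda', 'the United States']

# Span length is simply end - start.
lengths = [end - start for start, end in spans]
print(lengths)   # [5, 6, 17]
```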

`parse_bulk`¶

rk.parse_bulk(
    *,
    values: list[str] | tuple | pd.Series | pl.Series,
    to: str | list[str] | None = None,
    include_nil: bool = False,
    domain: str | list[str] | None = None,
    context: ResolutionContext | None = None,
    confidence_threshold: float | None = None,
    timeout: float | None = None,
) -> ParseResult

Extract entities from a collection of text strings. Each ParsedEntity carries a row_idx field identifying its source row.

Heads up

Requires the [parsing] extra: pip install 'resolvekit[parsing]'. Without it, raises ImportError.

Parameters

Name	Meaning
`values`	Collection of text strings to scan. Accepts a list, tuple, pandas `Series`, or polars `Series`.
`to`	Pivot each resolved entity. Stored in `ParsedEntity.output`.
`include_nil`	Include below-threshold spans in the result. Default `False`.
`domain`	Domain filter, broadcast to every row.
`context`	Resolution hints, broadcast to every row.
`confidence_threshold`	Minimum calibrated confidence to accept a match. `None` uses each pack's built-in threshold.
`timeout`	Soft per-call time budget in seconds. `None` = no limit.

Returns ParseResult.

Raises

ImportError — When the [parsing] extra is not installed.
TypeError — When values is not a list, tuple, pd.Series, or pl.Series.

Example

import resolvekit as rk   # pip install 'resolvekit[parsing]'

df = rk.parse_bulk(values=["Visited Kenya and Peru", "Meeting in Japan"]).to_dataframe()
list(df.columns)
# ['row_idx', 'surface', 'entity_id', 'entity_type', 'pack_id', 'status', 'confidence', 'start', 'end', 'to']

Models¶

`ResolutionResult`¶

Frozen Pydantic model. Returned by resolve() (when to is not set) and by the Resolver class methods.

Fields

Field	Type	Meaning
`status`	`ResolutionStatus`	Always set — never `None`.
`entity_id`	`str \| None`	Resolved entity ID (e.g. `"country/USA"`). Present only when `status == RESOLVED`.
`confidence`	`float \| None`	Calibrated confidence in `[0, 1]`. Present only when `status == RESOLVED`.
`entity`	`EntityRecord \| None`	Populated when `include_entity=True`.
`pack_id`	`str \| None`	Domain pack that produced the result (e.g. `"geo"`).
`match_tier`	`str \| None`	How the match was found: `"exact_code"`, `"exact_name"`, `"acronym"`, `"fts"`, `"fuzzy"`, or `"fallback"`.
`candidates`	`list[CandidateSummary]`	Top candidates (up to 10), including the winner on a resolved result.
`reasons`	`list[str]`	Reason codes explaining the outcome (e.g. `["exact_code_match"]`). Currently always a single-element list.
`refinement_hints`	`list[str]`	Suggestions for a retry that would likely succeed (e.g. `["entity_types"]`).
`query_text`	`str \| None`	The original input text as seen by the resolver.

Convenience properties (delegate to entity; return None when entity is not populated)

Property	Returns
`.iso2`	ISO 3166-1 alpha-2 code
`.iso3`	ISO 3166-1 alpha-3 code
`.name`	Canonical name
`.flag`	Flag emoji
`.is_resolved`	`True` when `status == RESOLVED`
`.is_ambiguous`	`True` when `status == AMBIGUOUS`
`.best_candidate`	Highest-confidence `CandidateSummary`, or `None`

Methods

Method	Returns	Meaning
`.top_candidates(n=3)`	`list[CandidateSummary]`	Top n candidates by confidence.
`.explain(verbosity="standard")`	`Scorecard`	Re-run with full tracing. Verbosity: `"minimal"`, `"standard"`, `"full"`. Raises `ExplainNotAvailableError` on detached results.
`.to_dict()`	`dict`	JSON-serializable dict (delegates to `model_dump()`).
`.to_json(indent=None)`	`str`	JSON string.

Example

>>> r = rk.resolve("United States")
>>> r.status
<ResolutionStatus.RESOLVED: 'resolved'>
>>> r.entity_id
'country/USA'
>>> r.confidence
≈0.91
>>> r.is_resolved
True
>>> r.reasons
[<ReasonCode.EXACT_NAME_MATCH: 'exact_name_match'>]

Explain

>>> print(r.explain(verbosity="full").as_text())
Resolution Scorecard
============================================================
Query: "United States"
Normalized: "united states"
Status: RESOLVED
Entity: country/USA
Confidence: 93.3%
Reasons: exact_name_match
Pack: geo
Match Tier: exact_name

`EntityRecord`¶

Frozen Pydantic model. Domain-neutral entity representation.

Fields

Field	Type	Meaning
`entity_id`	`str`	Unique identifier, e.g. `"country/DEU"`.
`entity_type`	`str`	Type string, e.g. `"geo.country"`.
`canonical_name`	`str`	Primary display name.
`names`	`list[NameRecord]`	All name records, including aliases in multiple languages.
`codes`	`list[CodeRecord]`	Code identifiers (ISO, Wikidata, DCID, etc.).
`relations`	`list[RelationRecord]`	Relations to other entities (containment, membership).
`attributes`	`dict[str, str \| int \| float \| bool]`	Domain-specific lightweight attributes (e.g. `prominence`).
`valid_from`, `valid_until`	`date \| None`	Validity window.

Convenience properties

Property	Returns
`.name`	Same as `canonical_name`.
`.iso2`	ISO 3166-1 alpha-2 code, or `None`.
`.iso3`	ISO 3166-1 alpha-3 code, or `None`.
`.numeric`	ISO 3166-1 numeric code string, or `None`.
`.flag`	Flag emoji derived from `iso2`, or `None`.
`.continent`	Continent name from `attributes`, or `None`.
`.aliases`	Non-preferred name strings, in declaration order.
`.codes_dict`	`{system: value}` mapping from all `CodeRecord` entries.
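
For readers curious how a flag emoji can be derived from an alpha-2 code: the standard Unicode approach maps each letter to its regional indicator symbol. The reference does not show resolvekit's own derivation, so treat this as a sketch of the general technique; `flag_from_iso2` is a hypothetical helper.

```python
from typing import Optional

# U+1F1E6 is REGIONAL INDICATOR SYMBOL LETTER A.
_REGIONAL_INDICATOR_A = 0x1F1E6


def flag_from_iso2(iso2: Optional[str]) -> Optional[str]:
    """Map a two-letter ASCII code to its pair of regional indicator symbols."""
    if iso2 is None or len(iso2) != 2 or not iso2.isascii() or not iso2.isalpha():
        return None
    return "".join(
        chr(_REGIONAL_INDICATOR_A + ord(letter) - ord("A"))
        for letter in iso2.upper()
    )


print(flag_from_iso2("DE"))  # 🇩🇪
print(flag_from_iso2(None))  # None
```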

Methods

Method	Returns	Meaning
`.code(system)`	`str \| None`	Code value for a named system, e.g. `.code("wikidata")`.
`.attribute(key, default=None)`	`Any`	Attribute by key, with optional default.
`.to(system)`	`Any`	Pivot to a code system or computed property. Raises `UnknownCodeSystemError` if the system isn't recognized.

Example

>>> e = rk.entity("Germany")
>>> e.entity_id
'country/DEU'
>>> e.iso3
'DEU'
>>> e.code("wikidata")
'Q183'
>>> e.aliases[:3]
['Alemanha', 'Alemania', 'Allemagne']
>>> e.attribute("source")
'datacommons'

Reading relations

>>> e = rk.entity("Germany")
>>> [(r.relation_type, r.target_id) for r in e.relations if r.relation_type == "member_of"][:3]
[('member_of', 'EuropeanUnion'), ('member_of', 'groups/NATO'), ('member_of', 'groups/G7')]

Each edge exposes .relation_type, .target_id, .valid_from, and .valid_until (all duck-typed; RelationRecord is not a public import).

Traversing relations

Reading entity.relations and calling rk.entity(rel.target_id) directly can return None — some target_id values (e.g. "WesternEurope", "geoId/06") don't resolve in the bundled packs. Use Resolver.related() to get resolved entities only:

r = rk.default()
parents = r.related("country/DEU", relation="contained_in")
for parent in parents:
    print(parent.canonical_name)

Unresolvable targets are omitted. To see which targets are dangling, use r.diagnostics.unresolved_relations():

for edge in r.diagnostics.unresolved_relations("country/DEU", relation="contained_in"):
    print(edge["target_id"], "is dangling")

See The entity graph for the full edge-type vocabulary and traversal recipes.

`OutputView`¶

Frozen dataclass. Returned by rk.to() and Resolver.to(). Binds a fixed output spec to a resolver so every call through it returns the configured representation without repeating to=.

from resolvekit import Resolver

view = Resolver.auto().to("iso3")
view.resolve("France")       # → "FRA"
view.resolve_id("France")    # → "country/FRA"  (entity ID, never pivoted)
view.bulk(values=["France", "Germany"])  # → ["FRA", "DEU"]

Methods

Method	Returns	Notes
`.resolve(text, *, as_result=False, domain=None, context=None, from_system=None, timeout=None)`	`str \| None` or `ResolutionResult`	Applies the bound spec. `as_result=True` returns a raw `ResolutionResult`.
`.resolve_id(text, *, on_ambiguous="raise", from_system=None, domain=None, context=None, timeout=None)`	`str \| None`	Always returns the entity ID; the bound output spec has no effect here.
`.bulk(*, values, on_missing=UNSET, output="series", domain=None, context=None, from_system=None, not_found="null", on_error="raise", on_ambiguous="null")`	native shape or `BulkResult`	Applies the bound spec to every row.
`.snap(*, query, candidates, max_distance=0.5, domain=None, context=None)`	`str \| None`	Returns the closest match pivoted to the bound output.

OutputView is not exported from resolvekit directly; it is the return type of rk.to() and Resolver.to().

`ResolutionContext`¶

Frozen Pydantic model. Pass to any resolution function to narrow or constrain matching.

from resolvekit import ResolutionContext

Fields

Field	Type	Meaning
`as_of`	`date \| None`	Resolve against entities valid at this date.
`entity_types`	`frozenset[str] \| None`	Restrict to specific entity types (e.g. `{"geo.country"}`). Must be a collection — a bare string raises `ValueError`.
`parent_ids`	`list[str] \| None`	Restrict to entities contained within these parent IDs.
`country`	`str \| None`	ISO 3166-1 alpha-2 country code hint. Max 2 characters.
`languages`	`list[str] \| None`	Preferred language codes for name matching.
`attributes`	`dict`	Escape hatch for domain-specific hints.

Methods

Method	Returns	Meaning
`.replace(**updates)`	`ResolutionContext`	Return a new instance with specified fields updated. Full validation is run on the result.

Example

>>> from datetime import date
>>> from resolvekit import ResolutionContext
>>> ctx = ResolutionContext(country="US", entity_types={"geo.state"})
>>> ctx.replace(as_of=date(2020, 1, 1))
ResolutionContext(as_of=datetime.date(2020, 1, 1), entity_types=frozenset({'geo.state'}), parent_ids=None, country='US', languages=None, attributes={})
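
The reason a bare string is rejected for `entity_types` is easy to see in plain Python: a string is itself an iterable of characters, so coercing it to a set silently produces the wrong thing. The guard below is a sketch of that idea; `as_entity_types` is a hypothetical helper, not the model's actual validator.

```python
from typing import FrozenSet, Iterable


def as_entity_types(value: Iterable[str]) -> FrozenSet[str]:
    """Coerce a collection of type names to a frozenset, rejecting bare strings."""
    if isinstance(value, str):
        raise ValueError("entity_types must be a collection of strings, not a bare string")
    return frozenset(value)


# Without the guard, a bare string is split into single characters:
print("geo.state" in frozenset("geo.state"))        # False
print(as_entity_types({"geo.state"}))                # frozenset({'geo.state'})
```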

`BulkResult`¶

Frozen dataclass. Returned by bulk() when to=None.

Attributes

Attribute	Type	Meaning
`values`	`Any`	Native output shape (`pd.Series`, `pl.Series`, `list`, etc.).
`source`	`Sequence[ResolutionResult]`	Per-row `ResolutionResult` instances.
`kind`	`"pandas" \| "polars" \| "numpy" \| "list" \| "tuple" \| "dict"`	Which native shape is in `values`. `"dict"` when `bulk` was called with a `dict`.

Methods

Method	Returns	Meaning
`.summary()`	`ResolutionSummary`	Per-status counts: `total`, `resolved`, `ambiguous`, `no_match`, `error`.
`.failures`	`BulkResult`	Sub-result containing only non-RESOLVED rows.
`.unnest()`	`pd.DataFrame \| pl.DataFrame \| list[dict]`	Flatten `source` into columns: `status`, `entity_id`, `confidence`, `pack_id`, `query_text`.
`.to_list()`	`list`	Convert `values` to a plain Python list.
`.to_pandas()`	`pd.Series`	Convert to pandas. Raises `ImportError` if pandas isn't installed.
`.to_polars()`	`pl.Series`	Convert to polars.
`.to_review(path, *, top_n=3)`	`None`	Write the ambiguous and no-match unique values (deduplicated) to a CSV for human review, each with up to `top_n` candidates. Resolved rows are omitted; an all-resolved result writes a header-only file. `Added in v0.1.`
`.to_crosswalk(review=None, *, strict=True)`	`Crosswalk`	Build a complete crosswalk from this result's resolved rows, optionally merged with a filled review file (`review=` path). The filled `chosen` column overrides the auto-resolved entries. `Added in v0.1.`
`.explain(verbosity="standard")`	`list[Scorecard \| None]`	Scorecards for every row; `None` for detached rows.

BulkResult supports len(), iteration, and integer/slice indexing (over source).

Review round-trip

>>> result = rk.bulk(values=["France", "Congo", "Atlantis"], to=None)
>>> result.to_review("review.csv")          # writes Congo (ambiguous) + Atlantis (no_match)
>>> # ... a human fills the `chosen` column ...
>>> crosswalk = result.to_crosswalk(review="review.csv")
>>> rk.bulk(values=["France", "Congo", "Atlantis"], to="iso3", crosswalk=crosswalk)
['FRA', 'COG', None]

See how to reconcile a column with a review for the full workflow and CSV formats.

Example

>>> import pandas as pd
>>> br = rk.bulk(values=pd.Series(["Germany", "France", "zzznotacountry"]), to=None)
>>> br
BulkResult(total=3, resolved=2, no_match=1, ambiguous=0, error=0, kind='pandas')
>>> br.summary()
ResolutionSummary(total=3, resolved=2, ambiguous=0, no_match=1, error=0)
>>> br.unnest()[["status", "entity_id", "confidence"]]
     status    entity_id  confidence
0  resolved  country/DEU       ≈0.91
1  resolved  country/FRA       ≈0.91
2  no_match         None         NaN

`Crosswalk`¶

Added in v0.1.

Frozen value-object: a complete value → entity_id table that overrides resolution when passed to bulk(crosswalk=…). Build one by hand with from_dict, load a saved one with from_csv, or have BulkResult.to_crosswalk() assemble it from a review round-trip.

from resolvekit import Crosswalk, IGNORE
# or: rk.Crosswalk, rk.IGNORE

Constructors

Crosswalk.from_dict(mapping, *, strict=True) — mapping is dict[str, str | None] mapping each input value to an entity ID (e.g. "country/COG") or to IGNORE (equivalently None) to map it to no output. Entity IDs are validated structurally (must be pack/code); existence is checked later, at bulk time.
Crosswalk.from_csv(path, *, strict=True) — load a crosswalk written by to_csv. Columns must be exactly value,entity_id; an IGNORE token or empty entity_id cell is read as ignore.

Parameters

strict (bool, default True) — carried on the instance and applied when the crosswalk is used. With strict=True, an entity ID that no loaded pack carries raises CrosswalkError at bulk time. With strict=False, such a value becomes a per-value miss (follows not_found).

Methods

Method	Returns	Meaning
`.to_csv(path)`	`None`	Write the table to a `value,entity_id` CSV. `IGNORE` entries are written as the literal token `IGNORE`.
`len(cw)`	`int`	Number of entries.
`value in cw`	`bool`	Whether a value is in the table.

Raises

ValueError — at construction, when an entry maps to anything other than None/IGNORE or a well-formed pack/code entity ID, or (in from_csv) when columns are missing or an input value appears more than once.

Example

>>> cw = rk.Crosswalk.from_dict({"Congo": "country/COG", "Atlantis": rk.IGNORE})
>>> len(cw), "Congo" in cw
(2, True)
>>> cw.to_csv("crosswalk.csv")
>>> rk.Crosswalk.from_csv("crosswalk.csv")  # round-trips, including IGNORE

CSV format written by to_csv:

value,entity_id
Congo,country/COG
Atlantis,IGNORE

`IGNORE`¶

Added in v0.1.

Sentinel marking a Crosswalk entry that maps a value to no output (None). Use it in from_dict; in a CSV it's the literal token IGNORE. Importable as rk.IGNORE or from resolvekit import IGNORE.

>>> cw = rk.Crosswalk.from_dict({"placeholder row": rk.IGNORE})
>>> rk.bulk(values=["placeholder row"], to="iso3", crosswalk=cw)
[None]

`ParseResult`¶

Returned by parse() and parse_bulk(). Iterable over ParsedEntity instances.

Attributes

Attribute	Type	Meaning
`dropped_spans`	`list[DroppedSpan]`	Spans detected but filtered out before linking. `DroppedSpan` is a `NamedTuple` with fields `surface`, `start`, `end`, `pack_id`, `reason` — where `reason` is one of `short_input`, `sentinel`, `word_boundary`, `below_threshold`, `deny_list`, `code_case_mismatch`.

Methods

Method	Returns	Meaning
`__iter__`	`Iterator[ParsedEntity]`	Iterate over linked entities.
`__len__`	`int`	Number of linked entities.
`.to_dataframe()`	`pd.DataFrame`	Columns: `row_idx`, `surface`, `entity_id`, `entity_type`, `pack_id`, `status`, `confidence`, `start`, `end`, `to`. Requires `pandas`.

`ParsedEntity`¶

Represents a single entity mention extracted by parse() or parse_bulk().

Fields

Field	Type	Meaning
`surface`	`str`	The literal text span as it appears in the input.
`start`	`int`	Start character offset (inclusive).
`end`	`int`	End character offset (exclusive).
`entity_id`	`str \| None`	Resolved entity ID, e.g. `"country/KEN"`.
`entity_type`	`str \| None`	Entity type, e.g. `"geo.country"`.
`pack_id`	`str \| None`	Domain pack that produced the match.
`status`	`str`	Resolution status string (e.g. `"resolved"`, `"no_match"`).
`confidence`	`float \| None`	Calibrated confidence in `[0, 1]`.
`resolution`	`ResolutionResult \| None`	Full resolution result — call `.resolution.explain()` for a detailed scorecard.
`output`	`Any`	The `to=` pivot value, or `None` when `to` was not set.
`row_idx`	`int \| None`	Source-row index when produced by `parse_bulk()`; `None` for single-text `parse()`.
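
Because start is inclusive and end is exclusive, the two offsets can be used directly as a Python slice, and text[start:end] should equal surface. The sketch below shows one way to use the offsets to bracket each mention in the source text. Mention, find_mention, and annotate are hypothetical stand-ins that carry only the three offset-related fields; they are not resolvekit types.

```python
from typing import NamedTuple


class Mention(NamedTuple):  # hypothetical stand-in for ParsedEntity's offset fields
    surface: str
    start: int  # inclusive
    end: int    # exclusive


def find_mention(text: str, surface: str) -> Mention:
    # Build a mention from the first occurrence of `surface` in `text`.
    start = text.index(surface)
    return Mention(surface, start, start + len(surface))


def annotate(text: str, mentions: list[Mention]) -> str:
    # Work right to left so earlier offsets stay valid after each splice.
    for m in sorted(mentions, key=lambda m: m.start, reverse=True):
        text = text[: m.start] + f"[{m.surface}]" + text[m.end :]
    return text


text = "Flights from Kenya to Uganda"
mentions = [find_mention(text, "Kenya"), find_mention(text, "Uganda")]
print(annotate(text, mentions))  # Flights from [Kenya] to [Uganda]
```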

`ResolutionStatus`¶

StrEnum with four values. Every ResolutionResult.status is one of these — it's never None.

Value	String	Meaning
`RESOLVED`	`"resolved"`	A single entity matched with sufficient confidence.
`AMBIGUOUS`	`"ambiguous"`	Multiple plausible matches, none dominant.
`NO_MATCH`	`"no_match"`	No candidates found, or all below threshold.
`ERROR`	`"error"`	Internal pipeline error.

`SuggestionResult`¶

Frozen Pydantic model. One ranked suggestion returned by Resolver.suggest(). suggest() returns a list[SuggestionResult], best-first. Not root-exported; import the type from resolvekit.core.model:

from resolvekit.core.model import SuggestionResult, MatchClass

Fields

Field	Type	Meaning
`entity_id`	`str`	The matched entity ID, e.g. `"country/USA"`.
`canonical_name`	`str \| None`	The entity's canonical name.
`entity_type`	`str \| None`	Entity type string, e.g. `"geo.country"`.
`pack_id`	`str \| None`	Pack that produced the candidate.
`match_class`	`MatchClass`	How the candidate was found.
`fuzzy_score`	`float \| None`	Raw RapidFuzz `partial_ratio` (0–100). `None` unless `match_class == FUZZY`. A similarity score, not a calibrated confidence.
`ranking_quality`	`"ranked" \| "unranked"`	Tier-based honesty hint about the sort. `"ranked"` for tiers with prominence data — `geo.country` and the region tiers (`geo.subregion`, `geo.region`, `geo.continental_union`); `"unranked"` otherwise (continents, organizations, admin/city — match-class + alphabetical). Tier-based, not per-candidate: a country with no prominence value still reports `"ranked"`.
`display`	`str \| None`	The `to=`-rendered output string, or `canonical_name` when no `to=` was set. `None` on an output miss.
`highlight_ranges`	`list[tuple[int, int]]`	Unicode code-point offsets (not UTF-16), end-exclusive, into `display`. Empty for fuzzy matches and for matches that hit an alias rather than `display`. JS/browser callers must convert offsets.

Example

>>> from resolvekit import Resolver
>>> r = Resolver.lite()
>>> s = r.suggest("germany", top_k=1)[0]
>>> s.entity_id
'country/DEU'
>>> s.match_class
<MatchClass.EXACT_PREFIX: 'exact_prefix'>
>>> s.ranking_quality
'ranked'
>>> s.display
'Germany'
>>> s.highlight_ranges
[(0, 7)]
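
The highlight_ranges offsets count Unicode code points, which is how Python indexes str, while JavaScript strings are indexed in UTF-16 code units. A server that hands suggestions to a browser could convert the ranges before sending them. The following is a minimal sketch of that conversion; to_utf16_ranges is a hypothetical helper, not something resolvekit provides, and the flag-prefixed label is made up for the demonstration.

```python
def to_utf16_ranges(
    display: str, ranges: list[tuple[int, int]]
) -> list[tuple[int, int]]:
    """Convert end-exclusive code-point ranges into UTF-16 code-unit ranges."""
    # prefix[i] is the number of UTF-16 code units before code point i.
    prefix = [0]
    for ch in display:
        # Code points above U+FFFF need a surrogate pair (two code units).
        prefix.append(prefix[-1] + (2 if ord(ch) > 0xFFFF else 1))
    return [(prefix[start], prefix[end]) for start, end in ranges]


# Two regional-indicator code points (a flag emoji), a space, then "Germany".
label = "\U0001F1E9\U0001F1EA Germany"
print(label[3:10])                         # Germany
print(to_utf16_ranges(label, [(3, 10)]))   # [(5, 12)]
```

For text that stays inside the Basic Multilingual Plane the two offset systems coincide, so the conversion returns the ranges unchanged.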

`MatchClass`¶

StrEnum. Reports how a SuggestionResult candidate was matched. Not root-exported:

from resolvekit.core.model import MatchClass

The four values, in ranking order (best first):

Value	String	Meaning
`EXACT_PREFIX`	`"exact_prefix"`	The display/name starts with the query.
`TOKEN_PREFIX`	`"token_prefix"`	A word/token inside the name starts with the query.
`INFIX`	`"infix"`	The query appears mid-name.
`FUZZY`	`"fuzzy"`	A RapidFuzz near-match (typo tolerance).

suggest() sorts by a lexicographic cascade: (match_class, whole-name match, typo_count, -prominence, name-kind, name length, entity_id). So an exact prefix outranks a fuzzy match; among ties, an entity whose complete name was typed (e.g. an acronym like "EU") wins, then fewer typos, then more-prominent entities (where the tier is ranked).
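
A lexicographic cascade of this kind maps naturally onto a Python tuple sort key, where each element only matters when all earlier elements tie. The sketch below is a simplified illustration, not resolvekit's implementation: Candidate and its fields are hypothetical, the entity IDs and prominence numbers are invented, and the name-kind step is omitted.

```python
from dataclasses import dataclass

# Lower rank sorts first, mirroring the documented best-first order.
CLASS_RANK = {"exact_prefix": 0, "token_prefix": 1, "infix": 2, "fuzzy": 3}


@dataclass(frozen=True)
class Candidate:  # hypothetical stand-in, not resolvekit's internal type
    entity_id: str
    name: str
    match_class: str
    whole_name: bool = False
    typo_count: int = 0
    prominence: float = 0.0


def sort_key(c: Candidate) -> tuple:
    return (
        CLASS_RANK[c.match_class],
        not c.whole_name,  # False sorts before True, so whole-name matches win
        c.typo_count,
        -c.prominence,     # negate so larger prominence sorts first
        len(c.name),
        c.entity_id,       # final deterministic tie-break
    )


candidates = [
    Candidate("demo/3", "Eurasia", "fuzzy", typo_count=1, prominence=9.0),
    Candidate("demo/2", "Europe", "exact_prefix", prominence=5.0),
    Candidate("demo/1", "EU", "exact_prefix", whole_name=True, prominence=1.0),
]
ranked = sorted(candidates, key=sort_key)
print([c.entity_id for c in ranked])  # ['demo/1', 'demo/2', 'demo/3']
```

Note that the fuzzy candidate ranks last despite having the highest prominence, and the whole-name match beats a more prominent prefix match, because earlier positions in the tuple dominate later ones.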

`AugmentResult`¶

Returned by Resolver.augment(..., return_report=True). See the Resolver reference for the full augment signature.

Fields

Field	Type	Meaning
`resolver`	`Resolver`	The updated resolver with new attributes/codes attached.
`linked`	`int`	Rows that matched an existing entity and had their data merged in.
`minted`	`int`	Rows that matched no entity and were created as new entities (`on_miss="mint"`).
`skipped`	`int`	Rows that matched no entity and were dropped (`on_miss="skip"`).
`ambiguous`	`int`	Rows where the link key matched more than one entity.
`errors`	`list[str]`	Diagnostic messages for rows that raised during linking.

Heads up

augment requires a single-domain base. link_on accepts code systems (iso3, dcid, wikidata, …) and the "name" sentinel. Code-based linking is case-insensitive on the code value (the shared casefold normaliser applies, so "FRA", "fra", and "Fra" all link). Name-based linking (link_on=["name"]) requires at least one of add_aliases or add_codes to identify the name column.

Example

from resolvekit import Resolver

base = Resolver.from_records(
    [{"id": "KE", "label": "Kenya", "iso3": "KEN"},
     {"id": "UG", "label": "Uganda", "iso3": "UGA"}],
    domain="custom", name="label", id="id", codes={"iso3": "iso3"},
)
report = base.augment(
    [{"iso3": "KEN", "pop_m": 55}],
    link_on=["iso3"],
    add_attrs=["pop_m"],
    on_miss="skip",
    return_report=True,
)
report.linked, report.minted, report.skipped, report.ambiguous   # 1, 0, 0, 0
report.resolver.entity("Kenya").attributes["pop_m"]              # 55

Resolver.from_records and Resolver.augment are methods on Resolver, not module-level functions. See the Resolver reference for their full signatures.

`SentinelBlocklist`¶

from resolvekit import SentinelBlocklist

Immutable set of normalized forms that the resolver rejects before running the pipeline. Blocked inputs return NO_MATCH with reason "sentinel_blocked".

The default set covers common placeholders ("unknown", "n/a", "null", "tbd", …), junk strings ("qwerty", "lorem", …), pure-punctuation sequences ("---", "...", …), and specific pure-digit strings ("000", "999"). Strings longer than 20 characters are never blocked regardless of content.

SentinelBlocklist(
    *,
    extra: frozenset[str] | set[str] | None = None,
    replace: frozenset[str] | set[str] | None = None,
)

Parameter	Meaning
`extra`	Additional terms to block (merged with defaults). Normalized via casefold + strip.
`replace`	Replace the entire default set. When set, `extra` is ignored.

Methods

Method	Returns	Meaning
`.is_blocked(text)`	`bool`	Case-insensitive match against the blocked set.
`text in blocklist`	`bool`	Same as `.is_blocked(text)`.

Example

>>> bl = SentinelBlocklist()
>>> "unknown" in bl
True
>>> "Germany" in bl
False

>>> bl2 = SentinelBlocklist(extra={"myplaceholder"})
>>> "myplaceholder" in bl2
True

Pass a custom blocklist to Resolver.auto(sentinel_blocklist=...), or disable blocking entirely with Resolver.auto(sentinel_blocklist=None).
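
As a mental model of the rules described in this section (casefold plus strip normalization, and a 20-character cap above which nothing is blocked), the check can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not resolvekit's code: is_blocked and normalize are hypothetical names, the default set is a tiny subset, and measuring the length after normalization is a guess.

```python
MAX_SENTINEL_LENGTH = 20  # longer strings are never blocked, per the text above


def normalize(text: str) -> str:
    # Casefold + strip, as described for the `extra` parameter.
    return text.strip().casefold()


def is_blocked(text: str, blocked: frozenset[str]) -> bool:
    """Hypothetical model of the sentinel check; not resolvekit's implementation."""
    form = normalize(text)
    if len(form) > MAX_SENTINEL_LENGTH:  # assumption: length measured after normalizing
        return False
    return form in blocked


defaults = frozenset({"unknown", "n/a", "null", "tbd"})  # small subset for the demo
blocked = defaults | {normalize("  MyPlaceholder ")}     # like extra={"myplaceholder"}

print(is_blocked(" N/A ", blocked))          # True
print(is_blocked("MYPLACEHOLDER", blocked))  # True
print(is_blocked("Germany", blocked))        # False
```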

Errors¶

All public errors are importable from resolvekit or the dedicated resolvekit.errors namespace. The resolution errors most callers need:

from resolvekit import (
    ResolverError,
    ResolutionError,
    AmbiguousResolutionError,
    EntityNotFoundError,
    GroupNotFoundError,
)

Output-related errors introduced with configurable default output:

from resolvekit.errors import UnknownOutputError, OutputMissingError

`ResolverError`¶

Base class for all resolvekit errors. Carries an optional .hint attribute (a str | None) surfaced as a PEP 678 __notes__ entry — it appears in tracebacks automatically.

`ResolutionError(ResolverError)`¶

A resolution attempt did not produce a usable result.

Attribute	Type	Meaning
`.status`	`ResolutionStatus`	The status that triggered the error.
`.candidates`	`list[CandidateSummary]`	Available candidates (may be empty).

`AmbiguousResolutionError(ResolutionError)`¶

Raised by resolve_id() (default on_ambiguous="raise") when multiple entities match.

Attribute	Type	Meaning
`.candidates`	`list[CandidateSummary]`	The ambiguous candidates.

Example

from resolvekit import AmbiguousResolutionError

try:
    rk.resolve_id("Congo")
except AmbiguousResolutionError as e:
    entity_ids = [c.entity_id for c in e.candidates]
    # ['country/COD', 'country/COG']

`GroupNotFoundError(ResolutionError)`¶

Raised by Resolver.members_of() and Resolver.is_member() when the group string resolves to no entity.

`EntityNotFoundError(ResolutionError)`¶

Raised by Resolver.related() and Resolver.diagnostics.unresolved_relations() when the string entity_or_id argument matches no entity.

import resolvekit as rk
from resolvekit import EntityNotFoundError

try:
    rk.default().related("NoSuchPlaceXYZ")
except EntityNotFoundError as e:
    print(e)

`UnknownOutputError(ValueError, ResolverError)`¶

Raised at configuration or compile time when default_to (or a per-call to=) contains a malformed token or names a code system that no loaded pack carries.

from resolvekit.errors import UnknownOutputError

Attribute	Type	Meaning
`.token`	`str`	The unrecognised token.
`.available`	`list[str]`	Code and pivot names available in the relevant scope.

Carries .hint with difflib did-you-mean suggestions.
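
Did-you-mean suggestions of this kind are typically produced with difflib.get_close_matches from the standard library. The sketch below shows how a misspelled token could be matched against a list of available names; the did_you_mean helper, the cutoff, and the list contents are assumptions for illustration, and the hint text resolvekit actually produces may differ.

```python
import difflib

# Example names taken from the `to=` description; a real scope may offer others.
available = ["iso2", "iso3", "name", "flag", "aliases", "dcid"]


def did_you_mean(token: str, choices: list[str]) -> list[str]:
    # Returns up to n choices whose similarity is at least cutoff, best first.
    return difflib.get_close_matches(token, choices, n=3, cutoff=0.6)


suggestions = did_you_mean("iso33", available)
print(suggestions[0])  # iso3
```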

`OutputMissingError(ResolverError)`¶

Raised at runtime when a resolved entity (and the full fallback chain) has no value for the requested output, under on_missing="raise" (or on_missing="auto" for scalar resolve()/snap()).

from resolvekit.errors import OutputMissingError

Attribute	Type	Meaning
`.entity_id`	`str`	The entity that was resolved but lacked the output.
`.requested`	`str`	The output token that was requested (last in the fallback chain).
`.available_codes`	`list[str]`	Code systems the entity does carry.

Carries .hint listing the available codes.

`CrosswalkError(ResolverError)`¶

Added in v0.1.

Raised by bulk when a Crosswalk built with strict=True (the default) maps one or more values to entity IDs that no loaded pack carries — typically a crosswalk saved before a data rebuild that changed IDs.

from resolvekit.errors import CrosswalkError

Attribute	Type	Meaning
`.offenders`	`list[str]`	The unknown entity IDs found in the crosswalk.

Carries .hint pointing to strict=False as the way to downgrade unknown IDs to per-value misses. Rebuild the crosswalk with Crosswalk.from_dict(..., strict=False) / from_csv(..., strict=False) to apply it.

Next¶

Resolver class reference — constructors (auto, lite, from_modules), concurrency notes, and the full method list including members_of, diagnostics, and context-manager protocol.

Convert between code systems — end-to-end patterns for from_system / to pivots and bulk column normalization.
