URI Transmutation

This page is a work in progress. Please contact us if you have any questions.

URI transmutation is the process of converting a URI into the set of URIs equivalent to it. Two URIs are equivalent if they identify the same resource, directly or indirectly.

URI normalization

The first step of the transmutation process is to normalize the input URI. URI normalization (also known as canonicalization or standardization) is defined in part by RFC 3986, the RFC for the generic syntax of URIs, but most rules depend on the scheme of the input URI. For example, DOIs are case-insensitive, and ORCID iDs should be hyphenated.
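As an illustration of scheme-specific rules, here is a minimal Python sketch of the two examples above. It is not the service's implementation: the function names, the `doi:` and `orcid:` prefixes for the inputs, and the choice of lowercase as the canonical form for DOIs are all assumptions made for this example.

```python
import re


def normalize_doi(uri: str) -> str:
    """Normalize a doi: URI. DOIs are case-insensitive, so one case must be
    picked as canonical; lowercase is an assumption of this sketch."""
    scheme, _, name = uri.partition(":")
    if scheme.lower() != "doi" or not name:
        raise ValueError(f"not a DOI URI: {uri!r}")
    return "doi:" + name.strip().lower()


def normalize_orcid(uri: str) -> str:
    """Normalize an orcid: URI to four hyphen-separated groups of four characters."""
    scheme, _, name = uri.partition(":")
    if scheme.lower() != "orcid":
        raise ValueError(f"not an ORCID URI: {uri!r}")
    # Drop any existing separators, then check the shape: 15 digits + check character.
    compact = name.replace("-", "").replace(" ", "").upper()
    if not re.fullmatch(r"\d{15}[\dX]", compact):
        raise ValueError(f"malformed ORCID iD: {name!r}")
    # Re-insert the hyphens in the canonical positions.
    return "orcid:" + "-".join(compact[i:i + 4] for i in range(0, 16, 4))
```

For instance, `normalize_orcid("orcid:0000000218250097")` returns `orcid:0000-0002-1825-0097`. The sketch checks only the shape of the identifier, not its check digit.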

URI equivalence

Two URIs are considered to be equivalent if they identify the same resource.

For some URIs, equivalence can be computed on the fly. For others, it must be learned from a database. For example, ORCID iDs are assigned from a well-documented, reserved block of ISNI identifiers, so every ORCID iD is a valid ISNI identifier, and we know which ISNI identifiers can be converted to ORCID iDs. On the other hand, although pmid:23193287, pmcid:3531190, and doi:10.1093/NAR/GKS1195 refer to the same publication, the three URIs have nothing in common, so we must learn the relation from data made available by PubMed.
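The difference between computed and learned equivalence can be sketched in Python as follows. This is only an illustration under stated assumptions: the `orcid:` and `isni:` prefixes, the compact (unhyphenated) ISNI form, the function names, and the in-memory `LEARNED` table standing in for a database are all hypothetical. The single table entry is the publication example from the text.

```python
# Stand-in for a database of learned relations (hypothetical structure).
# Each set groups URIs known to identify the same resource.
LEARNED = [
    {"pmid:23193287", "pmcid:3531190", "doi:10.1093/NAR/GKS1195"},
]


def computed_equivalents(uri: str) -> set:
    """Equivalents derivable from the URI alone, with no lookup."""
    scheme, _, name = uri.partition(":")
    if scheme == "orcid":
        # Every ORCID iD is a valid ISNI identifier: same 16 characters,
        # written here without hyphens (an assumption of this sketch).
        return {"isni:" + name.replace("-", "")}
    return set()


def learned_equivalents(uri: str) -> set:
    """Equivalents that can only be found in the learned relations."""
    found = set()
    for group in LEARNED:
        if uri in group:
            found |= group - {uri}
    return found


def equivalents(uri: str) -> set:
    """All equivalents of an already-normalized URI known to this sketch."""
    return computed_equivalents(uri) | learned_equivalents(uri)
```

Only the ORCID to ISNI direction is shown. The reverse direction would also need the boundaries of the reserved block, which are left out here rather than guessed.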

URI interpolation

The URI transmutation API will soon include an option to disable interpolations. Please contact us if you are interested.

One small class of equivalence rules requires special attention, because these rules make simplifying assumptions about URIs and URI equivalence. We call them interpolations because they allow us to simplify the transmutation process and improve the user experience without affecting any of the important conclusions drawn from the data.

The transmutation API currently makes the following interpolations:

  • Any HTTP URI is considered to be equivalent to the same URI with the HTTPS scheme, and vice versa;
  • For HTTP and HTTPS URIs, any URI host prefixed with the www subdomain is considered to be equivalent to the same host without the prefix, and vice versa;
  • For HTTP and HTTPS URIs, any URI whose path ends with a trailing forward slash is considered to be equivalent to the same URI without the trailing slash, and vice versa;
  • For HTTP and HTTPS URIs, any URI with a fragment identifier is considered to be equivalent to the same URI without the fragment;
  • For HTTP and HTTPS URIs, any URI returned in a Link header of the response to an HTTP HEAD request is considered to be equivalent if it is typed with one of the following relation types: alternate, bookmark, canonical, cite-as, duplicate, identifier, latest-version, memento, predecessor-version, self, successor-version, working-copy-of. Note that the result of this interpolation can change over time, so it requires the use of the X-Release: unstable header. See the page on HTTP headers for more information.
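The filtering step of the last interpolation can be sketched in Python as follows. The sketch works on a Link header value that has already been fetched, so it makes no network request. The function name and the regular expressions are assumptions of this example, and the parsing is deliberately simplified: it does not handle every case allowed by the Link header grammar, such as a "<" inside a quoted parameter value.

```python
import re

# Relation types treated as equivalence, taken from the list above.
EQUIVALENT_RELS = frozenset({
    "alternate", "bookmark", "canonical", "cite-as", "duplicate",
    "identifier", "latest-version", "memento", "predecessor-version",
    "self", "successor-version", "working-copy-of",
})


def equivalent_link_targets(link_header: str) -> list:
    """Return the targets of the links whose rel matches an equivalence type."""
    targets = []
    # Each link is "<target>" followed by its parameters, up to the next "<".
    for match in re.finditer(r"<([^>]*)>([^<]*)", link_header):
        target, params = match.group(1), match.group(2)
        rel = re.search(r'\brel\s*=\s*"?([^";,]*)', params, re.IGNORECASE)
        if rel is None:
            continue
        # A rel parameter may carry several space-separated relation types.
        rels = set(rel.group(1).lower().split())
        if rels & EQUIVALENT_RELS:
            targets.append(target)
    return targets
```

For example, given the header value `<https://example.org/a>; rel="cite-as", <https://example.org/b>; rel="item"`, the function keeps only `https://example.org/a`.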

These assumptions do not hold in all cases, but they hold in most cases and the impact of false positives is minimal.
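The four purely syntactic interpolations listed above (scheme, www prefix, trailing slash, fragment) can be combined in a short Python sketch. It is an illustration rather than the API's implementation: the function name is hypothetical, and the input is assumed to be already normalized, with a lowercase host and no userinfo component.

```python
from itertools import product
from urllib.parse import urlsplit, urlunsplit


def interpolate(uri: str) -> set:
    """Return the URIs treated as equivalent to `uri` (input excluded)."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        return set()  # the interpolations only apply to HTTP and HTTPS URIs

    # www prefix: with and without.
    host = parts.netloc
    bare = host[4:] if host.startswith("www.") else host
    hosts = {bare, "www." + bare}

    # Trailing slash: with and without (only one slash is added or removed).
    stripped = parts.path[:-1] if parts.path.endswith("/") else parts.path
    paths = {stripped, stripped + "/"}

    # Fragment: it can be dropped, but none is ever invented.
    fragments = {parts.fragment, ""}

    variants = {
        urlunsplit((scheme, h, p, parts.query, f))
        for scheme, h, p, f in product(("http", "https"), hosts, paths, fragments)
    }
    variants.discard(uri)
    return variants
```

For example, `interpolate("http://www.example.org/page/#intro")` contains `https://example.org/page`, among other variants. Generating every combination keeps the sketch simple; an implementation that needs to limit false positives could apply the rules more selectively.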

Data sources

Coming soon.