Zonal Matching
Meerkat DSA supports Zonal Matching, a form of Mapping-Based Matching (defined in ITU Recommendation X.501 (2019), Section 13.6.2) that is itself defined in ITU Recommendation X.520 (2019), Section 8.8. Essentially, if a user-provided search filter that selects entries in a specific locality (e.g. city, township) yields no results, Meerkat DSA can, if the user requests it, perform a geographically intelligent replacement of the excessively restrictive search filter items with a match on "zones." In Meerkat DSA's case, postal codes are used as "zones."
This can be useful if, say, you are searching for a person who may not technically live in C=US,ST=FL,L=Tampa, but might live within the greater outlying "Tampa metropolitan area." If requested, and if no entries under C=US,ST=FL had the locality name Tampa (as specified in the search filter), the search filter item asserting Tampa would be replaced with one or more equality assertions of the postalCode attribute, whose asserted values would be postal codes within and surrounding Tampa.
Meerkat DSA's Zonal Matching Definition
There are no specific implementations of zonal matching defined in the X.500 specifications, and--as far as I know--none were ever defined anywhere. So Meerkat DSA had to define its own zonal matching. Below is its ASN.1 specification, followed by an explanation of how it works.
id-zmr-postalZonalMatch OBJECT IDENTIFIER ::= { 1 3 6 1 4 1 56490 58 1 }
postalZonalMatch ZONAL-MATCHING ::= {
    SELECT BY {
        id-at-countryName
        | id-at-stateOrProvinceName
        | id-at-localityName }
    APPLICABLE TO { stateOrProvinceName | localityName }
    SUBTYPES INCLUDED TRUE
    COMBINABLE TRUE
    USER CONTROL TRUE
    EXCLUSIVE TRUE
    MATCHING RULE zonalMatch.&id
    ID id-zmr-postalZonalMatch
}
The above means that, if zonal matching is requested, it will be chosen if the base object name and the search filter together can produce countryName, stateOrProvinceName, and localityName assertions. If any one of these is missing, this zonal matching will not be used. Since this is the only zonal matching implemented in Meerkat DSA, no zonal matching will be used at all in that case.
The above specification also indicates that localityName and stateOrProvinceName will be replaced by the assertions this zonal mapping produces in the search filter.
This implementation is combinable, meaning that multiple filter items can be used to produce a mapping. It is user-controlled, meaning that the user can specify different levels of breadth with which to expand the search area (e.g. 20 miles outside of the city or just 5). It is exclusive, which means that the user can specify that they want their search to return only the marginal set of results that were not present in the unrelaxed result set.
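The selection condition described above can be sketched as follows. This is only an illustration, not Meerkat DSA's actual code: the Assertion type and the zonalMatchApplies function are hypothetical names, and real names and filters are considerably richer than flat type/value pairs.

```typescript
// Hypothetical, simplified model: an assertion is just a type/value pair.
interface Assertion {
  type: string;  // e.g. "countryName"
  value: string; // e.g. "US"
}

// The attribute types listed in the SELECT BY clause of postalZonalMatch.
const SELECT_BY: string[] = ["countryName", "stateOrProvinceName", "localityName"];

// Returns true only if the base object name and the filter, taken together,
// supply an assertion for every SELECT BY attribute type.
function zonalMatchApplies(baseObject: Assertion[], filterItems: Assertion[]): boolean {
  const all = baseObject.concat(filterItems);
  return SELECT_BY.every((t) => all.some((a) => a.type === t));
}

// Base object C=US,ST=FL
const base: Assertion[] = [
  { type: "countryName", value: "US" },
  { type: "stateOrProvinceName", value: "FL" },
];
const withLocality: Assertion[] = [{ type: "localityName", value: "Tampa" }];
const withoutLocality: Assertion[] = [{ type: "commonName", value: "Jane Doe" }];

console.log(zonalMatchApplies(base, withLocality));    // true
console.log(zonalMatchApplies(base, withoutLocality)); // false
```

In the second call the locality is missing, so this zonal matching would not be selected.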
Meerkat DSA's Zonal Matching Algorithm
As stated earlier, when zonal matching is requested, and when the proper attribute value assertions are present in the base object name and search filter, the asserted values in the filter (not in the base object name) are replaced with postalCode assertions whose asserted values are the postal codes associated with that locality. This corresponds to a zonal "area" of 0. These postal codes are associated with one or more longitude-latitude points, which are queried from the "gazetteer" and sorted.
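As a rough sketch of the level-0 replacement described above, assume the gazetteer can be viewed as a list of rows. The GazetteerEntry shape, the function names, and the postal codes below are all placeholders invented for illustration; the real database columns and values may differ.

```typescript
// Hypothetical row shape; the real gazetteer columns may differ.
interface GazetteerEntry {
  country: string;
  stateOrProvince: string;
  locality: string;
  postalCode: string;
}

// Placeholder data: these postal codes are made up, not real ones.
const gazetteer: GazetteerEntry[] = [
  { country: "US", stateOrProvince: "FL", locality: "Tampa", postalCode: "TPA-2" },
  { country: "US", stateOrProvince: "FL", locality: "Tampa", postalCode: "TPA-1" },
  { country: "US", stateOrProvince: "FL", locality: "Orlando", postalCode: "ORL-1" },
];

// Level 0: every postal code recorded for exactly this locality.
function levelZeroPostalCodes(
  entries: GazetteerEntry[],
  country: string,
  stateOrProvince: string,
  locality: string,
): string[] {
  const codes = entries
    .filter((e) => e.country === country
      && e.stateOrProvince === stateOrProvince
      && e.locality === locality)
    .map((e) => e.postalCode);
  // Remove duplicates, then sort for a stable result.
  return codes.filter((c, i) => codes.indexOf(c) === i).sort();
}

// The filter items that would stand in for the locality assertion.
function toPostalCodeAssertions(codes: string[]): { type: string; value: string }[] {
  return codes.map((value) => ({ type: "postalCode", value }));
}

const tampaCodes = levelZeroPostalCodes(gazetteer, "US", "FL", "Tampa");
console.log(toPostalCodeAssertions(tampaCodes));
```

Exact string comparison is used here for brevity; it is an assumption of this sketch, not a statement about how Meerkat DSA compares names.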
If a greater area is specified, the "diameter" of the original area is obtained by defining a box whose lower bound is the southernmost point, whose upper bound is the northernmost point, whose left bound is the westernmost point, and whose right bound is the easternmost point. The length of this box's diagonal is defined as the "diameter," for our purposes. ("Hypotenuse" would be a more technically correct term, but since zonal matching is conceptually thought of as outwardly-expanding concentric circles, and since it is "fuzzy" by nature, we do not have to nitpick these terms.)
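The box and its "diameter" can be computed along these lines. This is a minimal sketch: the Point and Box types and both function names are hypothetical, and points are assumed to already be expressed as northing and easting values in meters.

```typescript
// Hypothetical types; coordinates are assumed to be in meters.
interface Point { northing: number; easting: number; }
interface Box { south: number; north: number; west: number; east: number; }

// Smallest axis-aligned box containing every point.
function boundingBox(points: Point[]): Box {
  if (points.length === 0) {
    throw new Error("at least one point is required");
  }
  const northings = points.map((p) => p.northing);
  const eastings = points.map((p) => p.easting);
  return {
    south: Math.min(...northings),
    north: Math.max(...northings),
    west: Math.min(...eastings),
    east: Math.max(...eastings),
  };
}

// The "diameter" is the length of the box's diagonal.
function diameter(box: Box): number {
  const height = box.north - box.south;
  const width = box.east - box.west;
  return Math.sqrt(height * height + width * width);
}

const samplePoints: Point[] = [
  { northing: 0, easting: 0 },
  { northing: 3000, easting: 4000 },
  { northing: 1000, easting: 2000 },
];
const sampleBox = boundingBox(samplePoints);
console.log(sampleBox);           // { south: 0, north: 3000, west: 0, east: 4000 }
console.log(diameter(sampleBox)); // 5000
```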
Each subsequent "area" or "level" of the zonal relaxation moves each bound of the box described above outward by a further R / L (added to or subtracted from the bound, whichever makes the box bigger), where R is the "radius" of the level-0 area, and L is the level. (Note that this only applies above level 0, so no division by zero happens.) In other words, at level 1, all four edges of the box expand outwards by R. At level 2, the total expansion is R + R/2; at level 3, it is R + R/2 + R/3; and so on.
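The expansion rule above amounts to a running sum of R / L. The sketch below assumes that the "radius" R is half of the level-0 "diameter" (the text does not spell this out), and the Box type and function names are hypothetical.

```typescript
interface Box { south: number; north: number; west: number; east: number; }

// Total outward movement of each edge at a given level:
// R/1 + R/2 + ... + R/level. Level 0 means no expansion.
function cumulativeExpansion(radius: number, level: number): number {
  let total = 0;
  for (let l = 1; l <= level; l++) {
    total += radius / l;
  }
  return total;
}

// Pushes all four edges of the level-0 box outward for the given level.
function expandBox(levelZeroBox: Box, radius: number, level: number): Box {
  const d = cumulativeExpansion(radius, level);
  return {
    south: levelZeroBox.south - d,
    north: levelZeroBox.north + d,
    west: levelZeroBox.west - d,
    east: levelZeroBox.east + d,
  };
}

// Assumption: R is half of a 5000 m "diameter".
const R = 5000 / 2;
console.log(cumulativeExpansion(R, 0)); // 0
console.log(cumulativeExpansion(R, 1)); // 2500
console.log(cumulativeExpansion(R, 2)); // 3750
console.log(expandBox({ south: 0, north: 3000, west: 0, east: 4000 }, R, 1));
```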
The rationale for this seemingly odd algorithm is that, as the "diameter" of a circle doubles, the area within that circle quadruples. Likewise, if the box expanded by a constant length at each level, the area captured would grow quadratically, and so would the number of entries evaluated as the zonal match is relaxed. Instead, we want an algorithm that is inclined to return a roughly constant number of entries at each expansion of the area. Dividing the added radius by the level will result in the box expanding by a decreasing length at each iteration, but the area added by each iteration will be less volatile.
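To get a feel for this, the sketch below compares the area added at each level under a constant step with the area added under the R / L step. The box dimensions and R are arbitrary example numbers, and the function names are hypothetical.

```typescript
// Area of a w-by-h box after every edge has been pushed outward by e.
function expandedArea(w: number, h: number, e: number): number {
  return (w + 2 * e) * (h + 2 * e);
}

// Area newly added at each level 1..levels, given a per-level step size.
function marginalAreas(
  w: number,
  h: number,
  levels: number,
  step: (level: number) => number,
): number[] {
  const added: number[] = [];
  let e = 0;
  for (let l = 1; l <= levels; l++) {
    const next = e + step(l);
    added.push(expandedArea(w, h, next) - expandedArea(w, h, e));
    e = next;
  }
  return added;
}

// Arbitrary example numbers, in meters.
const width = 4000;
const height = 3000;
const radius = 2500;

const constantStep = marginalAreas(width, height, 5, () => radius);
const shrinkingStep = marginalAreas(width, height, 5, (l) => radius / l);

// Difference between the largest and smallest per-level addition.
function spread(values: number[]): number {
  return Math.max(...values) - Math.min(...values);
}

console.log("constant step:", constantStep, "spread:", spread(constantStep));
console.log("R / L step:   ", shrinkingStep, "spread:", spread(shrinkingStep));
```

With these particular numbers, the area added under a constant step keeps growing from level to level, while the R / L step keeps the per-level additions much closer together. This only illustrates area; the number of entries returned also depends on how densely populated each zone is.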
The algorithm doesn't end here. Once the box for a given zonal level is determined, all postal codes that have at least one point within that box are selected as the replacement postalCode assertions for the localityName and stateOrProvinceName assertions in the filter. It is a known and understood drawback of this algorithm that this may result in a really jagged, irregular search area.
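The final selection step can be sketched like this. The BoundaryPoint and Box shapes and the function name are hypothetical, the postal codes are placeholders, and treating points that lie exactly on an edge as inside the box is an assumption of this sketch.

```typescript
// Hypothetical shapes; postal codes below are placeholders.
interface BoundaryPoint { postalCode: string; northing: number; easting: number; }
interface Box { south: number; north: number; west: number; east: number; }

// A postal code is selected as soon as any one of its points is in the box.
function postalCodesInBox(points: BoundaryPoint[], box: Box): string[] {
  const hits = points
    .filter((p) => p.northing >= box.south && p.northing <= box.north
      && p.easting >= box.west && p.easting <= box.east)
    .map((p) => p.postalCode);
  // Remove duplicates, then sort for a stable result.
  return hits.filter((c, i) => hits.indexOf(c) === i).sort();
}

const searchBox: Box = { south: 0, north: 100, west: 0, east: 100 };
const boundaryPoints: BoundaryPoint[] = [
  { postalCode: "A", northing: 10, easting: 10 },   // inside
  { postalCode: "A", northing: 20, easting: 20 },   // inside
  { postalCode: "B", northing: 99, easting: 50 },   // inside
  { postalCode: "B", northing: 500, easting: 50 },  // outside
  { postalCode: "C", northing: 500, easting: 500 }, // outside
];

console.log(postalCodesInBox(boundaryPoints, searchBox)); // [ 'A', 'B' ]
```

Postal code B is selected even though only one of its points falls inside the box, which is exactly how the jagged search area mentioned above can arise.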
Nuances
- None of the selected locale attributes in the distinguished name of the base object are replaced. In other words, if you search the subtree under C=US,ST=FL, no amount of zonal relaxation will make your search cross the Floridian border into Georgia and return results under C=US,ST=GA.
- The algorithm never returns multiple-mappings, even if there are two separate real localities that have exactly the same name within the country and state-or-province. In such a case, all of their postal codes will be considered as one, leading to some very strange results.
- Your database is empty by default, so zonal matching will not work at all unless you seed it with postal codes and their geographic coordinates.
Seeding the Gazetteer (Zonal Mapping Database)
To populate the gazetteer (the internal database used for zonal matching), you will need to seed both PostalCodesGazetteEntry and PostalCodeBoundaryPoints. The former contains the country, state-or-province, and locality name associated with each postal code. The latter contains one or more geographic coordinates corresponding to points on the boundaries of the postal code regions. These coordinates are composed of northing and easting components, which are meters (positive or negative) from the equator and the prime meridian, respectively.
Despite the name, you do not have to insert the coordinates of the boundaries of the postal code region in your gazetteer. You can instead insert any single point within the postal code region. This prevents the seemingly absurd scenario of a postal code being included in the mapping just because a single boundary point falls within the box described above. Boundary points were selected just because they are readily available online; anything else, such as a "center of gravity," would have to be calculated from this data.
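If you would rather seed one representative point per postal code, one simple option is to average the boundary points of each postal code. The sketch below does that; note that the mean of the boundary points is only an approximation of a true "center of gravity," and it can even fall outside a strangely shaped region. The BoundaryPoint shape and the function name are hypothetical.

```typescript
// Hypothetical shape; the real seed file columns may differ.
interface BoundaryPoint { postalCode: string; northing: number; easting: number; }
interface Center { northing: number; easting: number; }

// Averages the boundary points of each postal code into a single point.
function approximateCenters(points: BoundaryPoint[]): Record<string, Center> {
  const sums: Record<string, { count: number; northing: number; easting: number }> = {};
  for (const p of points) {
    if (!sums[p.postalCode]) {
      sums[p.postalCode] = { count: 0, northing: 0, easting: 0 };
    }
    const s = sums[p.postalCode];
    s.count += 1;
    s.northing += p.northing;
    s.easting += p.easting;
  }
  const centers: Record<string, Center> = {};
  for (const code of Object.keys(sums)) {
    const s = sums[code];
    centers[code] = { northing: s.northing / s.count, easting: s.easting / s.count };
  }
  return centers;
}

// A square region "A" (placeholder code) with corners 10 m apart.
const squareRegion: BoundaryPoint[] = [
  { postalCode: "A", northing: 0, easting: 0 },
  { postalCode: "A", northing: 0, easting: 10 },
  { postalCode: "A", northing: 10, easting: 10 },
  { postalCode: "A", northing: 10, easting: 0 },
];

console.log(approximateCenters(squareRegion)); // { A: { northing: 5, easting: 5 } }
```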
USA-only Gazetteer Seed
As part of developing Meerkat DSA, a USA-only dataset for the gazetteer was created, and made freely available. Because these files were too large to commit to Meerkat DSA's git repository, they are available instead as downloads from blob storage. There is no license included with them, but if it matters at all, I hereby release them under an MIT license. You can do whatever you want with this data. I won't sue.
First, download the files. Using the curl command found on many unix-like systems, you can run these commands from within the root of the cloned Meerkat DSA repo:
mkdir -p data/zonal
curl https://wildboarprod.blob.core.windows.net/public-data/boundary.csv -o data/zonal/boundary.csv
curl https://wildboarprod.blob.core.windows.net/public-data/gazette.csv -o data/zonal/gazette.csv
If you have your DATABASE_URL environment variable defined, you can then run node ./tools/seed.mjs (provided that you have Node.js installed). This will take a minute or two, but it will load up your database with the gazetteer data.
You may set your DATABASE_URL in the root-level .env file. Just make sure not to commit it, since this file is not in .gitignore!