Package: zoomerjoin 0.2.2

zoomerjoin: Superlatively Fast Fuzzy Joins

Empowers users to fuzzily merge data frames with millions or tens of millions of rows in minutes with low memory usage. The package uses the locality-sensitive hashing algorithms developed by Datar, Immorlica, Indyk and Mirrokni (2004) <doi:10.1145/997817.997857> and Broder (1998) <doi:10.1109/SEQUEN.1997.666900> to avoid comparing every pair of records across the two datasets, resulting in fuzzy merges that finish in linear time.
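
The idea of avoiding all-pairs comparison can be illustrated with a minimal base-R sketch of MinHash with banding, the Jaccard-similarity flavour of locality sensitive hashing. This is not zoomerjoin's implementation: the helper names (`shingles`, `jaccard`, `signature`), the toy records, and the settings (20 bands of width 2, a 0.6 similarity threshold) are all assumptions chosen for illustration, and random permutations stand in for hash functions.

```r
# Illustrative MinHash + banding sketch in base R (NOT zoomerjoin's code).
set.seed(1)

# Split a string into its set of overlapping character n-grams.
shingles <- function(s, n = 2) {
  s <- tolower(s)
  if (nchar(s) < n) return(s)
  unique(substring(s, 1:(nchar(s) - n + 1), n:nchar(s)))
}

# Exact Jaccard similarity, used only to verify candidate pairs.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Hypothetical toy data.
left  <- c("Alice Johnson", "Robert Smith", "Maria Garcia")
right <- c("Alice Jonson", "Bob Smith", "Maria Garcia", "Wei Chen")

sets     <- lapply(c(left, right), shingles)
universe <- unique(unlist(sets))

n_bands    <- 20   # assumed setting
band_width <- 2    # assumed setting
n_hashes   <- n_bands * band_width

# Each column is one random permutation of the shingle universe,
# standing in for one hash function.
perms <- replicate(n_hashes, sample(length(universe)))

# MinHash signature: the smallest permuted rank among a record's shingles,
# computed once per permutation.
signature <- function(set) {
  idx <- match(set, universe)
  apply(perms[idx, , drop = FALSE], 2, min)
}
sigs <- t(vapply(sets, signature, numeric(n_hashes)))  # records x hashes

# Banding: records that agree on every hash within a band share a bucket.
# Only pairs that share at least one bucket become candidates.
band_id <- rep(seq_len(n_bands), each = band_width)
l_idx   <- seq_along(left)
r_idx   <- length(left) + seq_along(right)

candidates <- NULL
for (b in seq_len(n_bands)) {
  keys <- apply(sigs[, band_id == b, drop = FALSE], 1, paste, collapse = "-")
  hits <- merge(
    data.frame(key = keys[l_idx], i = l_idx),
    data.frame(key = keys[r_idx], j = seq_along(right)),
    by = "key"
  )
  candidates <- rbind(candidates, hits[, c("i", "j")])
}
candidates <- unique(candidates)

# Verify only the candidate pairs with the exact similarity.
candidates$similarity <- vapply(
  seq_len(nrow(candidates)),
  function(k) jaccard(sets[[candidates$i[k]]],
                      sets[[length(left) + candidates$j[k]]]),
  numeric(1)
)
matches <- candidates[candidates$similarity >= 0.6, ]
result  <- data.frame(
  left       = left[matches$i],
  right      = right[matches$j],
  similarity = matches$similarity
)
print(result)
```

Identical records always collide in every band, while similar records collide in at least one band with high probability. Raising the number of bands catches more borderline pairs at the cost of more candidates to verify; widening the bands does the reverse.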

Authors: Beniamino Green [aut, cre, cph], Etienne Bacher [ctb], The authors of the dependency Rust crates [ctb, cph]

zoomerjoin_0.2.2.tar.gz
zoomerjoin_0.2.2.zip (r-4.7), zoomerjoin_0.2.2.zip (r-4.6), zoomerjoin_0.2.2.zip (r-4.5)
zoomerjoin_0.2.2.tgz (r-4.6-x86_64), zoomerjoin_0.2.2.tgz (r-4.6-arm64), zoomerjoin_0.2.2.tgz (r-4.5-x86_64), zoomerjoin_0.2.2.tgz (r-4.5-arm64)
zoomerjoin_0.2.2.tar.gz (r-4.7-arm64), zoomerjoin_0.2.2.tar.gz (r-4.7-x86_64), zoomerjoin_0.2.2.tar.gz (r-4.6-arm64), zoomerjoin_0.2.2.tar.gz (r-4.6-x86_64)
zoomerjoin_0.2.2.tgz (r-4.6-emscripten)

# Install 'zoomerjoin' in R:
install.packages('zoomerjoin', repos = c('https://beniaminogreen.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/beniaminogreen/zoomerjoin/issues


Topics: blazinglyfast, fuzzyjoin, join, rust, zoomer, cargo

8.27 score, 108 stars, 2 packages, 12 scripts, 627 downloads, 25 exports, 22 dependencies

Last updated from: d441bcdea9. Checks: 9 OK, 4 NOTE. Indexed: yes.

Target                 Result  Time
linux-devel-arm64      OK      189
linux-devel-x86_64     OK      195
source / vignettes     OK      291
linux-release-arm64    OK      183
linux-release-x86_64   OK      201
macos-release-arm64    NOTE    191
macos-release-x86_64   NOTE    342
macos-oldrel-arm64     OK      225
macos-oldrel-x86_64    OK      449
windows-devel          NOTE    266
windows-release        NOTE    276
windows-oldrel         OK      196
wasm-release           OK      173

Exports: em_link, euclidean_anti_join, euclidean_full_join, euclidean_inner_join, euclidean_left_join, euclidean_probability, euclidean_right_join, fuzzy_join_core, hamming_anti_join, hamming_distance, hamming_full_join, hamming_inner_join, hamming_left_join, hamming_probability, hamming_right_join, jaccard_anti_join, jaccard_curve, jaccard_full_join, jaccard_hyper_grid_search, jaccard_inner_join, jaccard_left_join, jaccard_probability, jaccard_right_join, jaccard_similarity, jaccard_string_group

Dependencies: cli, collapse, cpp11, dplyr, generics, glue, lifecycle, magrittr, pillar, pkgconfig, purrr, R6, Rcpp, rlang, stringi, stringr, tibble, tidyr, tidyselect, utf8, vctrs, withr

A Zoomerjoin Guided Tour

Rendered from guided_tour.Rmd using knitr::rmarkdown on May 17 2026.

Last update: 2026-03-13
Started: 2023-03-09

Benchmarks

Rendered from benchmarks.Rmd using knitr::rmarkdown on May 17 2026.

Last update: 2024-09-23
Started: 2023-03-09

Matching Vectors Based on Euclidean Distance

Rendered from matching_vectors.Rmd using knitr::rmarkdown on May 17 2026.

Last update: 2024-02-14
Started: 2023-08-06