Package: zoomerjoin 0.2.0

zoomerjoin: Superlatively Fast Fuzzy Joins

Empowers users to fuzzily-merge data frames with millions or tens of millions of rows in minutes with low memory usage. The package uses the locality sensitive hashing algorithms developed by Datar, Immorlica, Indyk and Mirrokni (2004) <doi:10.1145/997817.997857>, and Broder (1998) <doi:10.1109/SEQUEN.1997.666900> to avoid having to compare every pair of records in each dataset, resulting in fuzzy-merges that finish in linear time.
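The idea behind the description above can be sketched in a few lines of base R. This is only an illustration of MinHash with banding (the Broder-style scheme for Jaccard similarity), not the package's Rust implementation: the toy names, the helper functions `shingles`, `minhash` and `band_keys`, and all parameter values are assumptions made for this example. Records are compared only when they share at least one band key, so most pairs are never examined.

```r
# Illustrative MinHash + banding sketch (assumption: NOT zoomerjoin's own code).
set.seed(1)

# Hypothetical toy data.
a <- c("Acme Widgets Inc", "Globex Corporation", "Initech LLC")
b <- c("Acme Widgets Inc", "Globex Corporatoin", "Umbrella Group")

# Set of character n-grams ("shingles") of a string.
shingles <- function(s, n = 2) {
  s <- tolower(s)
  if (nchar(s) <= n) return(s)
  starts <- seq_len(nchar(s) - n + 1)
  unique(substring(s, starts, starts + n - 1))
}

# Exact Jaccard similarity of two sets.
jaccard <- function(x, y) length(intersect(x, y)) / length(union(x, y))

sets_a <- lapply(a, shingles)
sets_b <- lapply(b, shingles)
vocab  <- unique(unlist(c(sets_a, sets_b)))

n_bands    <- 20
band_width <- 2
n_hash     <- n_bands * band_width

# Each row is a random permutation of the vocabulary, acting as one hash function.
perms <- t(replicate(n_hash, sample(length(vocab))))

# MinHash signature: the smallest permuted index among the set's shingles.
minhash <- function(set) {
  idx <- match(set, vocab)
  apply(perms[, idx, drop = FALSE], 1, min)
}

# Cut a signature into bands and turn each band into a bucket key.
band_keys <- function(sig) {
  m <- matrix(sig, nrow = band_width)  # one column per band
  paste(seq_len(n_bands), apply(m, 2, paste, collapse = "-"), sep = ":")
}

keys_a <- data.frame(i = rep(seq_along(a), each = n_bands),
                     key = unlist(lapply(lapply(sets_a, minhash), band_keys)))
keys_b <- data.frame(j = rep(seq_along(b), each = n_bands),
                     key = unlist(lapply(lapply(sets_b, minhash), band_keys)))

# Candidate pairs are those that collide in at least one band.
candidates <- unique(merge(keys_a, keys_b, by = "key")[, c("i", "j")])

# Only the candidates are scored exactly, then filtered by a threshold.
candidates$similarity <- vapply(seq_len(nrow(candidates)), function(k) {
  jaccard(sets_a[[candidates$i[k]]], sets_b[[candidates$j[k]]])
}, numeric(1))

threshold <- 0.6
matches <- candidates[candidates$similarity >= threshold, ]
matches$a <- a[matches$i]
matches$b <- b[matches$j]
print(matches)
```

Identical strings always share every band key, so they are always compared. Near-duplicates collide with a probability that rises with their Jaccard similarity and with the number of bands, and falls as the band width grows.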

Authors: Beniamino Green [aut, cre, cph], Etienne Bacher [ctb], The authors of the dependency Rust crates [ctb, cph]

zoomerjoin_0.2.0.tar.gz
zoomerjoin_0.2.0.zip (r-4.5), zoomerjoin_0.2.0.zip (r-4.4), zoomerjoin_0.2.0.zip (r-4.3)
zoomerjoin_0.2.0.tgz (r-4.5-x86_64), zoomerjoin_0.2.0.tgz (r-4.5-arm64), zoomerjoin_0.2.0.tgz (r-4.4-x86_64), zoomerjoin_0.2.0.tgz (r-4.4-arm64), zoomerjoin_0.2.0.tgz (r-4.3-arm64)
zoomerjoin_0.2.0.tar.gz (r-4.5-noble), zoomerjoin_0.2.0.tar.gz (r-4.4-noble)
zoomerjoin.pdf | zoomerjoin.html
zoomerjoin/json (API)
NEWS

# Install 'zoomerjoin' in R:
install.packages('zoomerjoin', repos = c('https://beniaminogreen.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/beniaminogreen/zoomerjoin/issues

Pkgdown site: https://beniamino.org

blazinglyfastfuzzyjoinjoinrustzoomercargo

7.31 score, 102 stars, 11 scripts, 159 downloads, 25 exports, 23 dependencies

Last updated 2 months ago from: 8ae8556b3e. Checks: 8 OK, 3 NOTE. Indexed: yes.

Target               Result  Latest binary
Doc / Vignettes      OK      Mar 28 2025
R-4.5-win-x86_64     NOTE    Mar 28 2025
R-4.5-mac-x86_64     NOTE    Mar 28 2025
R-4.5-mac-aarch64    NOTE    Mar 28 2025
R-4.5-linux-x86_64   OK      Mar 28 2025
R-4.4-win-x86_64     OK      Mar 28 2025
R-4.4-mac-x86_64     OK      Mar 28 2025
R-4.4-mac-aarch64    OK      Mar 28 2025
R-4.4-linux-x86_64   OK      Mar 28 2025
R-4.3-win-x86_64     OK      Mar 28 2025
R-4.3-mac-aarch64    OK      Mar 28 2025

Exports: em_link, euclidean_anti_join, euclidean_full_join, euclidean_inner_join, euclidean_left_join, euclidean_probability, euclidean_right_join, fuzzy_join_core, hamming_anti_join, hamming_distance, hamming_full_join, hamming_inner_join, hamming_left_join, hamming_probability, hamming_right_join, jaccard_anti_join, jaccard_curve, jaccard_full_join, jaccard_hyper_grid_search, jaccard_inner_join, jaccard_left_join, jaccard_probability, jaccard_right_join, jaccard_similarity, jaccard_string_group
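As a rough illustration of how one of these exports might be called, here is a minimal sketch using `jaccard_inner_join`. The data frames and column names are hypothetical. The argument names (`by`, `n_gram_width`, `n_bands`, `band_width`, `threshold`) reflect my understanding of the package documentation and should be checked against `?jaccard_inner_join` for the installed version. The parameter values are arbitrary, not recommendations.

```r
library(zoomerjoin)

# Hypothetical toy data; column names are made up for this example.
donors <- data.frame(
  id_a = 1:3,
  name = c("Acme Widgets Inc", "Globex Corporation", "Initech LLC")
)
vendors <- data.frame(
  id_b = 1:3,
  name = c("Acme Widgets Inc", "Globex Corporatoin", "Umbrella Group")
)

# Assumed signature: verify argument names with ?jaccard_inner_join.
matches <- jaccard_inner_join(
  donors, vendors,
  by = "name",
  n_gram_width = 2,  # length of the character n-grams
  n_bands = 50,      # more bands: more candidate pairs are compared
  band_width = 4,    # wider bands: fewer, more similar candidates
  threshold = 0.6    # minimum Jaccard similarity to keep a pair
)
print(matches)
```

Because the hashing step is randomized, pairs close to the threshold may or may not be returned on a given run.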

Dependencies: cli, collapse, cpp11, dplyr, fansi, generics, glue, lifecycle, magrittr, pillar, pkgconfig, purrr, R6, Rcpp, rlang, stringi, stringr, tibble, tidyr, tidyselect, utf8, vctrs, withr

A Zoomerjoin Guided Tour

Rendered from guided_tour.Rmd using knitr::rmarkdown on Mar 28 2025.

Last update: 2024-02-14
Started: 2023-03-09

Benchmarks

Rendered from benchmarks.Rmd using knitr::rmarkdown on Mar 28 2025.

Last update: 2024-09-23
Started: 2023-03-09

Matching Vectors Based on Euclidean Distance

Rendered from matching_vectors.Rmd using knitr::rmarkdown on Mar 28 2025.

Last update: 2024-02-14
Started: 2023-08-06