Package: zoomerjoin 0.2.0

zoomerjoin: Superlatively Fast Fuzzy Joins

Empowers users to fuzzily merge data frames with millions or tens of millions of rows in minutes, with low memory usage. The package uses the locality-sensitive hashing algorithms developed by Datar, Immorlica, Indyk and Mirrokni (2004) <doi:10.1145/997817.997857> and Broder (1998) <doi:10.1109/SEQUEN.1997.666900> to avoid comparing every pair of records across the two datasets, so fuzzy merges finish in roughly linear rather than quadratic time.
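To show how hashing avoids the all-pairs comparison, here is a minimal base-R sketch of MinHash (Broder 1998) combined with the standard banding scheme. Every helper name (`shingles`, `minhash`, `lsh_candidates`, `lsh_join`, `candidate_prob`) is hypothetical. This is an illustration under simplifying assumptions, not zoomerjoin's Rust implementation or its API, and it does not cover the Euclidean-distance hashing of Datar et al. (2004). Each string becomes a set of character bigrams. Random linear hashes approximate random permutations, and two records become a candidate pair only if their signatures agree on every value in at least one band. The exact Jaccard similarity is then computed only for those candidates.

```r
# Sketch of MinHash + banding LSH (after Broder 1998) in base R.
# Hypothetical helper names; NOT zoomerjoin's actual implementation.

# Lower-cased character n-grams ("shingles") of a string.
shingles <- function(s, n = 2) {
  s <- tolower(s)
  if (nchar(s) < n) return(s)
  unique(substring(s, 1:(nchar(s) - n + 1), n:nchar(s)))
}

# Exact Jaccard similarity of two shingle sets.
jaccard <- function(x, y) length(intersect(x, y)) / length(union(x, y))

# MinHash signature: for each hash h_i(v) = (a_i * v + b_i) mod p,
# keep the minimum over the set's integer shingle ids.
minhash <- function(ids, a, b, p) {
  vapply(seq_along(a), function(i) min((a[i] * ids + b[i]) %% p), numeric(1))
}

# Candidate pairs (i, j): records whose signatures agree on a whole band.
lsh_candidates <- function(x, y, n_bands = 20, band_width = 3, n = 2, seed = 1) {
  sx <- lapply(x, shingles, n = n)
  sy <- lapply(y, shingles, n = n)
  vocab <- unique(unlist(c(sx, sy)))
  p <- 2147483647                 # the prime 2^31 - 1
  k <- n_bands * band_width       # signature length
  set.seed(seed)                  # note: resets the global RNG state
  a <- floor(runif(k, 1, p))      # doubles, so a * id cannot overflow
  b <- floor(runif(k, 0, p))
  band_keys <- function(s) {
    sig <- minhash(as.numeric(match(s, vocab)), a, b, p)
    vapply(seq_len(n_bands), function(j) {
      idx <- ((j - 1) * band_width + 1):(j * band_width)
      paste0(j, ":", paste(sig[idx], collapse = "-"))
    }, character(1))
  }
  kx <- lapply(sx, band_keys)
  ky <- lapply(sy, band_keys)
  # Hash table: band key -> rows of y that produced that key.
  buckets <- split(rep(seq_along(y), lengths(ky)), unlist(ky))
  rows <- lapply(seq_along(x), function(i) {
    js <- unique(unlist(buckets[kx[[i]]], use.names = FALSE))
    if (length(js) == 0) NULL else data.frame(i = i, j = js)
  })
  out <- do.call(rbind, rows)
  if (is.null(out)) data.frame(i = integer(0), j = integer(0)) else out
}

# Fuzzy inner join of two character vectors: LSH proposes candidates,
# then the exact Jaccard similarity filters out false positives.
lsh_join <- function(x, y, threshold = 0.5, n_bands = 20, band_width = 3, n = 2) {
  cand <- lsh_candidates(x, y, n_bands, band_width, n)
  cand$sim <- vapply(seq_len(nrow(cand)), function(r)
    jaccard(shingles(x[cand$i[r]], n), shingles(y[cand$j[r]], n)), numeric(1))
  cand[cand$sim >= threshold, , drop = FALSE]
}

# Probability that a pair with Jaccard similarity s becomes a candidate.
candidate_prob <- function(s, n_bands, band_width) 1 - (1 - s^band_width)^n_bands

x <- c("John Smith", "Mary Jones", "Ahmed Khan")
y <- c("Jon Smith", "Mary Jones", "Wei Zhang")
print(lsh_join(x, y, threshold = 0.5))
print(candidate_prob(c(0.3, 0.5, 0.8), n_bands = 20, band_width = 3))
```

The `n_bands` and `band_width` trade-off follows from `candidate_prob`. Wider bands make low-similarity pairs much less likely to collide, which cuts down on exact comparisons. More bands raise the chance that genuinely similar pairs share at least one bucket.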

Authors: Beniamino Green [aut, cre, cph], Etienne Bacher [ctb], The authors of the dependency Rust crates [ctb, cph]

Source: zoomerjoin_0.2.0.tar.gz
Windows binaries: zoomerjoin_0.2.0.zip (r-4.5), zoomerjoin_0.2.0.zip (r-4.4), zoomerjoin_0.2.0.zip (r-4.3)
macOS binaries: zoomerjoin_0.2.0.tgz (r-4.5-arm64), zoomerjoin_0.2.0.tgz (r-4.4-x86_64), zoomerjoin_0.2.0.tgz (r-4.4-arm64), zoomerjoin_0.2.0.tgz (r-4.3-arm64)
Linux binaries: zoomerjoin_0.2.0.tar.gz (r-4.5-noble), zoomerjoin_0.2.0.tar.gz (r-4.4-noble)
Reference manual: zoomerjoin.pdf | zoomerjoin.html
API: zoomerjoin/json
NEWS

# Install 'zoomerjoin' in R:
install.packages('zoomerjoin', repos = c('https://beniaminogreen.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/beniaminogreen/zoomerjoin/issues

Pkgdown site: https://beniamino.org


Topics: blazingly, fast, fuzzyjoin, join, rust, zoomer, cargo

7.55 score, 103 stars, 11 scripts, 191 downloads, 25 exports, 23 dependencies

Last updated 24 days ago from: 8ae8556b3e. Checks: 9 OK. Indexed: yes.

Target               Result   Latest binary
Doc / Vignettes      OK       Jan 27 2025
R-4.5-win-x86_64     OK       Jan 27 2025
R-4.5-mac-aarch64    OK       Jan 27 2025
R-4.5-linux-x86_64   OK       Jan 27 2025
R-4.4-win-x86_64     OK       Jan 27 2025
R-4.4-mac-x86_64     OK       Jan 27 2025
R-4.4-mac-aarch64    OK       Jan 27 2025
R-4.3-win-x86_64     OK       Jan 27 2025
R-4.3-mac-aarch64    OK       Jan 27 2025

Exports: em_link, euclidean_anti_join, euclidean_full_join, euclidean_inner_join, euclidean_left_join, euclidean_probability, euclidean_right_join, fuzzy_join_core, hamming_anti_join, hamming_distance, hamming_full_join, hamming_inner_join, hamming_left_join, hamming_probability, hamming_right_join, jaccard_anti_join, jaccard_curve, jaccard_full_join, jaccard_hyper_grid_search, jaccard_inner_join, jaccard_left_join, jaccard_probability, jaccard_right_join, jaccard_similarity, jaccard_string_group

Dependencies: cli, collapse, cpp11, dplyr, fansi, generics, glue, lifecycle, magrittr, pillar, pkgconfig, purrr, R6, Rcpp, rlang, stringi, stringr, tibble, tidyr, tidyselect, utf8, vctrs, withr

A Zoomerjoin Guided Tour

Rendered from guided_tour.Rmd using knitr::rmarkdown on Jan 27 2025.

Last update: 2024-02-14
Started: 2023-03-09

Benchmarks

Rendered from benchmarks.Rmd using knitr::rmarkdown on Jan 27 2025.

Last update: 2024-09-23
Started: 2023-03-09

Matching Vectors Based on Euclidean Distance

Rendered from matching_vectors.Rmd using knitr::rmarkdown on Jan 27 2025.

Last update: 2024-02-14
Started: 2023-08-06