Package: zoomerjoin 0.2.2

zoomerjoin: Superlatively Fast Fuzzy Joins

Empowers users to fuzzily merge data frames with millions or tens of millions of rows in minutes with low memory usage. The package uses the locality-sensitive hashing algorithms developed by Datar, Immorlica, Indyk and Mirrokni (2004) <doi:10.1145/997817.997857>, and Broder (1998) <doi:10.1109/SEQUEN.1997.666900> to avoid comparing every pair of records across the two datasets, so fuzzy merges finish in linear time.
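
The idea of locality-sensitive hashing for Jaccard similarity can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's Rust implementation: the helper names (shingles, jaccard, candidate_pairs) and the parameter values are hypothetical, and the MinHash step uses explicit random permutations of a shared n-gram vocabulary for simplicity. Records are compared in full only when they share at least one band of their MinHash signature, which is what avoids the all-pairs comparison.

```r
# Illustrative sketch only: base R, hypothetical helper names.

# Split a string into its set of overlapping character n-grams (shingles).
shingles <- function(x, n = 2) {
  chars <- strsplit(tolower(x), "")[[1]]
  if (length(chars) < n) return(paste(chars, collapse = ""))
  starts <- seq_len(length(chars) - n + 1)
  grams <- vapply(starts, function(i) paste(chars[i:(i + n - 1)], collapse = ""),
                  character(1))
  unique(grams)
}

# Jaccard similarity of two sets: |A intersect B| / |A union B|.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Return index pairs (i, j) whose MinHash signatures agree on at least one band.
# Note: set.seed() changes the global RNG state; it is used here for reproducibility.
candidate_pairs <- function(a, b, n_bands = 20, band_width = 2, n = 2, seed = 1) {
  set.seed(seed)
  sets_a <- lapply(a, shingles, n = n)
  sets_b <- lapply(b, shingles, n = n)
  vocab <- unique(unlist(c(sets_a, sets_b)))

  # One random permutation of the vocabulary per hash function.
  perms <- replicate(n_bands * band_width, sample.int(length(vocab)),
                     simplify = FALSE)

  # MinHash signature: smallest permuted index among the set's shingles.
  signature <- function(s) {
    idx <- match(s, vocab)
    vapply(perms, function(p) min(p[idx]), integer(1))
  }

  # Collapse each band (one matrix column) into a single string key.
  band_keys <- function(s) {
    m <- matrix(signature(s), nrow = band_width)
    paste(seq_len(n_bands), apply(m, 2, paste, collapse = "-"), sep = ":")
  }

  tab_a <- data.frame(key = unlist(lapply(sets_a, band_keys)),
                      i = rep(seq_along(a), each = n_bands))
  tab_b <- data.frame(key = unlist(lapply(sets_b, band_keys)),
                      j = rep(seq_along(b), each = n_bands))

  # Records that share any band key become candidate matches.
  unique(merge(tab_a, tab_b, by = "key")[, c("i", "j")])
}

# Usage: generate candidates, then keep those above a similarity threshold.
a <- c("acme corporation", "globex industries", "initech llc")
b <- c("acme corporation", "globex industires", "umbrella group")
pairs <- candidate_pairs(a, b)
pairs$sim <- vapply(seq_len(nrow(pairs)),
                    function(k) jaccard(shingles(a[pairs$i[k]]),
                                        shingles(b[pairs$j[k]])),
                    numeric(1))
matches <- pairs[pairs$sim >= 0.5, ]
```

Raising band_width makes a band match less likely for dissimilar records (fewer false candidates), while raising n_bands gives similar records more chances to collide (fewer missed matches). The package exposes this trade-off through its own tuning functions; see the manual for their actual arguments.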

Authors: Beniamino Green [aut, cre, cph], Etienne Bacher [ctb], The authors of the dependency Rust crates [ctb, cph]

zoomerjoin_0.2.2.tar.gz
zoomerjoin_0.2.2.zip (r-4.7) | zoomerjoin_0.2.2.zip (r-4.6) | zoomerjoin_0.2.2.zip (r-4.5)
zoomerjoin_0.2.2.tgz (r-4.6-x86_64) | zoomerjoin_0.2.2.tgz (r-4.6-arm64) | zoomerjoin_0.2.2.tgz (r-4.5-x86_64) | zoomerjoin_0.2.2.tgz (r-4.5-arm64)
zoomerjoin_0.2.2.tar.gz (r-4.7-arm64) | zoomerjoin_0.2.2.tar.gz (r-4.7-x86_64) | zoomerjoin_0.2.2.tar.gz (r-4.6-arm64) | zoomerjoin_0.2.2.tar.gz (r-4.6-x86_64)
zoomerjoin_0.2.2.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
zoomerjoin/json (API)

# Install 'zoomerjoin' in R:
install.packages('zoomerjoin', repos = c('https://beniaminogreen.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/beniaminogreen/zoomerjoin/issues

blazinglyfast fuzzyjoin join rust zoomer cargo

8.37 score | 109 stars | 2 packages | 15 scripts | 587 downloads | 25 exports | 22 dependencies

Last updated from: d441bcdea9. Checks: 9 OK, 4 NOTE. Indexed: yes.

Target | Result | Time
linux-devel-arm64 | OK | 196
linux-devel-x86_64 | OK | 196
source / vignettes | OK | 278
linux-release-arm64 | OK | 235
linux-release-x86_64 | OK | 203
macos-release-arm64 | NOTE | 144
macos-release-x86_64 | NOTE | 327
macos-oldrel-arm64 | OK | 143
macos-oldrel-x86_64 | OK | 354
windows-devel | NOTE | 208
windows-release | NOTE | 193
windows-oldrel | OK | 196
wasm-release | OK | 163

Exports: em_link, euclidean_anti_join, euclidean_full_join, euclidean_inner_join, euclidean_left_join, euclidean_probability, euclidean_right_join, fuzzy_join_core, hamming_anti_join, hamming_distance, hamming_full_join, hamming_inner_join, hamming_left_join, hamming_probability, hamming_right_join, jaccard_anti_join, jaccard_curve, jaccard_full_join, jaccard_hyper_grid_search, jaccard_inner_join, jaccard_left_join, jaccard_probability, jaccard_right_join, jaccard_similarity, jaccard_string_group

Dependencies: cli, collapse, cpp11, dplyr, generics, glue, lifecycle, magrittr, pillar, pkgconfig, purrr, R6, Rcpp, rlang, stringi, stringr, tibble, tidyr, tidyselect, utf8, vctrs, withr

A Zoomerjoin Guided Tour
Introduction | How Does it Work? | Basic Syntax | Standardizing String Names After A Merge | References

Last update: 2026-03-13
Started: 2023-03-09

Benchmarks
Introduction | Benchmarking Code

Last update: 2024-09-23
Started: 2023-03-09

Matching Vectors Based on Euclidean Distance
Introduction | Demonstration

Last update: 2024-02-14
Started: 2023-08-06