Function

Clean Your Data: Remove Duplicates Instantly

Detect and remove duplicate rows from your dataset (CSV or Excel).

This function scans your data to find:

  • 🔁 Exact duplicates — identical rows or repeated entries.
  • 🤖 Fuzzy duplicates — similar rows with small differences
    (typos, spacing, casing, or minor text variations).
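
The difference between the two kinds of duplicate can be sketched in a few lines of Python. This is only an illustration, not the function's actual implementation: the helper names (`normalize`, `similarity`, `find_duplicates`) are hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever similarity measure the function really uses.

```python
from difflib import SequenceMatcher


def normalize(row):
    """Lowercase each cell and collapse runs of whitespace."""
    return tuple(" ".join(str(cell).lower().split()) for cell in row)


def similarity(a, b):
    """Return a 0-100 similarity score between two rows (assumed metric)."""
    text_a = " | ".join(normalize(a))
    text_b = " | ".join(normalize(b))
    return 100 * SequenceMatcher(None, text_a, text_b).ratio()


def find_duplicates(rows, threshold=90):
    """Return (exact, fuzzy): indices of exact repeats and (i, j, score) pairs."""
    seen = {}
    exact = []
    for i, row in enumerate(rows):
        key = tuple(row)
        if key in seen:
            exact.append(i)      # identical to an earlier row
        else:
            seen[key] = i        # first occurrence is kept
    exact_set = set(exact)
    unique = [i for i in range(len(rows)) if i not in exact_set]
    fuzzy = []
    # Compare every remaining pair of rows; fine for small data, quadratic in general.
    for pos, i in enumerate(unique):
        for j in unique[pos + 1:]:
            score = similarity(rows[i], rows[j])
            if score >= threshold:
                fuzzy.append((i, j, round(score, 1)))
    return exact, fuzzy


rows = [
    ["Alice Smith", "alice@example.com"],
    ["Alice Smith", "alice@example.com"],   # exact repeat of row 0
    ["alice  smith", "Alice@example.com"],  # differs only in spacing and casing
    ["Bob Jones", "bob@example.com"],
]
exact, fuzzy = find_duplicates(rows)
```

In this sample, row 1 is reported as an exact duplicate, while rows 0 and 2 form a fuzzy pair because they match once casing and spacing are normalized.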

It automatically keeps the first valid occurrence of each duplicate
and exports everything neatly organized in a single downloadable ZIP.

📦 Inside the ZIP you’ll get:

  1. deduplicated_<name>.csv
    — your cleaned dataset (duplicates removed)
  2. duplicates_removed_<name>.csv
    — all duplicate rows that were dropped
  3. fuzzy_pairs_<name>.csv
    — pairs of rows that look alike (based on similarity)
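
As a rough sketch of how such an archive could be assembled, the snippet below writes three CSV reports into an in-memory ZIP using only the standard library. The file names follow the list above; everything else (the `build_report_zip` helper, the `row_a`/`row_b`/`similarity` column names for the pairs file, and the sample data) is assumed for illustration.

```python
import csv
import io
import zipfile


def rows_to_csv(header, rows):
    """Serialize a header and rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def build_report_zip(name, header, kept, removed, pairs):
    """Return ZIP bytes holding the three CSV reports for dataset `name`."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"deduplicated_{name}.csv", rows_to_csv(header, kept))
        zf.writestr(f"duplicates_removed_{name}.csv", rows_to_csv(header, removed))
        # Column names for the pairs report are an assumption.
        pair_header = ["row_a", "row_b", "similarity"]
        zf.writestr(f"fuzzy_pairs_{name}.csv", rows_to_csv(pair_header, pairs))
    return archive.getvalue()


header = ["name", "email"]
kept = [["Alice Smith", "alice@example.com"], ["Bob Jones", "bob@example.com"]]
removed = [["Alice Smith", "alice@example.com"]]
pairs = [[0, 2, 100.0]]
zip_bytes = build_report_zip("customers", header, kept, removed, pairs)
```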

Args:
    file (FilePath): The uploaded CSV or Excel file to analyze.
    subset (str): Optional. Comma-separated list of column names to check.
        If left empty, all columns are analyzed.
    similarity_threshold (int): Optional. How strict fuzzy matching should be (0–100).
        Higher = only very similar values are flagged.
        Default = 90 (good balance).
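
To make the argument formats concrete, here is a minimal sketch of how `subset` and `similarity_threshold` might be interpreted. The helpers `parse_subset` and `check_threshold` are hypothetical, and the error handling for unknown columns or out-of-range thresholds is assumed rather than documented behavior.

```python
def parse_subset(subset, available_columns):
    """Turn 'name, email' into ['name', 'email']; empty input selects all columns."""
    names = [part.strip() for part in (subset or "").split(",") if part.strip()]
    if not names:
        return list(available_columns)
    unknown = [n for n in names if n not in available_columns]
    if unknown:
        # Assumed behavior: reject column names that are not in the file.
        raise ValueError(f"Unknown column(s): {', '.join(unknown)}")
    return names


def check_threshold(similarity_threshold=90):
    """Validate that the threshold is an integer in the range 0-100."""
    value = int(similarity_threshold)
    if not 0 <= value <= 100:
        raise ValueError("similarity_threshold must be between 0 and 100")
    return value


columns = ["name", "email", "city"]
selected = parse_subset(" name , email ", columns)   # only these columns are compared
all_columns = parse_subset("", columns)              # empty subset means every column
```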

Returns:
    str: Generated ZIP archive containing the cleaned dataset
        and detailed duplicate reports.

Run Cost
◆ 0.01 + ◆ 1/s
