Data Sanitization

Role / Services
  • Development & Solutions

Tools & Technology
  • Python

Year
  • 2024

Data Sanitization Process

    Initial Records: 5,003
    Final Records: 4,915
    Columns: 14

    1. Issue Identification

    Column Names

    Spaces, special characters, and inconsistent formatting identified

    Missing Values

    Found in priority, assigneeName, alert_severity columns

    Duplicates

    Multiple duplicate entries identified

    Format Issues

    Inconsistent letter case and irregular whitespace found
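
The four checks above could be run as a single audit pass. The following is a minimal sketch in plain Python, assuming the records are loaded as a list of dicts; the function name `audit` and the sample rows are hypothetical and are not taken from the project.

```python
import re
from collections import Counter


def audit(rows):
    """Report the four issue types for a list of dict records (illustrative)."""
    columns = list(rows[0].keys()) if rows else []

    # Column names that are not already lowercase letters, digits, underscores.
    bad_names = [c for c in columns if not re.fullmatch(r"[a-z0-9_]+", c)]

    # None and blank strings both count as missing.
    missing = Counter(
        col
        for row in rows
        for col, val in row.items()
        if val is None or str(val).strip() == ""
    )

    # A duplicate is any repeat of a fully identical record.
    seen = Counter(tuple(row.items()) for row in rows)
    duplicates = sum(n - 1 for n in seen.values())

    # Non-empty text values with stray whitespace or uppercase letters.
    untidy = Counter(
        col
        for row in rows
        for col, val in row.items()
        if isinstance(val, str) and val.strip() and val != val.strip().lower()
    )

    return {
        "bad_names": bad_names,
        "missing": dict(missing),
        "duplicates": duplicates,
        "untidy": dict(untidy),
    }


# Hypothetical sample rows, for illustration only.
sample = [
    {"Alert Severity": "High ", "priority": "", "assigneeName": "Ana"},
    {"Alert Severity": "High ", "priority": "", "assigneeName": "Ana"},
    {"Alert Severity": "low", "priority": "p1", "assigneeName": None},
]
report = audit(sample)
```

Running the audit before any cleaning gives a baseline to compare against once the steps below are applied.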

    2. Cleaning Steps

    Column Name Cleaning

    → Standardized names to lowercase, removed spaces, replaced special characters
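
One way to express this renaming rule is a small normalizing function. This is a sketch under one assumption: spaces and special characters are both collapsed to a single underscore, which is consistent with names such as alert_severity but is not confirmed by the write-up. The function name `clean_column_name` is hypothetical.

```python
import re


def clean_column_name(name):
    """Lowercase a column name and replace spaces/special characters."""
    name = name.strip().lower()
    # Assumption: every run of non-alphanumeric characters becomes one "_".
    name = re.sub(r"[^a-z0-9]+", "_", name)
    # Drop underscores left over at either end.
    return name.strip("_")


# Hypothetical raw headers, for illustration only.
raw_headers = [" Alert Severity ", "assigneeName", "Customer First-Response (UTC)"]
cleaned_headers = [clean_column_name(h) for h in raw_headers]
```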

    Missing Values Treatment
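    • → Missing priority values filled with "unknown"
    • → Missing assigneename values filled with "unassigned"
    • → Missing organization_id values filled with "unknown"
    • → Dropped the customer_first_response_utc column (more than 90% of its values missing)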
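
The treatment above can be sketched as a fill-or-drop pass. The fill values and the 90% figure come from the list above; treating the drop as a general threshold rule, and the names `FILL_VALUES`, `DROP_THRESHOLD`, and `treat_missing`, are assumptions made for illustration.

```python
# Fill values taken from the list above.
FILL_VALUES = {
    "priority": "unknown",
    "assigneename": "unassigned",
    "organization_id": "unknown",
}
# Assumption: a column is dropped when more than 90% of its values are missing.
DROP_THRESHOLD = 0.90


def is_missing(value):
    """None and blank strings both count as missing."""
    return value is None or str(value).strip() == ""


def treat_missing(rows):
    """Drop mostly-empty columns, then fill known columns with defaults."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    # Share of missing values per column.
    share = {
        c: sum(is_missing(r.get(c)) for r in rows) / len(rows) for c in columns
    }
    keep = [c for c in columns if share[c] <= DROP_THRESHOLD]

    cleaned = []
    for r in rows:
        new_row = {}
        for c in keep:
            value = r.get(c)
            # Columns without a configured default keep their original value.
            new_row[c] = FILL_VALUES.get(c, value) if is_missing(value) else value
        cleaned.append(new_row)
    return cleaned


# Hypothetical sample rows, for illustration only.
rows = [
    {"priority": "", "assigneename": "ana", "organization_id": "42",
     "customer_first_response_utc": ""},
    {"priority": "p1", "assigneename": None, "organization_id": "",
     "customer_first_response_utc": None},
]
treated = treat_missing(rows)
```
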
    Format Standardization
    • → Converted dates to datetime format
    • → Standardized categorical values
    • → Removed unnecessary whitespace
    • → Replaced medium_ with medium in alert_severity
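
The format steps above might look like the following per-record function. It is a sketch, not the project's code: the date column name `created_utc`, the ISO-style date format, and the choice of which columns are categorical are all assumptions; only the medium_ to medium fix in alert_severity comes from the list above.

```python
from datetime import datetime

# Assumption: which columns hold dates and categories is illustrative only.
DATE_COLUMNS = ("created_utc",)
CATEGORICAL_COLUMNS = ("alert_severity", "priority")


def standardize_row(row):
    """Apply whitespace, date, and category standardization to one record."""
    out = {}
    for col, val in row.items():
        if isinstance(val, str):
            # Trim the ends and collapse internal runs of whitespace.
            val = " ".join(val.split())
        if col in DATE_COLUMNS and val:
            # Assumption: dates arrive as "YYYY-MM-DD HH:MM:SS" strings.
            val = datetime.fromisoformat(val)
        elif col in CATEGORICAL_COLUMNS and isinstance(val, str):
            val = val.lower()
        # Fix named in the cleaning steps: medium_ becomes medium.
        if col == "alert_severity" and val == "medium_":
            val = "medium"
        out[col] = val
    return out


# Hypothetical record, for illustration only.
record = {
    "created_utc": " 2024-03-01 12:30:00 ",
    "alert_severity": " Medium_ ",
    "priority": "HIGH",
}
standardized = standardize_row(record)
```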

    3. Final Summary

    Final Dataset Statistics
    • → 4,915 rows (88 records removed)
    • → 14 columns (after cleaning)
    • → All missing values addressed
    • → Consistent formatting across columns
    • → Spam and test records removed
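
As a closing check, the reported figures can be verified and exact duplicates removed with a few lines of plain Python. This is a sketch only: the helper names and sample records are hypothetical, and it covers exact duplicates alone, since the write-up does not say how spam and test records were identified.

```python
def drop_duplicates(rows):
    """Keep the first occurrence of each identical record, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        # Sort by column name so key order in the dict does not matter.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def summarize(initial_count, final_rows):
    """Row and column counts for the cleaned data."""
    return {
        "rows": len(final_rows),
        "removed": initial_count - len(final_rows),
        "columns": len(final_rows[0]) if final_rows else 0,
    }


# Hypothetical sample records, for illustration only.
records = [
    {"id": "1", "priority": "high"},
    {"id": "1", "priority": "high"},
    {"id": "2", "priority": "unknown"},
]
deduped = drop_duplicates(records)
summary = summarize(len(records), deduped)

# The figures reported above are consistent with each other.
REPORTED_INITIAL, REPORTED_FINAL = 5003, 4915
reported_removed = REPORTED_INITIAL - REPORTED_FINAL
```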