AutoWrangle: Automated Data Pre-processing Pipeline
Project Links
Project Overview
Developed AutoWrangle, an end-to-end automated pipeline for large-scale data cleaning and feature engineering. The system reduces manual pre-processing time by automating outlier detection, missing value imputation, and database schema alignment.
Key Insights & Impact
Automated Data Cleaning: Implemented robust Python scripts using Pandas and Scikit-Learn to automate the detection and handling of null values and outliers.
Feature Engineering: Designed custom transformers to generate high-value features, improving model training readiness.
SQL Optimization: Integrated automated SQL query optimization to ensure cleaned data is efficiently indexed and stored in relational databases.
Scalable Architecture: Built using a modular approach, allowing the pipeline to integrate seamlessly with cloud databases like Supabase or SAP Datasphere.
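To make the cleaning and custom-transformer ideas concrete, here is a minimal sketch of how null handling and outlier treatment could be chained with Pandas and Scikit-Learn. It is not AutoWrangle's actual code: the names `IQRClipper` and `clean_numeric`, the median imputation strategy, and the 1.5 x IQR clipping rule are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline


class IQRClipper(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: clip each column to
    [Q1 - k * IQR, Q3 + k * IQR], with bounds learned in fit()."""

    def __init__(self, k=1.5):
        self.k = k

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        q1 = np.nanpercentile(X, 25, axis=0)
        q3 = np.nanpercentile(X, 75, axis=0)
        iqr = q3 - q1
        # Per-column bounds, reused on any later data passed to transform().
        self.lower_ = q1 - self.k * iqr
        self.upper_ = q3 + self.k * iqr
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.clip(X, self.lower_, self.upper_)


def clean_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values, then clip outliers, in numeric columns only.

    Assumes every numeric column has at least one non-missing value.
    """
    numeric_cols = df.select_dtypes(include="number").columns
    pipeline = Pipeline([
        # Median imputation is less sensitive to outliers than the mean.
        ("impute", SimpleImputer(strategy="median")),
        ("clip", IQRClipper(k=1.5)),
    ])
    cleaned = df.copy()
    cleaned[numeric_cols] = pipeline.fit_transform(df[numeric_cols])
    return cleaned


if __name__ == "__main__":
    raw = pd.DataFrame({
        "amount": [1, 2, 3, 4, 100],           # 100 is an obvious outlier
        "score": [1.0, np.nan, 3.0, 4.0, 5.0],  # one missing value
        "label": ["a", "b", "c", "d", "e"],     # non-numeric, left untouched
    })
    print(clean_numeric(raw))
```

Clipping keeps every row, which suits a pipeline that must preserve record counts for downstream storage; dropping outlier rows instead would be an equally reasonable choice under different requirements.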