Data Pipeline uses Semantic Kernel to build and manage ETL (Extract, Transform, Load) pipelines with AI-powered schema mapping, data cleaning, and transformation. Instead of writing transformation scripts, describe what you want in natural language and the skill generates and executes the pipeline.
Traditional ETL pipelines require extensive configuration and scripting. Data Pipeline flips this by letting you describe your desired data transformations in natural language. Tell it “merge these two CSVs on email, normalize phone numbers to E.164 format, and load into Postgres” — and it builds the pipeline.
Built on Semantic Kernel, the skill handles schema inference, intelligent column mapping (it knows “First Name” and “fname” are the same thing), data cleaning (removing duplicates, fixing formats), and error handling (quarantining bad records instead of crashing).
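The intelligent column mapping can be pictured as header normalization plus an alias lookup. This is a minimal illustrative sketch, not the skill's actual implementation; the `ALIASES` table and `canonical_column` helper are hypothetical names:

```python
import re

# Hypothetical alias table: each canonical field lists header variants
# commonly seen in raw CSVs. The real skill infers these with an LLM.
ALIASES = {
    "first_name": {"first_name", "firstname", "fname", "first"},
    "email": {"email", "email_address", "e_mail"},
    "phone": {"phone", "phone_number", "tel", "mobile"},
}

def canonical_column(name: str):
    """Map a raw header like 'First Name' to a canonical field, or None."""
    # Lowercase and collapse punctuation/whitespace into underscores.
    key = re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")
    for canon, variants in ALIASES.items():
        if key == canon or key in variants:
            return canon
    return None
```

With this, `canonical_column("First Name")` and `canonical_column("fname")` both resolve to `first_name`, which is the behavior the skill describes.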
It supports batch and streaming modes, with built-in monitoring and data quality checks.
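Quarantining bad records instead of crashing amounts to routing each row to a clean batch or a quarantine batch at validation time. A minimal sketch under assumed names (`split_valid` and the email regex are illustrative, not the skill's API):

```python
import re

# Simplistic email check for illustration; production validation
# would be stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def split_valid(rows):
    """Route rows with a well-formed email to the clean batch;
    quarantine the rest so one bad record never aborts the run."""
    clean, quarantined = [], []
    for row in rows:
        target = clean if EMAIL_RE.match(row.get("email", "")) else quarantined
        target.append(row)
    return clean, quarantined
```

A simple quality score can then be reported as the clean fraction of processed rows, which is how a check like this feeds the built-in monitoring.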
```python
from data_pipeline import Pipeline

pipeline = Pipeline()
pipeline.extract("s3://bucket/raw-data.csv")
pipeline.transform("Normalize dates to ISO 8601, merge duplicate emails, add country from phone prefix")
pipeline.load("postgresql://localhost/warehouse", table="customers")
pipeline.run()
```
```
Pipeline: CSV → Clean → Postgres

Extract:   sales_2026.csv (14,523 rows)
Transform:
  ✅ Parsed dates (3 formats detected → ISO 8601)
  ✅ Normalized currencies (USD, EUR, GBP → USD)
  ✅ Removed 234 duplicate records
  ⚠️ Quarantined 12 rows (invalid email format)
Load:      customers table (14,277 rows inserted)

Duration: 8.3s | Quality Score: 99.2%
```
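Behind a report line like "Parsed dates (3 formats detected → ISO 8601)" sits a format-fallback parser: try each detected format and emit ISO 8601, leaving unparseable values for quarantine. A hedged sketch with an assumed format list (`to_iso8601` and `FORMATS` are hypothetical, not the skill's code):

```python
from datetime import datetime

# Hypothetical set of detected input formats for one run.
FORMATS = ["%m/%d/%Y", "%d-%m-%Y", "%Y.%m.%d"]

def to_iso8601(value: str):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no
    known format matches, so the caller can quarantine the row."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```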
AI agents that work well with Data Pipeline.
Open-source LLM observability — trace, evaluate, and debug your AI agent workflows in production.
Google's MCP toolbox for databases — connect AI agents to PostgreSQL, MySQL, BigQuery, Spanner, and more.
Agent monitoring, cost tracking, and evaluation — observability built specifically for AI agents.