Data Pipeline

Semantic Kernel Advanced Data & Analytics Open Source

Data Pipeline uses Semantic Kernel to build and manage ETL (Extract, Transform, Load) pipelines with AI-powered schema mapping, data cleaning, and transformation. Instead of writing transformation scripts, describe what you want in natural language and the skill generates and executes the pipeline.

Input / Output

Accepts

CSV JSON database API

Produces

transformed-data database-records data-warehouse

Overview

Traditional ETL pipelines require extensive configuration and scripting. Data Pipeline flips this by letting you describe your desired data transformations in natural language. Tell it “merge these two CSVs on email, normalize phone numbers to E.164 format, and load into Postgres” — and it builds the pipeline.
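The E.164 normalization mentioned above can be approximated in a few lines. This is an illustrative sketch only — the `to_e164` helper and its default-country assumption are hypothetical, and a production pipeline would use a dedicated library such as `phonenumbers` for national formats and validation:

```python
import re

def to_e164(raw: str, default_country_code: str = "1") -> str:
    """Best-effort normalization of a phone number to E.164 (+<country><number>)."""
    digits = re.sub(r"\D", "", raw)          # keep digits only
    if raw.strip().startswith("+"):          # already carries a country code
        return "+" + digits
    if len(digits) == 10:                    # assume a 10-digit national number
        return "+" + default_country_code + digits
    return "+" + digits                      # fall through: best effort

print(to_e164("(415) 555-2671"))     # → +14155552671
print(to_e164("+44 20 7946 0958"))   # → +442079460958
```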

Built on Semantic Kernel, the skill handles schema inference, intelligent column mapping (it knows “First Name” and “fname” are the same thing), data cleaning (removing duplicates, fixing formats), and error handling (quarantining bad records instead of crashing).
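The column-mapping idea ("First Name" and "fname" resolving to the same field) can be sketched as header canonicalization plus an alias table. This is a minimal illustration, not the skill's actual mapping logic — the `ALIASES` entries are hypothetical seeds, and the real skill infers mappings with the model:

```python
def canonical(name: str) -> str:
    """Reduce a header to a comparable key: lowercase, alphanumerics only."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

# Hypothetical alias table the mapper might be seeded with or learn.
ALIASES = {
    "fname": "first_name",
    "firstname": "first_name",
    "lname": "last_name",
    "lastname": "last_name",
    "emailaddress": "email",
}

def map_column(source_header: str) -> str:
    """Map a raw source header to a canonical destination column name."""
    key = canonical(source_header)
    return ALIASES.get(key, key)

print(map_column("First Name"))  # → first_name
print(map_column("fname"))       # → first_name
```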

It supports batch and streaming modes, with built-in monitoring and data quality checks.
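Batch mode can be pictured as grouping a row stream into fixed-size chunks, with streaming as the limiting case of chunk size 1. A minimal, library-free sketch (the `batches` helper is illustrative, not part of the skill's API):

```python
from typing import Iterable, Iterator, List

def batches(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group a (possibly unbounded) row stream into fixed-size batches."""
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # emit the final partial batch

chunks = list(batches(({"id": i} for i in range(7)), size=3))
print([len(c) for c in chunks])  # → [3, 3, 1]
```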

How It Works

  1. Connect — Define source and destination data stores
  2. Describe — Explain transformations in natural language
  3. Generate — The skill creates an executable pipeline with validation
  4. Execute — Pipeline runs with progress tracking and error handling
  5. Monitor — Data quality metrics and pipeline health dashboards
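The error handling in step 4 — quarantining bad records instead of crashing — amounts to partitioning rows into clean and quarantined sets. A minimal sketch (the `partition` helper and the deliberately simple email regex are illustrative, not the skill's actual validation):

```python
import re

# Deliberately simple check for illustration; real validation is stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def partition(rows):
    """Split rows into (clean, quarantined) instead of failing the run."""
    clean, quarantined = [], []
    for row in rows:
        if EMAIL_RE.match(row.get("email", "")):
            clean.append(row)
        else:
            quarantined.append({**row, "_reason": "invalid email format"})
    return clean, quarantined

rows = [{"email": "a@example.com"}, {"email": "not-an-email"}]
clean, bad = partition(rows)
print(len(clean), len(bad))  # → 1 1
```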

Use Cases

  • Data migration — Move data between systems with schema translation
  • Data warehouse loading — Regular ETL jobs from operational databases
  • Data cleaning — Standardize formats, deduplicate, and validate data
  • API integration — Pull data from multiple APIs and combine
  • Report generation — Transform raw data into analysis-ready datasets

Getting Started

from data_pipeline import Pipeline

pipeline = Pipeline()
pipeline.extract("s3://bucket/raw-data.csv")                          # source
pipeline.transform("Normalize dates to ISO 8601, merge duplicate emails, add country from phone prefix")  # natural-language spec
pipeline.load("postgresql://localhost/warehouse", table="customers")  # destination
pipeline.run()

Example

Pipeline: CSV → Clean → Postgres

Extract: sales_2026.csv (14,523 rows)
Transform:
  ✅ Parsed dates (3 formats detected → ISO 8601)
  ✅ Normalized currencies (USD, EUR, GBP → USD)
  ✅ Removed 234 duplicate records
  ⚠️ Quarantined 12 rows (invalid email format)
Load: customers table (14,277 rows inserted)

Duration: 8.3s | Quality Score: 99.2%
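The duplicate removal shown in the run above boils down to keeping the first record seen per key. A minimal sketch, assuming email is the dedupe key (the `dedupe_by_email` helper is hypothetical, not the skill's API):

```python
def dedupe_by_email(rows):
    """Keep the first record seen for each email (case-insensitive)."""
    seen = set()
    out = []
    for row in rows:
        key = row["email"].strip().lower()
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [
    {"email": "Ada@example.com", "amount": 10},
    {"email": "ada@example.com", "amount": 12},  # duplicate of the first
    {"email": "bob@example.com", "amount": 7},
]
print(len(dedupe_by_email(rows)))  # → 2
```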

Alternatives

  • dbt — SQL-based data transformation framework
  • Apache Airflow — Workflow orchestration for data pipelines
  • Fivetran — Automated data integration platform

Tags

#ETL #data-pipeline #transformation #data-engineering #schema-mapping
