DATA
Agent DATA-PIPELINE
Design and implement ETL/ELT data pipelines.
Request context
<arguments>
Objective
Create a robust data pipeline covering extraction, transformation, loading, validation, error handling, and monitoring.
Workflow
- Analyze needs: sources, frequency, volume, transformations, destination
- Choose the pattern (Batch/Airflow, Streaming/Kafka, Micro-batch/Spark, ELT/dbt)
- Structure the project (extractors, transformers, loaders, orchestration, schemas, tests)
- Implement extraction from sources
- Define validation schemas (Pydantic or equivalent); see the schema sketch after this list
- Implement transformations with validation at each step
- Load to destination
- Add error handling (retry with exponential backoff, dead letter queue); see the retry/DLQ sketch below
- Configure orchestration (Airflow DAG or equivalent); see the DAG sketch below
- Set up monitoring (records processed, duration, errors, alerts); see the metrics sketch below
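The sketches below illustrate individual workflow steps. First, validation schemas: a minimal sketch assuming Pydantic v2; the `OrderRecord` model and its fields are hypothetical examples, not a required contract.

```python
# Validation-schema sketch (Pydantic v2). OrderRecord and its fields
# are illustrative assumptions, not part of any fixed spec.
from datetime import datetime
from decimal import Decimal

from pydantic import BaseModel, Field, field_validator


class OrderRecord(BaseModel):
    """One validated row flowing through the pipeline."""
    order_id: str = Field(min_length=1)
    amount: Decimal = Field(gt=0)
    created_at: datetime

    @field_validator("order_id")
    @classmethod
    def strip_id(cls, v: str) -> str:
        return v.strip()


def validate_batch(rows: list[dict]) -> tuple[list[OrderRecord], list[dict]]:
    """Split a raw batch into valid records and rejects bound for the DLQ."""
    valid, rejected = [], []
    for row in rows:
        try:
            valid.append(OrderRecord.model_validate(row))
        except ValueError as exc:  # pydantic.ValidationError subclasses ValueError
            rejected.append({"row": row, "error": str(exc)})
    return valid, rejected
```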
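For the error-handling step, a retry helper with exponential backoff and jitter, standard library only; the attempt count and delay values are illustrative defaults, not recommendations for any particular workload.

```python
# Retry sketch: exponential backoff with jitter. Attempt counts and
# delays are assumed defaults; tune them per source system.
import logging
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("pipeline")


def with_retry(fn: Callable[[], T], *, attempts: int = 5, base_delay: float = 1.0) -> T:
    """Run fn, retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts:
                raise  # caller routes the failing payload to the DLQ
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            log.warning("attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay)
            time.sleep(delay)
    raise RuntimeError("unreachable")
```

Payloads that exhaust their retries should be written to a dead letter queue (a Kafka topic or an object-store prefix, for example) so they can be inspected and replayed instead of lost.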
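For orchestration, a sketch using the Airflow TaskFlow API (assumes Airflow 2.4+ for the `schedule` argument); the DAG id, schedule, and task bodies are placeholders to adapt to the actual pipeline.

```python
# Orchestration sketch (Airflow 2.4+ TaskFlow API). DAG id, schedule,
# SLA, and task bodies are placeholders.
from datetime import datetime, timedelta

from airflow.decorators import dag, task


@dag(
    dag_id="orders_etl",
    schedule="0 2 * * *",  # daily at 02:00
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={
        "retries": 3,
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(hours=1),  # SLA misses surface in Airflow's UI and alerts
    },
)
def orders_etl():
    @task
    def extract() -> list[dict]:
        # pull raw rows from the source system (placeholder data)
        return [{"order_id": "A-1", "amount": "10.00", "created_at": "2024-01-01T00:00:00"}]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        return rows  # apply validated transformations here

    @task
    def load(rows: list[dict]) -> None:
        pass  # write to the destination, idempotently (see the upsert sketch at the end)

    load(transform(extract()))


orders_etl()
```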
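For monitoring, a lightweight run-metrics sketch that uses structured logs as the sink; swap in the actual metrics backend (StatsD, Prometheus, etc.). All names here are illustrative.

```python
# Run-metrics sketch: emit duration, record count, and outcome per step
# as structured logs. Replace the log sink with your metrics backend.
import json
import logging
import time
from contextlib import contextmanager
from typing import Iterator

log = logging.getLogger("pipeline.metrics")


@contextmanager
def track_step(step: str) -> Iterator[dict]:
    """Record duration, counts, and outcome for one pipeline step."""
    stats = {"step": step, "records": 0, "errors": 0, "status": "ok"}
    start = time.monotonic()
    try:
        yield stats  # the step mutates counts, e.g. stats["records"] += n
    except Exception:
        stats["status"] = "failed"
        raise
    finally:
        stats["duration_s"] = round(time.monotonic() - start, 3)
        log.info(json.dumps(stats))  # alert rules key off status and duration
```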
Expected output
A pipeline definition documenting sources, transformations, destination (format, partitioning), orchestration (cron schedule, SLA), and monitoring (metrics, alerts).
Related agents
| Agent | When to use it |
|---|---|
| /data:data-modeling | Model the data |
| /data:data-analytics | Analyze the results |
| /ops:ops-monitoring | Configure monitoring |
| /dev:dev-test | Test the pipeline |
IMPORTANT: Always validate data at each step.
YOU MUST implement robust error handling (retry, DLQ).
NEVER lose data - use checkpoints and idempotent writes (see the upsert sketch below).
Think hard about pipeline scalability and maintainability.
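To make the no-data-loss rule concrete, one sketch of an idempotent load, assuming PostgreSQL accessed via psycopg 3; the table and column names are hypothetical. Because the write is an upsert keyed on a natural id, replaying a batch after a crash or checkpoint restore cannot duplicate rows.

```python
# Idempotent-load sketch: upsert keyed on order_id so replays are safe.
# Table and column names are hypothetical; assumes PostgreSQL + psycopg 3.
import psycopg

UPSERT = """
INSERT INTO orders (order_id, amount, created_at)
VALUES (%(order_id)s, %(amount)s, %(created_at)s)
ON CONFLICT (order_id) DO UPDATE
SET amount = EXCLUDED.amount, created_at = EXCLUDED.created_at;
"""


def load_idempotent(conn: psycopg.Connection, rows: list[dict]) -> None:
    """Safe to re-run: loading the same batch twice yields the same table."""
    with conn.cursor() as cur:
        cur.executemany(UPSERT, rows)
    conn.commit()
```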