Senior Data Engineer — Aggregator Integration Platform
- Location: Remote, US hours
- Type: 6-month contract
About the engagement
Six-month build for an education-sector client: a production-grade student data integration platform that ingests rostering data from a third-party aggregator (and, by design, future aggregators) into a source-agnostic canonical schema, then serves it to downstream services through both a real-time API and a batch delivery channel.
You will not be slotted into an existing pipeline factory. You will help architect the connector framework, the canonical schema, and the operational infrastructure from the ground up. The first concrete output is a connector for the initial aggregator, scoped to clients running on a specific upstream SIS. The framework you build must extend to additional aggregators with no changes to the canonical schema or the consumer-facing API.
What you will build
- A production-grade aggregator connector covering every entity in scope (Districts, Schools, Users, Sections, Courses, Terms, Contacts, Enrollments) against the aggregator's data, events, and sync APIs, including OAuth setup, district token provisioning, sandbox flows, pagination, and rate-limit handling.
- Initial batch sync, cursor-based delta sync via event polling, and an on-demand resync path for school-year rollover.
- An abstract AggregatorConnector base class with the first concrete implementation, validated for extensibility through a spike against a second aggregator standard.
- A source-agnostic canonical schema and field-level mapping layer (people, contacts, relationships, locations, classes, courses, terms, rosters, groups, group memberships), with identity resolution keyed on (client_id, source_vendor, source_id) and a minted UUID as the canonical ID.
- Read-only REST endpoints over the canonical store with cursor pagination and updated-since filters.
- Batch delivery via message-queue subscriptions and optional object-storage exports under a single versioned envelope.
- A four-stage validation gate (connectivity, structural, semantic, referential), a quarantine system with inspector access, and replay capability across any historical batch.
- Observability across per-client and per-entity batch success rates, delta lag, delivery lag, and dead-letter counts.
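The connector abstraction and identity-resolution scheme above can be sketched in Scala. This is an illustrative sketch, not the client's actual types: `SourceKey`, `CanonicalId`, `CanonicalRecord`, and the `AggregatorConnector` trait are hypothetical names, and minting the canonical UUID as a name-based UUID over `(client_id, source_vendor, source_id)` is one reasonable interpretation of "a minted UUID as the canonical ID".

```scala
import java.util.UUID

// Hypothetical identity-resolution key: (client_id, source_vendor, source_id).
final case class SourceKey(clientId: String, sourceVendor: String, sourceId: String)

object CanonicalId {
  // Mint a stable canonical UUID from the source key. A name-based UUID is
  // deterministic, so re-ingesting the same upstream record can never create
  // a duplicate canonical identity.
  def mint(key: SourceKey): UUID =
    UUID.nameUUIDFromBytes(
      s"${key.clientId}|${key.sourceVendor}|${key.sourceId}".getBytes("UTF-8")
    )
}

// A record already mapped into the source-agnostic canonical schema.
final case class CanonicalRecord(id: UUID, entity: String, attributes: Map[String, String])

// Sketch of the abstract connector: one concrete implementation per aggregator,
// with the canonical schema and consumer-facing API untouched when a new
// source is added.
trait AggregatorConnector {
  def vendor: String
  // Full initial sync for a client.
  def initialSync(clientId: String): Iterator[CanonicalRecord]
  // Cursor-based delta sync via event polling; returns records plus the next cursor.
  def deltaSync(clientId: String, cursor: String): (Iterator[CanonicalRecord], String)
}
```

Keying identity on the source triple and deriving the UUID deterministically keeps ingestion idempotent across replays, which matters once replay across historical batches is a first-class operation.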
Responsibilities
- Own the connector architecture and canonical schema, including versioning, governance, and field-level mapping documentation.
- Design and implement event ordering, previous-attributes diffing, and source-side quirks (for example, the aggregator surfacing roster changes as section updates rather than user updates).
- Build and operate batch + streaming ingestion against third-party APIs at production scale.
- Partner with the client's product and engineering teams on canonical mapping sign-off, identity provider integration, and initial cohort client onboarding.
- Stand up cloud infrastructure for the canonical store, message queue, validation pipelines, and observability surface.
- Treat data quality, lineage, replay, and dead-letter handling as first-class operational primitives, not afterthoughts.
- Validate framework extensibility by spiking a second aggregator without modifying the core.
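The previous-attributes diffing mentioned above can be sketched as a field-level comparison between an event's prior and current attribute maps. `AttributeDiff` is a hypothetical helper, not the aggregator's API; here a `None` value marks a field that was removed.

```scala
// Hypothetical helper for previous-attributes diffing: given the prior and
// current attribute maps from an event payload, return only the fields that
// actually changed. None as a value means the field was removed.
object AttributeDiff {
  def diff(
      previous: Map[String, String],
      current: Map[String, String]
  ): Map[String, Option[String]] =
    (previous.keySet ++ current.keySet).iterator.flatMap { k =>
      (previous.get(k), current.get(k)) match {
        case (p, c) if p == c => None          // unchanged: drop from the diff
        case (_, c)           => Some(k -> c)  // changed, added, or removed
      }
    }.toMap
}
```

Narrowing an event to its changed fields is what lets the pipeline translate source-side quirks, such as roster changes arriving as section updates, into precise canonical-store mutations instead of whole-record overwrites.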
Required experience
- 5 to 8+ years in data engineering, with at least one production-grade integration platform delivered end to end.
- Strong Scala; it is the primary language on this project, with some Python.
- Direct experience building connectors against third-party REST APIs, including OAuth flows, pagination, cursor-based event consumption, and rate-limit strategy.
- Strong data modeling chops (Kimball, Data Vault 2.0, or equivalent), with a track record of designing source-agnostic canonical schemas that survive multiple upstream sources.
- Hands-on with cloud data infrastructure (GCP, AWS, or Azure), including warehousing, managed messaging, object storage, and IAM.
- Modern orchestration experience (Airflow, Dagster, or Prefect).
- Comfortable with distributed systems design: idempotency, exactly-once vs at-least-once trade-offs, replay, dead-lettering, backpressure.
- Experience defining and enforcing data contracts, validation gates, and quarantine flows in production.
Bonus
- EdTech, SIS, or student rostering experience.
- Experience building abstract integration frameworks where adding a new source is configuration plus a thin adapter, not a rewrite.
- Experience integrating with enterprise identity providers for application authentication.
- Track record co-owning canonical schema sign-off conversations with non-engineering stakeholders.
Engagement details
- Six-month engagement, full-time contract.
- Remote, with overlap in US business hours.
- Reports to the project lead; collaborates directly with the client's product, engineering, and initial cohort teams.
Apply
Send us your resume and a short note. Applications go directly to the partners. We read everything and respond.