Data Quality Analyst


Job Title: Data Tester / Data Quality Engineer (Databricks + Oracle) – Modernization Program

Location: Louisville, KY (100% permanent remote work accepted from anywhere in the US; however, we prefer EST- and CST-based resources)

Duration: 12+ month contract

Interview: Microsoft Teams meeting

Job Description:

  • Typically operates with little supervision
  • Owns test design and execution for assigned data domains
  • Collaborates closely with data engineers, analysts, and product owners
  • Contributes to automation and quality standards, but is not expected to define enterprise-wide strategy alone

We are seeking a Data Tester / Data Quality Engineer to support a data modernization initiative, validating data pipelines and reconciling legacy (Oracle) and modern (Databricks/Lakehouse) environments. This role will be responsible for designing and executing data validation tests, ensuring data completeness, accuracy, consistency, and timeliness as datasets migrate and pipelines are rebuilt or replatformed.

You’ll work in an Agile delivery model with cross-functional teams to validate ETL/ELT transformations, ensure correct business rule implementation, and build repeatable, automated validation frameworks that scale across domains.

Key Responsibilities:

Data Testing & Validation

  • Design and execute data test strategies, test plans, and test cases for ingestion, transformation, and curated layers.
  • Validate data at rest and in motion across Oracle source systems and Databricks target platforms.

Perform source-to-target reconciliation, including:

  • record counts, checksum/hashing, aggregates, sampling
  • null/constraint checks, referential integrity, duplicates
  • transformation logic validation (business rules, SCD logic, dedup, enrichment)

Validate incremental loads, CDC patterns, and rerun/recovery scenarios.
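As an illustrative sketch only (table names, keys, and columns below are hypothetical, and a real framework would query Oracle and Databricks directly), the reconciliation checks listed above — record counts, per-row checksums, and missing/extra key detection — might look like this in plain Python:

```python
import hashlib

def row_hash(row, columns):
    """Deterministic checksum over selected columns, comparable across systems."""
    raw = "|".join(str(row[c]) for c in columns)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def reconcile(source_rows, target_rows, key, columns):
    """Compare record counts and per-row checksums between source and target extracts."""
    results = {"source_count": len(source_rows), "target_count": len(target_rows)}
    src = {r[key]: row_hash(r, columns) for r in source_rows}
    tgt = {r[key]: row_hash(r, columns) for r in target_rows}
    results["missing_in_target"] = sorted(set(src) - set(tgt))
    results["extra_in_target"] = sorted(set(tgt) - set(src))
    results["hash_mismatches"] = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return results

# Hypothetical extracts: an Oracle source table vs. its Databricks target
source = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": 20.0}]
target = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": 25.0}, {"id": 3, "amt": 5.0}]
report = reconcile(source, target, key="id", columns=["id", "amt"])
```

In practice the same count/hash comparisons would typically run as Spark SQL or PySpark aggregations rather than in-memory Python, but the logic is the same.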

Automation & Frameworks

  • Build and maintain automated data quality checks using SQL and/or Python (e.g., PySpark).
  • Develop reusable data validation utilities and parameterized scripts to reduce manual effort.
  • Integrate data tests into CI/CD pipelines where applicable (e.g., Azure DevOps, GitHub, Jenkins).
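As a non-authoritative sketch of the reusable, parameterized checks described above (the rule names and sample rows are invented for illustration), a small validation utility might be structured like this:

```python
def not_null(rows, column):
    """Fail if any row has a NULL (None) in the given column."""
    bad = [i for i, r in enumerate(rows) if r.get(column) is None]
    return {"check": f"not_null:{column}", "passed": not bad, "failing_rows": bad}

def unique(rows, column):
    """Fail if the column contains duplicate values."""
    seen, dupes = set(), set()
    for r in rows:
        v = r.get(column)
        (dupes if v in seen else seen).add(v)
    return {"check": f"unique:{column}", "passed": not dupes, "duplicates": sorted(dupes)}

def run_checks(rows, checks):
    """Execute a parameterized list of (check_fn, column) pairs and collect results."""
    return [fn(rows, col) for fn, col in checks]

# Hypothetical curated-layer extract with one null email and one duplicate id
rows = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": None}, {"id": 2, "email": "b@x.com"}]
results = run_checks(rows, [(not_null, "email"), (unique, "id")])
```

Because the checks are data-driven (function plus column name), the same runner can be wired into a CI/CD stage and fed different rule lists per domain.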

Defect Management & Collaboration

  • Log, triage, and manage defects with clear reproduction steps and root-cause hints.
  • Collaborate with data engineers to troubleshoot pipeline failures and data anomalies.
  • Partner with business stakeholders/analysts to confirm expected outcomes and acceptance criteria.

Documentation & Governance Support

  • Document test evidence, reconciliation results, and sign-off artifacts for releases.
  • Support data governance objectives (quality KPIs, issue tracking, lineage/metadata readiness).

Required Skills & Qualifications

Experience:

  • 5 or more years of hands-on experience in data testing / QA / data quality engineering in data warehouse, lake, or analytics modernization initiatives.
  • Demonstrated experience validating data pipelines involving Oracle (source or warehouse) and Databricks (target lakehouse).

Technical Skills (Must Have)

Oracle SQL: complex queries, joins, aggregations, performance-aware validation queries.

Databricks: experience with Spark SQL / Databricks SQL; familiarity with Delta Lake concepts (tables, merges/upserts, partitions; time travel is helpful).

Data validation techniques:

  • reconciliations, profiling, anomaly detection basics
  • test data creation and boundary testing for transformations

Python (preferred) and/or PySpark for automation and scalable validations.

Strong understanding of ETL/ELT concepts, data warehousing fundamentals, and dimensional modeling basics.
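For candidates less familiar with the Delta Lake merge/upsert concept mentioned above, here is a simplified, hedged sketch of the semantics in plain Python (keys and columns are hypothetical); on Databricks this would actually be a SQL `MERGE INTO` statement or a `DeltaTable.merge` call:

```python
def merge_upsert(target, updates, key):
    """Simplified MERGE semantics: update rows whose key matches, insert the rest."""
    merged = {r[key]: dict(r) for r in target}
    for r in updates:
        merged.setdefault(r[key], {}).update(r)  # matched -> update, unmatched -> insert
    return sorted(merged.values(), key=lambda r: r[key])

target = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": 20.0}]
updates = [{"id": 2, "amt": 25.0}, {"id": 3, "amt": 5.0}]
result = merge_upsert(target, updates, key="id")
```

Validating a real merge additionally means checking that no unmatched target rows were altered and that reruns are idempotent — the rerun/recovery scenarios called out under Key Responsibilities.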