Data Quality Analyst


Job Title: Data Tester / Data Quality Engineer (Databricks + Oracle) – Modernization Program

Location: Louisville, KY (100% permanent remote work accepted from anywhere in the US; however, we prefer EST- and CST-based resources)

Duration: 12+ month contract

Interview: Microsoft Teams meeting

Job Description:

  • Typically operates with little supervision
  • Owns test design and execution for assigned data domains
  • Collaborates closely with data engineers, analysts, and product owners
  • Contributes to automation and quality standards, but is not expected to define enterprise-wide strategy alone

We are seeking a Data Tester / Data Quality Engineer to support a data modernization initiative, validating data pipelines and reconciling legacy (Oracle) and modern (Databricks/Lakehouse) environments. This role will be responsible for designing and executing data validation tests, ensuring data completeness, accuracy, consistency, and timeliness as datasets migrate and pipelines are rebuilt or replatformed.

You’ll work in an Agile delivery model with cross-functional teams to validate ETL/ELT transformations, ensure correct business rule implementation, and build repeatable, automated validation frameworks that scale across domains.

Key Responsibilities:

Data Testing & Validation

  • Design and execute data test strategies, test plans, and test cases for ingestion, transformation, and curated layers.
  • Validate data at rest and in motion across Oracle source systems and Databricks target platforms.

Perform source-to-target reconciliation, including:

  • record counts, checksum/hashing, aggregates, sampling
  • null/constraint checks, referential integrity, duplicates
  • transformation logic validation (business rules, SCD logic, dedup, enrichment)

Validate incremental loads, CDC patterns, and rerun/recovery scenarios.
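As an illustrative sketch only (table names, keys, and columns below are hypothetical, and a real framework would query Oracle and Databricks directly), the reconciliation checks listed above — record counts, per-row checksums, and missing/extra key detection — might look like this in plain Python:

```python
import hashlib

def row_hash(row, columns):
    """Deterministic checksum over selected columns, comparable across systems."""
    raw = "|".join(str(row[c]) for c in columns)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def reconcile(source_rows, target_rows, key, columns):
    """Compare record counts and per-row checksums between source and target extracts."""
    results = {"source_count": len(source_rows), "target_count": len(target_rows)}
    src = {r[key]: row_hash(r, columns) for r in source_rows}
    tgt = {r[key]: row_hash(r, columns) for r in target_rows}
    results["missing_in_target"] = sorted(set(src) - set(tgt))
    results["extra_in_target"] = sorted(set(tgt) - set(src))
    results["hash_mismatches"] = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return results

# Hypothetical extracts: an Oracle source table vs. its Databricks target
source = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": 20.0}]
target = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": 25.0}, {"id": 3, "amt": 5.0}]
report = reconcile(source, target, key="id", columns=["id", "amt"])
```

In practice the same count/hash comparisons would typically run as Spark SQL or PySpark aggregations rather than in-memory Python, but the logic is the same.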

Automation & Frameworks

  • Build and maintain automated data quality checks using SQL and/or Python (e.g., PySpark).
  • Develop reusable data validation utilities and parameterized scripts to reduce manual effort.
  • Integrate data tests into CI/CD pipelines where applicable (e.g., Azure DevOps, GitHub, Jenkins).
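As a non-authoritative sketch of the reusable, parameterized checks described above (the rule names and sample rows are invented for illustration), a small validation utility might be structured like this:

```python
def not_null(rows, column):
    """Fail if any row has a NULL (None) in the given column."""
    bad = [i for i, r in enumerate(rows) if r.get(column) is None]
    return {"check": f"not_null:{column}", "passed": not bad, "failing_rows": bad}

def unique(rows, column):
    """Fail if the column contains duplicate values."""
    seen, dupes = set(), set()
    for r in rows:
        v = r.get(column)
        (dupes if v in seen else seen).add(v)
    return {"check": f"unique:{column}", "passed": not dupes, "duplicates": sorted(dupes)}

def run_checks(rows, checks):
    """Execute a parameterized list of (check_fn, column) pairs and collect results."""
    return [fn(rows, col) for fn, col in checks]

# Hypothetical curated-layer extract with one null email and one duplicate id
rows = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": None}, {"id": 2, "email": "b@x.com"}]
results = run_checks(rows, [(not_null, "email"), (unique, "id")])
```

Because the checks are data-driven (function plus column name), the same runner can be wired into a CI/CD stage and fed different rule lists per domain.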

Defect Management & Collaboration

  • Log, triage, and manage defects with clear reproduction steps and root-cause hints.
  • Collaborate with data engineers to troubleshoot pipeline failures and data anomalies.
  • Partner with business stakeholders/analysts to confirm expected outcomes and acceptance criteria.

Documentation & Governance Support

  • Document test evidence, reconciliation results, and sign-off artifacts for releases.
  • Support data governance objectives (quality KPIs, issue tracking, lineage/metadata readiness).

Required Skills & Qualifications

Experience:

  • 5 or more years of hands-on experience in data testing / QA / data quality engineering in data warehouse, lake, or analytics modernization initiatives.
  • Demonstrated experience validating data pipelines involving Oracle (source or warehouse) and Databricks (target lakehouse).

Technical Skills (Must Have)

Oracle SQL: complex queries, joins, aggregations, performance-aware validation queries.

Databricks: experience with Spark SQL / Databricks SQL; familiarity with Delta Lake concepts (tables, merges/upserts, partitions; time travel is helpful).

Data validation techniques:

  • reconciliations, profiling, anomaly detection basics
  • test data creation and boundary testing for transformations

Python (preferred) and/or PySpark for automation and scalable validations.

Strong understanding of ETL/ELT concepts, data warehousing fundamentals, and dimensional modeling basics.
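For candidates less familiar with the Delta Lake merge/upsert concept mentioned above, here is a simplified, hedged sketch of the semantics in plain Python (keys and columns are hypothetical); on Databricks this would actually be a SQL `MERGE INTO` statement or a `DeltaTable.merge` call:

```python
def merge_upsert(target, updates, key):
    """Simplified MERGE semantics: update rows whose key matches, insert the rest."""
    merged = {r[key]: dict(r) for r in target}
    for r in updates:
        merged.setdefault(r[key], {}).update(r)  # matched -> update, unmatched -> insert
    return sorted(merged.values(), key=lambda r: r[key])

target = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": 20.0}]
updates = [{"id": 2, "amt": 25.0}, {"id": 3, "amt": 5.0}]
result = merge_upsert(target, updates, key="id")
```

Validating a real merge additionally means checking that no unmatched target rows were altered and that reruns are idempotent — the rerun/recovery scenarios called out under Key Responsibilities.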