Site Reliability Engineer


Job Title: Platform Engineer / Site Reliability Engineer / Observability Engineer

Experience: 5+ years

Work Mode: Remote

________________________________________

Job Description:

We are looking for a Data Engineer with strong experience in Python automation, observability (Prometheus), MongoDB, ETL workflows, and cloud data platforms. The ideal candidate has hands-on skills in monitoring and logging, data engineering pipelines, and performance optimization across databases and cloud environments.


Key Responsibilities:

•         Develop and maintain data pipelines, ETL processes, and data workflows using SQL, Python, and Unix shell scripting.

•         Manage and optimize MongoDB deployments, including schema design, indexing, aggregation pipelines, and performance tuning.

•         Design and implement observability systems covering metrics, logs, events, and traces, using Prometheus for metrics collection and alerting.

•         Lead migration and validation of dashboards across observability/monitoring platforms.

•         Build automation scripts for monitoring, log retrieval, and system analysis.

•         Work with cloud platforms (Azure/Snowflake/Databricks) for data movement, storage, and scaling.

•         Collaborate with cross-functional engineering teams to ensure data reliability and system visibility.


Requirements:

•         3–5+ years of experience in Data Engineering or Observability Engineering.

•         Strong proficiency in Python scripting, automation workflows, and Unix/Linux systems.

•         Hands-on experience with Prometheus or similar monitoring tools.

•         Strong SQL experience and understanding of data warehousing concepts.

•         Experience with MongoDB (architecture, aggregation framework, performance optimization).

•         Understanding of cloud platforms such as Azure, Snowflake, or Databricks.

•         Good analytical, debugging, and communication skills.