Senior Data Engineer
Chennai, Tamil Nadu, India · Full Time
- Experience: Any
- Salary: INR 900,000 – INR 1,200,000 / year
- Openings: 1
- Posted: 9 hours ago
- Work mode: In office
- Education: Any graduate
- Eligibility: Any graduate can apply. Candidates with degrees in Computer Science, Engineering, or related disciplines, or those with equivalent practical experience, are eligible.
- Resume: Required to apply
Job description
About the Company
Merit Data and Technology is a London-headquartered AI-led technology company with engineering hubs in Chennai, Mumbai, and London. The business focuses on collecting, enriching, and engineering data, and it supports trusted B2B brands through proprietary data management systems and data solutions. Its work spans resilient, scalable cloud and on-premise products, from straightforward web apps to large enterprise-grade data systems.
Role Overview
This position is for a Senior Data Engineer focused on scraping and large-scale data harvesting. The role is responsible for building dependable pipelines that gather, parse, refine, and deliver high-quality information from web sources and APIs. The engineer will work with modern scraping tools, manage anti-bot restrictions, process data at scale, and coordinate end-to-end workflows alongside the DataHarvest team.
Key Responsibilities
The selected candidate will design and support scalable scraping and data-harvesting pipelines, develop scrapers using Python-based tools, and handle JavaScript-heavy sites. This includes dealing with anti-bot controls such as rate limiting and CAPTCHA challenges, using techniques such as proxy rotation and user-agent rotation. The role also covers ETL development, large-scale data processing, workflow orchestration, storage management across SQL and NoSQL systems, and robust monitoring, logging, retries, and error handling. Compliance with robots.txt, site terms, and privacy rules is part of the role, as is collaboration with technical and downstream stakeholders to meet quality and delivery expectations.
Technical Scope
- Build and maintain web scraping and harvesting workflows that can operate reliably at scale.
- Use Python scraping libraries and browser automation tools to extract data from static and dynamic web sources.
- Work with REST and GraphQL APIs, including reverse-engineering internal endpoints when required.
- Process and transform structured and semi-structured data using ETL methods and distributed computing.
- Schedule and orchestrate pipelines using tools such as Apache Airflow, Dagster, Prefect, or similar systems.
- Store, manage, and move data across PostgreSQL, MySQL, MongoDB, and standard file formats such as CSV, JSON, XML, and Parquet.
- Set up monitoring, alerting, retries, and fail-safes to keep pipelines stable and recoverable.
- Follow legal, policy, and data privacy requirements while collecting information from external sources.
Requirements
- Strong Python development skills; Node.js or JavaScript knowledge is an added advantage.
- Practical experience with scraping stacks such as Scrapy, BeautifulSoup, lxml, requests/httpx, Selenium, Playwright, or Puppeteer.
- Good command of web technologies including HTML, CSS, DOM structure, XPath, CSS selectors, and HTTP concepts such as headers, cookies, sessions, and status codes.
- Experience with JSON, XML, HTML parsing, ETL workflows, and common data formats.
- Hands-on exposure to PySpark or similar distributed data-processing approaches.
- Familiarity with an orchestration platform such as Apache Airflow; experience with Dagster, Prefect, or Luigi is a plus.
- Working knowledge of SQL and NoSQL databases.
- Understanding of concurrency, asynchronous programming, and distributed scraping for high-volume workloads.
- Comfort with Git, Docker, cloud environments, and pipeline monitoring/alerting practices.
- Awareness of compliance and legal considerations related to scraping and data collection.
- Education in Computer Science, Engineering, or a related discipline, or equivalent hands-on experience.
Eligibility
Any graduate may apply. Candidates with a bachelor’s or master’s degree in Computer Science, Engineering, or a related area are preferred, though equivalent practical experience is also acceptable.
Compensation
The salary range offered for this role is INR 9,00,000 to INR 12,00,000 per year.
Additional Information
This position is based in Chennai, India, and is listed with one opening. No joining date or notice-period details are specified.