Job Title: Python Developer with PySpark
Location: Northampton
Job Type: Contract
About the Role:
We are seeking a skilled Python Developer with expertise in PySpark to join our dynamic team. The ideal candidate will have strong experience in building and optimizing large-scale data processing pipelines and a deep understanding of distributed data systems. You will play a key role in designing and implementing data solutions that drive critical business decisions.
Key Responsibilities:
* Develop, optimize, and maintain large-scale data pipelines using PySpark and Python.
* Collaborate with data engineers, analysts, and stakeholders to gather requirements and implement data solutions.
* Perform ETL (Extract, Transform, Load) processes on large datasets and ensure efficient data workflows.
* Analyze and debug data processing issues to ensure accuracy and reliability of pipelines.
* Work with distributed computing frameworks to handle large datasets efficiently.
* Develop reusable components, libraries, and frameworks for data processing.
* Optimize PySpark jobs for performance and scalability.
* Integrate data pipelines with cloud platforms such as AWS, Azure, or Google Cloud, where applicable.
* Monitor and troubleshoot production data pipelines to minimize downtime and data issues.
Key Skills and Qualifications:
Technical Skills:
* Strong programming skills in Python with hands-on experience in PySpark.
* Experience with distributed data processing frameworks (e.g., Spark).
* Proficiency in SQL for querying and transforming data.
* Understanding of data partitioning, serialization formats (Parquet, ORC, Avro), and data compression techniques.
* Familiarity with Big Data technologies such as Hadoop, Hive, and Kafka (preferred).
Cloud Platforms (Preferred):
* Hands-on experience with AWS services like S3, EMR, Glue, or Redshift.
* Knowledge of Azure Data Lake, Databricks, or Google BigQuery is a plus.
Additional Tools and Frameworks:
* Familiarity with CI/CD pipelines (e.g., Jenkins) and version control tools (e.g., Git).
* Experience with orchestration tools like Apache Airflow or Luigi.
* Understanding of containerization and orchestration tools like Docker and Kubernetes (preferred).
Education and Experience:
* Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
* 5+ years of experience in Python programming.
* 4+ years of hands-on experience with PySpark.
* Experience with Big Data ecosystems and tools.