This contract with our government client is for a Data Engineer for 4 months, based in London, Newcastle or Coventry (hybrid, 3 days per week onsite). The Data Engineer will have direct oversight and responsibility for the development, implementation, review and establishment of the data and techniques used in the Advanced Data Analytics Portfolio (DAP). As a Data Engineer, you will work with project managers, data scientists, counter fraud experts, stakeholders and senior management across the whole organisation and its wider stakeholders in pursuit of the above. Your primary responsibilities will include developing data pipelines, processing data and preparing data suitable for analysis. Additionally, you will evaluate, maintain and manage data pipelines, develop and maintain data models, and support the design of analytical services, products and models that meet the needs of the organisation. You will also guide, develop, document and maintain the data pipelines and storage solutions used in the analytics environment, ensuring that the infrastructure and methods are secure, optimised and scalable within the cloud computing environment. All of this must be aligned with the strategic framework that drives innovation and sustained organisational impact across the client and the wider public sector.

Main responsibilities:
• Develop and maintain robust, efficient, secure and reliable data pipelines using appropriate business tools.
• Build data pipelines that clean, transform and present granular and aggregate data from disparate sources.
• Design, build, test, automate and maintain architectures and processing workflows.
• Plan, develop and evaluate methods and processes for gathering, extracting, transforming and cleaning data and information.
• Undertake development of the data warehouse, including the overall design, technical development and documentation of the warehouse, infrastructure and ETL solutions covering multiple data sources, working alongside the organisation's specialists.
• Write ETL (extract, transform, load) scripts and code to ensure the ETL process performs optimally.
• Define and document the pre-processing steps, data wrangling techniques and feature engineering undertaken on a given dataset.
• Design and maintain robust processes to ensure that organisation and external stakeholder datasets are separated and securely stored within the tenancy.
• Implement data flows to connect data for analytics and business intelligence (BI) systems.
• Develop, control and complete data quality assessments on datasets, ensuring data is fit for purpose.
• Work with data and analytics/data science experts to strive for greater functionality in their data systems.
• Identify, design and implement internal process improvements: automating manual processes, optimising data delivery, re-designing infrastructure for greater scalability, etc.
• Develop efficient and easily maintainable processes to handle highly complex, large-scale datasets.
• Ensure highly complex data processing tasks are fully automated and scheduled to run out of hours wherever practically possible.
• Identify and advise on inefficient existing queries and propose appropriate changes.
• Oversee the development of data transformation routines from both complex technical specifications and natural language agreements.
• Work across multiple trusted research environments to produce data extracts and pipeline them to data scientists ready for analysis.
• Rapidly understand and evaluate different databases from their structure, documentation and contents, engaging with system and domain experts where required to build an understanding of the data.
• Review and develop processes and applications as the portfolio of work, technology and techniques emerge, ensuring systems and processes remain current and fit for purpose.
• Maintain the security of organisation and stakeholder data, ensuring the confidentiality of personal and commercially sensitive data at all times, in line with the relevant information governance and technical security standards and policies of each database and data source acquired.
• Design and maintain appropriate and robust data retention processes to ensure timely and effective removal of obsolete data, in line with organisation policy and wider information governance (IG) requirements.
• Work on a range of relational database systems (for example Oracle, Microsoft SQL Server, Sybase SQL Server, MySQL, etc.).
• Apply knowledge and experience of tools and programming languages such as SQL, XML, R, Python, Alteryx (Designer and Server) and Power BI.
• Use techniques that ensure efficient queries on large datasets in the organisation's environment, including batch processing, partitioning and indexing (see the sketch after this list).
• Produce flexible, programmatic code that can be re-used in multiple settings.
• Continuously develop and document data engineering methods and tools to transform data into suitable structures for analysis.
• Manage a structured testing programme for data workflows and engineering tools developed and deployed to the organisation.
• Promote best practice for database management across all data products, including efficient processing practices, and provide advice on best practice and standards.
• Support the effective development and deployment of new products, such as machine learning models, dashboards or dynamic reports.
• Apply knowledge of machine learning techniques and of how data is prepared and optimised for use within the data science framework.
• Undertake machine learning for engineering practices, such as metadata-driven intelligent ETL and pipeline processes.
• Provide specialist data engineering services to the organisation and the portfolio of work.
• Develop business intelligence reports that are easy to interpret and can be reused by other team members, e.g. for system management (cloud computing resources).
• Work with portfolio members, including external stakeholders, to identify, plan, develop and deliver data products within the Advanced Data Analytics Portfolio (DAP).
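To give a flavour of the batch-processing and partitioning techniques referenced above, the following is a minimal PySpark sketch; the paths and the table and column names (claims_raw, claims_curated, claim_id, claim_date, amount) are hypothetical and are not drawn from the client's environment.

from pyspark.sql import SparkSession, functions as F

# Minimal sketch: batch-read a large source, apply basic cleaning, and write
# the result partitioned by month so downstream queries can prune by date.
# All paths, table and column names below are hypothetical.
spark = SparkSession.builder.appName("curation_sketch").getOrCreate()

raw = (
    spark.read.format("parquet")
    .load("/mnt/datalake/raw/claims_raw")          # hypothetical landing path
    .filter(F.col("claim_date") >= "2024-01-01")   # predicate pushdown limits the scan
)

curated = (
    raw.dropDuplicates(["claim_id"])                                   # simple data quality step
       .withColumn("claim_month", F.date_trunc("month", "claim_date"))
       .withColumn("amount_gbp", F.col("amount").cast("decimal(18,2)"))
)

(
    curated.write.mode("overwrite")
    .partitionBy("claim_month")                       # partitioning keeps later queries efficient
    .parquet("/mnt/datalake/curated/claims_curated")  # hypothetical curated zone path
)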
Requirements
Key Essential Skills:
• Practical implementation of data solutions using the Azure tech stack (Data Factory, Data Lakes, Notebooks, Synapse warehouse)
• Experience working with varied data sources (on-prem databases, data lakes, ERP systems)
• API integration
• Complex data transformation using PySpark, SQL and optimisation techniques (an illustrative sketch follows below)
• Good understanding of data modelling and warehousing concepts
• Data documentation
• Basic machine learning experience

Good to have:
• Experience in Fabric
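As an indication of the kind of PySpark transformation and optimisation work expected, here is a short illustrative sketch; the table and column names (transactions, suppliers, supplier_id, txn_date, amount) are hypothetical and not the client's schema.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import broadcast

# Illustrative sketch: join a large fact table to a small reference table and
# aggregate, using a broadcast join as one common optimisation technique.
# Table and column names are hypothetical.
spark = SparkSession.builder.appName("transformation_sketch").getOrCreate()

transactions = spark.read.parquet("/mnt/datalake/curated/transactions")   # large fact table
suppliers = spark.read.parquet("/mnt/datalake/reference/suppliers")       # small dimension table

monthly_spend = (
    transactions
    .join(broadcast(suppliers), on="supplier_id", how="left")   # broadcast avoids shuffling the large side
    .groupBy("supplier_name", F.date_trunc("month", "txn_date").alias("month"))
    .agg(
        F.sum("amount").alias("total_spend"),
        F.countDistinct("txn_id").alias("txn_count"),
    )
)

# Persist as a managed table so analysts and BI tools can query it directly.
monthly_spend.write.mode("overwrite").saveAsTable("analytics.monthly_supplier_spend")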