Big Data Engineer

2 weeks ago


Riyadh, Saudi Arabia Insights Advisory Full time

**Job Summary**:
We are looking for a Data Engineer with in-depth experience in Cloudera, Informatica, and Alteryx to design, implement, and manage robust data engineering solutions. In this technical role, you will work with large-scale data processing systems, build high-performance ETL pipelines, and ensure the smooth integration of data from multiple sources. This position requires proficiency in big data technologies, data integration platforms, and automation tools, along with a strong ability to optimize workflows for performance and scalability.

**Key Responsibilities**:
1. Design, implement, and optimize data pipelines for batch and real-time data processing using Cloudera (Hadoop, Hive, Spark, Impala) and Informatica (PowerCenter, Cloud Data Integration).

2. Build data extraction, transformation, and loading (ETL) workflows using Informatica PowerCenter for large-scale data integration from source systems (e.g., relational databases, flat files, APIs) into Cloudera Data Lake or data warehouse environments.

3. Implement Spark jobs on Cloudera for distributed data processing and optimization of data workflows.

4. Leverage Informatica for orchestrating ETL workflows, including data extraction, cleansing, transformation, and loading into data repositories (HDFS, Hive, SQL databases, etc.).

5. Create Alteryx workflows to automate data preparation, cleansing, and transformation, making data available for downstream analysis or reporting.

6. Leverage Alteryx's native connectors to integrate with external data sources (e.g., SQL databases, APIs, cloud services).

7. Optimize the Informatica and Alteryx workflows to minimize runtime, ensure smooth data integration, and maintain high data quality.

8. Utilize Hadoop and Spark on Cloudera to process large datasets and implement data transformations using MapReduce, Spark SQL, and PySpark.

9. Leverage Impala for low-latency SQL queries on Hadoop, ensuring real-time access to processed data.

10. Implement partitioning, bucketing, and indexing strategies in Hive and HBase to improve query performance on large datasets.

11. Implement and enforce data quality rules within Informatica and Alteryx workflows, ensuring that all transformations meet the required standards for completeness, consistency, and accuracy.

12. Ensure compliance with data governance and security protocols (e.g., encryption, masking, access control) in accordance with industry best practices.

13. Automate ETL workflows using Informatica and Alteryx Server, integrating with Airflow, NiFi, or other workflow orchestration tools to schedule and monitor jobs.

14. Utilize Cloudera Navigator for monitoring and auditing data processes within the Hadoop ecosystem.

15. Perform regular tuning of the ETL pipelines, data flows, and SQL queries to ensure optimal performance.

**Required Qualifications**:
1. Education: Major in Computer Science or a related field.

2. Years of experience: 4+

3. Cloudera Platform Experience: Proven experience with the Cloudera Distribution of Hadoop (CDH), including expertise in HDFS, Hive, Impala, Spark, and HBase.

4. Informatica Expertise: Strong hands-on experience with Informatica PowerCenter (ETL), EDC, IDQ, B2B, and Axon.

5. Alteryx Expertise: Proficiency in developing and automating data workflows using Alteryx Designer and Alteryx Server for end-to-end data transformation, integration, and reporting automation.

6. Big Data & ETL Knowledge: Deep understanding of ETL best practices, data pipelines, and distributed computing technologies such as Spark, MapReduce, PySpark, and Hadoop ecosystem components.

7. SQL Proficiency: Advanced SQL skills for data manipulation, aggregation, optimization, and reporting across relational and non-relational data stores (e.g., SQL Server, MySQL, PostgreSQL, Hive, Impala).

8. Programming Skills: Experience in Python and SQL.

9. Data Warehousing: Strong background in data warehousing principles and data modeling, including dimensional modeling (star schema, snowflake schema) and OLAP/OLTP considerations.

