
Big Data Architecture and Management

Big Data Architecture and Management is available as a postgraduate-level subject offered by the International College of Management, Sydney (ICMS).


Subject Code:

DAT801A

Subject Type:

Specialisation

Pre-requisite:

  • DAT701A Analytics for Business Intelligence 
  • ICT701A Software Design and Construction 
  • Course-level study pre-requisite: a total of 16 credit points (4 subjects) completed before enrolling in this subject. 

Subject Level:

800

Credit Points:

4 credit points

Subject Aim:

Given the increasing digitalisation, massive amounts of structured and unstructured data are generated across various sources. This phenomenon of constantly growing data that is extremely complex in structure is referred to as Big Data. Traditional databases are not suitable for efficient indexing, sorting, searching, analysing, and visualising of Big Data. To be able to use this data to gain deeper insights and, thus, leverage a competitive advantage, organisations need the competencies to manage Big Data and to apply Big Data technologies and techniques effectively.

This subject provides students with an introduction to Big Data, its 5-V characteristics (Volume, Velocity, Veracity, Variety, and Value), resulting challenges for organisations, and state-of-the-art technologies and techniques such as NoSQL, parallel and distributed memory systems, and streaming models. It introduces key concepts on how to create business value from Big Data and consequently empowers students to apply their Big Data expertise to develop business solutions.

In this subject, students will study Big Data not only from a theoretical or technical perspective, but they will also learn to master the challenges of Big Data in a complex business context. This includes studying architectures, systems, and management techniques.

Learning Outcomes:

a) Communicate knowledge about Big Data characteristics, challenges for traditional database systems, and state-of-the-art Big Data architectures and techniques to professional audiences.

b) Develop requirements and contextualise Big Data concepts to complex use cases.

c) Critically evaluate and select distributed storage and computation techniques to manage Big Data and draw value from it. 

d) Design and implement a prototypical Big Data solution to a complex business problem by applying advanced data storage, processing, querying and analysis tools.

e) Plan and develop a contemporary Big Data solution, critically analyse risks and opportunities, and effectively communicate recommendations to various stakeholders. 

Assessment Information:

Learning outcomes for this subject are assessed using a range of assessment tasks as described in the table below.

Broad topics to be covered: 

Week 1: Introduction to Big Data 

  • What is Big Data? 
  • Big Data Characteristics 
  • Big Data Vectors and other Vectors 
  • Extended Big Data Characteristics 
  • Why is Big Data valuable? 
  • What is driving Big Data? 
  • Big Data Challenges 
Week 2: Traditional Databases vs. Big Data 

  • Scaling Traditional Databases 
  • Data Sharing and Combination 
  • Amdahl’s Law 
  • Guidelines for effective parallelisation 
  • The CAP-Theorem (Limitations of Distributed Databases) 
  • The BASE Properties 
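
Amdahl's Law, listed above, bounds the speedup from parallelisation by the fraction of work that must stay serial. A minimal sketch of the formula (the `amdahl_speedup` helper and the 95% example are illustrative, not from the subject materials):

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Maximum speedup when only parallel_fraction of the work can be parallelised."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# If 95% of the work is parallelisable, the serial 5% caps the speedup
# at 1 / 0.05 = 20x, no matter how many processors are added.
print(round(amdahl_speedup(0.95, 8), 2))     # prints 5.93
print(round(amdahl_speedup(0.95, 1000), 2))  # prints 19.63, near the 20x ceiling
```

This is why the "guidelines for effective parallelisation" focus on shrinking the serial fraction rather than simply adding nodes.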
Week 3: Enterprise Architecture Management and Big Data 

  •  Enterprise Architecture Management and Big Data 
    • Short overview of EAM 
    • Competitive Advantage of Big Data 
  • EAM as a Starting Point for the Establishment of Big Data 
    • Introduction and Development of Enterprise Architecture 
    • Introduction of Big Data under Consideration of the Enterprise Architecture 
Week 4: Big Data Infrastructures 

  • Infrastructure Challenges 
  • Distributed Parallel Processing of Big Data 
    • MapReduce 
    • Apache Hadoop 
    • Real or Virtual Servers 
  • NoSQL Databases 
  • In-Memory Technologies 
  • Processing of Big Data Streams 
  • Reference Architectures for Big Data Infrastructures 
  • Lambda Architecture 
  • Operation of Big Data Infrastructures 
    • IaaS 
    • PaaS 
    • SaaS 
  • Data Science as a Service 
Week 5: MapReduce Programming Model 

  • MapReduce Concepts and History 
  • MapReduce Paradigm 
  • Map and Fold 
  • Map & Reduce from a Developer's Point of View 
  • Discussion of a Practical Example 
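
The map-and-fold paradigm above can be sketched in a few lines of plain Python. This word-count sketch is the classic teaching example of MapReduce (the function names and grouping step are illustrative; a real framework performs the shuffle across a cluster):

```python
from functools import reduce
from itertools import groupby

def map_phase(document: str):
    # Map: emit a (key, value) pair for every word in the document.
    return [(word.lower(), 1) for word in document.split()]

def reduce_phase(pairs):
    # Shuffle: group pairs by key, as the framework would between phases.
    pairs = sorted(pairs)
    # Reduce (fold): sum the values emitted for each key.
    return {key: reduce(lambda acc, kv: acc + kv[1], group, 0)
            for key, group in groupby(pairs, key=lambda kv: kv[0])}

docs = ["big data", "big clusters"]
pairs = [kv for doc in docs for kv in map_phase(doc)]
print(reduce_phase(pairs))  # prints {'big': 2, 'clusters': 1, 'data': 1}
```

Because each map call is independent and each reduce operates on one key's group, both phases parallelise naturally across machines.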
Week 6: Hadoop Platform for Storing and Processing Big Data 

  • Hadoop Concepts 
    • Mapper 
    • Reducer 
    • Combiner (optional) 
    • Partitioner (optional) 
    • Driver/Job Configurator 
  • What is a File System? 
  • GFS and HDFS 
Week 7: Spark Platform for Storing and Processing Big Data 

  • Why Cluster Computing? 
  • What’s Hard About Cluster Computing? 
  • Apache Spark Motivation 
  • The Spark Computing Framework 
  • Spark Tools 
  • Spark and MapReduce Differences 
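
One of the Spark/MapReduce differences covered here is lazy, pipelined evaluation: transformations build a plan, and only an action executes it. A minimal sketch of that idea using plain Python generators (this illustrates the concept only, not the Spark API):

```python
# Sample log lines; the data and filtering rule are illustrative.
lines = ["error: disk full", "ok", "error: timeout"]

# "Transformations": these generator expressions build a pipeline
# but process nothing yet, much like Spark's filter/map on an RDD.
errors = (line for line in lines if line.startswith("error"))
messages = (line.split(": ", 1)[1] for line in errors)

# "Action": only collecting the results triggers the whole pipeline,
# streaming each element through every step without materialising
# intermediate collections -- whereas MapReduce writes each stage's
# output to disk before the next stage starts.
print(list(messages))  # prints ['disk full', 'timeout']
```

Pipelining like this, plus in-memory caching of intermediate results, is what lets Spark outperform MapReduce on iterative workloads.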
Week 8: Data Streaming: Introduction 

  • Why Data Streaming? 
  • Business Examples of Data Streaming 
  • Streaming Architecture 
  • Data Streaming Scenario Characteristics 
  • Data Stream Models 
  • Concepts of Data Streaming (e.g., Sampling) 
Week 9: Data Streaming: Sampling 

  • Types of Sampling, e.g.: 
    • Reservoir Sampling 
    • Distributed Reservoir Sampling 
    • Frequency Sampling 
  • Application Areas 
  • Parameters 
  • Distinct Element Estimates 
  • Estimating Moments 
  • Alon-Matias-Szegedy Algorithm 
  • DGIM 
  • Bloom Filters 
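
Reservoir sampling, the first technique in the list above, keeps a uniform random sample of fixed size k from a stream whose length is unknown in advance. A minimal single-machine sketch (the function name and demo values are illustrative):

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)    # random index in [0, i], inclusive
            if j < k:
                reservoir[j] = item  # item i survives with probability k/(i+1)
    return reservoir

sample = reservoir_sample(range(1_000_000), 5)
print(sample)  # five items, each retained with equal probability
```

A short induction shows every item ends up in the reservoir with probability k/n, using only O(k) memory and one pass, which is exactly what the streaming setting requires; the distributed variant merges per-node reservoirs weighted by the counts each node has seen.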
Week 10: Processing Big Data with Scala 

  • What is Scala? 
  • What makes Scala Scalable? 
  • Scala: Variables 
  • Scala: Expressions 
  • Scala: Unit Type 
  • Scala: Procedures 
  • Scala: Functional Objects 
Week 11: Current Trends and Future Potentials of Big Data 

  • Data Sovereignty 
  • Scalability of Big Data 
  • Cloud Computing Platforms for Big Data Adoption and Analytics 


Please note that these topics are often refined and subject to change, so for up-to-date weekly topics and suggested reading resources, please refer to the Moodle subject page.