Web and Text Analytics

This subject is available under ICMS undergraduate degrees.

Subject Code:

DAT302A

Subject Type:

Specialisation 

Credit Points:

3 credit points

Pre-requisite/Co-requisite: 

DAT203A Big Data Systems  

 Course level of study pre-requisite 

A total of 24 credit points (15 credit points from level 100, including ICT101A, ICT102A, ICT103A and DAT101A, and 9 credit points from level 200 core subjects) prior to enrolling in level 300 core and specialisation subjects.

Subject Level:

300 

Subject Rationale:

With the emergence of the Semantic Web and advances in related technologies, more companies are investing in extracting value from the growing volume of web-based content. Organisations now pay closer attention to web data and analytics as a driver of competitive advantage. This has led to heavy reliance on tools and technologies that analyse web-based and web-generated data, much of which is unstructured text. As a result, over the last few years, web and text analytics have become an essential part of business intelligence and data analytics, helping businesses understand how users interact with websites, make more informed decisions, and advance their strategic planning.

This subject equips students with the knowledge and skills required to perform web and text analytics. It covers key topics such as extracting and processing web-related data; similarity, association, and classification analyses; topic modelling; and semantic and sentiment analyses. Privacy and ethical web analytics are also discussed.

Students will gain the skills needed to help organisations across industries tap into the power of web and text analytics, improving their decision making and, in turn, their overall performance.

Learning Outcomes:

a) Assess different types of data embedded in web applications including textual data.

b) Critically evaluate and apply appropriate methods and analytical techniques to extract and process web-based data and integrate them ethically with organisational datasets.

c) Analyse different patterns and hidden relationships in web-based data using relevant web analytics techniques.

d) Design and implement web analytics pipelines to perform sentiment and semantic analyses to extract insights for organisations.

e) Formulate and present insights and recommendations to various stakeholders to translate website and textual data into valuable digital assets.

Student Assessment:

Broad Topics to be Covered:

Topic: 
Week 1: Introduction to Web-Based Data 

  • Types of web-based data 
  • Extracting web-based data from APIs 
  • Social webs and their content 
  • Natural language and its structure 
  • Natural language, text, and textual data 
  • Stop words 
Week 2: Extracting Web-Based Data 

  • Web scraping and data extraction 
  • Finding relevant URLs (e.g., sitemap.xml) 
  • Implications of Web 3.0/Semantic Web 
  • Introduction to web digging with Python 
  • Privacy and ethical web analytics 
Week 3: Preparing Web-Based Data for Analysis 

  • Data pre-processing pipeline 
  • Attribute standardisation  
  • Noise and regular expressions 
  • Character normalisation  
  • Tokenisation algorithms  
Week 4: Feature Engineering and Syntactic Similarity 

  • Vectorising documents 
  • Document-term matrix (DTM) 
  • Similarity matrix (SM) 
  • Bag of words 
  • Models of TF-IDF 
Week 5: Text Classification Algorithms 

  • Train-Test Split 
  • Overview of web and text analytics algorithms 
  • Supervised learning algorithms 
  • Unsupervised learning algorithms 
  • Selecting the model 
  • Training the model  
Week 6: Operation and Evaluation of Text Classification Algorithms  

  • Model validation and accuracy metrics 
  • Parameter and hyperparameter tuning 
  • Classification confidence 
  • Feature importance 
  • Predictive text modelling  
Week 7: Topic Modelling  

  • Corpus parameters 
  • Non-negative matrix factorisation (NMF) 
  • Latent semantic analysis  
  • Latent Dirichlet allocation (LDA) 
  • Visualising topic models 
Week 8: Text Summarisation 

  • Extractive methods 
  • Topic representation modelling  
  • Distributional semantics 
Week 9: Semantic Relationships  

  • Word embeddings 
  • Similarity queries  
  • Dimensionality reduction 
  • Constructing a similarity tree 
Week 10: Sentiment Analysis  

  • Lexicon-based approaches 
  • Supervised learning approach 
  • Transfer learning approach 
Week 11: Review and Reflection  

  • Limitations of web and text analytics 
  • Future of web and text analytics  
  • Implications and potential of deep learning in the context of web and text analytics 
  • Review and reflection  

Please note that these topics are often refined and subject to change. For up-to-date weekly topics and suggested reading resources, please refer to the Moodle subject page.