SIT731 - Data Wrangling

Unit details

Note: You are seeing the 2022 view of this unit information. These details may no longer be current.

2022 unit information

Important Update:

Unit delivery will be in line with the most current COVIDSafe health guidelines. We continue to tailor learning experiences for each unit to achieve the best possible mix of online and on-campus activities that successfully blend our approaches to learning, working and research. Please check your unit sites for announcements and updates.

Last updated: 4 March 2022

Enrolment modes: Trimester 1: Burwood (Melbourne), Online; Trimester 3: Burwood (Melbourne), Online
Credit point(s): 1
EFTSL value: 0.125
Unit Chair: Trimester 1: Marek Gagolewski; Trimester 3: Marek Gagolewski
Prerequisite:

SIT774.
For students enrolled in S506, S507, S508, S535, S536, S538, S576, S677, S735, S737, S739, S770, S776, S778, S779: Nil

Corequisite:

Nil

Incompatible with:

SIT220

Typical study commitment:

Students will on average spend 150 hours over the trimester undertaking the teaching, learning and assessment activities for this unit.

Scheduled learning activities - campus:

1 x 3-hour active class per week

Scheduled learning activities - cloud:

Online independent and collaborative learning including optional scheduled activities as detailed in the unit site.

Content

Data Science (DS) and Artificial Intelligence (AI) are popular fields for making sense of data that have been collected in large quantities from various sources. Accurate exploration and modelling using DS and AI relies heavily on appropriately prepared data. Data wrangling is the process of preparing raw data appropriately for modelling purposes. The aim of this unit is to learn various data wrangling methodologies and the programming techniques used to perform them. This includes programming in Python to perform various data wrangling tasks; learning methods for extracting data from different sources; working with different types of data, and storing and retrieving them; applying sampling techniques and inspecting the data; cleaning the data by identifying outliers and anomalies and handling missing values; transforming, selecting and extracting features; performing exploratory analysis; visualising data using various tools; summarising data appropriately; and performing basic statistical analysis and modelling using basic machine learning. Techniques for maintaining data privacy and exercising ethics in data manipulation will also be covered in this unit.
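
For readers unfamiliar with the terms above, two of the listed cleaning steps (handling missing data and identifying outliers) can be sketched in Python as follows. This is an illustration only and is not drawn from the unit materials: the function names, the sample readings, the choice of median imputation and the 1.5 x IQR rule are all assumptions made for the example, and the unit itself may teach different tools or methods.

```python
import statistics


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values.

    Median imputation is an assumed choice for this sketch only.
    """
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical raw readings: None marks a missing value.
raw = [12.0, 15.0, None, 14.0, 13.0, 98.0, None, 16.0]

cleaned = fill_missing_with_median(raw)  # missing entries become 14.5
outliers = iqr_outliers(cleaned)         # only 98.0 falls outside the fences
```

Imputing before flagging outliers is itself a design choice: the median is robust to the extreme reading, so the filled-in values do not mask it.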

These are the Unit Learning Outcomes (ULO) for this unit, each listed with the Deakin Graduate Learning Outcomes (GLO) it supports. At the completion of this unit, successful students can:
ULO1

Undertake data wrangling tasks by using appropriate programming and scripting languages to extract, clean, consolidate, and store data of different data types from a range of data sources

GLO1: Discipline-specific knowledge and capabilities
GLO3: Digital literacy

ULO2

Research data discovery and extraction methods and tools and apply resulting learning to handle extracting data based on project needs.

GLO3: Digital literacy
GLO5: Problem solving

ULO3

Design and implement the data model needed to achieve project goals and the processes that can be used to convert data from data sources, and explain both to technical and non-technical audiences

GLO1: Discipline-specific knowledge and capabilities
GLO2: Communication
GLO5: Problem solving

ULO4

Use both statistical and machine learning techniques to perform exploratory analysis on data extracted, and communicate results to technical and non-technical audiences

GLO1: Discipline-specific knowledge and capabilities
GLO2: Communication
GLO5: Problem solving

ULO5

Apply and reflect on techniques for maintaining data privacy and exercising ethics in data handling.

GLO8: Global citizenship

These Unit Learning Outcomes are applicable for all teaching periods throughout the year.

Assessment

Trimester 1
Assessment description: Learning Portfolio
Student output: Portfolio consists of a number of artefacts, including scripts, business reports and presentations, along with critique and reflections.
Grading and weighting (% total mark for unit): 80%
Indicative due week: Week 12

Assessment description: Examination
Student output: 2-hour written examination
Grading and weighting (% total mark for unit): 20%
Indicative due week: Examination period

Trimester 3
Assessment description: Learning Portfolio
Student output: Portfolio consists of a number of artefacts, including scripts, business reports and presentations, along with critique and reflections.
Grading and weighting (% total mark for unit): 80%
Indicative due week: Week 12

Assessment description: Online Quiz
Student output: 2-hour online quiz
Grading and weighting (% total mark for unit): 20%
Indicative due week: Week 11

The assessment due weeks provided may change. The Unit Chair will clarify the exact assessment requirements, including the due date, at the start of the teaching period.

Hurdle requirement

Trimester 1: To be eligible to obtain a pass in this unit, students must meet certain milestones as part of the portfolio, and must achieve a mark of at least 50% in the examination.

Trimester 3: To be eligible to obtain a pass in this unit, students must meet certain milestones as part of the portfolio, and must achieve a mark of at least 50% in the online quiz.

Learning Resource

There is no prescribed text. Unit materials are provided via the unit site. This includes unit topic readings and references to further information.

The texts and reading list for the unit can be found on the University Library site under SIT731. Note: Select the relevant trimester reading list. A future teaching period's reading list may not be available until a month prior to the start of that teaching period, so you may wish to use the relevant trimester's prior-year reading list as a guide only.
