Data Science Challenge

Who can enter

Anyone enrolled in a CDU IT course (VET, undergraduate or postgraduate) in the 2022 academic year.

Your Mission


Family and domestic violence is a blight on the community.

In 2022, the Australian Institute of Health and Welfare reported that one in six women across Australia has experienced physical or sexual violence by a current or previous partner since the age of 15.

Criminal and family law in Australia is regulated through six State and two Territory jurisdictions. Unfortunately, domestic violence legislation in each jurisdiction differs in its approach and complexity in tackling the problem.

While the calls for greater consistency across this legislation are loud and clear, accomplishing that objective is difficult. Comparing legislation is a monumental task, usually done manually by legal experts.


This challenge focuses on measuring the complexity of domestic violence legislation in Australia. Your team must propose, model, and demonstrate a scalable data science solution that automates this process.

The solution must enable the comparison of legislation complexity between states and territories and be scalable to larger or different legislation corpora in the future.

Important dates

  • Information Session
    • Date: Thursday 3rd November 2022
    • Time: 15:00 hrs
    • Venue: Online, via Teams meeting
  • Close of Python solution submission
    • Date: Sunday 27th November 2022 
    • Time: 23:59 hrs
  • Challenge Date
    • Date: Tuesday 6th December 2022
    • Time: 9:30 – 12:00 hrs
    • Venue: Purple 12.1.15

Scope and dataset

For this challenge, you will analyse only the family and domestic violence legislation listed in Table 1 below. These legislation documents are available as HTML from the Australian Legal Information Institute website.

          Table 1. List of legislation to be analysed

Jurisdiction                    Relevant legislation
Australian Capital Territory    Family Violence Act 2016 (ACT)
New South Wales                 Crimes (Domestic and Personal Violence) Act 2007 (NSW)
Northern Territory              Domestic and Family Violence Act 2007 (NT)
Queensland                      Domestic and Family Violence Protection Act 2012 (Qld)
Victoria                        Family Violence Protection Act 2008 (Vic)

   Remark: South Australian, Tasmanian and Western Australian legislation is not part of this challenge.


  1. Participants must be in teams of 2 to 3 students. Individual submissions are not accepted.

  2. The data science solution must be developed in Python. Results must be submitted as either .py scripts or Jupyter Notebooks written in Python.

  3. There is no restriction on which Python packages you may use.

Specific tasks

The complexity of legislation is measured by completing the well-defined analytic tasks listed below. Your solution must precisely compute the requested output in each task.

You must progressively tackle the tasks from the Easy level up to the Harder level (Task 1 to Task 9). Skipping tasks is not permitted; attempts made on a harder task will not be considered unless the easier tasks are also attempted.

A solution that can automatically scrape the HTML page of each piece of legislation is highly desirable.
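As a starting point for the scraping step, here is a minimal sketch using only the Python standard library. The names `TextExtractor`, `html_to_text` and `fetch_html` are illustrative and not part of the challenge. The real pages will probably need extra handling (navigation menus, footers, encoding), and the URL of each Act is left for you to supply.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside <script> or <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_html(url: str) -> str:
    """Download a page. The URL is supplied by the caller (not given here)."""
    request = Request(url, headers={"User-Agent": "legislation-complexity-study"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

Typical use would be `text = html_to_text(fetch_html(url))`, with the result saved to disk so later tasks do not need to download the page again.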

Easy: Length and structural elements of a legislation

  • Task 1. Word count of a legislative text
  • Task 2. Word count per chapter, part, division, subdivision, and section
  • Task 3. Number of sections and subsections

Medium: Amendment frequency of a legislation

  • Task 4. Extract the year of the legislation enactment
  • Task 5. Extract the years and count the frequency of legislation amendments
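For the Medium tasks, the year of enactment can usually be read from the short title, since every title in Table 1 ends in "Act" followed by a year. Amendment histories are harder because their format differs between jurisdictions. The sketch below assumes, purely for illustration, that the history is available as text in which each amending Act is cited by a title containing the word "Amendment". The function names and the sample citations in the comments are invented.

```python
import re
from collections import Counter

# "Act" followed by a four-digit year, e.g. "... Act 2016 (ACT)".
ACT_YEAR_RE = re.compile(r"\bAct\s+((?:18|19|20)\d{2})\b")
# Assumed citation style: "... Amendment Act 2018" or "... Amendment (X) Act 2018".
AMENDING_ACT_RE = re.compile(r"Amendment\b[^.;\n]*?\bAct\s+((?:18|19|20)\d{2})\b")


def enactment_year(short_title: str):
    """Task 4: year taken from the short title, or None if no year is found."""
    match = ACT_YEAR_RE.search(short_title)
    return int(match.group(1)) if match else None


def amendment_years(history_text: str) -> Counter:
    """Task 5: how many amending Acts are cited for each year."""
    return Counter(int(year) for year in AMENDING_ACT_RE.findall(history_text))


# Example with an invented history note:
# amendment_years("amended by the Example Courts Amendment Act 2018")
# counts one amendment for 2018.
```

If a jurisdiction publishes its amendment history as an HTML table rather than running text, parsing the table cells directly is likely to be more reliable than this pattern.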

Hard: Readability of a legislation
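The specific readability measure is not prescribed here. One widely used option is the Flesch Reading Ease score, 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), where a higher score means easier text. The sketch below is only an illustration: its syllable counter is a rough vowel-group heuristic, and splitting sentences on ".", "!" and "?" will miscount legal text that uses abbreviations or long enumerated lists.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and groups > 1:
        groups -= 1
    return max(groups, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease score of the text (higher means easier to read)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Because the score depends heavily on how sentences are split, it is worth reporting how your solution treats subsections and paragraphs, which often run on without a full stop.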

Harder: Cross-referencing of a legislation

  • Task 7. Number of internal cross-references (i.e. to provisions within the same legislative text)
  • Task 8. Number of external cross-references (i.e. to provisions in other legislative texts, such as cross-references from an Act to regulations or another Act)
  • Task 9. Uncover interesting cross-referencing patterns
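Cross-references can be counted with pattern matching, as in the sketch below. It rests on several assumptions: internal references look like "section 12" or "s 15A", a section reference followed by "of the <Title> Act <year>" points to another Act, and other Acts are cited by a run of capitalised words ending in "Act" and a year. The function name and the sample titles in the comments are invented. Real drafting styles are more varied (ranges, "subsection (2)", regulations), so treat this as a baseline to refine.

```python
import re
from collections import Counter

# "section 12", "sections 12", "s 15A", "ss 3" (assumed citation styles).
SECTION_REF_RE = re.compile(r"\b(?:[Ss]ections?|ss?)\s+(\d+[A-Z]*)")
# A run of capitalised words ending in "Act <year>", e.g. "Example Bail Act 1990".
ACT_TITLE_RE = re.compile(r"((?:[A-Z][a-z]+\s+)+Act\s+(?:18|19|20)\d{2})")
# Text that marks a section reference as belonging to another Act.
OF_OTHER_ACT_RE = re.compile(r"\s+of\s+the\s+(?:[A-Z][a-z]+\s+)+Act\s+(?:18|19|20)\d{2}")


def cross_references(text: str, own_title: str):
    """Return (internal, external) counters for one Act.

    internal maps a section number to how often it is referenced (Task 7);
    external maps another Act's title to how often it is cited (Task 8).
    """
    internal = Counter()
    for match in SECTION_REF_RE.finditer(text):
        # Skip "section 8 of the Some Other Act 1990": that is external.
        if OF_OTHER_ACT_RE.match(text, match.end()):
            continue
        internal[match.group(1)] += 1

    external = Counter()
    for match in ACT_TITLE_RE.finditer(text):
        title = " ".join(match.group(1).split())  # normalise whitespace
        if title != own_title:
            external[title] += 1
    return internal, external
```

For Task 9, the two counters are a natural input to a graph: sections and Acts become nodes and references become edges, which makes heavily cited provisions and clusters of mutual references easier to spot.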


Important! Your analyses must culminate in a comparative analysis of the complexity of the legislation listed in Table 1. These results must be communicated to the judges during the presentation. You need not be a legal or domestic violence expert to provide the analysis.




The following items must be submitted before the submission closing date (not the presentation date):

  • One .zip file containing .py scripts / Jupyter Notebooks, all processed datasets, and the requirements.txt file to replicate your Python environment.


Clearly identify your submission with your team’s name. Late submissions will not be accepted.


Team presentation

Your team is required to pitch your work to a panel of judges on the stated presentation date at Charles Darwin University (Casuarina Campus). This is your best chance to convince the judges of the viability of your models.

Duration: 10 minutes pitching + 5 minutes Q&A per team

Judging criteria

Participating teams will be judged on five criteria:

  1. Innovation (25%)
  2. Sophistication (15%)
  3. Results (30%)
  4. Scalability and automation (10%)
  5. Presentation (20%)

Judging Panel

  1. Dr Guzyal Hill, CDU
  2. Ewan Perrin, DCDD (NTG)
  3. Liam Ma, ALPA
  4. Patrick Orr, Office of the Parliamentary Counsel (NTG)

Recommended readings

To help you better understand the context of this challenge and gain inspiration, consult these recommended readings.







  • Ruhl, J. B., Katz, D. M., & Bommarito, M. J. (2017). Harnessing legal complexity. Science, 355(6332), 1377-1378.


  • Coupette, C., Beckedorf, J., Hartung, D., Bommarito, M., & Katz, D. M. (2021). Measuring Law Over Time: A network analytical framework with an application to statutes and regulations in the United States and Germany. Frontiers in Physics, 9, 658463.
