How machine learning and RNA-seq are helping patients diagnosed with the most common childhood cancer

New software developed by Peter Mac and collaborators is helping to determine which subtype of acute lymphoblastic leukaemia (ALL) a patient has. ALL is the most common childhood cancer in the world, and it also affects adults.

“Thirty to forty per cent of all childhood cancers are ALL, it’s a major paediatric cancer problem,” says Associate Professor Paul Ekert from Peter Mac and the Children’s Cancer Institute, who was involved in this work.

Over 300 people are diagnosed with the disease in Australia each year, and over half of those are young children under the age of 15. Determining what subtype of ALL a patient has provides valuable information about their prognosis, and how they should best be treated.

“Figuring out what genetic changes are driving a patient’s cancer is key to working out how intense their treatment should be and what drugs get used,” Associate Professor Ekert says.

But until the advent of genomic technologies such as RNA sequencing, the methods for doing so were less precise.

“Previously, genetic abnormalities were detected by looking down a microscope at individual chromosomes and looking for four or five main defects,” Associate Professor Ekert says.

“But we now know there are at least 23 subtypes for ALL.”

In a paper published in Blood Advances late last month, Peter Mac researchers and co-authors from the University of Melbourne, the Murdoch Children’s Research Institute and the Children’s Cancer Institute describe ALLSorts, software that uses RNA sequencing data to identify a patient’s ALL subtype.

Overview of the ALLSorts classification strategy for new input

Green circles show where the probability exceeds the threshold. No probabilities are calculated for the black circles, as classification terminates at their meta-subtypes. In this example, two meta-subtypes exceed their thresholds at the first level, but only one nested subtype succeeds. The result is a multi-label classification consisting of the deepest subtypes/meta-subtypes that exceeded their respective thresholds.
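The threshold-gated descent described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ALLSorts' actual code or API: the node names ("Meta-A", "A1" and so on), the probabilities and the 0.5 thresholds are all hypothetical, and the `score` function stands in for whatever trained model produces a probability for each (meta-)subtype. The sketch only shows the logic: a branch is explored only if its meta-subtype passes its threshold, and the output keeps the deepest labels that passed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """A subtype or meta-subtype in a hypothetical classification hierarchy."""
    name: str
    threshold: float
    children: list[Node] = field(default_factory=list)


def classify(nodes: list[Node], score: Callable[[str], float]) -> list[str]:
    """Return the deepest (meta-)subtypes whose probability exceeds their threshold.

    `score` is only called for nodes that are actually reached, so children of
    a meta-subtype that fails its threshold are never scored (the "black circles").
    """
    labels: list[str] = []
    for node in nodes:
        if score(node.name) <= node.threshold:
            continue  # branch terminates here; children are never evaluated
        deeper = classify(node.children, score)
        # Keep the deepest passing labels; fall back to this node if no child passes.
        labels.extend(deeper if deeper else [node.name])
    return labels


# Hypothetical hierarchy and probabilities mirroring the caption's example:
# two meta-subtypes pass at the first level, but only one nested subtype passes.
hierarchy = [
    Node("Meta-A", 0.5, [Node("A1", 0.5), Node("A2", 0.5)]),
    Node("Meta-B", 0.5, [Node("B1", 0.5), Node("B2", 0.5)]),
    Node("Meta-C", 0.5, [Node("C1", 0.5)]),
]
probs = {"Meta-A": 0.8, "A1": 0.7, "A2": 0.1,
         "Meta-B": 0.6, "B1": 0.2, "B2": 0.3,
         "Meta-C": 0.1, "C1": 0.9}

scored: list[str] = []  # records which nodes were actually scored


def score(name: str) -> float:
    scored.append(name)
    return probs[name]


print(classify(hierarchy, score))  # ['A1', 'Meta-B']
```

Here the result is multi-label: "A1" is the deepest passing label under Meta-A, while Meta-B is reported at the meta-subtype level because none of its children passed. Meta-C fails, so C1 is never scored even though its hypothetical probability is high.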

“ALLSorts adds a different way of finding these genetic drivers, and classifying what subtype of ALL a patient has,” says Peter Mac’s Professor Alicia Oshlack, the senior author on the paper.

“And it can be used with even a single patient sample, so testing centres regardless of their size will be able to use it.”

To the best of the researchers’ knowledge, ALLSorts is also the first publicly available and open source tool of its kind.

“We used a machine learning approach, and validated the accuracy of our software on children’s cancer samples from the Royal Children’s Hospital and adult cancer samples from Peter Mac,” Professor Oshlack says.

In machine learning, the computer combines the information in a large dataset and learns which of its features are most informative, rather than relying on human researchers to decide which pieces of the data are important.

“Hopefully this software can be used across the world in testing ALL and informing treatment choices for patients,” Professor Oshlack says.

“And it’s also a nice example of the importance of computational biology in cancer research.”

Source: Peter MacCallum Cancer Centre


Schmidt BM, Brown LM, Ryland G, Lonsdale A, Kosasih HJ, Ludlow LEA, Majewski IJ, Blombery P, Ekert PG, Davidson N, Oshlack A. (2022) ALLSorts: a RNA-Seq subtype classifier for B-Cell Acute Lymphoblastic Leukemia. Blood Adv [Epub ahead of print].
