SVMs and Gene Therapy: The New Duo for Cancer Treatment

Natan Kramskiy
10 min read · Oct 23, 2023


For nearly every cancer patient, there is a standard set of treatments that anyone can receive: chemotherapy, radiation therapy, hormone therapy, and surgery. But with over 200 types of cancer, why is there so little variety in treatment? Should we not cater to a patient’s specific cancer type to minimize the side effects of our treatments?


The answer is that we need an efficient way to classify different types of therapeutics and a safer way of introducing these treatments into our bodies. If we can effectively find nuances between different treatments, we can be better suited to prescribe treatments that fit the exact profile of an individual’s cancer. Furthermore, our body contains many complex biological systems, and the broader a therapy is introduced, the greater the likelihood of inducing adverse and harmful side effects.

Luckily, with the advent of AI, we can pair a specific machine learning model, the Support Vector Machine (SVM), with Gene Therapy to create safer and more personalized cancer treatments.

Introducing: Gene Therapy

Currently used in clinical trials with successful outcomes, Gene Therapy involves introducing DNA into a patient to treat a genetic disease. The new DNA usually contains a functioning gene to correct the effects of a disease-causing mutation. This DNA is packaged within a vector, whose job is to carry the DNA into the cells of a patient. Vectors usually come in the form of a virus, bacterium, or plasmid, mainly because these carriers are naturally good at entering our bodies undetected and infecting our cells. Once inside the patient’s cells, the DNA is expressed by the cell’s normal machinery, leading to production of the therapeutic protein and treatment of the patient’s disease.

Using a viral vector to transport a gene into a cell (Source)

There are three main Gene Therapy techniques:

1. Killing of specific cells

The aim is to insert DNA into a diseased cell that causes that cell to die. This technique is suitable for diseases such as cancer that can be treated by destroying certain groups of cells.

Illustration of two ways that this treatment works (Source)

2. Gene Inhibition Therapy

This method aims to eliminate the activity of a gene that encourages the growth of disease-related cells. This method is also suitable for cancer and other diseases caused by inappropriate gene activity.


3. Gene Augmentation Therapy

This treatment targets diseases caused by a mutation that stops a gene from producing a functioning product, such as proteins. Gene augmentation therapy aims to introduce DNA containing a functional version of the lost gene back into the cell. This method is only successful if the effects of the disease are reversible or have not resulted in lasting damage to the body.


So why are we not already using Gene Therapy?

While Gene Therapy is highly flexible, since there are multiple treatment approaches for any given problem, how do we know which gene to use? Furthermore, how can we ensure that our gene will not provoke an unwanted immune response?

Luckily for us, this is where SVMs come in.

Gene Therapy’s right-hand man: SVMs

An SVM is a supervised learning model that creates a decision boundary between groups of data. Being a supervised model, the data fed into an SVM has already been identified (image, text file, video, etc.) and given meaningful, informative labels to provide context for the model. This information could be as general as tagging whether an image contains a bird or as granular as identifying the specific pixels in the image associated with the bird.

The essential objective of training an SVM is to create a decision boundary, or hyperplane, that is oriented as far as possible from the closest data points (the support vectors) of each class or group. SVMs build this hyperplane by finding the optimal weights and bias that separate the classes and maximize the margin, i.e. the distance between the hyperplane and the support vectors.

Illustration of SVM (Source)

Let’s get technical here. The hyperplane is defined by the equation:

wᵀx + b = 0

Here, b is our intercept or bias term, and w is our weight vector. Notice how this equation is very similar to slope-intercept form: y = mx + b. The dimension of the hyperplane is simply one less than the dimension in which the data points are represented. For example, if we define the data points in a two-dimensional space, our hyperplane is one-dimensional, i.e. a line.
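To make this concrete, here is a minimal sketch in Python (my own illustration, assuming scikit-learn and made-up 2D data, not code from any of the article’s sources): we fit a linear SVM and read off the learned w and b that define the hyperplane wᵀx + b = 0.

```python
import numpy as np
from sklearn.svm import SVC

# Two made-up, linearly separable clusters of 2D points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, size=(20, 2)),
               rng.normal([3, 3], 0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

model = SVC(kernel="linear")   # linear kernel -> a flat hyperplane
model.fit(X, y)

w = model.coef_[0]             # the weight vector w
b = model.intercept_[0]        # the bias / intercept term b
print(f"Hyperplane: {w[0]:.3f}*x1 + {w[1]:.3f}*x2 + {b:.3f} = 0")
print("Support vectors:\n", model.support_vectors_)
```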

Now, to calculate the distance between the hyperplane and a support vector, we use the standard form of the equation for a line, Ax + By + C = 0, together with the Euclidean norm. Plugging a point’s coordinates into Ax + By + C tells us how far (up to a scaling factor) that point lies from the line. The Euclidean norm, on the other hand, tells us how far a vector extends from its starting point to its endpoint, taking into account all of the vector’s components. In two dimensions, it follows directly from the Pythagorean theorem (a² + b² = c²), where we solve for the hypotenuse, i.e. the length of the vector.

What is special about the Euclidean norm, however, is that it accounts for dimensions. For two dimensions, the equation is √(x² + y²) and for three dimensions, it’s √(x² + y² + z²). Generally, for an n-dimensional vector, the Euclidean norm is the square root of the sum of the squares of all n components. Thus, the distance of a hyperplane from the support vector on a two-dimensional plane can be written as:

d = |Ax + By + C| / ||w||

Here, ||w|| represents the Euclidean norm of the weight vector w = (A, B), i.e. √(A² + B²).
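As a small illustration (my own, with made-up numbers), computing this distance takes only a couple of lines:

```python
import numpy as np

w = np.array([2.0, -1.0])   # hypothetical weight vector (A, B)
b = 0.5                     # hypothetical bias term (C)
x = np.array([1.0, 3.0])    # a data point, e.g. a support vector

euclidean_norm = np.linalg.norm(w)                # sqrt(A^2 + B^2); works in any dimension
distance = abs(np.dot(w, x) + b) / euclidean_norm
print(f"||w|| = {euclidean_norm:.3f}, distance = {distance:.3f}")
```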

The reason accounting for dimensions is useful is that SVMs can classify data in many dimensions. Sometimes data is not linearly separable, meaning the groups cannot be split apart by a single straight boundary; the separation instead depends on a combination of multiple factors. Adding higher dimensions lets us combine several features of the data, making it easier to cluster the points into groups and form our hyperplanes. SVMs accomplish this through kernel functions, a class of functions that implicitly map the data points into a much higher-dimensional (potentially even infinite-dimensional) space where they can be accurately separated.

What may be inseparable in one dimension becomes easily separable once we add more dimensions/parameters
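Here is a quick sketch of that kernel trick (again my own example, assuming scikit-learn’s make_circles toy dataset): a linear kernel struggles on data arranged in concentric circles, while an RBF kernel separates it almost perfectly without ever constructing the higher-dimensional space explicitly.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings of points: impossible to split with a straight line in 2D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)   # kernel trick: no explicit mapping needed

print("Linear kernel accuracy:", linear_svm.score(X_test, y_test))  # struggles
print("RBF kernel accuracy:   ", rbf_svm.score(X_test, y_test))     # near-perfect
```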

Drug discovery

Now that we know a bit too much about how SVMs work, let’s actually delve into the issue: how do we find the appropriate treatment for a patient that is as safe and effective as possible?

Currently, drugs for a variety of deadly cancers remain limited, and we still rely on general-purpose treatments that work, to some degree, across all types of cancer. The problem with these treatments is that they produce side effects (chemotherapy causes hair loss, for example) and do not account for the individual and their personal bodily reactions to the treatment. Moreover, the current drug discovery process involves repeatedly testing whether a compound works against a specific biological target, which is extremely time-consuming and costly when working through a large dataset of compounds.

SVMs can aid this process by using maximum-margin hyperplanes to separate the active compounds from the inactive ones. These hyperplanes create the largest possible distance between the two labeled groups of compounds, significantly reducing the number of compounds that researchers need to test and investigate further.

Once researchers narrow down their list of potential anticancer drugs, SVMs could be further useful in drug classification, using a drug’s genomic features to predict which one would be most helpful against a given cancer cell line. This breakthrough could usher in a new era of personalized medicine, where patients receive treatments catered not only to their branch of cancer (breast, skin, brain) but also to their exact cancer cell line (including any potential mutations), reducing potential side effects and the risk of the treatment not working.

Visual of SVM using maximum margin hyperplane (Source)
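As a rough sketch of this idea (entirely hypothetical compound descriptors and labels, not the pipeline of any particular study), an SVM trained on labeled active/inactive compounds can be used to shortlist untested compounds for follow-up work:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Pretend each compound is described by a few numeric descriptors
# (molecular weight, logP, polar surface area, ...) -- made-up data.
rng = np.random.default_rng(1)
X_labeled = rng.normal(size=(300, 4))
y_labeled = (X_labeled[:, 0] + 0.5 * X_labeled[:, 2] > 0).astype(int)  # 1 = active, 0 = inactive

# Maximum-margin classifier separating active from inactive compounds
classifier = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
classifier.fit(X_labeled, y_labeled)

# Score a library of untested compounds and keep only the predicted actives
X_untested = rng.normal(size=(1000, 4))
predicted_active = classifier.predict(X_untested) == 1
print(f"Shortlisted {predicted_active.sum()} of {len(X_untested)} compounds for follow-up")
```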

SVMs’ use in drug classification has already been tested by a group of researchers led by Sudheer Gupta. In their work, they investigated the drug profiles of 24 anticancer drugs that were tested against a large number of cell lines, with the goal of understanding how changes in the genotype of a cancer cell line might be connected to the cell’s resistance to these drugs. The team used an SVM regression model to help them predict and prioritize how effective these drugs were against these cell lines. For reference, a cell line was considered sensitive to a drug only if its growth-inhibition IC50 value was less than 0.5 μM.

In their study, they concluded that only a few drugs (Panobinostat, Paclitaxel, Irinotecan, 17AAG, and Topotecan) were effective against more than 50% of the cell lines, with Panobinostat and Paclitaxel being the most impressive. Panobinostat was effective against more than 99% of cell lines, while Paclitaxel was effective against 83% of cell lines and against 100% of the cell lines belonging to Autonomic Ganglia tissue.

The effectiveness of all 24 drugs. The right column lists the drugs and the bottom row lists the tissues. The values in the middle show what percentage of a tissue’s cell lines are sensitive to each drug: if a tissue is 50% responsive to drug Y, only 50% of the cell lines in that tissue respond to the drug (Source).
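A hedged sketch of this kind of regression setup (hypothetical genomic features and IC50 values, not the authors’ actual code or data) might look like this, using the 0.5 μM sensitivity threshold mentioned above:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Made-up genomic features per cell line and made-up IC50 values (in micromolar)
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 20))
ic50 = np.exp(rng.normal(size=500))

X_train, X_test, y_train, y_test = train_test_split(X, ic50, random_state=0)

# Support vector regression predicts a drug's IC50 for each cell line
regressor = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
regressor.fit(X_train, y_train)

predicted_ic50 = regressor.predict(X_test)
sensitive = predicted_ic50 < 0.5   # sensitivity threshold from the study
print(f"{sensitive.sum()} of {len(sensitive)} cell lines predicted sensitive (IC50 < 0.5 uM)")
```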

In examining the effect of variation and mutation on drug resistance, the team found that altering even one gene can completely change the efficacy of a drug. For example, the PDE4DIP gene was mutated in 241 cell lines, and 99% of these cell lines were also resistant to the anticancer drug PF2341066. While this does not necessarily indicate direct causation, it is striking how a gene mutation can render a drug completely useless against that cell line.

So why do these results matter?

Taking both of these results into account, it becomes increasingly clear that a one-size-fits-all therapeutic is nearly impossible. With so few drugs initially proving effective against various cancer cell lines, it is hard to envision a single therapy that also stays effective against mutated cell lines. Personalized medicine therefore quickly becomes the most promising option for effective cancer treatment. Instead of giving everyone the same general treatment, why not study each patient’s cell line, accounting for any potential mutations and variation, and provide them with the most effective drug for their specific disease?

As demonstrated in Sudheer Gupta’s work, this is exactly where SVMs come in, as they are an ideal model for pinpointing the most effective therapeutic. Going back to Gene Therapy, if SVMs can find the best gene to use for a patient, we can use Gene Therapy techniques and viral vectors to carry that treatment directly into the cells of the patient, greatly reducing the risk of extreme side effects. Moreover, SVMs could also be used to test a potential therapeutic’s effect on other genes or biological processes in the body, to make sure it will neither produce unintended side effects nor trigger an immune response. By combining these two technologies, we can use SVMs to find the best therapeutic and then use Gene Therapy techniques to safely deliver that medication to the patient.

Challenges

So why is this not already a reality? Well, SVMs are not an entirely perfect model:

1. SVMs are extremely computationally intensive, especially as we move to higher-dimensional feature spaces, which is exactly what happens when we want to analyze extremely large and complicated datasets.

For reference, here are the most basic dimensions. You can imagine how complex and computationally intensive it would be to model past 4D (Source).

2. Finding the best model requires a lot of time, since many combinations of kernels and model parameters must be tested.

3. SVMs can be slow to train, particularly if the input dataset has a large number of features.

Another general issue is finding the genes to classify in the first place. While SVMs are good at finding nuanced differences between genes and helping researchers identify the best gene for a specific problem, researchers still need to supply a set of promising candidate genes to begin with.

However, I believe that as computational power keeps improving, and with the potential introduction of quantum computers that allow for even higher dimensions, SVMs will become the best method for discovering new target genes for various cancers. The reason quantum computers could elevate the potential of SVMs is that they are extremely good at processing and storing data, meaning they could classify more potential therapeutics in parallel than our current classical computers. And since quantum computers have more processing power, they could also handle higher dimensions, allowing SVMs to separate data at a more granular level and ultimately give researchers a better picture of the most effective therapeutic for a particular scenario.

Now, with the added power of Gene Therapy, this duo could help solve an almost 5,000-year-old problem (that’s right, we have known about cancer for 5,000 years, as it was first documented in Ancient Egypt) and bring us one step closer to a cancer-free future.

Sources

Thanks for reading until the end. If you want to see where I got all of my information from, check out these links!
