AI may not actually be racist; the problem might lie in the limitations of its data and processes

Limitations of Machine Learning and how to overcome them.

Machine learning is a tool we can use to solve real-world problems or respond to user needs. However, you can waste a lot of time and money trying to incorporate it into your products without understanding it. In particular, product managers should be aware of the limitations of machine learning before they dive into such projects.

New AI startups are appearing around the world every day. They may have great technology, but very few of them actually solve a problem. One reason is that, in most cases, the story simply ends with a talent acquisition: the startup is bought for its team rather than for a product that solves a problem. So the main focus should be on clearly defining the problem, not just on using machine learning.

Problems for which machine learning algorithms are useful

Machine learning is mainly useful for problems that require pattern recognition. Such problems generally fall into the following categories:

1. Finding what you need in a large amount of information

A search engine such as Google is a good example: behind the scenes, it uses a number of machine learning algorithms. Machine learning can also categorize large amounts of information automatically, for example by tagging one article as being about politics and another as being about technologies that use Git.
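To make automatic categorization more concrete, here is a toy sketch of a text classifier. It uses multinomial naive Bayes, one classic technique for this task chosen here purely for illustration; it is not a claim about how Google or any other service categorizes content. The training sentences and the category names `politics` and `git` are made up.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesCategorizer:
    """Toy multinomial naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter(labels)        # category -> number of training docs
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # log P(category) + sum of log P(word | category)
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in doc.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


docs = [
    "the election results and the new parliament vote",
    "the senate passed the budget vote",
    "git commit and git push to the remote repository",
    "merge the branch with git rebase",
]
labels = ["politics", "politics", "git", "git"]

model = NaiveBayesCategorizer().fit(docs, labels)
print(model.predict("parliament vote on the budget"))  # politics
print(model.predict("push the branch to git"))         # git
```

With real data you would more likely reach for a library implementation such as scikit-learn's `MultinomialNB`, but the idea is the same: the model learns which words are typical of each category from labeled examples.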

2. Recognizing complex things

Self-driving cars must recognize the complex situations around them. Photo services can analyze an image and automatically extract information from it, such as the location, people, and objects. For such complex recognition tasks, it is effective to feed a large amount of data to a pattern recognition algorithm.

3. Forecasting

Machine learning is useful when you want to predict user behavior, such as whether users will like the articles they read or whether they will cancel their subscriptions. It can also forecast what will happen over the next three months; even quantities such as inventory levels can be predicted with basic machine learning algorithms.
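As an example of a "basic" forecast, the sketch below fits a straight-line trend to past monthly sales by ordinary least squares and extrapolates it three months ahead. The sales figures are hypothetical, and a linear trend is a deliberately simple stand-in: real demand usually has seasonality and noise that call for richer models.

```python
def linear_trend_forecast(history, horizon):
    """Fit y = intercept + slope * t by ordinary least squares, with t = 0, 1, ..., n-1,
    then extrapolate the line `horizon` steps past the last observation."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return [intercept + slope * t for t in range(n, n + horizon)]


# Hypothetical monthly units sold for one product.
monthly_sales = [100, 110, 120, 130, 140, 150]
print(linear_trend_forecast(monthly_sales, horizon=3))  # [160.0, 170.0, 180.0]
```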

4. Detecting outliers

Because machine learning is good at recognizing patterns, it can also detect behavior that deviates from them, that is, anomalies. Examples include detecting fraudulent credit card use and home intrusion detection systems.
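A minimal sketch of the idea, assuming a very simple statistical baseline: learn a customer's normal spending from past transactions and flag new amounts whose z-score (distance from the mean in standard deviations) exceeds a threshold. The transaction amounts and the 3.0 threshold are assumptions for illustration; production fraud systems use far richer features and models.

```python
import statistics


def fit_baseline(amounts):
    """Summarize a customer's normal spending as (mean, sample standard deviation)."""
    return statistics.mean(amounts), statistics.stdev(amounts)


def is_anomalous(amount, baseline, threshold=3.0):
    """Flag a transaction whose z-score |x - mean| / std exceeds `threshold`."""
    mean, std = baseline
    if std == 0:
        return amount != mean  # every past amount was identical
    return abs(amount - mean) / std > threshold


# Hypothetical past card transactions for one customer.
history = [20, 25, 22, 30, 18, 27, 24, 21]
baseline = fit_baseline(history)
print(is_anomalous(26, baseline))   # False
print(is_anomalous(500, baseline))  # True
```

The baseline is fit on past, presumably normal, data only; fitting it on data that already contains the outlier would inflate the standard deviation and hide the anomaly.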

5. Assisting decision-making

Sometimes we want to present information that helps people make decisions; in other words, we build recommendation systems. This is how Amazon and Netflix recommend books and movies. Their recommendations are based on patterns in users' previous purchasing and viewing data.
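The sketch below illustrates one simple way to turn "patterns in previous purchasing and viewing data" into recommendations: a crude user-based collaborative filter that scores unseen items by how much their owners' histories overlap with the target user's. It is an illustrative assumption, not how Amazon or Netflix actually build their systems, and the users and titles are made up.

```python
from collections import Counter


def recommend_from_similar_users(histories, user, k=3):
    """Score items the user hasn't seen by how much their owners overlap with the user.

    histories maps each user to the set of items they bought or watched.
    """
    mine = histories[user]
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        overlap = len(mine & items)  # number of shared items = a crude similarity measure
        if overlap == 0:
            continue
        for item in items - mine:
            scores[item] += overlap
    return [item for item, _ in scores.most_common(k)]


histories = {
    "alice": {"Dune", "Foundation"},
    "bob": {"Dune", "Foundation", "Hyperion"},
    "carol": {"Dune", "Neuromancer"},
    "dave": {"Cookbook"},
}
print(recommend_from_similar_users(histories, "alice"))  # ['Hyperion', 'Neuromancer']
```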

6. Communicating with people

You can use machine learning to perform natural language processing. For example, Alexa, Siri, and Google Assistant understand human language and use it to perform tasks.

7. Creating new experiences using augmented reality etc.

For example, Snapchat has many filters that use a face recognition algorithm to detect and transform the user's face, which helps create a more engaging user experience.

Limitations of Machine Learning Algorithms

Product managers need several important skills: a customer perspective and an understanding of design, communication, collaboration, business strategy, and technology. As machine learning is used more and more, understanding technology is becoming even more important. As a product manager, you don't necessarily need a deep understanding of machine learning, but you still need to know the basics of how it works. This will help you make good product decisions.

Once you know the impact and limitations of machine learning algorithms, you can decide how to solve the problem. Depending on the case, you might use regular product design techniques instead of machine learning. Here are five machine learning limitations that all product managers should know.

1. Bias in the data

It is important to have data that represents the users you are targeting. Biased data is a problem that comes up quite often in machine learning projects. For example, when few people convert, you will have far more data about people who have not converted, which makes it difficult to predict who will convert. Another well-known example is when Google Photos labeled African-American people as gorillas. This, too, was because the training data did not contain enough photographs of African-American people.

Even if nothing is wrong with the way the data was collected, the data at hand might still be biased, so it is important to check and clean it before training.
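The conversion example can be made concrete with a small sketch. With hypothetical data where only 5 of 100 users converted, a "model" that always predicts "no conversion" scores 95% accuracy while never finding a converter. Random oversampling, one common rebalancing technique chosen here as an illustration, duplicates minority-class rows so that training sees both classes equally often.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of the smaller class until both classes are equally large.

    Assumes binary labels (0/1) and at least one row of each class.
    """
    rng = random.Random(seed)
    pos = [(r, y) for r, y in zip(rows, labels) if y == 1]
    neg = [(r, y) for r, y in zip(rows, labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [y for _, y in balanced]


# Hypothetical data: only 5 of 100 users converted (label 1).
labels = [1] * 5 + [0] * 95
rows = list(range(100))  # stand-in for real feature vectors

always_no = [0] * len(labels)
print(accuracy(labels, always_no))  # 0.95, yet it never finds a single converter

bal_rows, bal_labels = oversample_minority(rows, labels)
print(len(bal_labels), sum(bal_labels))  # 190 95
```

Oversampling should be applied only to the training split, so duplicated rows never leak into the evaluation data. It also only rebalances the groups you already have; it cannot invent the missing examples behind a problem like the photo-labeling case, which needs more representative data.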

2. Trade-off between precision and recall

Consider filtering the users of a service. Suppose one team wants to remove only bad users: good users should keep using the service, so they must not be removed by mistake. In this case, precision (the share of removed users who really are bad) is what matters.

Now suppose another team wants no bad users on the service at all, even if some good users are removed along the way. In other words, recall (the share of bad users who are actually caught) is what matters to them.

The trade-off between precision and recall is a long-standing problem. It is important to understand that, for a given model, improving one typically makes the other worse.
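The trade-off can be seen by moving the decision threshold of a single model. In this sketch the "bad user" scores and true labels are hypothetical. A high threshold removes only users the model is very sure about (high precision, the first team's choice); a low threshold catches every bad user but removes more good ones (high recall, the second team's choice).

```python
def precision_recall(scores, labels, threshold):
    """Flag users whose 'bad user' score >= threshold; labels: 1 = actually bad."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y == 1 for f, y in zip(flagged, labels))        # bad users removed
    fp = sum(f and y == 0 for f, y in zip(flagged, labels))        # good users removed by mistake
    fn = sum((not f) and y == 1 for f, y in zip(flagged, labels))  # bad users missed
    precision = tp / (tp + fp) if tp + fp else 0.0  # undefined when nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical model scores (higher = more likely bad) and ground truth.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

for t in (0.85, 0.5, 0.25):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.85: precision=1.00, recall=0.50
# threshold=0.5: precision=0.60, recall=0.75
# threshold=0.25: precision=0.57, recall=1.00
```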

3. Cold start

This is the question of how an algorithm can make predictions when there is not enough data yet.

There are two patterns. 

Cold start for users.

There is no data for a new user, so we don't know what to recommend. There are several ways to solve this. One is to ask the user a few questions when they first sign up, such as what kinds of movies they like. Another is to recommend whatever can be inferred from other available data; for example, a new user might be shown the top 10 movies in their local area.
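Both fallbacks can be combined in a small sketch, assuming a hypothetical policy: if the new user answered an onboarding question about favorite genres, recommend matching titles ordered by overall popularity; otherwise recommend what is most watched in their region. The user, catalog, and view-log structures are made up for illustration.

```python
from collections import Counter


def regional_top_k(view_log, region, k=10):
    """Most-viewed titles among users in `region`; view_log is a list of (region, title)."""
    counts = Counter(title for r, title in view_log if r == region)
    return [title for title, _ in counts.most_common(k)]


def cold_start_recommend(user, view_log, catalog, k=10):
    """Recommendations for a user with no viewing history (hypothetical fallback policy)."""
    liked = user.get("liked_genres")
    if liked:
        # Use onboarding answers: titles in the chosen genres, most popular first.
        popularity = Counter(title for _, title in view_log)
        matches = [title for title, genre in catalog.items() if genre in liked]
        matches.sort(key=lambda title: popularity[title], reverse=True)
        return matches[:k]
    # No answers either: fall back to what is popular nearby.
    return regional_top_k(view_log, user["region"], k)


catalog = {"Kickoff": "sports", "Goal Line": "sports",
           "Deep Space": "sci-fi", "Laugh Track": "comedy"}
view_log = [("Tokyo", "Deep Space"), ("Tokyo", "Deep Space"), ("Tokyo", "Laugh Track"),
            ("Osaka", "Kickoff"), ("Osaka", "Goal Line"), ("Osaka", "Kickoff")]

print(cold_start_recommend({"region": "Tokyo"}, view_log, catalog))
# ['Deep Space', 'Laugh Track']
print(cold_start_recommend({"region": "Tokyo", "liked_genres": {"sports"}}, view_log, catalog))
# ['Kickoff', 'Goal Line']
```

Once the user has built up some history, the regular personalized model can take over from these fallbacks.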

Cold start for items/products.

For a new item, there is not enough interaction data yet, so we don't know whom to recommend it to.

One solution is to label new items manually: when a category expert tags an item, it becomes easy to recommend it to users who are interested in those tags.
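Once an expert has tagged a new item, matching it to users can be as simple as comparing tag sets. This sketch uses Jaccard similarity (shared tags divided by all tags) as an illustrative choice of measure; the tags, user names, and the 0.2 cutoff are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two tag sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def users_for_new_item(item_tags, user_interests, min_score=0.2):
    """Rank users by how well their interest tags match an expert-tagged new item."""
    scored = [(jaccard(item_tags, tags), user) for user, tags in user_interests.items()]
    return [user for score, user in sorted(scored, reverse=True) if score >= min_score]


item_tags = {"documentary", "football"}
user_interests = {
    "ana": {"football", "news"},
    "ben": {"cooking"},
    "chris": {"documentary", "football", "history"},
}
print(users_for_new_item(item_tags, user_interests))  # ['chris', 'ana']
```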

An alternative is to let an algorithm handle it, in a way similar to A/B testing: show new items to randomly chosen users, then use the results to fine-tune the recommendations. This is a way to start learning quickly how users respond to new items.
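A minimal sketch of random exposure, assuming a hypothetical helper class: a fixed share of recommendation lists gets one randomly chosen new item in the top slot, and impressions and clicks are counted per item. Click-through rates are reported only after a minimum number of impressions, since a rate based on a handful of views is mostly noise. The exposure rate, item names, and simulated click probabilities are all assumptions.

```python
import random
from collections import defaultdict


class NewItemTrial:
    """Hypothetical helper: slot a random new item into a share of users' lists and track clicks."""

    def __init__(self, exposure_rate=0.1, seed=None):
        self.exposure_rate = exposure_rate  # fraction of lists that get a new item
        self.rng = random.Random(seed)
        self.impressions = defaultdict(int)
        self.clicks = defaultdict(int)

    def build_list(self, regular_recs, new_items):
        recs = list(regular_recs)
        if new_items and self.rng.random() < self.exposure_rate:
            item = self.rng.choice(new_items)
            recs.insert(0, item)  # top slot, so the item actually gets seen
            self.impressions[item] += 1
        return recs

    def record_click(self, item):
        if item in self.impressions:  # only track items under trial
            self.clicks[item] += 1

    def click_through_rate(self, item, min_impressions=100):
        """Return clicks / impressions, or None until the item has enough impressions."""
        shown = self.impressions.get(item, 0)
        return self.clicks[item] / shown if shown >= min_impressions else None


# Simulated traffic: assume users click "new-a" 8% of the time and "new-b" 2%.
true_ctr = {"new-a": 0.08, "new-b": 0.02}
trial = NewItemTrial(exposure_rate=0.2, seed=1)
user_rng = random.Random(2)
for _ in range(5000):
    recs = trial.build_list(["top-1", "top-2"], list(true_ctr))
    if recs[0] in true_ctr and user_rng.random() < true_ctr[recs[0]]:
        trial.record_click(recs[0])
for item in true_ctr:
    print(item, trial.click_through_rate(item))
```

The measured rates can then feed back into the recommender, for example by promoting new items whose observed click-through rate holds up.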

4. Feedback loop for model validation

You should build a mechanism that feeds results back into your machine learning model. That way, you can verify the model's performance in the real world and keep improving it. Feedback can be implicit, such as whether the user ignored or read a recommended news article. It can also be explicit, with the user giving clear feedback through a prompt such as "Was this article helpful?"
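One way to close the loop is to log every recommendation together with both kinds of feedback and summarize the results per model version, so a new model can be compared against the old one on real usage. The event schema below is a hypothetical example, not a standard format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedbackEvent:
    """One recommendation shown to one user (hypothetical logging schema)."""
    model_version: str
    article_id: str
    clicked: bool                   # implicit feedback: did the user read it?
    helpful: Optional[bool] = None  # explicit feedback: "Was this article helpful?" (None = no answer)


def summarize(events):
    """Per model version: click-through rate and share of 'helpful' answers."""
    stats = defaultdict(lambda: {"shown": 0, "clicked": 0, "answered": 0, "helpful": 0})
    for e in events:
        s = stats[e.model_version]
        s["shown"] += 1
        s["clicked"] += e.clicked
        if e.helpful is not None:
            s["answered"] += 1
            s["helpful"] += e.helpful
    return {
        version: {
            "ctr": s["clicked"] / s["shown"],
            "helpful_rate": s["helpful"] / s["answered"] if s["answered"] else None,
        }
        for version, s in stats.items()
    }


events = [
    FeedbackEvent("v1", "a1", clicked=True, helpful=True),
    FeedbackEvent("v1", "a2", clicked=False),
    FeedbackEvent("v1", "a3", clicked=False),
    FeedbackEvent("v1", "a4", clicked=True, helpful=False),
    FeedbackEvent("v2", "a1", clicked=True, helpful=True),
    FeedbackEvent("v2", "a5", clicked=True, helpful=True),
]
print(summarize(events))
# {'v1': {'ctr': 0.5, 'helpful_rate': 0.5}, 'v2': {'ctr': 1.0, 'helpful_rate': 1.0}}
```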

5. Exploration and Exploitation

Suppose Netflix finds that you like to watch football. Your recommended list will then contain related programs, such as football games and documentaries. If you watch some of them, more and more football-related content will be recommended. This type of algorithm only optimizes for the signals it has already found (exploitation). Of course, you may also be interested in things other than football, such as technology, but Netflix won't recommend technology-related content because it has no signal for it.

In recent media coverage, this is often called a filter bubble. If you 'like' certain news on Facebook, for example, similar news fills up your timeline, and you no longer see other news or viewpoints. Users end up losing sight of everything outside a few interests.

To solve this problem, the system also has to recommend content that invites the user to explore new interests. There are various ways to do this even without clear signals, for example, choosing some content at random, or choosing it based on the preferences of similar users.
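The "random basis" approach can be sketched as an epsilon-greedy policy, one standard way to balance exploration and exploitation, used here as an illustration rather than as Netflix's method. Each slot in the list goes, with probability `epsilon`, to a random title the model has no signal for; otherwise it goes to the best remaining scored title. The scores, titles, and epsilon value are hypothetical. The other approach mentioned above, using the preferences of similar users, is not shown.

```python
import random


def explore_exploit_recs(scores, catalog, k=5, epsilon=0.2, rng=None):
    """Fill k slots. With probability epsilon a slot goes to a random title the model
    has no signal for (exploration); otherwise it goes to the best remaining scored
    title (exploitation)."""
    if rng is None:
        rng = random.Random()
    ranked = sorted(scores, key=scores.get, reverse=True)
    unexplored = [title for title in catalog if title not in scores]
    recs = []
    while len(recs) < k and (ranked or unexplored):
        if unexplored and (not ranked or rng.random() < epsilon):
            recs.append(unexplored.pop(rng.randrange(len(unexplored))))  # explore
        else:
            recs.append(ranked.pop(0))  # exploit
    return recs


# Hypothetical scores: the model has only learned that this viewer likes football.
scores = {"Football Live": 0.9, "Football Documentary": 0.8, "Match Replay": 0.7}
catalog = list(scores) + ["Tech Talk", "Space Documentary", "Cooking Show"]

print(explore_exploit_recs(scores, catalog, k=3, epsilon=0.0))
# ['Football Live', 'Football Documentary', 'Match Replay']
print(explore_exploit_recs(scores, catalog, k=3, epsilon=0.3, rng=random.Random(7)))
```

With `epsilon=0.0` the list is pure exploitation, which is exactly the filter bubble; raising `epsilon` trades a little short-term relevance for the chance to discover new interests, such as the technology content in the example.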

Finally…

I hope this article was useful. You can use machine learning to improve existing products or services, or to create entirely new products. Even if you don't use these technologies yourself, it is useful to understand the limitations of machine learning.

The most important thing to keep in mind is the purpose. Before deciding which machine learning algorithms to use and which data to access, you should first identify your purpose and the problem you are trying to solve.

A clear purpose serves as a guideline for the various problems that come up when starting a project, and it also provides a basis for evaluating the product.
