Machine Learning - what’s going on under the hood?

The human body is a remarkable piece of biological engineering. Our bodies not only adapt to be able to perform a task, but also learn how to perform that task again with the minimum amount of effort.

The same could be said for the evolution of programming languages and frameworks. We strive to streamline our development processes to make them efficient and allow our developers to focus on what really matters.

It’s second nature to introduce a library into our solution to perform a specific task, reducing the amount of boilerplate code that we have to write ourselves.

Whilst we continue to write code at such a high level, are we starting to starve our techie curiosity and therefore inadvertently introduce code into our solution in an unethical way?

This question is becoming even more important with the momentum of Machine Learning based applications. When it comes to explaining it, I am forever hearing the words “Just think of Machine Learning as a black box, you feed your data in at one end and out comes a decision at the other.” It’s a pretty useful expression when trying to explain Machine Learning in the wider context of your solution. However, I would say that as a developer this does not mean that we shouldn’t open that black box. I can’t help but feel that it is all too easy to become blinded working at such a high level.

Remember, we are responsible!

If we are now adopting Machine Learning into our solutions, can we honestly say, with any level of confidence, that it is the best fit if we don’t actually know what it is doing? Just as when we introduce a new logging framework or ORM framework, we have a duty of care to understand the implications of using Machine Learning and what it is doing with our data, even if the results are fruitful.

History has a lesson for us here. It wasn’t so long ago that developers were flocking from traditional relational databases to NoSQL because of its high performance and scalability. Blinded by these benefits, many teams sacrificed ACID transactions, often without realising they had done so.

It’s important that we learn from these lessons when considering our strategies for adopting Machine Learning. Whilst we may have got away with this in the past, is this now our opportunity to reflect? Adopting Machine Learning into a software solution without understanding what is going on under the hood is essentially adopting it in a very unethical way.

Knowledge is power...

So now that we have seen the error of our ways, we have opened the black box and taken a good look inside. We are now in a good place, right? Surely we can sit back, relax and enjoy the fruits of our Machine Learning labour.

What happens when the machine gets it wrong and makes a bad decision?

Let’s just think about us humans for one second. We are often biased and sometimes rely on our ‘gut feelings’ which can make it difficult when we have to conduct post-justification of human reasoning. Nevertheless, we are still accountable for our decisions.

If I am turned down for an insurance quote or a mortgage, I have the right to request information about why I have been rejected. If we are now moving to a world in which Machine Learning is being used to make these important decisions, there is still the same level of accountability. Treating the Machine Learning as a black box is not sufficient. Not only do we need to know what is going on under the hood, we also need to be able to conduct the same level of post-justification of its reasoning.

Do we have the power to ask the machine why it came to that decision? In traditional software the clarity is within the code itself. If we follow best practices then the code is readable and can be comprehended by a developer. We also have the power to debug. We can step through our readable code and find out precisely why a decision came about.
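To make that concrete, here is a minimal Python sketch of a hand-written decision rule. The `Applicant` fields, the 4.5x income multiple and the missed-payment limit are all made up for illustration and are not real lending criteria. The point is only that every rejection carries its reasons with it.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    # Hypothetical fields, chosen purely for illustration.
    annual_income: float
    requested_loan: float
    missed_payments: int


def decide_mortgage(applicant):
    """Return (approved, reasons); reasons lists every rule that failed."""
    reasons = []
    # Illustrative rule 1: cap the loan at a multiple of income.
    if applicant.requested_loan > 4.5 * applicant.annual_income:
        reasons.append("requested loan exceeds 4.5x annual income")
    # Illustrative rule 2: limit the number of missed payments.
    if applicant.missed_payments > 2:
        reasons.append("more than 2 missed payments on record")
    return (len(reasons) == 0, reasons)


approved, reasons = decide_mortgage(Applicant(30_000, 200_000, 0))
print(approved, reasons)
```

If this applicant asks why they were rejected, the answer is sitting in `reasons`, and a developer can step through `decide_mortgage` line by line to confirm it.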

Can we debug Machine Learning methods?

Imagine you are back at school and have been sitting a mock maths exam. You are asked a question and it is important to show your workings out as well as the answer. A few days after sitting your exam the teacher tells you that the answer you arrived at is wrong and asks you why you arrived at your answer. You go through the steps of your workings out one by one and spot a mistake. You are able to understand how you arrived at the incorrect answer and can now correct it. Now imagine trying to do the same thing but this time the teacher has removed all of your workings out. Without reciting it step by step from memory it is very hard to work out how you arrived at your answer.

This is precisely the problem with trying to debug Machine Learning. Training the machine to learn a task involves showing the machine some data and telling it what the answer is. Over time the machine will update its internal model to reflect what it has seen in the past. Once it has finished training, its workings out (the data it was trained on) are completely removed from the machine, and all that remains is the answer it came to and its internal representation, which is not easily comprehended by a human. So how can we ask it to work out how it arrived at its answer without the workings out?
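As a rough illustration, the sketch below trains a tiny perceptron on the logical AND function. The perceptron is a stand-in chosen for brevity, not a method this article prescribes, and the function names, learning rate and epoch count are all assumptions. Once training finishes we delete the training data, and all we are left with is a handful of numbers.

```python
def train_perceptron(examples, epochs=20, learning_rate=0.1):
    """Show the machine (features, answer) pairs and nudge its weights on each mistake."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            prediction = 1 if activation > 0 else 0
            error = label - prediction  # 0 when the guess was right
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


def predict(model, features):
    """Apply the learned weights and bias to a new input."""
    weights, bias = model
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


# Toy training set: the logical AND of two inputs.
training_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
model = train_perceptron(training_data)

# The "workings out" are gone; only the internal representation remains.
del training_data
print(model)
print(predict(model, (1, 1)))
```

The model still answers, but the weights and bias on their own do not tell us which examples shaped them. With two inputs we can still reason about the numbers by hand; with thousands or millions of weights that quickly stops being practical.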

Perhaps what we really should be doing instead is asking ourselves the question:

“If the Machine Learning internal representation is too complex for me to understand, should I be adopting Machine Learning for this type of use case?”

Finally, it might be possible to engineer a series of simpler, comprehensible Machine Learning models that aggregate to form a complex decision. If so, we are stepping our way to a solution rather than introducing a big bang approach with a black box. This is analogous to good quality software engineering, where we break down the problem into smaller focused tasks instead of adopting a large procedural approach.
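One way this could look, sketched in Python under heavy assumptions: each small model is a single learned threshold on one feature, and the final decision is a majority vote. The feature names and the five training rows are invented toy data, and majority voting is just one possible way to aggregate. Treat it as a thought experiment rather than a recommended design.

```python
def fit_stump(values, labels):
    """Learn one rule, 'approve if value >= threshold', with the fewest training errors."""
    best_threshold, best_errors = None, len(labels) + 1
    for candidate in sorted(set(values)):
        errors = sum((v >= candidate) != bool(y) for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


def fit_ensemble(rows, labels, feature_names):
    """Train one small, readable model per feature."""
    return {name: fit_stump([row[name] for row in rows], labels)
            for name in feature_names}


def decide(ensemble, row):
    """Majority vote across the small models, with each vote kept as an explanation."""
    votes = {name: row[name] >= threshold for name, threshold in ensemble.items()}
    approved = sum(votes.values()) > len(votes) / 2
    explanation = [f"{name}={row[name]} {'>=' if vote else '<'} {ensemble[name]}"
                   for name, vote in votes.items()]
    return approved, explanation


# Invented toy data: income in thousands, label 1 means "approved".
rows = [
    {"income": 20, "years_employed": 1, "credit_score": 500},
    {"income": 25, "years_employed": 6, "credit_score": 550},
    {"income": 40, "years_employed": 2, "credit_score": 700},
    {"income": 55, "years_employed": 8, "credit_score": 720},
    {"income": 60, "years_employed": 10, "credit_score": 640},
]
labels = [0, 0, 1, 1, 1]

ensemble = fit_ensemble(rows, labels, ["income", "years_employed", "credit_score"])
print(ensemble)
print(decide(ensemble, {"income": 30, "years_employed": 5, "credit_score": 600}))
```

Each learned threshold can be read and challenged on its own, and the explanation returned by `decide` shows which of the small models voted for or against. That is the post-justification a single opaque model struggles to give us.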

For more on this topic: at ITDF 2018, we presented a distilled, generic process which can help interested parties introduce Machine Learning to their business.
