Harnessing the power of business data for Machine Learning

Businesses now have scalable, affordable access to their data, and they are asking crucial questions of that data to support strategic day-to-day decisions. These are questions that simply couldn’t be asked a few years ago unless your business invested a lot of money in technology. These large data sets are starting to give businesses a leading edge and, now that we have Big Data, the natural progression is Machine Learning.

But whilst Machine Learning is beautifully poised for such large data sets and is a very popular buzzword, it isn’t necessarily a silver bullet. By that I mean you can’t just chuck Machine Learning at your Business’ data set and expect to get magical results right out of the box. The data needs to be in a shape that is amenable to a Machine Learning task.

To make this clearer, let’s look at what Machine Learning is and what it is doing with your data.

What is Machine Learning?

Machine Learning is a ‘learning task’. This seems pretty obvious, right? But let’s just remove the machine and think about learning in general. We have two different methods of learning: supervised and unsupervised.

Supervised Learning

The supervised method is a ‘learn by example’ approach. A child is shown a page in a book with a picture of a cat. The child is asked what animal it is. If they get it wrong, the teacher will provide feedback: “No, this is a Cat”. If the child is repeatedly shown pictures of different cats and asked what animal it is, over time the child will begin to learn how to tell a Cat from other animals. The more pictures they are shown, the more confident and accurate the child becomes at telling a cat from another animal.

What happens if we take away the teacher’s feedback in this example? Let us introduce our second method of learning.

Unsupervised Learning

The unsupervised method is the ‘learn by yourself’ approach. Take the same example, but this time the teacher does not ask the child what animal is on the page, nor do they provide any feedback as to whether the child is correct in saying that it is a Cat. All they do is show them the pictures. Over time the child will start to learn similarities between the animals on the different pages. They won’t be able to answer questions like ‘What animal is this?’ but they will be able to place animals that look similar together (clustering). By doing this they have learned to isolate Cats from other animals without understanding that the animal is a Cat. It’s likely that they will put Cheetahs and Pumas with Cats because they are similar, but over time, the more animals they are shown, they might start to learn to separate Pumas from Cheetahs from Cats, again without knowing the names of the animals.

Hopefully the difference between these two methods is clear from what the child can do after learning. Both show learned intelligence, but supervised learning adds extra value: the child not only learns the similarities but can also correctly name which animal it is.

But what has this got to do with your business’ Big Data and Machine Learning?

What is Machine Learning doing with your data?

Firstly, Machine Learning learns by the very same two approaches: supervised and unsupervised. How these two are applied to your data will determine what kind of intelligence can be learned and applied to your business.

So let’s now consider an example that is closer to a typical business scenario. I will use a fictitious company for the sake of this scenario.

ABC Aviation is an aircraft engine manufacturer with assembly lines all over the world. They are responsible for providing engines for 40 airlines, 5 of which are world leading. Every year they manufacture approximately 50,000 engines. At a high level their assembly line can be broken down into the following 5 phases:

  1. High-pressure core assembly
  2. Low-pressure Turbine assembly
  3. Accessory Gearbox assembly
  4. Equipment and accessories assembly
  5. Total visual inspection

Given the nature of their business they have a strict auditing process and a large QA department. This is mechanical engineering, so inevitably parts will break down or fail (luckily they have an excellent fail-safe system). When aircraft engineers service the engines and find issues, they need to report them back to the manufacturer’s QA department.

5 years ago the company invested in technology and provided a web application, available worldwide, so that the engineers’ notes could be captured and stored in the cloud. These issues could then be reported back to the QA department so that the assembly line could be reworked to reduce the risk of these issues occurring again.

The current business process is that a daily feed of the cloud data is fetched by a team of QA analysts and distributed evenly amongst them. Their primary role is to read the engineers’ notes and allocate each issue to a particular phase on the assembly line. The manager for that phase will then pick up the issue and manage any changes required on the assembly line.

The business have made a strategic decision to streamline this QA process and are excited about the prospect of Machine Learning. Surely there is no need for QA analysts to classify issues into assembly line phases by hand when Machine Learning can do it for us?

So the business appoint a team of data scientists to develop a solution and give them access to the cloud data.

“This cloud data contains just the engineers’ notes for the past 5 years,” report the data scientists.

“That is correct, isn’t that what you need?” the business responds.

“Well right now, with the data that we have, we can only apply an unsupervised learning method.”

“Okay what does that mean?”

Cue an article explaining just what this means.

“Let’s talk about your data process, when the QA analyst analyses the engineer’s notes and allocates it to one of the 5 assembly line phases, do you capture the relationship between the notes and the phase?”

“Well not really, the QA analyst creates and provides a .csv extract file for each phase and sends it to the appropriate Assembly line phase Manager”

“Okay, so this data is captured somewhere else outside of the cloud. How can we go about accessing it?”

“You’ll have to speak to the Assembly line phase managers”

I will leave the fictitious conversation there, as I am hoping that the reader can identify an important gap in the data pipeline and a big lesson for businesses capturing their data and storing it in the cloud.

Landing that data

If ABC Aviation had also invested in storing the allocated assembly line phase with the associated engineer’s notes, then the data would be far more amenable to this QA classification problem.
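To make the idea of storing the phase alongside the notes concrete, here is a minimal Python sketch. Everything in it is hypothetical: the note ids, the note text and the layout of the per-phase .csv extracts (a single note_id column) are assumptions made for illustration, not ABC Aviation’s real data. It simply joins each note to the phase it was allocated to, so that the pair is kept together.

```python
import csv
import io

# Hypothetical engineer notes as stored in the cloud, keyed by a note id.
notes = {
    "N1": "Oil leak found at accessory gearbox seal",
    "N2": "Blade tip wear on low-pressure turbine stage 3",
    "N3": "Loose clamp on fuel line bracket",
}

# Hypothetical per-phase .csv extracts, as sent to each phase manager.
# Assumption: each extract has one column, note_id.
phase_extracts = {
    "Accessory Gearbox assembly": "note_id\nN1\n",
    "Low-pressure Turbine assembly": "note_id\nN2\n",
}


def build_labelled_records(notes, phase_extracts):
    """Attach the allocated phase to each note; unallocated notes get None."""
    phase_by_note = {}
    for phase, csv_text in phase_extracts.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            phase_by_note[row["note_id"]] = phase
    return [
        {"note_id": note_id, "text": text, "phase": phase_by_note.get(note_id)}
        for note_id, text in notes.items()
    ]


records = build_labelled_records(notes, phase_extracts)
for record in records:
    print(record)
```

Notes that never received a phase keep a phase of None, which makes the gaps in the historical data easy to count.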

It is completely analogous to the child learning to identify a Cat in a picture. Without the feedback as to whether the engine issue root cause was a problem with:

  1. High-pressure core assembly
  2. Low-pressure Turbine assembly
  3. Accessory Gearbox assembly
  4. Equipment and accessories assembly
  5. Total visual inspection

then Machine Learning can only cluster the data into a predetermined number of groups. In this case it can separate all of the issues out into 5 piles, but it doesn’t know which pile should go to which assembly line phase manager, or whether its accuracy is any good.
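A minimal sketch of what ‘piles without names’ looks like in practice, using a tiny hand-written k-means clustering in plain Python. The four notes are invented, the word-count representation is an assumption, and the centroids are simply started from the first k notes to keep the sketch deterministic; a real solution would use a proper library and far more data.

```python
def vectorise(texts):
    """Turn each note into word counts over a shared, sorted vocabulary."""
    vocab = sorted({word for text in texts for word in text.lower().split()})
    return [[text.lower().split().count(word) for word in vocab] for text in texts]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, k, iterations=10):
    # Assumption: start from the first k vectors so the result is repeatable.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every note to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of the notes assigned to it.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical engineer notes, with no phase attached.
notes = [
    "gearbox oil leak",
    "turbine blade wear",
    "gearbox seal leak",
    "turbine blade crack",
]
labels = k_means(vectorise(notes), k=2)
for note, label in zip(notes, labels):
    print(label, note)
```

The output is only a pile number per note. Nothing in it says which pile belongs to which phase manager, and there is nothing to measure the piles against.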

If the feedback had been captured, then Machine Learning could be used to learn a relationship between the engineer’s notes and the assembly line phase.
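As an illustration of learning that relationship once the labels exist, the sketch below trains a very small Naive Bayes text classifier in plain Python. Naive Bayes is my choice for the example, not something ABC Aviation’s data scientists are said to use, and the labelled notes are invented. Words that never appeared in training are ignored at prediction time, which is a simplification.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count words per phase from (note text, phase) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the phase with the highest log-probability for this note."""
    word_counts, label_counts, vocab = model
    total_examples = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_examples)
        for word in text.lower().split():
            if word in vocab:  # simplification: skip words never seen in training
                # Add-one smoothing so a single unseen word does not zero the score.
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labelled history: the note plus the phase the analyst chose.
history = [
    ("gearbox oil leak", "Accessory Gearbox assembly"),
    ("gearbox seal leak", "Accessory Gearbox assembly"),
    ("turbine blade wear", "Low-pressure Turbine assembly"),
    ("turbine blade crack", "Low-pressure Turbine assembly"),
]
model = train_naive_bayes(history)
print(predict(model, "leak near gearbox housing"))
```

Unlike the clustering case, the answer here is a named phase, and it can be checked against what an analyst would have chosen.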

The data scientists recommend that ABC start capturing this data from this day forth, and take some steps to back-populate their last 5 years’ worth of data, before they embark on a Machine Learning journey.

Big Data, Big Dreams!

It is safe to say that ‘Data’ is key here, and always has been. Cloud data storage is providing Businesses with affordable data storage solutions that allow them to scale up and have easy access around the world.

The larger the data set, the more likely it is that Machine Learning can find and exploit a pattern in your data (remember the child learning how to identify a Cat: the more pictures they are shown, the more confidently the child can identify a Cat). But more fundamentally, the lesson to be learned here is to make sure that key day-to-day decisions concerning your data are also captured, along with any data associations. If it is pertinent to your business and data storage is affordable and accessible, store it!

Finally, even once your data is in a shape that is ready for Machine Learning methods to be applied, this doesn’t guarantee that it will be 100% successful. Its success hangs on there being a pattern within the data that can be learned and understood. If there is a pattern, then rest assured that Machine Learning applied with some good Data Science will find it and exploit it for you, but the lack of a pattern will mean all bets are off.

In some cases the data patterns may be very distinct and clear cut. In other cases you may have data that gives you an 80% successful solution, which can classify your data with confidence in 8 out of 10 cases, leaving you with a business process that needs manual intervention for the remaining 2. This may still be a win for your business, because the difference in resources and overheads between manually handling 10 cases and handling 2 is significant, saving your business time and money.
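One way such a split between automatic and manual handling could work is sketched below. It assumes the classifier reports a confidence score with each prediction and that anything below a chosen threshold goes back to a QA analyst; the 0.8 threshold, the note ids and the scores are all hypothetical.

```python
def route(predictions, threshold=0.8):
    """Split predictions into those trusted automatically and those sent to an analyst."""
    automatic, manual = [], []
    for note_id, phase, confidence in predictions:
        target = automatic if confidence >= threshold else manual
        target.append((note_id, phase))
    return automatic, manual


# Hypothetical model output: (note id, predicted phase, confidence).
predictions = [
    ("N1", "Accessory Gearbox assembly", 0.95),
    ("N2", "Low-pressure Turbine assembly", 0.91),
    ("N3", "High-pressure core assembly", 0.55),
    ("N4", "Total visual inspection", 0.88),
    ("N5", "Equipment and accessories assembly", 0.82),
]
automatic, manual = route(predictions)
print(len(automatic), "handled automatically;", len(manual), "sent for manual review")
```

The threshold is a business decision as much as a technical one: raising it sends more work to the analysts but lowers the risk of a wrong allocation slipping through.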

This is a very important point to consider if you are thinking of investing in Machine Learning: manage your expectations and consider an initial investment in building a proof of concept or prototype before a full-blown solution with all of the bells and whistles. A prototype can be run in parallel with your current business process for a set period of time, giving you some metrics on how successful or unsuccessful it was.

This will then allow you to manage your expectations when considering the next steps towards a full-blown solution.
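As a rough sketch of the kind of metrics a parallel run could produce, the Python below compares the phase chosen by the QA analyst with the phase chosen by the prototype for the same notes, and reports overall and per-phase agreement. The trial data is invented, and treating the analyst’s choice as the correct answer is an assumption.

```python
from collections import Counter


def agreement_report(pairs):
    """pairs: (analyst_phase, model_phase) tuples collected during the trial."""
    totals, matches = Counter(), Counter()
    for analyst_phase, model_phase in pairs:
        totals[analyst_phase] += 1
        if analyst_phase == model_phase:
            matches[analyst_phase] += 1
    overall = sum(matches.values()) / sum(totals.values())
    per_phase = {phase: matches[phase] / totals[phase] for phase in totals}
    return overall, per_phase


# Hypothetical trial results: what the analyst chose vs what the prototype chose.
trial = [
    ("Accessory Gearbox assembly", "Accessory Gearbox assembly"),
    ("Accessory Gearbox assembly", "Accessory Gearbox assembly"),
    ("Low-pressure Turbine assembly", "Low-pressure Turbine assembly"),
    ("Low-pressure Turbine assembly", "Low-pressure Turbine assembly"),
    ("Low-pressure Turbine assembly", "High-pressure core assembly"),
]
overall, per_phase = agreement_report(trial)
print(f"Overall agreement: {overall:.0%}")
for phase, rate in per_phase.items():
    print(f"{phase}: {rate:.0%}")
```

A per-phase breakdown matters because a prototype can look good overall while still doing poorly on one particular phase.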
