Automotive Artificial Neural Networks - Part 1 (Primer)
A Primer On Artificial Neural Networks
There has been some discussion on the forum of Artificial Neural Networks (ANNs) and their role in the automotive industry (specifically Tesla's fault prediction, though there was also mention of Ford attempting to use an ANN on the V10 for misfire monitoring, and of HVAC systems). The goal here is to break the concept down into something manageable that enables discussion, debate, and education. We will start with a brief primer on what ANNs are, their basic operation, and general usage. Mathematical references will be kept to an absolute minimum and some of the more complex concepts will be simplified. This is a topic where trying to go in depth will quickly lead down the rabbit hole, and how far to go is for each person to decide for themselves. I have done my absolute best to remain accurate without getting bogged down in intricate details. I have plenty of resources I would be willing to share.
The goal of any neural network is basically to take a complex data set, say all the inputs/readings required to run a modern day internal combustion engine, and find non-linear (non-obvious) relationships between them. This ultimately allows one to achieve their end goals: more accurate misfire monitoring, better fuel/air control, better fuel economy with increased power and so on.
- Algorithm - A set of rules for solving a given type of problem
- Machine Learning - The study of statistical modeling and algorithms that allow a computer to achieve a given task/outcome without needing step-by-step instructions
- Training Data - The example data one wishes to work with, represented numerically (think vectors or matrices), that the network learns from
What is an ANN? Artificial Neural Networks are computing systems that utilize a combination of machine learning algorithms in order to achieve a desired result. These systems (programs) essentially learn in a rough approximation of the neural network in a human brain.
The system is made up of nodes (similar to neurons in the brain), which are connected by edges (much like the synapse firing events that allow us to pass information from one neuron to the next). These edges connect all the nodes to each other, either directly or indirectly through layers. This is all laid out abstractly (in a sense) within code/software. It is not physical traces on a circuit board. A very simple ANN consists of an input layer, hidden layer(s) (the number of these is based on the end goal of the network), and finally an output layer.
Each layer of nodes generally performs a different mathematical function: an input function, activation function/transfer function, and finally an output function. These are simply different levels of processing. These functions are needed to take our input and calculate our output data.
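To make the layers a bit more concrete, here is a minimal sketch of one pass through a tiny network: four inputs, three hidden nodes, one output node. Everything in it is an assumption for illustration: the readings are made-up values scaled to the range 0 to 1, the weights and biases are arbitrary numbers rather than anything learned, and the sigmoid is just one common choice of activation function.

```python
import math

def sigmoid(x):
    """Squash any number into the range 0..1 (one common activation function)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One layer: each node takes a weighted sum of all inputs,
    adds its bias, then applies the activation function."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

# Hypothetical readings scaled to 0..1, in the order CKP, CMP, TPS, Load
inputs = [0.20, 0.20, 0.15, 0.30]

# Made-up weights: 3 hidden nodes, each with one weight per input
hidden_weights = [
    [0.5, -0.2, 0.8, 0.1],
    [-0.3, 0.9, 0.4, -0.6],
    [0.7, 0.7, -0.5, 0.2],
]
hidden_biases = [0.0, 0.1, -0.1]

# One output node with one weight per hidden node
output_weights = [[1.2, -0.8, 0.5]]
output_biases = [0.05]

hidden = layer(inputs, hidden_weights, hidden_biases)   # input -> hidden
output = layer(hidden, output_weights, output_biases)   # hidden -> output

print("hidden layer:", [round(h, 3) for h in hidden])
print("output:", round(output[0], 3))
```

With untrained weights the output number means nothing yet. Training is what turns those arbitrary numbers into useful ones.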
For our purposes let us use some automotive inputs. We will use: CKP, CMP, TPS, and Load. These are all familiar data PIDs and when we see them (or anything else) our brains go through a process similar to what we may observe in an ANN. The input layer would be our initial viewing of these data PIDs. We see them, but they don't instantaneously mean anything. The input gets sent along the connecting edges to the next node. This is where it can get a bit confusing. What exactly happens in these hidden layers is the real fascinating part of ANNs.
Say we have a MAF reading input of 6.35 g/s at idle. We recognize it as a number, but then we mentally calculate if it is good or bad based on previous experience, engine type, condition, system, etc. Then we may filter through another layer of data (hidden layer); this is where we would factor in other faults that could give us this reading (be it correct or incorrect for the given scenario). Finally we come to an output (our diagnosis). This changes over time as we accumulate more data, experience, knowledge, and incorrect diagnoses. All this happens in our brains so fast we rarely realize it is happening.
A neural network essentially follows the same process, though think of it starting out as a child and growing into an adult as it is trained on our training data. Just like all of us as kids, ANNs need to experience an event or data, make decisions/judgments, and then output something. When we first see a car as a child, we have a "car shaped" neuron (node) that sends a signal across a synapse (edge) to another node. Maybe that next node is a color node. This would be the first of our "hidden layers". These layers help us go from car (a generalized shape that is recognizable as a car), to a "blue" car (this could be refined into powder puff blue as we get more "training data"), to a car that is blue and has big tail fins (our output). Maybe the first time we put these together we think the massive tail fins are from a 1974 Yugo, which would be terribly wrong. How do we know it's wrong? We are told so. Hopefully we can all see where this is going.
Instead of just one data point being interpreted at a time, ANNs interpret many inputs at once, and each node passes its result along to the nodes in the next layer. How much influence that result has on each receiving node is based on the weights/biases of each edge. Each of these is represented by a number. The bigger the number, the stronger the association (generally). So a given input is more likely to lead toward a node whose connecting edge has a higher number (weight). Think of the fact that we are much more likely to associate a bow-tie shape with Chevrolet than to associate it with Triumph. This is all learned from data and experience, and someone correcting us if we get it wrong.
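The bow-tie example can be sketched with a few lines of arithmetic. This is only an illustration under my own assumptions: the two badge features, the weight values, and the helper name `weighted_sum` are all hypothetical, chosen so that a bow-tie shape is strongly tied to one make and only weakly to the other.

```python
def weighted_sum(inputs, weights, bias):
    """Multiply each input by the weight on its edge, add them up, add the bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# Hypothetical features seen on a badge: [bow-tie shape, round shape]
# 1.0 means the feature is present, 0.0 means it is absent.
badge = [1.0, 0.0]

# Made-up "learned" weights and bias for each output node
connections = {
    "Chevrolet": ([2.5, -0.5], 0.0),  # strong bow-tie association
    "Triumph": ([0.1, 1.8], 0.0),     # weak bow-tie association
}

scores = {
    make: weighted_sum(badge, weights, bias)
    for make, (weights, bias) in connections.items()
}
best = max(scores, key=scores.get)  # the node with the highest score wins

print(scores)
print("Most likely:", best)
```

Both output nodes receive the same input. The weights alone decide which one ends up with the bigger score.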
How do we teach an ANN that it is right or wrong? Well there are countless different methods. The general types would be supervised, unsupervised, or reinforcement training to simplify things.
Supervised ANN Training: The ANN receives both the inputs and the desired outputs as part of its training set. It processes the inputs through whatever network was designed, then compares its output against the desired output. If it is right, the edges (connections) that got it to that result are weighted heavier. If it is wrong, it takes that information and, using back-propagation, changes the connections it used so they are less likely to be used again. In this way it is able to learn based off a desired output. There must be a rather large set of training data in order to prevent "over-fitting". Over-fitting is when we have very little data in the network and force our desired output, so the network memorizes those few examples instead of learning the general pattern (like making our test results, or lack thereof, fit whatever silver bullet diagnosis we have already decided on). This is bad in all aspects of life and neural networks.
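Here is a rough sketch of that compare-and-correct loop, shrunk down to a single node so there is only one layer to adjust. Back-propagation applies the same kind of correction backwards through every hidden layer. The training set is invented for illustration (a "crank speed dip" and "load" value paired with a misfire yes/no label), and the learning rate and number of passes are arbitrary choices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical training set: ([crank speed dip, load], desired output)
# Desired output 1 = misfire, 0 = no misfire. All values are made up.
training_set = [
    ([0.9, 0.2], 1),
    ([0.8, 0.7], 1),
    ([0.7, 0.5], 1),
    ([0.1, 0.3], 0),
    ([0.2, 0.8], 0),
    ([0.3, 0.1], 0),
]

weights = [0.0, 0.0]   # start out knowing nothing
bias = 0.0
learning_rate = 0.5    # how big each correction is (arbitrary choice)

for epoch in range(2000):
    for inputs, target in training_set:
        prediction = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        error = prediction - target  # positive = guessed too high
        # Nudge each weight against the error, in proportion to its input
        weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
        bias -= learning_rate * error

def predict(inputs):
    """Run the trained node on one set of inputs."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

for inputs, target in training_set:
    print(inputs, "desired:", target, "network says:", round(predict(inputs), 2))
```

Note that six examples is exactly the "very little data" situation described above. The node will get these six right, but that says nothing about how it would do on readings it has never seen.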
Unsupervised Training: The ANN receives only the inputs and makes its own determination of how to group the input data and filter it through its various layers to achieve outputs. This is like a kid growing up and never having any adult input in their experiences/education. Things can be grouped according to attributes (furry, soft, loud, pointy ears, big tail fins, the list goes on). The outputs are not always expected and sometimes surprising. There are various methods (such as back-propagation) that can still be applied to networks that learn this way. These are still not understood in depth, but essentially would allow a network to continually learn new things as it encounters new environments, data, etc.
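The idea of grouping data without being told the answers can be shown with k-means clustering. To be clear, k-means is not a neural network. I am swapping it in because it is about the simplest way to show grouping by similarity with no labels. The idle MAF readings below are made up, and the choice of two groups and of the starting guesses is mine.

```python
# Hypothetical idle MAF readings in g/s. No labels, no "right answers".
readings = [3.1, 3.4, 2.9, 6.2, 6.5, 6.4, 3.3, 6.1]

# Starting guesses for the two group centers (picked from the data)
centers = [readings[0], readings[3]]

for _ in range(10):
    # Step 1: put each reading in the group whose center is closest
    groups = [[] for _ in centers]
    for r in readings:
        nearest = min(range(len(centers)), key=lambda i: abs(r - centers[i]))
        groups[nearest].append(r)
    # Step 2: move each center to the average of its group
    centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]

for center, group in zip(centers, groups):
    print("center", round(center, 3), "members", group)
```

Nothing told the program that there were "small engine" and "large engine" readings (or whatever the two groups turn out to mean). It only found that the numbers fall into two clumps, and it is up to us to interpret them.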
Reinforcement Training: The system is penalized when it generates the wrong outputs and gets rewarded when it generates the desired outputs. How one rewards or penalizes a computer system is beyond my understanding unfortunately. I think this Pac-Man picture got the point across for me though.
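As far as the software is concerned, a reward or penalty is usually just a number, positive or negative, that the program tries to make as large as possible over time. Below is a heavily simplified sketch in the spirit of Pac-Man. The two actions, the reward values, the step size, and the exploration rate are all assumptions of mine, and a real game-playing agent is far more involved than this.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical rewards: +1 for eating a pellet, -1 for running into a ghost
rewards = {"toward_pellet": 1.0, "toward_ghost": -1.0}

# The agent's current opinion of each action, starting neutral
values = {action: 0.0 for action in rewards}

step_size = 0.1      # how strongly each experience shifts the opinion
explore_rate = 0.2   # how often the agent tries something at random

for _ in range(200):
    if random.random() < explore_rate:
        action = random.choice(list(values))   # explore
    else:
        action = max(values, key=values.get)   # use the best-rated action so far
    reward = rewards[action]                   # the "reward" or "penalty"
    # Shift the opinion of this action toward the reward it just earned
    values[action] += step_size * (reward - values[action])

best_action = max(values, key=values.get)
print(values)
print("Learned preference:", best_action)
```

Nobody tells the agent which action is correct. It only sees the score that follows each choice and gradually favors whatever scored well.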
There are a variety of other methods. One particularly interesting one is GANs - generative adversarial networks. These systems actually pit two (or more) neural networks against each other in order to more quickly get desired outcomes while also reducing incorrect data.
Play with a neural network here; it may make things more concrete.
More parts to follow if the interest is there. The next post will be about the history of neural nets (not all the way back to their origin in the 1950s, but more the late 80s into the early 90s) and various OEMs' experiments with them up to and including the modern day. The final post will cover the future and the amazing applications these networks have for both the industry and diagnosticians.
See Part 2 Here
See Part 3 Here
I am doing my best to resist editing a bunch of times after the post (it could always be better). If there is incorrect, misleading, or confusing information just let me know in the comments and I'll fix or clarify as needed. Otherwise I'll never let myself post anything until I think it is a doctorate level dissertation. Hopefully the format is easy enough to digest and that this brings out
Good job so far, Chris. Looking forward to the “OEM experiments” part.
Chris, I find this stuff fascinating but I sometimes shudder at the thought that we are building our future masters. Take the unsupervised training model for example. You compared it to a kid growing up and essentially learning on its own. For humans the process takes years to become an autonomous functioning person and even then, we need many more years to learn and improve over our
Bob, You are correct on many points. The learning speed is fast, and we have already run into instances where we can't understand what a neural net is doing. I think that is partially to our benefit. We can use that, tweak it. Put inputs in or take them out and see what happens. To me it is really not that different from what we do on a daily basis when service information is lacking and we
To anyone watching this thread, I just uploaded part 2 of the series. Just in case you miss it in the feed. The topic isn't generating as much discussion or debate as I thought it would, though perhaps I have it tagged or grouped incorrectly.