Convolutional Neural Networks in AI: Simplicity Leading to Sight and Hearing for Machines

Written by Scott Wilson


One thing that has always been a limitation in machine learning is machine perception. Humans come with built-in sensors and hard-wired processing power that gives us a constant barrage of information about the world around us to learn from.

Machine intelligence has only had the relatively limited set of data we can provide it to learn from. One of the most capable neural network models in use today, ChatGPT, is estimated to have been trained on about 45 terabytes of data.

Researchers think that the typical human takes in about 120 gigabytes of sensory data per day, much of it through sight and sound. That means ChatGPT had about 375 days’ worth of data, or essentially the experience of a one-year-old child, to draw on. Of course, that’s all text, and much of it comes without context; it had to be extensively massaged to extract meaning.

Providing AI with the ability to take in and tie together the firehose of noisy real-world information is a key challenge for the industry. When it comes to audio and visual information, convolutional neural networks may be the answer.

Artificial Neural Network Meaning Shifts With Different Techniques

What exactly a neural network is shifts among the many different approaches that exist for building the base of modern AI.

The range of neural networks and deep learning models popping up in AI has quickly become hard to keep straight. Some of the neural network types that have become most common today include:

- Feedforward neural networks
- Convolutional neural networks (CNNs)
- Recurrent neural networks (RNNs), including long short-term memory (LSTM) networks
- Graph neural networks

Each of these different types of neural networks has different strengths. For CNNs, the killer use case is imagery.

What Is a Convolutional Neural Network?

A convolutional neural network is a type of feedforward neural network that has proven particularly useful for processing visual imagery and audio inputs. CNNs offer a way to break the problem of image recognition into small, simple, manageable pieces that can be used to build up a more complete picture without hand-coding complex recognition functions.

Convolution in neural networks describes the process of using a relatively simple filter to iterate through a grid of data, looking for specific features. Moving the data up through different layers of the network exposes it to filters that look for more and more complex features, building on the work of the earlier layers.
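This sliding-filter step can be sketched in a few lines of Python with NumPy. The 3×3 edge-detecting kernel and the tiny image below are illustrative values, not output from any trained network:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small filter over a 2-D grid, taking a dot product at each
    position (valid padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A classic vertical-edge filter: it responds where brightness changes
# from left to right.
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])

# A tiny "image": dark on the left, bright on the right.
image = np.array([
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
])

response = convolve2d(image, edge_kernel)
print(response)  # large responses cluster along the dark-to-bright boundary
```

In a trained CNN, the kernel values are learned rather than hand-set, but the mechanics of the sweep are exactly this.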

The convolution layers are alternated with pooling layers, which compress and merge features. This lets the network gloss over the small differences and noise that otherwise confuse typical neural nets.
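Max pooling, the most common pooling operation, simply keeps the strongest response in each small window. A minimal sketch, assuming a typical 2×2 window:

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Downsample by keeping the strongest response in each size×size
    window (stride equal to the window size)."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # trim to a multiple of the window
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))

fm = np.array([
    [1, 3, 2, 0],
    [4, 2, 0, 1],
    [0, 1, 5, 6],
    [2, 0, 7, 1],
])
pooled = max_pool(fm)
print(pooled)  # the strongest response from each 2×2 window survives
```

Shifting a feature by a pixel or adding a little noise usually leaves the window maximum unchanged, which is why pooling buys this robustness.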

Using convolution, neural networks can be trained faster and made to work more flexibly across a wider range of data.

As it happens, this gridded approach turns out to be a good way to efficiently iterate over data where neighboring pieces of information (like the pixels that form a line or edge) have connected meaning. The convolution layers reduce the required number of neurons in the network, and tie together connected points of data.
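A back-of-the-envelope comparison shows the scale of the reduction. The layer sizes below are hypothetical but typical:

```python
# Dense layer: every input pixel connects to every output unit.
h, w = 224, 224
inputs = h * w                    # 50,176 pixels
hidden = 1000
dense_params = inputs * hidden    # roughly 50 million weights

# Conv layer: the same small filters are reused at every position
# (weight sharing), so the parameter count doesn't depend on image size.
filters, kh, kw, in_channels = 64, 3, 3, 1
conv_params = filters * (kh * kw * in_channels + 1)  # +1 bias per filter

print(dense_params)  # 50176000
print(conv_params)   # 640
```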

Where Deep Neural Networks Built With Convolution Came From

Like neural networks in general, CNNs are largely inspired by the biological function of neurons in animal brains.

As early as 1960, neurologists had identified two different types of cellular receptors in the visual cortex:

- Simple cells, which respond to edges and bars at particular orientations in particular locations
- Complex cells, which respond to those same patterns over a wider area, regardless of their exact position

As the theory went, complex cells didn’t simply develop that more advanced ability on their own—they relied on a number of simple cells, each with different patterns, to power them.

In the same way, convolutional neural network architecture uses sets of filters with very simple pattern recognition to build an ability to recognize more complex features. They are able to do so in various areas of the image and in various orientations.

What Neural Network Breakthroughs Led to Convolutional Neural Networks?

AlexNet, the breakthrough AI model that crushed the competition at the 2012 ImageNet Large Scale Visual Recognition Challenge, was the CNN that kicked off today’s revolution in neural network machine learning.

A backpropagation-trained neural network that could be run relatively cheaply on commercial GPU (Graphics Processing Unit) chips, AlexNet showed what could be done with large data sets on even relatively inexpensive hardware.

Since then, CNNs have been at the center of most of the biggest developments in computer vision and image recognition.

The Advantages of Convolutional Neural Networks in Processing Noisy and Variable Data

One major advantage of CNNs among the different AI neural networks is that they can use unsupervised learning on raw imagery. The spatial and temporal relationships of information in that kind of data play to their strengths.

This offers automated feature recognition without predefined filtering. Famously, an artificial neural network built by Google in 2012 and trained on YouTube data learned almost immediately how to recognize a cat, although no one told it that cats existed or what they looked like.

Just like humans see faces and other features in cloud formations, deep neural networks can be prone to finding the familiar even where it doesn’t exist.

CNNs are less prone to overfitting to their training data, which makes them more useful for general-purpose work. They can adapt flexibly and interpret ambiguous data. Other kinds of neural networks, trained specifically to recognize cats, might start seeing cats in almost everything they are shown.

They also have proven to be relatively cheap to run, minimizing computation requirements. That makes for faster and more effective recognition.

Bias in Convolutional Neural Networks Isn’t What You Think

You might expect to see convolutional neural network bias listed under disadvantages instead of advantages. But this kind of bias isn’t the sort that further marginalizes minorities or makes a CNN see cats in everything.

Instead, bias in a convolutional neural network is a learnable offset added to each filter’s output, tuned along with the filter weights as the network is trained and optimized.

While unbiased filters can be more flexible and apply to a wider range of data, bias terms help adjust them to operate more powerfully, with better recognition and identification of features in the data.
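One way to see what the bias term does: it shifts a filter’s response before the activation function, raising or lowering the threshold at which the filter “fires.” A toy illustration with made-up response values:

```python
import numpy as np

def relu(x):
    """Standard rectified linear activation: negative responses become zero."""
    return np.maximum(0, x)

# Hypothetical raw responses from one filter at four image positions.
response = np.array([-2.0, -0.5, 0.5, 2.0])

print(relu(response))        # without bias, only the strong matches register
print(relu(response + 1.0))  # bias +1 lowers the firing threshold
print(relu(response - 1.0))  # bias -1 makes the filter pickier
```

During training, each filter’s bias settles wherever that threshold best separates its target feature from background.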

Convolutional Neural Networks Come With Some Limitations

Of course, all good things in AI also come with some drawbacks.

The fast, efficient computation at inference time comes at the expense of a very costly training process. CNNs take a long time and a great deal of data to train, and just finding relevant data can be a challenge.

Moreover, although they can quickly identify common features within data, in most cases it’s hard to apply that knowledge without human labeling of those features. That can also limit the utility of CNNs in some applications.

CNNs are also tough to optimize due to their very flexibility.

Finally, and in common with many other ANNs today, CNNs are a kind of black box when it comes to understanding how they work. While they are undeniably powerful and effective, a lack of transparency in how they operate can make them vulnerable to various adversarial attacks.

Convolutional Neural Networks Have Broad Applications That Impact Many Areas of AI Today

Image processing is the single biggest application for CNNs. Yet that is such an incredibly broad field, taking in so much of the data through which humans perceive the world, that it opens up huge new capabilities in various AI fields like:

- Computer vision and image recognition
- Video processing and object tracking
- Audio and speech classification
- Natural language processing
- Medical imaging analysis
- Autonomous vehicle navigation

All of these areas can spawn their own specialized sub-categories of CNN uses. Convolutional neural network logo recognition is used to track branding efforts by marketing departments, who can deploy it to look for their corporate logos in social media images. Radiologists make use of it in evaluating x-rays to look for abnormal growth or tumors.

It’s the technology that is powering some of the most potentially groundbreaking kinds of artificial intelligence uses being explored today.

Using Convolutional Neural Network Video Processing

Among those are video convolutional neural network systems.

While taking in and processing a single image can offer a wealth of information, it’s not really sufficient to open up the visual sense to AI. Motion over time is how humans experience the world. Major parts of our brain are built around deciphering trajectory and tracking movement. If AI is to be useful in similar real-world scenarios, it needs to be able to do the same.

Fortunately, another way to look at a video is simply as a series of still images. And neural network computers are fast. So it’s relatively straightforward to run object recognition across a time-series of frames.

The bigger challenge comes with tracking objects from frame to frame and determining what they are doing. The time dimension hands a CNN more of a challenge.

This is where adding RNN and LSTM techniques comes in handy. With better handling of time-series data, recurrent neural networks are ideal for extracting motion and temporal correlation. With a CNN classifying objects in image frames, the system can take those features and hand them off to an RNN layer to tie each frame to the last… and draw conclusions about motion and action.
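That handoff can be sketched with plain NumPy. Here the CNN stage is replaced by random stand-in feature vectors, and the recurrent layer is a single hand-rolled vanilla RNN cell rather than a full LSTM, so everything below is illustrative rather than a production recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for CNN output: one feature vector per video frame.
num_frames, feat_dim, hidden_dim = 8, 16, 4
frame_features = rng.normal(size=(num_frames, feat_dim))

# A single vanilla RNN cell ties each frame's features to a running state.
Wx = rng.normal(scale=0.1, size=(feat_dim, hidden_dim))
Wh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
for x in frame_features:              # frames arrive in time order
    h = np.tanh(x @ Wx + h @ Wh + b)  # state carries context across frames

print(h)  # the final state summarizes the clip; a classifier head reads it
```

The CNN answers “what is in each frame,” while the recurrent state accumulates “what is happening over time” from frame to frame.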

How Do Neural Networks Work in Convolving Audio Data?

While this approach is ideal for breaking down image files, AI researchers are also finding that it works well in audio classification.

That’s because audio is easy to turn into a visual representation. An audio stream can be converted into a spectrogram, plotting its frequency and amplitude content over time as a still image or video.
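A bare-bones magnitude spectrogram can be computed with nothing but NumPy’s FFT. The frame length, hop size, and 440 Hz test tone below are arbitrary choices for illustration:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: slice the signal into overlapping windows,
    FFT each one, and stack the results into a time-frequency grid."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)  # shape: (time_steps, frequency_bins)

# One second of a 440 Hz tone sampled at 8 kHz: the energy should
# concentrate in a single frequency bin.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(tone)
print(spec.shape)           # (61, 129)
peak_bin = spec[0].argmax()
print(peak_bin * sr / 256)  # 437.5, the FFT bin nearest 440 Hz
```

Once the audio is in this grid form, the same 2-D filters used on photographs can sweep over it looking for spectral patterns.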

When it comes to pattern matching, looking at such spectrograms and identifying distinctions is no different from any other image task. And layering LSTM onto the CNN output, just as in video processing, allows AI engineers to process audio as readily as video through convolution.

Other Areas Where Convolutional Neural Network Models Can Be Applied

Although crunching audio and image data is the primary use case for CNNs, they have been applied in other areas where data can be noisy and variable, yet contain deeper patterns. For example, convolutional networks have been used to attempt more accurate price predictions for various stock indexes and securities.

CNNs have also been applied in natural language processing. While recurrent neural networks are a better fit for accurate language modeling, CNNs deliver faster, cheaper processing. Although not the best choice for language generation or extracting precise meaning, they can be powerful for overall sentiment analysis or in areas like spam detection.
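The trick is to treat a sentence as a 1-D grid of word embeddings and slide filters along it, so each filter acts as an n-gram detector. In the sketch below, the two-dimensional embeddings and the hand-set “negation” filter are invented for illustration; in a real model both would be learned during training:

```python
import numpy as np

# Toy word embeddings (hypothetical values chosen for illustration).
embed = {
    "not":  [1.0, 0.0],
    "very": [-1.0, 0.0],
    "good": [0.0, 1.0],
    "bad":  [0.0, -1.0],
}

def conv1d_features(tokens, kernel):
    """Slide a filter across consecutive word pairs, then keep the
    strongest match anywhere in the sentence (max-over-time pooling)."""
    vecs = np.array([embed[t] for t in tokens])
    width = kernel.shape[0]
    scores = [np.sum(vecs[i:i + width] * kernel)
              for i in range(len(vecs) - width + 1)]
    return max(scores)

# A filter hand-tuned to fire on the "not" + positive-word pattern.
negation_filter = np.array([[1.0, 0.0],
                            [0.0, 1.0]])

print(conv1d_features(["very", "good"], negation_filter))  # 0.0: no negation
print(conv1d_features(["not", "good"], negation_filter))   # 2.0: filter fires
```

The max-over-time pooling step is what lets the detector find its phrase no matter where in the sentence it appears, mirroring the translation tolerance CNNs show on images.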

Finally, CNNs may play a big role in additional types of machine learning in a more general sense. While ML today primarily refers to the type of adaptive algorithms that adjust based on the training data they take in and classify, it can also apply to the larger problem of developing machines that can learn more broadly from the world around them.

Clearly, visual data is hugely important in that process.

Convolutional Neural Network Definitions for the Future Are a Work in Progress

In the short term, however, CNNs have plenty of work to do in both primary image analysis, like evaluating medical imaging, and in supporting other AI efforts, like handling object detection for autonomous vehicle navigation. Optimization and new training techniques will be where most research is focused.

Defining the simple neural network use cases where CNNs deliver value will probably be the focus of the technology in the near term.

But combinations of approaches, like graph convolutional networks, which bring the power of CNNs to graph-structured data, may boost both CNN and other AI model performance. As AI researchers put more time into integrating different neural network approaches, they are finding more ways that different ANNs reinforce one another.

There are also more complex visual use cases, like creating or interpreting three-dimensional vision, that are likely to come in the future of the technique.

Convolutional Neural Network Design Is Best Learned in Advanced AI Degree Programs

Understanding convolutional neural network design typically comes through in-depth educational training. Any look at the biggest breakthroughs in CNN and ANN development will turn up a long list of names with the letters “PhD” behind them. Both conceptually and mathematically, CNNs are a tough field to master.

Neural network definitions for students are nailed down through graduate-level programs in artificial intelligence and machine learning. A Master of Science in Machine Learning will take the essential undergraduate math preparation from computer science or similar degrees and turn it into applied professional skill in developing CNNs and other AI algorithms.

Because of its central role in the field of computer vision, though, you can also get a strong grounding in CNN tech through a Master of Science in Computer Vision program.

These advanced studies come with extensive coursework in:

- The linear algebra, calculus, and statistics behind neural networks
- Machine learning and deep learning techniques
- Computer vision and image processing
- Programming in languages like Python and R

Graduate studies in computer vision and artificial intelligence also include research projects that allow you to push forward the cutting edge of CNN applications in the field today.

And they frequently have concentration options or electives to allow you to tailor your CNN expertise toward areas like:

- Computer vision
- Natural language processing
- Medical imaging
- Autonomous systems

Similarly, certificate programs in machine learning and AI can be found to increase the skills behind CNN design if you already have a degree in a related field. These options cut away the research requirements and more generic courses for a strict focus on deep learning and neural network design.

What Is Neural Network Professional Certification?

Another kind of certification that can be valuable is professional certification.

Unlike educational certificates, these certs come from professional industry groups and platform vendors. Instead of offering a general education, they conduct a kind of assessment of your specific skills in the area the cert covers. That can include either the software tools used to design CNNs or various techniques used in neural network diagram development and programming.

Whether coded in Python or R, convolutional neural networks are often part of certification evaluations. Options like the DeepLearning.AI TensorFlow Developer Professional Certificate include validation of skill in image classification modeling with deep neural networks.

Professional certificates tell potential employers that you have proven your abilities in using those tools or techniques. In a field where the latest techniques were developed only yesterday, that kind of validation is a valuable edge.

Chances are that convolutional networking design will continue to evolve. Machine learning certifications that cover CNN and other artificial neural networks will be in demand at more companies and in more and more different industries as that happens. And machines that can see the world as it is will be the next big step to more capable and more knowledgeable artificial intelligence.