Written by Scott Wilson
It wasn’t very long ago that data science was the hot new field in information technology.
Taking large amounts of disparate information in – and then using algorithmic and computational processes to discern hidden information within it seemed like the golden skill to have in the 21st Century. In a world overwhelmed by data, data science offered the key to making sense of it and using it for good.
Then AI came along, and all the heads turned.
But if AI is the new golden boy in the press and corporate world, under the hood it has an intimate and important relationship with data science. That fact makes data science a natural career path for those interested in marshalling the full potential of artificial intelligence… It also means that an education in data science can take you a long way in artificial intelligence engineering.
Data Science May Be the Key to Realizing True Artificial Intelligence
If you go back to that thumbnail definition of data science above, you can broaden it just a bit to see how it is also the key to achieving useful artificial intelligence.
“Taking large amounts of disparate information in – and then using algorithmic and computational processes to discern hidden information within it.” This describes exactly what computers need to be able to do to achieve intelligent behavior.
Put yourself in the shoes of the machine. The on switch is flipped, and there you are. But you’re arriving with nothing but a few blocks of predefined code… mostly quite shoddy, as it was written by humans.
To improve that code, to interact with sensors you have been provided, to process information in any format, and to produce results… you need to be able to unpack it. Analyze it. Discern trends and make predictions.
If that all sounds like familiar shop talk, that’s because it’s the same language used to describe the field of data science for the past decade or more.
Data Science Is Already Behind Some of the Biggest Breakthroughs in Modern Artificial Intelligence
It’s fair to say that without data science, we wouldn’t have artificial intelligence as we know it today.
Data science has been integral in developing some of the tools and techniques that have been used to create the biggest breakthroughs in artificial intelligence to date.
The most well-developed AI tools we have now are based on statistical models. The way those tools are created relies on probabilities. That is, they derive their understanding from the repeated, multi-layer analysis of existing information through machine learning.
This is a shift from older models, which relied on discrete calculation. Computers have always been particularly good at mathematical and logical operations. But the reasoning skills that humans use in problem-solving and inference aren’t so clear-cut. To tap into similar abilities, computers have to get away from binary yes/no models and start dealing in probabilities.
By reviewing enough pictures of, say, a human face, or enough pages of written English, such algorithms can train themselves to recognize what makes a face a face, or a sentence a sentence. They can use patterns detected in that data either to classify it or to generate apparently similar content… which is the basis of generative AI.
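To make the shift from yes/no answers to probabilities a little more concrete, here is a minimal sketch in Python. It is not how any production system is built: the single numeric "feature" per example, the toy labels, and the function names are all assumptions made for illustration. It fits a tiny logistic regression by gradient descent and then reports a probability rather than a hard verdict.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(examples, labels, epochs=5000, lr=0.1):
    """Fit one weight and one bias by plain gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(examples, labels):
            error = sigmoid(w * x + b) - y  # predicted probability minus truth
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy data: one made-up feature per image, label 1 = "face".
features = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(features, labels)

def probability_of_face(x: float) -> float:
    """Return the learned probability that an example with feature x is a face."""
    return sigmoid(w * x + b)

for x in (1.0, 2.5, 4.0):
    print(f"feature={x}: P(face) = {probability_of_face(x):.2f}")
```

The point of the sketch is the output: an example near the boundary gets a middling probability instead of a forced yes or no, which is the behavior the statistical approach relies on.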
Generative AI Processors Are Hungry for Big Data
But it takes a LOT of data. According to the BBC, the GPT-3 model was trained using some 570 gigabytes of text, or around 300 billion words. Moreover, the attributes of that data are important: how it is organized and classified initially, and how it is stored and fed to the algorithm, can affect the quality of the output.
Data science developed the storage and retrieval mechanisms, like NoSQL, to hold that data. It created the algorithms used to process it. And it’s been the driving force behind the development of specialized chips, and of languages like R, optimized for crunching all that information.
Data scientists are the professionals who have wrangled that data for AI research teams. And the tools and techniques developed by data science are what power the creation of AI tools. For example, while machine learning (ML) was a tool originally developed for AI research, it was taken up and honed to the current state of the art in the service of categorizing and analyzing Big Data.
Now ML is critical in putting together popular AI systems like natural language processors and generative transformers.
Data Science Will Be Propelled by AI Tools as the Field Matures
This is not just a one-way street, though. The data science field itself benefits enormously from the development of artificial intelligence.
Popular data science tools, for example, have been created in the pursuit of AI. Machine learning, noted above as a training tool for AI, is also widely used in data science circles for other purposes. It’s a core part of the curriculum in pretty much every data science degree program offered today. It’s one of the reasons that the field has found success in everything from marketing to medical diagnostics.
When you are looking at the latest Netflix recommendations on your account, you’re seeing the fruits of machine learning analysis rooted in data science.
AI also has the potential to do the same things in data science that it might do in many other fields: automate low-level, repetitive tasks to free up data scientists to pursue more advanced or interesting work. AI may be able to synthesize large, realistic data sets for testing purposes, handle basic data cleaning, and otherwise take care of laborious pre-processing.
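For a sense of what those chores look like when done by hand, here is a rough Python sketch of synthesizing a test data set and then cleaning it. Everything in it is an assumption for illustration: the customer fields, the distributions, the defect rates, and the cleaning rules are invented, and real pipelines are far more involved.

```python
import random
import statistics

def synthesize_records(n, seed=42):
    """Generate fake customer records and deliberately inject some defects."""
    rng = random.Random(seed)  # seeded so the test data is reproducible
    records = []
    for i in range(n):
        age = max(18, min(90, round(rng.gauss(40, 12))))
        spend = round(rng.lognormvariate(3.5, 0.6), 2)
        if rng.random() < 0.10:
            age = None          # simulate a missing value
        if rng.random() < 0.05:
            spend = -spend      # simulate an impossible negative amount
        records.append({"id": i, "age": age, "monthly_spend": spend})
    return records

def clean_records(records):
    """Drop rows with impossible spend and fill missing ages with the median."""
    valid = [r for r in records if r["monthly_spend"] >= 0]
    known_ages = [r["age"] for r in valid if r["age"] is not None]
    median_age = statistics.median(known_ages)
    return [
        {**r, "age": r["age"] if r["age"] is not None else median_age}
        for r in valid
    ]

raw = synthesize_records(1000)
cleaned = clean_records(raw)
print(f"{len(raw)} raw records, {len(cleaned)} after cleaning")
```

Filling gaps with the median is only one of several possible choices; dropping incomplete rows or modeling the missing values would be equally reasonable depending on the analysis.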
And AI breakthroughs have data scientists salivating at the possibility of new tools for evaluating data. While today, data scientists have to hand-code many of their tests and analyses in R or Python, AI is being adapted to deliver no-code ML model training.
For data scientists who aren’t looking to take their hands entirely off the wheel yet, tools like Copilot and Codex bring the best of both worlds to data science coding by offering AI-assisted programming.
It’s likely, in fact, that data science and artificial intelligence are engaging in an accelerating positive feedback loop. Advances in data science will boost AI projects, which in turn will move the field of data science forward faster.
Teaching AI to Tell the Truth Is the Next Big Step for Data Scientists
One challenge that data scientists will need to overcome in current iterations of artificial intelligence tools is that no one can offer a provable explanation of how they arrive at their conclusions.
In traditional data science, with analytical code developed by humans, or even simple ML algorithms, this isn’t a problem. The code is linear, the results are reproducible, and with enough effort, even the most complex analytical process can be broken down into parts that make it understandable to humans. Biases can be detected and the reasons for them found and corrected.
Advanced generative AI doesn’t include that feature. The magic that happens as data courses through multilayer neural networks isn’t discretely traceable. The output of such models can be verified statistically… but any specific result may not be.
Incorrect or invented results, dubbed hallucinations by researchers, have become a much-reported feature of AI chatbots. While they can be offensive or contain misinformation, for casual users they are little more than objects of curiosity.
For data scientists seeking deep answers in large datasets, however, they are a deal breaker. Any possibility that incorrect or invented answers may come out of their product is the kiss of death.
Until the hallucination problem is solved, data scientists are effectively locked out of some of the most cutting-edge AI in use today.
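The gap between verifying a model statistically and trusting any one of its answers can be sketched in a few lines of Python. The audit data below is hypothetical (200 answers checked by hand, 12 found wrong), and the normal-approximation interval is just one simple way to express the uncertainty.

```python
import math

def accuracy_with_interval(correct_flags, z=1.96):
    """Estimate accuracy with a normal-approximation 95% confidence interval."""
    n = len(correct_flags)
    p = sum(correct_flags) / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical audit: 1 = answer checked and found correct, 0 = incorrect.
audit = [1] * 188 + [0] * 12

p, low, high = accuracy_with_interval(audit)
print(f"Estimated accuracy: {p:.3f} (95% CI {low:.3f} to {high:.3f})")
```

A figure like this describes the model in aggregate. It offers no way to tell whether the next individual answer belongs to the correct majority or the invented minority, which is exactly the problem for anyone who needs every result to be defensible.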
The Average Data Scientist Is Already Halfway to Becoming an AI Engineer
While data science is often said to live at the intersection between statistical mathematics and information processing, artificial intelligence may be best described as a blend of data science and software engineering.
That gives data scientists a real edge in putting together a career in AI. Since they typically have experience with, or at least an education in, coding on top of their data science training, they already hold both halves of the basic skills needed to engage in AI research and development.
In both cases, the challenges and complexity leave no room for lightweights. Getting the intellectual heft you need to succeed almost always requires a graduate degree. But if you took a marker to the titles in the course catalogs, you might have a tough time telling which master’s degree was which. Shared prerequisites in data structures, calculus, and statistics are needed for both paths.
A closer look at the paths that data science majors and AI engineers follow to develop those skills shows the similarities.
College Degrees in Both AI and Data Science Cover Much of the Same Coursework
Particularly in these early years of AI engineering, there aren’t hard and fast bounds between job titles, responsibilities, or degree programs feeding these fields. Like any new career path, the people forging ahead are coming from other, existing fields first… and data science is one of the biggest contributors.
So an education in data science offers many of the kinds of skills and background knowledge needed to flourish in AI engineering.
It’s equally possible to earn an advanced degree in artificial intelligence and apply the same learning and skills to a career in data science.
The skills the two paths share include:
- Qualitative and quantitative statistics
- Programming
- Math and algorithms
- Principles of machine learning
- Modeling
The inquisitive mindset that a data science degree instills in graduates is just as critical as those foundational skills. In many fields, research stops at the university gates, but in data science you’re expected to continue developing small experiments and investigative techniques on the job. It’s all part of uncovering the mysteries of the data or developing game-changing analytical efficiencies to put your company on top.
That’s equally important in AI. New developments come almost daily. And the fierce competition to develop new tools and techniques means that every company is looking for engineers who will continue to innovate on the job.
There are differences, of course. Most AI engineers will have more of a focus on programming. They may also need greater familiarity with hardware and sensors, dealing more closely with the sources that data comes from. And considering the nature of intelligence, they might also develop a better understanding of fields like formal logic or even psychology to help inform their efforts.
The Unique Emphasis on Ethics in AI Degree Programs
Another feature in modern data science education that will find favor in AI circles is slightly surprising: ethics.
As data science has been forced to deal with issues like overfitting, working with imperfect data, algorithmic bias, and abuses of personal information, public and government pressure has forced it to confront both legal and moral responsibilities. You can see the effect of that pressure in any data science degree curriculum. Pretty much every school offers one or more courses in the ethical collection and use of data; in many programs, at least one of those courses is required.
AI has also been facing a firestorm of controversy over the use of copyrighted data to train models, similar algorithmic bias, and thorny issues of propriety. On top of the mundane, there’s also considerable debate over the ethics and existential dangers of work that could lead to an artificial super-intelligence.
It’s a set of problems that keeps philosophy professors up at night. The kind of ethical training that is now common in data science seems likely to soon be needed among AI professionals as well.
AI and Data Science Can Be a Choose-Your-Own-Adventure Option in the Job Market
When you have that much alignment between educational paths and core skills, you won’t find much difference in workplace requirements, either.
In fact, when you watch data scientists and AI engineers at work, you might be hard pressed to tell the difference.
Because data science is already an interdisciplinary field, it’s broad enough to adapt easily to AI engineering challenges.
Breaking down the problem set facing AI engineering teams in general, you’ll find a lot of work that falls right into the category of data science:
- Parsing and managing very large datasets
- Creating statistical models of real-world scenarios
- Breaking down real-time data input streams for natural language or visual analysis
And most AI engineering graduates will themselves have the toolsets to take on many problems in data science roles, such as:
- Probability and statistical modeling
- Machine learning algorithm development and applications
- Python programming skills
- Crafting deep learning models and adapting them to real-world questions and scenarios
With that kind of crossover skillset, it can be easy to pick up jobs on both sides of the table. In fact, there are plenty of positions that blend both roles in a sort of hybrid. And, let’s face it, when you are on the cutting edge you frequently have the choice to create your own job description. First-rate data scientists and AI engineers gravitate toward the challenges and projects that interest them… they don’t pay much attention to how they’re labeled.
Data Science Jobs Could Be the Perfect Holding Pattern for Tomorrow’s AI Engineers
On the other hand, even as AI has come to dominate attention in computer science circles and in the media, data science remains the proven performer in government and industry.
Open the job listings and you’ll see the difference quickly: there are far more open positions in data science than artificial intelligence. Data science has a track record and tangible benefits that corporate executives have come to rely on. While almost everyone is excited and optimistic about the potential of AI, many employers are continuing to invest in data science teams even as they embrace AI R&D.
But considering the similarities in education and skillsets, this too is a benefit. If AI is where you want to focus, but the jobs aren’t quite there yet, it’s entirely possible to shop your resume for data science positions to pay the bills instead. And since the kinds of projects you’re likely to work on will still be AI-adjacent, you’ll be keeping your skills honed for a jump to that field when demand catches up.
No one can see the future of data science or artificial intelligence. It seems entirely likely that the two fields will converge into one over time, anyway. But one thing is certain: it will be the talented individuals who develop expertise in both areas who help create that future.