
The story of how Data Scientists came into existence

Data Scientists turn data into information and information into useful insight.

In this one, I look back and reflect on how the profession of Data Scientist came into being. All of this is based on what happened in companies that were early adopters of AI, especially analytics firms and product companies. OK, let me get straight to it.

How it started

Well, before the arrival of Data Science, IT companies were busy collecting, storing and organizing data. This was the late 1990s and early 2000s, when the IT industry was booming.

Besides other things, IT professionals were proficient in SQL, busy with ETL operations, and working with relational databases and large datasets using technologies like Teradata, and later Hadoop. Sound familiar? That’s OK.

Once companies had collected and were storing a lot of data, certain smaller companies started realizing this data could be put to better use.

Banking companies were early adopters. Soon, pharmaceutical and computer companies joined in as well.

It started off quite simple.

Building dashboards in MS Excel and writing macros in VBA was a thing. This was before Python and R became mainstream.

You can build dashboards in Excel spreadsheets and do ‘Analytics’ with them. That’s what people were doing before Tableau, Power BI etc. came in. Then certain folks started making predictions, such as which customers you should send campaigns to in order to maximize your chance of a sale or conversion.

Naturally, with such revenue-generating ideas, companies were interested in making more profit. They started hiring statisticians to do such modeling.

With time came the need to work with larger datasets and more complex algorithms, and statisticians were expected to be more savvy with programming. So people slowly started moving from Minitab/Statistica to SPSS to SAS to the R programming language, and now to Python.

Statisticians would not approve

With the move to programming languages, people started developing more algorithms and running more experiments. Some of these experiments hardcore statisticians would never approve of. You will never see a statistician take the average of two predictions from linear models and call it the final prediction. Many discussions arose about statistical significance. Model significance matters, but does it need to stall progress?

The newer ensemble-based algorithms worked. They helped win Kaggle competitions.
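To make the "average of two predictions" idea concrete, here is a minimal sketch in plain Python. Everything in it is my own illustration: the toy data, the feature names (`visits`, `emails_opened`, `spend`) and the helper functions are hypothetical, and this is only the simplest possible form of blending, not how any particular competition-winning ensemble was built.

```python
from statistics import mean


def fit_simple_linear(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, xs):
    """Apply a fitted (intercept, slope) pair to a list of inputs."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def average_predictions(preds_a, preds_b):
    """Blend two prediction lists by taking the plain element-wise average."""
    return [(a + b) / 2 for a, b in zip(preds_a, preds_b)]


# Hypothetical toy data: two features per customer and the amount they spent.
visits = [1, 2, 3, 4, 5]
emails_opened = [0, 2, 1, 4, 3]
spend = [12.0, 19.0, 21.0, 33.0, 35.0]

# Two separate linear models, each trained on a different feature.
model_visits = fit_simple_linear(visits, spend)
model_emails = fit_simple_linear(emails_opened, spend)

# The "final prediction" is simply the average of the two models' outputs.
blended = average_predictions(
    predict(model_visits, visits),
    predict(model_emails, emails_opened),
)
print(blended)
```

Whether the blend beats either model on its own depends on the data, so in practice you would compare all three on a held-out set before trusting it.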

With the advent of more such algorithms and techniques, it did not make sense to call this new breed of professionals ‘Statisticians’. They don’t really have a statistics degree, or think like statisticians do. For them, only results matter.

Companies had to give them designations, some of them not so clear or proper, such as ‘Business Analyst’, ‘Information Analyst’, ‘Data Specialist’ etc. This was around the start of the 2010s.

Though these folks were doing similar jobs, they were given different designations when they moved to different companies. To be fair, even now, there is ambiguity. By the way, certain companies give the title Data Scientist to people who solely build Tableau dashboards or run only SQL queries, which is not the profession I am describing here.

Back to the topic: you cannot call them statisticians, but they do what statisticians used to do, and they also have to be savvy with programming and ML algorithms.

It was at this time, in 2012, that Thomas H. Davenport and DJ Patil wrote the famous article titled “Data Scientist: The Sexiest Job of the 21st Century”. The term has caught on since then.

Data Scientists saved a lot of money for companies, and as a result companies were willing to pay more, far more than for traditional IT jobs.

With more companies embracing Data Science, more use cases arose, and with them more opportunities. Unlike in traditional jobs, these professionals were able to apply ideas learnt at a technology company to a pharmaceutical company. They were not stuck in a specific domain.

What’s the pay like?

Salaries vary from country to country and city to city. The generally accepted notion is that Data Scientists are highly paid, slightly and sometimes significantly more than software engineers at the same company.

Here is a detailed analysis of salaries of Data Scientists collected across multiple countries.

This job was and is still booming. Well, the natural question is: is this just a wave? Will Data Science go out of trend?

Will Data Scientists go out of trend?

Is it just another wave?

Not really. Let me tell you why and then add some perspective.

First, unlike typical IT professionals, Data Scientists need to be good at math and general problem solving. They also need to be good at programming and stats, learn the math behind the algorithms, understand the business, connect the dots and tell the story.

So, hiring quality Data Scientists who are really good at the job is a challenge for companies. You need to be able to execute projects that bring value.

Second, it cannot be considered a wave because governments are investing heavily in AI / Data Science. Colleges offer degrees in Data Science, and even schools have started teaching it.

My prediction is that, just as the current generation was taught trigonometry and algebra at school, there will be a time when school students will be expected to explain the math behind Random Forests and Linear Regression in board exams.

A superstar Data Scientist would be someone who is good with the concepts, problem solving, business understanding, stakeholder management and storytelling; is able to build or integrate ML into web/mobile apps, deploy ML to cloud services and work with big data using Apache Spark etc.; knows how to write optimal code and apply software engineering ideas like SOLID principles; and is good with optimization techniques, probabilistic modeling, deep learning and so on and so forth.

Well, the ask is long, so is the need.

So, the point is that the demand for Data Scientists is rising and will continue to rise, and the roles will get more specialized. We are already seeing that with the bifurcation into ML Engineers, MLOps Engineers, ML Scientists etc.

I am not so sure about the supply, because DS is not everyone’s cup of tea.
