A Data Scientist uses data and AI to solve business problems. They are skilled at working with data, extracting meaningful insights, using ML to build applications that make predictions and recommendations, and deploying and monitoring those solutions.
The perks of being a Data Scientist
Data Science is a relatively new profession.
By Data Scientists, I refer not just to the individual job title but to the whole job family. This includes ML Engineers, ML Scientists, Research Scientists, and similar roles whose primary concern is solving business problems using ML.
The job of a Data Scientist is interesting for several reasons. First, you get to work on intellectually stimulating problems that matter.
Data Scientists execute high-value projects. They do this by building models that either predict something (e.g., which product to recommend to a given user) or provide insights that drive business decisions scientifically (e.g., should we introduce this feature in the product design?).
Such decisions bring real value to companies.
If you work as a Data Scientist at a product company and have proven your skills, you also get:
- Exposure to the key problems the company faces.
- A core role in cross-functional project teams, because decisions on what data is needed, how much of it, which outputs are stored and how, and how data and models should be monitored rest with Data Scientists, in consultation with domain SMEs.
- Influence: the insights you generate drive business decisions, and your models drive growth.
- Close collaboration with stakeholders at every level, up to the top decision makers.
- Recognition as an expert: people look to you for opinions on data-driven policy decisions.
Of course, this also means fast career growth. You work on things that matter to the company, often with a strong impact on revenue, which goes a long way toward justifying the high salaries Data Scientists are paid.
Companies have realized the benefits of using data and AI to drive business decisions, optimize operations, and integrate AI into existing products or build new ones.
Storing large amounts of data was a big deal a few years back, but now, with big data frameworks and cloud computing services, processing petabytes of data is not uncommon.
Roles and Responsibilities of Data Scientists
This section covers, at an overall level, both senior and junior Data Scientists.
Some of the main responsibilities of Data Scientists are as follows:
1. Translate business pain points into Data Science problems
- Interact with business stakeholders to identify problems that can be solved with ML / AI / Optimization. (Senior DS)
- Translate pain points into a quantitative Data Science problem.
- Frame the Data Science problem in structured modules and phases.
2. Collect, store, and prepare data
- Collect data from various sources and automate the process.
- Process, clean, and validate the data for integrity.
3. Exploratory Data Analysis
- Mine data for valuable insights and present them in a business-friendly form.
- Identify patterns and relationships, and extract insights in context.
- Use statistical tests and visualizations (in R / Python / Tableau, etc.) for the analysis workflow and presentation.
- Communicate findings and opportunities to stakeholders.
4. Model and solution development
- Build predictive models using ML, statistical modeling, and AI.
- Write production-quality code and optimize code run time.
- Engineer features that could enhance the model.
- Perform model validation and diagnostics, and formulate business-relevant metrics to judge performance.
- Use advanced modeling approaches and fine-tuning to improve performance.
5. Deployment and Monitoring
- Deploy ML models in production, monitor their performance, improve them, and fix bugs.
- Work with engineering, product architecture, and DevOps teams to deploy / integrate / productionize your application.
6. Communicate findings in business language
- Collaborate with senior executives and stakeholders to evangelize adoption.
- Set expectations on model performance, usage, next phases, etc.
7. Keep yourself updated
- ML is an evolving field: new research gets published and new tools and algorithms are developed all the time, so stay up to date with the latest developments.
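To make the data preparation responsibility (validating data for integrity) a little more concrete, here is a minimal sketch of the kind of checks a Data Scientist might automate before any analysis: completeness, uniqueness of a key, and valid numeric ranges. It uses only the Python standard library; `validate_records`, `ValidationReport`, and the order columns are hypothetical names chosen for illustration, not part of any library or of a specific team's workflow.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationReport:
    """Integrity problems found in a batch of records (illustrative, not a library API)."""
    missing: list = field(default_factory=list)       # (row index, column) pairs with empty values
    duplicates: list = field(default_factory=list)    # row indices whose key was already seen
    out_of_range: list = field(default_factory=list)  # (row index, column, value) outside the bounds

    @property
    def ok(self) -> bool:
        # The batch passes only if no check found a problem.
        return not (self.missing or self.duplicates or self.out_of_range)


def validate_records(records, key, required, bounds):
    """Run basic integrity checks on a list of dict records.

    key      -- column that should uniquely identify a row
    required -- columns that must be present and non-empty
    bounds   -- {column: (low, high)} inclusive numeric ranges
    """
    report = ValidationReport()
    seen = set()
    for i, row in enumerate(records):
        # Completeness: required fields must be present and non-empty.
        for col in required:
            if row.get(col) in (None, ""):
                report.missing.append((i, col))
        # Uniqueness: a non-empty key must not repeat.
        k = row.get(key)
        if k is not None:
            if k in seen:
                report.duplicates.append(i)
            else:
                seen.add(k)
        # Validity: numeric fields must fall inside their allowed range.
        for col, (low, high) in bounds.items():
            value = row.get(col)
            if isinstance(value, (int, float)) and not (low <= value <= high):
                report.out_of_range.append((i, col, value))
    return report


# Hypothetical order records with one missing value, one duplicate key,
# and one negative amount.
orders = [
    {"order_id": 1, "customer": "a", "amount": 25.0},
    {"order_id": 2, "customer": "", "amount": 40.0},
    {"order_id": 2, "customer": "c", "amount": -5.0},
]
report = validate_records(
    orders,
    key="order_id",
    required=["order_id", "customer", "amount"],
    bounds={"amount": (0, 10_000)},
)
print(report.ok)            # False
print(report.missing)       # [(1, 'customer')]
print(report.duplicates)    # [2]
print(report.out_of_range)  # [(2, 'amount', -5.0)]
```

In practice these checks usually run inside a pipeline (or a dedicated validation tool) on every new batch, so that bad data is flagged before it reaches a model rather than discovered afterwards.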
What does a typical Data Scientist career path look like?
Data Scientists initially start out on a single track, but after mid-career you may choose one of two routes:
1. Technical / Research Path
2. Managerial Path
Which one is better is usually a matter of personal preference. In Data Science, both routes are financially rewarding to a similar degree, because of the value each brings to the table.
Technical / Research path: Data Analyst ($90k) –> Junior Data Scientist ($110k) –> Data Scientist ($125k) –> Senior Data Scientist ($150k) –> Principal / Staff Data Scientist ($180k) –> Chief Data Scientist ($230k) –> CTO / CIO / CDO / CEO
Data Scientist 1 –> Data Scientist 2 –> Data Scientist 3 –> Data Scientist 4 –> CTO or CIO or CDO –> CEO
Companies assign titles according to their own organizational policies.
Sometimes the career starts directly at Junior Data Scientist, ML Engineer, Quantitative Analyst, etc., skipping the Data Analyst step. Mid and senior positions can also carry other titles, such as ‘Research Scientist’ or ‘Applied ML Scientist’.
Managerial path: Junior Data Scientist –> Data Scientist –> Senior Data Scientist –> Data Science Manager ($175k) –> Senior Data Science Manager –> Director of Data Science ($200k) –> VP, Data Science
Data Scientists skill sets
From a company's standpoint, the skills generally asked of a Data Scientist are as follows.
- Proficient in one or more of the following: Machine learning algorithms, Statistical modeling techniques, Deep learning, Optimization techniques.
- Strong math skills and quantitative aptitude
- Excellent communication and presentation skills
- Problem-solving aptitude
- Experience building and deploying ML solutions.
- Knowledge of CI/CD processes, Dockerization, and creating REST APIs for ML models.
Data Scientists are also generally expected to be savvy with:
- Programming languages (Python / R / SQL)
- Machine learning (in-depth understanding of concepts, algorithms, deep learning, statistics)
- Business understanding
- Working with big data
- MLOps (model deployment and monitoring)
- Presenting results, communicating with multiple stakeholders, business acumen, domain expertise
What kind of degree and background is needed?
Typically a Bachelor's, Master's, or PhD in a quantitative discipline (Engineering, Statistics, Operations Research, Computer Science, Mathematics, Physics, Economics, etc.).
A quantitative discipline is preferred because it indicates that candidates have the necessary exposure and aptitude. However, there are several cases of science and commerce graduates who have taken up Data Science. Education is usually not a barrier for capable candidates.
What knowledge is required?
ML algorithms, probability and statistics, and deep learning are the main knowledge areas for Data Scientists.
However, you should also know that not all Data Scientists in industry need to be proficient in all of these areas. This is highly team- and project-specific.
Some teams don’t use deep learning at all, and others use it for everything. Likewise, not every project gets deployed on AWS or an equivalent cloud infrastructure.
What are non-negotiable skills for Data Scientists?
- Business understanding of the problem
- Hands-on coding ability in Python or R, and knowledge of core ML algorithms and statistics concepts.
Being able to collaborate with stakeholders and convey your results and findings effectively will help you grow faster in your career.
What does a typical day look like?
Again, this varies with company culture, team, project, and seniority level, but there are common themes in most Data Scientists' work.
- Attend the daily call (scrum) with your immediate project members to plan your activities.
- Have weekly calls with your key client stakeholders to catch up and review progress: make sure everything is on track, share insights from the modeling and data analysis, discuss roadblocks and potential risks you foresee, and discuss other problems they face and how Data Science can help.
- Write SQL / PySpark / Python code to gather data from one or several data sources, combine the data in a logical fashion, and create data pipelines with tools like Airflow or Dagster if needed.
- Process, cleanse, and transform the data, and store it in an appropriate place (relational DB, AWS S3, on-premise machine, local, etc.) for reuse.
- Deep dive into the data and analyze it to find patterns and extract insights.
- Build ML / statistical / deep learning models to make predictions, validate the models, and prepare results in a presentable format.
- Attend to ad-hoc requests such as quick-win projects, A/B test design and analysis, baseline models, PoCs and feasibility studies, data availability checks, model retraining, enhancements, and production bug fixes in data / model pipelines.
- Interact with your team members to share project knowledge, gather ideas, and brainstorm.
- Interact with, and run sessions for, non-Data Science colleagues on use cases that can be implemented in their functions.
- Have monthly / quarterly calls with senior leaders to catch up and discuss overall progress and goals, growth strategies, and new initiatives you / your team can take up.
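The A/B test analysis mentioned among the ad-hoc requests often comes down to comparing conversion rates between two variants. Below is a minimal sketch using a two-proportion z-test, which is one common choice (the text above does not prescribe a method). It relies on the normal approximation, so it assumes reasonably large samples; the function name and the conversion counts are hypothetical, and only Python's standard library is used.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between variants A and B.

    conv_a, conv_b -- number of users who converted in each variant
    n_a, n_b       -- number of users exposed to each variant
    Returns (z, p_value), using the pooled proportion for the standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under the null hypothesis both variants share one conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 4,000 users per variant, 200 vs 250 conversions.
z, p = two_proportion_ztest(conv_a=200, n_a=4000, conv_b=250, n_b=4000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts (5% vs 6.25% conversion), z comes out at roughly 2.43 and the two-sided p-value at roughly 0.015, so the lift would be called significant at the conventional 5% level. The pooled standard error reflects the null hypothesis of equal rates; for small samples an exact test such as Fisher's is usually a safer choice.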
What companies do they work for?
Data Scientists work in all sorts of organizations from Startups to Consulting firms to Product companies.
They are present in all industries as well: manufacturing, healthcare, e-commerce, retail, automotive, FMCG, finance and banking, and even political parties, the United Nations, and government organizations.
Do Data Scientists spend 80% of time cleaning data?
This may be true, as I’ve seen this complaint too often, but working that way is highly inefficient.
There can be various reasons: your team may be understaffed (with no dedicated Data Engineer), your company may have poor data infrastructure, or the data may be highly unorganized, scattered across various systems, or not available at all, so that you have to scrape or acquire it from other sources.
Or you may simply need to introspect and learn how to run a DS project efficiently.
Do you want to become a Data Scientist? ML+ offers a comprehensive ML Mastery learning path. I’ve designed it for optimal learning, and you can complete it in about 6 months. Take the courses in the sequence laid out in the learning path; my team and I are there to support you and clear all your doubts.