
R.P.TECHNOLOGY PVT LTD
R.P.Technology PVT LTD -Digital marketing company in India, India offers best digital marketing services like SEO, PPC, SMO, ORM, design/development. & business rapidly. Call @ 8308958903
What is a Data Scientist 2022 What do data scientists do? According to interviews with more than 30 data scientists, data science is about infrastructure, testing, using machine learning for decision making, and data products. Data science is being used in numerous fields, but it’s not all about deep learning or the search for artificial general intelligence.
In fact, the skills needed include communication and storytelling. But data science is becoming more specialized, and with that the skills data scientists need are evolving. In addition, ethics is becoming a bigger and bigger challenge.
Modern data science emerged in tech, from optimizing Google search rankings and LinkedIn recommendations to influencing the headlines Buzzfeed editors run. But it’s poised to transform all sectors, from retail, telecommunications, and agriculture to health, trucking, and the penal system. Yet the terms “data science” and “data scientist” aren’t always easily understood, and are used to describe a wide range of data-related work.
Data scientists work closely with business stakeholders to understand their goals and determine how data can be used to achieve those goals. They design data modeling processes, create algorithms and predictive models to extract the data the business needs, and help analyze the data and share insights with peers. While each project is different, the process for gathering and analyzing data generally follows the below path:
1. Ask the right questions to begin the discovery process
2. Acquire data
3. Process and clean the data
4. Integrate and store data
5. Initial data investigation and exploratory data analysis
6. Choose one or more potential models and algorithms
7. Apply data science techniques, such as machine learning, statistical modeling, and artificial intelligence
8. Measure and improve results
9. Present final result to stakeholders
10. Make adjustments based on feedback
11. Repeat the process to solve a new problem
The most common careers in data science include the following roles.
Although the roles of data scientists and data analysts are often conflated, their responsibilities are actually quite different. Put simply, data scientists develop processes for modeling data while data analysts examine data sets to identify trends and draw conclusions. Because of this distinction and the more technical nature of data science, the role of a data scientist is often considered to be more senior than that of a data analyst; however, both positions may be attainable with similar educational backgrounds.
Learn More: Data Scientist vs. Data Analyst
By many accounts, becoming a data scientist is a highly desirable career path. For five years in a row, Glassdoor ranked data scientists as one of the 10 best jobs in America, based on median base salary, the number of active job openings, and employee satisfaction rates. Likewise, Harvard Business Review called data science “the sexiest job of the 21st century,” noting that “high-ranking professionals with the training and curiosity to make discoveries in the world of big data” are in major demand.
From startups to Fortune 500s to government agencies, organizations are seeing the value in capitalizing on big data. Google’s Chief Economist Hal Varian spoke about the need for data scientists back in 2009, telling McKinsey Quarterly, “the ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades.”
This prediction proved prescient. A report by LinkedIn ranked data science as one of the top emerging jobs in 2020.
The United States Bureau of Labor Statistics concurs, stating that employment of all computer and information research scientists is expected to rise 16 percent by 2028—a rate of increase that exceeds many other professions. Yet, data scientists are relatively scarce, meaning it’s now an opportune time to upskill and enter the field.
According to Robert Half Technology’s 2020 Salary Guide, data scientists earn an average annual salary between $105,750 and $180,250 per year. However, compensation can vary depending on location. For example, average salaries in cities across the United States include:
Additionally, as data scientists gain experience, they often move into more senior positions with higher pay. These include:
Most data scientists use the following core skills in their daily work:
Data scientists play a key role in helping organizations make sound decisions. As such, they need “soft skills” in the following areas.
Most employers look for data science professionals with advanced degrees, such as a Master of Science in Data Science. Candidates for data science roles usually begin with a foundation in computer science or math and build on this with a master’s degree in data science, data analytics, or a related field.
In these graduate-level programs, professionals gain core competencies in skills such as predictive analytics, statistical modeling, big data, data mining applications, enterprise analytics, data-driven decision making, data visualization, and data storytelling.
The Master of Science in Data Science program at Northeastern University, for example, is an interdisciplinary program of study which combines courses from the Khoury College of Computer Sciences and the College of Engineering to provide students with comprehensive frameworks for processing, modeling, analyzing, and drawing conclusions from data. Additionally, Northeastern’s industry-aligned faculty bring their experiences from the field to the classroom, allowing students to gain first-hand knowledge of the top issues facing big data.
Alternatively, some students may find that a degree in data analytics is better suited to their career goals. Studying data analytics teaches students how to employ statistics, analytics systems technology, and business intelligence to achieve specific goals. With this foundational knowledge, students discover how to find a logical, data-driven path to resolving a complex problem. They also learn how to overcome data obstacles, such as dealing with uncertain data sets and reconciling data from disparate sources.
The Master of Professional Studies in Analytics program at Northeastern University prepares students by applying the principles, tools, and methods of analytics to a project within a sponsoring organization. Graduates complete the program with a core analytical skillset upon which to layer more specialized technical or industry-specific applications. Experiential learning is a key component of the program. Students learn by building portfolios of real-world projects, demonstrating competency with key technologies, visualization, and communication techniques, and the ability to translate information into recommended actions.
What, exactly, is it that data scientists do? As the host of the DataCamp podcast DataFramed, I have had the pleasure of speaking with over 30 data scientists across a wide array of industries and academic disciplines. Among other things, I’ve asked them about what their jobs entail.
It’s true that data science is a varied field. The data scientists I’ve interviewed approach our conversations from many angles. They describe a wide range of work, including the massive online experimental frameworks for product development at booking.com and Etsy, the methods Buzzfeed uses to implement a multi-armed bandit solution for headline optimization, and the impact machine learning has on business decisions at Airbnb.
That last example came during my conversation with Airbnb data scientist Robert Chang. When Chang was at Twitter, that company was focused on growth. Now that he’s at Airbnb, Chang works on productionized machine-learning models. Data science can be used in a number of different ways, depending not just on the industry but on the business and its goals.
But despite all the variety, a number of themes have emerged from these conversations. Here’s what they are:
What data scientists do. We now know how data science works, at least in the tech industry. First, data scientists lay a solid data foundation in order to perform robust analytics. Then they use online experiments, among other methods, to achieve sustainable growth. Finally, they build machine learning pipelines and personalized data products to better understand their business and customers and to make better decisions. In other words, in tech, data science is about infrastructure, testing, machine learning for decision making, and data products.
Great strides are being made in industries other than tech. I spoke with Ben Skrainka, a data scientist at Convoy, about how that company is leveraging data science to revolutionize the North American trucking industry. Sandy Griffith of Flatiron Health told us about the impact data science has begun to have on cancer research. Drew Conway and I discussed his company Alluvium, which “uses machine learning and artificial intelligence to turn massive data streams produced by industrial operations into insights.” Mike Tamir, now head of self-driving at Uber, discussed working with Takt to facilitate Fortune 500 companies’ leveraging data science, including his work on Starbucks’ recommendation systems. This non-exhaustive list illustrates data-science revolutions across a multitude of verticals.
It isn’t all just the promise of self-driving cars and artificial general intelligence. Many of my guests are skeptical not only of the fetishization of artificial general intelligence by the mainstream media (including headlines such as VentureBeat’s “An AI god will emerge by 2042 and write its own bible.
Will you worship it?”), but also of the buzz around machine learning and deep learning. Sure, machine learning and deep learning are powerful techniques with important applications, but, as with all buzz terms, a healthy skepticism is in order.
Nearly all of my guests understand that working data scientists make their daily bread and butter through data collection and data cleaning; building dashboards and reports; data visualization; statistical inference; communicating results to key stakeholders; and convincing decision makers of their results.
The skills data scientists need are evolving (and experience with deep learning isn’t the most important one). In a conversation with Jonathan Nolis, a data science leader in the Seattle area who helps Fortune 500 companies, we posed the question, “Which skill is more important for a data scientist: the ability to use the most sophisticated deep learning models, or the ability to make good PowerPoint slides?” He made a case for the latter, since communicating results remains a critical part of data work.
Another recurring theme is that these skills, so necessary today, are likely to change on a relatively short timescale. As we’re seeing rapid developments in both the open-source ecosystem of tools available to do data science and in the commercial, productized data-science tools, we’re also seeing increasing automation of a lot of data-science drudgery, such as data cleaning and data preparation. It has been a common trope that 80% of a data scientist’s valuable time is spent simply finding, cleaning, and organizing data, leaving only 20% to actually perform analysis.
But this is unlikely to last. These days even a great deal of machine learning and deep learning is being automated, as we learned when we dedicated an episode to automated machine learning, and heard from Randal Olson, lead data scientist at Life Epigenetics.
One result of this rapid change is that the vast majority of my guests tell us that the key skills for data scientists are not the abilities to build and use deep-learning infrastructures. Instead they are the abilities to learn on the fly and to communicate well in order to answer business questions, explaining complex results to nontechnical stakeholders. Aspiring data scientists, then, should focus less on techniques than on questions. New techniques come and go, but critical thinking and quantitative, domain-specific skills will remain in demand.
Specialization is becoming more important. While there is no well-defined career path for data scientists, and little support for junior data scientists, we are starting to see some forms of specialization. Emily Robinson described the difference between Type A and Type B data scientists: “Type A is the analysis — sort of a traditional statistician — and Type B is building machine learning models.”
Jonathan Nolis breaks data science down into three components: (1) business intelligence, which is essentially about “taking data that the company has and getting it in front of the right people” in the form of dashboards, reports, and emails; (2) decision science, which is about “taking data and using it to help a company make a decision”; and (3) machine learning, which is about “how can we take data science models and put them continuously into production.” Although many working data scientists are currently generalists and do all three, we are seeing distinct career paths emerging, as in the case of machine learning engineers.
Ethics is among the field’s biggest challenges. You may gather that the profession offers its practitioners a great deal of uncertainty. When I asked Hilary Mason in our first episode if any other major challenges face the data science community, she said, “Do you think that imprecise ethics, no standards of practice, and a lack of consistent vocabulary are not enough challenges for us today?”
All three are essential points, and the first two in particular are front of mind for nearly every DataFramed guest. At a time when so many of our interactions with the world are dictated by algorithms developed by data scientists, what role does ethics play? As Omoju Miller, the senior machine learning data scientist at GitHub, said in our interview:
We need to have that ethical understanding, we need to have that training, and we need to have something akin to a Hippocratic oath. And we need to actually have proper licenses so that if you actually do something unethical, perhaps you have some kind of penalty, or disbarment, or some kind of recourse, something to say this is not what we want to do as an industry, and then figure out ways to remediate people who go off the rails and do things because people just aren’t trained and they don’t know.
A recurring theme is the serious, harmful, and unethical consequences that data science can have, such as the COMPAS Recidivism Risk Score that has been “used across the country to predict future criminals” and is “biased against blacks,” according to ProPublica.
We’re approaching a consensus that ethical standards need to come from within data science itself, as well as from legislators, grassroots movements, and other stakeholders. Part of this movement involves a reemphasis on interpretability in models, as opposed to black-box models.
That is, we need to build models that can explain why they make the predictions they make. Deep learning models are great at a lot of things, but they are infamously uninterpretable. Many dedicated, intelligent researchers, developers, and data scientists are making headway here with work such as Lime, a project aimed at explaining what machine learning models are doing.
The data science revolution across industries and society at large has just begun. Whether the title of data scientist will remain the “sexiest job of the 21st century,” will become more specialized, or will become a set of skills that most working professionals are simply required to have is unclear. As Hilary Mason told me: “Will we even have data science in 10 years? I remember a world where we didn’t, and it wouldn’t surprise me if the title goes the way of ‘webmaster.’”