The Five Stages Of Data Science

Introduction.

Data science is the process of transforming data into information, at a level of detail that depends on the audience of that information. For example, data science includes translating business rules into system requirements and modelling data for development teams.
Data scientists have four main roles: data architecture, data acquisition, data analysis, and data archiving (Stanton, 2012: 4).
Generally speaking, any information system has five main stakeholders: the client (the person who is funding the project), the development team, the users (usually the client's employees), the project manager, and the executive management of the developing company.
Each of the above-mentioned stakeholders needs to know some information about and/or …

The application lifecycle has five stages (Microsoft, 6): requirements gathering, design, development, testing, and maintenance. Here I describe how data science is involved in each of these stages.

In system development, data scientists deal with data from the earliest stages, i.e. when analysing business rules and converting them into system requirements.

As stated above, information has different levels of detail for different audiences. At this stage the level of detail must be technical enough for the system development team, i.e. a technical audience. This level of detail includes analysis diagrams such as use case diagrams, entity relationship diagrams, and state diagrams.
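As a rough illustration of how the entities in an entity relationship diagram can carry over into code, the sketch below expresses a hypothetical customer/item/purchase model as Python dataclasses. The entity names, their fields, and the "at least one item per purchase" rule are assumptions chosen for illustration, not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    name: str


@dataclass
class Item:
    item_id: int
    description: str
    price: float


@dataclass
class Purchase:
    """Links one customer to the item(s) bought in a single purchase."""
    purchase_id: int
    customer: Customer
    items: list[Item]

    def __post_init__(self) -> None:
        # Assumed business rule: a purchase must include at least one item.
        if not self.items:
            raise ValueError("a purchase must contain at least one item")

    def total(self) -> float:
        # Sum of the prices of all items in this purchase.
        return sum(item.price for item in self.items)


alice = Customer(1, "Alice")
order = Purchase(100, alice, [Item(1, "Pen", 1.5), Item(2, "Notebook", 3.0)])
print(order.total())  # 4.5
```

Each dataclass corresponds roughly to an entity box in the diagram, and the `items` list on `Purchase` stands in for a one-to-many relationship; the diagram itself would still be where keys and cardinalities are agreed with the development team.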

For example, from a business rule stating that a customer can purchase one or more items, data scientists can draw this UML use …

The data science workflow consists of four main phases (Philip, 2013): preparation, analysis, reflection, and dissemination.

Preparation:

The preparation phase consists of two sub-phases.

Acquire data: As shown in the image above, the first step in the preparation phase is to acquire data. Simple as this task sounds, in reality it can be very complex and time-consuming. Two of the most important things to consider when acquiring data are how reliable it is and whether it is up to date. In system development, data scientists usually acquire data through questionnaires, interviews, watching users perform their daily tasks, and observing how the previous or current system works.
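One part of the reliability question, whether the data is up to date, can be checked mechanically once the data has been gathered. The following is a minimal sketch assuming each acquired record carries the date it was collected; the `AcquiredRecord` type, the `stale_records` function, and the 180-day threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class AcquiredRecord:
    source: str         # hypothetical: "interview", "questionnaire", "observation", ...
    collected_on: date  # the date the data was gathered
    content: str


def stale_records(records, today, max_age_days=180):
    """Return the records collected more than max_age_days before `today`."""
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in records if r.collected_on < cutoff]


# Usage: the interview predates the cutoff, the questionnaire does not.
records = [
    AcquiredRecord("interview", date(2023, 11, 15), "Staff describe the order process"),
    AcquiredRecord("questionnaire", date(2024, 6, 1), "Users rate the current system"),
]
stale = stale_records(records, today=date(2024, 7, 1))
print([r.source for r in stale])  # ['interview']
```

Records flagged this way are candidates for re-interviewing or re-observation rather than automatic deletion, since an old observation of a system that has not changed may still be valid.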

Reformat and clean data: As noted in the introduction, data is usually prepared for different audiences, and the level of detail depends on the audience. At this stage raw data may be almost meaningless, or even misleading, to a programmer because it probably lacks many required technical details, such as entities. Data scientists reformat and clean data by writing scripts and/or by converting business rules into system requirements using analysis diagrams such as UML.
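The "writing scripts" part of this step can be illustrated with a small cleaning routine. This is a sketch under assumed inputs: raw rows arrive as Python dictionaries (for example, from a questionnaire export), and the `name`/`email` fields, the `clean_records` function, and the choice of e-mail as the de-duplication key are all hypothetical.

```python
def clean_records(raw_rows, required=("name", "email"), key_field="email"):
    """Normalise raw rows and drop incomplete or duplicate ones.

    A row is dropped if any required field is missing or blank, or if its
    key_field value (compared case-insensitively) has already been seen.
    """
    required = set(required) | {key_field}  # the key must always be present
    cleaned, seen = [], set()
    for raw in raw_rows:
        # Normalise column names and trim whitespace from string values.
        row = {
            str(k).strip().lower(): (v.strip() if isinstance(v, str) else v)
            for k, v in raw.items()
        }
        # Skip rows where a required field is missing, None, or an empty string.
        if any(row.get(f) in (None, "") for f in required):
            continue
        # Lower-case the key so "Alice@Example.com" and "alice@example.com" match.
        key = str(row[key_field]).lower()
        if key in seen:
            continue
        seen.add(key)
        row[key_field] = key
        cleaned.append(row)
    return cleaned


raw = [
    {" Name ": " Alice ", "Email": "Alice@Example.com"},
    {"name": "alice", "email": "alice@example.com "},  # duplicate of Alice
    {"name": "", "email": "bob@example.com"},          # missing name
    {"name": "Carol", "email": "carol@example.com"},
]
print(clean_records(raw))
# [{'name': 'Alice', 'email': 'alice@example.com'}, {'name': 'Carol', 'email': 'carol@example.com'}]
```

The first occurrence of each key is kept, so the order in which data was acquired decides which duplicate survives; a real script might instead prefer the most recent or most complete row.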