- Understand the business problem.
- Explore the data and become familiar with it.
- Prepare the data for modelling by detecting outliers, treating missing values, transforming variables, etc.
- After data preparation, start running the model, analyse the results, and tweak the approach. This is an iterative step, repeated until the best possible outcome is achieved.
- Validate the model using a new data set.
- Start implementing the model and track the results to analyse the performance of the model over time.
This can be done using the enumerate function, which takes every element in a sequence, such as a list, and adds its position just before it.
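As a minimal sketch of this behaviour, the snippet below pairs each element of a list with its position. The list `fruits` is an illustrative assumption, not something from the original question.

```python
# Hypothetical sample data, used only for illustration.
fruits = ["apple", "banana", "cherry"]

# enumerate yields (position, element) pairs, counting from 0 by default.
for index, fruit in enumerate(fruits):
    print(index, fruit)
# 0 apple
# 1 banana
# 2 cherry

# The optional start argument changes the first position.
indexed = list(enumerate(fruits, start=1))
```

Here `indexed` holds the pairs as tuples, which is convenient when the positions are needed later rather than only inside the loop.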
The extent of the missing values is identified after identifying the variables with missing values. If any patterns are identified, the analyst has to concentrate on them, as they could lead to interesting and meaningful business insights. If no patterns are identified, the missing values can be substituted with mean or median values (imputation), or they can simply be ignored.
There are various factors to be considered when answering this question:
- Understand the problem statement and the data before giving the answer; getting into the data is important. One option is to assign a default value, which can be the mean, minimum, or maximum value.
- If it is a categorical variable, the missing value is assigned a default value.
- If the data follows a normal distribution, use the mean value.
- Whether missing values should be treated at all is another important point to consider. If 80% of the values for a variable are missing, you can answer that you would drop the variable instead of treating the missing values.
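The factors above can be sketched as one small helper. This is only an illustration under stated assumptions: the function name `impute_column`, the use of `None` to mark a missing value, and the mode as the "default value" for categorical data are all hypothetical choices, and the 0.8 threshold simply mirrors the 80% example.

```python
from statistics import mean, median, mode

def impute_column(values, strategy="median", drop_threshold=0.8):
    """Fill missing entries (None) in one column.

    Returns the imputed list, or None when the share of missing values
    reaches drop_threshold, signalling that the variable should be dropped.
    """
    if not values:
        return []
    present = [v for v in values if v is not None]
    missing_share = (len(values) - len(present)) / len(values)
    if missing_share >= drop_threshold:
        return None  # too sparse to be worth treating
    if strategy == "mean":        # roughly normal numeric data
        fill = mean(present)
    elif strategy == "median":    # skewed numeric data
        fill = median(present)
    elif strategy == "mode":      # categorical data (assumed default value)
        fill = mode(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

ages = impute_column([20, None, 40], strategy="mean")
colours = impute_column(["red", None, "red", "blue"], strategy="mode")
sparse = impute_column([None] * 9 + [1.0])
```

In this sketch `sparse` comes back as `None` because nine of its ten values are missing, so the caller would drop that variable rather than impute it.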
For one reason or another, the response variable in a regression analysis might not satisfy one or more assumptions of ordinary least squares regression. The residuals could either curve as the prediction increases or follow a skewed distribution. In such scenarios, it is necessary to transform the response variable so that the data meets the required assumptions. A Box-Cox transformation is a statistical technique for transforming a non-normal dependent variable into an approximately normal shape. Many statistical techniques assume normality, so applying a Box-Cox transformation to non-normal data means that you can run a broader range of tests.
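To make the transformation concrete, here is a minimal sketch of the standard Box-Cox formula for a fixed parameter: (y^lambda - 1) / lambda when lambda is not zero, and ln(y) when lambda is zero. The function name `box_cox` and the sample values are assumptions for illustration. The transformation is defined only for strictly positive data.

```python
import math

def box_cox(values, lam):
    """Apply the Box-Cox transformation with a fixed lambda.

    (y**lam - 1) / lam for lam != 0, and log(y) for lam == 0.
    All values must be strictly positive.
    """
    if any(y <= 0 for y in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(y) for y in values]
    return [(y ** lam - 1) / lam for y in values]

# Hypothetical right-skewed response values.
response = [1.0, 4.0, 9.0]
sqrt_like = box_cox(response, 0.5)   # lambda = 0.5 behaves like a square root
log_like = box_cox(response, 0)      # lambda = 0 is the natural log
```

In practice lambda is usually not fixed by hand but estimated from the data, typically by maximum likelihood. For example, `scipy.stats.boxcox` returns both the transformed data and the fitted lambda when no lambda is supplied.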
Yes, it can be used, but it depends on the application.