Univariate, bivariate and multivariate analyses are descriptive statistical
analysis techniques that are differentiated by the number of variables involved at
a given point in time. For example, a pie chart of sales by territory involves only
one variable, so the analysis can be referred to as univariate analysis.
Bivariate analysis attempts to understand the relationship between two
variables at a time, as in a scatterplot. For example, analyzing the volume of
sales against spending can be considered an example of bivariate analysis.
Multivariate analysis deals with the study of more than two variables to
understand the effect of variables on the responses.
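As a rough illustration of the three levels, the sketch below summarizes one variable, correlates two, and builds a pairwise correlation table across three. The data set and the column names (`spend`, `sales`, `visits`) are made up for this example, and the `pearson` helper is a hand-written assumption rather than a library call.

```python
import math
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: five observations of three variables.
data = {
    "spend": [10, 20, 30, 40, 50],
    "sales": [12, 24, 33, 46, 55],
    "visits": [5, 3, 6, 2, 7],
}

# Univariate: describe a single variable on its own.
univariate = {"mean": mean(data["sales"]), "std": pstdev(data["sales"])}

# Bivariate: relate two variables to each other.
bivariate = pearson(data["spend"], data["sales"])

# Multivariate: look at every pair among more than two variables.
multivariate = {(a, b): pearson(data[a], data[b]) for a in data for b in data}

print(univariate, bivariate)
```

A correlation table is only one simple way to look at several variables together; regression or factor analysis would be other options.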
Cluster sampling is a technique used when it is difficult to study a target
population spread across a wide area and simple random sampling cannot be
applied. A cluster sample is a probability sample in which each sampling unit is a
collection, or cluster, of elements.
For example, a researcher wants to survey the academic performance of high school
students in Japan. The researcher can divide the entire population of Japan into
different clusters (cities) and then select a number of clusters, depending on
the research, through simple or systematic random sampling.
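The selection step can be sketched as follows, assuming one-stage cluster sampling in which every student in a chosen city is surveyed. The city names, student IDs and the `cluster_sample` function are hypothetical and serve only to illustrate the idea.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters by simple random sampling and keep every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical sampling frame: students grouped by city (the clusters).
students_by_city = {
    "Tokyo": ["T1", "T2", "T3"],
    "Osaka": ["O1", "O2"],
    "Kyoto": ["K1", "K2", "K3", "K4"],
    "Sapporo": ["S1", "S2"],
}

chosen_cities, surveyed = cluster_sample(students_by_city, n_clusters=2, seed=0)
print(chosen_cities, surveyed)
```

The random draw is made over cities, not over individual students, which is what keeps the fieldwork confined to a few locations.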
Let’s continue our Data Science Interview Questions blog with some more
statistics questions.
Systematic sampling is a statistical technique in which elements are selected from an ordered sampling frame at a fixed interval after a random start. The list is traversed in a circular manner, so once the end of the list is reached, selection continues from the top. The most common form of systematic sampling is the equal-probability method, in which every element has the same chance of being selected.
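A minimal sketch of circular systematic sampling is shown below. It assumes a sampling interval of `k = N // n` and a uniformly random starting position; the function name and the example frame are illustrative only.

```python
import random

def circular_systematic_sample(frame, n, seed=None):
    """Select n elements at a fixed interval, wrapping around the end of the frame."""
    size = len(frame)
    if not 0 < n <= size:
        raise ValueError("n must be between 1 and len(frame)")
    k = size // n                      # sampling interval (assumed: floor of N / n)
    start = random.Random(seed).randrange(size)  # random start anywhere in the frame
    # The modulo makes the traversal circular: past the end, continue from the top.
    return [frame[(start + i * k) % size] for i in range(n)]

frame = list(range(1, 21))             # hypothetical ordered frame of 20 units
sample = circular_systematic_sample(frame, n=5, seed=1)
print(sample)
```

Because the start is drawn uniformly from the whole frame, each unit has the same chance of being selected, which is the equal-probability property.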
Eigenvectors are used for understanding linear transformations. In data analysis, we usually calculate the eigenvectors of a correlation or covariance matrix. Eigenvectors are the directions along which a particular linear transformation acts by flipping, compressing or stretching. The corresponding eigenvalue is the factor by which the transformation scales along that direction.
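To make this concrete, the sketch below computes the eigenvalues and unit eigenvectors of a small symmetric matrix in closed form and checks that applying the matrix only rescales each eigenvector. The matrix values and the `eig_symmetric_2x2` helper are assumptions for illustration; in practice a library routine such as `numpy.linalg.eigh` would normally be used.

```python
import math

def eig_symmetric_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], largest eigenvalue first."""
    if b == 0:
        # Diagonal matrix: the coordinate axes are already the eigenvectors.
        return sorted([(a, (1.0, 0.0)), (d, (0.0, 1.0))], reverse=True)
    centre = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (centre + radius, centre - radius):
        # (b, lam - a) solves (A - lam * I) v = 0; normalise it to unit length.
        norm = math.hypot(b, lam - a)
        pairs.append((lam, (b / norm, (lam - a) / norm)))
    return pairs

# Hypothetical covariance matrix of two positively correlated variables.
a, b, d = 2.0, 1.0, 2.0
pairs = eig_symmetric_2x2(a, b, d)

for lam, (vx, vy) in pairs:
    # Applying the matrix to an eigenvector only stretches it by lam.
    transformed = (a * vx + b * vy, b * vx + d * vy)
    print(lam, (vx, vy), transformed)
```

For a covariance matrix, the eigenvector with the largest eigenvalue points along the direction of greatest variance in the data.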
In the banking industry, giving loans is the primary source of income, but
at the same time, if the repayment rate is poor, the bank will not make any profit;
rather, it will risk huge losses.
Banks don’t want to lose good customers and, at the same time, they
don’t want to acquire bad customers. In this scenario, both false positives and
false negatives become very important to measure.
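The two error types can be counted with a simple confusion tally, sketched below. It assumes that the positive class means "good customer, approve the loan", so a false positive is a bad customer who was approved and a false negative is a good customer who was rejected. The labels are invented for the example.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

# Hypothetical outcomes: 1 = good customer (repays), 0 = bad customer (defaults).
actual    = [1, 1, 1, 0, 0, 1, 0, 1]
# Hypothetical model decisions: 1 = approve the loan, 0 = reject it.
predicted = [1, 0, 1, 1, 0, 1, 0, 1]

tp, fp, fn, tn = confusion_counts(actual, predicted)
print(f"bad customers approved (FP): {fp}, good customers rejected (FN): {fn}")
```

Under this labeling, false positives translate into loan losses and false negatives into lost business, which is why a bank would track both.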