Recommender systems are now widely deployed in fields such as movie recommendations, music preferences, social tags, research articles and search queries. They work through collaborative filtering, content-based filtering, or a personality-based approach. Collaborative filtering builds a model from a person’s past behavior in order to predict what that person will buy, watch or read in the future. Content-based filtering uses the discrete characteristics of items to recommend additional items with similar properties.
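As a rough illustration of the collaborative approach, the sketch below scores items a user has not rated by the ratings of other users, weighted by how similar those users are. The ratings, the user and item names, and the choice of cosine similarity are all assumptions made for this example, not something the answer above prescribes.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating}); illustrative data only.
ratings = {
    "ana": {"Alien": 5, "Heat": 4, "Up": 1},
    "ben": {"Alien": 4, "Heat": 5, "Coco": 2, "Se7en": 5},
    "cy": {"Up": 5, "Coco": 4, "Alien": 1},
}


def cosine(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Rank unrated items by the similarity-weighted average of other users' ratings."""
    scores, weights = {}, {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap
        for item, rating in their_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)


print(recommend("ana", ratings))
```

A content-based variant would compare item attributes (genre, author, tags) instead of comparing users, but the ranking step would look much the same.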
Statistics helps Data Scientists look into data for patterns and hidden insights, and convert Big Data into big insights. It gives a better idea of what customers expect. Data Scientists can learn about consumer behavior, interest, engagement, retention and finally conversion, all through the power of insightful statistics. It helps them build powerful data models to validate certain inferences and predictions. All of this can be converted into a powerful business proposition by giving users what they want precisely when they want it.
It is a statistical technique or model used to analyze a dataset and predict a binary outcome. The outcome must be binary: either zero or one, or yes or no.
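The description above matches logistic regression, although the text does not name the technique. Assuming that is what is meant, the following minimal sketch fits a one-feature model by plain gradient descent and then predicts a zero-or-one outcome. The data, the learning rate and the function names are hypothetical and chosen only for illustration.

```python
from math import exp


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: hours studied -> passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)


def predict(x):
    """Return the binary outcome by thresholding the probability at 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


print(predict(1.0), predict(4.0))
```

In practice a library implementation would normally be used; the hand-written loop is only there to show that the model outputs a probability which is then cut into a yes-or-no answer.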
With data coming in from multiple sources, it is important to ensure that the data is good enough for analysis. This is where data cleansing becomes vital. Data cleansing is the process of detecting and correcting data records, ensuring that the data is complete and accurate, and deleting or modifying irrelevant components as needed. This process can be deployed alongside data wrangling or batch processing.
Once the data is cleaned, it conforms to the rules of the data sets in the system. Data cleansing is an essential part of data science because data is prone to error from human negligence and from corruption during transmission or storage, among other things. It takes up a huge chunk of a Data Scientist’s time and effort because of the multiple sources from which data emanates and the speed at which it arrives.
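A small sketch of what detecting and correcting records can look like is given below. The record layout, the field names `id` and `email`, and the specific rules (trim whitespace, lowercase emails, drop incomplete rows and duplicate ids) are assumptions made for the example; real cleansing rules depend on the data set.

```python
REQUIRED_FIELDS = ("id", "email")  # hypothetical rule: both must be present


def clean_records(records):
    """Trim whitespace, normalize email case, drop incomplete rows and duplicate ids."""
    seen_ids = set()
    cleaned = []
    for record in records:
        # Correct: strip stray whitespace from every string field.
        record = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        # Detect: skip records that are missing a required field.
        if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        # Correct: store emails in one consistent case.
        record["email"] = record["email"].lower()
        # Detect: skip a record whose id has already been kept.
        if record["id"] in seen_ids:
            continue
        seen_ids.add(record["id"])
        cleaned.append(record)
    return cleaned


raw = [
    {"id": 1, "email": " Ana@Example.com "},
    {"id": 2, "email": ""},                 # incomplete: empty email
    {"id": 1, "email": "ana@example.com"},  # duplicate id
    {"id": 3, "email": "cy@example.com"},
]
cleaned = clean_records(raw)
print(cleaned)
```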
As the names suggest, these are analysis methodologies involving one, two or multiple variables.
A univariate analysis has one variable, so it involves no relationships or causes. Its main purpose is to summarize the data and find patterns within it in order to make actionable decisions.
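Summarizing a single variable usually means computing a few descriptive statistics. The sketch below does this with Python's standard `statistics` module; the `ages` values are made up for the example.

```python
import statistics

# Hypothetical single variable: customer ages.
ages = [23, 25, 25, 29, 31, 35, 41]

summary = {
    "mean": statistics.mean(ages),      # central tendency
    "median": statistics.median(ages),  # middle value, robust to outliers
    "mode": statistics.mode(ages),      # most frequent value
    "stdev": statistics.stdev(ages),    # sample standard deviation (spread)
    "range": max(ages) - min(ages),     # distance between extremes
}

for name, value in summary.items():
    print(f"{name}: {value}")
```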
A bivariate analysis deals with the relationship between two sets of data. These sets of paired data come from related sources or samples. Various tools can analyze such data, including chi-squared tests and t-tests, when the data are correlated.
If the data can be quantified, it can be analyzed using a graph plot or a scatterplot. A bivariate analysis tests the strength of the correlation between the two data sets.
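One common way to measure the strength of a correlation between two quantified variables is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand; the choice of Pearson's r and the `ad_spend` and `sales` figures are assumptions for illustration, not taken from the text above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical paired data: advertising spend and sales.
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 25, 31, 48, 55]

r = pearson_r(ad_spend, sales)
print(f"r = {r:.3f}")
```

A value close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or none. Plotting the same pairs as a scatterplot is a useful visual check alongside the number.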