- Home
- Interview Questions
- Data Science
Take the entire data set as inputLook for a split that maximizes the separation of the classes. A split isany test that divides the data in two setsApply the split to the input data (divide step)Re-apply steps 1 to 2 to the divided dataStop when you meet some stopping criteriaThis step is called pruning. Clean up the tree when you went too fardoing splits.
Root cause analysis was initially developed to analyze industrial accidents,but is now widely used in other areas. It is basically a technique of problemsolving used for isolating the root causes of faults or problems. A factor iscalled a root cause if its deduction from the problem-fault-sequence averts thefinal undesirable event from reoccurring.
It is a model validation technique for evaluating how the outcomes of astatistical analysis will generalize to an independent data set. Mainly used inbackgrounds where the objective is forecast and one wants to estimate howaccurately a model will accomplish in practice.
The goal of cross-validation is to term a data set to test the model in thetraining phase (i.e. validation data set) in order to limit problems like overfitting, and get an insight on how the model will generalize to an independentdata set.
The process of filtering used by most of the recommender systems to findpatterns or information by collaborating perspectives, numerous data sourcesand several agents.
No, they do not because in some cases it reaches a local minima or a localoptima point. You will not reach the global optima point. This is governed bythe data and the starting conditions.