source:scikit-learn.org/stable/modules/tree.html#tree
DecisionTreeRegressor is a class provided by scikit-learn.
Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression.
A decision tree is used in supervised learning for classification or regression when nothing is assumed about the underlying distribution of the data (non-parametric).
The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. A tree can be seen as a piecewise constant approximation.
The goal is a model that predicts the target value. The model is built from simple decision rules inferred from the data features. When plotted, the tree looks like a piecewise constant approximation.
For instance, in the example below, decision trees learn from data to approximate a sine curve with a set of if-then-else decision rules. The deeper the tree, the more complex the decision rules and the fitter the model.
For instance, in the example below, a decision tree learns a set of if-then-else decision rules from the data to approximate a sine curve. The deeper the tree, the more complex its decision rules and the more closely the model fits.
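The sine-curve idea can be sketched as follows; this is a minimal example (assuming scikit-learn and NumPy are installed), not the exact code from the scikit-learn documentation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(200, 1), axis=0)  # 200 points in [0, 5)
y = np.sin(X).ravel()                      # target: a sine curve

# A shallow tree gives a coarse step function; a deeper tree
# has more leaves (more piecewise-constant segments) and fits closer.
shallow = DecisionTreeRegressor(max_depth=2).fit(X, y)
deep = DecisionTreeRegressor(max_depth=5).fit(X, y)

print(shallow.get_n_leaves(), deep.get_n_leaves())
print(shallow.score(X, y), deep.score(X, y))
```

Increasing `max_depth` further would eventually memorize the training points, which is why depth is the main lever for controlling overfitting.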
c.f. non-parametric test
A non parametric test (sometimes called a distribution free test) does not assume anything about the underlying distribution (for example, that the data comes from a normal distribution). That’s compared to a parametric test, which makes assumptions about a population’s parameters (for example, the mean or standard deviation). When the word “non parametric” is used in stats, it doesn’t quite mean that you know nothing about the population. It usually means that you know the population data does not have a normal distribution.
For example, one assumption for the one way ANOVA is that the data comes from a normal distribution. If your data isn’t normally distributed, you can’t run an ANOVA, but you can run the nonparametric alternative—the Kruskal-Wallis test.
If at all possible, you should use parametric tests, as they tend to be more accurate. Parametric tests have greater statistical power, which means they are more likely to find a true significant effect. Use nonparametric tests only if you have to (i.e. you know that assumptions like normality are being violated). Nonparametric tests can perform well with non-normal continuous data if you have a sufficiently large sample size (generally 15–20 items in each group).
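The ANOVA/Kruskal–Wallis pair mentioned above can be run side by side with SciPy (the quoted text names the tests but not a library, so SciPy here is an assumption):

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
# Three sample groups with shifted means
a = rng.normal(0.0, 1.0, 30)
b = rng.normal(0.5, 1.0, 30)
c = rng.normal(1.0, 1.0, 30)

# Parametric: one-way ANOVA assumes each group is normally distributed
f_stat, p_anova = stats.f_oneway(a, b, c)

# Nonparametric alternative: Kruskal-Wallis works on ranks,
# so it makes no normality assumption
h_stat, p_kw = stats.kruskal(a, b, c)

print(p_anova, p_kw)
```

Both return a p-value for the null hypothesis that all groups come from the same distribution; on non-normal data only the Kruskal–Wallis result is trustworthy.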
source: www.statisticshowto.com/parametric-and-non-parametric-data/
The full class definition is here:
scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html
class sklearn.tree.DecisionTreeRegressor(*, criterion='mse', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, ccp_alpha=0.0)
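A quick sketch of two of the parameters in the signature above: `min_samples_leaf` caps how small a leaf may be, and `ccp_alpha` prunes the fitted tree by minimal cost-complexity. Both produce smaller trees than the unconstrained default (the random data here is purely illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 1)
y = rng.rand(100)  # pure noise: an unconstrained tree will memorize it

full = DecisionTreeRegressor(random_state=0).fit(X, y)
# Each leaf must contain at least 10 training samples
constrained = DecisionTreeRegressor(min_samples_leaf=10, random_state=0).fit(X, y)
# Cost-complexity pruning: subtrees whose impurity reduction per leaf
# is below ccp_alpha are collapsed after fitting
pruned = DecisionTreeRegressor(ccp_alpha=0.01, random_state=0).fit(X, y)

print(full.get_n_leaves(), constrained.get_n_leaves(), pruned.get_n_leaves())
```

Note that in scikit-learn 1.0+ the `criterion='mse'` spelling shown in the signature was renamed to `'squared_error'`; the default behavior is the same.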