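I hear there is a machine learning library called sklearn.
What exactly can it do?
This article answers that question.
Using Machine Learning with sklearn
sklearn (scikit-learn) is a tool that makes it easy to use a wide range of machine learning methods.
As an introduction to machine learning with sklearn, this article tries out the following methods.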
- Logistic Regression
- Support Vector Machines
- k-Nearest Neighbors
- Naive Bayes classifier
- Perceptron
- Linear SVC
- Decision Tree
- Random Forest
 
The following libraries are used.
- pandas
- numpy
- sklearn
 
If these libraries are not installed, install them with commands such as the following.

```shell
pip install pandas
pip install numpy
pip install scikit-learn
```
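To confirm that the installation worked, one quick check is to import each library and print its version. This is only a minimal sketch; the helper name `installed_versions` is made up for illustration and is not part of any of these libraries.

```python
import numpy as np
import pandas as pd
import sklearn


def installed_versions():
    """Return the version string of each library used in this article."""
    # installed_versions is a hypothetical helper name chosen for this sketch.
    return {
        "pandas": pd.__version__,
        "numpy": np.__version__,
        "scikit-learn": sklearn.__version__,
    }


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

If any of the imports fails with `ModuleNotFoundError`, the corresponding `pip install` command has not taken effect in the environment that runs the notebook.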
					
The environment is Jupyter Notebook, launched from the Anaconda Prompt. The data is the iris dataset, which is frequently used as an introductory dataset in the machine learning field.
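Before training anything, it can help to look at what the iris data contains. The sketch below loads it into a pandas DataFrame and prints its size and class balance. The helper name `load_iris_frame` and the extra `species` column are assumptions added here for readability.

```python
import pandas as pd
from sklearn import datasets


def load_iris_frame():
    """Load iris as a DataFrame with a readable 'species' column."""
    # load_iris_frame and the 'species' column are illustrative additions.
    iris = datasets.load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    # Map the integer targets (0, 1, 2) to their species names.
    df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)
    return df


df = load_iris_frame()
print(df.shape)                      # rows and columns
print(df["species"].value_counts())  # number of samples per class
print(df.head())
```

The dataset has 150 samples with four numeric features, split evenly across three species, so it is small enough to train every model in this article in well under a second.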
For how to set up an Anaconda environment, see the following article.
Below is source code that tries out each machine learning method in a simple way.
```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import Perceptron
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn import datasets
import pandas as pd


def fn_start_learning():
    # Load iris and hold out 20% of the samples as a test set
    iris = datasets.load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    y_label = iris.target.flatten()
    x_train, x_test, y_train, y_test = train_test_split(df, y_label, test_size=0.2)

    # Logistic Regression
    logreg = LogisticRegression()
    logreg.fit(x_train, y_train)
    y_pred = logreg.predict(x_test)
    print('[Logistic Regression]')
    print(classification_report(y_test, y_pred))

    # Support Vector Machines
    svc = SVC()
    svc.fit(x_train, y_train)
    y_pred = svc.predict(x_test)
    print('[Support Vector Machines]')
    print(classification_report(y_test, y_pred))

    # k-Nearest Neighbors
    knn = KNeighborsClassifier(n_neighbors = 3)
    knn.fit(x_train, y_train)
    y_pred = knn.predict(x_test)
    print('[k-Nearest Neighbors]')
    print(classification_report(y_test, y_pred))

    # Naive Bayes classifier
    gaussian = GaussianNB()
    gaussian.fit(x_train, y_train)
    y_pred = gaussian.predict(x_test)
    print('[Naive Bayes classifier]')
    print(classification_report(y_test, y_pred))

    # Perceptron
    perceptron = Perceptron()
    perceptron.fit(x_train, y_train)
    y_pred = perceptron.predict(x_test)
    print('[Perceptron]')
    print(classification_report(y_test, y_pred))

    # Linear SVC
    linear_svc = LinearSVC()
    linear_svc.fit(x_train, y_train)
    y_pred = linear_svc.predict(x_test)
    print('[Linear SVC]')
    print(classification_report(y_test, y_pred))

    # Decision Tree
    decision_tree = DecisionTreeClassifier()
    decision_tree.fit(x_train, y_train)
    y_pred = decision_tree.predict(x_test)
    print('[Decision Tree]')
    print(classification_report(y_test, y_pred))

    # Random Forest
    random_forest = RandomForestClassifier(n_estimators=100)
    random_forest.fit(x_train, y_train)
    y_pred = random_forest.predict(x_test)
    print('[Random Forest]')
    print(classification_report(y_test, y_pred))


if __name__ == '__main__':
    fn_start_learning()
```
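Every classifier in the listing is used through the same `fit` and `predict` calls, so the eight blocks can also be written as a single loop. The sketch below is one possible compact variant, not the article's original code. The function name `compare_models`, the fixed `random_state=0`, the `stratify=y` option, and the raised `max_iter` values are all assumptions added here to make runs repeatable and to reduce the chance of convergence warnings.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(random_state=0):
    """Fit each classifier on the same split and return test accuracy by name."""
    X, y = datasets.load_iris(return_X_y=True)
    # Assumption: a fixed seed and a stratified split, unlike the listing above.
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state, stratify=y
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Support Vector Machines": SVC(),
        "k-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
        "Naive Bayes classifier": GaussianNB(),
        "Perceptron": Perceptron(random_state=random_state),
        "Linear SVC": LinearSVC(max_iter=10000, random_state=random_state),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(x_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(x_test))
    return scores


scores = compare_models()
for name, accuracy in scores.items():
    print(f"{name:26s} {accuracy:.2f}")
```

This variant prints only the accuracy of each model. The per-class reports in the output section that follows come from the full listing above, not from this sketch.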
					
Output
The evaluation results for each machine learning model are as follows.
```
[Logistic Regression]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.73      0.84        11
           2       0.75      1.00      0.86         9

    accuracy                           0.90        30
   macro avg       0.92      0.91      0.90        30
weighted avg       0.93      0.90      0.90        30

[Support Vector Machines]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.82      0.90        11
           2       0.82      1.00      0.90         9

    accuracy                           0.93        30
   macro avg       0.94      0.94      0.93        30
weighted avg       0.95      0.93      0.93        30

[k-Nearest Neighbors]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.82      0.90        11
           2       0.82      1.00      0.90         9

    accuracy                           0.93        30
   macro avg       0.94      0.94      0.93        30
weighted avg       0.95      0.93      0.93        30

[Naive Bayes classifier]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.73      0.84        11
           2       0.75      1.00      0.86         9

    accuracy                           0.90        30
   macro avg       0.92      0.91      0.90        30
weighted avg       0.93      0.90      0.90        30

[Perceptron]
              precision    recall  f1-score   support

           0       0.50      1.00      0.67        10
           1       0.00      0.00      0.00        11
           2       0.89      0.89      0.89         9

    accuracy                           0.60        30
   macro avg       0.46      0.63      0.52        30
weighted avg       0.43      0.60      0.49        30

[Linear SVC]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       0.90      0.82      0.86        11
           2       0.80      0.89      0.84         9

    accuracy                           0.90        30
   macro avg       0.90      0.90      0.90        30
weighted avg       0.90      0.90      0.90        30

[Decision Tree]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.73      0.84        11
           2       0.75      1.00      0.86         9

    accuracy                           0.90        30
   macro avg       0.92      0.91      0.90        30
weighted avg       0.93      0.90      0.90        30

[Random Forest]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.73      0.84        11
           2       0.75      1.00      0.86         9

    accuracy                           0.90        30
   macro avg       0.92      0.91      0.90        30
weighted avg       0.93      0.90      0.90        30
```
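These numbers come from a single random split with only 30 test samples, and the listing does not fix a random seed, so they will likely change from run to run. One common way to get a steadier estimate is k-fold cross-validation. The sketch below applies it to the Perceptron, once on the raw features and once with standardized features. The helper name `cv_accuracy`, the choice of 5 folds, the seed, and the use of `StandardScaler` are assumptions made for this example. Whether scaling helps is something to check from the printed scores rather than assume.

```python
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cv_accuracy(model, n_splits=5, random_state=0):
    """Return the per-fold accuracy of `model` on iris."""
    # cv_accuracy is a hypothetical helper; 5 folds and seed 0 are arbitrary.
    X, y = datasets.load_iris(return_X_y=True)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    return cross_val_score(model, X, y, cv=cv, scoring="accuracy")


raw = cv_accuracy(Perceptron(random_state=0))
scaled = cv_accuracy(make_pipeline(StandardScaler(), Perceptron(random_state=0)))

print(f"Perceptron, raw features:    {raw.mean():.2f} +/- {raw.std():.2f}")
print(f"Perceptron, scaled features: {scaled.mean():.2f} +/- {scaled.std():.2f}")
```

Putting the scaler inside the pipeline means it is refit on the training part of each fold, so no information from the held-out fold leaks into preprocessing.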
					
As shown above, sklearn makes it easy to try out machine learning methods, so why not give it a try?
