當我們需要更大量的資料來做測試的時候，我們可以使用 scikit learn 提供的自動生成數據。

Generated datasets 生成的數據集

make_blobs
make_classification
make_gaussian_quantiles
make_biclusters
make_blobs
make_checkerboard
make_circles
make_classification
make_friedman1
make_friedman2
make_friedman3
make_gaussian_quantiles
make_hastie_10_2
make_low_rank_matrix
make_moons
make_multilabel_classification
make_regression
make_s_curve
make_sparse_coded_signal
make_sparse_spd_matrix
make_sparse_uncorrelated
make_spd_matrix
make_swiss_roll

因為使用方法很類似我們來看幾個實際使用的範例。

from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, 
                   n_features=7,
                   n_informative=4,
                   n_classes=3)

X.shape, y.shape

from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, 
                   n_features=7,
                   n_targets=2,
                   random_state=87)

X.shape, y.shape

from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, 
                    n_features=4, 
                    n_targets=2,
                    random_state=87)

X.shape, y.shape

from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=100, 
                centers=3, 
                n_features=2,
                random_state=87)

X.shape, y.shape

OpenResource

My Resource Location

Generated datasets 生成的數據集