생능출판사, "Data Science with Python" (tentative title), Chapter 14 code

14.6 Implementing linear regression with the scikit-learn library

In [1]:
import numpy as np 
from sklearn import linear_model  # import the scikit-learn linear_model module

regr = linear_model.LinearRegression()
In [2]:
X = [[164], [179], [162], [170]]  # 2-D input, so the same code also works for multiple regression
y = [53, 63, 55, 59]              # the result values y = f(X)
regr.fit(X, y)
Out[2]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

14.6 Checking the linear regression training result and making predictions

In [3]:
regr.fit(X, y)
Out[3]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In [4]:
coef = regr.coef_           # slope of the fitted line
intercept = regr.intercept_ # intercept of the fitted line
score = regr.score(X, y)    # how well the fitted line follows the data

print("y =", coef, "* X + ", intercept)
print("The score of this line for the data: ", score)
y = [0.55221745] * X +  -35.686695278969964
The score of this line for the data:  0.903203123105647
In [5]:
input_data = [ [180], [185] ]
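
The cell above ends after defining input_data; a minimal continuation, reusing the regr fitted earlier, would pass it to predict. The values in the comment follow from the slope and intercept printed above.

regr.predict(input_data)  # about [63.71, 66.47] from 0.5522 * X - 35.687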

14.8 Predicting with linear regression: are height and weight correlated?

In [10]:
regr.predict([[169]])
Out[10]:
array([57.63805436])
In [11]:
import matplotlib.pyplot as plt
import numpy as np 
from sklearn import linear_model  # import the scikit-learn linear_model module
 
regr = linear_model.LinearRegression() 
 
X = [[164], [179], [162], [170]]  # the input of linear regression must be 2-D
y = [53, 63, 55, 59]     # the result values y = f(X)
regr.fit(X, y)

# Draw the training data and the y values as a scatter plot.
plt.scatter(X, y, color='black')

# Compute the predictions, using the training data as input.
y_pred = regr.predict(X)

# Draw a line graph from the training data and the predictions;
# this traces the line with the computed slope and y-intercept.
plt.plot(X, y_pred, color='blue', linewidth=3)
plt.show()

LAB 14-1 Multidimensional linear regression

In [12]:
import numpy as np 
from sklearn import linear_model 
 
regr = linear_model.LinearRegression() 
# male is 0, female is 1
X = [[164, 1], [167, 1], [165, 0], [170, 0], [179, 0], [163, 1], [159, 0], [166, 1]]    # the input data must be 2-D
y = [43, 48, 47, 66, 67, 50, 52, 44]     # the y values are 1-D data
regr.fit(X, y)         # training
print('Coefficients:', regr.coef_)
print('Intercept:', regr.intercept_)
print('Score:', regr.score(X, y))
print('Estimated weights for Eunji and Dongmin:', regr.predict([[166, 1], [166, 0]]))
Coefficients: [ 0.88542825 -8.87235818]
Intercept: -90.97330367074522
Score: 0.7404546306026769
Estimated weights for Eunji and Dongmin: [47.13542825 56.00778643]
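
To see that predict is only evaluating the linear formula, the first estimate can be recomputed by hand from the printed coefficients; a minimal check, reusing regr from the cell above:

np.dot(regr.coef_, [166, 1]) + regr.intercept_  # 0.88542825*166 - 8.87235818 - 90.9733... ≈ 47.1354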

14.8 The diabetes example and generating the training data

In [27]:
import matplotlib.pyplot as plt 
import numpy as np 
from sklearn.linear_model import LinearRegression 
from sklearn import datasets 
 
# Load the diabetes data set from sklearn's collection of datasets.
diabetes = datasets.load_diabetes()
In [28]:
print('shape of diabetes.data: ', diabetes.data.shape)
print(diabetes.data)
shape of diabetes.data:  (442, 10)
[[ 0.03807591  0.05068012  0.06169621 ... -0.00259226  0.01990842
  -0.01764613]
 [-0.00188202 -0.04464164 -0.05147406 ... -0.03949338 -0.06832974
  -0.09220405]
 [ 0.08529891  0.05068012  0.04445121 ... -0.00259226  0.00286377
  -0.02593034]
 ...
 [ 0.04170844  0.05068012 -0.01590626 ... -0.01107952 -0.04687948
   0.01549073]
 [-0.04547248 -0.04464164  0.03906215 ...  0.02655962  0.04452837
  -0.02593034]
 [-0.04547248 -0.04464164 -0.0730303  ... -0.03949338 -0.00421986
   0.00306441]]
In [29]:
print('ÀԷµ¥ÀÌÅÍÀÇ Æ¯¼ºµé')
print(diabetes.feature_names)
ÀԷµ¥ÀÌÅÍÀÇ Æ¯¼ºµé
['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
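
scikit-learn also ships a textual description of the ten features with the dataset; printing it is an optional way to read the details:

print(diabetes.DESCR)  # description text bundled with the dataset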
In [30]:
print('target data y:', diabetes.target.shape)
print(diabetes.target)
target data y: (442,)
[151.  75. 141. 206. 135.  97. 138.  63. 110. 310. 101.  69. 179. 185.
 118. 171. 166. 144.  97. 168.  68.  49.  68. 245. 184. 202. 137.  85.
 131. 283. 129.  59. 341.  87.  65. 102. 265. 276. 252.  90. 100.  55.
  61.  92. 259.  53. 190. 142.  75. 142. 155. 225.  59. 104. 182. 128.
  52.  37. 170. 170.  61. 144.  52. 128.  71. 163. 150.  97. 160. 178.
  48. 270. 202. 111.  85.  42. 170. 200. 252. 113. 143.  51.  52. 210.
  65. 141.  55. 134.  42. 111.  98. 164.  48.  96.  90. 162. 150. 279.
  92.  83. 128. 102. 302. 198.  95.  53. 134. 144. 232.  81. 104.  59.
 246. 297. 258. 229. 275. 281. 179. 200. 200. 173. 180.  84. 121. 161.
  99. 109. 115. 268. 274. 158. 107.  83. 103. 272.  85. 280. 336. 281.
 118. 317. 235.  60. 174. 259. 178. 128.  96. 126. 288.  88. 292.  71.
 197. 186.  25.  84.  96. 195.  53. 217. 172. 131. 214.  59.  70. 220.
 268. 152.  47.  74. 295. 101. 151. 127. 237. 225.  81. 151. 107.  64.
 138. 185. 265. 101. 137. 143. 141.  79. 292. 178.  91. 116.  86. 122.
  72. 129. 142.  90. 158.  39. 196. 222. 277.  99. 196. 202. 155.  77.
 191.  70.  73.  49.  65. 263. 248. 296. 214. 185.  78.  93. 252. 150.
  77. 208.  77. 108. 160.  53. 220. 154. 259.  90. 246. 124.  67.  72.
 257. 262. 275. 177.  71.  47. 187. 125.  78.  51. 258. 215. 303. 243.
  91. 150. 310. 153. 346.  63.  89.  50.  39. 103. 308. 116. 145.  74.
  45. 115. 264.  87. 202. 127. 182. 241.  66.  94. 283.  64. 102. 200.
 265.  94. 230. 181. 156. 233.  60. 219.  80.  68. 332. 248.  84. 200.
  55.  85.  89.  31. 129.  83. 275.  65. 198. 236. 253. 124.  44. 172.
 114. 142. 109. 180. 144. 163. 147.  97. 220. 190. 109. 191. 122. 230.
 242. 248. 249. 192. 131. 237.  78. 135. 244. 199. 270. 164.  72.  96.
 306.  91. 214.  95. 216. 263. 178. 113. 200. 139. 139.  88. 148.  88.
 243.  71.  77. 109. 272.  60.  54. 221.  90. 311. 281. 182. 321.  58.
 262. 206. 233. 242. 123. 167.  63. 197.  71. 168. 140. 217. 121. 235.
 245.  40.  52. 104. 132.  88.  69. 219.  72. 201. 110.  51. 277.  63.
 118.  69. 273. 258.  43. 198. 242. 232. 175.  93. 168. 275. 293. 281.
  72. 140. 189. 181. 209. 136. 261. 113. 131. 174. 257.  55.  84.  42.
 146. 212. 233.  91. 111. 152. 120.  67. 310.  94. 183.  66. 173.  72.
  49.  64.  48. 178. 104. 132. 220.  57.]
In [31]:
X = diabetes.data[:, 2]
print(X)
[ 0.06169621 -0.05147406  0.04445121 -0.01159501 -0.03638469 -0.04069594
 -0.04716281 -0.00189471  0.06169621  0.03906215 -0.08380842  0.01750591
 -0.02884001 -0.00189471 -0.02560657 -0.01806189  0.04229559  0.01211685
 -0.0105172  -0.01806189 -0.05686312 -0.02237314 -0.00405033  0.06061839
  0.03582872 -0.01267283 -0.07734155  0.05954058 -0.02129532 -0.00620595
  0.04445121 -0.06548562  0.12528712 -0.05039625 -0.06332999 -0.03099563
  0.02289497  0.01103904  0.07139652  0.01427248 -0.00836158 -0.06764124
 -0.0105172  -0.02345095  0.06816308 -0.03530688 -0.01159501 -0.0730303
 -0.04177375  0.01427248 -0.00728377  0.0164281  -0.00943939 -0.01590626
  0.0250506  -0.04931844  0.04121778 -0.06332999 -0.06440781 -0.02560657
 -0.00405033  0.00457217 -0.00728377 -0.0374625  -0.02560657 -0.02452876
 -0.01806189 -0.01482845 -0.02991782 -0.046085   -0.06979687  0.03367309
 -0.00405033 -0.02021751  0.00241654 -0.03099563  0.02828403 -0.03638469
 -0.05794093 -0.0374625   0.01211685 -0.02237314 -0.03530688  0.00996123
 -0.03961813  0.07139652 -0.07518593 -0.00620595 -0.04069594 -0.04824063
 -0.02560657  0.0519959   0.00457217 -0.06440781 -0.01698407 -0.05794093
  0.00996123  0.08864151 -0.00512814 -0.06440781  0.01750591 -0.04500719
  0.02828403  0.04121778  0.06492964 -0.03207344 -0.07626374  0.04984027
  0.04552903 -0.00943939 -0.03207344  0.00457217  0.02073935  0.01427248
  0.11019775  0.00133873  0.05846277 -0.02129532 -0.0105172  -0.04716281
  0.00457217  0.01750591  0.08109682  0.0347509   0.02397278 -0.00836158
 -0.06117437 -0.00189471 -0.06225218  0.0164281   0.09618619 -0.06979687
 -0.02129532 -0.05362969  0.0433734   0.05630715 -0.0816528   0.04984027
  0.11127556  0.06169621  0.01427248  0.04768465  0.01211685  0.00564998
  0.04660684  0.12852056  0.05954058  0.09295276  0.01535029 -0.00512814
  0.0703187  -0.00405033 -0.00081689 -0.04392938  0.02073935  0.06061839
 -0.0105172  -0.03315126 -0.06548562  0.0433734  -0.06225218  0.06385183
  0.03043966  0.07247433 -0.0191397  -0.06656343 -0.06009656  0.06924089
  0.05954058 -0.02668438 -0.02021751 -0.046085    0.07139652 -0.07949718
  0.00996123 -0.03854032  0.01966154  0.02720622 -0.00836158 -0.01590626
  0.00457217 -0.04285156  0.00564998 -0.03530688  0.02397278 -0.01806189
  0.04229559 -0.0547075  -0.00297252 -0.06656343 -0.01267283 -0.04177375
 -0.03099563 -0.00512814 -0.05901875  0.0250506  -0.046085    0.00349435
  0.05415152 -0.04500719 -0.05794093 -0.05578531  0.00133873  0.03043966
  0.00672779  0.04660684  0.02612841  0.04552903  0.04013997 -0.01806189
  0.01427248  0.03690653  0.00349435 -0.07087468 -0.03315126  0.09403057
  0.03582872  0.03151747 -0.06548562 -0.04177375 -0.03961813 -0.03854032
 -0.02560657 -0.02345095 -0.06656343  0.03259528 -0.046085   -0.02991782
 -0.01267283 -0.01590626  0.07139652 -0.03099563  0.00026092  0.03690653
  0.03906215 -0.01482845  0.00672779 -0.06871905 -0.00943939  0.01966154
  0.07462995 -0.00836158 -0.02345095 -0.046085    0.05415152 -0.03530688
 -0.03207344 -0.0816528   0.04768465  0.06061839  0.05630715  0.09834182
  0.05954058  0.03367309  0.05630715 -0.06548562  0.16085492 -0.05578531
 -0.02452876 -0.03638469 -0.00836158 -0.04177375  0.12744274 -0.07734155
  0.02828403 -0.02560657 -0.06225218 -0.00081689  0.08864151 -0.03207344
  0.03043966  0.00888341  0.00672779 -0.02021751 -0.02452876 -0.01159501
  0.02612841 -0.05901875 -0.03638469 -0.02452876  0.01858372 -0.0902753
 -0.00512814 -0.05255187 -0.02237314 -0.02021751 -0.0547075  -0.00620595
 -0.01698407  0.05522933  0.07678558  0.01858372 -0.02237314  0.09295276
 -0.03099563  0.03906215 -0.06117437 -0.00836158 -0.0374625  -0.01375064
  0.07355214 -0.02452876  0.03367309  0.0347509  -0.03854032 -0.03961813
 -0.00189471 -0.03099563 -0.046085    0.00133873  0.06492964  0.04013997
 -0.02345095  0.05307371  0.04013997 -0.02021751  0.01427248 -0.03422907
  0.00672779  0.00457217  0.03043966  0.0519959   0.06169621 -0.00728377
  0.00564998  0.05415152 -0.00836158  0.114509    0.06708527 -0.05578531
  0.03043966 -0.02560657  0.10480869 -0.00620595 -0.04716281 -0.04824063
  0.08540807 -0.01267283 -0.03315126 -0.00728377 -0.01375064  0.05954058
  0.02181716  0.01858372 -0.01159501 -0.00297252  0.01750591 -0.02991782
 -0.02021751 -0.05794093  0.06061839 -0.04069594 -0.07195249 -0.05578531
  0.04552903 -0.00943939 -0.03315126  0.04984027 -0.08488624  0.00564998
  0.02073935 -0.00728377  0.10480869 -0.02452876 -0.00620595 -0.03854032
  0.13714305  0.17055523  0.00241654  0.03798434 -0.05794093 -0.00943939
 -0.02345095 -0.0105172  -0.03422907 -0.00297252  0.06816308  0.00996123
  0.00241654 -0.03854032  0.02612841 -0.08919748  0.06061839 -0.02884001
 -0.02991782 -0.0191397  -0.04069594  0.01535029 -0.02452876  0.00133873
  0.06924089 -0.06979687 -0.02991782 -0.046085    0.01858372  0.00133873
 -0.03099563 -0.00405033  0.01535029  0.02289497  0.04552903 -0.04500719
 -0.03315126  0.097264    0.05415152  0.12313149 -0.08057499  0.09295276
 -0.05039625 -0.01159501 -0.0277622   0.05846277  0.08540807 -0.00081689
  0.00672779  0.00888341  0.08001901  0.07139652 -0.02452876 -0.0547075
 -0.03638469  0.0164281   0.07786339 -0.03961813  0.01103904 -0.04069594
 -0.03422907  0.00564998  0.08864151 -0.03315126 -0.05686312 -0.03099563
  0.05522933 -0.06009656  0.00133873 -0.02345095 -0.07410811  0.01966154
 -0.01590626 -0.01590626  0.03906215 -0.0730303 ]
In [32]:
X = diabetes.data[:, np.newaxis, 2]
print(X)
[[ 0.06169621]
 [-0.05147406]
 [ 0.04445121]
 [-0.01159501]
 [-0.03638469]
 [-0.04069594]
 [-0.04716281]
 [-0.00189471]
 [ 0.06169621]
 [ 0.03906215]
 [-0.08380842]
 [ 0.01750591]
 [-0.02884001]
 [-0.00189471]
 [-0.02560657]
 [-0.01806189]
 [ 0.04229559]
 [ 0.01211685]
 [-0.0105172 ]
 [-0.01806189]
 [-0.05686312]
 [-0.02237314]
 [-0.00405033]
 [ 0.06061839]
 [ 0.03582872]
 [-0.01267283]
 [-0.07734155]
 [ 0.05954058]
 [-0.02129532]
 [-0.00620595]
 [ 0.04445121]
 [-0.06548562]
 [ 0.12528712]
 [-0.05039625]
 [-0.06332999]
 [-0.03099563]
 [ 0.02289497]
 [ 0.01103904]
 [ 0.07139652]
 [ 0.01427248]
 [-0.00836158]
 [-0.06764124]
 [-0.0105172 ]
 [-0.02345095]
 [ 0.06816308]
 [-0.03530688]
 [-0.01159501]
 [-0.0730303 ]
 [-0.04177375]
 [ 0.01427248]
 [-0.00728377]
 [ 0.0164281 ]
 [-0.00943939]
 [-0.01590626]
 [ 0.0250506 ]
 [-0.04931844]
 [ 0.04121778]
 [-0.06332999]
 [-0.06440781]
 [-0.02560657]
 [-0.00405033]
 [ 0.00457217]
 [-0.00728377]
 [-0.0374625 ]
 [-0.02560657]
 [-0.02452876]
 [-0.01806189]
 [-0.01482845]
 [-0.02991782]
 [-0.046085  ]
 [-0.06979687]
 [ 0.03367309]
 [-0.00405033]
 [-0.02021751]
 [ 0.00241654]
 [-0.03099563]
 [ 0.02828403]
 [-0.03638469]
 [-0.05794093]
 [-0.0374625 ]
 [ 0.01211685]
 [-0.02237314]
 [-0.03530688]
 [ 0.00996123]
 [-0.03961813]
 [ 0.07139652]
 [-0.07518593]
 [-0.00620595]
 [-0.04069594]
 [-0.04824063]
 [-0.02560657]
 [ 0.0519959 ]
 [ 0.00457217]
 [-0.06440781]
 [-0.01698407]
 [-0.05794093]
 [ 0.00996123]
 [ 0.08864151]
 [-0.00512814]
 [-0.06440781]
 [ 0.01750591]
 [-0.04500719]
 [ 0.02828403]
 [ 0.04121778]
 [ 0.06492964]
 [-0.03207344]
 [-0.07626374]
 [ 0.04984027]
 [ 0.04552903]
 [-0.00943939]
 [-0.03207344]
 [ 0.00457217]
 [ 0.02073935]
 [ 0.01427248]
 [ 0.11019775]
 [ 0.00133873]
 [ 0.05846277]
 [-0.02129532]
 [-0.0105172 ]
 [-0.04716281]
 [ 0.00457217]
 [ 0.01750591]
 [ 0.08109682]
 [ 0.0347509 ]
 [ 0.02397278]
 [-0.00836158]
 [-0.06117437]
 [-0.00189471]
 [-0.06225218]
 [ 0.0164281 ]
 [ 0.09618619]
 [-0.06979687]
 [-0.02129532]
 [-0.05362969]
 [ 0.0433734 ]
 [ 0.05630715]
 [-0.0816528 ]
 [ 0.04984027]
 [ 0.11127556]
 [ 0.06169621]
 [ 0.01427248]
 [ 0.04768465]
 [ 0.01211685]
 [ 0.00564998]
 [ 0.04660684]
 [ 0.12852056]
 [ 0.05954058]
 [ 0.09295276]
 [ 0.01535029]
 [-0.00512814]
 [ 0.0703187 ]
 [-0.00405033]
 [-0.00081689]
 [-0.04392938]
 [ 0.02073935]
 [ 0.06061839]
 [-0.0105172 ]
 [-0.03315126]
 [-0.06548562]
 [ 0.0433734 ]
 [-0.06225218]
 [ 0.06385183]
 [ 0.03043966]
 [ 0.07247433]
 [-0.0191397 ]
 [-0.06656343]
 [-0.06009656]
 [ 0.06924089]
 [ 0.05954058]
 [-0.02668438]
 [-0.02021751]
 [-0.046085  ]
 [ 0.07139652]
 [-0.07949718]
 [ 0.00996123]
 [-0.03854032]
 [ 0.01966154]
 [ 0.02720622]
 [-0.00836158]
 [-0.01590626]
 [ 0.00457217]
 [-0.04285156]
 [ 0.00564998]
 [-0.03530688]
 [ 0.02397278]
 [-0.01806189]
 [ 0.04229559]
 [-0.0547075 ]
 [-0.00297252]
 [-0.06656343]
 [-0.01267283]
 [-0.04177375]
 [-0.03099563]
 [-0.00512814]
 [-0.05901875]
 [ 0.0250506 ]
 [-0.046085  ]
 [ 0.00349435]
 [ 0.05415152]
 [-0.04500719]
 [-0.05794093]
 [-0.05578531]
 [ 0.00133873]
 [ 0.03043966]
 [ 0.00672779]
 [ 0.04660684]
 [ 0.02612841]
 [ 0.04552903]
 [ 0.04013997]
 [-0.01806189]
 [ 0.01427248]
 [ 0.03690653]
 [ 0.00349435]
 [-0.07087468]
 [-0.03315126]
 [ 0.09403057]
 [ 0.03582872]
 [ 0.03151747]
 [-0.06548562]
 [-0.04177375]
 [-0.03961813]
 [-0.03854032]
 [-0.02560657]
 [-0.02345095]
 [-0.06656343]
 [ 0.03259528]
 [-0.046085  ]
 [-0.02991782]
 [-0.01267283]
 [-0.01590626]
 [ 0.07139652]
 [-0.03099563]
 [ 0.00026092]
 [ 0.03690653]
 [ 0.03906215]
 [-0.01482845]
 [ 0.00672779]
 [-0.06871905]
 [-0.00943939]
 [ 0.01966154]
 [ 0.07462995]
 [-0.00836158]
 [-0.02345095]
 [-0.046085  ]
 [ 0.05415152]
 [-0.03530688]
 [-0.03207344]
 [-0.0816528 ]
 [ 0.04768465]
 [ 0.06061839]
 [ 0.05630715]
 [ 0.09834182]
 [ 0.05954058]
 [ 0.03367309]
 [ 0.05630715]
 [-0.06548562]
 [ 0.16085492]
 [-0.05578531]
 [-0.02452876]
 [-0.03638469]
 [-0.00836158]
 [-0.04177375]
 [ 0.12744274]
 [-0.07734155]
 [ 0.02828403]
 [-0.02560657]
 [-0.06225218]
 [-0.00081689]
 [ 0.08864151]
 [-0.03207344]
 [ 0.03043966]
 [ 0.00888341]
 [ 0.00672779]
 [-0.02021751]
 [-0.02452876]
 [-0.01159501]
 [ 0.02612841]
 [-0.05901875]
 [-0.03638469]
 [-0.02452876]
 [ 0.01858372]
 [-0.0902753 ]
 [-0.00512814]
 [-0.05255187]
 [-0.02237314]
 [-0.02021751]
 [-0.0547075 ]
 [-0.00620595]
 [-0.01698407]
 [ 0.05522933]
 [ 0.07678558]
 [ 0.01858372]
 [-0.02237314]
 [ 0.09295276]
 [-0.03099563]
 [ 0.03906215]
 [-0.06117437]
 [-0.00836158]
 [-0.0374625 ]
 [-0.01375064]
 [ 0.07355214]
 [-0.02452876]
 [ 0.03367309]
 [ 0.0347509 ]
 [-0.03854032]
 [-0.03961813]
 [-0.00189471]
 [-0.03099563]
 [-0.046085  ]
 [ 0.00133873]
 [ 0.06492964]
 [ 0.04013997]
 [-0.02345095]
 [ 0.05307371]
 [ 0.04013997]
 [-0.02021751]
 [ 0.01427248]
 [-0.03422907]
 [ 0.00672779]
 [ 0.00457217]
 [ 0.03043966]
 [ 0.0519959 ]
 [ 0.06169621]
 [-0.00728377]
 [ 0.00564998]
 [ 0.05415152]
 [-0.00836158]
 [ 0.114509  ]
 [ 0.06708527]
 [-0.05578531]
 [ 0.03043966]
 [-0.02560657]
 [ 0.10480869]
 [-0.00620595]
 [-0.04716281]
 [-0.04824063]
 [ 0.08540807]
 [-0.01267283]
 [-0.03315126]
 [-0.00728377]
 [-0.01375064]
 [ 0.05954058]
 [ 0.02181716]
 [ 0.01858372]
 [-0.01159501]
 [-0.00297252]
 [ 0.01750591]
 [-0.02991782]
 [-0.02021751]
 [-0.05794093]
 [ 0.06061839]
 [-0.04069594]
 [-0.07195249]
 [-0.05578531]
 [ 0.04552903]
 [-0.00943939]
 [-0.03315126]
 [ 0.04984027]
 [-0.08488624]
 [ 0.00564998]
 [ 0.02073935]
 [-0.00728377]
 [ 0.10480869]
 [-0.02452876]
 [-0.00620595]
 [-0.03854032]
 [ 0.13714305]
 [ 0.17055523]
 [ 0.00241654]
 [ 0.03798434]
 [-0.05794093]
 [-0.00943939]
 [-0.02345095]
 [-0.0105172 ]
 [-0.03422907]
 [-0.00297252]
 [ 0.06816308]
 [ 0.00996123]
 [ 0.00241654]
 [-0.03854032]
 [ 0.02612841]
 [-0.08919748]
 [ 0.06061839]
 [-0.02884001]
 [-0.02991782]
 [-0.0191397 ]
 [-0.04069594]
 [ 0.01535029]
 [-0.02452876]
 [ 0.00133873]
 [ 0.06924089]
 [-0.06979687]
 [-0.02991782]
 [-0.046085  ]
 [ 0.01858372]
 [ 0.00133873]
 [-0.03099563]
 [-0.00405033]
 [ 0.01535029]
 [ 0.02289497]
 [ 0.04552903]
 [-0.04500719]
 [-0.03315126]
 [ 0.097264  ]
 [ 0.05415152]
 [ 0.12313149]
 [-0.08057499]
 [ 0.09295276]
 [-0.05039625]
 [-0.01159501]
 [-0.0277622 ]
 [ 0.05846277]
 [ 0.08540807]
 [-0.00081689]
 [ 0.00672779]
 [ 0.00888341]
 [ 0.08001901]
 [ 0.07139652]
 [-0.02452876]
 [-0.0547075 ]
 [-0.03638469]
 [ 0.0164281 ]
 [ 0.07786339]
 [-0.03961813]
 [ 0.01103904]
 [-0.04069594]
 [-0.03422907]
 [ 0.00564998]
 [ 0.08864151]
 [-0.03315126]
 [-0.05686312]
 [-0.03099563]
 [ 0.05522933]
 [-0.06009656]
 [ 0.00133873]
 [-0.02345095]
 [-0.07410811]
 [ 0.01966154]
 [-0.01590626]
 [-0.01590626]
 [ 0.03906215]
 [-0.0730303 ]]
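
The np.newaxis above turns the 1-D column into the (442, 1) shape that scikit-learn expects; reshape(-1, 1) is an equivalent idiom:

X = diabetes.data[:, 2].reshape(-1, 1)  # same (442, 1) array as diabetes.data[:, np.newaxis, 2]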

14.10 What is the correlation between body mass index (BMI) and the diabetes measure?

In [33]:
regr.fit(X, diabetes.target)         # build the linear regression model by training
print(regr.coef_, regr.intercept_)
[949.43526038] 152.1334841628967
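
A scatter plot with the fitted line makes the BMI relationship visible; a minimal sketch that mirrors the height and weight plot in 14.8, reusing X and regr from this cell:

plt.scatter(X, diabetes.target, color='black')
plt.plot(X, regr.predict(X), color='blue', linewidth=3)
plt.show()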
In [36]:
# Split the data into training data and test data.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(diabetes.data, diabetes.target,
                                                    test_size=0.2)
In [37]:
# Split the data into training data and test data.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(diabetes.data[:,np.newaxis,2],
                                                    diabetes.target,
                                                    test_size=0.2) 
regr = LinearRegression() 
regr.fit(X_train, y_train)
Out[37]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In [38]:
score = regr.score(X_train, y_train)
print(score)
score = regr.score(X_test, y_test)
print(score)
0.34621263244077793
0.33329025924682276
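
Because train_test_split shuffles at random, these two scores change on every run; passing random_state fixes the split so the numbers are reproducible (42 is an arbitrary seed):

X_train, X_test, y_train, y_test = train_test_split(diabetes.data[:, np.newaxis, 2],
                                                    diabetes.target,
                                                    test_size=0.2, random_state=42)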

14.10 Splitting the diabetes example into training data and test data

In [40]:
X_train, X_test, y_train, y_test = train_test_split(diabetes.data, diabetes.target,
                                                    test_size=0.2)
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)     # predict with the test data
In [43]:
print(y_pred)
print(y_test)
[ 76.19288971 122.8980136  132.68679726 176.74222532 150.40493432
 142.48710548  66.19143825 225.75315415  60.93811615 160.06630859
 161.7616245  127.1956492  250.32597653 136.83805312 104.47431247
  99.52979979 127.50925329 101.41130274 158.20747795 132.79182392
  80.7795418  174.9973218  287.02165646 190.78262199 117.11515329
  62.16905109  64.53672125 225.29320066  69.1953779  120.80935652
 213.84025117  57.90048558 207.03514126  50.57317748 214.37114275
 179.37268122 142.4440009  142.0586992  234.74067428 163.87708461
  68.65378412 183.03182167 137.46083338 168.48871886 160.57063366
 183.50320213 168.43267522 158.65030154 177.76700402 233.77210559
  90.96710468 208.25586583 142.7774825   99.44421264 190.41776893
 154.66243439 125.51055833 100.85916621 110.98087975 153.3642741
 131.898583    83.15038276 104.96664787 101.17321497 190.12253877
 232.45170311  98.64556045 218.84435877  86.86250865 191.45645608
 164.76520878 157.28402233 135.79646464 239.28512034 145.47140687
  55.81550388 170.95013983 208.96457691  94.03668631  98.86180891
 183.61886889 171.4910994  137.94151312 157.16379177 108.19535084
 109.11138602 260.44818846 196.28927273  97.83528846]
[ 60. 103. 178. 138. 197.  88.  72. 261.  96. 185. 196.  97. 310. 202.
 101.  65.  51. 170.  94. 170.  77. 111. 281. 233.  71.  52.  43. 217.
  59.  68. 163.  65. 268.  55. 275. 217. 116. 302. 208. 120.  70.  77.
 187. 127. 259. 272. 110. 118.  66. 281.  71. 151. 172.  31. 202. 276.
  92.  69.  88. 202.  59.  42.  69. 104. 178. 321.  49. 180. 181.  68.
 178. 109. 103. 272.  93.  90. 216. 288.  84.  54. 107. 180. 230. 150.
 125.  72. 308. 163. 137.]

LAB 14-1 Comparing predictions trained on 80% of the data against the actual data

In [45]:
import numpy as np 
from sklearn import linear_model  # import the scikit-learn linear_model module
from sklearn import datasets
import matplotlib.pyplot as plt

regr = linear_model.LinearRegression() 
# Split the data into training data and test data.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(diabetes.data,
                                                    diabetes.target,
                                                    test_size=0.2) 
regr.fit(X_train, y_train)
print(regr.coef_, regr.intercept_)

y_pred = regr.predict(X_test)

plt.scatter(y_pred, y_test)
plt.show()
[ -63.57492467 -284.6049703   510.29607689  332.80804801 -914.13334266
  554.96458111  175.3746655   245.40214146  795.88565343   48.31971934] 149.6661397653907

14.11 The error inherent in the algorithm

In [48]:
plt.scatter(y_pred, y_test,  color='black')

x = np.linspace(0, 330, 100)  # points over a fixed interval
plt.plot(x, x, linewidth = 3, color = 'blue')
plt.show()
In [50]:
from sklearn.metrics import mean_squared_error

... # insert the linear regression model code from the previous section here

print('Mean squared error:', mean_squared_error(y_test, y_pred))
Mean squared error: 2450.864111510396
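
mean_squared_error is just the average of the squared residuals; a one-line NumPy check, reusing y_test and y_pred, reproduces the same number:

print('MSE by hand:', np.mean((y_test - y_pred) ** 2))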

14.14 Preparing to classify the species of the beautiful iris

In [1]:
from sklearn.datasets import load_iris 
iris = load_iris() 
print(iris.data)
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 [5.4 3.7 1.5 0.2]
 [4.8 3.4 1.6 0.2]
 [4.8 3.  1.4 0.1]
 [4.3 3.  1.1 0.1]
 [5.8 4.  1.2 0.2]
 [5.7 4.4 1.5 0.4]
 [5.4 3.9 1.3 0.4]
 [5.1 3.5 1.4 0.3]
 [5.7 3.8 1.7 0.3]
 [5.1 3.8 1.5 0.3]
 [5.4 3.4 1.7 0.2]
 [5.1 3.7 1.5 0.4]
 [4.6 3.6 1.  0.2]
 [5.1 3.3 1.7 0.5]
 [4.8 3.4 1.9 0.2]
 [5.  3.  1.6 0.2]
 [5.  3.4 1.6 0.4]
 [5.2 3.5 1.5 0.2]
 [5.2 3.4 1.4 0.2]
 [4.7 3.2 1.6 0.2]
 [4.8 3.1 1.6 0.2]
 [5.4 3.4 1.5 0.4]
 [5.2 4.1 1.5 0.1]
 [5.5 4.2 1.4 0.2]
 [4.9 3.1 1.5 0.2]
 [5.  3.2 1.2 0.2]
 [5.5 3.5 1.3 0.2]
 [4.9 3.6 1.4 0.1]
 [4.4 3.  1.3 0.2]
 [5.1 3.4 1.5 0.2]
 [5.  3.5 1.3 0.3]
 [4.5 2.3 1.3 0.3]
 [4.4 3.2 1.3 0.2]
 [5.  3.5 1.6 0.6]
 [5.1 3.8 1.9 0.4]
 [4.8 3.  1.4 0.3]
 [5.1 3.8 1.6 0.2]
 [4.6 3.2 1.4 0.2]
 [5.3 3.7 1.5 0.2]
 [5.  3.3 1.4 0.2]
 [7.  3.2 4.7 1.4]
 [6.4 3.2 4.5 1.5]
 [6.9 3.1 4.9 1.5]
 [5.5 2.3 4.  1.3]
 [6.5 2.8 4.6 1.5]
 [5.7 2.8 4.5 1.3]
 [6.3 3.3 4.7 1.6]
 [4.9 2.4 3.3 1. ]
 [6.6 2.9 4.6 1.3]
 [5.2 2.7 3.9 1.4]
 [5.  2.  3.5 1. ]
 [5.9 3.  4.2 1.5]
 [6.  2.2 4.  1. ]
 [6.1 2.9 4.7 1.4]
 [5.6 2.9 3.6 1.3]
 [6.7 3.1 4.4 1.4]
 [5.6 3.  4.5 1.5]
 [5.8 2.7 4.1 1. ]
 [6.2 2.2 4.5 1.5]
 [5.6 2.5 3.9 1.1]
 [5.9 3.2 4.8 1.8]
 [6.1 2.8 4.  1.3]
 [6.3 2.5 4.9 1.5]
 [6.1 2.8 4.7 1.2]
 [6.4 2.9 4.3 1.3]
 [6.6 3.  4.4 1.4]
 [6.8 2.8 4.8 1.4]
 [6.7 3.  5.  1.7]
 [6.  2.9 4.5 1.5]
 [5.7 2.6 3.5 1. ]
 [5.5 2.4 3.8 1.1]
 [5.5 2.4 3.7 1. ]
 [5.8 2.7 3.9 1.2]
 [6.  2.7 5.1 1.6]
 [5.4 3.  4.5 1.5]
 [6.  3.4 4.5 1.6]
 [6.7 3.1 4.7 1.5]
 [6.3 2.3 4.4 1.3]
 [5.6 3.  4.1 1.3]
 [5.5 2.5 4.  1.3]
 [5.5 2.6 4.4 1.2]
 [6.1 3.  4.6 1.4]
 [5.8 2.6 4.  1.2]
 [5.  2.3 3.3 1. ]
 [5.6 2.7 4.2 1.3]
 [5.7 3.  4.2 1.2]
 [5.7 2.9 4.2 1.3]
 [6.2 2.9 4.3 1.3]
 [5.1 2.5 3.  1.1]
 [5.7 2.8 4.1 1.3]
 [6.3 3.3 6.  2.5]
 [5.8 2.7 5.1 1.9]
 [7.1 3.  5.9 2.1]
 [6.3 2.9 5.6 1.8]
 [6.5 3.  5.8 2.2]
 [7.6 3.  6.6 2.1]
 [4.9 2.5 4.5 1.7]
 [7.3 2.9 6.3 1.8]
 [6.7 2.5 5.8 1.8]
 [7.2 3.6 6.1 2.5]
 [6.5 3.2 5.1 2. ]
 [6.4 2.7 5.3 1.9]
 [6.8 3.  5.5 2.1]
 [5.7 2.5 5.  2. ]
 [5.8 2.8 5.1 2.4]
 [6.4 3.2 5.3 2.3]
 [6.5 3.  5.5 1.8]
 [7.7 3.8 6.7 2.2]
 [7.7 2.6 6.9 2.3]
 [6.  2.2 5.  1.5]
 [6.9 3.2 5.7 2.3]
 [5.6 2.8 4.9 2. ]
 [7.7 2.8 6.7 2. ]
 [6.3 2.7 4.9 1.8]
 [6.7 3.3 5.7 2.1]
 [7.2 3.2 6.  1.8]
 [6.2 2.8 4.8 1.8]
 [6.1 3.  4.9 1.8]
 [6.4 2.8 5.6 2.1]
 [7.2 3.  5.8 1.6]
 [7.4 2.8 6.1 1.9]
 [7.9 3.8 6.4 2. ]
 [6.4 2.8 5.6 2.2]
 [6.3 2.8 5.1 1.5]
 [6.1 2.6 5.6 1.4]
 [7.7 3.  6.1 2.3]
 [6.3 3.4 5.6 2.4]
 [6.4 3.1 5.5 1.8]
 [6.  3.  4.8 1.8]
 [6.9 3.1 5.4 2.1]
 [6.7 3.1 5.6 2.4]
 [6.9 3.1 5.1 2.3]
 [5.8 2.7 5.1 1.9]
 [6.8 3.2 5.9 2.3]
 [6.7 3.3 5.7 2.5]
 [6.7 3.  5.2 2.3]
 [6.3 2.5 5.  1.9]
 [6.5 3.  5.2 2. ]
 [6.2 3.4 5.4 2.3]
 [5.9 3.  5.1 1.8]]
In [2]:
iris.data.shape
Out[2]:
(150, 4)

14.15 A look at the data for the k-NN algorithm

In [3]:
print(iris.feature_names) # print the names of the four features
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
In [4]:
# The integers denote the species: 0 = setosa, 1 = versicolor, 2 = virginica
print(iris.target)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
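
The targets are perfectly balanced; counting the labels (an optional check, with numpy imported as np) shows 50 samples of each species:

import numpy as np
print(np.bincount(iris.target))  # [50 50 50]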

14.16 Applying the k-NN algorithm

In [5]:
# Split the data (80:20).
from sklearn.datasets import load_iris 
from sklearn.model_selection import train_test_split 

iris = load_iris() 
X_train,X_test,y_train,y_test = train_test_split(iris.data, iris.target,test_size=0.2)
In [6]:
from sklearn.neighbors import KNeighborsClassifier 
from sklearn import metrics 

num_neigh = 1
knn = KNeighborsClassifier(n_neighbors = num_neigh) 
knn.fit(X_train, y_train) 
y_pred = knn.predict(X_test) 
scores = metrics.accuracy_score(y_test, y_pred) 
print('Accuracy for n_neighbors = {0:d}: {1:.3f}'.format(num_neigh, scores))
Accuracy for n_neighbors = 1: 0.900
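
The accuracy depends on the choice of k; a short sketch that retrains over several candidate values (the list of k values is an arbitrary choice) makes the comparison easy:

for k in [1, 3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    acc = metrics.accuracy_score(y_test, knn.predict(X_test))
    print('Accuracy for n_neighbors = {0:d}: {1:.3f}'.format(k, acc))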

14.17 Classifying a new flower with the trained model

In [7]:
from sklearn.datasets import load_iris 
from sklearn.model_selection import train_test_split 
from sklearn.neighbors import KNeighborsClassifier 
 
iris = load_iris() 
knn = KNeighborsClassifier(n_neighbors=6) 
knn.fit(iris.data, iris.target) 
Out[7]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=6, p=2,
                     weights='uniform')
In [8]:
classes = {0:'setosa', 1:'versicolor', 2:'virginica'} 
 
# Present new data that the model has never seen.
X = [[3, 4, 5, 2],
     [5, 4, 2, 2]]
y = knn.predict(X) 
 
print(classes[y[0]]) 
print(classes[y[1]]) 
versicolor
setosa

14.19 Loading the Boston housing data and checking for missing values

In [9]:
import numpy as np 
import matplotlib.pyplot as plt 
import pandas as pd 
import seaborn as sns    # use the Seaborn library for visualization
 
from sklearn.datasets import load_boston 
boston = load_boston() 
 
df = pd.DataFrame(boston.data, columns=boston.feature_names) 
print(df.head())
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  \
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0   
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0   
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0   
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0   
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0   

   PTRATIO       B  LSTAT  
0     15.3  396.90   4.98  
1     17.8  396.90   9.14  
2     17.8  392.83   4.03  
3     18.7  394.63   2.94  
4     18.7  396.90   5.33  
In [10]:
df['MEDV'] = boston.target 
In [11]:
print( df.isnull().sum() )
CRIM       0
ZN         0
INDUS      0
CHAS       0
NOX        0
RM         0
AGE        0
DIS        0
RAD        0
TAX        0
PTRATIO    0
B          0
LSTAT      0
MEDV       0
dtype: int64
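
With no missing values to handle, a quick look at the target's distribution is a reasonable next step; an optional pandas check:

print(df['MEDV'].describe())  # count, mean, std, and quartiles of the median home value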

14.20 Examining the correlations between the features

In [12]:
sns.set(rc={'figure.figsize':(12,10)}) 
correlation_matrix = df.corr().round(2) 
sns.heatmap(data=correlation_matrix, annot=True) 
plt.show()

14.22 Which features are correlated with one another?

In [13]:
sns.pairplot(df[["MEDV", "RM", "AGE", "CHAS", "B"]])
plt.show()

14.21 Building a simple regression model

In [14]:
X = df[['LSTAT', 'RM']] 
y = df['MEDV']
In [15]:
from sklearn.model_selection import train_test_split 

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
In [16]:
from sklearn.linear_model import LinearRegression 
from sklearn.metrics import mean_squared_error 

lin_model = LinearRegression() 
lin_model.fit(X_train, y_train)
Out[16]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In [17]:
y_test_predict = lin_model.predict(X_test) 
rmse = np.sqrt(mean_squared_error(y_test, y_test_predict))
print('RMSE =', rmse)
RMSE = 5.350860024889858
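
RMSE is expressed in the units of MEDV (thousands of dollars); the unitless R-squared on the same test split, via the estimator's score method, complements it:

print('R^2 =', lin_model.score(X_test, y_test))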