Saengneung Publishing, "Data Science Python" (working title), code for Chapter 10

10.1 What is NumPy?

In [1]:
scores = [10, 20, 30, 40, 50, 60]

10.2 Differences between lists and NumPy arrays

In [2]:
mid_scores = [10, 20, 30]    # Python list object
final_scores = [70, 80, 90]  # Python list object
In [3]:
total = mid_scores + final_scores    # concatenates the lists instead of adding element-wise
total
Out[3]:
[10, 20, 30, 70, 80, 90]

10.3 NumPy's alias and array operations

In [4]:
import numpy as np   # from now on, np is the alias for numpy
In [5]:
mid_scores  = np.array([10, 20, 30])
final_scores = np.array([60, 70, 80])
In [6]:
total = mid_scores + final_scores
print('Sum of exam scores :', total)        # element-wise sums are shown
print('Average of exam scores :', total/2)  # every element is divided by 2
Sum of exam scores : [ 70  90 110]
Average of exam scores : [35. 45. 55.]

10.4 The multidimensional array (ndarray), the core of NumPy

In [7]:
import numpy as np
a = np.array([1, 2, 3])       # create a NumPy ndarray object
a.shape      # shape of the object a
Out[7]:
(3,)
In [8]:
a.ndim
Out[8]:
1
In [9]:
a.dtype
Out[9]:
dtype('int32')
In [10]:
a.itemsize
Out[10]:
4
In [11]:
a.size
Out[11]:
3

LAB 10-1 Creating ndarray objects and inspecting their attributes

In [12]:
import numpy as np

# Exercise 1
array_a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print('Exercise 1 : array_a =', array_a)

# Exercise 2
array_b = np.array(range(10))
print('Exercise 2 : array_b =', array_b)

# Exercise 3
array_c = np.array(range(0,10,2))
print('Exercise 3 : array_c =', array_c)

# Exercise 4
print('Exercise 4: ')
print('shape of array_c :', array_c.shape)
print('ndim of array_c :', array_c.ndim)
print('dtype of array_c :', array_c.dtype)
print('size of array_c :', array_c.size)
print('itemsize of array_c :',array_c.itemsize)
Exercise 1 : array_a = [0 1 2 3 4 5 6 7 8 9]
Exercise 2 : array_b = [0 1 2 3 4 5 6 7 8 9]
Exercise 3 : array_c = [0 2 4 6 8]
Exercise 4: 
shape of array_c : (5,)
ndim of array_c : 1
dtype of array_c : int32
size of array_c : 5
itemsize of array_c : 4

10.5 NumPy array operations

In [13]:
import numpy as np
salary = np.array([220, 250, 230])
In [14]:
salary = salary + 100
print(salary)
[320 350 330]
In [15]:
salary = np.array([220, 250, 230])
salary = salary * 2
print(salary)
[440 500 460]
In [16]:
salary = np.array([220, 250, 230]) 
salary = salary * 2.1 
print(salary)
[462. 525. 483.]

Note

There is a reason NumPy can compute easily and quickly: it assumes that each array has exactly one type. In other words, a NumPy array can store only data of a single type, such as all integers or all floating-point numbers. Unlike a Python list, it cannot hold a mix of types. If you pass values of several types to a NumPy array, NumPy converts them all to one common type, and when a string is among them every element becomes a string. For example, the following array becomes a string array.

In [17]:
tangled = np.array([ 100, 'test', 3.0, False])
print(tangled)
['100' 'test' '3.0' 'False']
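
The conversion described in the note can be checked by looking at each array's dtype. The following is a minimal sketch; the variable names are illustrative and not from the book, and the exact dtype names (for example int32 versus int64) can differ by platform, so only the dtype kind is printed.

```python
import numpy as np

ints = np.array([1, 2, 3])                 # all integers -> an integer dtype
mixed_num = np.array([1, 2.5, 3])          # int + float  -> promoted to float
mixed_str = np.array([100, 'test', 3.0])   # a string is present -> all strings

print(ints.dtype.kind)       # 'i' (signed integer)
print(mixed_num.dtype.kind)  # 'f' (floating point)
print(mixed_str.dtype.kind)  # 'U' (Unicode string)
```

Mixing only integers and floats therefore gives a float array; the string conversion happens when at least one element is a string.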

LAB 10-2 Computing BMI

In [18]:
import numpy as np

heights = [ 1.83, 1.76, 1.69, 1.86, 1.77, 1.73 ]
weights = [ 86,    74,    59,   95,    80,   68  ]

np_heights = np.array(heights)
np_weights = np.array(weights)

bmi = np_weights/(np_heights**2)
print(bmi)
[25.68007405 23.88946281 20.65754    27.45982194 25.53544639 22.72043837]

10.6 Indexing and slicing

In [19]:
import numpy as np
scores = np.array([88, 72, 93, 94, 89, 78, 99]) 
In [20]:
scores[2]
Out[20]:
93
In [21]:
scores[-1]
Out[21]:
99
In [22]:
scores[1:4]     # slices the items at indices 1, 2, and 3 (index 4 is excluded)
Out[22]:
array([72, 93, 94])
In [23]:
scores[3:]      # omitting the end index slices through the last element
Out[23]:
array([94, 89, 78, 99])
In [24]:
scores[4:-1]      # using -1 as the end index excludes the last element
Out[24]:
array([89, 78])

10.7 Logical indexing

In [25]:
ages = np.array([18, 19, 25, 30, 28])
In [26]:
y = ages > 20
y
Out[26]:
array([False, False,  True,  True,  True])
In [27]:
ages[ ages > 20 ]
Out[27]:
array([25, 30, 28])

10.8 Two-dimensional arrays

In [28]:
import numpy as np 
y = [[1,2,3], [4,5,6], [7,8,9]] 
y 
Out[28]:
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [29]:
np_array = np.array(y) 
np_array
Out[29]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [30]:
np_array[0][2]
Out[30]:
3

10.9 NumPy-style array indexing

In [31]:
np_array = np.array([[1,2,3], [4,5,6], [7,8,9]]) 
np_array[0, 2]
Out[31]:
3
In [32]:
np_array[0, 0]
Out[32]:
1
In [33]:
np_array[2, -1]
Out[33]:
9
In [34]:
np_array[0, 0] = 12   # change the first element of the ndarray
np_array
Out[34]:
array([[12,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9]])
In [35]:
np_array[2, 2] = 1.234  # assigning a float to an integer array fails to keep the fraction: 1 is stored
np_array
Out[35]:
array([[12,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  1]])
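
The float assignment above loses its fractional part because the array's dtype is integer. One way to keep the fraction, sketched here with an illustrative name `float_array` that is not from the book, is to convert the array to a floating-point dtype with astype() before assigning.

```python
import numpy as np

np_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

np_array[2, 2] = 1.234                 # integer array: the fraction is dropped
print(np_array[2, 2])                  # 1

float_array = np_array.astype(float)   # a copy with a floating-point dtype
float_array[2, 2] = 1.234
print(float_array[2, 2])               # 1.234
```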

10.10 NumPy-style slicing of 2D arrays

In [36]:
np_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) 
np_array[0:2, 1:3]
Out[36]:
array([[2, 3],
       [5, 6]])
In [37]:
np_array[0]
Out[37]:
array([1, 2, 3])
In [38]:
np_array[1, 1:3]
Out[38]:
array([5, 6])
In [39]:
np_array = np.array([[ 1,  2,  3,  4], 
                     [ 5,  6,  7,  8], 
                     [ 9, 10, 11, 12], 
                     [13, 14, 15, 16]]) 
print(np_array[::2][::2]) # first slice: selects rows 0 and 2; second slice: selects row 0 of those
print(np_array[::2,::2])  # row slice: selects rows 0 and 2; column slice: selects columns 0 and 2
[[1 2 3 4]]
[[ 1  3]
 [ 9 11]]
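
The two results differ because np_array[::2][::2] applies the second slice to the rows of the first result, while np_array[::2, ::2] slices rows and columns in one step. A short sketch that spells out the intermediate step; the names `rows`, `chained`, and `both_axes` are illustrative only.

```python
import numpy as np

np_array = np.arange(1, 17).reshape(4, 4)   # same 4x4 values as above

rows = np_array[::2]              # rows 0 and 2, shape (2, 4)
chained = rows[::2]               # slices the rows again: only row 0 remains
both_axes = np_array[::2, ::2]    # rows 0, 2 and columns 0, 2

print(chained.shape)     # (1, 4)
print(both_axes.shape)   # (2, 2)
```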

10.11 Logical indexing on 2D arrays

In [40]:
np_array = np.array([[1,2,3], [4,5,6], [7,8,9]]) 
np_array > 5
Out[40]:
array([[False, False, False],
       [False, False,  True],
       [ True,  True,  True]])
In [41]:
np_array[ np_array > 5 ]
Out[41]:
array([6, 7, 8, 9])
In [42]:
np_array[:, 2]
Out[42]:
array([3, 6, 9])
In [43]:
np_array[:, 2] > 5
Out[43]:
array([False,  True,  True])

LAB 10-3 Practice with 2D arrays

In [44]:
import numpy as np
x = np.array( [['a', 'b', 'c', 'd'],
               ['c', 'c', 'g', 'h']])

print(x [ x == 'c' ])
#print(x - y)
['c' 'c' 'c']
In [45]:
mat_a = np.array( [[10, 20, 30], [10, 20, 30]])
mat_b = np.array( [[2, 2, 2], [1, 2, 3]])

print(mat_a - mat_b)
[[ 8 18 28]
 [ 9 18 27]]

LAB 10-4 Finding the shape of a NumPy array and computing on slices

In [46]:
import numpy as np 
 
x = np.array([[ 1.83, 1.76, 1.69, 1.86, 1.77, 1.73 ], 
              [ 86.0, 74.0, 59.0, 95.0, 80.0, 68.0 ]]) 
y = x[0:2, 1:3] 
z = x[0:2][1:3]

print('x shape :', x.shape)
print('y shape :', y.shape)
print('z shape :', z.shape)
print('z values = :', z)

bmi = x[1] / x[0]**2    # BMI = weight / height**2 (row 0 holds heights, row 1 holds weights)
print('BMI data')
print(bmi)
x shape : (2, 6)
y shape : (2, 2)
z shape : (1, 6)
z values = : [[86. 74. 59. 95. 80. 68.]]
BMI data
[25.68007405 23.88946281 20.65754    27.45982194 25.53544639 22.72043837]

LAB 10-5 Extracting only the rows of a 2D array that satisfy a condition

In [47]:
import numpy as np 

players = [[170, 76.4], 
           [183, 86.2], 
           [181, 78.5], 
           [176, 80.1]] 

np_players = np.array(players) 

print('Players weighing 80 or more')
print(np_players[ np_players[:, 1] >= 80.0 ])

print('Players 180 or taller')
print(np_players[ np_players[:, 0] >= 180.0 ])
Players weighing 80 or more
[[183.   86.2]
 [176.   80.1]]
Players 180 or taller
[[183.   86.2]
 [181.   78.5]]

10.12 Comparing the arange() and range() functions

In [48]:
import numpy as np 
np.arange(5)
Out[48]:
array([0, 1, 2, 3, 4])
In [49]:
np.arange(1, 6)
Out[49]:
array([1, 2, 3, 4, 5])
In [50]:
np.arange(1, 10, 2)
Out[50]:
array([1, 3, 5, 7, 9])
In [51]:
range(5)
Out[51]:
range(0, 5)
In [52]:
range(0, 5, 2)
Out[52]:
range(0, 5, 2)
In [53]:
list(range(5))
Out[53]:
[0, 1, 2, 3, 4]
In [54]:
np.array(range(5))
Out[54]:
array([0, 1, 2, 3, 4])

10.13 The linspace() and logspace() functions

In [55]:
np.linspace(0, 10, 100)
Out[55]:
array([ 0.        ,  0.1010101 ,  0.2020202 ,  0.3030303 ,  0.4040404 ,
        0.50505051,  0.60606061,  0.70707071,  0.80808081,  0.90909091,
        1.01010101,  1.11111111,  1.21212121,  1.31313131,  1.41414141,
        1.51515152,  1.61616162,  1.71717172,  1.81818182,  1.91919192,
        2.02020202,  2.12121212,  2.22222222,  2.32323232,  2.42424242,
        2.52525253,  2.62626263,  2.72727273,  2.82828283,  2.92929293,
        3.03030303,  3.13131313,  3.23232323,  3.33333333,  3.43434343,
        3.53535354,  3.63636364,  3.73737374,  3.83838384,  3.93939394,
        4.04040404,  4.14141414,  4.24242424,  4.34343434,  4.44444444,
        4.54545455,  4.64646465,  4.74747475,  4.84848485,  4.94949495,
        5.05050505,  5.15151515,  5.25252525,  5.35353535,  5.45454545,
        5.55555556,  5.65656566,  5.75757576,  5.85858586,  5.95959596,
        6.06060606,  6.16161616,  6.26262626,  6.36363636,  6.46464646,
        6.56565657,  6.66666667,  6.76767677,  6.86868687,  6.96969697,
        7.07070707,  7.17171717,  7.27272727,  7.37373737,  7.47474747,
        7.57575758,  7.67676768,  7.77777778,  7.87878788,  7.97979798,
        8.08080808,  8.18181818,  8.28282828,  8.38383838,  8.48484848,
        8.58585859,  8.68686869,  8.78787879,  8.88888889,  8.98989899,
        9.09090909,  9.19191919,  9.29292929,  9.39393939,  9.49494949,
        9.5959596 ,  9.6969697 ,  9.7979798 ,  9.8989899 , 10.        ])
In [56]:
np.logspace(0, 5, 10)
Out[56]:
array([1.00000000e+00, 3.59381366e+00, 1.29154967e+01, 4.64158883e+01,
       1.66810054e+02, 5.99484250e+02, 2.15443469e+03, 7.74263683e+03,
       2.78255940e+04, 1.00000000e+05])
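
logspace(0, 5, 10) places its points evenly on a base-10 logarithmic scale, so it should agree with raising 10 to the values of linspace(0, 5, 10). A minimal check of that relationship, with illustrative variable names:

```python
import numpy as np

exponents = np.linspace(0, 5, 10)    # 10 evenly spaced exponents from 0 to 5
by_hand = 10.0 ** exponents          # 10 raised to each exponent
built_in = np.logspace(0, 5, 10)     # the same points via logspace (default base 10)

print(np.allclose(by_hand, built_in))   # True
```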

10.14 The reshape() function

In [57]:
import numpy as np
y = np.arange(12) 
y
Out[57]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
In [58]:
y.reshape(3, 4)
Out[58]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [59]:
y.reshape(6, -1)
Out[59]:
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])
In [60]:
y.reshape(7, 2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-60-93a5372b0460> in <module>
----> 1 y.reshape(7, 2)

ValueError: cannot reshape array of size 12 into shape (7,2)
In [61]:
y.flatten()
Out[61]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

10.16 Generating random numbers

In [62]:
np.random.seed(100)
In [63]:
np.random.rand(5)
Out[63]:
array([0.54340494, 0.27836939, 0.42451759, 0.84477613, 0.00471886])
In [64]:
np.random.rand(5, 3)
Out[64]:
array([[0.12156912, 0.67074908, 0.82585276],
       [0.13670659, 0.57509333, 0.89132195],
       [0.20920212, 0.18532822, 0.10837689],
       [0.21969749, 0.97862378, 0.81168315],
       [0.17194101, 0.81622475, 0.27407375]])
In [65]:
a = 10
b = 20 
(b - a) * np.random.rand(5) + a
Out[65]:
array([14.31704184, 19.4002982 , 18.17649379, 13.3611195 , 11.75410454])
In [66]:
np.random.randint(1, 7, size=10)
Out[66]:
array([4, 1, 3, 2, 2, 4, 3, 6, 4, 1])
In [67]:
np.random.randint(1, 11, size=(4, 7))
Out[67]:
array([[ 2,  1,  8,  7,  3,  1,  9],
       [ 3,  6,  2,  9,  2,  6,  5],
       [ 3,  9,  4,  6,  1, 10,  4],
       [ 7,  4,  5,  8,  7,  4, 10]])

10.17 Generating normally distributed random numbers

In [68]:
np.random.randn(5)
Out[68]:
array([-1.02933685, -0.51099219, -2.36027053,  0.10359513,  1.73881773])
In [69]:
np.random.randn(5, 4)
Out[69]:
array([[ 1.24187584,  0.13241276,  0.57779396, -1.57590571],
       [-1.29279424, -0.65991979, -0.87400478, -0.68955061],
       [-0.53547985,  1.52795302,  0.64720579, -0.67733661],
       [-0.2650188 ,  0.74610644, -3.13078483,  0.05962178],
       [-0.87521111,  1.06487833, -0.57315265, -0.80327849]])
In [70]:
mu = 10
sigma = 2
randoms = mu + sigma * np.random.randn( 5, 4 )
randoms
Out[70]:
array([[12.18594325, 11.30255516, 14.32104958,  8.72173986],
       [ 9.33262494,  9.12479628,  6.18841024,  7.54196134],
       [11.58979772,  7.67898372, 11.09211104, 12.32651667],
       [11.31775404, 11.04737852, 12.65431215, 12.22504894],
       [ 7.85074079, 10.68683233, 11.97087508, 11.47300336]])
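
The expression mu + sigma * np.random.randn(...) shifts and scales standard normal samples. As a rough sketch, the same distribution can also be requested directly with np.random.normal(); the seed value and the sample size below are arbitrary choices for illustration, not taken from the book.

```python
import numpy as np

np.random.seed(0)     # arbitrary seed, only so the run is repeatable
mu, sigma = 10, 2

scaled = mu + sigma * np.random.randn(100000)        # shift and scale N(0, 1)
direct = np.random.normal(mu, sigma, size=100000)    # ask for N(mu, sigma) directly

# Both samples should have a mean close to mu and a standard deviation close to sigma.
print(scaled.mean(), scaled.std())
print(direct.mean(), direct.std())
```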

10.18 Computing the mean and median

In [71]:
m = 175 
sigma = 10 
heights = m+sigma*np.random.randn(10000)
In [72]:
np.mean(heights)
Out[72]:
174.9972570108985
In [73]:
np.median(heights)
Out[73]:
174.90487455543172
In [74]:
array_data = np.array([ 3, 7, 1, 2, 21])
np.mean(array_data)
Out[74]:
6.8
In [75]:
np.median(array_data)
Out[75]:
3.0

LAB 10-6 Computing the mean and median

In [76]:
import numpy as np 
 
players = np.zeros( (100, 3) ) 
players[:, 0] = 10 * np.random.randn(100) + 175 
players[:, 1] = 10 * np.random.randn(100) + 70
players[:, 2] = np.floor(10 * np.random.randn(100)) + 22

heights = players[:, 0] 
print('Mean height:', np.mean(heights))
print('Median height:', np.median(heights))

weights = players[:, 1] 
print('Mean weight:', np.mean(weights))
print('Median weight:', np.median(weights))

ages = players[:, 2] 
print('Mean age:', np.mean(ages))
print('Median age:', np.median(ages))
Mean height: 172.7170127299261
Median height: 171.48115582605044
Mean weight: 69.57509276978816
Median weight: 70.55243586196354
Mean age: 22.29
Median age: 23.0

10.19 Computing correlation

In [77]:
import numpy as np 

x = [ i for i in range(100) ]
y = [ i ** 2 for i in range(100) ]

result = np.corrcoef(x, y)
print(result)
[[1.         0.96764439]
 [0.96764439 1.        ]]
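
The off-diagonal entry of the matrix is the Pearson correlation coefficient between x and y. As a sketch of what np.corrcoef() computes for two variables, the coefficient can be reproduced from its definition; the names `dx`, `dy`, and `r` are illustrative.

```python
import numpy as np

x = np.arange(100)
y = x ** 2

# Pearson r = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2)),
# where dx and dy are the deviations from each mean.
dx = x - x.mean()
dy = y - y.mean()
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(r)                         # computed from the definition
print(np.corrcoef(x, y)[0, 1])   # value reported by NumPy
```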

10.20 Computing correlations among multiple variables

In [78]:
x = [ i for i in range(100) ]
y = [ i ** 2 for i in range(100) ]
z = [ 100 * np.sin(3.14*i/100) for i in range(0, 100) ]
In [79]:
result = np.corrcoef( [x, y, z] )
print(result)
[[ 1.          0.96764439  0.03763255]
 [ 0.96764439  1.         -0.21532645]
 [ 0.03763255 -0.21532645  1.        ]]
In [80]:
a = np.arange(0, 24).reshape(4, 3, 2)
print(a)
[[[ 0  1]
  [ 2  3]
  [ 4  5]]

 [[ 6  7]
  [ 8  9]
  [10 11]]

 [[12 13]
  [14 15]
  [16 17]]

 [[18 19]
  [20 21]
  [22 23]]]
In [81]:
print(a.flatten())
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]