Implementing gradient descent in Python

zhoufanking 2013-05-14 12:22:40
I've just started learning machine learning and wanted to implement the algorithms from the lecture slides in Python myself to deepen my understanding. Below is my code for estimating linear-regression parameters with stochastic gradient descent, but the result it produces is wrong. Please help me take a look!
import sys

#Training data set
#each element in x represents (x0,x1,x2)
x = [(1,2104,3) , (1,1600,3) ,(1,2400,3), (1,1416,2) , (1,3000,4)]
#y[i] is the output of y = theta0 * x[0] + theta1 * x[1] +theta2 * x[2]
y = [400,330,369,232,540]


epsilon = 0.00000001
#learning rate
alpha = 0.0001
diff = [0,0]
max_itor = 1000
error1 = 0
error0 = 0
cnt = 0
m = len(x)


#init the parameters to zero
theta0 = 0
theta1 = 0
theta2 = 0

while cnt < max_itor:

	cnt = cnt + 1

	#calculate the parameters
	for i in range(m):

		diff[0] = y[i] - (theta0 + theta1 * x[i][1] + theta2 * x[i][2])

		theta0 = theta0 + alpha * diff[0] * x[i][0]
		theta1 = theta1 + alpha * diff[0] * x[i][1]
		theta2 = theta2 + alpha * diff[0] * x[i][2]

	#calculate the cost function
	for lp in range(len(x)):
		error1 += (y[i] - (theta0 + theta1 * x[i][1] + theta2 * x[i][2]))**2 / 2

	if abs(error1 - error0) < epsilon:
		break
	else:
		error0 = error1

	print ' theta0 : %f, theta1 : %f, theta2 : %f, error1 : %f' % (theta0, theta1, theta2, error1)

print ' theta0 : %f, theta1 : %f, theta2 : %f' % (theta0, theta1, theta2)


I've tried changing the initial values of alpha and the thetas, but the result is still wrong, so it's probably not an initialization problem. I just can't see what's wrong in the code.
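As a side note (my own sanity check, not part of the original post): the same least-squares problem has a closed-form solution, and any correct gradient-descent run on this training set should approach it. This sketch assumes NumPy is available:

```python
import numpy as np

# Hypothetical sanity check: the closed-form least-squares solution tells us
# what a correct gradient-descent implementation should converge toward.
X = np.array([(1, 2104, 3), (1, 1600, 3), (1, 2400, 3),
              (1, 1416, 2), (1, 3000, 4)], dtype=float)
y = np.array([400, 330, 369, 232, 540], dtype=float)

# solve min ||X theta - y||^2 exactly
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

def cost(t):
    """Half the sum of squared residuals, matching the (1/2)*error in the post."""
    r = X @ t - y
    return 0.5 * float(r @ r)

print(theta, cost(theta))
```

Comparing the cost at the gradient-descent result against `cost(theta)` immediately shows whether the iteration is still far from the optimum or has diverged.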
6 replies
Linvo 2013-08-05
I have the same question: shouldn't the lp on line 41 be i? Or else change that loop to use j throughout.
zhoufanking 2013-05-23
A final follow-up: with learning-rate = 0.0000001 and variance = 0.0001, data2 converges.
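As a check on this claim (my own re-run sketch, not code from the thread), batch gradient descent on data2 with learning_rate = 1e-7 and variance = 1e-4 does indeed terminate, since 1e-7 times the mean of x² (about 4.75 million) gives an effective step of roughly 0.48, inside the stable range. Note, though, that the intercept barely moves from its initial value, because without feature scaling the problem is badly conditioned:

```python
# Batch gradient descent on data2 with the settings from the follow-up post
# (learning_rate = 1e-7, variance = 1e-4). Hypothetical re-run, not thread code.
data2 = [(2104., 400.), (1600., 330.), (2400., 369.), (1416., 232.), (3000., 540.)]
m = len(data2)
lr, variance = 1e-7, 1e-4

t0, t1 = 1.0, 1.0            # same initialization as the blog snippet below
t0_last, t1_last = 100.0, 100.0
iters = 0
while abs(t1 - t1_last) > variance or abs(t0 - t0_last) > variance:
    t0_last, t1_last = t0, t1
    grad0 = sum((t0_last + t1_last * xv) - yv for xv, yv in data2) / m
    grad1 = sum(((t0_last + t1_last * xv) - yv) * xv for xv, yv in data2) / m
    t0 = t0_last - lr * grad0
    t1 = t1_last - lr * grad1
    iters += 1

print(iters, t0, t1)         # stops quickly; t0 stays near its initial 1.0
```

The loop exits after only a handful of iterations, but because the step along the intercept direction is so tiny, t0 stays near 1.0 rather than reaching the true least-squares intercept (around 27), so "converges" here really means "the stopping criterion fires".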
zhoufanking 2013-05-15
Thanks to the 2nd-floor poster -- I can't believe I got those spots wrong and never noticed...
zhoufanking 2013-05-15
Problem solved -- here's the code:
import sys

#Training data set
#each element in x represents (x0,x1,x2)
x = [(1,0.,3) , (1,1.,3) ,(1,2.,3), (1,3.,2) , (1,4.,4)]
#y[i] is the output of y = theta0 * x[0] + theta1 * x[1] +theta2 * x[2]
y = [95.364,97.217205,75.195834,60.105519,49.342380]


epsilon = 0.0001
#learning rate
alpha = 0.01
diff = [0,0]
max_itor = 1000
error1 = 0
error0 = 0
cnt = 0
m = len(x)


#init the parameters to zero
theta0 = 0
theta1 = 0
theta2 = 0

while True:
	
	cnt = cnt + 1

	#calculate the parameters
	for i in range(m):

		diff[0] = y[i]-( theta0 + theta1 * x[i][1] + theta2 * x[i][2] )
		
		theta0 = theta0 + alpha * diff[0] * x[i][0]
		theta1 = theta1 + alpha * diff[0] * x[i][1]
		theta2 = theta2 + alpha * diff[0] * x[i][2]

	#calculate the cost function
	error1 = 0
	for lp in range(len(x)):
		error1 += ( y[i]-( theta0 + theta1 * x[i][1] + theta2 * x[i][2] ) )**2/2
	
	if abs(error1-error0) < epsilon:
		break
	else:
		error0 = error1

	print ' theta0 : %f, theta1 : %f, theta2 : %f, error1 : %f'%(theta0,theta1,theta2,error1)

print 'Done: theta0 : %f, theta1 : %f, theta2 : %f'%(theta0,theta1,theta2)
The code above uses stochastic gradient descent. It originally failed to converge because of 1) the problems pointed out in the first reply -- error1 was never reset to zero, plus the index mistake; and 2) unsuitable values for the step size alpha and the tolerance epsilon. The choice of alpha and epsilon feels critical: with my original input data I couldn't get it to converge no matter how I tuned those two parameters, but after switching to a different set of numbers (the training data in the code above) it produced a result quickly.

The code below is largely borrowed from a snippet posted on a foreign blogger's site; I've forgotten the address, but my thanks to the author! It uses batch gradient descent, and the convergence criterion is different as well:
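For what it's worth, the standard remedy for the original house-price data (a feature-scaling sketch of my own, not something from the thread) is to standardize the feature first. The scaled problem is well conditioned, a large step like alpha = 0.1 becomes stable, and gradient descent then recovers the exact least-squares line:

```python
# Feature scaling demo (hypothetical, not thread code): standardize x, run
# batch gradient descent, then map the parameters back to the original units.
data = [(2104., 400.), (1600., 330.), (2400., 369.), (1416., 232.), (3000., 540.)]
xs = [p[0] for p in data]
ys = [p[1] for p in data]
m = len(data)

mean_x = sum(xs) / m
std_x = (sum((v - mean_x) ** 2 for v in xs) / m) ** 0.5
zs = [(v - mean_x) / std_x for v in xs]  # standardized feature: mean 0, variance 1

theta0, theta1 = 0.0, 0.0
alpha = 0.1  # safe now: the curvature of the scaled problem is ~1 in every direction
for _ in range(500):
    grad0 = sum((theta0 + theta1 * z) - yv for z, yv in zip(zs, ys)) / m
    grad1 = sum(((theta0 + theta1 * z) - yv) * z for z, yv in zip(zs, ys)) / m
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

slope = theta1 / std_x                        # back to original units
intercept = theta0 - theta1 * mean_x / std_x
print(slope, intercept)
```

On this data the recovered line is roughly y = 0.165·x + 26.8, matching the closed-form least-squares fit, which is why the same data set that refused to converge unscaled works fine after scaling.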
import sys

#Training data set
data1 = [(0.000000,95.364693) ,
        (1.000000,97.217205) ,
        (2.000000,75.195834),
        (3.000000,60.105519) ,
        (4.000000,49.342380),
        (5.000000,37.400286),
        (6.000000,51.057128),
        (7.000000,25.500619),
        (8.000000,5.259608),
        (9.000000,0.639151),
        (10.000000,-9.409936),
        (11.000000, -4.383926),
        (12.000000,-22.858197),
        (13.000000,-37.758333),
        (14.000000,-45.606221)]

data2 = [(2104.,400.),
         (1600.,330.),
         (2400.,369.),
         (1416.,232.),
         (3000.,540.)]

def create_hypothesis(theta1, theta0):
    return lambda x: theta1*x + theta0

def linear_regression(data, learning_rate=0.001, variance=0.00001):
    """ Takes a set of data points in the form: [(1,1), (2,2), ...] and outputs (slope, y0). """

   
    #init the parameters to zero
    theta0_guess = 1.
    theta1_guess = 1.


    theta0_last = 100.
    theta1_last = 100.
    
    m = len(data)

    while (abs(theta1_guess-theta1_last) > variance or abs(theta0_guess - theta0_last) > variance):

        theta1_last = theta1_guess
        theta0_last = theta0_guess

        hypothesis = create_hypothesis(theta1_guess, theta0_guess)

        theta0_guess = theta0_guess - learning_rate * (1./m) * sum([hypothesis(point[0]) - point[1] for point in data])
        theta1_guess = theta1_guess - learning_rate * (1./m) * sum([ (hypothesis(point[0]) - point[1]) * point[0] for point in data])   

    return ( theta0_guess,theta1_guess )



points = [(float(x),float(y)) for (x,y) in data1]

res = linear_regression(points)
print res
Likewise, when data2 is used as the training set, no usable theta comes out.
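That failure is predictable (a numerical sketch of my own, not thread code): the theta1 gradient on data2 is scaled by x² ≈ 4 × 10⁶, so the effective step learning_rate · mean(x²) ≈ 4750 is far beyond the stable limit of 2, and theta1 blows up within a few iterations:

```python
# Divergence demo on the data2 house prices at learning_rate = 0.001.
data2 = [(2104., 400.), (1600., 330.), (2400., 369.), (1416., 232.), (3000., 540.)]
m = len(data2)
lr = 0.001
t0, t1 = 1.0, 1.0
history = []
for _ in range(5):
    grad0 = sum((t0 + t1 * x) - y for x, y in data2) / m
    grad1 = sum(((t0 + t1 * x) - y) * x for x, y in data2) / m
    t0, t1 = t0 - lr * grad0, t1 - lr * grad1
    history.append(abs(t1))
print(history)  # |theta1| grows by a factor of thousands each iteration
```

This is the same lesson as the follow-up post's learning_rate = 1e-7: without feature scaling, the step size has to shrink with the square of the feature magnitude.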
hypnoz 2013-05-15
I just use matlab for this kind of thing..... matrix math is fast. With our scripting languages it feels like you need to pull in a matrix package to get comparable speed.
angel_su 2013-05-14
I haven't studied this, but logically shouldn't you divide to back the values out? Also: shouldn't error1 be reset to zero before recomputing the error, and shouldn't the index lp be changed to i...

	theta0 = theta0 + alpha * diff[0] / x[i][0]
	theta1 = theta1 + alpha * diff[0] / x[i][1]
	theta2 = theta2 + alpha * diff[0] / x[i][2]

	error1 = 0
	for i in range(len(x)):
		error1 += ( y[i]-( theta0 + theta1 * x[i][1] + theta2 * x[i][2] ) )**2/2
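The error1 reset matters more than it looks (a toy illustration of mine with made-up numbers, not thread code): without the reset, error1 accumulates across epochs, so abs(error1 - error0) is just the cost added in the current epoch, and the loop only stops if the current cost itself drops below epsilon -- which never happens when the data can't be fit exactly:

```python
# Toy illustration: made-up per-epoch costs of a fit that is converging.
costs = [10.0, 4.0, 2.5, 2.4, 2.39]

acc, prev = 0.0, 0.0
diffs_no_reset = []
for c in costs:
    acc += c                          # error1 never reset: keeps accumulating
    diffs_no_reset.append(abs(acc - prev))
    prev = acc

diffs_with_reset = [abs(costs[k] - costs[k - 1]) for k in range(1, len(costs))]
print(diffs_no_reset)    # equals the raw per-epoch costs, never gets small
print(diffs_with_reset)  # shrinks toward zero as the fit stabilizes
```

With the reset, the stopping test compares successive epoch costs and correctly detects that the iteration has stabilized even though the residual cost stays around 2.4.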
