Machine Learning Lab 5 / Neural Networks

Lab 5: Neural Networks

The code is open source: https://github.com/LinXiaoDe/MachineLearning/tree/master/lab5

References

https://blog.csdn.net/tangyuanzong/article/details/78922874
https://blog.csdn.net/loveliuzz/article/details/78982928
https://blog.csdn.net/fanxin_i/article/details/80212906
Problem Description

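Highway transport volume consists mainly of highway passenger volume and highway freight volume. Studies indicate that a region's highway transport volume depends mainly on its population, its number of motor vehicles, and its road area. The table below gives highway transport data for one such region. According to the relevant authorities, the region's population in 2010 and 2011 is 73.39 and 75.55 (in units of 10,000 people), the number of motor vehicles is 3.9635 and 4.0975 (in units of 10,000 vehicles), and the road area is 0.9880 and 1.0268 (in units of 10,000 km²), respectively.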

Requirements

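(1) Use a BP neural network to predict the region's highway passenger volume and highway freight volume for 2010 and 2011.
(2) Use other methods to predict the region's highway passenger volume and highway freight volume for 2010 and 2011, and compare the strengths and weaknesses of the neural network and the other methods.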
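
To make requirement (1) concrete, the following is a minimal sketch of a BP (backpropagation) network applied to this problem. It is not the group's implementation. The architecture (one hidden layer of 8 sigmoid units, linear outputs), the min-max scaling, the learning rate, the epoch count, and the helper names `train_bp` and `predict` are all assumptions chosen for illustration.

```python
import numpy as np

# Rows 1991-2009 from the dataset table in this post.
# Columns: population, motor vehicles, road area, passenger volume, freight volume.
DATA = np.array([
    [22.44, 0.75, 0.11, 6217, 1379],
    [25.37, 0.85, 0.11, 7730, 1385],
    [27.13, 0.90, 0.14, 9145, 1399],
    [29.45, 1.05, 0.20, 10460, 1663],
    [30.10, 1.35, 0.23, 11387, 1714],
    [30.96, 1.45, 0.23, 12353, 1834],
    [34.06, 1.60, 0.32, 15750, 4322],
    [36.42, 1.70, 0.32, 18304, 8132],
    [38.09, 1.85, 0.34, 19836, 8936],
    [39.13, 2.15, 0.36, 21024, 11099],
    [39.99, 2.20, 0.36, 19490, 11203],
    [41.93, 2.25, 0.38, 20433, 10524],
    [44.59, 2.35, 0.49, 22598, 11115],
    [47.30, 2.50, 0.56, 25107, 13320],
    [52.89, 2.60, 0.59, 33442, 16762],
    [55.73, 2.70, 0.59, 36836, 18673],
    [56.76, 2.85, 0.67, 40548, 20724],
    [59.17, 2.95, 0.69, 42927, 20803],
    [60.63, 3.10, 0.79, 43462, 21804],
])
X_raw, Y_raw = DATA[:, :3], DATA[:, 3:]

# Assumption: min-max scale inputs and targets to [0, 1] using the training range.
x_lo, x_hi = X_raw.min(axis=0), X_raw.max(axis=0)
y_lo, y_hi = Y_raw.min(axis=0), Y_raw.max(axis=0)
X = (X_raw - x_lo) / (x_hi - x_lo)
Y = (Y_raw - y_lo) / (y_hi - y_lo)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, Y, hidden=8, lr=0.1, epochs=5000, seed=0):
    """Full-batch gradient descent on mean squared error (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)            # forward pass: hidden activations
        P = H @ W2 + b2                     # forward pass: linear output layer
        E = P - Y
        losses.append(float(np.mean(E ** 2)))
        dP = 2.0 * E / E.size               # gradient of the loss w.r.t. the outputs
        dZ = (dP @ W2.T) * H * (1.0 - H)    # error propagated back through the sigmoid
        W2 -= lr * (H.T @ dP)
        b2 -= lr * dP.sum(axis=0)
        W1 -= lr * (X.T @ dZ)
        b1 -= lr * dZ.sum(axis=0)
    return (W1, b1, W2, b2), losses

def predict(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(X @ W1 + b1) @ W2 + b2

params, losses = train_bp(X, Y)

# 2010 and 2011 inputs from the problem statement, scaled with the training range.
X_new = (np.array([[73.39, 3.9635, 0.9880],
                   [75.55, 4.0975, 1.0268]]) - x_lo) / (x_hi - x_lo)
pred = predict(params, X_new) * (y_hi - y_lo) + y_lo
print(pred)   # columns: passenger volume, freight volume
```

The 2010 and 2011 inputs lie outside the 1991-2009 training range, so the network is extrapolating. Because sigmoid units saturate, the predictions can be sensitive to the hidden size, learning rate, and random seed.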

Dataset

Year	Population (10,000 people)	Motor vehicles (10,000)	Road area (10,000 km²)	Passenger volume	Freight volume
1991	22.44	0.75	0.11	6217	1379
1992	25.37	0.85	0.11	7730	1385
1993	27.13	0.9	0.14	9145	1399
1994	29.45	1.05	0.2	10460	1663
1995	30.1	1.35	0.23	11387	1714
1996	30.96	1.45	0.23	12353	1834
1997	34.06	1.6	0.32	15750	4322
1998	36.42	1.7	0.32	18304	8132
1999	38.09	1.85	0.34	19836	8936
2000	39.13	2.15	0.36	21024	11099
2001	39.99	2.2	0.36	19490	11203
2002	41.93	2.25	0.38	20433	10524
2003	44.59	2.35	0.49	22598	11115
2004	47.3	2.5	0.56	25107	13320
2005	52.89	2.6	0.59	33442	16762
2006	55.73	2.7	0.59	36836	18673
2007	56.76	2.85	0.67	40548	20724
2008	59.17	2.95	0.69	42927	20803
2009	60.63	3.1	0.79	43462	21804

Experiment Procedure

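In this lab, our group implemented the BP (backpropagation) algorithm by hand and used it to predict the region's highway passenger volume and highway freight volume for 2010 and 2011. To compare different algorithms and enrich the experiment, we also implemented multiple linear regression (MLLR) and then used sklearn's SVM to make the same predictions. We visualized the results and used them to compare the algorithms.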
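
As a point of reference for the multiple linear regression mentioned above, an ordinary least-squares fit can be sketched in a few lines. This version relies on `numpy.linalg.lstsq` rather than a hand-written solver, so it is an illustration under that assumption and not the group's MLLR code. The variable names are chosen here.

```python
import numpy as np

# Rows 1991-2009 from the dataset table in this post.
# Columns: population, motor vehicles, road area, passenger volume, freight volume.
DATA = np.array([
    [22.44, 0.75, 0.11, 6217, 1379],
    [25.37, 0.85, 0.11, 7730, 1385],
    [27.13, 0.90, 0.14, 9145, 1399],
    [29.45, 1.05, 0.20, 10460, 1663],
    [30.10, 1.35, 0.23, 11387, 1714],
    [30.96, 1.45, 0.23, 12353, 1834],
    [34.06, 1.60, 0.32, 15750, 4322],
    [36.42, 1.70, 0.32, 18304, 8132],
    [38.09, 1.85, 0.34, 19836, 8936],
    [39.13, 2.15, 0.36, 21024, 11099],
    [39.99, 2.20, 0.36, 19490, 11203],
    [41.93, 2.25, 0.38, 20433, 10524],
    [44.59, 2.35, 0.49, 22598, 11115],
    [47.30, 2.50, 0.56, 25107, 13320],
    [52.89, 2.60, 0.59, 33442, 16762],
    [55.73, 2.70, 0.59, 36836, 18673],
    [56.76, 2.85, 0.67, 40548, 20724],
    [59.17, 2.95, 0.69, 42927, 20803],
    [60.63, 3.10, 0.79, 43462, 21804],
])

# Design matrix: a leading column of ones (intercept) plus the three features.
X = np.column_stack([np.ones(len(DATA)), DATA[:, :3]])
Y = DATA[:, 3:]                              # passenger volume, freight volume

# Solve min ||X W - Y||^2 for both targets at once; W has shape (4, 2).
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# 2010 and 2011 inputs from the problem statement, with the intercept column.
X_new = np.array([[1.0, 73.39, 3.9635, 0.9880],
                  [1.0, 75.55, 4.0975, 1.0268]])
pred = X_new @ W
print(pred)   # columns: passenger volume, freight volume
```

Fitting both targets in one call works because least squares treats each column of `Y` independently. The result is the same as running two separate regressions.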

Dataset Processing

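We saved the dataset from the table above in two formats: an xls spreadsheet and a csv file. Each format is read differently: the xls file is opened and read with open_workbook, and the csv file is read with readlines. In both cases the reader returns the feature data (xdata) and two sets of labels (ydata1 and ydata2). The corresponding implementations follow.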

Reading the xls file

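# Data loading
import numpy as np
import xlrd

def read_xls_file(filename):                         # read the training data
    data = xlrd.open_workbook(filename)
    sheet1 = data.sheet_by_index(0)                  # first worksheet
    m = sheet1.nrows                                 # number of rows, header included
    # population, motor vehicles, road area, passenger volume, freight volume
    pop, veh, roa, pas, fre = [], [], [], [], []
    for i in range(1, m):                            # start at 1 to skip the header row
        row_data = sheet1.row_values(i)
        pop.append(row_data[1])
        veh.append(row_data[2])
        roa.append(row_data[3])
        pas.append(row_data[4])
        fre.append(row_data[5])
    dataMat = np.mat([pop, veh, roa])                # 3 x N: one row per feature
    labels = np.mat([pas, fre])                      # 2 x N: one row per target
    # Independent copies, so later in-place changes to dataMat/labels do not alter them
    dataMat_old = dataMat.copy()
    labels_old = labels.copy()
    # data, labels, retained original data, retained original labels
    return dataMat, labels, dataMat_old, labels_old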
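
The reader returns untouched copies alongside the working matrices, which suggests (an assumption, since this section does not say so) that the working copies are rescaled before training. A minimal sketch of min-max scaling and its inverse for the same features-in-rows layout might look like this. The names `minmax_rows` and `restore_rows` are hypothetical.

```python
import numpy as np

def minmax_rows(mat):
    """Scale each row of a (features x samples) array to [0, 1]."""
    arr = np.asarray(mat, dtype=float)       # also accepts np.matrix input
    lo = arr.min(axis=1, keepdims=True)
    hi = arr.max(axis=1, keepdims=True)
    return (arr - lo) / (hi - lo), lo, hi    # note: a constant row would divide by zero

def restore_rows(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return scaled * (hi - lo) + lo

# First three years (1991-1993) of the table, laid out like dataMat.
sample = np.array([[22.44, 25.37, 27.13],    # population
                   [0.75, 0.85, 0.90],       # motor vehicles
                   [0.11, 0.11, 0.14]])      # road area
scaled, lo, hi = minmax_rows(sample)
print(scaled)
print(restore_rows(scaled, lo, hi))
```

Keeping `lo` and `hi` from the training data matters because the 2010 and 2011 inputs, and the network's outputs, have to be mapped with the same range.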

Reading the csv file

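from numpy import zeros, ones

def loadData():                                 # file-reading function
    with open('./data/train.csv') as f:         # open the file; closed automatically
        data = f.readlines()
    print(data)                                 # debug output: the raw lines
    l = len(data)
    mat = zeros((l, 6))                         # l x 6 matrix of zeros
    index = 0
    xdata = ones((l, 4))                        # l x 4 matrix of ones; column 0 stays all ones (usable as an intercept)
    for line in data:
        line = line.strip()                     # strip surrounding whitespace and the newline
        linedata = line.split(',')              # split the line into fields
        mat[index, :] = linedata[0:6]           # store one row of data
        index += 1
    yearData   = mat[:, 0]                      # year
    xdata[:, 1] = mat[:, 1]                     # column 1: population
    xdata[:, 2] = mat[:, 2]                     # column 2: number of motor vehicles
    xdata[:, 3] = mat[:, 3]                     # column 3: road area
    ydata1 = mat[:, 4]                          # column 4: passenger volume
    ydata2 = mat[:, 5]                          # column 5: freight volume
    return yearData, xdata, ydata1, ydata2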
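
`loadData` assumes that every line of train.csv is numeric. Whether the file carries a header row is not stated here. As a hedged alternative, the sketch below uses the standard-library `csv` module and skips blank lines and non-numeric rows. The function name `load_rows` and the choice to return plain lists are assumptions for illustration.

```python
import csv

def load_rows(path):
    """Return (years, features, passenger, freight) as plain Python lists."""
    years, features, passenger, freight = [], [], [], []
    with open(path, newline='') as f:
        for row in csv.reader(f):
            if len(row) < 6:
                continue                          # blank or short line
            try:
                values = [float(v) for v in row[:6]]
            except ValueError:
                continue                          # header or other non-numeric row
            years.append(int(values[0]))
            features.append([1.0] + values[1:4])  # leading 1.0 mirrors xdata's column of ones
            passenger.append(values[4])
            freight.append(values[5])
    return years, features, passenger, freight
```

Calling `load_rows('./data/train.csv')` returns the same four pieces as `loadData`. Wrapping each result in `numpy.array(...)` gives arrays with the same layout.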