I have this data:
1975,a,b
1976,b,c
1977,b,a
1977,a,b
1978,c,d
1979,e,f
1980,a,f
I want a two-column list of year and item, like this:
1975,a
1975,b
...
I have this code:
import pandas
# Set column names
colnames=['Date','Item1','Item2']
# read csv adding column names
data = pandas.read_csv('/Users/Simon/Dropbox/Work/Datasets/lagtest.csv', names=colnames)
# create a dataframe with info on dates for first column
datelist1 = data[['Date', 'Item1']]
# create a dataframe with info on dates for second column
datelist2 = data[['Date', 'Item2']]
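# DataFrame.append was removed in pandas 2.0; pandas.concat stacks the frames the same way
bigdatelist = pandas.concat([datelist1, datelist2])
print(bigdatelist)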
But it gives me this:
   Date Item1 Item2
0  1975     a   NaN
1  1976     b   NaN
2  1977     b   NaN
3  1977     a   NaN
4  1978     c   NaN
5  1979     e   NaN
6  1980     a   NaN
0  1975   NaN     b
1  1976   NaN     c
2  1977   NaN     a
3  1977   NaN     b
4  1978   NaN     d
5  1979   NaN     f
6  1980   NaN     f
I'd like the row numbers to be consecutive and the last two columns merged into one. Any suggestions?
Best answer: you are looking for pd.melt.
Suppose you have this as a DataFrame:
>>> df
   Date item1 item2
0  1975     a     b
1  1976     b     c
2  1977     b     a
3  1977     a     b
4  1978     c     d
5  1979     e     f
6  1980     a     f
[7 rows x 3 columns]
Now use this:
pd.melt(df, id_vars='Date')[['Date', 'value']]
to get what you need.
>>> pd.melt(df, id_vars='Date')[['Date','value']]
    Date value
0   1975     a
1   1976     b
2   1977     b
3   1977     a
4   1978     c
5   1979     e
6   1980     a
7   1975     b
8   1976     c
9   1977     a
10  1977     b
11  1978     d
12  1979     f
13  1980     f
[14 rows x 2 columns]
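For reference, here is a minimal self-contained sketch of the melt approach. It builds the frame from the question's sample data in memory rather than from the CSV file. The output column name "Item" and the interleaving step are assumptions added for illustration, not part of the original answer. The interleaving relies on the `ignore_index` parameter of `pd.melt`, which needs pandas 1.1 or later.

```python
import io

import pandas as pd

# Sample rows copied from the question; column names follow the question's code.
CSV_TEXT = """1975,a,b
1976,b,c
1977,b,a
1977,a,b
1978,c,d
1979,e,f
1980,a,f
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), names=["Date", "Item1", "Item2"])

# Stack Item1 and Item2 into a single column with a fresh 0..13 index.
# "Item" is a column name chosen for this sketch.
stacked = pd.melt(df, id_vars="Date", value_name="Item")[["Date", "Item"]]

# Optional: keep the two items of each source row adjacent (1975,a then 1975,b).
# ignore_index=False keeps the original row labels, so a stable sort on the
# index interleaves the Item1 and Item2 rows before renumbering.
interleaved = (
    pd.melt(df, id_vars="Date", value_name="Item", ignore_index=False)
    .sort_index(kind="mergesort")
    .reset_index(drop=True)[["Date", "Item"]]
)

print(stacked)
print(interleaved)
```

`stacked` matches the output shown above apart from the column name, while `interleaved` follows the row order sketched in the question.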