How to unstack (or pivot?) in pandas
Question
I have a dataframe that looks like the following:
import pandas as pd
datelisttemp = pd.date_range('1/1/2014', periods=3, freq='D')
s = list(datelisttemp)*3
s.sort()
df = pd.DataFrame({'BORDER':['GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY' ], 'HOUR1':[2 ,2 ,2 ,4 ,4 ,4 ,6 ,6, 6],'HOUR2':[3 ,3 ,3, 5 ,5 ,5, 7, 7, 7], 'HOUR3':[8 ,8 ,8, 12 ,12 ,12, 99, 99, 99]}, index=s)
This gives me:
Out[458]: df
             BORDER  HOUR1  HOUR2  HOUR3
2014-01-01  GERMANY      2      3      8
2014-01-01   FRANCE      2      3      8
2014-01-01    ITALY      2      3      8
2014-01-02  GERMANY      4      5     12
2014-01-02   FRANCE      4      5     12
2014-01-02    ITALY      4      5     12
2014-01-03  GERMANY      6      7     99
2014-01-03   FRANCE      6      7     99
2014-01-03    ITALY      6      7     99
I want the final dataframe to look something like:
             HOUR  GERMANY  FRANCE  ITALY
2014-01-01   1     2        2       2     
2014-01-01   2     3        3       3
2014-01-01   3     8        8       8 
2014-01-02   1     4        4       4
2014-01-02   2     5        5       5
2014-01-02   3    12       12      12
2014-01-03   1     6        6       6
2014-01-03   2     7        7       7
2014-01-03   3    99       99      99
I've done the following, but I'm not quite there:
df['date_col'] = df.index
df2 = pd.melt(df, id_vars=['date_col','BORDER'])
# Can I keep the same index after melt or do I have to set an index like below?
df2.set_index(['date_col', 'variable'], inplace=True, drop=True)
df2 = df2.sort_index()
df2
Out[465]: df2
                         BORDER   value
date_col   variable                 
2014-01-01 HOUR1           GERMANY   2
           HOUR1           FRANCE    2
           HOUR1           ITALY     2
           HOUR2           GERMANY   3
           HOUR2           FRANCE    3
           HOUR2           ITALY     3
           HOUR3           GERMANY   8
           HOUR3           FRANCE    8
           HOUR3           ITALY     8
2014-01-02 HOUR1           GERMANY   4
           HOUR1           FRANCE    4
           HOUR1           ITALY     4
           HOUR2           GERMANY   5
           HOUR2           FRANCE    5
           HOUR2           ITALY     5
           HOUR3           GERMANY  12
           HOUR3           FRANCE   12
           HOUR3           ITALY    12
2014-01-03 HOUR1           GERMANY   6
           HOUR1           FRANCE    6
           HOUR1           ITALY     6
           HOUR2           GERMANY   7
           HOUR2           FRANCE    7
           HOUR2           ITALY     7
           HOUR3           GERMANY  99
           HOUR3           FRANCE   99
           HOUR3           ITALY    99
I thought I could unstack df2 to get something that resembles my final dataframe, but I get all sorts of errors. I have also tried to pivot this dataframe but can't quite get what I want.
Recommended answer
We want values (e.g. 'GERMANY') to become column names, and column names (e.g. 'HOUR1') to become values: a swap of sorts.
The stack method turns column names into index values, and the unstack method turns index values into column names.
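To make the two directions concrete, here is a minimal sketch on a two-row frame. The frame `small` and its values are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the two methods.
small = pd.DataFrame(
    {"HOUR1": [2, 4], "HOUR2": [3, 5]},
    index=pd.Index(["GERMANY", "FRANCE"], name="BORDER"),
)
small.columns.name = "HOUR"

# stack: the column labels (HOUR1, HOUR2) become the innermost index level.
stacked = small.stack()

# unstack: an index level (here BORDER) becomes the column labels.
swapped = stacked.unstack("BORDER")
print(swapped)
```

After the round trip, the HOUR labels index the rows and the BORDER values label the columns, which is the swap described above.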
So by shifting the values into the index, we can use stack and unstack to perform the swap.
import pandas as pd
datelisttemp = pd.date_range('1/1/2014', periods=3, freq='D')
s = list(datelisttemp)*3
s.sort()
df = pd.DataFrame({'BORDER':['GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY' ], 'HOUR1':[2 ,2 ,2 ,4 ,4 ,4 ,6 ,6, 6],'HOUR2':[3 ,3 ,3, 5 ,5 ,5, 7, 7, 7], 'HOUR3':[8 ,8 ,8, 12 ,12 ,12, 99, 99, 99]}, index=s)
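# Move BORDER into the index alongside the dates
df = df.set_index(['BORDER'], append=True)
# Name the column axis so the level can be referenced as 'HOUR'
df.columns.name = 'HOUR'
# Country names become columns; then HOUR1..HOUR3 become index values
df = df.unstack('BORDER')
df = df.stack('HOUR')
# Turn the HOUR level into a regular column and strip the 'HOUR' prefix
df = df.reset_index('HOUR')
df['HOUR'] = df['HOUR'].str.replace('HOUR', '').astype('int')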
print(df)
yields
BORDER      HOUR  FRANCE  GERMANY  ITALY
2014-01-01     1       2        2      2
2014-01-01     2       3        3      3
2014-01-01     3       8        8      8
2014-01-02     1       4        4      4
2014-01-02     2       5        5      5
2014-01-02     3      12       12     12
2014-01-03     1       6        6      6
2014-01-03     2       7        7      7
2014-01-03     3      99       99     99
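Since the question also asks about pivoting, here is a sketch of an alternative route through melt and pivot. It is not the answer's method, and it assumes pandas 1.1 or later, where `DataFrame.pivot` accepts a list for `index`. The names `dates`, `long`, `wide`, and `result` are chosen here for illustration. The unstack attempt in the question most likely failed because the (date_col, variable) index repeats once per country; keeping BORDER as part of the reshape key avoids that.

```python
import pandas as pd

# Rebuild the question's frame.
dates = sorted(list(pd.date_range("2014-01-01", periods=3, freq="D")) * 3)
df = pd.DataFrame(
    {
        "BORDER": ["GERMANY", "FRANCE", "ITALY"] * 3,
        "HOUR1": [2, 2, 2, 4, 4, 4, 6, 6, 6],
        "HOUR2": [3, 3, 3, 5, 5, 5, 7, 7, 7],
        "HOUR3": [8, 8, 8, 12, 12, 12, 99, 99, 99],
    },
    index=dates,
)

# melt discards the index by default, so move the dates into a column first.
long = df.rename_axis("date_col").reset_index().melt(
    id_vars=["date_col", "BORDER"], var_name="HOUR"
)
long["HOUR"] = long["HOUR"].str.replace("HOUR", "").astype(int)

# Each (date_col, HOUR, BORDER) combination is unique, so pivot can
# reshape without any aggregation.
wide = long.pivot(index=["date_col", "HOUR"], columns="BORDER", values="value")

# Turn the HOUR level back into a regular column, as in the desired output.
result = wide.reset_index("HOUR")
print(result)
```

This also addresses the comment in the question's code: melt does not keep the original index unless the dates are first moved into a column, or `ignore_index=False` is passed in pandas 1.1 and later.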