Python: regularise an irregular time series with linear interpolation
Problem description
I have a time series in pandas that looks like this:
Values
1992-08-27 07:46:48 28.0
1992-08-27 08:00:48 28.2
1992-08-27 08:33:48 28.4
1992-08-27 08:43:48 28.8
1992-08-27 08:48:48 29.0
1992-08-27 08:51:48 29.2
1992-08-27 08:53:48 29.6
1992-08-27 08:56:48 29.8
1992-08-27 09:03:48 30.0
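To reproduce the examples, the series above can be rebuilt as a DataFrame with a DatetimeIndex. This is a minimal sketch; the variable names times and values are my own, and df matches the name used in the answer below.

```python
import pandas as pd

# Sample data copied from the question; the column name "Values" is the asker's.
times = pd.to_datetime([
    "1992-08-27 07:46:48", "1992-08-27 08:00:48", "1992-08-27 08:33:48",
    "1992-08-27 08:43:48", "1992-08-27 08:48:48", "1992-08-27 08:51:48",
    "1992-08-27 08:53:48", "1992-08-27 08:56:48", "1992-08-27 09:03:48",
])
values = [28.0, 28.2, 28.4, 28.8, 29.0, 29.2, 29.6, 29.8, 30.0]

# A DatetimeIndex is what makes resample() and time-based interpolation possible.
df = pd.DataFrame({"Values": values}, index=times)
print(df)
```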
I would like to resample it to a regular time series with 15-minute time steps where the values are linearly interpolated. Basically, I would like to get:
Values
1992-08-27 08:00:00 28.2
1992-08-27 08:15:00 28.3
1992-08-27 08:30:00 28.4
1992-08-27 08:45:00 28.8
1992-08-27 09:00:00 29.9
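Each expected value follows from two-point linear interpolation between the observations (t_0, v_0) and (t_1, v_1) that bracket the target time t. As a worked check (my own arithmetic, not from the question), 08:15:00 lies 852 s after 08:00:48 (28.2), inside a 1980 s gap that ends at 08:33:48 (28.4):

```latex
v(t) = v_0 + (v_1 - v_0)\,\frac{t - t_0}{t_1 - t_0}

v(\text{08:15:00}) = 28.2 + (28.4 - 28.2)\cdot\frac{852\,\mathrm{s}}{1980\,\mathrm{s}} \approx 28.286
```

That rounds to the 28.3 listed above; the other four expected values likewise match exact linear interpolation rounded to one decimal place.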
However, using the pandas resample method (df.resample('15Min')) I get:
Values
1992-08-27 08:00:00 28.20
1992-08-27 08:15:00 NaN
1992-08-27 08:30:00 28.60
1992-08-27 08:45:00 29.40
1992-08-27 09:00:00 30.00
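The numbers above are bin averages, not interpolated values. A minimal sketch that reproduces them, assuming current pandas, where resample() returns a Resampler object and needs an explicit aggregation such as mean() (older pandas averaged by default):

```python
import pandas as pd

# Sample data from the question.
times = pd.to_datetime([
    "1992-08-27 07:46:48", "1992-08-27 08:00:48", "1992-08-27 08:33:48",
    "1992-08-27 08:43:48", "1992-08-27 08:48:48", "1992-08-27 08:51:48",
    "1992-08-27 08:53:48", "1992-08-27 08:56:48", "1992-08-27 09:03:48",
])
df = pd.DataFrame({"Values": [28.0, 28.2, 28.4, 28.8, 29.0, 29.2, 29.6, 29.8, 30.0]},
                  index=times)

# Each 15-minute bin is labelled by its left edge and gets the mean of the
# observations that fall into it; a bin with no observations becomes NaN.
binned = df.resample("15min").mean()
print(binned)
```

This version also prints a 07:45:00 row holding 28.0, the mean of the single observation in that bin, which the listing in the question does not show.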
I have tried the resample method with different 'how' and 'fill_method' parameters but never got exactly the results I wanted. Am I using the wrong method?
I figure this is a fairly simple query, but I have searched the web for a while and haven't found an answer.
Thanks in advance for any help I can get.
Recommended answer
It takes a bit of work, but try this out. The basic idea is to find the two timestamps that bracket each resample point and interpolate linearly between them. np.searchsorted is used to find, for each resample point, the position of the first observation at or after it; the observation just before that position is the other bracket.
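The bracketing step can be seen in isolation on the first three observations from the question. This is a small illustrative sketch; the names obs, targets and idx_before are mine.

```python
import numpy as np
import pandas as pd

obs = pd.to_datetime(["1992-08-27 07:46:48", "1992-08-27 08:00:48", "1992-08-27 08:33:48"])
targets = pd.to_datetime(["1992-08-27 08:00:00", "1992-08-27 08:15:00", "1992-08-27 08:30:00"])

# searchsorted returns, for each target, the position at which it would be
# inserted to keep `obs` sorted, i.e. the index of the first observation at or after it.
idx_after = np.searchsorted(obs.values, targets.values)
idx_before = idx_after - 1

for t, b, a in zip(targets, idx_before, idx_after):
    print(t.time(), "lies between", obs[b].time(), "and", obs[a].time())
```

A target earlier than the first observation would get idx_after == 0, and idx_after - 1 == -1 silently wraps around to the last observation in NumPy, which is presumably why the answer drops the first resample label.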
import numpy as np
import pandas as pd

# empty frame with desired index; the first 15-minute label (07:45) is dropped
# because no observation precedes it to interpolate from
rs = pd.DataFrame(index=df.resample('15min').mean().iloc[1:].index)
# array of indexes corresponding with closest timestamp at or after each resample point
idx_after = np.searchsorted(df.index.values, rs.index.values)
# values and timestamp before/after resample
rs['after'] = df.loc[df.index[idx_after], 'Values'].values
rs['before'] = df.loc[df.index[idx_after - 1], 'Values'].values
rs['after_time'] = df.index[idx_after]
rs['before_time'] = df.index[idx_after - 1]
# calculate new weighted value
rs['span'] = (rs['after_time'] - rs['before_time'])
# I got errors here unless I turn the index to a series
t = pd.Series(data=rs.index, index=rs.index)
# each neighbour's weight is the fraction of the span on the *other* side of
# the resample point, so the closer observation counts for more
rs['before_weight'] = (rs['after_time'] - t) / rs['span']
rs['after_weight'] = (t - rs['before_time']) / rs['span']
rs['Values'] = rs.eval('before * before_weight + after * after_weight')
After all that, the linearly interpolated values are:
In [161]: rs['Values']
Out[161]:
1992-08-27 08:00:00 28.188571
1992-08-27 08:15:00 28.286061
1992-08-27 08:30:00 28.376970
1992-08-27 08:45:00 28.848000
1992-08-27 09:00:00 29.891429
Freq: 15T, Name: Values, dtype: float64
Rounded to one decimal place, these are the 28.2, 28.3, 28.4, 28.8 and 29.9 expected in the question.
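If only the interpolated series is needed, the bookkeeping columns can be avoided. Below is a minimal sketch of two shorter routes that are my suggestion rather than the answer's method: pandas' interpolate(method="time") on an index holding both the observations and the grid, and numpy.interp on elapsed seconds. It assumes the sample data from the question and derives the grid from the first and last timestamps.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
times = pd.to_datetime([
    "1992-08-27 07:46:48", "1992-08-27 08:00:48", "1992-08-27 08:33:48",
    "1992-08-27 08:43:48", "1992-08-27 08:48:48", "1992-08-27 08:51:48",
    "1992-08-27 08:53:48", "1992-08-27 08:56:48", "1992-08-27 09:03:48",
])
df = pd.DataFrame({"Values": [28.0, 28.2, 28.4, 28.8, 29.0, 29.2, 29.6, 29.8, 30.0]},
                  index=times)

# Regular 15-minute grid kept inside the span of the data (08:00 to 09:00 here),
# so every grid point has an observation on each side.
grid = pd.date_range(df.index[0].ceil("15min"), df.index[-1].floor("15min"), freq="15min")

# Route 1, pandas only: add the grid points to the index (they start as NaN),
# fill them with interpolation weighted by the actual time gaps, keep only the grid.
combined = df.reindex(df.index.union(grid)).interpolate(method="time")
via_pandas = combined.reindex(grid)["Values"]

# Route 2, NumPy: np.interp finds the bracketing observations and interpolates
# in one call; time is measured in seconds since the first observation.
t0 = df.index[0]
via_numpy = pd.Series(
    np.interp(
        (grid - t0).total_seconds().to_numpy(),
        (df.index - t0).total_seconds().to_numpy(),
        df["Values"].to_numpy(),
    ),
    index=grid,
    name="Values",
)

print(via_pandas)
print(np.allclose(via_pandas, via_numpy))
```

Rounded to one decimal, both routes give the 28.2, 28.3, 28.4, 28.8 and 29.9 expected in the question. Keeping the grid inside the data span matters for the NumPy route, because np.interp clamps to the first or last value outside the observed range instead of extrapolating.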