An ARMA (autoregressive moving average) model is a forecasting model that predicts future values based on past values. Forecasting is a critical task for many business objectives, such as predictive analytics, predictive maintenance, product planning, and budgeting. A big advantage of ARMA models is that they are relatively simple: they require only a small dataset to make a prediction, they are highly accurate for short-term forecasts, and they work on data without a trend.
In this tutorial, we’ll learn how to use the Python statsmodels package to forecast data using an ARMA model and InfluxDB, the open source time series database. The tutorial will outline how to use the InfluxDB Python client library to query data from InfluxDB and convert the data to a Pandas DataFrame to make working with the time series data easier. Then we’ll make our forecast.
I’ll also dive into the advantages of ARMA in more detail.
Requirements
This tutorial was executed on a macOS system with Python 3 installed via Homebrew. I recommend setting up additional tooling like virtualenv, pyenv, or conda-env to simplify Python and client installations. Otherwise, the full requirements are here:
influxdb-client >= 1.30.0
pandas >= 1.4.3
matplotlib >= 3.5.2
statsmodels >= 0.13
scikit-learn >= 1.1.1
This tutorial also assumes that you have a free-tier InfluxDB Cloud account and that you have created a bucket and an API token. You can think of a bucket as a database: it is the highest hierarchical level of data organization within InfluxDB. For this tutorial we'll create a bucket called NOAA.
What is ARMA?
ARMA stands for autoregressive moving average. It's a forecasting technique that combines AR (autoregressive) models and MA (moving average) models. An AR model is a linear additive model: each forecast is the sum of past values, each multiplied by a coefficient, plus an error term. To learn more about the math behind AR models, I suggest reading this article.
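For reference, here is the standard textbook form of an AR(p) model. The notation is assumed here and is not taken from the linked article. c is a constant, the φ_i are the coefficients (the "scaling factors"), and ε_t is a white-noise error term.

```latex
X_t = c + \sum_{i=1}^{p} \varphi_i \, X_{t-i} + \varepsilon_t
```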
In the ARMA sense, a moving average (MA) model expresses the current value as a linear combination of past forecast errors. This is different from the moving averages used to smooth a series (simple, cumulative, weighted, and so on), such as the one we'll compute in Flux below. ARMA models combine the AR and MA terms to generate a forecast. I recommend reading this post to learn more about AR, MA, and ARMA models. Today we'll use the statsmodels ARIMA class, configured as an ARMA model, to make forecasts.
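The following minimal NumPy sketch simulates an ARMA(p, q) series to make the combination concrete. Each value is built from four parts: a constant, p weighted past values, the current error, and q weighted past errors. `simulate_arma` is a hypothetical helper, and the coefficients are arbitrary illustrative choices. The order (1, 2) simply matches the ARIMA(1, 0, 2) model fitted later in this tutorial.

```python
import numpy as np


def simulate_arma(phi, theta, n, c=0.0, sigma=1.0, seed=0):
    """Simulate X_t = c + sum_i phi[i] * X_{t-1-i} + eps_t + sum_j theta[j] * eps_{t-1-j}."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    burn = 100  # discard start-up values so the zero initial state is forgotten
    total = n + burn
    eps = rng.normal(0.0, sigma, total)  # white-noise error terms
    x = np.zeros(total)
    for t in range(total):
        # AR part: weighted sum of the previous p values
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        # MA part: weighted sum of the previous q errors
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        x[t] = c + ar + eps[t] + ma
    return x[burn:]


# ARMA(1, 2) with illustrative coefficients
series = simulate_arma(phi=[0.5], theta=[0.4, 0.2], n=500)
print(series.shape)  # (500,)
```

Because the true orders of simulated data are known, data like this can be handy for sanity-checking a fitting pipeline before you point it at real sensor data.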
Assumptions of AR, MA, and ARMA models
If you're looking to use AR, MA, and ARMA models, you must first make sure that your data meets the models' key requirement: stationarity. To evaluate whether your time series data is stationary, check two things. First, its mean and variance should stay constant over time. Second, its autocovariance should depend only on the lag between observations, not on when they occur. Luckily, we can use InfluxDB and the Flux language to obtain a dataset and make our data stationary.
We’ll do this data preparation in the next section.
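Formally, a series X_t is weakly (covariance) stationary when three conditions hold for every time t and lag h. Its mean is constant, its variance is constant and finite, and its autocovariance depends only on h:

```latex
\mathbb{E}[X_t] = \mu, \qquad
\operatorname{Var}(X_t) = \sigma^2 < \infty, \qquad
\operatorname{Cov}(X_t, X_{t+h}) = \gamma(h)
```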
Flux for time series differencing and data preparation
Flux is the data scripting language for InfluxDB. For our forecast, we’re using the Air Sensor sample dataset that comes out of the box with InfluxDB. This dataset contains temperature data from multiple sensors. We’re creating a temperature forecast for a single sensor. The data looks like this:
Use the following Flux code to import the dataset and filter for the single time series.
import "join"
import "influxdata/influxdb/sample"
// dataset is a regular time series at 10-second intervals
data = sample.data(set: "airSensor")
    |> filter(fn: (r) => r._field == "temperature" and r.sensor_id == "TLM0100")
Next, we can make our time series weakly stationary by subtracting its moving average from it. Differencing removes trend (slope) from the data, and for this data preparation step we use moving-average differencing. First, we compute the moving average of our data.
Raw air temperature data (blue) vs. the moving average (pink).
Next we subtract the moving average from our actual time series after joining the raw data and MA data together.
The differenced data is stationary.
Here is the entire Flux script used to perform this differencing:
import "join"
import "influxdata/influxdb/sample"
// dataset is a regular time series at 10-second intervals
data = sample.data(set: "airSensor")
    |> filter(fn: (r) => r._field == "temperature" and r.sensor_id == "TLM0100")
    // |> yield(name: "temp data")
MA = data
    |> movingAverage(n: 6)
    // |> yield(name: "MA")
differenced = join.time(left: data, right: MA, as: (l, r) => ({l with MA: r._value}))
    |> map(fn: (r) => ({r with _value: r._value - r.MA}))
    |> yield(name: "stationary data")
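If you want to experiment outside InfluxDB, here is a rough NumPy equivalent of the Flux script above. It assumes that Flux `movingAverage(n: 6)` averages the current point and the n - 1 points before it. Under that assumption, the first n - 1 points have no average and drop out of the time-based join. `moving_average_difference` is a hypothetical helper, not part of any library.

```python
import numpy as np


def moving_average_difference(values, n=6):
    """Subtract a trailing n-point moving average from a series.

    Intended to mirror movingAverage(n: n) joined back onto the raw data and
    subtracted; the first n - 1 points have no full window and are dropped.
    """
    values = np.asarray(values, dtype=float)
    window = np.ones(n) / n
    # mode="valid" yields one mean per full window, aligned with the window's last point
    trailing_mean = np.convolve(values, window, mode="valid")
    return values[n - 1:] - trailing_mean


# a straight line 1, 2, ..., 10 with n = 3: every difference is (approximately) 1.0
print(moving_average_difference(np.arange(1, 11), n=3))
```

A perfectly straight trend with slope b is therefore not removed entirely. Instead it becomes the constant b(n - 1)/2, which the constant term of the statsmodels ARIMA model (included by default when d = 0) can absorb.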
Note that this approach estimates the trend-cycle component with a moving average and removes it. Series decomposition is also often performed with linear regression.
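The linear-regression alternative mentioned in the note can be sketched as follows. This removes a straight-line trend by ordinary least squares with NumPy. It is a simpler technique than what the Flux script does, because it only removes a single global linear trend. `linear_detrend` is a hypothetical helper.

```python
import numpy as np


def linear_detrend(values):
    """Fit y = a + b*t by least squares and return the residuals (series minus fitted trend)."""
    values = np.asarray(values, dtype=float)
    t = np.arange(len(values))
    slope, intercept = np.polyfit(t, values, deg=1)  # polyfit returns the highest degree first
    return values - (intercept + slope * t)


# noisy upward trend: the residuals fluctuate around zero with the slope removed
rng = np.random.default_rng(7)
trended = 20.0 + 0.05 * np.arange(200) + rng.normal(scale=0.3, size=200)
print(linear_detrend(trended)[:5])
```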
ARMA and time series forecasts with Python
Now that we've prepared our data, we can create a forecast. To use the ARMA method, we must choose the orders p and q for our data. p is the order of the AR part (the number of lagged values), and q is the order of the MA part (the number of lagged error terms). To use the statsmodels ARIMA class as an ARMA model, we set d to 0. d is the number of nonseasonal differences needed to make the series stationary. Because we already removed the trend in Flux, no further differencing is needed.
First, we query our data with the Python InfluxDB client library. Next, we convert the resulting DataFrame into a time-indexed pandas Series. Then we fit our model, which we can use to make a prediction.
import pandas as pd
import statsmodels.api as sm
from influxdb_client import InfluxDBClient

# query data with the Python InfluxDB Client Library and remove the trend through differencing
client = InfluxDBClient(url="https://us-west-2-1.aws.cloud2.influxdata.com", token="YOUR_API_TOKEN", org="0437f6d51b579000")  # replace with your own API token
query_api = client.query_api()
flux = '''import "join"
import "influxdata/influxdb/sample"
data = sample.data(set: "airSensor")
    |> filter(fn: (r) => r._field == "temperature" and r.sensor_id == "TLM0100")
MA = data |> movingAverage(n: 6)
join.time(left: data, right: MA, as: (l, r) => ({l with MA: r._value}))
    |> map(fn: (r) => ({r with _value: r._value - r.MA}))
    |> keep(columns: ["_value", "_time"])
    |> yield(name: "differenced")'''
df = query_api.query_data_frame(flux)
df = df.drop(columns=["table", "result"])
# build a time-indexed Series (timezone stripped) for statsmodels
y = pd.Series(df["_value"].to_numpy(), index=df["_time"].dt.tz_localize(None))
# ARIMA with d = 0 is an ARMA(p, q) model; here p = 1 and q = 2
model = sm.tsa.arima.ARIMA(y, order=(1, 0, 2))
result = model.fit()
Ljung-Box test and Durbin-Watson test
The Ljung-Box test can be used to check whether the values you chose for p and q when fitting an ARMA model are adequate. The test examines the autocorrelations of the residuals. Its null hypothesis is that the residuals are independently distributed, that is, that there is no autocorrelation up to the tested lag. Strictly speaking, a test can't confirm a null hypothesis, so the goal here is to fail to reject it. First, fit your model with the chosen p and q values, as we did above. Then run the Ljung-Box test on the residuals, which returns a Ljung-Box p-value. If this p-value is greater than 0.05, you fail to reject the null hypothesis at the 5% level. In other words, there is no significant autocorrelation left in the residuals, and your chosen orders are acceptable.
After fitting the model and running the test with Python…
print(sm.stats.acorr_ljungbox(result.resid, lags=[5], return_df=True))
we get a p-value for the test of 0.589648.
This indicates that our choice of p and q is acceptable: no significant autocorrelation remains in the residuals up to lag 5.
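The following sketch computes the Ljung-Box statistic by hand with NumPy to show what the test actually does. The formula is Q = n(n + 2) Σ r_k² / (n - k) for k = 1 to h, where r_k is the lag-k sample autocorrelation. For the same input, it should match the `lb_stat` column returned by `acorr_ljungbox`. `ljung_box_q` is a hypothetical helper. The threshold is the 95th percentile of a chi-squared distribution with 5 degrees of freedom. For residuals of a fitted ARMA(p, q) model, the degrees of freedom are often reduced by p + q, which `acorr_ljungbox` exposes through its `model_df` argument.

```python
import numpy as np

CHI2_95_DF5 = 11.0705  # 95th percentile of the chi-squared distribution, 5 degrees of freedom


def ljung_box_q(resid, lags=5):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..lags} r_k^2 / (n - k)."""
    x = np.asarray(resid, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.sum(x * x)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = np.sum(x[k:] * x[:-k]) / denom  # sample autocorrelation at lag k
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q


alternating = np.tile([1.0, -1.0], 50)  # strongly autocorrelated "residuals"
print(ljung_box_q(alternating) > CHI2_95_DF5)  # True: independence is rejected
```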
You can also use the Durbin-Watson test to check for autocorrelation. The Ljung-Box test checks autocorrelations jointly up to a chosen lag (here, 5), whereas the Durbin-Watson test looks only at lag 1. The Durbin-Watson statistic ranges from 0 to 4. Values near 2 indicate no autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation. Aim for a value close to 2.
print(sm.stats.durbin_watson(result.resid.values))
Here we get the following value. It agrees with the previous test, showing no evidence of lag-1 autocorrelation in the residuals.
2.0011309357716414
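The Durbin-Watson statistic is simple enough to compute directly, which makes its 0-to-4 range easy to see. It is the sum of squared successive differences of the residuals divided by their sum of squares, roughly 2(1 - r_1). The following is a minimal NumPy sketch of that formula, where `durbin_watson` is a hypothetical local helper mirroring `sm.stats.durbin_watson`.

```python
import numpy as np


def durbin_watson(resid):
    """DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    e = np.asarray(resid, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0: perfect positive autocorrelation
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0: strong negative autocorrelation
```

For an alternating series of length n, the value is 4(n - 1)/n, so it approaches 4 as the series grows. Independent residuals land near 2.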
Complete ARMA forecasting script with Python and Flux
Now that we understand the components of the script, let’s look at the script in its entirety and create a plot of our forecast.
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from influxdb_client import InfluxDBClient
from statsmodels.graphics.tsaplots import plot_predict

# query data with the Python InfluxDB Client Library and remove the trend through differencing
client = InfluxDBClient(url="https://us-west-2-1.aws.cloud2.influxdata.com", token="YOUR_API_TOKEN", org="0437f6d51b579000")  # replace with your own API token
query_api = client.query_api()
flux = '''import "join"
import "influxdata/influxdb/sample"
data = sample.data(set: "airSensor")
    |> filter(fn: (r) => r._field == "temperature" and r.sensor_id == "TLM0100")
MA = data |> movingAverage(n: 6)
join.time(left: data, right: MA, as: (l, r) => ({l with MA: r._value}))
    |> map(fn: (r) => ({r with _value: r._value - r.MA}))
    |> keep(columns: ["_value", "_time"])
    |> yield(name: "differenced")'''
df = query_api.query_data_frame(flux).drop(columns=["table", "result"])
y = pd.Series(df["_value"].to_numpy(), index=df["_time"].dt.tz_localize(None))

# fit an ARMA(1, 2) model, i.e. ARIMA with d = 0
result = sm.tsa.arima.ARIMA(y, order=(1, 0, 2)).fit()

# plot the model's predictions (in-sample by default) with a 95% confidence band
fig, ax = plt.subplots(figsize=(10, 8))
plot_predict(result, ax=ax)
ax.legend(loc="upper left")

# residual diagnostics: Durbin-Watson (lag 1) and Ljung-Box (up to lag 5)
print(sm.stats.durbin_watson(result.resid.values))
print(sm.stats.acorr_ljungbox(result.resid, lags=[5], return_df=True))
plt.show()
The bottom line
I hope this blog post inspires you to take advantage of ARMA and InfluxDB to make forecasts. I encourage you to take a look at the following repo, which includes examples for how to work with both the algorithms described here and InfluxDB to make forecasts and perform anomaly detection.
Anais Dotis-Georgiou is a developer advocate for InfluxData with a passion for making data beautiful with the use of data analytics, AI, and machine learning. She applies a mix of research, exploration, and engineering to translate the data she collects into something useful, valuable, and beautiful. When she is not behind a screen, you can find her outside drawing, stretching, boarding, or chasing after a soccer ball.