Python 3 Machine Learning Cookbook, Chapter 8: Dissecting Time Series and Sequential Data 31

05-10

Operating on time series data

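Now that we know how to slice data and extract various subsets, let's look at how to operate on time series data. The pandas library offers several ways to do this, including filtering the data by date range or by value.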

Create a file named operating_on_data.py and import the necessary libraries.

import pandas as pd
import matplotlib.pyplot as plt

from convert_to_timeseries import convert_data_to_timeseries

We will use the same text file as in the previous section, take its third and fourth columns, and convert the data into a pandas DataFrame:

# Input file containing data
input_file = 'data_timeseries.txt'

# Load data (column indices are zero-based: 2 and 3 are the third and fourth columns)
data1 = convert_data_to_timeseries(input_file, 2)
data2 = convert_data_to_timeseries(input_file, 3)
dataframe = pd.DataFrame({'first': data1, 'second': data2})
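The convert_data_to_timeseries helper and the data file come from the previous section and are not shown here. As a minimal sketch of the DataFrame construction step, the example below substitutes synthetic monthly data for the file's two columns; the date range, the random values, and the seed are all assumptions made for illustration, not the book's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the two columns read from data_timeseries.txt:
# 120 monthly values starting in January 1950 (synthetic data).
index = pd.date_range(start="1950-01", periods=120, freq="MS")
rng = np.random.default_rng(0)
data1 = pd.Series(rng.uniform(0, 100, size=len(index)), index=index)
data2 = pd.Series(rng.uniform(0, 100, size=len(index)), index=index)

# Series that share the same DatetimeIndex line up row by row,
# so each one becomes a column of the resulting DataFrame.
dataframe = pd.DataFrame({"first": data1, "second": data2})
print(dataframe.head())
```

Because both Series carry the same index, pandas aligns them by date rather than by position, which is what makes the later year-range slicing work on both columns at once.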

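Plot the data within a given range of years. Suppose we also want to plot the difference between the two columns we just loaded over that same range; this can be done as shown below. If you want to filter the data using different conditions for the first and second columns, you can specify those conditions and plot the result: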

# Plot data
dataframe['1952':'1955'].plot()
plt.title('Data overlapped on top of each other')

# Plot the difference
plt.figure()
difference = dataframe['1952':'1955']['first'] - dataframe['1952':'1955']['second']
difference.plot()
plt.title('Difference (first - second)')

# When 'first' is greater than a certain threshold
# and 'second' is smaller than a certain threshold
dataframe[(dataframe['first'] > 60) & (dataframe['second'] < 20)].plot()
plt.title('first > 60 and second < 20')

plt.show()
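To see what the year-range slice and the threshold filter actually select, it can help to run them on data whose values are known in advance. The sketch below uses a deterministic synthetic DataFrame (an assumption for illustration, not the book's data) and leaves out the plotting so the row counts can be checked directly. It uses .loc for the date slice, which makes the row selection explicit.

```python
import pandas as pd

# Synthetic monthly data from January 1950 through December 1959.
# 'first' counts up from 0 to 119, 'second' counts down from 119 to 0.
index = pd.date_range(start="1950-01", periods=120, freq="MS")
first = pd.Series(range(120), index=index, dtype="float64")
second = pd.Series(range(119, -1, -1), index=index, dtype="float64")
dataframe = pd.DataFrame({"first": first, "second": second})

# Partial string indexing: both endpoints are inclusive, so this keeps
# every row from January 1952 through December 1955.
subset = dataframe.loc["1952":"1955"]
difference = subset["first"] - subset["second"]

# Boolean filtering: combine per-column conditions with & and
# wrap each condition in parentheses.
mask = (dataframe["first"] > 60) & (dataframe["second"] < 20)
filtered = dataframe[mask]

print(len(subset), len(filtered))  # prints: 48 20
```

The parentheses around each comparison are required because & binds more tightly than > and < in Python.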

Output