How do you efficiently add a row of data in Python pandas?
01-06
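The method I'm using now feels pretty dumb.
I suspect this step creates a new object and then reassigns it to a, in which case it will get very slow once the DataFrame is large.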
a = a.append(pd.DataFrame(np.matrix(np.repeat(1, 11)),columns = a.columns))
python - Efficiently add single row to Pandas Series or DataFrame
See the comments:
That's probably as efficient as any, but Pandas/numpy structures are fundamentally not suited for efficiently growing. They work best when they are created with a fixed size and stay that way. – BrenBarn, Dec 6 '12 at 20:43
append is a wrapper for concat, so concat would be marginally more efficient, but as @BrenBarn says Pandas is probably not appropriate for updating a HDF5 file every second. If you absolutely need Pandas for some reason, could you collect a list of Series and update the file periodically instead? – Matti John, Dec 6 '12 at 20:54
Bren is right about numpy/pandas working best when preallocated. If memory is no constraint just preallocate a huge zeros array and append at the end of the program removing any excess zeros. Which I suppose is a bit of what Matti is saying. – arynaq, Dec 6 '12 at 21:16
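The "collect first, convert later" idea from these comments could look roughly like the sketch below. The function name `build_frame` and the row contents are made up for illustration; the point is only that rows go into a plain Python list and the DataFrame is built once at the end.

```python
import pandas as pd


def build_frame(n_rows):
    """Collect rows in a plain list, then build the DataFrame once."""
    rows = []  # appending to a Python list is amortized O(1)
    for i in range(n_rows):
        # Hypothetical row contents, for illustration only.
        rows.append({"a": i, "b": i * 2})
    # A single construction at the end avoids copying the frame for every row.
    return pd.DataFrame(rows, columns=["a", "b"])


df = build_frame(5)
```

If the data has to be written out periodically, the same list can be flushed into a DataFrame every so often and then cleared.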
Intro to Data Structures
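If you really do need to incrementally build a DataFrame, you will probably have to benchmark the two approaches yourself...
My advice: whenever possible, avoid incrementally building a DataFrame. For example, collect all the data in some other data structure first, then convert it into a DataFrame for analysis. By the way, Stack Overflow is a much better place for this kind of question.
I usually use: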
df.loc[i]={"a":1,"b":2}
df.loc[df.shape[0]+1] = {"ds":strToDate("2017-07-21"),"y":0}
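A self-contained sketch of the `.loc` approach above, with made-up column names and values. It assumes the DataFrame still has its default 0..n-1 index, in which case `len(df)` is the next unused label (using `df.shape[0]+1` also works, but skips one label). Assigning to a label that already exists would overwrite that row instead of adding one.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20], "b": [30, 40]})  # default index 0, 1

# len(df) is the next unused label only while the index is the default 0..n-1.
df.loc[len(df)] = {"a": 1, "b": 2}

# A list works as well; values are matched to the columns by position.
df.loc[len(df)] = [3, 4]
```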
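If a is a DataFrame, you can add a row like this (use `.loc`, because `.iloc` cannot enlarge a DataFrame; `some_column` stands for any column name):
a.loc[a["some_column"].count()] = 1
The inserted row is then filled entirely with 1s.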
Could it grow elastically the way a C++ vector does? Whenever it runs out of room, reallocate a buffer twice the size.
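pandas does not offer this directly as far as I know, but the idea can be sketched on top of a NumPy buffer. `GrowableRows` below is a hypothetical helper, not a pandas or NumPy API, and it assumes every column is a float.

```python
import numpy as np
import pandas as pd


class GrowableRows:
    """Hypothetical helper: a 2-D float buffer that doubles when full."""

    def __init__(self, columns, capacity=4):
        self.columns = list(columns)
        # Keep at least one slot so that doubling always makes progress.
        self._buf = np.empty((max(1, capacity), len(self.columns)), dtype=float)
        self._size = 0  # number of rows actually filled

    def append(self, row):
        if self._size == self._buf.shape[0]:
            # Out of room: allocate twice the capacity and copy the old rows.
            bigger = np.empty((2 * self._buf.shape[0], self._buf.shape[1]), dtype=float)
            bigger[: self._size] = self._buf
            self._buf = bigger
        self._buf[self._size] = row
        self._size += 1

    def to_frame(self):
        # Drop the unused tail and copy so the frame does not share the buffer.
        return pd.DataFrame(self._buf[: self._size].copy(), columns=self.columns)


rows = GrowableRows(["a", "b"], capacity=2)
for i in range(5):
    rows.append([i, i * 10])
df = rows.to_frame()
```

Because the capacity doubles, each row is copied only a constant number of times on average, which is the same argument that makes `std::vector::push_back` amortized O(1).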
Going by intuition plus hands-on experience, I think you can transpose and then join with concat; that is not very slow either.
concat
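One reading of the transpose-then-concat suggestion, sketched with made-up data: build the new row as a Series, turn it into a one-row DataFrame with `to_frame().T`, and join it with `pd.concat`. Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so on current versions `pd.concat` is the way to express the append from the question.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# One new row as a Series whose index matches the columns.
row = pd.Series({"a": 5, "b": 6})

# to_frame() yields a single column; .T transposes it into a single row.
new_row = row.to_frame().T

# concat returns a new DataFrame; ignore_index renumbers the rows 0..n-1.
df = pd.concat([df, new_row], ignore_index=True)
```

This still copies the whole frame on every call, so it suits occasional appends rather than a tight loop.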