
How do I efficiently add a row of data to a pandas DataFrame in Python?

The method I'm using now feels clumsy:

a = a.append(pd.DataFrame(np.matrix(np.repeat(1, 11)),columns = a.columns))

It looks like this step creates a new object and then rebinds it to a, in which case it will get very slow once the DataFrame is large.


python - Efficiently add single row to Pandas Series or DataFrame

See the comments:

That's probably as efficient as any, but Pandas/numpy structures are fundamentally not suited for efficiently growing. They work best when they are created with a fixed size and stay that way. – BrenBarn, Dec 6 '12 at 20:43

append is a wrapper for concat, so concat would be marginally more efficient, but as @BrenBarn says Pandas is probably not appropriate for updating a HDF5 file every second. If you absolutely need Pandas for some reason, could you collect a list of Series and update the file periodically instead? – Matti John, Dec 6 '12 at 20:54

Bren is right about numpy/pandas working best when preallocated. If memory is no constraint just preallocate a huge zeros array and append at the end of the program removing any excess zeros. Which I suppose is a bit of what Matti is saying. – arynaq, Dec 6 '12 at 21:16
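
A minimal sketch of the preallocation idea arynaq describes, assuming an upper bound on the row count is known. The column names, the capacity, and the loop standing in for the data source are all hypothetical. It trims by a row counter rather than by looking for zeros, since a real row may legitimately be all zeros.

```python
import numpy as np
import pandas as pd

columns = ["a", "b", "c"]   # hypothetical column names
capacity = 1000             # assumed upper bound on the number of rows

# Allocate once; each row is written in place, so nothing is copied per row.
buffer = np.zeros((capacity, len(columns)))
n_rows = 0

for i in range(5):          # stand-in for the real data source
    buffer[n_rows] = [i, i * 2, i * 3]
    n_rows += 1

# Keep only the rows that were actually filled and build the frame once.
df = pd.DataFrame(buffer[:n_rows].copy(), columns=columns)
```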

Intro to Data Structures

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.

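So, generally speaking, a DataFrame is a set of columns, and each column is an array of values. In pandas, that array is one way or another a (maybe a variant of a) numpy ndarray. An ndarray has no in-place append operation, because it is really one contiguous block of memory. Any operation that changes an ndarray's length has to allocate a new block of the right size and then copy the data over, which is why this kind of operation is slow. Unless the pandas DataFrame uses some other design to reduce copying, I believe Bren's "That's probably as efficient as any" is right.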
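
A small illustration of the point about ndarrays, using plain numpy rather than pandas internals: np.append does not grow the array in place, it returns a newly allocated one.

```python
import numpy as np

arr = np.arange(5)
grown = np.append(arr, 5)   # allocates a new array and copies arr into it

print(arr.shape, grown.shape)         # (5,) (6,)
print(np.shares_memory(arr, grown))   # False: the two arrays are separate buffers
```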

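So in general, as Bren said, Pandas/numpy structures are fundamentally not suited for efficiently growing.

Matti and arynaq describe two common ways of dealing with this problem. I think what Matti actually means is to collect the rows you want to add and then concatenate them in one go, so the data is copied only once. arynaq's method is simply to preallocate the memory, which is easier to understand.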
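
One way to read Matti's suggestion, sketched with made-up data: keep pending rows in an ordinary list and call pd.concat once. The frame, the column names, and the use of dicts (instead of the Series he mentions) are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})   # hypothetical existing frame

pending = []                                    # rows waiting to be added
for i in range(3):
    pending.append({"a": 10 + i, "b": 20 + i})

# A single concat, so the existing data is copied once rather than once per row.
df = pd.concat([df, pd.DataFrame(pending)], ignore_index=True)
```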

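If you really do need to incrementally build a DataFrame, you should probably benchmark both methods on your actual workload.

My advice is to avoid incrementally building a DataFrame whenever possible, for example by collecting all the data in some other data structure and then converting it to a DataFrame for analysis.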
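
For instance, a rough sketch of that advice with a plain Python list as the "other data structure". The column names and the loop are placeholders.

```python
import pandas as pd

records = []
for i in range(4):                  # stand-in for wherever the data comes from
    records.append((i, i ** 2))     # appending to a list is cheap

# Convert to a DataFrame once, after everything has been collected.
df = pd.DataFrame(records, columns=["n", "n_squared"])
```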

By the way, Stack Overflow is a much better place for this kind of question.


I usually use:

df.loc[i]={"a":1,"b":2}


df.loc[df.shape[0]+1] = {"ds":strToDate("2017-07-21"),"y":0}
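
A self-contained version of the .loc approach, assuming the frame has the default 0-based integer index, in which case len(df) is the first unused label. The frame and values are made up, and the row is given as a list in column order. Each such assignment still reallocates the underlying data, so this is convenient for a few rows rather than fast for many.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})   # default index: 0, 1

# Assigning to a label that does not exist yet enlarges the frame by one row.
df.loc[len(df)] = [5, 6]
```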


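If a is a DataFrame with the default integer index, adding a row is just:

a.loc[len(a)] = 1

Every value in the inserted row is 1. Note that this has to be .loc: .iloc cannot enlarge a DataFrame and raises an IndexError for a position past the end.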


Could it grow elastically like a C++ vector? Every time it runs out of room, reallocate to twice the size.
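
pandas does not offer this as far as I know, but the idea can be sketched on top of numpy. GrowableRows below is a hypothetical helper, not a pandas or numpy class, and it assumes all columns are numeric.

```python
import numpy as np
import pandas as pd


class GrowableRows:
    """Hypothetical row buffer that doubles its capacity when full."""

    def __init__(self, n_cols, capacity=4):
        self._data = np.empty((capacity, n_cols))
        self._size = 0

    def append(self, row):
        if self._size == len(self._data):
            # Out of room: allocate twice the capacity and copy once.
            bigger = np.empty((2 * len(self._data), self._data.shape[1]))
            bigger[: self._size] = self._data
            self._data = bigger
        self._data[self._size] = row
        self._size += 1

    def to_frame(self, columns):
        # Only the filled part of the buffer becomes the DataFrame.
        return pd.DataFrame(self._data[: self._size].copy(), columns=columns)


rows = GrowableRows(n_cols=2)
for i in range(10):
    rows.append([i, i * i])
df = rows.to_frame(["n", "n_squared"])
```

With doubling, the number of reallocations grows only logarithmically with the number of rows, so the copying cost averaged over all appends stays constant per row.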


Going by intuition plus actually trying it, I think you can transpose and then join with concat, and it isn't very slow.


concat
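
A sketch of what the one-line append from the question might look like with pd.concat. DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so concat is the way to write it now. The frame a here is a stand-in with 11 numeric columns and invented column names.

```python
import numpy as np
import pandas as pd

# Stand-in for the asker's frame `a`, assumed to have 11 numeric columns.
a = pd.DataFrame(np.zeros((2, 11)), columns=[f"c{i}" for i in range(11)])

new_row = pd.DataFrame(np.ones((1, 11)), columns=a.columns)   # one row of ones
a = pd.concat([a, new_row], ignore_index=True)
```

This still builds a new object on every call, so it does not address the copying cost raised in the question. It only replaces the removed method.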

