Python pandas: how do I efficiently add a row of data?

01-06

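I feel the method I'm currently using is pretty dumb:
a = a.append(pd.DataFrame(np.matrix(np.repeat(1, 11)),columns = a.columns))
It seems this step creates a new object and then reassigns it to a, in which case it will get slow once the DataFrame grows large.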

python - Efficiently add single row to Pandas Series or DataFrame

See the comments:

That's probably as efficient as any, but Pandas/numpy structures are fundamentally not suited for efficiently growing. They work best when they are created with a fixed size and stay that way. – BrenBarn, Dec 6 '12 at 20:43
append is a wrapper for concat, so concat would be marginally more efficient, but as @BrenBarn says Pandas is probably not appropriate for updating a HDF5 file every second. If you absolutely need Pandas for some reason, could you collect a list of Series and update the file periodically instead? – Matti John, Dec 6 '12 at 20:54
Bren is right about numpy/pandas working best when preallocated. If memory is no constraint just preallocate a huge zeros array and append at the end of the program removing any excess zeros. Which I suppose is a bit of what Matti is saying. – arynaq, Dec 6 '12 at 21:16

Intro to Data Structures

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.

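So, generally speaking, a DataFrame is a set of columns, and each column is an array of values. In pandas, that array is, one way or another, a (possibly variant of a) numpy ndarray. An ndarray has no in-place append operation, because it is a single contiguous block of memory. Any operation that changes an ndarray's length has to allocate a new block of the right size and then copy the data over, which is why this kind of operation is slow. Unless the pandas DataFrame uses some other design to reduce copying, I believe Bren's "That's probably as efficient as any" is quite right.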
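To make the copying concrete, here is a small sketch (the variable names are just for illustration) showing that np.append hands back a new array and leaves the original untouched:

```python
import numpy as np

arr = np.arange(5)
grown = np.append(arr, 5)   # allocates a new array and copies arr into it

# The original keeps its length, and the result does not share its memory.
print(arr.shape, grown.shape)          # (5,) (6,)
print(np.shares_memory(arr, grown))    # False
```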

So in general, just as Bren says, Pandas/numpy structures are fundamentally not suited for efficiently growing.

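Matti and arynaq describe two common ways of dealing with this problem. I think what Matti really means is to collect the rows you want to add and then concatenate them, so the data is copied only once. arynaq's method is to preallocate the memory, which is easier to understand.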
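A rough sketch of both ideas, under my own assumptions: the column names, the row values, and the capacity of 1000 are made up for illustration, and the preallocated version only handles numeric data.

```python
import numpy as np
import pandas as pd

columns = ["a", "b", "c"]
base = pd.DataFrame([[0, 0, 0]], columns=columns)

# Approach 1 (Matti): collect the new rows first, then concatenate once.
pending = [pd.DataFrame([[i, i, i]], columns=columns) for i in range(1, 4)]
combined = pd.concat([base, *pending], ignore_index=True)

# Approach 2 (arynaq): preallocate, fill in place, trim the unused tail.
capacity = 1000  # hypothetical upper bound on the number of rows
buffer = np.zeros((capacity, len(columns)))
n_rows = 0
for i in range(4):
    buffer[n_rows] = [i, i, i]
    n_rows += 1
preallocated = pd.DataFrame(buffer[:n_rows], columns=columns)
```

Both end up with the same four rows. The preallocated one holds floats because the buffer is a float array.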

If you really do need to incrementally build a DataFrame, you will probably have to benchmark both methods yourself.

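My advice is to avoid incrementally building a DataFrame whenever possible, for example by collecting all the data in some other data structure first and then converting it to a DataFrame for analysis.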
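For instance, a minimal sketch that gathers rows in a plain Python list of dicts (the column names a and b are made up) and builds the DataFrame once at the end:

```python
import pandas as pd

rows = []
for i in range(5):
    # Appending to a Python list is cheap; no DataFrame is copied here.
    rows.append({"a": i, "b": i * 2})

# Build the DataFrame a single time, after all the data has been collected.
df = pd.DataFrame(rows)
```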

By the way, Stack Overflow is a much better place for this kind of question.

I usually use:

df.loc[i]={"a":1,"b":2}

df.loc[df.shape[0]+1] = {"ds":strToDate("2017-07-21"),"y":0}

If a is a DataFrame with the default integer index, adding a row is as simple as:

a.loc[len(a)] = 1

Every value in the row inserted this way is 1.

Could it grow elastically, like a C++ vector? Each time it runs out of room, reallocate at twice the size.
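As far as I know pandas has nothing like this built in, but the idea can be sketched by hand. GrowableFrame below is a hypothetical helper, not a pandas API, and it assumes all columns are numeric:

```python
import numpy as np
import pandas as pd


class GrowableFrame:
    """Hypothetical helper: append-only row buffer with doubling capacity."""

    def __init__(self, columns, capacity=4):
        self.columns = list(columns)
        self._data = np.empty((capacity, len(self.columns)))
        self._size = 0

    def append(self, row):
        if self._size == self._data.shape[0]:
            # Out of room: allocate twice the capacity and copy once.
            bigger = np.empty((2 * self._data.shape[0], len(self.columns)))
            bigger[: self._size] = self._data[: self._size]
            self._data = bigger
        self._data[self._size] = row
        self._size += 1

    def to_frame(self):
        # Copy only the filled part of the buffer into a DataFrame.
        return pd.DataFrame(self._data[: self._size].copy(), columns=self.columns)


buf = GrowableFrame(["a", "b"], capacity=2)
for i in range(10):
    buf.append([i, i * 10])
result = buf.to_frame()
```

With doubling, the buffer here is reallocated only three times for ten appends (2 to 4 to 8 to 16), instead of once per row.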

Going by intuition plus actual use, I think you can transpose and then join with concat, and that is not very slow either.

concat