Analyzing Indices with Python: Index High/Low Z-Score Table for October 18

01-28

Reading time for this article: 15 to 30 minutes.

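Judging whether an index is high or low is a hard problem in reading the market!

Value investors find it hard to know:

Is the market overvalued or undervalued right now?

Am I buying cheap, or is it expensive?

Should I buy or sell now, or wait a little longer?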

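To tackle this problem, I found several quantitative approaches online, such as the mean method, the median method, and the ratio method. These methods tend to be too simple: they measure only central tendency, not dispersion or probability.

The standard score (Z-score) method from statistics may be a better fit. It measures not only the central tendency of an index metric but also its dispersion and risk. Index data are not perfectly normally distributed, but the Z-score method still provides a useful reference.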
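A minimal sketch of the idea, assuming daily values in a pandas Series: the Z-score is Z = (x - mean) / std, where x is the latest value. The P/E series below is randomly generated (a lognormal random walk, so deliberately not normal), and the helper names are hypothetical. Comparing the percentile a normal distribution implies for Z with the share of history actually below the latest value shows how far non-normal data can drift from the textbook figures.

```python
import math

import numpy as np
import pandas as pd


def z_score(series: pd.Series) -> float:
    """Z-score of the latest value against the whole series (sample std, ddof=1)."""
    return (series.iloc[-1] - series.mean()) / series.std()


def normal_percentile(z: float) -> float:
    """Share of a normal distribution lying below z (the standard normal CDF)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def empirical_percentile(series: pd.Series) -> float:
    """Share of historical values strictly below the latest value."""
    return float((series < series.iloc[-1]).mean())


# Hypothetical P/E history: a lognormal random walk, i.e. not normally distributed
rng = np.random.default_rng(0)
pe = pd.Series(15 * np.exp(np.cumsum(rng.normal(0, 0.01, 750))))

z = z_score(pe)
print(f"Z = {z:.2f}")
print(f"normal-implied percentile: {normal_percentile(z):.0%}")
print(f"empirical percentile:      {empirical_percentile(pe):.0%}")
```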

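My view

The larger the Z-score, the more overvalued the index. Under a normal distribution, at Z = 1 or Z = 2 the chance of a still higher value is only about 16% or 2.3% respectively, and less for larger Z (the often-quoted 5% is the two-sided figure for |Z| > 2).

The smaller the Z-score, the more undervalued the index. Symmetrically, at Z = -1 or Z = -2 the chance of a still lower value is only about 16% or 2.3%.

Across the historical data of more than 550 indices, the vast majority have Z-scores between -2 and 3.

Note: a few energy and metals indices have gone through brief manic phases; the Z-score method does not work well for them.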
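The 16% figure is the upper tail of the standard normal distribution at Z = 1; at Z = 2 it is about 2.3%. A small sketch computes these tails with `math.erfc` and attaches a rough label. The function names, the label wording and the inclusive ±1 thresholds are an illustrative reading of the view above, not a fixed rule.

```python
import math


def upper_tail(z: float) -> float:
    """P(X > z) for a standard normal X: the chance of a still higher value."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def valuation_label(z: float) -> str:
    """Hypothetical labels: at or beyond +/-1 counts as over/undervalued."""
    if z >= 1:
        return "overvalued"
    if z <= -1:
        return "undervalued"
    return "neutral"


for z in (1, 2, -1, -2):
    # By symmetry, the chance of a still lower value at -z equals upper_tail(z)
    tail = upper_tail(abs(z))
    print(f"Z = {z:+d}: {valuation_label(z)}, tail probability {tail:.1%}")
```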

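I built this tool with Python's Pandas and Matplotlib, plus index data (especially P/E ratios) obtained through various channels. Its main purposes are:

- To support my own regular investing (dollar-cost averaging): knowing when to start, when to stop, and when to take profits. (I have not taken profits yet.)
- To get familiar with basic data analysis in Python, combined with statistics.
- To share it online with anyone who finds it useful, for exchange and learning.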

Sharing is the best investment in yourself!

Corrections are welcome.

1 Initializing the basic Python modules

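import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.font_manager as font_manager
import seaborn as sns
%matplotlib inline
!free -h
# The following code makes Chinese text display correctly in charts
path_eng = "/usr/share/fonts/chinese/REFSAN.TTF"   # Latin font for English-only charts
path_CHN = "/usr/share/fonts/chinese/simhei.ttf"   # SimHei, for charts with Chinese text
font_manager.fontManager.addfont(path_CHN)         # register the font file (matplotlib >= 3.2)
prop = font_manager.FontProperties(fname=path_CHN) # switch to path_eng for English-only charts
mpl.rcParams["font.family"] = prop.get_name()      # make it the default font family
mpl.rcParams["axes.unicode_minus"] = False         # avoid missing-glyph boxes for minus signs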

total used free shared buff/cache available
Mem: 992M 546M 200M 12K 245M 302M
Swap: 2.0G 23M 2.0G

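2 Data import

# Import data. Data source: UQer.
# Data collection is a separate program. UQer's support for Chinese text and charts is limited, so the data is downloaded for further processing locally.
# UQer stopped its data download feature in August. It has since been restored; how long it will last is unclear, so I am using it while I can.

sec_map = pd.read_hdf("uqer/sec_map.h5", "map")          # sec_map holds 2,800+ index entries; about 550 are actual indices
# sec_map = sec_map.set_index("ticker")
history = pd.read_hdf("uqer/uq_history.h5", "history")   # history holds index data from 2004 to 2017-10-18 (about 860,000 rows)
print(sec_map.columns)
print(history.columns)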
Index(['baseDate', 'basePoint', 'endDate', 'indexType', 'indexTypeCD',
       'porgFullName', 'pubOrgCD', 'publishDate', 'secID', 'secShortName'],
      dtype='object')
Index(['secID', 'Close', 'PB1', 'PB2', 'PE1', 'PE2', 'TurnoverValue',
       'TurnoverVol'],
      dtype='object')
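The UQer files are not reproduced here. Judging from how the code uses them (`history.loc[ticker]`, `history.groupby(level="ticker")`, partial date selection such as `.loc["2017"]`), `history` appears to be indexed by ticker and then by date. To try the rest of the code offline, a hypothetical stand-in with that shape can be built from random numbers; the level name `"date"`, the two tickers, the short names and all values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for uq_history.h5: two tickers, business days to 2017-10-18
tickers = ["000001", "000807"]
dates = pd.bdate_range("2015-01-01", "2017-10-18")
index = pd.MultiIndex.from_product([tickers, dates], names=["ticker", "date"])

rng = np.random.default_rng(42)
shape = (len(tickers), len(dates))


def random_walk(start: float) -> np.ndarray:
    """One lognormal random walk per ticker, flattened in (ticker, date) order."""
    steps = rng.normal(0, 0.01, shape).cumsum(axis=1)
    return (start * np.exp(steps)).ravel()


history = pd.DataFrame(
    {
        "Close": random_walk(3000.0),
        "PE1": random_walk(15.0),
        "TurnoverValue": rng.uniform(1e9, 5e9, len(index)),
    },
    index=index,
)

# Hypothetical stand-in for sec_map.h5: short names indexed by ticker
sec_map = pd.DataFrame({"secShortName": ["Index A", "Index B"]},
                       index=pd.Index(tickers, name="ticker"))

print(history.loc["000807"]["PE1"].loc["2017"].tail(3))
```

With these two objects in place, `show_KPI(history, "000807", KPI="PE1")` can run against them; the Section 4 loop would need `infolist` trimmed to the three columns defined here.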

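3 Defining the KPI plotting function (show_KPI)

For a normal distribution:

At Z = 0, the probability on each side is 50%.

The probability of falling within (-1, +1) is about 68% in total.

The probability of falling within (-2, +2) is about 95% in total.

Example: at Z = 1, about 84% of values lie below the current one and about 16% above it. This can roughly be taken to mean the index is already overvalued.

Example: at Z = -1, about 16% of values lie below the current one and about 84% above it. This can roughly be taken to mean the index is already undervalued.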
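The 68% and 95% figures, and the 84%/16% split at Z = ±1, can be checked with a quick Monte Carlo simulation: draw many standard-normal values and count how many fall in each region. A minimal sketch with NumPy; the sample size and seed are arbitrary choices.

```python
import numpy as np

# Draw a large standard-normal sample (seeded for reproducibility)
rng = np.random.default_rng(7)
z = rng.standard_normal(200_000)

within_1 = np.mean(np.abs(z) < 1)   # share inside (-1, +1)
within_2 = np.mean(np.abs(z) < 2)   # share inside (-2, +2)
below_plus1 = np.mean(z < 1)        # share of values below Z = +1
below_minus1 = np.mean(z < -1)      # share of values below Z = -1

print(f"inside +/-1: {within_1:.1%}, inside +/-2: {within_2:.1%}")
print(f"Z = +1: {below_plus1:.0%} below, {1 - below_plus1:.0%} above")
print(f"Z = -1: {below_minus1:.0%} below, {1 - below_minus1:.0%} above")
```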

def show_KPI(history, ticker="000001", source="uq", KPI="Close"):
    # Look up the index's short name so the chart is easier to read
    shortname = sec_map.loc[ticker]["secShortName"]

    # Check that the requested KPI exists in the chosen data source
    if source == "uq":
        if KPI not in ["Close", "TurnoverValue", "PE1", "PE2"]:
            return
    elif source == "csi":
        if KPI not in ["Close", "Turnover", "P/E1", "D/P1"]:
            return

    # Time spans: 7 is this year (2017); 0 is for the full range (2007~2017)
    timespan = {7: "2017", 6: "2016", 5: "2015", 4: "2014",
                3: "2013", 2: "2012", 1: "2011", 0: "2007:2016"}

    series = history.loc[ticker][KPI]              # one index, one KPI, indexed by date
    series_thisyear = series.loc[timespan[7]]      # this year's data only
    history_compare = [series, series_thisyear]

    # Two side-by-side subplots sharing the y axis
    fig, axes = plt.subplots(1, 2, figsize=(10, 5), sharey=True)
    for i, v_history in enumerate(history_compare):   # 0: full history, 1: 2017
        mean = v_history.mean()
        std = v_history.std()
        last = v_history.tail(1)

        fig.axes[i].plot(v_history, "b", label=KPI)
        if not (np.isnan(mean) or np.isnan(std)):
            # Mean plus the +/-1 and +/-2 standard-deviation reference lines
            fig.axes[i].axhline(mean, color="y", label="mean")
            fig.axes[i].axhline(mean + std, color="r", label="+Z1")
            fig.axes[i].axhline(mean - std, color="g", label="-Z1")
            fig.axes[i].axhline(round(mean + 2 * std, 2), linestyle="dashed", c="k", label="+Z2", lw=0.5)
            fig.axes[i].axhline(round(mean - 2 * std, 2), linestyle="dashed", c="k", label="-Z2", lw=0.5)
        fig.axes[i].legend(loc="best")   # called after the labelled lines exist

    # Title: name, ticker, and the latest value with its date
    today = "{2} {0}:{1:,.2f}".format(KPI, last.values[0], last.index[0].strftime("%Y-%m-%d"))
    fig.suptitle("({0} _{1}) KPI performance\n {2} ".format(shortname, ticker, today))
    fig.autofmt_xdate(rotation=45, ha="right")

    plt.subplots_adjust(wspace=0, hspace=0.5)
    plt.show()
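The reference lines in `show_KPI` are plain arithmetic on the series' mean and standard deviation. A minimal, plot-free sketch returns the same five levels plus the latest point's Z-score, which is handy for checking a chart's numbers without a display. The helper name `z_bands` and the sample values are hypothetical; like `show_KPI`, it relies on pandas' `std()`, which is the sample standard deviation (ddof=1).

```python
import numpy as np
import pandas as pd


def z_bands(series: pd.Series) -> dict:
    """Reference levels drawn by show_KPI, plus the Z-score of the latest point."""
    mean, std = series.mean(), series.std()
    return {
        "mean": mean,
        "+Z1": mean + std,
        "-Z1": mean - std,
        "+Z2": mean + 2 * std,
        "-Z2": mean - 2 * std,
        "Z_last": (series.iloc[-1] - mean) / std,
    }


# Hypothetical P/E values; the latest point (14.0) is compared with the bands
pe = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0])
for name, value in z_bands(pe).items():
    print(f"{name:>6}: {value:.2f}")
```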

4 Grouping and standardizing the historical data: computing Z-scores

infolist = ["Close", "PE1", "PE2", "PB1", "PB2", "TurnoverValue", "TurnoverVol"]

def COEV(x):
    # Coefficient of variation: standard deviation relative to the mean
    return x.std() / x.mean()

def init(data_grp, column):
    # Per-ticker summary statistics for one KPI column
    df_Stat = data_grp[column].agg(["count", "mean", "std", COEV, "last"])
    # Z-score of each ticker's latest value against its own history
    df_Stat["Zscore"] = (df_Stat["last"] - df_Stat["mean"]) / df_Stat["std"]
    df_Stat["Group_Type"] = column
    return df_Stat.sort_values(by="Zscore")

History_grp = history.groupby(level="ticker")

# One block of statistics per KPI, stacked into one table
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
UQ_Stat = pd.concat([init(History_grp, info) for info in infolist])

print("{0} indices will be analyzed".format(UQ_Stat.index.nunique()))
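Because the UQer data is not available here, a self-contained version of the same per-ticker calculation on a tiny made-up table lets the Z-scores be checked by hand: for ticker A, PE1 = 10, 12, 11, 13, 14 has mean 12 and sample standard deviation √2.5 ≈ 1.58, so the latest value 14 gives Z ≈ 1.26. Tickers A and B and all numbers are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical two-ticker history with a (ticker, date) MultiIndex
dates = pd.bdate_range("2017-10-12", periods=5)
index = pd.MultiIndex.from_product([["A", "B"], dates], names=["ticker", "date"])
history = pd.DataFrame(
    {"PE1": [10.0, 12.0, 11.0, 13.0, 14.0, 20.0, 19.0, 21.0, 18.0, 16.0],
     "Close": [1.0, 1.1, 1.2, 1.1, 1.0, 5.0, 5.0, 5.5, 6.0, 6.5]},
    index=index,
)

grouped = history.groupby(level="ticker")
frames = []
for column in ["PE1", "Close"]:
    # Per-ticker statistics, then the Z-score of each ticker's latest value
    stats = grouped[column].agg(["count", "mean", "std", "last"])
    stats["Zscore"] = (stats["last"] - stats["mean"]) / stats["std"]
    stats["Group_Type"] = column
    frames.append(stats)

stats_all = pd.concat(frames)
print(stats_all)
```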

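5 Whole-market overview (price, P/E, P/B)

View and compare the current mean Z-score across all indices:

[-0.5, 0.5]: normal

Below -0.5: the market is undervalued

Above 0.5: the market is overvalued and active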
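The three bands amount to a simple rule on the mean Z-score across all indices for one KPI. A minimal sketch; the function name `market_state` and the sample values are hypothetical, while the 0.5 cut-off is the one stated above and the middle band is inclusive, as in the chart code.

```python
import numpy as np


def market_state(zscores, limit: float = 0.5) -> str:
    """Classify the market from the mean Z-score across all indices."""
    z = np.asarray(zscores, dtype=float)
    z = z[np.isfinite(z)]          # drop NaN/inf, e.g. indices with zero std
    mean = z.mean()
    if mean > limit:
        return "overvalued / active"
    if mean < -limit:
        return "undervalued"
    return "normal"


print(market_state([0.2, -0.4, 0.1]))      # mean is about -0.03
print(market_state([1.2, 0.8, 0.3]))       # mean is about 0.77
print(market_state([-1.5, -0.2, np.nan]))  # NaN dropped, mean is -0.85
```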

columns = ["PE1", "TurnoverValue", "Close"]
mean_limit = 0.5   # |mean Z| <= 0.5 counts as normal

fig, axes = plt.subplots(1, 3, figsize=(15, 5), sharex=True, sharey=True)
fig.suptitle("Index Z-score histograms, October 19 (2007-2017)")
for i, column in enumerate(columns):
    UQ_Result = UQ_Stat.loc[UQ_Stat["Group_Type"] == column]
    # Drop NaN/inf Z-scores (zero std), which hist() cannot bin
    zscores = UQ_Result["Zscore"].replace([np.inf, -np.inf], np.nan).dropna()
    mean = zscores.mean()
    # Histogram colour by market state: yellow = high, green = low, blue = normal
    if mean > mean_limit:
        color_mean = "y"
    elif mean < -mean_limit:
        color_mean = "g"
    else:
        color_mean = "blue"
    fig.axes[i].hist(zscores, bins=50, color=color_mean)
    fig.axes[i].set_title(column)
    fig.axes[i].axvline(0, color="k", linestyle="--")   # Z = 0 reference line
    fig.axes[i].set_xlabel(column)

plt.subplots_adjust(wspace=0, hspace=0.2)

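6 Z-scores of indices with more than three years of history (at least 750 trading days)

6.1 P/E Z-scores

- Top 5 indices: where each index's current P/E stands relative to its own P/E history.

- Bottom 5 indices: where each index's current P/E stands relative to its own P/E history.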


ndays = 750        # at least ~3 years of trading days
Type = "PE1"

mask = (UQ_Stat["count"] >= ndays) & (UQ_Stat["Group_Type"] == Type)
UQ_Z1M = UQ_Stat.loc[mask, ["Zscore", "last", "Group_Type"]].drop_duplicates()
UQ_Z1M = UQ_Z1M.join(sec_map[["secShortName"]]).sort_values(by="Zscore")

UQ_Z1M = UQ_Z1M.rename(columns={"Zscore": "Z-score",
                                "last": "Latest",
                                "Group_Type": "KPI type",
                                "secShortName": "Name"})
UQ_Z1M.index.name = "Code"

print("Top 5 and bottom 5 of the ~550 indices by P/E Z-score")
# Last five rows (highest Z) plus first five rows (lowest Z); green bars below zero, red above
UQ_Z1M.iloc[np.arange(-5, 5)].sort_values(by="Z-score").style.bar(
    subset=["Z-score"], align="zero", color=["#5fba7d", "#d65f5f"], width=100 / 2)
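`iloc[np.arange(-5, 5)]` takes the last five and the first five rows of the sorted table. If fewer than ten indices pass the filters, the two slices overlap and rows repeat (and fewer than five rows raises an error). A hypothetical helper built on `nsmallest`/`nlargest` makes the same selection and keeps each index once; the name `extremes`, its parameters and the sample table are assumptions.

```python
import pandas as pd


def extremes(stats: pd.DataFrame, kpi: str = "PE1",
             min_days: int = 750, n: int = 5) -> pd.DataFrame:
    """Lowest and highest n Z-scores for one KPI, among indices with enough history."""
    pool = stats[(stats["Group_Type"] == kpi) & (stats["count"] >= min_days)]
    pool = pool.dropna(subset=["Zscore"])
    both = pd.concat([pool.nsmallest(n, "Zscore"), pool.nlargest(n, "Zscore")])
    # A small pool makes the two picks overlap; keep each index once
    both = both[~both.index.duplicated()]
    return both.sort_values("Zscore")


# Hypothetical statistics table shaped like UQ_Stat
stats = pd.DataFrame(
    {"Zscore": [-1.8, -0.9, 0.1, 1.4, 2.2, -0.3],
     "count": [900, 800, 760, 1200, 400, 780],
     "Group_Type": ["PE1"] * 6},
    index=pd.Index(["a", "b", "c", "d", "e", "f"], name="ticker"),
)
print(extremes(stats, n=2))   # "e" is excluded: only 400 days of data
```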

6.2 Weighted table of P/E Z-scores and closing-price Z-scores

mask = UQ_Stat["count"] > 750      # only indices with more than ~3 years of data
Weight_Close = 0.2
Weight_PE1 = 0.8

# One row per index, one Z-score column per KPI
tmp = UQ_Stat[mask].pivot(columns="Group_Type", values="Zscore")
tmp["OverallScore"] = tmp["Close"] * Weight_Close + tmp["PE1"] * Weight_PE1

tmp = tmp[tmp["OverallScore"].notna()].sort_values(by="OverallScore", ascending=True)
tmp = tmp.join(sec_map["secShortName"])
tmp = tmp[["secShortName", "OverallScore", "Close", "PE1", "TurnoverValue"]]
tmp.iloc[np.arange(-5, 5)].sort_values(by="OverallScore").style.bar(
    subset=["OverallScore"], align="zero", color=["#5fba7d", "#d65f5f"], width=100 / 2)
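The overall score is a weighted sum, OverallScore = 0.2 * Z(Close) + 0.8 * Z(PE1), computed after pivoting the long table into one row per index. A small sketch on made-up numbers (tickers a, b, c are hypothetical) shows the pivot and why an index missing either KPI drops out: the NaN propagates through the sum.

```python
import pandas as pd

WEIGHT_CLOSE = 0.2
WEIGHT_PE1 = 0.8

# Hypothetical long-format Z-scores: one row per (ticker, KPI)
z_long = pd.DataFrame(
    {"ticker": ["a", "a", "b", "b", "c"],
     "Group_Type": ["Close", "PE1", "Close", "PE1", "Close"],
     "Zscore": [1.0, 2.0, -0.5, -1.5, 0.3]}
)

# Wide format: one row per ticker, one column per KPI ("c" has no PE1)
wide = z_long.pivot(index="ticker", columns="Group_Type", values="Zscore")
wide["OverallScore"] = wide["Close"] * WEIGHT_CLOSE + wide["PE1"] * WEIGHT_PE1
wide = wide.dropna(subset=["OverallScore"]).sort_values("OverallScore")
print(wide)
```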

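6.3 Examples: P/E and closing price of the highest- and lowest-ranked indices

Indices: Culture & Media and Food & Beverage. CSI Telecom is excluded because its P/E differs too much between data sources.
The red line marks Z = 1.
The green line marks Z = -1.
Notes:

Left chart: Z-scores computed from all data over the past 10 years; the last point is 2017-10-18.
Right chart: Z-scores computed from all 2017 data; the last point is 2017-10-18.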
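The two panels can give different Z-scores for the same last point because each window has its own mean and standard deviation. A minimal sketch on a randomly generated P/E series (the data and the helper name `latest_z` are hypothetical) computes both:

```python
import numpy as np
import pandas as pd

# Hypothetical daily P/E series spanning 2015-01-01 to 2017-10-18
dates = pd.bdate_range("2015-01-01", "2017-10-18")
rng = np.random.default_rng(1)
pe = pd.Series(15 * np.exp(rng.normal(0, 0.01, len(dates)).cumsum()), index=dates)


def latest_z(series: pd.Series) -> float:
    """Z-score of the last value against the window it belongs to."""
    return (series.iloc[-1] - series.mean()) / series.std()


z_full = latest_z(pe)               # left panel: whole history
z_year = latest_z(pe.loc["2017"])   # right panel: 2017 only
print(f"full history Z = {z_full:.2f}, 2017-only Z = {z_year:.2f}")
```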

KPIs = ["PE1", "Close"]
secCodes = ["399248", "000807"]
for secCode in secCodes:
    for KPI in KPIs:
        show_KPI(history, secCode, KPI=KPI)

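Sharing is the best investment in work and life!

Thank you for any corrections!

The full text of this article, including the Python code, is published on Zhihu, mainly for learning and discussing Python and quantitative analysis: https://zhuanlan.zhihu.com/p/30273447

The analysis results are published on Xueqiu (Snowball) as a reference for index Z-scores and for discussion: 快樂的爸, "10月18日指數Z值高低表（新增紅綠條）" (October 18 index Z-score high/low table, with red/green bars added).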