Now all slices are fed to the random forests to generate class vectors. The number of class vectors per random forest, per window, per sample is simply equal to the number of slices given to that random forest. Each slice yields one class vector of length C, so if we have N_rf random forests per window, the per-window output (recall we have N samples and C classes) has size

N × n_slices × N_rf × C

and the total size of the Multi-Grain Scanning output is this quantity summed over all windows. This short calculation is just meant to give you an idea of the data volume processed during the Multi-Grain Scanning phase. The actual memory consumption depends on the numeric format used (float, int, double, etc.), and it is worth examining carefully when dealing with large datasets.

Predicting the up/down move of each K-line

After fetching the trade data for each K-line (candlestick), we use open, close, high, low, volume, ema, macd, linreg, momentum, rsi, var, cycle and atr as the feature indicators, and whether the next K-line closes up or down as the prediction target.

```python
from datetime import datetime

import numpy as np
import pandas as pd
import talib as ta
from sklearn import preprocessing

# Get the current time
now = datetime.now()
startDate = '2010-4-16'
endDate = now

# Fetch CSI 300 index futures (IF88) data at daily frequency;
# get_price is provided by the quant platform's research environment
df = get_price('IF88', start_date=startDate, end_date=endDate,
               frequency='1d', fields=None, country='cn')
open = df['open'].values
close = df['close'].values
volume = df['volume'].values
high = df['high'].values
low = df['low'].values

ema = ta.EMA(close, timeperiod=30).tolist()
macd = ta.MACD(close, fastperiod=12, slowperiod=26, signalperiod=9)[0].tolist()
momentum = ta.MOM(close, timeperiod=10).tolist()
rsi = ta.RSI(close, timeperiod=14).tolist()
linreg = ta.LINEARREG(close, timeperiod=14).tolist()
var = ta.VAR(close, timeperiod=5, nbdev=1).tolist()
# Hilbert transform dominant cycle period of the closing price
cycle = ta.HT_DCPERIOD(close).tolist()
# Average True Range (ATR) indicator over a 14-period window
atr = ta.ATR(high, low, close, timeperiod=14).tolist()

# Put the 13 indicators of each K-line into the array X and transpose it
X = np.array([open, close, high, low, volume, ema, macd, linreg,
              momentum, rsi, var, cycle, atr]).T
```

Inspecting X confirms that each row holds the 13 indicator values (the leading rows contain NaN because each indicator needs a warm-up period):

```python
X[2]
# array([  3215. ,  3267.2,  3281.2,  3208. , 114531. ,
#             nan,     nan,     nan,     nan,     nan,
#             nan,     nan,     nan])
```

Next we label each bar: 1 if the next close is higher than the current close, 0 if it is lower, and 2 if it is unchanged.

```python
y = []
# Walk through the whole dataset
for i in range(1, len(X)):
    if close[i] > close[i - 1]:    # the bar closed up
        y.append(1)
    elif close[i] < close[i - 1]:  # the bar closed down
        y.append(0)
    else:                          # the close is unchanged
        y.append(2)

# Label the last bar as 1
y.append(1)
# Convert y to an ndarray
y = np.array(y)

# Print a few labels to check that the label set is correct
print(len(y))
for i in range(1, 10):
    print(close[i], y[i], i)
```

```
1663
3214.6 1 1
3267.2 0 2
3236.2 0 3
3221.2 0 4
3219.6 0 5
3138.8 0 6
3129.0 0 7
3083.8 1 8
3107.0 0 9
```

```python
from sklearn.model_selection import train_test_split

# Split the dataset into random train and test subsets;
# test_size is the fraction held out for testing
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33)

# The test feature set is a 549 x 13 ndarray
X_te.shape
# (549, 13)
```

First we instantiate and train the algorithm. The parameter shape_1X is the shape of a single sample. It was introduced to describe image-shaped inputs, so it is not very relevant for this dataset, but it still needs to be defined. Since version 0.1.3 an integer can be passed as shape_1X.

gcForest parameter reference:

shape_1X: shape of a single sample element, [n_lines, n_cols]. Required when calling mg_scanning! For sequence data a single int can be given.
n_mgsRFtree: number of trees in the random forests during Multi-Grain Scanning.
window: int (default=None). List of window sizes to use during Multi-Grain Scanning. If None, no slicing is performed.
stride: int (default=1). Step used when slicing the data.
cascade_test_size: float or int (default=0.2). Fraction or absolute number of samples held out when splitting the cascade training set.
n_cascadeRF: int (default=2). Number of random forests in a cascade layer. For each pseudo-random forest a complete-random forest is also created, so the total number of forests in a layer is 2 * n_cascadeRF.
n_cascadeRFtree: int (default=101). Number of trees in a single random forest of a cascade layer.
min_samples_mgs: float or int (default=0.1).
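The window and stride parameters determine how many slices Multi-Grain Scanning extracts from each sample, which is exactly the n_slices term in the output-size calculation at the top of this section. A minimal sketch of that arithmetic; the concrete values for L, window, N, N_rf and C below are illustrative assumptions, not values taken from the library:

```python
import numpy as np

def n_slices(length, window, stride=1):
    # Number of slices a sliding window of the given size and stride
    # produces over a 1-D sample of the given length.
    return (length - window) // stride + 1

# A 13-feature sample sliced with window=4, stride=1 (hypothetical values)
L, w, s = 13, 4, 1
print(n_slices(L, w, s))  # 10

# Each slice yields one C-dimensional class vector per forest, so with
# N samples, N_rf forests per window and C classes, the per-window
# Multi-Grain Scanning output holds N * n_slices * N_rf * C entries.
N, N_rf, C = 1663, 2, 3
print(N * n_slices(L, w, s) * N_rf * C)  # 99780
```

Summing this quantity over every window in the window list gives the total Multi-Grain Scanning output size.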
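One practical note before fitting any model: as the X[2] printout shows, indicators such as a 30-period EMA or MACD return NaN during their warm-up period, so the leading rows of X must be dropped while keeping y aligned. A minimal sketch on synthetic data (the array shapes and the 33-row warm-up length are made up for illustration):

```python
import numpy as np

# Synthetic stand-in for the indicator matrix: pretend the last 8
# columns (the talib indicators) are NaN for the first 33 rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 13))
X[:33, 5:] = np.nan
y = rng.integers(0, 2, size=100)

# Keep only rows where every feature is finite, and keep y aligned.
mask = np.isfinite(X).all(axis=1)
X_clean, y_clean = X[mask], y[mask]

print(X_clean.shape)  # (67, 13)
```

Dropping the warm-up rows (rather than imputing them) is the simplest choice here, since they carry no real indicator information.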