本港台开奖现场直播 j2开奖直播报码现场
当前位置: 新闻频道 > IT新闻 >

码报:战斗民族开源神器ClickHouse:一款适合于构建量化(2)

时间:2017-07-28 19:12来源:香港现场开奖 作者:118开奖 点击:
clickhouse-client -- query ='INSERT INTO table VALUES' data.txtclickhouse-client -- query ='INSERT INTO table FORMAT TabSeparated' data.tsv 给样本数据库创建表 创建表的语句: $ clickhou

clickhouse-client -- query='INSERT INTO tableVALUES' < data.txtclickhouse-client -- query='INSERT INTO tableFORMATTabSeparated' < data.tsv

给样本数据库创建表

创建表的语句:

$ clickhouse-client --multilineClickHouse client version 0.0.53720.Connecting to localhost: 9000.Connected to ClickHouse server version 0.0.53720.:) CREATE TABLE ontime( Year UInt16, Quarter UInt8, Month UInt8, DayofMonth UInt8, DayOfWeek UInt8, FlightDate Date, UniqueCarrier FixedString( 7), AirlineID Int32, Carrier FixedString( 2), TailNum String, FlightNum String, OriginAirportID Int32, OriginAirportSeqID Int32, OriginCityMarketID Int32, Origin FixedString( 5), OriginCityName String, OriginState FixedString( 2), OriginStateFips String, OriginStateName String, OriginWac Int32, DestAirportID Int32, DestAirportSeqID Int32, DestCityMarketID Int32, Dest FixedString( 5), DestCityName String, DestState FixedString( 2), DestStateFips String, DestStateName String, DestWac Int32, CRSDepTime Int32, DepTime Int32, DepDelay Int32, DepDelayMinutes Int32, DepDel15 Int32, DepartureDelayGroups String, DepTimeBlk String, TaxiOut Int32, WheelsOff Int32, WheelsOn Int32, TaxiIn Int32, CRSArrTime Int32, ArrTime Int32, ArrDelay Int32, ArrDelayMinutes Int32, ArrDel15 Int32, ArrivalDelayGroups Int32, ArrTimeBlk String, Cancelled UInt8, CancellationCode FixedString( 1), Diverted UInt8, CRSElapsedTime Int32, ActualElapsedTime Int32, AirTime Int32, Flights Int32, Distance Int32, DistanceGroup UInt8, CarrierDelay Int32, WeatherDelay Int32, NASDelay Int32, SecurityDelay Int32, LateAircraftDelay Int32, FirstDepTime String, TotalAddGTime String, LongestAddGTime String, DivAirportLandings String, DivReachedDest String, DivActualElapsedTime String, DivArrDelay String, DivDistance String, Div1Airport String, Div1AirportID Int32, Div1AirportSeqID Int32, Div1WheelsOn String, Div1TotalGTime String, Div1LongestGTime String, Div1WheelsOff String, Div1TailNum String, Div2Airport String, Div2AirportID Int32, Div2AirportSeqID Int32, Div2WheelsOn String, Div2TotalGTime String, Div2LongestGTime String, Div2WheelsOff String, Div2TailNum String, Div3Airport String, Div3AirportID Int32, Div3AirportSeqID Int32, Div3WheelsOn String, Div3TotalGTime String, Div3LongestGTime String, Div3WheelsOff String, Div3TailNum String, Div4Airport String, Div4AirportID Int32, Div4AirportSeqID Int32, Div4WheelsOn String, Div4TotalGTime String, Div4LongestGTime String, Div4WheelsOff String, Div4TailNum String, Div5Airport String, Div5AirportID Int32, Div5AirportSeqID Int32, Div5WheelsOn String, Div5TotalGTime String, Div5LongestGTime String, Div5WheelsOff String, Div5TailNum String)ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);

现在我们有了一个MergeTree类型的表。推荐在生产环境中使用MergeTree类型的表。这种类型的表有一个用于增量排序的主键,允许通过主键快速执行查询。

导入数据xz -v -c - d< ontime.csv.xz | clickhouse-client -- query= "INSERT INTO ontime FORMAT CSV"

Clickhouse的INSERT语句允许以任何支持的格式导入数据。数据导入只需要O(1)的RAM消耗。INSERT语句可以接受任何大小数据量作为输入。强烈建议不要以太小的数据块插入数据。注意,以大小为max_insert_block_size(默认为1048576行)的块进行插入是一个原子操作(即数据块要么完全插入,要么完全不插入)。要是在插入操作的时候断开了连接,你可能不清楚数据块是否成功插入了。为了实现妥妥地一次性完工,Clickhouse支持复制表的幂等性。这意味着你可以重试插入相同的数据块(可能在不同的副本上),但实际上这个数据块只插入了一次。由于本指南中是从本地导入数据,所以暂时不care数据块生成和一次准确性问题。

用INSERT语句向MergeTree类型表中插数据的操作是非阻塞的,SELECT操作也是非阻塞式的,你可以在插入操作后即刻执行SELECT查询。

我们的示例数据集有点不太理想,有两点原因:

第一个原因是示例中的字符串(String)数据用枚举(Enum)或者数值类型是最合适的。

(责任编辑:本港台直播)
顶一下
(0)
0%
踩一下
(0)
0%
------分隔线----------------------------
栏目列表
推荐内容