Age | Commit message | Author
---|---|---
2019-02-28 | DRILL-1328: Support table statistics - Part 2 | Gautam Parai

Add support for avg row-width and major type statistics. Parallelize the ANALYZE implementation and stats UDF implementation to improve stats collection performance. Update/fix rowcount, selectivity and ndv computations to improve plan costing. Add options for configuring collection/usage of statistics. Add new APIs and implementation for stats writer (as a precursor to Drill Metastore APIs). Fix several stats/costing-related issues identified while running TPC-H and TPC-DS queries. Add support for CPU sampling and nested scalar columns. Add more test cases for collection and usage of statistics and fix remaining unit/functional test failures.

Thanks to Venki Korukanti (@vkorukanti) for the description below (modified to account for new changes). He graciously agreed to rebase the patch onto the latest master, fix a few issues and add a few tests.

FUNCS: Statistics functions as UDFs: Separate
FieldReader is currently used to ensure a consistent output type so that Unpivot doesn't get confused. All stats columns should be Nullable, so that stats functions can return NULL when N/A.
* custom versions of "count" that always return BigInt
* HyperLogLog-based NDV that returns BigInt and works only on VarChars
* HyperLogLog with binary output that only works on VarChars

OPS: Updated protobufs for new ops

OPS: Implemented StatisticsMerge

OPS: Implemented StatisticsUnpivot

ANALYZE: AnalyzeTable functionality
* JavaCC syntax more-or-less copied from LucidDB.
* (Basic) AnalyzePrule: DrillAnalyzeRel -> UnpivotPrel StatsMergePrel FilterPrel(for sampling) StatsAggPrel ScanPrel

ANALYZE: Add getMetadataTable() to AbstractSchema

USAGE: Change field access in QueryWrapper

USAGE: Add getDrillTable() to DrillScanRelBase and ScanPrel
* Since ScanPrel does not inherit from DrillScanRelBase, this requires adding a DrillTable to the constructor.
* This is done so that a custom ReflectiveRelMetadataProvider can access the DrillTable associated with Logical/Physical scans.
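
The HyperLogLog-based NDV functions listed under FUNCS estimate the number of distinct values from a fixed-size array of registers instead of remembering every value. As an illustration only, here is a minimal self-contained estimator in Java. It is not Drill's UDF code: the class name `HllNdvSketch`, the precision parameter `p`, and the hash (64-bit FNV-1a followed by the MurmurHash3 `fmix64` finalizer) are assumptions made for this sketch. The `merge` method illustrates one plausible reason for a variant with binary output in a parallelized ANALYZE: partial sketches built on different fragments can be combined with a register-wise maximum before the final estimate is computed.

```java
import java.nio.charset.StandardCharsets;

/**
 * Minimal HyperLogLog distinct-value (NDV) estimator, for illustration only.
 * Hypothetical class; not Drill's statistics UDF implementation.
 */
final class HllNdvSketch {
    private final int p;            // number of hash bits used to pick a register
    private final int m;            // number of registers, 2^p
    private final byte[] registers; // max observed rank per register

    HllNdvSketch(int p) {
        // The alpha constant used in estimate() assumes m >= 128, i.e. p >= 7.
        if (p < 7 || p > 18) {
            throw new IllegalArgumentException("p must be in [7, 18]");
        }
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    /** 64-bit FNV-1a over the UTF-8 bytes, then the MurmurHash3 fmix64 finalizer. */
    static long hash64(String value) {
        long h = 0xcbf29ce484222325L;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    void add(String value) {
        long h = hash64(value);
        int idx = (int) (h >>> (64 - p));   // top p bits select the register
        long rest = h << p;                 // remaining 64 - p bits
        // Rank = 1-based position of the first 1-bit in the remaining bits.
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    /** Combines another sketch of the same precision by taking register-wise maxima. */
    void merge(HllNdvSketch other) {
        if (other.p != p) {
            throw new IllegalArgumentException("precision mismatch");
        }
        for (int i = 0; i < m; i++) {
            registers[i] = (byte) Math.max(registers[i], other.registers[i]);
        }
    }

    long estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);
        double e = alpha * m * (double) m / sum;
        // Small-range correction: linear counting over the empty registers.
        if (e <= 2.5 * m && zeros > 0) {
            e = m * Math.log((double) m / zeros);
        }
        return Math.round(e);
    }
}
```

With `p = 14` (16,384 registers) the expected relative standard error of HyperLogLog is about 1.04/√16384, roughly 0.8%, while small cardinalities fall back to linear counting over the empty registers.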

USAGE: Attach DrillStatsTable to DrillTable.
* DrillStatsTable represents the data scanned from a corresponding ".stats.drill" table.
* In order to avoid doing query execution right after the ".stats.drill" table is found, metadata is not actually collected until the MaterializationVisitor is used.
    * Currently, the metadata source must be a string (so that a SQL query can be created). Doing this with a table is probably more complicated.
    * Query is set up to extract only the most recent statistics results for each column.

closes #729
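
The last USAGE item notes that the query extracts only the most recent statistics result for each column. In Drill that selection is expressed as a SQL query over the ".stats.drill" data; the Java below is only an in-memory sketch of the same rule, and `ColumnStat`, its fields, and `LatestStats.latestPerColumn` are hypothetical names rather than Drill classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical statistics row: one column's NDV as computed at a given time. */
final class ColumnStat {
    final String column;
    final long computedAtMillis;
    final long ndv;

    ColumnStat(String column, long computedAtMillis, long ndv) {
        this.column = column;
        this.computedAtMillis = computedAtMillis;
        this.ndv = ndv;
    }
}

final class LatestStats {
    /** Keeps, for every column, the statistics row with the largest computedAtMillis. */
    static Map<String, ColumnStat> latestPerColumn(List<ColumnStat> rows) {
        Map<String, ColumnStat> latest = new HashMap<>();
        for (ColumnStat row : rows) {
            // merge() stores the row if the column is new; otherwise it keeps the newer row.
            latest.merge(row.column, row,
                (current, candidate) ->
                    candidate.computedAtMillis > current.computedAtMillis ? candidate : current);
        }
        return latest;
    }
}
```

Ties on the timestamp keep the row seen first; a real implementation might need a different tie-breaking rule.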

2018-10-25 | DRILL-6381: (Part 3) Planner and Execution implementation to support Secondary Indexes | rebase

1. Index Planning Rules and Plan generators
   - DbScanToIndexScanRule: Top-level physical planning rule that drives index planning for several relational algebra patterns.
   - DbScanSortRemovalRule: Physical planning rule for index planning for Sort-based operations.
   - Plan Generators: Covering, Non-Covering and Intersect physical plan generators.
   - Support planning with functional indexes such as CAST functions.
   - Enhance PlannerSettings with several configuration options for indexes.
2. Index Selection and Statistics
   - An IndexSelector that supports cost-based index selection of covering and non-covering indexes using statistics and collation properties.
   - Costing of index intersection for comparison with single-index plans.
3. Planning and execution operators
   - Support RangePartitioning physical operator during query planning and execution.
   - Support RowKeyJoin physical operator during query planning and execution.
   - HashTable and HashJoin changes to support RowKeyJoin and Index Intersection.
   - Enhance Materializer to keep track of the association between subscans and a particular rowkey join.
4. Index Planning utilities
   - Utility classes to perform RexNode analysis, including conversion to and from SchemaPath.
   - Utility class to analyze a filter condition and an input collation to determine the output collation.
   - Helper classes to maintain index contexts for the logical and physical planning phases.
   - IndexPlanUtils utility class for various helper methods.
5. Miscellaneous
   - Separate physical rel for DirectScan.
   - Modify LimitExchangeTranspose rule to handle SingleMergeExchange.
   - MD-3880: Return correct status from RangePartitionRecordBatch setupNewSchema

Co-authored-by: Aman Sinha <asinha@maprtech.com>
Co-authored-by: chunhui-shi <cshi@maprtech.com>
Co-authored-by: Gautam Parai <gparai@maprtech.com>
Co-authored-by: Padma Penumarthy <ppenumar97@yahoo.com>
Co-authored-by: Hanumath Rao Maduri <hmaduri@maprtech.com>

Conflicts:
- exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashJoinPOP.java
- exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
- exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java
- exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java
- exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java
- exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
- exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
- exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java
- exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java
- exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
- exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjectIntoScanRule.java
- exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillScanRel.java
- exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java
- exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTrait.java
- exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
- exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java
- exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
- exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetPushDownFilter.java
- exec/java-exec/src/main/resources/drill-module.conf
- logical/src/main/java/org/apache/drill/common/logical/StoragePluginConfig.java

Resolve merge conflicts and compilation issues.
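
The RangePartitioning operator listed under "Planning and execution operators" routes each row to the partition whose key range contains it. A minimal sketch of that routing step, assuming `long` keys and a sorted array of strictly increasing split points, follows; `RangePartitionSketch` and `partitionFor` are hypothetical names, and this is not the code in RangePartitionRecordBatch.

```java
import java.util.Arrays;

/** Hypothetical range partitioner: k - 1 strictly increasing split points define k partitions. */
final class RangePartitionSketch {
    private final long[] boundaries;

    RangePartitionSketch(long[] boundaries) {
        for (int i = 1; i < boundaries.length; i++) {
            if (boundaries[i] <= boundaries[i - 1]) {
                throw new IllegalArgumentException("boundaries must be strictly increasing");
            }
        }
        this.boundaries = boundaries.clone();
    }

    int partitionCount() {
        return boundaries.length + 1;
    }

    /**
     * Partition i holds keys in (boundaries[i - 1], boundaries[i]], treating boundaries[-1]
     * as negative infinity; keys above the last boundary go to the final partition.
     */
    int partitionFor(long key) {
        int pos = Arrays.binarySearch(boundaries, key);
        // Found: key equals boundaries[pos], the inclusive upper bound of partition pos.
        // Not found: binarySearch returns -(insertionPoint) - 1, where insertionPoint is
        // the index of the first boundary greater than key.
        return pos >= 0 ? pos : -(pos + 1);
    }
}
```

For example, split points {10, 20, 30} define four partitions: key 10 goes to partition 0, key 11 to partition 1, and key 31 to partition 3. Binary search keeps the per-row routing cost at O(log k) for k partitions.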