path: root/exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdSelectivity.java
2019-02-28  DRILL-1328: Support table statistics - Part 2  Gautam Parai
Add support for avg row-width and major type statistics. Parallelize the ANALYZE implementation and stats UDF implementation to improve stats collection performance. Update/fix rowcount, selectivity and ndv computations to improve plan costing. Add options for configuring collection/usage of statistics. Add new APIs and implementation for stats writer (as a precursor to Drill Metastore APIs). Fix several stats/costing related issues identified while running TPC-H and TPC-DS queries. Add support for CPU sampling and nested scalar columns. Add more testcases for collection and usage of statistics and fix remaining unit/functional test failures.

Thanks to Venki Korukanti (@vkorukanti) for the description below (modified to account for new changes). He graciously agreed to rebase the patch to latest master, fixed a few issues and added a few tests.

FUNCS: Statistics functions as UDFs:
Separate
Currently using FieldReader to ensure consistent output type so that Unpivot doesn't get confused. All stats columns should be Nullable, so that stats functions can return NULL when N/A.
* custom versions of "count" that always return BigInt
* HyperLogLog based NDV that returns BigInt that works only on VarChars
* HyperLogLog with binary output that only works on VarChars

OPS: Updated protobufs for new ops
OPS: Implemented StatisticsMerge
OPS: Implemented StatisticsUnpivot

ANALYZE: AnalyzeTable functionality
* JavaCC syntax more-or-less copied from LucidDB.
* (Basic) AnalyzePrule: DrillAnalyzeRel -> UnpivotPrel StatsMergePrel FilterPrel(for sampling) StatsAggPrel ScanPrel

ANALYZE: Add getMetadataTable() to AbstractSchema

USAGE: Change field access in QueryWrapper
USAGE: Add getDrillTable() to DrillScanRelBase and ScanPrel
* since ScanPrel does not inherit from DrillScanRelBase, this requires adding a DrillTable to the constructor
* This is done so that a custom ReflectiveRelMetadataProvider can access the DrillTable associated with Logical/Physical scans.
USAGE: Attach DrillStatsTable to DrillTable.
* DrillStatsTable represents the data scanned from a corresponding ".stats.drill" table
* In order to avoid doing query execution right after the ".stats.drill" table is found, metadata is not actually collected until the MaterializationVisitor is used.
** Currently, the metadata source must be a string (so that a SQL query can be created). Doing this with a table is probably more complicated.
** Query is set up to extract only the most recent statistics results for each column.

closes #729
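
The "most recent statistics results for each column" step mentioned above can be illustrated with a small in-memory sketch. This is not Drill's implementation: per the commit message, Drill does this with a SQL query against the ".stats.drill" table. The class `LatestStatsSketch`, the record `ColumnStat`, and its fields (`column`, `computed`, `ndv`) are hypothetical names chosen only for illustration.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keep only the newest statistics row per column.
final class LatestStatsSketch {

    // Hypothetical shape of one statistics row; not Drill's actual schema.
    static final class ColumnStat {
        final String column;      // column the statistics describe
        final LocalDate computed; // date the statistics were collected
        final long ndv;           // number of distinct values

        ColumnStat(String column, LocalDate computed, long ndv) {
            this.column = column;
            this.computed = computed;
            this.ndv = ndv;
        }
    }

    // Returns a map from column name to its most recently computed row.
    // On equal dates the row seen first is kept.
    static Map<String, ColumnStat> latestPerColumn(List<ColumnStat> rows) {
        Map<String, ColumnStat> latest = new LinkedHashMap<>();
        for (ColumnStat row : rows) {
            ColumnStat seen = latest.get(row.column);
            if (seen == null || row.computed.isAfter(seen.computed)) {
                latest.put(row.column, row);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<ColumnStat> rows = new ArrayList<>();
        rows.add(new ColumnStat("o_custkey", LocalDate.parse("2019-01-10"), 900L));
        rows.add(new ColumnStat("o_custkey", LocalDate.parse("2019-02-20"), 1000L));
        rows.add(new ColumnStat("o_orderdate", LocalDate.parse("2019-02-01"), 2400L));

        // Print the surviving row for each column.
        for (ColumnStat stat : latestPerColumn(rows).values()) {
            System.out.println(stat.column + " " + stat.computed + " ndv=" + stat.ndv);
        }
    }
}
```

A single pass with a map keeps the sketch linear in the number of rows; a SQL formulation would typically express the same idea with a per-column maximum over the collection timestamp.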
2018-10-25  DRILL-6381: (Part 3) Planner and Execution implementation to support Secondary Indexes  rebase

1. Index Planning Rules and Plan generators
    - DbScanToIndexScanRule: Top level physical planning rule that drives index planning for several relational algebra patterns.
    - DbScanSortRemovalRule: Physical planning rule for index planning for Sort-based operations.
    - Plan Generators: Covering, Non-Covering and Intersect physical plan generators.
    - Support planning with functional indexes such as CAST functions.
    - Enhance PlannerSettings with several configuration options for indexes.
2. Index Selection and Statistics
    - An IndexSelector that supports cost-based index selection of covering and non-covering indexes using statistics and collation properties.
    - Costing of index intersection for comparison with single-index plans.
3. Planning and execution operators
    - Support RangePartitioning physical operator during query planning and execution.
    - Support RowKeyJoin physical operator during query planning and execution.
    - HashTable and HashJoin changes to support RowKeyJoin and Index Intersection.
    - Enhance Materializer to keep track of subscan association with a particular rowkey join.
4. Index Planning utilities
    - Utility classes to perform RexNode analysis, including conversion to and from SchemaPath.
    - Utility class to analyze filter condition and an input collation to determine output collation.
    - Helper classes to maintain index contexts for logical and physical planning phase.
    - IndexPlanUtils utility class for various helper methods.
5. Miscellaneous
    - Separate physical rel for DirectScan.
    - Modify LimitExchangeTranspose rule to handle SingleMergeExchange.
    - MD-3880: Return correct status from RangePartitionRecordBatch setupNewSchema

Co-authored-by: Aman Sinha <asinha@maprtech.com>
Co-authored-by: chunhui-shi <cshi@maprtech.com>
Co-authored-by: Gautam Parai <gparai@maprtech.com>
Co-authored-by: Padma Penumarthy <ppenumar97@yahoo.com>
Co-authored-by: Hanumath Rao Maduri <hmaduri@maprtech.com>

Conflicts:
    exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashJoinPOP.java
    exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
    exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java
    exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java
    exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java
    exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
    exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
    exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java
    exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java
    exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
    exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjectIntoScanRule.java
    exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillScanRel.java
    exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java
    exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTrait.java
    exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
    exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java
    exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
    exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetPushDownFilter.java
    exec/java-exec/src/main/resources/drill-module.conf
    logical/src/main/java/org/apache/drill/common/logical/StoragePluginConfig.java

Resolve merge conflicts and compilation issues.