Age | Commit message | Author |
|
closes #1666
|
|
Add support for avg row-width and major type statistics.
Parallelize the ANALYZE implementation and stats UDF implementation to improve stats collection performance.
Update/fix rowcount, selectivity and ndv computations to improve plan costing.
Add options for configuring collection/usage of statistics.
Add new APIs and implementation for stats writer (as a precursor to Drill Metastore APIs).
Fix several stats/costing related issues identified while running TPC-H and TPC-DS queries.
Add support for CPU sampling and nested scalar columns.
Add more test cases for collection and usage of statistics, and fix remaining unit/functional test failures.
Thanks to Venki Korukanti (@vkorukanti) for the description below (modified to account for new changes). He graciously agreed to rebase the patch onto the latest master, fixed a few issues, and added a few tests.
FUNCS: Statistics functions as UDFs:
Separate
Currently using FieldReader to ensure consistent output type so that Unpivot doesn't get confused. All stats columns should be Nullable, so that stats functions can return NULL when N/A.
* custom versions of "count" that always return BigInt
* HyperLogLog based NDV that returns BigInt that works only on VarChars
* HyperLogLog with binary output that only works on VarChars
OPS: Updated protobufs for new ops
OPS: Implemented StatisticsMerge
OPS: Implemented StatisticsUnpivot
ANALYZE: AnalyzeTable functionality
* JavaCC syntax more-or-less copied from LucidDB.
* (Basic) AnalyzePrule: DrillAnalyzeRel -> UnpivotPrel StatsMergePrel FilterPrel(for sampling) StatsAggPrel ScanPrel
ANALYZE: Add getMetadataTable() to AbstractSchema
USAGE: Change field access in QueryWrapper
USAGE: Add getDrillTable() to DrillScanRelBase and ScanPrel
* since ScanPrel does not inherit from DrillScanRelBase, this requires adding a DrillTable to the constructor
* This is done so that a custom ReflectiveRelMetadataProvider can access the DrillTable associated with Logical/Physical scans.
USAGE: Attach DrillStatsTable to DrillTable.
* DrillStatsTable represents the data scanned from a corresponding ".stats.drill" table
* In order to avoid doing query execution right after the ".stats.drill" table is found, metadata is not actually collected until the MaterializationVisitor is used.
** Currently, the metadata source must be a string (so that a SQL query can be created). Doing this with a table is probably more complicated.
** Query is set up to extract only the most recent statistics results for each column.
closes #729
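The rowcount/selectivity/NDV computations above feed plan costing. As a hedged illustration of the idea, here is the textbook uniform-distribution estimate for an equality predicate; `StatsSketch` and its method names are hypothetical, not Drill's actual cost model:

```java
// Hypothetical sketch of NDV-based selectivity estimation. The 1/NDV rule
// assumes values are uniformly distributed across distinct values.
public class StatsSketch {

    // An equality predicate "col = literal" selects about 1/NDV of the rows.
    static double equalitySelectivity(long ndv) {
        return ndv <= 0 ? 1.0 : 1.0 / ndv;
    }

    // Estimated output row count for a filter applied on top of a scan.
    static double filteredRowCount(double rowCount, long ndv) {
        return rowCount * equalitySelectivity(ndv);
    }

    public static void main(String[] args) {
        // 1M-row table, filtered column has 50 distinct values: ~20000 rows out.
        System.out.println(filteredRowCount(1_000_000, 50));
    }
}
```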
|
|
|
|
Note: this PR only adds support for the CREATE / DROP SCHEMA commands, which allow storing and deleting a schema. Schema usage while querying the data will be covered in other PRs.
1. Added parser methods / handles to parse CREATE / DROP schema commands.
2. Added SchemaProviders classes to separate ways of schema provision (file, table function).
3. Added schema parsing using ANTLR4 (lexer, parser, visitors).
4. Added appropriate unit tests.
close apache/drill#1615
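For flavor, a toy sketch of turning a schema string into ordered column metadata. The real implementation uses an ANTLR4-generated lexer/parser/visitors as noted in point 3; `ToySchemaParser` is purely illustrative and handles only the simplest "name type" column list:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ToySchemaParser {

    // Parses "id INT, name VARCHAR" into an ordered name -> type map.
    static Map<String, String> parse(String schema) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (String part : schema.split(",")) {
            String[] tokens = part.trim().split("\\s+", 2);
            if (tokens.length != 2) {
                throw new IllegalArgumentException("bad column: " + part);
            }
            columns.put(tokens[0], tokens[1].toUpperCase());
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(parse("id int, name varchar")); // {id=INT, name=VARCHAR}
    }
}
```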
|
|
|
|
closes #1207
|
|
Fix AssertionError: type mismatch for tests with aggregate functions.
Fix VARIANCE agg function
Remove using deprecated Subtype enum
Fix 'Failure while loading table a in database hbase' error
Fix 'Field ordinal 1 is invalid for type '(DrillRecordRow[*])'' unit test failures
|
|
- fixed all compilation errors (main changes were: Maven changes, RelNode -> RelRoot changes, implementing some new methods from updated interfaces, changes to some literals, logger changes);
- fixed unexpected column errors, validation errors and assertion errors after Calcite update;
- fixed describe table/schema statement according to updated logic;
- added fixes with time-intervals;
- changed precision of BINARY to 65536 (was 1048576) according to updated logic (Calcite overrides bigger precision to own maxPrecision);
- ignored some incorrect tests with DRILL-3244;
- changed "Table not found" message to "Object not found within" according to new Calcite changes.
|
|
closes #1033
|
|
close apache/drill#666
|
|
1) Configuration / parsing / options / protos
2) Zookeeper integration
3) Registration / unregistration / lazy-init
4) Unit tests
This closes #574
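The lazy-init behavior in point 3 is the familiar double-checked lazy-load pattern: registrations are cheap bookkeeping, and the expensive load happens only on first lookup. `LazyRegistry` below is an illustrative stand-in, not the actual Zookeeper-backed registry:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LazyRegistry {
    private final Map<String, String> registered = new ConcurrentHashMap<>();
    private volatile boolean initialized = false;
    private int loadCount = 0;

    void register(String name, String jar) { registered.put(name, jar); }
    void unregister(String name) { registered.remove(name); }

    String lookup(String name) {
        if (!initialized) {               // lazy init: load on first use only
            synchronized (this) {
                if (!initialized) {
                    loadCount++;          // stand-in for the expensive load
                    initialized = true;
                }
            }
        }
        return registered.get(name);
    }

    int loads() { return loadCount; }

    public static void main(String[] args) {
        LazyRegistry r = new LazyRegistry();
        r.register("myUdf", "udfs.jar");
        System.out.println(r.lookup("myUdf")); // udfs.jar (triggers the one load)
        r.lookup("myUdf");                     // no second load
        System.out.println(r.loads());         // 1
    }
}
```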
|
|
status on command return
- implement DROP TABLE IF EXISTS and DROP VIEW IF EXISTS;
- added unit test for DROP TABLE IF EXISTS;
- added unit test for DROP VIEW IF EXISTS;
- added unit test for "IF" hive UDF.
This closes #541
|
|
close apache/drill#436.
|
|
- Create a new simplified version of the Hadoop Text class that doesn't include massive dependencies.
- Update Vectors to use new Text class.
- Update the jdbc-all module to have a test which also includes complex types.
- Clean up exclusions in Jdbc jar file to reduce file size.
- Add an enforcer rule that ensures the jdbc-all jar exclusions are maintained in the future.
This closes #336.
|
|
- Extract Accountor interface from Implementation
- Separate FMPP modules to separate out Vector Needs versus external needs
- Separate out Vector classes from those that are VectorAccessible.
- Cleanup Memory Exception hierarchy
|
|
rebasing on top of master required conflict resolution in Parser.tdd and parserImpls.ftl
this closes #114
|
|
this closes #140
|
|
|
|
batch if data is oversized; add unit tests
|
|
partitioning column list is empty.
|
|
Add partition comparator function into project under writer.
|
|
- Update Large Buffer allocation so Drill releases immediately rather than waiting for Garbage Collection
- Remove DrillBuf.wrap() and all references to it.
- Update Parquet Reader to reduce object churn and indirection.
- Add additional metric to memory iterator
- Add large and small buffer metric histogram tracking
- Add memory tracking reporter
- Update Netty to 4.0.27
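The first bullet's idea of releasing memory immediately rather than waiting for garbage collection is the usual reference-counting pattern. A minimal sketch, loosely Netty-style retain/release and not the real DrillBuf:

```java
public class RefCountedBuffer {
    private int refCount = 1;
    private boolean freed = false;

    void retain() {
        if (freed) throw new IllegalStateException("already freed");
        refCount++;
    }

    // Returns true when the underlying memory was actually freed.
    boolean release() {
        if (freed) throw new IllegalStateException("already freed");
        if (--refCount == 0) {
            // Memory goes back to the allocator right now,
            // not whenever the GC gets around to it.
            freed = true;
        }
        return freed;
    }

    public static void main(String[] args) {
        RefCountedBuffer buf = new RefCountedBuffer();
        buf.retain();
        System.out.println(buf.release()); // false: one reference remains
        System.out.println(buf.release()); // true: freed immediately
    }
}
```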
|
|
|
|
and use validated rowtype from SqlValidator to honor the final output field. ProjectRemove should honor parent's output field name. Fix Parser, allow * in compound identifier in DrillParserImpl.
Make sure ProjectRemove will honor the output fieldName and use validated rowtype from SqlValidator to honor the final output field. This is required, since Drill's execution framework is name-based, different from Calcite's ordinal-based execution engine.
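A toy illustration of why the name-based engine constrains ProjectRemove: a trivial project can only be dropped when the child's field names already match the validated output names. `isTrivialProject` is hypothetical, not Calcite's or Drill's actual rule:

```java
import java.util.Arrays;
import java.util.List;

public class ProjectRemoveCheck {

    // Same names in the same order => the project adds nothing and can go.
    static boolean isTrivialProject(List<String> childFields, List<String> projectFields) {
        return childFields.equals(projectFields);
    }

    public static void main(String[] args) {
        List<String> scan = Arrays.asList("n_name", "n_regionkey");
        System.out.println(isTrivialProject(scan, Arrays.asList("n_name", "n_regionkey"))); // true
        // A rename must keep the project: name-based execution would lose the
        // alias, even though the column ordinals are identical.
        System.out.println(isTrivialProject(scan, Arrays.asList("name", "regionkey")));     // false
    }
}
```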
|
|
Add @Inject DrillBuf
Move comparison functions to memory sensitive ones
Add scalar replacement functionality for value holders
Simplify date parsing function
Add local compiled code caching
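The local compiled-code cache can be sketched as a map keyed by the generated source text, so repeated queries reuse the compiled artifact instead of recompiling. Names here are illustrative and the "compiler" is a stand-in function:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class CodeCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private int compilations = 0;

    // Compile only on a cache miss; otherwise return the cached artifact.
    String getOrCompile(String source, Function<String, String> compiler) {
        return cache.computeIfAbsent(source, s -> {
            compilations++;
            return compiler.apply(s);
        });
    }

    int compilations() { return compilations; }

    public static void main(String[] args) {
        CodeCache cache = new CodeCache();
        Function<String, String> compiler = src -> "bytecode(" + src + ")";
        cache.getOrCompile("a+b", compiler);
        cache.getOrCompile("a+b", compiler);      // hit: no recompilation
        System.out.println(cache.compilations()); // 1
    }
}
```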
|
|
print millis
|
|
workspace
|
|
- Unfork Netty and upgrade to 4.0.20.Final
- Create support for EPOLL native impl on Linux (disable by providing -Ddrill.exec.disable_linux_epoll=false)
- Move BitData connection to Multiplex with DataTunnel level backpressure
|
|
array) as a JSON string
* This contains additional changes to the original patch which was merged.
+ Renamed "flatten" to "complex-to-json"
+ With the new patch, we return VARCHAR instead of VARBINARY.
+ Added test case.
+ Minor code re-factoring.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
accounting fixes
trim buffers
switch to using setSafe and copySafe methods only
adaptive allocation
operator based allocator wip
handle OOM
Operator Context
|
|
|
|
in queries.
|
|
Use Optiq parser template to generate Drill parser
a) exec/java-exec/pom.xml changes:
1. Write a plugin to move current existing codegen directory to target
(fmpp can't handle more than one directory as template input dir).
2. Change template directory path in fmpp plugin.
3. Extract CombinedParser.jj into target/codegen/templates directory.
4. Plugin to compile CombinedParser.jj using javacc.
b) Add parser.tdd to define values for freemarker variables in CombinedParser.jj template.
c) Define grammar and SqlCall types for new DDL statements.
d) Handlers to rewrite newly added SqlCall DDL statements as select queries from INFORMATION_SCHEMA.
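Step (d)'s rewrite of DDL statements into queries over INFORMATION_SCHEMA can be sketched as follows; the handler shown covers only a bare SHOW TABLES and is not Drill's actual rewrite logic:

```java
public class ShowTablesRewriter {

    // Rewrite a metadata statement as a SELECT against INFORMATION_SCHEMA.
    static String rewrite(String sql) {
        if (sql.trim().equalsIgnoreCase("SHOW TABLES")) {
            return "SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.`TABLES`";
        }
        throw new UnsupportedOperationException("unsupported statement: " + sql);
    }

    public static void main(String[] args) {
        System.out.println(rewrite("show tables"));
    }
}
```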
|
|
|
|
Switch to Avatica based JDBC driver.
Update QuerySubmitter to support SQL queries.
Update SqlAccesors to support getObject()
Remove ref, clean up SQL packages some.
Various performance fixes. Update the result set so the first set of results is returned before control returns to the client, allowing metadata to populate for aggressive tools like sqlline.
Move timeout functionality to TestTools.
Update Expression materializer so that it will return a nullable int if a field is not found.
Update Project record batch to support simple wildcard queries.
Updates to move JSON record reader test to expecting VarCharVector.getObject to return a String rather than a byte[].
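The getObject() idea from the SqlAccessor and VarChar changes above can be sketched with a tiny accessor that surfaces VarChar values as String rather than byte[]; the interface and names are illustrative, not Drill's actual classes:

```java
// Hypothetical per-type accessor that exposes a boxed value uniformly.
interface SqlAccessor {
    Object getObject(int rowIndex);
}

public class VarCharAccessorDemo {

    // VarChar values come back as String, not byte[], per the change above.
    static SqlAccessor varCharAccessor(String[] vector) {
        return row -> vector[row];
    }

    public static void main(String[] args) {
        SqlAccessor a = varCharAccessor(new String[]{"foo", "bar"});
        System.out.println(a.getObject(1)); // bar
    }
}
```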
|
|
appropriate excludes. Move Synth-log to sandbox until it is part of main mvn build to avoid weird RAT problems.
|
|
|