path: root/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/partitionsender/TestPartitionSender.java
Age  Commit message  Author
2019-03-14  DRILL-7068: Support memory adjustment framework for resource management with Queues.  (HanumathRao)
closes #1677
2019-02-28  DRILL-1328: Support table statistics - Part 2  (Gautam Parai)
Add support for avg row-width and major type statistics. Parallelize the ANALYZE implementation and the stats UDF implementation to improve stats collection performance. Update/fix rowcount, selectivity and NDV computations to improve plan costing. Add options for configuring collection/usage of statistics. Add new APIs and an implementation for the stats writer (as a precursor to the Drill Metastore APIs). Fix several stats/costing related issues identified while running TPC-H and TPC-DS queries. Add support for CPU sampling and nested scalar columns. Add more test cases for collection and usage of statistics and fix the remaining unit/functional test failures.
Thanks to Venki Korukanti (@vkorukanti) for the description below (modified to account for new changes). He graciously agreed to rebase the patch to latest master, fixed a few issues and added a few tests.
FUNCS: Statistics functions as separate UDFs. Currently using FieldReader to ensure a consistent output type so that Unpivot doesn't get confused. All stats columns should be Nullable, so that stats functions can return NULL when N/A.
- Custom versions of "count" that always return BigInt
- HyperLogLog-based NDV that returns BigInt and works only on VarChars
- HyperLogLog with binary output that only works on VarChars
OPS: Updated protobufs for new ops
OPS: Implemented StatisticsMerge
OPS: Implemented StatisticsUnpivot
ANALYZE: AnalyzeTable functionality
- JavaCC syntax more-or-less copied from LucidDB.
- (Basic) AnalyzePrule: DrillAnalyzeRel -> UnpivotPrel StatsMergePrel FilterPrel(for sampling) StatsAggPrel ScanPrel
ANALYZE: Add getMetadataTable() to AbstractSchema
USAGE: Change field access in QueryWrapper
USAGE: Add getDrillTable() to DrillScanRelBase and ScanPrel
- Since ScanPrel does not inherit from DrillScanRelBase, this requires adding a DrillTable to the constructor.
- This is done so that a custom ReflectiveRelMetadataProvider can access the DrillTable associated with Logical/Physical scans.
USAGE: Attach DrillStatsTable to DrillTable.
- DrillStatsTable represents the data scanned from a corresponding ".stats.drill" table.
- In order to avoid doing query execution right after the ".stats.drill" table is found, metadata is not actually collected until the MaterializationVisitor is used.
  - Currently, the metadata source must be a string (so that a SQL query can be created). Doing this with a table is probably more complicated.
  - The query is set up to extract only the most recent statistics results for each column.
closes #729
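The HyperLogLog-based NDV function above estimates distinct counts in one streaming pass over the data. A minimal, self-contained sketch of the underlying idea (not Drill's actual UDF code, which wraps an HLL implementation behind Drill's UDF interfaces; the register count and hash mixer here are illustrative choices):

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

/** Minimal HyperLogLog sketch: estimates the number of distinct values. */
public class HllSketch {
    private static final int P = 10;            // 2^10 = 1024 registers
    private static final int M = 1 << P;
    private final byte[] registers = new byte[M];

    /** SplitMix64-style finalizer to spread the 32-bit hashCode over 64 bits. */
    private static long hash64(Object v) {
        long z = v.hashCode() + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    public void add(Object value) {
        long h = hash64(value);
        int idx = (int) (h >>> (64 - P));       // first P bits pick a register
        long rest = h << P;                     // remaining bits
        int rank = Long.numberOfLeadingZeros(rest) + 1;
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    public long estimate() {
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) zeros++;
        }
        double alphaM = 0.7213 / (1 + 1.079 / M);
        double e = alphaM * M * M / sum;
        if (e <= 2.5 * M && zeros > 0) {        // small-range correction
            e = M * Math.log((double) M / zeros);
        }
        return Math.round(e);
    }

    public static long estimateDistinct(Stream<?> values) {
        HllSketch s = new HllSketch();
        values.forEach(s::add);
        return s.estimate();
    }

    public static void main(String[] args) {
        // 5000 distinct strings, each seen twice; the estimate should land near 5000.
        long est = estimateDistinct(IntStream.range(0, 10000).mapToObj(i -> "v" + (i % 5000)));
        System.out.println("NDV estimate: " + est);
    }
}
```

With 1024 registers the standard error is roughly 3%, which is why NDV is reported as an estimate rather than an exact count.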
2018-08-28  DRILL-6422: Replace guava imports with shaded ones  (Volodymyr Vysotskyi)
2018-06-06  DRILL-6438: Remove excess logging from the tests.  (Timothy Farkas)
- Removed usages of System.out and System.err from the tests and replaced them with loggers.
closes #1284
2018-04-17  DRILL-6295: PartitionerDecorator may close partitioners while CustomRunnables are active during query cancellation  (Vlad Rozov)
This closes #1208
2018-01-26  DRILL-5730: Mock testing improvements and interface improvements  (Timothy Farkas)
closes #1045
2017-11-14  DRILL-5783, DRILL-5841, DRILL-5894: Rationalize test temp directories  (Timothy Farkas)
This change includes:
DRILL-5783:
- A unit test is created for the priority queue in the TopN operator.
- The code generation classes passed around a completely unused function registry reference in some places, so it is removed.
- The priority queue had unused parameters for some of its methods, so they are removed.
DRILL-5841:
- Created standardized temp directory classes DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher, and updated all unit tests to use them.
DRILL-5894:
- Removed the dfs_test storage plugin for tests and replaced it with the already existing dfs storage plugin.
Misc:
- General code cleanup.
- Removed unnecessary use of String.format in the tests.
This closes #984
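The watcher classes above give each test a managed base directory that is created before the test and deleted afterwards. A dependency-free sketch of that idea (class and method names here are illustrative, not Drill's actual DirTestWatcher API, which builds on JUnit's TestWatcher rule):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

/** Creates a per-test base directory and deletes it recursively when done. */
public class TempDirManager implements AutoCloseable {
    private final File baseDir;

    public TempDirManager(String testName) throws IOException {
        baseDir = Files.createTempDirectory(testName).toFile();
    }

    public File getBaseDir() {
        return baseDir;
    }

    /** Creates (or returns) a named subdirectory, in the spirit of SubDirTestWatcher. */
    public File makeSubDir(String name) {
        File sub = new File(baseDir, name);
        if (!sub.exists() && !sub.mkdirs()) {
            throw new IllegalStateException("Could not create " + sub);
        }
        return sub;
    }

    /** Recursive cleanup, so no test leaves directories behind. */
    @Override
    public void close() {
        delete(baseDir);
    }

    private static void delete(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) delete(c);
        }
        f.delete();
    }
}
```

Centralizing this logic is what lets every unit test share one convention instead of each test inventing its own temp-file handling.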
2017-10-09  DRILL-5716: Queue-driven memory allocation  (Paul Rogers)
- Creates new core resource management and query queue abstractions.
- Adds queue information to the Protobuf layer.
- Foreman and Planner changes: abstracts memory management out to the new resource management layer. This means deferring generation of the physical plan JSON until later in the process, after memory planning.
- Web UI changes: adds queue information to the main page and to each query's profile page. Also sorts the list of options displayed in the Web UI.
- Added a memory reserve. A new config parameter, exec.queue.memory_reserve_ratio, sets aside a slice of total memory for operators that do not participate in the memory assignment process. The default is 20%; testing will tell us if that value should be larger or smaller.
- Additional minor fixes: code cleanup; added a mechanism to abandon lease release during shutdown; log queue configuration only when the config changes, rather than on every query; apply Boaz's option to enforce a minimum memory allocation per operator; additional logging to help testers see what is happening.
closes #928
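The reserve-ratio arithmetic described above is simple: hold back a fraction of total memory, then split the remainder across the operators that participate in memory assignment. A hypothetical sketch of that division (the method name and the choice to split evenly are illustrative, not Drill's actual planner code):

```java
/** Splits total memory among buffered operators after a fixed reserve. */
public class QueueMemoryPlanner {
    /**
     * @param totalMemory     total direct memory available to the queue, in bytes
     * @param reserveRatio    fraction set aside for operators that do not participate
     *                        in memory assignment (e.g. 0.20, the default above)
     * @param bufferedOpCount number of participating buffered operators in the plan
     * @return per-operator memory budget in bytes
     */
    public static long perOperatorBudget(long totalMemory, double reserveRatio, int bufferedOpCount) {
        if (bufferedOpCount <= 0) {
            throw new IllegalArgumentException("need at least one buffered operator");
        }
        // Hold back the reserve slice, then divide what is left.
        long assignable = (long) (totalMemory * (1.0 - reserveRatio));
        return assignable / bufferedOpCount;
    }

    public static void main(String[] args) {
        // 10 GB total, 20% reserve, 4 buffered operators -> 2 GB each.
        System.out.println(perOperatorBudget(10L << 30, 0.20, 4));
    }
}
```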
2017-10-04  DRILL-5752: This change includes:  (Timothy Farkas)
1. Increased test parallelism and fixed associated bugs.
2. Added test categories and categorized tests appropriately.
- Don't exclude anything by default.
- Increased the test timeout.
- Fixed a flaky test.
closes #940
2017-09-16  DRILL-5723: Added System Internal Options That can be Modified at Runtime  (Timothy Farkas)
Changes include:
1. Addition of internal options.
2. Refactoring of OptionManagers and OptionValidators.
3. Fixed ambiguity in the meaning of an option type, and renamed it to accessibleScopes.
4. Updated javadocs in the option system classes.
5. Added RestClientFixture for testing the REST API.
6. Fixed a flaky test in TestExceptionInjection caused by a race condition.
7. Fixed various tests which started ZooKeeper but failed to shut it down at the end of the test.
8. Added port hunting to the Drill web server for testing.
9. Fixed various flaky tests.
10. Fixed a compile issue.
closes #923
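The accessibleScopes rename in item 3 captures the idea that each option declares where it may be set (internal only, system wide, or per session). A toy model of that rule (the enum, class names, and option names here are illustrative, not Drill's OptionManager API):

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

/** Toy option registry: each option declares which scopes may modify it. */
public class OptionRegistry {
    public enum Scope { INTERNAL, SYSTEM, SESSION }

    public static final class Option {
        final String name;
        final EnumSet<Scope> accessibleScopes;
        Object value;

        Option(String name, EnumSet<Scope> accessibleScopes, Object defaultValue) {
            this.name = name;
            this.accessibleScopes = accessibleScopes;
            this.value = defaultValue;
        }
    }

    private final Map<String, Option> options = new HashMap<>();

    public void register(String name, EnumSet<Scope> scopes, Object defaultValue) {
        options.put(name, new Option(name, scopes, defaultValue));
    }

    /** Rejects writes from scopes the option does not allow. */
    public void set(String name, Scope scope, Object value) {
        Option opt = options.get(name);
        if (opt == null) throw new IllegalArgumentException("unknown option: " + name);
        if (!opt.accessibleScopes.contains(scope)) {
            throw new IllegalStateException(name + " is not settable at scope " + scope);
        }
        opt.value = value;
    }

    public Object get(String name) {
        return options.get(name).value;
    }
}
```

An internal option is then just one registered with `EnumSet.of(Scope.INTERNAL)`: visible to developers but rejected when a user tries to set it at the system or session level.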
2017-09-05  DRILL-5546: Handle schema change exception failure caused by empty input or empty batch.  (Jinfeng Ni)
1. Modify ScanBatch's logic when it iterates its list of RecordReaders.
   1) Skip a RecordReader if it returns 0 rows and presents the same schema. A new schema (by calling Mutator.isNewSchema()) means either a new top-level field is added, a field in a nested field is added, or an existing field's type is changed.
   2) Implicit columns are presumed to have a constant schema, and are added to the outgoing container before any regular column is added.
   3) ScanBatch will return NONE directly (a "fast NONE") if all of its RecordReaders have empty input and thus are skipped, instead of returning OK_NEW_SCHEMA first.
2. Modify IteratorValidatorBatchIterator to allow:
   1) a fast NONE (before seeing an OK_NEW_SCHEMA);
   2) a batch with an empty list of columns.
3. Modify JsonRecordReader when it gets 0 rows: do not insert a nullable-int column for 0-row input. Together with ScanBatch, Drill will skip empty JSON files.
4. Modify binary operators such as join and union to handle fast NONE for either one side or both sides. Abstract the logic in AbstractBinaryRecordBatch, except for MergeJoin, as its implementation is quite different from the others.
5. Fix and refactor the union all operator.
   1) Correct the union operator's handling of 0 input rows. Previously it would ignore inputs with 0 rows and put nullable-int into the output schema, which caused various schema change issues in downstream operators. The new behavior is to take a schema with 0 rows into account when determining the output schema, in the same way as with > 0 input rows. By doing that, we ensure the Union operator will not behave like a schema-lossy operator.
   2) Add a UnionInputIterator to simplify the logic that iterates the left/right inputs, removing a significant chunk of duplicated code from the previous implementation. The new union all operator reduces the code size by half compared to the old one.
6. Introduce UntypedNullVector to handle the convertFromJson() function when the input batch contains 0 rows.
   Problem: the function convertFromJson() is different from other regular functions in that it only knows the output schema after evaluation is performed. When the input has 0 rows, Drill essentially has no way to know the output type, and previously assumed Map type. That worked under the assumption that other operators like Union would ignore batches with 0 rows, which is no longer the case in the current implementation.
   Solution: use MinorType.NULL as the output type for convertFromJson() when the input contains 0 rows. The new UntypedNullVector is used to represent a column with MinorType.NULL.
7. HBaseGroupScan converts the star column into a list of row_key and column families. HBaseRecordReader rejects the star column, since it expects star to have been converted somewhere else. In HBase a column family always has map type, and a non-rowkey column always has nullable varbinary type; this ensures that HBaseRecordReaders across different HBase regions will have the same top-level schema, even if a region is empty or prunes all rows due to filter pushdown optimization. In other words, we will not see different top-level schemas from different HBaseRecordReaders for the same table. However, this change cannot handle a hard schema change: c1 exists in cf1 in one region, but not in another region. Further work is required to handle hard schema changes.
8. Modify scan cost estimation when the query involves the * column. This removes planning randomness, since previously two different operators could have the same cost.
9. Add a new flag 'outputProj' to the Project operator, to indicate whether the Project is for the query's final output. Such a Project is added by TopProjectVisitor, to handle fast NONE when all the inputs to the query are empty and are skipped.
   1) Column star is replaced with an empty list.
   2) A regular column reference is replaced with a nullable-int column.
   3) An expression goes through ExpressionTreeMaterializer, and the type of the materialized expression is used as the output type.
   4) Return an OK_NEW_SCHEMA with the schema from the above logic, then return a NONE to the downstream operator.
10. Add unit tests for operators handling empty input.
11. Add unit tests for queries whose inputs are all empty.
DRILL-5546: Revise code based on review comments.
- Handle implicit columns in ScanBatch. Change the interface in ScanBatch's constructor:
  1) Ensure either the implicit column list is empty, or all the readers have the same set of implicit columns.
  2) We can skip the implicit columns when checking whether there is a schema change coming from the record reader.
  3) ScanBatch accepts a list instead of an iterator, since we may need to go through the implicit column list multiple times and verify that the sizes of the two lists are the same.
- ScanBatch code review comments. Add more unit tests.
- Share the code path in ProjectBatch that handles normal setupNewSchema() and handleNullInput().
- Move SimpleRecordBatch out of TopNBatch to make it sharable across different places.
- Add a unit test verifying the schema for a star column query against multilevel tables.
- Unit test framework changes: fix a memory leak in the unit test framework; allow SchemaTestBuilder to pass in a BatchSchema.
close #906
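The "fast NONE" rule from item 1 boils down to: skip any reader that produced zero rows without changing the schema, and if every reader was skipped, emit NONE without first announcing a schema. A simplified model of that control flow (the Outcome enum and Reader shape are stand-ins, not Drill's RecordBatch interfaces):

```java
import java.util.Arrays;
import java.util.List;

/** Simplified model of ScanBatch's empty-reader skipping. */
public class ScanModel {
    public enum Outcome { OK_NEW_SCHEMA, NONE }

    /** Stand-in for a RecordReader: rows produced and whether it changed the schema. */
    public static final class Reader {
        final int rowCount;
        final boolean newSchema;

        public Reader(int rowCount, boolean newSchema) {
            this.rowCount = rowCount;
            this.newSchema = newSchema;
        }
    }

    /** Returns NONE directly ("fast NONE") when every reader is empty and schema-stable. */
    public static Outcome firstOutcome(List<Reader> readers) {
        for (Reader r : readers) {
            boolean skippable = r.rowCount == 0 && !r.newSchema;
            if (!skippable) {
                // Real data or a schema change: announce the schema first.
                return Outcome.OK_NEW_SCHEMA;
            }
        }
        return Outcome.NONE;
    }

    public static void main(String[] args) {
        List<Reader> allEmpty = Arrays.asList(new Reader(0, false), new Reader(0, false));
        System.out.println(firstOutcome(allEmpty)); // all readers skipped
    }
}
```

Downstream operators then need the matching half of the fix: binary operators like union must tolerate receiving NONE from a side that never sent OK_NEW_SCHEMA, which is what AbstractBinaryRecordBatch centralizes.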
2017-08-25  DRILL-5547: Linking config options with system option manager  (Jyothsna Donapati)
closes #868
2017-02-03  DRILL-5043: Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID()  (Nagarajan Chinnasamy)
#685
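The key property of a CONNECTION_ID()-style function is that the same session always sees the same id while different sessions see different ids. A toy sketch of that memoization (counter-based and not Drill's implementation, which is tied to the user session state):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hands out a stable unique id per session object, CONNECTION_ID()-style. */
public class SessionIds {
    private final AtomicLong counter = new AtomicLong();
    private final ConcurrentHashMap<Object, Long> ids = new ConcurrentHashMap<>();

    /** Same session always maps to the same id; a new session gets the next id. */
    public long idFor(Object session) {
        return ids.computeIfAbsent(session, s -> counter.incrementAndGet());
    }
}
```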
2017-01-10  DRILL-5116: Enable generated code debugging in each Drill operator  (Paul Rogers)
DRILL-5052 added the ability to debug generated code. The reviewer suggested permitting the technique to be used for all Drill operators. This PR provides the required fixes. Most were small changes; others dealt with the rather clever way that the existing byte-code merge converted static nested classes to non-static inner classes, the way that constructors were inserted at the byte-code level, and so on. See the JIRA for the details. This code passed the unit tests twice: once with the traditional byte-code manipulations, and a second time using "plain-old Java" code compilation. Plain-old Java is turned off by default, but can be turned on for all operators with a single config change: see the JIRA for info. Consider the plain-old Java option experimental: very handy for debugging, but perhaps not quite tested enough for production use. close apache/drill#716
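The "plain-old Java" path compiles generated source as an ordinary class (debuggable, steppable) instead of merging byte code. The standard JDK mechanism for compiling source at runtime is javax.tools; a minimal sketch of the technique (file layout and names are illustrative, not Drill's actual code-generation classes):

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Compiles a generated Java source string and loads the resulting class. */
public class RuntimeCompiler {
    public static Class<?> compileAndLoad(String className, String source) throws Exception {
        Path dir = Files.createTempDirectory("codegen");
        Path srcFile = dir.resolve(className + ".java");
        Files.write(srcFile, source.getBytes("UTF-8"));

        // Requires a JDK at runtime (returns null on a plain JRE).
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int result = compiler.run(null, null, null, "-d", dir.toString(), srcFile.toString());
        if (result != 0) throw new IllegalStateException("compilation failed");

        // Load the freshly compiled class from the temp directory.
        try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
            return Class.forName(className, true, loader);
        }
    }

    public static void main(String[] args) throws Exception {
        String src = "public class Gen { public static int answer() { return 42; } }";
        Class<?> cls = compileAndLoad("Gen", src);
        System.out.println(cls.getMethod("answer").invoke(null));
    }
}
```

Because the class comes from real source on disk, an IDE can set breakpoints inside it, which is exactly the debugging benefit the commit describes.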
2015-11-12  DRILL-3987: (REFACTOR) Common and Vector modules building.  (Jacques Nadeau)
- Extract Accountor interface from implementation
- Separate FMPP modules to separate out Vector needs versus external needs
- Separate out Vector classes from those that are VectorAccessible
- Clean up the memory exception hierarchy
2015-05-07  DRILL-2809: Increase the default value of partitioner_sender_threads_factor.  (Aman Sinha)
2015-05-05  DRILL-2902: Add support for context functions: user (synonyms session_user and system_user) and current_schema  (vkorukanti)
2015-05-02  DRILL-2826: Simplify and centralize Operator Cleanup  (Jacques Nadeau)
- Remove the cleanup method from the RecordBatch interface
- Make OperatorContext creation and closing the responsibility of FragmentContext
- Make OperatorContext an abstract class, with the impl only available to FragmentContext
- Make RecordBatch closing the responsibility of the RootExec
- Make all closes suppressing closes to maximize memory release on failure
- Add a new CloseableRecordBatch interface used by RootExec
- Make RootExec AutoCloseable
- Update RecordBatchCreator to return CloseableRecordBatches so that RootExec can maintain the list
- Generate the list of operators through a change in ImplCreator
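"Suppressing closes" means closing every resource even when an earlier close throws, attaching later failures to the first as suppressed exceptions so no memory is stranded. A standalone sketch of the pattern (not Drill's own cleanup utility, though Drill has a similar helper):

```java
import java.util.List;

/** Closes all resources, suppressing secondary failures so every close runs. */
public class Closer {
    public static void closeAll(List<? extends AutoCloseable> resources) throws Exception {
        Exception first = null;
        for (AutoCloseable c : resources) {
            try {
                c.close();
            } catch (Exception e) {
                if (first == null) {
                    first = e;              // remember the first failure
                } else {
                    first.addSuppressed(e); // attach the rest as suppressed
                }
            }
        }
        if (first != null) throw first;     // report after everything is closed
    }
}
```

The point is ordering: a naive loop would abort on the first exception and leak every resource after it, while this version guarantees each close is attempted exactly once.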
2015-04-03  DRILL-2674: Add user authenticator interface and PAM based implementation.  (vkorukanti)
2015-03-18  DRILL-2210: Introducing multithreading capability to PartitionerSender  (Yuliya Feldman)
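A multithreaded partition sender hashes incoming rows into per-destination partitions, then lets several threads work on the partitions concurrently. A simplified sketch of that fan-out (the executor use and hashing here are illustrative; in Drill, PartitionerDecorator drives the real partitioners):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Splits rows across partitions, then processes the partitions in parallel. */
public class ParallelPartitioner {
    /** Hash-partitions values into `partitions` buckets, like a hash exchange. */
    public static <T> List<List<T>> partition(List<T> rows, int partitions) {
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i < partitions; i++) buckets.add(new ArrayList<>());
        for (T row : rows) {
            int idx = Math.floorMod(row.hashCode(), partitions);
            buckets.get(idx).add(row);
        }
        return buckets;
    }

    /** Runs one task per partition on a thread pool, like the threaded sender. */
    public static int[] countInParallel(List<? extends List<?>> buckets, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<?> bucket : buckets) {
                futures.add(pool.submit((Callable<Integer>) bucket::size));
            }
            int[] counts = new int[buckets.size()];
            for (int i = 0; i < counts.length; i++) counts[i] = futures.get(i).get();
            return counts;
        } finally {
            pool.shutdown();
        }
    }
}
```

The query-cancellation bug fixed later by DRILL-6295 lives exactly in this pattern: the owner must not tear down the partitions while the pooled tasks are still running against them.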