|
does not return values is called
- Fix check for return function value to handle the case when a created object is returned without assigning it to a local variable
closes #1687
|
|
parquet reader is used
closes #1655
|
|
1. HiveTestBase data initialization moved to a static block
to be initialized once for all derivatives.
2. Extracted Hive driver and storage plugin management from HiveTestDataGenerator
to HiveTestFixture class. This increased the cohesion of the generator and
loosened coupling between Hive test configuration and data generation
tasks.
3. Replaced usage of Guava ImmutableLists in TestBaseViewSupport
helper methods with standard JDK collections.
closes #1613
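The substitution in item 3 can be sketched as follows; `JdkCollections.immutable` is a hypothetical helper name, not a Drill class:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: the kind of replacement described in item 3 -- a Guava
// ImmutableList.of("a", "b") call expressed with plain JDK collections.
public class JdkCollections {
    public static List<String> immutable(String... items) {
        // unmodifiableList gives the same read-only guarantee to callers
        return Collections.unmodifiableList(Arrays.asList(items));
    }
}
```

Like Guava's ImmutableList, the returned list throws UnsupportedOperationException on mutation attempts.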
|
|
1. Added DrillHiveViewTable which allows construction of DrillViewTable based
on Hive metadata
2. Added initialization of DrillHiveViewTable in HiveSchemaFactory
3. Extracted conversion of Hive data types from DrillHiveTable
to HiveToRelDataTypeConverter
4. Removed throwing of UnsupportedOperationException from HiveStoragePlugin
5. Added TestHiveViewsSupport and authorization tests
6. Added closeSilently() method to AutoCloseables
closes #1559
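Item 3 above extracts Hive data type conversion into HiveToRelDataTypeConverter. A minimal name-level sketch of that kind of mapping, with a hypothetical class and only a handful of types; the real converter works on Hive TypeInfo and Calcite RelDataType objects, not strings:

```java
// Illustrative only: name-based sketch of mapping Hive type names to SQL
// type names, standing in for the TypeInfo -> RelDataType conversion.
public class HiveTypeNames {
    public static String toSqlTypeName(String hiveType) {
        switch (hiveType.toLowerCase()) {
            case "string":    return "VARCHAR";
            case "int":       return "INTEGER";
            case "double":    return "DOUBLE";
            case "boolean":   return "BOOLEAN";
            case "timestamp": return "TIMESTAMP";
            default: throw new IllegalArgumentException("unmapped type: " + hiveType);
        }
    }
}
```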
|
|
1. Added enableStringsSignedMinMax parquet format plugin config and store.parquet.reader.strings_signed_min_max session option to control reading binary statistics for files generated by Parquet versions prior to 1.10.0.
2. Added ParquetReaderConfig to store configuration needed when reading parquet statistics or files.
3. Provided mechanism to enable varchar / decimal filter push down.
4. Added VersionUtil to compare Drill versions in string representation.
5. Added appropriate unit tests.
closes #1537
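Item 4 mentions comparing Drill versions in string representation. A minimal sketch of such a comparison, assuming simple dotted numeric versions; the class and method names are illustrative, not Drill's actual VersionUtil API:

```java
// Illustrative sketch: compare dotted version strings numerically,
// so that "1.10.0" sorts after "1.9.2". Missing segments count as 0.
public class VersionComparator {
    public static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int va = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int vb = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (va != vb) {
                return Integer.compare(va, vb);
            }
        }
        return 0;
    }
}
```

Numeric comparison per segment avoids the lexicographic trap where "1.10.0" would sort before "1.9.2".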
|
|
closes #1527
|
|
|
|
1. StoragePluginsRegistryImpl was updated:
a. for backward compatibility, at init all existing storage plugin names are converted to lower case; in case of duplicates, a warning is logged and the duplicate is skipped.
b. to wrap persistent plugins registry into case insensitive store wrapper (CaseInsensitivePersistentStore) to ensure all given keys are converted into lower case when performing insert, update, delete, search operations.
c. to load system storage plugins dynamically by @SystemPlugin annotation.
2. StoragePlugins class was updated to store storage plugin configs by name in a case insensitive map.
3. SchemaUtilities.searchSchemaTree method was updated to convert all schema names into lower case to ensure that they are matched case insensitively (all schemas are stored in Drill in lower case).
4. FileSystemConfig was updated to store workspaces by name in case insensitive hash map.
5. All plugin schema factories now extend AbstractSchemaFactory to ensure that the given schema name is converted to lower case.
6. New method areTableNamesAreCaseInsensitive was added to AbstractSchema to indicate whether schema table names are case insensitive (false by default). The schema implementation is responsible for case insensitive table name search if it supports one. Currently, information_schema, sys and hive do so.
7. System storage plugins (information_schema, sys) were refactored to ensure their schema and table names are case insensitive; also the @SystemPlugin annotation and an additional constructor were added to allow dynamically loading system plugins into the storage plugin registry during the init phase.
8. MetadataProvider was updated to convert all schema filter conditions into lower case to ensure schemas are matched case insensitively.
9. ShowSchemasHandler, ShowTablesHandler, DescribeTableHandler were updated to ensure schema / table names (depending on whether the schema supports case insensitive table names) are found case insensitively.
closes #1439
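The case-insensitive store wrapper described in 1.b can be sketched like this; the interface is simplified to a plain map and is not Drill's actual PersistentStore API:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch of the idea behind CaseInsensitivePersistentStore: delegate to a
// backing store but lower-case every key, so "DFS", "dfs" and "Dfs" all
// resolve to the same plugin config. Simplified; not Drill's real interface.
public class CaseInsensitiveStore<V> {
    private final Map<String, V> delegate = new HashMap<>();

    private static String key(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    public void put(String name, V value) { delegate.put(key(name), value); }
    public V get(String name) { return delegate.get(key(name)); }
    public boolean contains(String name) { return delegate.containsKey(key(name)); }
    public V delete(String name) { return delegate.remove(key(name)); }
}
```

Lower-casing at the wrapper boundary means insert, update, delete and search all agree on the canonical key, as described above.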
|
|
closes #1415
|
|
properties at session level
closes #1365
|
|
closes #1314
|
|
- Removed usages of System.out and System.err from the test and replaced with loggers
closes #1284
|
|
Timestamp types. (#3)
close apache/drill#1247
* DRILL-6242 - Use java.time.Local{Date|Time|DateTime} classes to hold values from corresponding Drill date, time, and timestamp types.
Conflicts:
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/ExtendedJsonOutput.java
Fix merge conflicts and check style.
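A minimal sketch of the DRILL-6242 idea above: interpret the epoch millis held in date / timestamp vectors as timezone-free java.time values (UTC-based). The class and method names here are illustrative, not Drill's actual vector accessors:

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Sketch: convert the epoch-millis representation used in vectors into
// java.time.Local* values without applying any local time zone.
public class TimeConversions {
    public static LocalDateTime timestampFromMillis(long epochMillis) {
        // UTC keeps the value timezone-free, matching SQL TIMESTAMP semantics
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);
    }

    public static LocalDate dateFromMillis(long epochMillis) {
        return timestampFromMillis(epochMillis).toLocalDate();
    }
}
```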
|
|
pruning
closes #1216
|
|
to Drill optimizations (filter / limit push down, count to direct scan)
1. Factored out common logic for Drill parquet reader and Hive Drill native parquet readers: AbstractParquetGroupScan, AbstractParquetRowGroupScan, AbstractParquetScanBatchCreator.
2. Rules that worked previously only with ParquetGroupScan, now can be applied for any class that extends AbstractParquetGroupScan: DrillFilterItemStarReWriterRule, ParquetPruneScanRule, PruneScanRule.
3. Hive populates partition values based on information returned from the Hive metastore; Drill populates partition values based on the path difference between the selection root and the actual file path.
Previously, ColumnExplorer populated partition values using the Drill approach. Since ColumnExplorer now also populates values for parquet files from Hive tables,
`populateImplicitColumns` method logic was changed to populate partition columns based only on given partition values.
4. Refactored ParquetPartitionDescriptor to be responsible for populating partition values rather than storing this logic in parquet group scan class.
5. Metadata class was moved to a separate metadata package (org.apache.drill.exec.store.parquet.metadata). Factored out several inner classes to improve code readability.
6. Collected all Drill native parquet reader unit tests into one class TestHiveDrillNativeParquetReader, also added new tests to cover new functionality.
7. Reduced excessive logging when parquet file metadata is read
closes #1214
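Item 3's Drill approach (partition values from the path difference between selection root and file path) can be sketched as follows; plain strings stand in for Hadoop Path objects and the class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the directory names between the selection root and the file are
// the partition values, e.g. /data/t + /data/t/2018/01/x.parquet -> [2018, 01].
public class PartitionValues {
    public static List<String> fromPathDifference(String selectionRoot, String filePath) {
        String relative = filePath.substring(selectionRoot.length());
        List<String> values = new ArrayList<>();
        String[] segments = relative.split("/");
        // every segment except the trailing file name is a partition directory
        for (int i = 0; i < segments.length - 1; i++) {
            if (!segments[i].isEmpty()) {
                values.add(segments[i]);
            }
        }
        return values;
    }
}
```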
|
|
closes #1207
|
|
2.1.2-mapr-1710 versions respectively
* Improvements to allow reading Hive bucketed transactional ORC tables;
* Updating hive properties for tests and resolving dependencies and API conflicts:
- Fix for "hive.metastore.schema.verification", MetaException(message: Version information
not found in metastore) https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool
METASTORE_SCHEMA_VERIFICATION="false" property is added
- Added METASTORE_AUTO_CREATE_ALL="true" property to tests, because some additional
tables are necessary in the Hive metastore
- Disabling Calcite CBO (Hive's CalcitePlanner) for tests, because it conflicts
with Drill's Calcite version in Drill unit tests. The HIVE_CBO_ENABLED="false" property is added
- jackson and parquet libraries are relocated in hive-exec-shade module
- org.apache.parquet:parquet-column Drill version is added to "hive-exec" to
allow using Parquet empty group on MessageType level (PARQUET-278)
- Removed the commons-codec exclusion from hive core. This dependency is
necessary for hive-exec and hive-metastore.
- Setting Hive internal properties for transactional scan:
HiveConf.HIVE_TRANSACTIONAL_TABLE_SCAN and for schema evolution: HiveConf.HIVE_SCHEMA_EVOLUTION,
IOConstants.SCHEMA_EVOLUTION_COLUMNS, IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES
- "io.dropwizard.metrics:metrics-core" with the latest 4.0.2 version is added to the dependencyManagement block in Drill root POM
- Exclusion of "hive-exec" in "hive-hbase-handler" is already in Drill root dependencyManagement POM
- Hive Calcite libraries are excluded (Calcite CBO was disabled)
- "jackson-core" dependency is added to DependencyManagement block in Drill root POM file
- For MapR Hive 2.1 client older "com.fasterxml.jackson.core:jackson-databind" is included
- "log4j:log4j" dependency is excluded from "hive-exec", "hive-metastore", "hive-hbase-handler".
close apache/drill#1111
|
|
1. Fixed ser / de issues for Hive, Kafka, Hbase plugins.
2. Added physical plan submission unit test for all storage plugins in contrib module.
3. Refactoring.
closes #1108
|
|
higher performance by caching frequently requested values.
closes #1099
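The caching idea mentioned above, sketched as a generic compute-once-per-key helper; this is an illustrative pattern, not the class touched by this commit:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: compute a value once per key and serve repeated requests from
// memory, the usual way frequently requested values are cached.
public class ComputingCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public ComputingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        // computeIfAbsent runs the loader only on the first request for a key
        return cache.computeIfAbsent(key, loader);
    }
}
```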
|
|
closes #1045
|
|
- Use root schema as default for describe table statement.
Fix TestOpenTSDBPlugin.testDescribe() and TestInfoSchemaOnHiveStorage.varCharMaxLengthAndDecimalPrecisionInInfoSchema() unit tests.
- Modify expected results for tests:
TestPreparedStatementProvider.invalidQueryValidationError();
TestProjectPushDown.testTPCH1();
TestProjectPushDown.testTPCH3();
TestStorageBasedHiveAuthorization.selectUser1_db_u0_only();
TestStorageBasedHiveAuthorization.selectUser0_db_u1g1_only()
- Fix TestCTAS.whenTableQueryColumnHasStarAndTableFiledListIsSpecified(), TestViewSupport.createViewWhenViewQueryColumnHasStarAndViewFiledListIsSpecified(), TestInbuiltHiveUDFs.testIf(), testDisableUtf8SupportInQueryString unit tests.
- Fix UnsupportedOperationException and NPE for jdbc tests.
- Fix AssertionError: Conversion to relational algebra failed to preserve datatypes
*DrillCompoundIdentifier:
According to the changes made in [CALCITE-546], the star identifier is replaced by an empty string during query parsing. Since Drill uses its own DrillCompoundIdentifier, it should also replace the star by an empty string before creating the SqlIdentifier instance to avoid further errors connected with the star column. See the SqlIdentifier.isStar() method.
*SqlConverter:
In [CALCITE-1417] simplification was added for expressions which should be projected every time a new project rel node is created using RelBuilder. It causes assertion errors connected with type nullability. This hook was set to false to avoid project expression simplification. See usage of this hook and the RelBuilder.project() method.
In Drill the type nullability of a function depends only on the nullability of its arguments. In some cases, a function may return a null value even if it had non-nullable arguments. When Calcite simplifies expressions, it checks that the type of the result is the same as the type of the expression. Otherwise, the makeCast() method is called. But when a function returns a null literal, this cast does nothing, even when the function has a non-nullable type. So to avoid this issue, the makeCast() method was overridden.
*DrillAvgVarianceConvertlet:
A problem with sum0 and specific changes in old Calcite (CALCITE-777). (See the HistogramShuttle.visitCall method.) Changes were made to avoid changes in Calcite.
*SqlConverter, DescribeTableHandler, ShowTablesHandler:
New Calcite tries to combine both default and specified workspaces during query validation. In some cases, for example when a describe table statement is used, Calcite tries to find INFORMATION_SCHEMA in the schema used as default. When it does not find the schema, it tries to find a table with that name. For some storage plugins, such as opentsdb and hbase, when a table is not found, an error is thrown and the query fails. To avoid this issue, the default schema was changed to the root schema at the validation stage for describe table and show tables queries.
|
|
tests.
closes #1053
|
|
Overview:
1. When a table has a header / footer, process input splits of the same file in one reader (bug fix for DRILL-5941).
2. Apply skip header logic during reader initialization only once to avoid checks while reading the data (DRILL-5106).
3. Apply skip footer logic only when the footer count is more than 0; otherwise default processing is done without buffering data in a queue (DRILL-5106).
Code changes:
1. AbstractReadersInitializer was introduced to factor out common logic during readers initialization.
It will have two implementations:
a. Default (each input split group gets its own reader);
b. Empty (for empty tables);
2. AbstractRecordsInspector was introduced to improve performance when the table footer count is less than or equal to 0.
It will have two implementations:
a. Default (records will be processed one by one without buffering);
b. SkipFooter (queue will be used to buffer N records that should be skipped at the end of file processing).
3. When a text table has a header / footer, each table file should be read as one unit. When a file is read as several input splits, they should be grouped.
For this purpose the LogicalInputSplit class was introduced, replacing the InputSplitWrapper class. The new class stores a list of grouped input splits and returns information about splits on group level.
Please note, during planning input splits are grouped only when data is read from a text table that has a header / footer; otherwise each input split is treated separately.
4. Allow HiveAbstractReader to have multiple input splits instead of one.
This closes #1030
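The SkipFooter buffering described in 2.b can be sketched with a bounded queue: a record is emitted only after N newer records have arrived, so the final N (the footer) are never emitted. Simplified, with strings standing in for rows and a hypothetical class name:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of the skip-footer queue: hold the last N records back; anything
// pushed out of the queue is guaranteed not to be part of the footer.
public class SkipFooterBuffer {
    private final int footerCount;
    private final Deque<String> buffer = new ArrayDeque<>();

    public SkipFooterBuffer(int footerCount) {
        this.footerCount = footerCount;
    }

    public List<String> process(List<String> records) {
        List<String> emitted = new ArrayList<>();
        for (String record : records) {
            buffer.addLast(record);
            if (buffer.size() > footerCount) {
                emitted.add(buffer.removeFirst()); // old enough: not footer
            }
        }
        return emitted; // the last footerCount records stay buffered (skipped)
    }
}
```

This is why default (non-buffering) processing is preferable when the footer count is 0: the queue adds a copy per record.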
|
|
This change includes:
DRILL-5783:
- A unit test is created for the priority queue in the TopN operator.
- The code generation classes passed around a completely unused function registry reference in some places, so it was removed.
- The priority queue had unused parameters for some of its methods, so they were removed.
DRILL-5841:
- Created standardized temp directory classes DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. And updated all unit tests to use them.
DRILL-5894:
- Removed the dfs_test storage plugin for tests and replaced it with the already existing dfs storage plugin.
Misc:
- General code cleanup.
- Removed unnecessary use of String.format in the tests.
This closes #984
|
|
- Rename FixtureBuilder to ClusterFixtureBuilder
- Provide alternative way to reset system/session options
- Fix for DRILL-5833: random failure in TestParquetWriter
- Provide strict, but clear, errors for missing options
closes #970
|
|
1. Bump up Drill Calcite version to include CALCITE-2014 changes.
2. Add saffron.properties file to the Drill conf folder.
3. Add appropriate unit tests.
closes #936
|
|
1. Increased test parallelism and fixed associated bugs
2. Added test categories and categorized tests appropriately
- Don't exclude anything by default
- Increase test timeout
- Fixed flaky test
closes #940
|
|
results for local time-zone
closes #937
|
|
Changes include:
1. Addition of internal options.
2. Refactoring of OptionManagers and OptionValidators.
3. Fixed ambiguity in the meaning of an option type, and changed its name to accessibleScopes.
4. Updated javadocs in the Option System classes.
5. Added RestClientFixture for testing the Rest API.
6. Fixed flaky test in TestExceptionInjection caused by a race condition.
7. Fixed various tests which started zookeeper but failed to shut it down at the end of tests.
8. Added port hunting to the Drill Webserver for testing
9. Fixed various flaky tests
10. Fix compile issue
closes #923
|
|
small refactoring of original fix of this issue (DRILL-4039); - Added test for the fix.
|
|
consisting of multiple operators.
This closes #823
|
|
1. Revisited calculation logic for string literals and some string functions
(cast, upper, lower, initcap, reverse, concat, concat operator, rpad, lpad, case statement,
coalesce, first_value, last_value, lag, lead).
Synchronized return type length calculation logic between limit 0 and regular queries.
2. Deprecated width and changed it to precision for string types in MajorType.
3. Revisited FunctionScope and split it into FunctionScope and ReturnType.
FunctionScope will indicate only function usage in terms of the number of in / out rows (n -> 1, 1 -> 1, 1 -> n).
The new ReturnType annotation in UDFs will indicate which return type strategy should be used.
4. Changed MAX_VARCHAR_LENGTH from 65536 to 65535.
5. Updated calculation of precision and display size for INTERVALYEAR & INTERVALDAY.
6. Refactored part of function code-gen logic (ValueReference, WorkspaceReference, FunctionAttributes, DrillFuncHolder).
This closes #819
|
|
This closes #695
|
|
Java heap space
close apache/drill#654
|
|
of views
increases
This closes #592
|
|
+ Function visitor should not use previous function holder if this function is non-deterministic
closes #509
|
|
status on command return - implement DROP TABLE IF EXISTS and DROP VIEW IF EXISTS; - added unit test for DROP TABLE IF EXISTS; - added unit test for DROP VIEW IF EXISTS; - added unit test for "IF" hive UDF.
This closes #541
|
|
hive database
|
|
- Replace drill var16char with varchar datatype for hive string datatype
- Change testGenericUDF() and testUDF() to use VarChar instead of Var16Char
- Add unit test for hive GET_JSON_OBJECT UDF
closes #431
|
|
when result column types are known
+ "planner.enable_limit0_optimization" option is disabled by default
+ Print plan in PlanTestBase if TEST_QUERY_PRINTING_SILENT is set
+ Fix DrillTestWrapper to verify expected and actual schema
+ Correct the schema of results in TestInbuiltHiveUDFs#testXpath_Double
This closes #405
|
|
- CUME_DIST
- DENSE_RANK
- PERCENT_RANK
- RANK
- ROW_NUMBER
- NTILE
- LEAD
- LAG
- FIRST_VALUE
- LAST_VALUE
|
|
|
|
|
|
metadata bug
The precision of the Varchar datatype was not being set, causing inconsistent
truncation of values to the default length of 1. Fixed the same issue with varbinary.
The test framework was previously taking a string as the baseline for a binary value,
which cannot express all possible values. Fixed the test to instead use a byte array.
This required updating the hive tests that were using the old method of specifying
baselines with a String.
Fix cast to varbinary when reading from a data source with schema needed for writing
a test.
Updated patch to remove varchar lengths from table creation.
This issue was fixed more generally by DRILL-4465, which provides a default
type length for varchar and varbinary during the setup of Calcite. This update now
just provides tests to verify the fix in this case.
Closes #393
|
|
Fixing most rawtypes warning issues in drill modules.
Closes #347
|
|
"skip.footer.line.count" attribute of Hive table
1. Functionality to skip header and footer lines while reading Hive data.
2. Unit tests.
|
|
Do not add Project when no column needs to be read out from Scan (e.g., select count(*) from hive.table)
|
|
table.
|
|
configuration (e.g. Hive-HBase tables)
|
|
+ HadoopShims.setTokenStr is moved to Utils.setTokenStr. There is no change
in functionality.
+ Disable binary partitions columns in Hive test suites. Binary
partition column feature is regressed in Hive 1.2.1 (HIVE-12680). This
should affect only the Hive execution which is used to generate the test
data. If Drill is talking to Hive v1.0.0 (which has binary partition
columns working), Drill should be able to get the data from Hive
without any issues (tested).
+ Move to tinyint_part from boolean_part as there is an issue with boolean
type partition columns too (HIVE-6590).
+ Update StorageHandler based test as there is an issue with test data
generation in Hive 1.2.1. Need a separate test with custom test StorageHandler.
this closes #302
|