|
does not return values is called
- Fix check for return function value to handle the case when a created object is returned without assigning it to a local variable
closes #1687
|
|
parquet reader is used
closes #1655
|
|
1. HiveTestBase data initialization moved to a static block
to be initialized once for all derivatives.
2. Extracted Hive driver and storage plugin management from HiveTestDataGenerator
to HiveTestFixture class. This increased the cohesion of the generator and
loosened coupling between Hive test configuration and data generation
tasks.
3. Replaced usage of Guava ImmutableLists in TestBaseViewSupport
helper methods with standard JDK collections.
closes #1613
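The substitution in item 3 can be sketched as follows; `JdkCollections.immutable` is a hypothetical helper name, not a Drill class:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: the kind of replacement described in item 3 -- a Guava
// ImmutableList.of("a", "b") call expressed with plain JDK collections.
public class JdkCollections {
    public static List<String> immutable(String... items) {
        // unmodifiableList gives the same read-only guarantee to callers
        return Collections.unmodifiableList(Arrays.asList(items));
    }
}
```

Like Guava's ImmutableList, the returned list throws UnsupportedOperationException on mutation attempts.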
|
|
1. Added DrillHiveViewTable which allows construction of DrillViewTable based
on Hive metadata
2. Added initialization of DrillHiveViewTable in HiveSchemaFactory
3. Extracted conversion of Hive data types from DrillHiveTable
to HiveToRelDataTypeConverter
4. Removed throwing of UnsupportedOperationException from HiveStoragePlugin
5. Added TestHiveViewsSupport and authorization tests
6. Added closeSilently() method to AutoCloseables
closes #1559
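Item 3 above extracts Hive data type conversion into HiveToRelDataTypeConverter. A minimal name-level sketch of that kind of mapping, with a hypothetical class and only a handful of types; the real converter works on Hive TypeInfo and Calcite RelDataType objects, not strings:

```java
// Illustrative only: name-based sketch of mapping Hive type names to SQL
// type names, standing in for the TypeInfo -> RelDataType conversion.
public class HiveTypeNames {
    public static String toSqlTypeName(String hiveType) {
        switch (hiveType.toLowerCase()) {
            case "string":    return "VARCHAR";
            case "int":       return "INTEGER";
            case "double":    return "DOUBLE";
            case "boolean":   return "BOOLEAN";
            case "timestamp": return "TIMESTAMP";
            default: throw new IllegalArgumentException("unmapped type: " + hiveType);
        }
    }
}
```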
|
|
1. Added enableStringsSignedMinMax parquet format plugin config and store.parquet.reader.strings_signed_min_max session option to control reading binary statistics for files generated by Parquet versions prior to 1.10.0.
2. Added ParquetReaderConfig to store configuration needed when reading parquet statistics or files.
3. Provided mechanism to enable varchar / decimal filter push down.
4. Added VersionUtil to compare Drill versions in string representation.
5. Added appropriate unit tests.
closes #1537
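Item 4 mentions comparing Drill versions in string representation. A minimal sketch of such a comparison, assuming simple dotted numeric versions; the class and method names are illustrative, not Drill's actual VersionUtil API:

```java
// Illustrative sketch: compare dotted version strings numerically,
// so that "1.10.0" sorts after "1.9.2". Missing segments count as 0.
public class VersionComparator {
    public static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int va = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int vb = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (va != vb) {
                return Integer.compare(va, vb);
            }
        }
        return 0;
    }
}
```

Numeric comparison per segment avoids the lexicographic trap where "1.10.0" would sort before "1.9.2".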
|
|
closes #1527
|
|
|
|
1. StoragePluginsRegistryImpl was updated:
a. for backward compatibility, at init all existing storage plugin names are converted to lower case; in case of duplicates, a warning is logged and the duplicate is skipped.
b. to wrap persistent plugins registry into case insensitive store wrapper (CaseInsensitivePersistentStore) to ensure all given keys are converted into lower case when performing insert, update, delete, search operations.
c. to load system storage plugins dynamically by @SystemPlugin annotation.
2. StoragePlugins class was updated to store storage plugin configs by name in a case insensitive map.
3. SchemaUtilities.searchSchemaTree method was updated to convert all schema names into lower case to ensure that they are matched case insensitively (all schemas are stored in Drill in lower case).
4. FileSystemConfig was updated to store workspaces by name in case insensitive hash map.
5. All plugin schema factories now extend AbstractSchemaFactory to ensure that the given schema name is converted to lower case.
6. New method areTableNamesAreCaseInsensitive was added to AbstractSchema to indicate whether schema table names are case insensitive (false by default). The schema implementation is responsible for case insensitive table name search if it supports one. Currently, information_schema, sys and hive do so.
7. System storage plugins (information_schema, sys) were refactored to ensure their schema and table names are case insensitive; also the @SystemPlugin annotation and an additional constructor were added to allow dynamically loading system plugins into the storage plugin registry during the init phase.
8. MetadataProvider was updated to convert all schema filter conditions into lower case to ensure schemas are matched case insensitively.
9. ShowSchemasHandler, ShowTablesHandler, DescribeTableHandler were updated to ensure schema / table names (depending on whether the schema supports case insensitive table names) are found case insensitively.
closes #1439
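The case-insensitive store wrapper described in 1.b can be sketched like this; the interface is simplified to a plain map and is not Drill's actual PersistentStore API:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch of the idea behind CaseInsensitivePersistentStore: delegate to a
// backing store but lower-case every key, so "DFS", "dfs" and "Dfs" all
// resolve to the same plugin config. Simplified; not Drill's real interface.
public class CaseInsensitiveStore<V> {
    private final Map<String, V> delegate = new HashMap<>();

    private static String key(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    public void put(String name, V value) { delegate.put(key(name), value); }
    public V get(String name) { return delegate.get(key(name)); }
    public boolean contains(String name) { return delegate.containsKey(key(name)); }
    public V delete(String name) { return delegate.remove(key(name)); }
}
```

Lower-casing at the wrapper boundary means insert, update, delete and search all agree on the canonical key, as described above.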
|
|
closes #1415
|
|
properties at session level
closes #1365
|
|
closes #1314
|
|
- Removed usages of System.out and System.err from the test and replaced with loggers
closes #1284
|
|
Timestamp types. (#3)
close apache/drill#1247
* DRILL-6242 - Use java.time.Local{Date|Time|DateTime} classes to hold values from corresponding Drill date, time, and timestamp types.
Conflicts:
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/ExtendedJsonOutput.java
Fix merge conflicts and check style.
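A minimal sketch of the DRILL-6242 idea above: interpret the epoch millis held in date / timestamp vectors as timezone-free java.time values (UTC-based). The class and method names here are illustrative, not Drill's actual vector accessors:

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Sketch: convert the epoch-millis representation used in vectors into
// java.time.Local* values without applying any local time zone.
public class TimeConversions {
    public static LocalDateTime timestampFromMillis(long epochMillis) {
        // UTC keeps the value timezone-free, matching SQL TIMESTAMP semantics
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);
    }

    public static LocalDate dateFromMillis(long epochMillis) {
        return timestampFromMillis(epochMillis).toLocalDate();
    }
}
```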
|
|
pruning
closes #1216
|
|
to Drill optimizations (filter / limit push down, count to direct scan)
1. Factored out common logic for Drill parquet reader and Hive Drill native parquet readers: AbstractParquetGroupScan, AbstractParquetRowGroupScan, AbstractParquetScanBatchCreator.
2. Rules that worked previously only with ParquetGroupScan, now can be applied for any class that extends AbstractParquetGroupScan: DrillFilterItemStarReWriterRule, ParquetPruneScanRule, PruneScanRule.
3. Hive populates partition values based on information returned from the Hive metastore; Drill populates partition values based on the path difference between the selection root and the actual file path.
Previously, ColumnExplorer populated partition values using the Drill approach. Since ColumnExplorer now also populates values for parquet files from Hive tables,
`populateImplicitColumns` method logic was changed to populate partition columns based only on given partition values.
4. Refactored ParquetPartitionDescriptor to be responsible for populating partition values rather than storing this logic in parquet group scan class.
5. Metadata class was moved to a separate metadata package (org.apache.drill.exec.store.parquet.metadata). Factored out several inner classes to improve code readability.
6. Collected all Drill native parquet reader unit tests into one class TestHiveDrillNativeParquetReader, also added new tests to cover new functionality.
7. Reduced excessive logging when parquet file metadata is read
closes #1214
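Item 3's Drill approach (partition values from the path difference between selection root and file path) can be sketched as follows; plain strings stand in for Hadoop Path objects and the class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the directory names between the selection root and the file are
// the partition values, e.g. /data/t + /data/t/2018/01/x.parquet -> [2018, 01].
public class PartitionValues {
    public static List<String> fromPathDifference(String selectionRoot, String filePath) {
        String relative = filePath.substring(selectionRoot.length());
        List<String> values = new ArrayList<>();
        String[] segments = relative.split("/");
        // every segment except the trailing file name is a partition directory
        for (int i = 0; i < segments.length - 1; i++) {
            if (!segments[i].isEmpty()) {
                values.add(segments[i]);
            }
        }
        return values;
    }
}
```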
|
|
closes #1207
|
|
2.1.2-mapr-1710 versions respectively
* Improvements to allow reading Hive bucketed transactional ORC tables;
* Updating hive properties for tests and resolving dependencies and API conflicts:
- Fix for "hive.metastore.schema.verification", MetaException(message: Version information
not found in metastore) https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool
METASTORE_SCHEMA_VERIFICATION="false" property is added
- Added METASTORE_AUTO_CREATE_ALL="true" property to tests, because some additional
tables are necessary in the Hive metastore
- Disabling Calcite CBO (Hive's CalcitePlanner) for tests, because it conflicts
with Drill's Calcite version in Drill unit tests. The HIVE_CBO_ENABLED="false" property is added
- jackson and parquet libraries are relocated in hive-exec-shade module
- org.apache.parquet:parquet-column Drill version is added to "hive-exec" to
allow using Parquet empty group on MessageType level (PARQUET-278)
- Removed the commons-codec exclusion from hive core. This dependency is
necessary for hive-exec and hive-metastore.
- Setting Hive internal properties for transactional scan:
HiveConf.HIVE_TRANSACTIONAL_TABLE_SCAN and for schema evolution: HiveConf.HIVE_SCHEMA_EVOLUTION,
IOConstants.SCHEMA_EVOLUTION_COLUMNS, IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES
- "io.dropwizard.metrics:metrics-core" with the latest 4.0.2 version is added to the dependencyManagement block in Drill root POM
- Exclusion of "hive-exec" in "hive-hbase-handler" is already in Drill root dependencyManagement POM
- Hive Calcite libraries are excluded (Calcite CBO was disabled)
- "jackson-core" dependency is added to DependencyManagement block in Drill root POM file
- For MapR Hive 2.1 client older "com.fasterxml.jackson.core:jackson-databind" is included
- "log4j:log4j" dependency is excluded from "hive-exec", "hive-metastore", "hive-hbase-handler".
close apache/drill#1111
|
|
1. Fixed ser / de issues for Hive, Kafka, Hbase plugins.
2. Added physical plan submission unit test for all storage plugins in contrib module.
3. Refactoring.
closes #1108
|
|
higher performance by caching frequently requested values.
closes #1099
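The caching idea mentioned above, sketched as a generic compute-once-per-key helper; this is an illustrative pattern, not the class touched by this commit:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: compute a value once per key and serve repeated requests from
// memory, the usual way frequently requested values are cached.
public class ComputingCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public ComputingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        // computeIfAbsent runs the loader only on the first request for a key
        return cache.computeIfAbsent(key, loader);
    }
}
```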
|
|
closes #1045
|
|
- Use root schema as default for describe table statement.
Fix TestOpenTSDBPlugin.testDescribe() and TestInfoSchemaOnHiveStorage.varCharMaxLengthAndDecimalPrecisionInInfoSchema() unit tests.
- Modify expected results for tests:
TestPreparedStatementProvider.invalidQueryValidationError();
TestProjectPushDown.testTPCH1();
TestProjectPushDown.testTPCH3();
TestStorageBasedHiveAuthorization.selectUser1_db_u0_only();
TestStorageBasedHiveAuthorization.selectUser0_db_u1g1_only()
- Fix TestCTAS.whenTableQueryColumnHasStarAndTableFiledListIsSpecified(), TestViewSupport.createViewWhenViewQueryColumnHasStarAndViewFiledListIsSpecified(), TestInbuiltHiveUDFs.testIf(), testDisableUtf8SupportInQueryString unit tests.
- Fix UnsupportedOperationException and NPE for jdbc tests.
- Fix AssertionError: Conversion to relational algebra failed to preserve datatypes
*DrillCompoundIdentifier:
According to the changes made in [CALCITE-546], the star identifier is replaced by an empty string during query parsing. Since Drill uses its own DrillCompoundIdentifier, it should also replace the star by an empty string before creating the SqlIdentifier instance to avoid further errors connected with the star column. See the SqlIdentifier.isStar() method.
*SqlConverter:
In [CALCITE-1417] simplification was added for expressions which should be projected every time a new project rel node is created using RelBuilder. It causes assertion errors connected with type nullability. This hook was set to false to avoid project expression simplification. See usage of this hook and the RelBuilder.project() method.
In Drill the type nullability of a function depends only on the nullability of its arguments. In some cases, a function may return a null value even if it had non-nullable arguments. When Calcite simplifies expressions, it checks that the type of the result is the same as the type of the expression. Otherwise, the makeCast() method is called. But when a function returns a null literal, this cast does nothing, even when the function has a non-nullable type. So to avoid this issue, the makeCast() method was overridden.
*DrillAvgVarianceConvertlet:
A problem with sum0 and specific changes in old Calcite (CALCITE-777). (See the HistogramShuttle.visitCall method.) Changes were made to avoid changes in Calcite.
*SqlConverter, DescribeTableHandler, ShowTablesHandler:
New Calcite tries to combine both default and specified workspaces during query validation. In some cases, for example when a describe table statement is used, Calcite tries to find INFORMATION_SCHEMA in the schema used as default. When it does not find the schema, it tries to find a table with that name. For some storage plugins, such as opentsdb and hbase, when a table is not found, an error is thrown and the query fails. To avoid this issue, the default schema was changed to the root schema at the validation stage for describe table and show tables queries.
|
|
tests.
closes #1053
|
|
Overview:
1. When a table has a header / footer, process input splits of the same file in one reader (bug fix for DRILL-5941).
2. Apply skip header logic during reader initialization only once to avoid checks while reading the data (DRILL-5106).
3. Apply skip footer logic only when the footer count is more than 0; otherwise default processing is done without buffering data in a queue (DRILL-5106).
Code changes:
1. AbstractReadersInitializer was introduced to factor out common logic during readers initialization.
It will have two implementations:
a. Default (each input split group gets its own reader);
b. Empty (for empty tables);
2. AbstractRecordsInspector was introduced to improve performance when the table footer count is less than or equal to 0.
It will have two implementations:
a. Default (records will be processed one by one without buffering);
b. SkipFooter (queue will be used to buffer N records that should be skipped at the end of file processing).
3. When a text table has a header / footer, each table file should be read as one unit. When a file is read as several input splits, they should be grouped.
For this purpose the LogicalInputSplit class was introduced, replacing the InputSplitWrapper class. The new class stores a list of grouped input splits and returns information about splits on group level.
Please note, during planning input splits are grouped only when data is read from a text table that has a header / footer; otherwise each input split is treated separately.
4. Allow HiveAbstractReader to have multiple input splits instead of one.
This closes #1030
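The SkipFooter buffering described in 2.b can be sketched with a bounded queue: a record is emitted only after N newer records have arrived, so the final N (the footer) are never emitted. Simplified, with strings standing in for rows and a hypothetical class name:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of the skip-footer queue: hold the last N records back; anything
// pushed out of the queue is guaranteed not to be part of the footer.
public class SkipFooterBuffer {
    private final int footerCount;
    private final Deque<String> buffer = new ArrayDeque<>();

    public SkipFooterBuffer(int footerCount) {
        this.footerCount = footerCount;
    }

    public List<String> process(List<String> records) {
        List<String> emitted = new ArrayList<>();
        for (String record : records) {
            buffer.addLast(record);
            if (buffer.size() > footerCount) {
                emitted.add(buffer.removeFirst()); // old enough: not footer
            }
        }
        return emitted; // the last footerCount records stay buffered (skipped)
    }
}
```

This is why default (non-buffering) processing is preferable when the footer count is 0: the queue adds a copy per record.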
|
|
This change includes:
DRILL-5783:
- A unit test is created for the priority queue in the TopN operator.
- The code generation classes passed around a completely unused function registry reference in some places, so it was removed.
- The priority queue had unused parameters for some of its methods, so they were removed.
DRILL-5841:
- Created standardized temp directory classes DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. And updated all unit tests to use them.
DRILL-5894:
- Removed the dfs_test storage plugin for tests and replaced it with the already existing dfs storage plugin.
Misc:
- General code cleanup.
- Removed unnecessary use of String.format in the tests.
This closes #984
|
|
- Rename FixtureBuilder to ClusterFixtureBuilder
- Provide alternative way to reset system/session options
- Fix for DRILL-5833: random failure in TestParquetWriter
- Provide strict, but clear, errors for missing options
closes #970
|
|
1. Bump up Drill Calcite version to include CALCITE-2014 changes.
2. Add saffron.properties file to the Drill conf folder.
3. Add appropriate unit tests.
closes #936
|
|
1. Increased test parallelism and fixed associated bugs
2. Added test categories and categorized tests appropriately
- Don't exclude anything by default
- Increase test timeout
- Fixed flaky test
closes #940
|
|
results for local time-zone
closes #937
|
|
Changes include:
1. Addition of internal options.
2. Refactoring of OptionManagers and OptionValidators.
3. Fixed ambiguity in the meaning of an option type, and changed its name to accessibleScopes.
4. Updated javadocs in the Option System classes.
5. Added RestClientFixture for testing the Rest API.
6. Fixed flaky test in TestExceptionInjection caused by a race condition.
7. Fixed various tests which started zookeeper but failed to shut it down at the end of tests.
8. Added port hunting to the Drill Webserver for testing
9. Fixed various flaky tests
10. Fix compile issue
closes #923
|
|
small refactoring of original fix of this issue (DRILL-4039); - Added test for the fix.
|
|
consisting of multiple operators.
This closes #823
|
|
1. Revisited calculation logic for string literals and some string functions
(cast, upper, lower, initcap, reverse, concat, concat operator, rpad, lpad, case statement,
coalesce, first_value, last_value, lag, lead).
Synchronized return type length calculation logic between limit 0 and regular queries.
2. Deprecated width and changed it to precision for string types in MajorType.
3. Revisited FunctionScope and split it into FunctionScope and ReturnType.
FunctionScope will indicate only function usage in terms of the number of in / out rows (n -> 1, 1 -> 1, 1 -> n).
The new ReturnType annotation in UDFs will indicate which return type strategy should be used.
4. Changed MAX_VARCHAR_LENGTH from 65536 to 65535.
5. Updated calculation of precision and display size for INTERVALYEAR & INTERVALDAY.
6. Refactored part of function code-gen logic (ValueReference, WorkspaceReference, FunctionAttributes, DrillFuncHolder).
This closes #819
|
|
This closes #695
|
|
Java heap space
close apache/drill#654
|
|
of views
increases
This closes #592
|
|
+ Function visitor should not use previous function holder if this function is non-deterministic
closes #509
|
|
status on command return - implement DROP TABLE IF EXISTS and DROP VIEW IF EXISTS; - added unit test for DROP TABLE IF EXISTS; - added unit test for DROP VIEW IF EXISTS; - added unit test for "IF" hive UDF.
This closes #541
|
|
hive database
|
|
- Replace drill var16char with varchar datatype for hive string datatype
- Change testGenericUDF() and testUDF() to use VarChar instead of Var16Char
- Add unit test for hive GET_JSON_OBJECT UDF
closes #431
|
|
when result column types are known
+ "planner.enable_limit0_optimization" option is disabled by default
+ Print plan in PlanTestBase if TEST_QUERY_PRINTING_SILENT is set
+ Fix DrillTestWrapper to verify expected and actual schema
+ Correct the schema of results in TestInbuiltHiveUDFs#testXpath_Double
This closes #405
|
|
- CUME_DIST
- DENSE_RANK
- PERCENT_RANK
- RANK
- ROW_NUMBER
- NTILE
- LEAD
- LAG
- FIRST_VALUE
- LAST_VALUE
|
|
|
|
|
|
metadata bug
The precision of the Varchar datatype was not being set, causing inconsistent
truncation of values to the default length of 1. Fixed the same issue with varbinary.
The test framework was previously taking a string as the baseline for a binary value,
which cannot express all possible values. Fixed the test to instead use a byte array.
This required updating the hive tests that were using the old method of specifying
baselines with a String.
Fix cast to varbinary when reading from a data source with schema needed for writing
a test.
Updated patch to remove varchar lengths from table creation.
This issue was fixed more generally by DRILL-4465, which provides a default
type length for varchar and varbinary during the setup of Calcite. This update now
just provides tests to verify the fix in this case.
Closes #393
|
|
Fixing most rawtypes warning issues in drill modules.
Closes #347
|
|
"skip.footer.line.count" attribute of Hive table
1. Functionality to skip header and footer lines while reading Hive data.
2. Unit tests.
|
|
Do not add Project when no column needs to be read out from Scan (e.g., select count(*) from hive.table)
|
|
table.
|
|
configuration (e.g. Hive-HBase tables)
|
|
+ HadoopShims.setTokenStr is moved to Utils.setTokenStr. There is no change
in functionality.
+ Disable binary partitions columns in Hive test suites. Binary
partition column feature is regressed in Hive 1.2.1 (HIVE-12680). This
should affect only the Hive execution which is used to generate the test
data. If Drill is talking to Hive v1.0.0 (which has binary partition
columns working), Drill should be able to get the data from Hive
without any issues (tested).
+ Move to tinyint_part from boolean_part as there is an issue with boolean
type partition columns too (HIVE-6590).
+ Update StorageHandler based test as there is an issue with test data
generation in Hive 1.2.1. Need a separate test with custom test StorageHandler.
this closes #302
|