bigdata/drill.git - Apache Drill CI loop

Age	Commit message (Collapse)	Author
2019-03-16	DRILL-7021: HTTPD Throws NPE and Doesn't Recognize TimeformatHEAD master	Charles S. Givre

2019-03-14	DRILL-7068: Support memory adjustment framework for resource management with ↵	HanumathRao
	Queues. closes #1677
2019-03-14	DRILL-2326: Fix scalar replacement for the case when static method which ↵	Volodymyr Vysotskyi
	does not return values is called - Fix check for return function value to handle the case when created object is returned without assigning it to the local variable closes #1687
2019-03-14	DRILL-6524: Prevent incorrect scalar replacement for the case of assigning ↵	Volodymyr Vysotskyi
	references inside if block
2019-03-14	DRILL-7038: Queries on partitioned columns scan the entire datasets	Bohdan Kazydub
	- Added new optimizer rule which checks if query references directory columns only and has DISTINCT or GROUP BY operation. If the condition holds, instead of scanning full file set the following will be performed: 1) if there is cache metadata file, these directories will be read from it, 2) otherwise directories will be gathered from selection object (PartitionLocation). In the end Scan node will be transformed to DrillValuesRel (containing constant literals) with gathered values so no scan will be performed. closes #1640
2019-03-14	DRILL-7058: Refresh command to support subset of columns	Venkata Jyothsna Donapati
	closes #1666
2019-03-14	DRILL-7092: Rename map to struct in schema definition	Arina Ielchiieva
	1. Renamed map to struct in schema parser. 2. Updated sqlTypeOf function to return STRUCT instead of MAP, drillTypeOf function will return MAP as before until internal renaming is done. 3. Add is_struct alias to already existing is_map function. Function should be revisited once Drill supports true maps. 4. Updated unit tests. closes #1688
2019-03-13	DRILL-7100: Fixed IllegalArgumentException when reading Parquet data	Salim Achouche

2019-03-11	DRILL-7073: CREATE SCHEMA command / TupleSchema / ColumnMetadata improvements	Arina Ielchiieva
	1. Add format, default, column properties logic. 2. Changed schema JSON after serialization. 3. Added appropriate unit tests. closes #1684
2019-03-11	DRILL-6952: Host compliant text reader on the row set framework	Paul Rogers
	The result set loader allows controlling batch sizes. The new scan framework built on top of that framework handles projection, implicit columns, null columns and more. This commit converts the "new" ("compliant") text reader to use the new framework. Options select the use of the V2 ("new") or V3 (row-set based) versions. Unit tests demonstrate V3 functionality. closes #1683
2019-03-08	DRILL-7046: Support for loading and parsing new RM config file	Sorabh Hamirwasia
	closes #1652
2019-03-08	DRILL-7054: timestamp in milliseconds	Giovanni Conte
	closes #1665
2019-03-05	DRILL-5603: Replace String file paths to Hadoop Path	Vitalii Diravka
	- replaced all String path representation with org.apache.hadoop.fs.Path - added PathSerDe.Se JSON serializer - refactoring of DFSPartitionLocation code by leveraging existing listPartitionValues() functionality closes #1657
2019-03-05	DRILL-7074: Scan framework fixes and enhancements	Paul Rogers
	Roll-up of fixes an enhancements that emerged from the effort to host the CSV reader on the new framework. closes #1676
2019-03-04	DRILL-7060: Support JsonParser Feature ↵	Abhishek Girish
	'ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER' (#1663)
2019-03-01	DRILL-4858: REPEATED_COUNT on an array of maps and an array of arrays is not ↵	Bohdan Kazydub
	implemented - Implemented 'repeated_count' function for repeated MAP and repeated LIST; - Updated RepeatedListReader and RepeatedMapReader implementations to return correct value from size() method - Moved repeated_count to freemarker template and added support for more repeated types for the function closes #1641
2019-03-01	DRILL-7041: CompileException happens if a nested coalesce function returns null	Bohdan Kazydub
	- Made `NullExpression`s in `IfExpression` with nested `IfExpression`s to be rewritten to typed ones recursively if necessary closes #1668
2019-02-28	DRILL-1328: Support table statistics - Part 2	Gautam Parai
	Add support for avg row-width and major type statistics. Parallelize the ANALYZE implementation and stats UDF implementation to improve stats collection performance. Update/fix rowcount, selectivity and ndv computations to improve plan costing. Add options for configuring collection/usage of statistics. Add new APIs and implementation for stats writer (as a precursor to Drill Metastore APIs). Fix several stats/costing related issues identified while running TPC-H nad TPC-DS queries. Add support for CPU sampling and nested scalar columns. Add more testcases for collection and usage of statistics and fix remaining unit/functional test failures. Thanks to Venki Korukanti (@vkorukanti) for the description below (modified to account for new changes). He graciously agreed to rebase the patch to latest master, fixed few issues and added few tests. FUNCS: Statistics functions as UDFs: Separate Currently using FieldReader to ensure consistent output type so that Unpivot doesn't get confused. All stats columns should be Nullable, so that stats functions can return NULL when N/A. * custom versions of "count" that always return BigInt * HyperLogLog based NDV that returns BigInt that works only on VarChars * HyperLogLog with binary output that only works on VarChars OPS: Updated protobufs for new ops OPS: Implemented StatisticsMerge OPS: Implemented StatisticsUnpivot ANALYZE: AnalyzeTable functionality * JavaCC syntax more-or-less copied from LucidDB. * (Basic) AnalyzePrule: DrillAnalyzeRel -> UnpivotPrel StatsMergePrel FilterPrel(for sampling) StatsAggPrel ScanPrel ANALYZE: Add getMetadataTable() to AbstractSchema USAGE: Change field access in QueryWrapper USAGE: Add getDrillTable() to DrillScanRelBase and ScanPrel * since ScanPrel does not inherit from DrillScanRelBase, this requires adding a DrillTable to the constructor * This is done so that a custom ReflectiveRelMetadataProvider can access the DrillTable associated with Logical/Physical scans. USAGE: Attach DrillStatsTable to DrillTable. * DrillStatsTable represents the data scanned from a corresponding ".stats.drill" table * In order to avoid doing query execution right after the ".stats.drill" table is found, metadata is not actually collected until the MaterializationVisitor is used. Currently, the metadata source must be a string (so that a SQL query can be created). Doing this with a table is probably more complicated. Query is set up to extract only the most recent statistics results for each column. closes #729
2019-02-28	DRILL-1328: Support table statistics	Cliff Buchanan

2019-02-25	DRILL-6950: Row set-based scan framework	Paul Rogers
	Adds the "plumbing" that connects the scan operator to the result set loader and the scan projection framework. See the various package-info.java files for the technical datails. Also adds a large number of tests. This PR does not yet introduce an actual scan operator: that will follow in subsequent PRs. closes #1618
2019-02-18	DRILL-7022: Partition pruning is not happening the first time after the ↵	Volodymyr Vysotskyi
	metadata auto-refresh closes #1638
2019-02-18	DRILL-6855: Do not load schema if there is an IOException	Abhishek Ravi
	closes #1626
2019-02-08	DRILL-7024: Refactor ColumnWriter to simplify type-conversion shim	Paul Rogers
	DRILL-7006 added a type conversion "shim" within the row set framework. Basically, we insert a "shim" column writer that takes data in one form (String, say), and does reader-specific conversions to a target format (INT, say). The code works fine, but the shim class ends up needing to override a bunch of methods which it then passes along to the base writer. This PR refactors the code so that the conversion shim is simpler. closes #1633
2019-02-01	DRILL-6862: Update Calcite to 1.18.0	Igor Guzenko
	1. Moved Calcite dependency from profile hadoop-default to general dependency managment 2. Updated Calcite version to 1.18.0-drill-r0 and Avatica version to 1.13.0 3. Hook.REL_BUILDER_SIMPLIFY moved to static block, cause now it can't be removed (fixes DRILL-6830) 4. Removed WrappedAccessor, since it was workaround fixed in CALCITE-1408 5. Fixed setting of multiple options in TestBuilder 6. Timstampadd type inference aligned with CALCITE-2699 7. Dependency update caused 417 kB increase of jdb-all jar size, so the maxsize limit was increased from 39.5 to 40 MB 8. Added test into TestDrillParquetReader to ensure that DRILL-6856 was fixed by Calcite update close apache/drill#1631
2019-02-01	DRILL-7016: Wrong query result with RuntimeFilter enabled when order of join ↵	Sorabh Hamirwasia
	and filter condition is swapped close apache/drill#1628
2019-02-01	DRILL-7019: Add check for redundant imports	Volodymyr Vysotskyi
	close apache/drill#1629
2019-02-01	DRILL-6964: Implement CREATE / DROP SCHEMA commands	Arina Ielchiieva
	Note: this PR only adds support for CREATE / DROP SCHEMA commands which allow to store and delete schema. Schema usage during querying the data will be covered in other PRs. 1. Added parser methods / handles to parse CREATE / DROP schema commands. 2. Added SchemaProviders classes to separate ways of schema provision (file, table function). 3. Added schema parsing using ANTLR4 (lexer, parser, visitors). 4. Added appropriate unit tests. close apache/drill#1615
2019-02-01	DRILL-6997: Semijoin is changing the join ordering for some tpcds queries.	Hanumath Maduri
	close apache/drill#1620
2019-01-30	DRILL-7006: Add type conversion to row writers	Paul Rogers
	Modifies the column metadata and writer abstractions to allow a type conversion "shim" to be specified as part of the schema, then inserted as part of the row set writer. Allows, say, setting an Int or Date from a string, parsing the string to obtain the proper data type to store in the vector. Type conversion not yet supported in the result set loader: some additional complexity needs to be resolved. Adds unit tests for this functionality. Refactors some existing tests to remove rough edges. closes #1623
2019-01-30	DRILL-7007: Use verify method in row set tests	Paul Rogers
	Many of the early RowSet-based tests used the pattern: new RowSetComparison(expected) .verifyAndClearAll(result); Revise this to use the simplified form: RowSetUtilities.verify(expected, result); The original form is retained when tests use additional functionality, such as the ability to perform multiple verifications on the same expected batch. closes #1624
2019-01-25	DRILL-6977: Improve Hive tests configuration	Igor Guzenko
	1. HiveTestBase data initialization moved to static block to be initialized once for all derivatives. 2. Extracted Hive driver and storage plugin management from HiveTestDataGenerator to HiveTestFixture class. This increased cohesion of generator and added loose coupling between hive test configuration and data generation tasks. 3. Replaced usage of Guava ImmutableLists with TestBaseViewSupport helper methods by using standard JDK collections. closes #1613
2019-01-25	DRILL-6962: Function coalesce returns an Error when none of the columns in ↵	Bohdan Kazydub
	coalesce exist in a parquet file - Updated UntypedNullVector to hold value count when vector is allocated and transfered to another one; - Updated RecordBatchLoader and DrillCursor to handle case when only UntypedNull values are present in RecordBatch (special case when data buffer is null but actual values are present); - Added functions to cast UntypedNull value to other types for use in UDFs; - Moved UntypedReader, UntypedHolderReaderImpl and UntypedReaderImpl from org.apache.drill.exec.vector.complex.impl to org.apache.drill.exec.vector package. closes #1614
2019-01-25	DRILL-6533: Allow using literal values in functions which expect FieldReader ↵	Volodymyr Vysotskyi
	instead of ValueHolder closes #1617
2019-01-18	DRILL-6967: Fix TIMESTAMPDIFF function for QUARTER qualifier	Volodymyr Vysotskyi
	closes #1609
2019-01-18	DRILL-6944: UnsupportedOperationException thrown for view over MapR-DB ↵	Igor Guzenko
	binary table 1. Added persistence of MAP key and value types in Drill views (affects .view.drill file) for avoiding cast problems in future. 2. Preserved backward compatibility of older view files by treating untyped maps as ANY. closes #1602
2019-01-14	DRILL-6903: SchemaBuilder code improvements	Arina Ielchiieva
	1. ColumnBuilder: setPrecisionAndScale method 2. SchemaContainer: addColumn method parameter AbstractColumnMetadata was changed to ColumnMetadata 3. MapBuilder / RepeatedListBuilder / UnionBuilder: added constructors without parent, made buildColumn method public 4. TupleMetadata: added toMetadataList method 5. Other refactoring
2019-01-11	DRILL-6959: Fix loss of precision when casting time and timestamp literals ↵	Volodymyr Vysotskyi
	in filter condition closes #1607
2019-01-07	DRILL-6946: Implement java.sql.Connection setSchema and getSchema methods in ↵	Arina Ielchiieva
	DrillConnectionImpl closes #1596
2019-01-03	DRILL-6901: Move schema builder to src/main	Paul Rogers
	Moves the SchemaBuilder class out of the src/test name space into the src/main namespace. Specifically, into the existing record.metadata package. Many files changed in this move. Corrected two minor issues: import of the wrong Arrays class and unnecessary annotations.
2018-12-10	DRILL-6889: Exclude Drill unit tests to avoid Travis timing out	Vitalii Diravka
	closes #1567
2018-12-10	DRILL-6791: Scan projection framework	Paul Rogers
	The "schema projection" mechanism: * Handles none (SELECT COUNT\()), some (SELECT a, b, x) and all (SELECT ) projection. * Handles null columns (for projection a column "x" that does not exist in the base table.) * Handles constant columns as used for file metadata (AKA "implicit" columns). * Handle schema persistence: the need to reuse the same vectors across different scanners * Provides a framework for consuming externally-supplied metadata * Since we don't yet have a way to provide "real" metadata, obtains metadata hints from previous batches and from the projection list (a.b implies that "a" is a map, c[0] implies that "c" is an array, etc.) * Handles merging the set of data source columns and null columns to create the final output batch. * Running tests found a failure due to an uninialized "bits" vector. Added code to explicitly fill the bits vectors with zeros in the "result set loader."
2018-11-29	DRILL-6866: Upgrade to SqlLine 1.6.0	Arina Ielchiieva
	1. Changed SqlLine version to 1.6.0. 2. Overridden new getVersion method in DrillSqlLineApplication. 3. Set maxColumnWidth to 80 to avoid issue described in DRILL-6769. 4. Changed colorScheme to obsidian. 5. Output null value for varchar / char / boolean types as null instead of empty string. 6. Changed access modifier from package default to public for JDBC classes that implement external interfaces to avoid issues when calling methods from these classes using reflection. closes #1556
2018-11-29	DRILL-6863: Drop table is not working if path within workspace starts with "/"	Bohdan Kazydub
	- Made workspace to be honored when table/view name starts with "/" for DROP TABLE, DROP VIEW, CREATE VIEW and SELECT from view queries; - Made "/{name}" and "{name}" to be equivalent names (the leading "/" is removed) when creating temporary tables so that SELECT ... FROM "/{name}" ... and SELECT ... FROM "{name}" ... produce the same results and behave as regular tables in the context. closes #1557
2018-11-29	DRILL-6792: Find the right probe side fragment wrapper & fix DrillBuf ↵	weijie.tong
	reference count bugs & tune the execution flow & support left deep tree closes #1504
2018-11-26	DRILL-6865: Filter is not removed from the plan when parquet table fully ↵	Volodymyr Vysotskyi
	matches the filter closes #1552
2018-11-26	DRILL-6865: Query returns wrong result when filter pruning happens	Volodymyr Vysotskyi

2018-11-26	DRILL-6870: Upgrade to ANTLR4	Arina Ielchiieva
	closes #1554
2018-11-26	DRILL-6858: Add functionality to list directories / files with exceptions ↵	Arina Ielchiieva
	suppression 1. Add listDirectoriesSafe, listFilesSafe, listAllSafe in FileSystemUtil and DrillFileSystemUtil classes. 2. Use FileSystemUtil.listAllSafe during listing files in show files command and information_schema.files table. closes #1547
2018-11-26	DRILL-6857: Read only required row groups in a file when limit push down is ↵	Arina Ielchiieva
	applied closes #1548
2018-11-26	DRILL-6349: Drill JDBC driver fails on Java 1.9+ with NoClassDefFoundError: ↵	ozinoviev
	sun/misc/VM closes #1446