[SPARK-20644][core] Initial ground work for kvstore UI backend.

There are two somewhat unrelated things going on in this patch, but both are meant to make integration of individual UI pages later on much easier.

The first part tweaks the listener code so that it performs fewer kvstore updates for data that changes quickly; for example, it avoids writing changes to the store for every task-related event, since those can arrive very quickly at times. Instead, for these kinds of events, it only flushes if a certain interval has passed. The interval is based on how often the current spark-shell code updates the progress bar for jobs, so that users can get reasonably accurate data.

The code also delays hitting the underlying kvstore for as long as possible when replaying apps in the history server. This is to avoid unnecessary writes to disk.

The second set of changes prepares the history server and SparkUI for integrating with the kvstore. A new class, AppStatusStore, is used for translating between the stored data and the types used in the UI / API. The SHS now populates a kvstore with data loaded from event logs when an application UI is requested.

Because this store can hold references to disk-based resources, the code was modified to retrieve data from the store under a read lock. This allows the SHS to detect when the store is still being used, and only update it (e.g. because an updated event log was detected) when there is no other thread using the store.

This change ended up creating a lot of churn in the ApplicationCache code, which was cleaned up a lot in the process. I also removed some metrics which don't make too much sense with the new code.

Tested with existing and added unit tests, and by making sure the SHS still works on a real cluster.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #19582 from vanzin/SPARK-20644.
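The interval-based flushing described in the message can be illustrated with a minimal sketch. This is not the code from the patch: `CountingStore` and `ThrottledWriter` are hypothetical names invented here, the store is an in-memory map rather than a real kvstore, and timestamps are passed in explicitly so the behavior is deterministic.

```scala
// Hypothetical sketch; these classes are illustrative and are not Spark's.
import scala.collection.mutable

/** Stands in for a kvstore; counts writes so the effect of throttling is visible. */
final class CountingStore {
  private val data = mutable.Map.empty[String, Long]
  var writes: Int = 0

  def write(key: String, value: Long): Unit = {
    data(key) = value
    writes += 1
  }

  def read(key: String): Option[Long] = data.get(key)
}

/** Buffers fast-changing values and writes them out at most once per interval. */
final class ThrottledWriter(store: CountingStore, intervalNs: Long) {
  private val pending = mutable.Map.empty[String, Long]
  private var lastFlushNs: Option[Long] = None

  /** Record an update; touch the store only if the interval has elapsed. */
  def update(key: String, value: Long, nowNs: Long): Unit = {
    pending(key) = value
    // The very first update is written immediately (an assumption of this sketch).
    if (lastFlushNs.forall(last => nowNs - last >= intervalNs)) flush(nowNs)
  }

  /** Force a write of everything pending, e.g. once at the end of a replay. */
  def flush(nowNs: Long): Unit = {
    pending.foreach { case (k, v) => store.write(k, v) }
    pending.clear()
    lastFlushNs = Some(nowNs)
  }
}
```

In this sketch, a burst of updates inside one interval costs a single store write, and an explicit `flush` at the end ensures the last buffered value is not lost.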
author: Marcelo Vanzin <vanzin@cloudera.com> 2017-11-06 08:45:40 -0600
committer: Imran Rashid <irashid@cloudera.com> 2017-11-06 08:45:40 -0600
commit: c7f38e5adb88d43ef60662c5d6ff4e7a95bff580 (patch)
tree: 073fe172a17e3544aa06cf26370f7cf8273f70b6 /project
parent: 472db58cb19bbd3025eabbd185d920aab0ebb4da (diff)
1 files changed, 2 insertions, 0 deletions
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index 45b8870f3b..99cac34c85 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -38,6 +38,8 @@ object MimaExcludes {
   lazy val v23excludes = v22excludes ++ Seq(
     // SPARK-18085: Better History Server scalability for many / large applications
     ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.status.api.v1.ExecutorSummary.executorLogs"),
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.deploy.history.HistoryServer.getSparkUI"),
+
     // [SPARK-20495][SQL] Add StorageLevel to cacheTable API
     ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.cacheTable"),