Diffstat (limited to 'docs')
-rw-r--r-- docs/01-Usage.txt 90
-rw-r--r-- docs/02-Code-layout.txt 564
-rw-r--r-- docs/03-Linux-kernel-build.txt 57
-rw-r--r-- docs/04-Cache-hit-rate-howto.txt 208
-rw-r--r-- docs/05-FAQ.txt 18
-rw-r--r-- docs/06-Optional-rootfs-build.txt 125
6 files changed, 1062 insertions, 0 deletions
diff --git a/docs/01-Usage.txt b/docs/01-Usage.txt
new file mode 100644
index 0000000..3ed01f8
--- /dev/null
+++ b/docs/01-Usage.txt
@@ -0,0 +1,90 @@
+Usage
+=====
+
+1. Requirements/Pre-requisites
+
+ 1. A Linux development environment. This release has been
+ build tested on the following Linux host environments:
+
+ 1. Linux Ubuntu 10.10
+ 2. Red Hat Enterprise Linux WS release 4 (Nahant Update 4)
+
+ This release is not intended to be used on development
+ environments other than Linux.
+
+ 2. An installation of the ARM RealView Development Suite. This
+ release was built and tested with version 4.1 [Build 514].
+
+ 3. An installation of the Perl scripting language. This release
+ was built and tested with v5.10.1.
+
+ 4. An installation of the GNU coreutils suite
+ <http://www.gnu.org/software/coreutils/>. This release was
+ built and tested with v8.5.
+
+2. Build instructions
+
+ To build the software:
+
+ $ tar -jxf arm-virtualizer-v2.0-ti-040911.tar.bz2
+ $ cd arm-virtualizer-v2.0-ti-040911/bootwrapper
+ $ make clean && make
+
+ The resulting file is 'img.axf'.
+
+ This image may be loaded and executed on the model debugger
+ as explained in section 3 below.
+
+ Note that the pre-built stub kernel image is located at:
+
+ arm-virtualizer-v2.0-ti-040911/bootwrapper/payload/kernel
+
+ .. and the placeholder dummy root filesystem image is located
+ at:
+
+ arm-virtualizer-v2.0-ti-040911/bootwrapper/payload/fsimg
+
+ These may be replaced with custom built images such as a
+ suitably configured linux kernel image and a root filesystem
+ image.
+
+ Look at docs/03-Linux-kernel-build.txt for instructions on
+ building a suitable Linux kernel.
+
+ Look at docs/06-Optional-rootfs-build.txt for optionally
+ building a complete root filesystem.
+
+ Note that this release relies on the 'env' utility, which is
+ a part of the coreutils suite. The 'env' utility is expected
+ to be located at '/bin/env'. If it is at a different
+ location then this must be reflected in the first line of
+ the following file:
+
+ arm-virtualizer-v2.0-ti-040911/bootwrapper/makemap
+
+ Failure to make this modification will result in a build
+ failure.
+
+3. Usage
+
+ If the Kingfisher Real-Time System Model (RTSM VE Cortex-A15 KF
+ CCI version 6.2 Beta) is installed, the resulting
+ img.axf file may be loaded, executed and debugged on the
+ model and the associated model debugger.
+
+ This model may be obtained from ARM by separate arrangement.
+
+ Steps to run the software:
+
+ a. Depending upon whether the MPx1 or MPx4 model is being used,
+ update the big-little-mp<x>.mxscript file (x is 1 or
+ 4 as the case may be) with the absolute
+ path to the model and the img.axf file. (Comments in the
+ file indicate where the changes have to be made)
+
+ b. Invoke the modeldebugger and the script file as follows:
+
+ $ <path to modeldebugger> -s <path to big-little-mp<x>.mxscript>
+
+ The default build simultaneously switches clusters
+ every 12 million cycles (approx.).
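Note on section 2 of docs/01-Usage.txt: the location of the 'env'
utility can be checked before building. The following is a minimal
Python sketch and is not part of the release. It assumes that the first
line of 'makemap' is a '#!' line whose first word is the path to 'env'
(the document only states that the first line must reflect the
location). The function names are hypothetical.

```python
import os
import shutil


def shebang_interpreter(script_path):
    """Return the interpreter path named on the script's first line.

    Returns None if the first line is not a '#!' line.
    """
    with open(script_path, "r", errors="replace") as f:
        first_line = f.readline().strip()
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None


def check_env_location(script_path):
    """Report whether the interpreter named in the script exists.

    Returns a tuple (named, found, named_exists):
      named        - path on the script's first line (assumed to be 'env')
      found        - where 'env' is actually found on PATH, or None
      named_exists - True if 'named' is an existing file
    """
    named = shebang_interpreter(script_path)
    found = shutil.which("env")
    named_exists = named is not None and os.path.isfile(named)
    return named, found, named_exists
```

If named_exists is False for
arm-virtualizer-v2.0-ti-040911/bootwrapper/makemap, the first line of
that file would need to be edited to use the path reported in 'found'.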
diff --git a/docs/02-Code-layout.txt b/docs/02-Code-layout.txt
new file mode 100644
index 0000000..79af0a3
--- /dev/null
+++ b/docs/02-Code-layout.txt
@@ -0,0 +1,564 @@
+Code layout
+===========
+
+A Introduction
+
+ The software contained in the 'bootwrapper' directory allows
+ the execution of a software payload e.g. a Linux stack to
+ alternate between two multi-core clusters of ARM Cortex-A15
+ & Kingfisher processors connected by a coherent
+ interconnect. To achieve this aim it provides the ability
+ to:
+
+ 1. Save the processor context on one cluster (henceforth
+ called the outbound cluster) and restore it on the other
+ cluster (henceforth called the inbound cluster).
+
+ 2. Hide any software visible microarchitectural differences
+ between the Cortex-A15 & Kingfisher processors.
+
+ 3. Use the ARM Virtualization Extensions to perform 1. & 2.
+ in a payload software agnostic manner.
+
+ This software is intended to be executed on the Kingfisher
+ Real-Time System Model (RTSM VE Cortex-A15 KF CCI version
+ 6.2 Beta).
+
+ In addition to switching the payload software execution
+ between the two clusters, the software also contains support
+ for executing the payload software simultaneously on the two
+ clusters.
+
+ This is called the MP configuration. In its current state,
+ it mainly involves making the payload software believe that
+ the A15 cluster includes the cpus present on the Kingfisher cluster
+ i.e. there is one cluster with more cpus than there
+ physically are. [Note that MP support is highly experimental
+ and unstable. It is NOT the focus of this release and is
+ intended for purely informational purposes. The cluster
+ switching mode of operation remains the focus of this
+ release.]
+
+ The Virtualizer software needs initialization prior to being
+ used to perform any of the above functions. The
+ initialization needs to be done before the payload software
+ is executed. Hence, it makes sense to do this from the
+ existing boot firmware being used on the platform. The code
+ in the 'bootwrapper' directory is a bare-minimal bootloader
+ that:
+
+ 1. Sets up the environment for execution of the payload
+ software in the Non-secure world by programming the
+ appropriate coprocessor and memory mapped peripheral
+ registers from the Secure world.
+
+ 2. Invokes the entry point of the Virtualizer software
+ (bl_setup()) which does the necessary initialization.
+
+ 3. Passes control to the payload software in the Non-secure
+ world.
+
+B Code layout overview
+
+ 1. bootwrapper/
+
+ Apart from containing the bootloader, this directory
+ also contains scatter files to load the bootloader,
+ Virtualizer and the payload software correctly on the
+ target platform as a single ELF file (img.axf).
+
+ The important files here are:
+
+ 1. vectors.S
+
+ 1. Implements the Secure world exception vectors
+ which are loaded to the base of physical memory
+ (0x80000000) at reset.
+
+ 2. boot.S
+
+ 1. Handles a power-on reset.
+
+ 2. Initialises the I-Cache, sets up the stack &
+ passes control to the C handler for performing
+ the rest of the initialization.
+
+ 3. c_start.c
+
+ 1. Picks up from where the start() routine left off in
+ the previous file.
+
+ 2. Programs the exception vector tables for the
+ Secure world.
+
+ 3. Provides Non-secure access to certain
+ coprocessor registers and memory mapped
+ peripherals e.g. access to the cache
+ coherent interconnect registers, coprocessors
+ etc.
+
+ 4. Enables functionality which can be initialised
+ only in the Secure world e.g. configuration of
+ interrupts as Non-secure.
+
+ 5. Synchronises execution with the secondary cpus
+ (if present) so that any global peripheral is
+ accessed by them only after the primary has
+ initialised it.
+
+ 6. Enters the non-secure HYP mode and initialises
+ the Virtualizer.
+
+ 7. Enters the non-secure SVC mode and jumps to the
+ payload software entry point.
+
+ 4. payload/
+
+ 1. Contains two files 'fsimg' and 'kernel'.
+
+ 2. The 'kernel' is a raw Linux kernel binary image.
+ The instructions to build this Linux image can
+ be found in docs/03-Linux-kernel-build.txt.
+ This image can be replaced with a raw binary
+ image of any other software payload which is
+ desired to be run on this system.
+
+ 3. The 'fsimg' is an empty filesystem stub. If
+ desired, it can be replaced with a suitable
+ filesystem image in a Linux initramfs format. A
+ custom busybox filesystem was used for testing.
+ More complex filesystems may be used if needed
+ but will require the use of MMC emulation with
+ the ARM FastModels.
+ See docs/06-Optional-rootfs-build.txt for
+ details.
+
+ 5. boot.map.template
+
+ 1. Scatter file which combines the payload
+ software, Virtualizer and the bootloader into a
+ single ELF file (img.axf) which can
+ then be loaded on the relevant platform.
+
+ 6. makemap
+
+ 1. Simple perl script that takes an ELF image of
+ the Virtualizer, parses through the relevant
+ sections & adds those sections to
+ the scatter file so that a consolidated image
+ can be created.
+
+ 2. big-little/common
+
+ This directory mainly deals with setting up of the HYP
+ processor mode and the Virtual GIC. This allows the
+ payload software to run unmodified while either the
+ Switching or the MP mode is active in the background.
+
+ The important files here are:
+
+ 1. hyp_vectors.s
+
+ 1. Implements the HYP mode vector table.
+
+ 2. It contains the entry point "bl_setup()" which
+ is invoked by the bootwrapper to initialise the
+ Virtualizer software.
+
+ 3. The exception vector for interrupts
+ [irq_entry()] is the entry point for all
+ physical interrupts. The exception vector for
+ hypervisor traps [hvc_entry()] is the entry
+ point for all accesses made by the payload
+ software that need to be handled in the HYP
+ mode.
+
+ 4. Also contained is rudimentary support for fault
+ exception handlers [dabt_entry(), iabt_entry() &
+ undef_entry()].
+
+ 2. hyp_setup.c
+
+ 1. Extends the initialization of the Virtualizer
+ software into C code. Most importantly it
+ distinguishes between cold (power-on) & warm
+ (inbound cluster) resets. Software is
+ initialised differently in each case.
+
+ 2. If switching is being done asynchronously then
+ the HYP timer interrupt is setup to periodically
+ (~12 million instructions) trigger a switchover
+ to the other cluster.
+
+ 3. If in MP mode, then CCI snoops are enabled for
+ both the clusters.
+
+ 3. vgic_handle.c
+
+ 1. Extends handling of physical interrupts into C
+ code from irq_entry(). Interrupts are
+ acknowledged (optionally EOI'ed) and queued as
+ virtual interrupts. The HYP timer interrupt is
+ handled differently. When received, it is used as
+ a trigger to initiate the switchover process.
+
+ 4. vgiclib.c
+
+ 1. Implements handling of virtual interrupts once
+ they have been queued up in the vGIC HYP view
+ list registers. It maintains the list registers
+ and also saves and restores the context of the
+ vGIC HYP view interface.
+
+ 5. pagetable_setup.c
+
+ 1. Creates and sets up the HYP mode and 2nd stage
+ translation page tables. Accesses by the payload
+ software to the vGIC physical cpu interface are
+ mapped to the vGIC virtual cpu interface using
+ the 2nd stage translation page tables.
+
+ 2. In the MP configuration, the translation tables
+ are shared by all the cpus in the two clusters.
+ Hence the first cpu in only one of the clusters
+ creates them.
+
+ 6. vgic_setup.c
+
+ 1. Enables virtual interrupts & exceptions.
+ Initialises the physical cpu interface and the
+ HYP view interface.
+
+ 3. big-little/lib
+
+ This directory implements common functionality that is
+ used across all the Virtualizer code. This includes:
+
+ 1. Locks which can be used with Strongly ordered and
+ Device memory.
+
+ 2. Code tracing support on the Fast Models platform
+ through the use of memory mapped TUBE registers &
+ the Generic Trace plugin.
+ Details of this feature can be found in
+ docs/04-Cache-hit-rate-howto.txt.
+
+ 3. Events to synchronise the switching process between
+ the clusters and within the clusters. They are also used
+ to synchronise the setup phase after a cold reset in
+ the MP configuration.
+
+ 4. UART routines to support semihosting of the
+ printf family of functions.
+
+ 5. Cache maintenance, Stack manipulation & Locking
+ routines.
+
+ 4. big-little/include
+
+ 1. This directory contains the headers specific to HYP
+ mode setup, Switching process & common helper
+ routines. Most importantly, context.h contains the
+ data structures which are used to save and restore
+ the processor context.
+
+ 5. big-little/switcher
+
+ This directory implements code to save and restore
+ processor context and to initiate/handle an
+ asynchronous/synchronous switchover request.
+
+ 1. context/
+
+ 1. ns_context.c
+
+ 1. Contains top level routines to save and
+ restore the Non-secure world context.
+
+ 2. It requests the secure world to save its own
+ context and bring the inbound cluster out of
+ reset. It also uses events to synchronise
+ the switching process between the inbound
+ and outbound clusters.
+
+ 2. gic.c
+
+ 1. Contains routines to save and restore the
+ context of the vGIC physical distributor and
+ cpu interfaces.
+
+ 3. sh_vgic.c
+
+ 1. The two clusters share the interrupt
+ controller instead of each cluster having
+ its own. A consequence of this is that there
+ is no longer a 1 to 1 mapping between cpu
+ ids and cpu interface ids e.g. on an
+ MPx1+MPx1 cluster configuration,
+ cpu0 of the Kingfisher cluster would
+ correspond to cpuinterface1 on the shared
+ vGIC. This in turn affects routing of
+ peripheral and software generated
+ interrupts. This file implements code to
+ allow use of the shared vGIC correctly
+ keeping this limitation in mind.
+
+ 2. trigger/
+
+ 1. async_switchover.c
+
+ 1. Contains code to use the HYP timer interrupt
+ as a trigger to initiate a switchover
+ asynchronously.
+
+ 2. sync_switchover.c
+
+ 1. Contains code to handle an HVC instruction
+ executed by the payload software as a
+ trigger to initiate a
+ synchronous switchover.
+
+ 3. handle_switchover.s
+
+ 1. Contains code to start saving the non-secure
+ world context & request the secure world to
+ power down the outbound cluster once the
+ inbound cluster is up and
+ running.
+
+ 6. big-little/virtualisor
+
+ This directory implements code that, using the ARM
+ Virtualization Extensions:
+
+ 1. Hides any microarchitectural differences between the
+ Cortex-A15 & Kingfisher processors visible to the
+ payload software.
+
+ 2. Provides a different view of the underlying hardware
+ than what really exists e.g. in the switching mode
+ it traps accesses made by the host cluster
+ (Kingfisher cluster currently) to the shared vGIC
+ physical distributor interface, so that routing of
+ interrupts can take place correctly. In the MP mode,
+ the L2 control and MPIDR registers are virtualized
+ to tell the payload software that there is one
+ cluster with multiple processors instead of two.
+
+ The ARM Virtualization extensions provide a set of trap
+ registers (HCPTR (Hyp Coprocessor Trap Register), HSTR
+ (Hyp System Trap Register), HDCR (Hyp Debug
+ Configuration Register)) to be able to select what
+ accesses made by the payload software to the coprocessor
+ block will be trapped in the HYP mode.
+
+ Accesses to memory mapped peripherals e.g. shared vGIC
+ can be trapped into the HYP mode by populating
+ appropriate entries in the 2nd stage translation tables.
+ This is how microarchitectural differences between the
+ two processor sets are resolved.
+
+ Whenever a trap into HYP mode is taken, the HSR (Hyp
+ Syndrome Register) contains enough information about the
+ type of trap taken for the software to take appropriate
+ action.
+
+ The Virtualizer design centres around the traps
+ recognized by the HSR. Also, to deal with
+ microarchitectural differences the concept of a HOST
+ cluster is introduced. It is possible for each
+ cpu to find out the system topology using the Kingfisher
+ System Control Block. Once it knows the host cluster id
+ & whether the software is expected to switch execution
+ or run in the MP mode (provided at compile time), the
+ cpu can configure itself
+ accordingly.
+
+ The processor cluster that the payload software has
+ been built to run on [assumed to be Cortex-A15 for this
+ release] is termed the TARGET while the cluster on
+ which the differences are expected to crop up is called
+ the HOST (assumed to be Kingfisher for this release).
+ The HOST environment variable is used to specify
+ the host cluster. The target cluster is assumed to be
+ the logical complement of the host i.e. cluster ids can
+ only take the values of 0 & 1.
+
+ The HOST processor emulates the TARGET processor by
+ trapping the accesses to differing processor features
+ into the HYP mode. Most of the microarchitectural
+ differences & registers that need to be virtualized are
+ handled in a generic (CPU Independent) layer of
+ code. Additionally, each processor exports functions to
+ setup, handle & optionally save/restore context of each
+ trap that the HSR recognises. These handlers are invoked
+ whenever the software runs
+ on that processor.
+
+ 1. virt_setup.c
+
+ 1. Generic function that initialises the required
+ traps. This is done once each on both the host
+ and target clusters if the trap handler needs
+ to obtain some information about the target
+ cluster to be able to work correctly e.g. the
+ Kingfisher processor cluster needs to find out
+ the cache geometry of the Cortex-A15
+ processor cluster to be able to handle cache
+ maintenance operations by set/way correctly. This
+ function further calls any setup function that
+ has been exported by the processor the code is
+ executing on.
+
+ 2. virt_handle.c
+
+ 1. Generic function that extends the hvc_entry()
+ routine to C code. It calls the generic trap
+ handler (if registered) and then any trap
+ handlers exported by the processor on
+ which the trap has been invoked.
+
+ 3. virt_context.c
+
+ 1. Generic function that saves and restores traps
+ on the host cluster & then calls any
+ save/restore function that has been exported by
+ the processor the code is executing on.
+
+ 4. cache_geom.c
+
+ 1. Generic function that detects cache geometries
+ on the host and target clusters & then maps
+ cache maintenance operations by set/way from the
+ target to the host cache.
+
+ 5. mem_trap.c
+
+ 1. Generic function that sets up any memory traps
+ by editing the 2nd stage translation tables.
+
+ 6. vgic_trap_handler.c
+
+ 1. Generic function that handles trapped accesses
+ to the shared vGIC
+
+ 7. include/
+
+ Header files specific to the Virtualisor code
+
+ 8. cpus/
+
+ Placeholders for any traps that the Kingfisher or A15 processor
+ cluster might want to setup. No traps need to be setup
+ at the moment.
+
+ 9. big-little/secure_world
+
+ Since both Kingfisher & Cortex-A15 processors support ARM
+ TrustZone Security Extensions, there is certain context
+ that needs to be setup, saved & restored in the Secure
+ world.
+
+ This context allows access to certain coprocessor and
+ peripheral registers to the Non-secure world. It also
+ configures the shared vGIC for use by the Non-secure
+ world.
+
+ Execution shifts to the Secure world through the SMC
+ instruction which is a part of the ARM V7-ISA.
+
+ 1. monmode_vectors.s
+
+ 1. Implements the monitor mode vector table. It
+ contains the secure entry point [do_smc()] for
+ the SMC instruction along with rudimentary
+ support for other fault exceptions taken while
+ executing in the secure world.
+
+ 2. Three types of SMC exceptions are expected (type
+ of exception is contained in r0):
+
+ 1. SMC_SEC_INIT
+
+ Called once after a power on reset to
+ initialise the Secure world stacks,
+ coherency, pagetables & configure some
+ coprocessor & memory mapped
+ peripheral (Coherent interconnect & shared
+ vGIC) registers for use of these features by
+ the Non-secure world.
+
+ 2. SMC_SEC_SAVE
+
+ Called from ns_context.c to request the
+ secure world to save its context and bring
+ the corresponding core in the inbound
+ cluster out of reset so that it can start
+ restoring the saved state.
+
+ 3. SMC_SEC_SHUTDOWN
+
+ Called from handle_switchover.s to request
+ the secure world to flush the L1 & L2 caches
+ and power down the outbound cluster.
+
+ Also implemented is a function to handle warm
+ resets on the inbound cluster. Bare-minimal
+ context is initialised while the rest is restored
+ before control is passed to the Non-secure world
+ handler for restoring context [restore_context()]
+ in ns_context.c
+
+ 2. secure_context.c
+
+ Implements code to save and restore the secure world
+ context
+
+ 3. secure_resets.c
+
+ Implements code to power down the outbound cluster
+ and bring individual cores in the inbound cluster
+ out of reset.
+
+ 4. ve_reset_handler.s
+
+ Base of physical memory in the Versatile Express
+ memory map is at 0x80000000. The processors are
+ brought out of reset at 0x0 which points to Secure
+ RAM/Flash memory. This file implements a small stub
+ function that is placed at 0x0 so that execution
+ jumps to 0x80000000 after a cold reset and to the
+ warm_reset() handler in monmode_vectors.s
+ after a warm reset.
+
+ The secure world code is built into a separate ELF image
+ to maintain its distinction from the Virtualizer code
+ that executes in the Non-secure world.
+
+ 10. big-little/bl.scf.template
+
+ 1. Scatter file that is used to build the Non-secure
+ world code in the Virtualizer software. The
+ resultant image is bl.axf.
+
+ 11. big-little/bl-sec.scf.template
+
+ 1. Scatter file that is used to build the Secure world
+ code in the Virtualizer software. The resultant
+ image is bl_sec.axf.
+
+ 12. acsr/
+
+ The secure world code is built into a separate ELF image
+ to maintain its distinction from the Virtualizer code
+ that executes in the Non-secure world.
+
+ 1. helpers.s
+
+ Helper functions to access the CP15 coprocessor
+ space.
+
+ 2. v7.s
+
+ Contains routines to save and restore ARM processor
+ context
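Note on the sh_vgic.c entry in docs/02-Code-layout.txt: the loss of the
1 to 1 mapping between cpu ids and cpu interface ids on the shared vGIC
can be illustrated with a small Python sketch. Only the MPx1+MPx1 case
(cpu0 of the Kingfisher cluster corresponds to cpuinterface1) comes from
the document. The general rule used here, that the A15 cpus occupy the
lowest interface ids and the Kingfisher cpus follow contiguously, is an
assumption made for illustration, and the function name is hypothetical.

```python
def cpu_interface_id(cpu_id, cpus_per_cluster, on_kingfisher):
    """Map a per-cluster cpu id to a cpu interface id on the shared vGIC.

    ASSUMPTION (not confirmed by the document): A15 cpus use interface
    ids 0..N-1 and Kingfisher cpus use N..2N-1, where N is
    cpus_per_cluster.
    """
    if not 0 <= cpu_id < cpus_per_cluster:
        raise ValueError("cpu_id out of range for this cluster size")
    offset = cpus_per_cluster if on_kingfisher else 0
    return offset + cpu_id
```

Under this assumption, cpu_interface_id(0, 1, True) gives 1, which
matches the MPx1+MPx1 example in the text, while the same cpu id on the
A15 cluster maps to interface 0.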
diff --git a/docs/03-Linux-kernel-build.txt b/docs/03-Linux-kernel-build.txt
new file mode 100644
index 0000000..972725a
--- /dev/null
+++ b/docs/03-Linux-kernel-build.txt
@@ -0,0 +1,57 @@
+Building and installing a Linux kernel
+======================================
+
+A suitable Linux kernel image for use with the virtualizer
+can be built as follows:
+
+$ tar -jxf arm-virtualizer-v2.0-ti-040911.tar.bz2
+$ cd arm-virtualizer-v2.0-ti-040911/bootwrapper
+$ make clean
+$ pushd /tmp
+$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/maz/ael-kernel.git ael-kernel.git
+$ cd ael-kernel.git
+$ git checkout -b ael-11.06 origin/ael-11.06
+$ yes | make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- vexpress-new_defconfig
+$ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- -j4
+$ popd
+$ cp $OLDPWD/arch/arm/boot/Image payload/kernel
+
+The virtualizer can now be built as usual by invoking:
+
+$ make clean && make
+
+.. in the top bootwrapper directory.
+
+This will result in a file called img.axf located at
+arm-virtualizer-v2.0-ti-040911/bootwrapper/img.axf.
+
+To launch the ARM FastModel with the virtualizer, first modify
+arm-virtualizer-v2.0-ti-040911/bootwrapper/big-little-MP<x>.mxscript
+as usual to fill in paths to the model binary and the img.axf files. The
+mxscript file is adequately commented to assist with this.
+
+E.g. in the case of an MP1 model, we would use the big-little-MP1.mxscript file
+and we would specify the path to the model in a manner similar to:
+
+string model = "/home/working_dir/RTSM_VE_Cortex-A15x1-A7x1";
+
+Similarly, in case of an MP4 model, we would use the big-little-MP4.mxscript
+and we would specify the path to the model in a manner similar to:
+
+string model = "/home/working_dir/models/RTSM_VE_Cortex-A15x4-A7x4";
+
+The path to the img.axf file is specified using the app directive as
+follows:
+
+string app = "arm-virtualizer-v2.0-ti-040911/bootwrapper/img.axf";
+
+The model can then be launched using:
+
+modeldebugger -s arm-virtualizer-v2.0-ti-040911/bootwrapper/big-little-MP<x>.mxscript
+
+where 'x' is 1 or 4 respectively in the case of an MP1 model run or an
+MP4 model run.
+
+This will result in the Linux kernel console messages appearing in the ARM
+FastModel UART emulation window. The virtualizer will switch execution
+between the two clusters at ~12 million instruction intervals.
diff --git a/docs/04-Cache-hit-rate-howto.txt b/docs/04-Cache-hit-rate-howto.txt
new file mode 100644
index 0000000..fec4899
--- /dev/null
+++ b/docs/04-Cache-hit-rate-howto.txt
@@ -0,0 +1,208 @@
+Cache hit-rate HOWTO
+====================
+
+A Introduction
+
+ The ARM Fast Models are accompanied by a trace infrastructure
+ referred to as the Model Trace Interface (MTI). The MTI trace
+ provides a mechanism to dynamically register to events from the
+ model. The GenericTrace.so MTI trace plugin provides a number of
+ trace events whose output can be logged in a simple text file.
+ The usage of this plugin is given in Section B.
+
+ In this document we will consider how the GenericTrace.so plugin
+ can be used during a cluster switchover to calculate the number
+ of cache hits in the outbound cluster L2 cache originating from
+ the inbound cluster before the outbound L2 is flushed and the
+ cluster placed in reset.
+
+B Plugin Usage
+
+ The GenericTrace plugin is loaded using the "--trace-plugin"
+ parameter in the command line to launch the model.
+
+ A list of trace sources provided by the plugin can be listed as
+ follows:
+
+ "RTSM_VE_Cortex-A15x1-A7x1 --trace-plugin GenericTrace.so
+ --parameter TRACE.GenericTrace.trace-sources= "
+
+ A list of parameters supported by the Generic Trace plugin can
+ be listed as follows:
+
+ "RTSM_VE_Cortex-A15x1-A7x1 --trace-plugin GenericTrace.so -l"
+
+ Some of the interesting parameters are:
+
+ TRACE.GenericTrace.trace-file: The trace file to write into. If
+ empty will print to console / STDOUT.
+
+ TRACE.GenericTrace.perf-period: Print performance every N
+ instructions. Since the instruction count and the global counter
+ have the same value on the Fast Models, this parameter provides
+ a good approximation of time.
+
+ TRACE.GenericTrace.flush: If set to true then the trace file will be
+ flushed after every event.
+
+C Plugin Trace sources
+
+ The GenericTrace plugin provides events which allow each cluster
+ to trace snoop requests originating from a different cluster that
+ hit in its caches. For snoops originating from the Kingfisher cluster
+ that hit in the A15 cluster, the event is 'read_for_4_came_from_snoop'
+ & for the opposite case the event is 'read_for_3_came_from_snoop'.
+ The numbers '3' & '4' in the name of the trace sources are the ids
+ of the CCI slave interfaces from where the snoop originated.
+
+ These trace sources are the per-cluster implementation of the
+ event id '0xA' "(Read data last handshake - data returned
+ from the cache rather than from downstream)" of the CCI PMU.
+ Please refer to the "Cache Coherent Interconnect (CCI-400)
+ Architecture Specification" for further details.
+
+ The plugin also provides the ability to trace code execution through
+ a memory mapped "tube" interface. This interface defines a list of
+ registers which when written to in a particular sequence and the
+ 'sw_trace_event' trace source selected during model invocation will
+ print out the register values in the trace file.
+
+ The "tube" interface defines:
+
+ - Three LE 64 bit registers of arbitrary data that can be
+ written (and retain their values).
+
+ - A tube-like char register which when written with '\0'
+ will generate an event with the current state of the
+ 64-bit registers and with the characters sent to the
+ device with a unique sequence_id.
+
+ All of these registers are banked and write-only; the trace
+ event will also output the cluster id and the CPU id. ARM
+ FastModels implement 1 to 4 TUBE interfaces. Please refer to
+ Section E for supported interfaces in the current model
+ release. The memory map of these registers can be found in
+ big-little/include/misc.h.
+
+ The 'write_trace' function in big-little/lib/tube.c implements the
+ software sequence to program the tube interface. This function is
+ called at various points in the switchover process. It prints out a
+ message which indicates that an event is about to start or has
+ completed along with the value of the global counter in one of the
+ 64 bit registers. To enable this functionality, the environment
+ variable "TUBE" needs to be defined to TRUE prior to code compilation.
+
+D Putting it all together
+
+ The list of steps to use the above mentioned functionality is:
+
+ 1. Build the Virtualizer code with "TUBE" support. On the
+ tcsh shell, this is as follows:
+
+ $ setenv TUBE TRUE; make clean && make
+
+ 2. Launch the model with the MTI trace plugin support and a
+ selection of the right trace sources using a suitable
+ MXScript file in the 'bootwrapper' directory.
+
+ Once the switchover process starts, the trace file will contain output
+ that looks like this (not including the comments):
+
+ .
+ .
+ .
+ .
+ // Lines beginning with "PERFORMANCE" are a result of the value of the
+ // "TRACE.GenericTrace.perf-period" parameter. This string is printed
+ // every <value> number of instructions (200 in this case) in the trace
+ // file. It indicates the rate at which the model is executing instructions
+ // & the number of instructions executed thus far.
+ PERFORMANCE: 2.8 MIPS (Inst:67216767)
+ .
+ .
+ .
+ // Lines beginning with "sw_trace_event<x>" are a result of enabling
+ // "TUBE" support in the code and selecting the "sw_trace_event" source
+ // while invoking the model. The interpretation of this message is:
+ //
+ // <x> : indicates the "TUBE" interface number.
+ // sequence_id : a unique number assigned to each message
+ // cluster_and_cpu_id : in the format 0x<cluster id><cpu id>. Each id
+ // occupies 8 bits.
+ // data0 : first 64-bit register value. Programmed with
+ // the value of the global counter.
+ // data1 : second 64-bit register value. Not used.
+ // data2 : third 64-bit register value. Not used.
+ // message : String written to the TUBE register
+ sw_trace_event2: sequence_id=0x00000001 cluster_and_cpu_id=0x0000 data0=0x000000000401a3dc data1=0x0000000000000000 data2=0x0000000000000000 message="Secure Coherency Enable Start":30
+ .
+ .
+ .
+ PERFORMANCE: 0.2 MIPS (Inst:67217079)
+ sw_trace_event2: sequence_id=0x00000002 cluster_and_cpu_id=0x0000 data0=0x000000000401a581 data1=0x0000000000000000 data2=0x0000000000000000 message="Secure Coherency Enable End":28
+ PERFORMANCE: 0.9 MIPS (Inst:67217301)
+ PERFORMANCE: 5.8 MIPS (Inst:67217511)
+ .
+ .
+ .
+ // Lines beginning with "read_for_<x>_came_from_snoop" are a result of
+ // enabling the event sources for monitoring the cache hits resulting
+ // from snoops originating from master interface <x> on the CCI.
+ // The following line indicates that a snoop from the Kingfisher cluster
+ // hit in the caches of the A15 cluster. It also prints the cache line
+ // address and whether the access was Secure or Non-secure.
+ read_for_4_came_from_snoop: Bus address=0x000000008ff02440 Is non secure=N
+ read_for_4_came_from_snoop: Bus address=0x000000008ff02440 Is non secure=N
+ read_for_4_came_from_snoop: Bus address=0x000000008ff02240 Is non secure=N
+ read_for_4_came_from_snoop: Bus address=0x000000008ff02240 Is non secure=N
+ read_for_4_came_from_snoop: Bus address=0x000000008ff012c0 Is non secure=N
+ PERFORMANCE: 0.0 MIPS (Inst:135292834)
+ sw_trace_event: sequence_id=0x00000010 cluster_and_cpu_id=0x0000 data0=0x000000000810672e data1=0x0000000000000000 data2=0x0000000000000000 message="L2 Flush Begin":15
+ PERFORMANCE: 5.5 MIPS (Inst:135293056)
+ PERFORMANCE: 7.2 MIPS (Inst:135293374)
+ PERFORMANCE: 7.4 MIPS (Inst:135293587)
+ PERFORMANCE: 12.4 MIPS (Inst:135293800)
+ PERFORMANCE: 10.0 MIPS (Inst:135294118)
+ read_for_4_came_from_snoop: Bus address=0x0000000080054a80 Is non secure=Y
+ read_for_4_came_from_snoop: Bus address=0x0000000080054a80 Is non secure=Y
+ read_for_4_came_from_snoop: Bus address=0x0000000080054ac0 Is non secure=Y
+ read_for_4_came_from_snoop: Bus address=0x0000000080054ac0 Is non secure=Y
+ read_for_4_came_from_snoop: Bus address=0x0000000080074c80 Is non secure=Y
+ PERFORMANCE: 0.5 MIPS (Inst:135294331)
+ .
+ .
+ .
+ .
+ PERFORMANCE: 10.5 MIPS (Inst:135541612)
+ PERFORMANCE: 3.3 MIPS (Inst:135541929)
+ sw_trace_event: sequence_id=0x00000011 cluster_and_cpu_id=0x0000 data0=0x0000000008143442 data1=0x0000000000000000 data2=0x0000000000000000 message="L2 Flush End":13
+ .
+ .
+ .
+ .
+
+ Post-processing scripts can be developed to count the number of
+ 'read_for_<x>_came_from_snoop' events between two 'sw_trace_event<x>'
+ events. In the above example, the result is the number of snoop
+ hits in the A15 caches while they were being flushed. In addition,
+ the "PERFORMANCE" strings can be used to determine the cache hit rate.
+ In this case, they indicate the number of hits in the last 200
+ instructions. The experiment can be repeated, with each iteration
+ changing the point in time at which the L2 cache is flushed during a
+ switchover. By monitoring the effect on the cache hit rate, a suitable
+ time to power down the outbound L2 cache can be determined.
+
+E Status of "TUBE" support
+
+ The current version of ARM FastModels (RTSM VE Cortex-A15 KF
+ CCI version 6.2 Beta) implements only one 'tube'
+ interface i.e. TUBE0.
+
+ Subsequent releases will support up to four 'tube' interfaces i.e. TUBE0-3.
+ The Virtualizer code has been internally tested to work with all four 'tube'
+ sources and assumes their presence. Writing to a non-existent
+ 'tube' interface is treated as a no-op, so the trace file will contain
+ messages only from the 'sw_trace_event' source i.e. TUBE0.
+
+ (Please correspond with the ARM FastModels team for details on future ARM
+ FastModels releases that will support all four tube interfaces).
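
Reviewer note on docs/04-Cache-hit-rate-howto.txt: the post-processing
described above (counting 'read_for_<x>_came_from_snoop' lines between two
'sw_trace_event' markers) could be sketched roughly as follows. This is only
an illustration, not part of the release. The function name
count_snoop_hits, the regular expressions and the embedded SAMPLE excerpt are
assumptions made for this sketch; the line formats are taken from the trace
output shown in the document.

```python
import re

# Patterns modelled on the trace excerpt in the document (assumed stable).
SNOOP_RE = re.compile(r"read_for_(\d+)_came_from_snoop:")
TRACE_RE = re.compile(r'sw_trace_event: .*message="([^"]*)"')


def count_snoop_hits(lines, begin_msg="L2 Flush Begin", end_msg="L2 Flush End"):
    """Count snoop-hit lines per master interface between two trace messages."""
    counts = {}
    inside = False
    for line in lines:
        # Tolerate leading '+' or whitespace, e.g. when reading from a diff.
        text = line.lstrip("+ \t")
        trace = TRACE_RE.match(text)
        if trace:
            if trace.group(1) == begin_msg:
                inside = True
            elif trace.group(1) == end_msg:
                inside = False
            continue
        snoop = SNOOP_RE.match(text)
        if inside and snoop:
            master = int(snoop.group(1))
            counts[master] = counts.get(master, 0) + 1
    return counts


# Shortened excerpt of the trace shown in the document.
SAMPLE = """\
read_for_4_came_from_snoop: Bus address=0x000000008ff02440 Is non secure=N
read_for_4_came_from_snoop: Bus address=0x000000008ff012c0 Is non secure=N
sw_trace_event: sequence_id=0x00000010 cluster_and_cpu_id=0x0000 data0=0x000000000810672e data1=0x0000000000000000 data2=0x0000000000000000 message="L2 Flush Begin":15
PERFORMANCE: 5.5 MIPS (Inst:135293056)
read_for_4_came_from_snoop: Bus address=0x0000000080054a80 Is non secure=Y
read_for_4_came_from_snoop: Bus address=0x0000000080054ac0 Is non secure=Y
read_for_4_came_from_snoop: Bus address=0x0000000080074c80 Is non secure=Y
sw_trace_event: sequence_id=0x00000011 cluster_and_cpu_id=0x0000 data0=0x0000000008143442 data1=0x0000000000000000 data2=0x0000000000000000 message="L2 Flush End":13
"""

if __name__ == "__main__":
    for master, hits in sorted(count_snoop_hits(SAMPLE.splitlines()).items()):
        print(f"master interface {master}: {hits} snoop hits during flush")
```

To process a real trace file, the same function can be given an open file
object in place of SAMPLE.splitlines(). Snoop lines seen before the
"L2 Flush Begin" marker are deliberately left out of the count.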
diff --git a/docs/05-FAQ.txt b/docs/05-FAQ.txt
new file mode 100644
index 0000000..a2e9790
--- /dev/null
+++ b/docs/05-FAQ.txt
@@ -0,0 +1,18 @@
+Frequently asked questions
+==========================
+
+1. What is the per-core context size that is switched between
+ clusters?
+
+A. Per-CPU context
+
+ CP15 and VFP context: 768 bytes
+ vGIC Virtual CPU interface (payload view) context: 128 bytes
+ vGIC Virtual CPU interface (HYP mode view) context: 280 bytes
+ vGIC Distributor context (SGIs & PPIs): 128 bytes
+ Virt. Ext. Registers: 40 bytes
+
+ Global context
+
+ vGIC Distributor context (SPIs): 2048 bytes
+ 2nd stage translation trap context: 40 bytes
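
Reviewer note on docs/05-FAQ.txt: the figures above can be added up to
estimate the total amount of context involved. The short sketch below does
that arithmetic. It assumes that the per-CPU items are saved once per core,
that the global items are saved once, and that the sizes simply add up; none
of this is stated in the FAQ itself, and the name total_context_bytes is made
up for the illustration.

```python
# Sizes in bytes, copied from the FAQ entry.
PER_CPU_CONTEXT_BYTES = {
    "CP15 and VFP": 768,
    "vGIC Virtual CPU interface (payload view)": 128,
    "vGIC Virtual CPU interface (HYP mode view)": 280,
    "vGIC Distributor (SGIs & PPIs)": 128,
    "Virt. Ext. Registers": 40,
}
GLOBAL_CONTEXT_BYTES = {
    "vGIC Distributor (SPIs)": 2048,
    "2nd stage translation trap": 40,
}


def total_context_bytes(num_cpus):
    """Total context size, assuming per-CPU items scale with the core count."""
    per_cpu = sum(PER_CPU_CONTEXT_BYTES.values())
    return num_cpus * per_cpu + sum(GLOBAL_CONTEXT_BYTES.values())


if __name__ == "__main__":
    print(sum(PER_CPU_CONTEXT_BYTES.values()))  # 1344 bytes per CPU
    print(sum(GLOBAL_CONTEXT_BYTES.values()))   # 2088 bytes global
    print(total_context_bytes(4))               # 7464 bytes for four cores
```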
diff --git a/docs/06-Optional-rootfs-build.txt b/docs/06-Optional-rootfs-build.txt
new file mode 100644
index 0000000..eb11c60
--- /dev/null
+++ b/docs/06-Optional-rootfs-build.txt
@@ -0,0 +1,125 @@
+Optional Root filesystem build and use instructions
+===================================================
+
+A Introduction
+
+ This note describes ways to build Linux user-land
+ filesystems of varying complexity for use with the
+ virtualizer. Note that there are several ways to create
+ filesystems and this note doesn't cover all possibilities.
+
+ The default virtualizer release contains an empty filesystem
+ stub located at:
+
+ arm-virtualizer-v2.0-ti-040911/bootwrapper/payload/fsimg
+
+ A build using this stub doesn't contain a functional
+ filesystem that the Linux kernel image can use. fsimg can be
+ replaced with a suitable filesystem image but with the
+ following constraints:
+
+ 1. Compressed or uncompressed cpio archives are supported.
+
+ 2. The image size is limited to ~200 MB.
+
+ The size restriction implies that only very 'lean'
+ filesystems such as busybox <http://www.busybox.net/> may be
+ used. While busybox presents a minimal but robust command
+ line environment, quite often a more conventional desktop-like
+ environment with window management on top of an X
+ server is required in order to run web browsers etc.
+
+ In this note, we illustrate a method to use a larger (~2GB) filesystem image
+ that can be used with the ARM FastModels MMC emulation. Note that the MMC
+ emulation only supports images that are just under 2GB in size.
+
+ Note that if the MMC route is used, the bootwrapper/payload/fsimg filesystem
+ image will be suppressed and ignored.
+
+ Locating a root filesystem on the MMC emulation allows the Linux kernel to
+ access and use this filesystem. This is facilitated by indicating the
+ filesystem location to the kernel via the kernel command-line arguments by
+ appending 'root=/dev/mmcblk0' (for a single partition MMC image) to the
+ argument list.
+
+ Note that when using this technique, the fsimg file is ignored.
+
+B Building and installing a Linux kernel
+
+ A suitable Linux kernel image for use with the virtualizer
+ can be built as follows:
+
+ $ tar -jxf arm-virtualizer-v2.0-ti-040911.tar.bz2
+ $ cd arm-virtualizer-v2.0-ti-040911/bootwrapper
+ $ make clean
+ $ pushd /tmp
+ $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/maz/ael-kernel.git ael-kernel.git
+ $ cd ael-kernel.git
+ $ git checkout -b ael-11.06 origin/ael-11.06
+ $ yes | make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- vexpress-new_defconfig
+ $ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- -j4
+ $ popd
+ $ cp $OLDPWD/arch/arm/boot/Image payload/kernel
+
+ Note that using the vexpress-new_defconfig configuration
+ ensures that the kernel is built with MMC support.
+
+C Building a suitable root filesystem
+
+ A suitable root filesystem can be built using Ubuntu Linux's rootstock utility
+ <https://wiki.ubuntu.com/ARM/RootfsFromScratch> as follows:
+
+ $ sudo apt-get install rootstock
+ $ sudo rootstock --fqdn ubuntu --login ubuntu --password ubuntu --imagesize 2040M --seed lxde,gdm --notarball
+ $ mv qemu-armel-*.img mmc.img
+
+ Note that the complete filesystem build will take ~30
+ minutes. On boot, the username and password are both 'ubuntu'.
+
+ The rootstock invocation above will produce a root filesystem containing an
+ LXDE desktop <http://lxde.org/> that includes the Firefox browser.
+
+D Modifying the kernel command line to support the MMC image
+
+ The virtualizer build system and the mxscripts that are used for launching
+ the ARM FastModel require modifications to support the MMC image.
+
+ The build system modification is to change the Linux kernel command line
+ arguments to make the kernel aware of the location of the root filesystem.
+ The command line should contain the string 'root=/dev/mmcblk0'.
+
+ To make this modification, edit the file bootwrapper/Makefile and change the
+ BOOTARGS specification on line 42 from:
+
+ BOOTARGS=mem=255M console=ttyAMA0,115200 migration_cost=500
+ cachepolicy=writealloc
+
+ to
+
+ BOOTARGS=root=/dev/mmcblk0 mem=255M console=ttyAMA0,115200
+ migration_cost=500 cachepolicy=writealloc
+
+ The ARM FastModel mxscript modification is to get the FastModel to use the
+ mmc.img file created in step C above with the MMC emulation.
+
+ To make this modification, uncomment the 'string mmcimage=' line (line 42)
+ of the mxscript and provide the complete path to the mmc.img file generated in step C above.
+
+E Building the virtualizer
+
+ $ cd bootwrapper
+ $ make clean && make
+
+F Launching the ARM FastModel
+
+ $ modeldebugger -s big-little-MP<x>.mxscript
+
+ .. where x is 1 or 4 as the case may be (MP1 build or MP4
+ build).
+
+G Known limitations
+
+ Use of a comprehensive root filesystem as opposed to busybox
+ is known to be unstable on the current ARM FastModel release (Release
+ 6.2 Beta). Subsequent model releases shall contain appropriate fixes as
+ required.
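
Reviewer note on docs/06-Optional-rootfs-build.txt: the BOOTARGS change in
section D amounts to putting 'root=/dev/mmcblk0' at the front of the existing
argument string. A minimal sketch of that transformation follows. The helper
name with_root_device is hypothetical and the snippet does not edit
bootwrapper/Makefile; it only shows the string that the edit is expected to
produce.

```python
def with_root_device(bootargs, root_device="/dev/mmcblk0"):
    """Return bootargs with a single root= argument placed first."""
    # Drop any existing root= argument so the result carries only one.
    args = [arg for arg in bootargs.split() if not arg.startswith("root=")]
    return " ".join([f"root={root_device}"] + args)


if __name__ == "__main__":
    original = ("mem=255M console=ttyAMA0,115200 migration_cost=500 "
                "cachepolicy=writealloc")
    print(with_root_device(original))
```

For a multi-partition MMC image a different device node would presumably be
needed; the document only covers the single-partition case.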