NETWORK TIMING AND CONFIGURATION CELLS

To improve the performance of simulations containing several target systems, Simics provides the capability to simulate loosely coupled systems in parallel.

Configuration cells are the concept used to manage parallel simulation in Simics.

Each cell contains a set of objects that can run in parallel with objects in other cells, but not with objects in the same cell. Typically, each cell contains a single target machine with all its components and devices. In most cases, Simics partitions the simulation into cells automatically by creating one cell for each top-level component. Some advanced use cases require configuring the cell partitioning manually; this is covered in detail in Wind River (2014d).
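As a rough illustration of the default partitioning rule, the following Python sketch groups objects into one cell per top-level component. It is a toy model, not how Simics implements partitioning. It assumes, hypothetically, that every object name is a dotted path whose first segment names the top-level component (as in client_a.uart[0]), and the cell names it generates are invented for the example.

```python
from collections import defaultdict


def partition_into_cells(object_names):
    """Group objects into one cell per top-level component (toy model).

    Assumption: the first segment of a dotted object name, such as
    "client_a" in "client_a.uart[0]", identifies the top-level component.
    """
    cells = defaultdict(list)
    for name in object_names:
        top_level = name.split(".", 1)[0]
        # Hypothetical cell naming scheme, used only for this illustration.
        cells[f"{top_level}_cell"].append(name)
    return dict(cells)


if __name__ == "__main__":
    objects = ["client_a.cpu[0]", "client_a.uart[0]", "server.cpu[0]", "server.eth"]
    for cell, members in partition_into_cells(objects).items():
        # Objects within one cell never run in parallel with each other.
        print(cell, members)
```

Each object ends up in exactly one cell, so any interaction that crosses a cell boundary has to be routed through a link.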

Multithreaded Simics is still deterministic, repeatable, reversible, and checkpointable; its behavior is identical to a single-threaded simulation of the same system. To maintain these properties, objects in different cells may communicate with each other only through links. A link transmits messages between objects with a latency measured in simulated time; an Ethernet cable is a typical example. This section explains how multithreading in Simics affects timing and system configuration, and it lays the foundation for understanding network timing.

To make multithreaded simulation perform well, Simics lets each thread run on its own for a certain amount of virtual time before it must resynchronize with the other threads. This timespan is called the synchronization latency. Because of the synchronization latency, Simics does not allow direct communication between objects in different cells. Even if all accesses were properly locked and thread-safe, an object would have no control over the point in the other cell's virtual time at which its access took effect, so the simulation would no longer be deterministic. This is why all communication between cells must go over links. A very similar model has been applied to SystemC simulations (Weinstock et al., 2014), and it has worked well in practice with Simics for more than a decade (Magnusson et al., 2002).
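The effect of the synchronization latency can be sketched with ordinary host threads. In this toy Python model (an assumption for illustration, not the Simics scheduler), each cell advances its own virtual clock by one synchronization window and then waits at a threading.Barrier until every other cell has reached the same boundary. Time is kept in integer nanoseconds to avoid floating-point drift; the function and parameter names are hypothetical.

```python
import threading


def run_cells(num_cells, sync_latency_ns, end_time_ns):
    """Toy model of cells that resynchronize every sync_latency_ns.

    Each cell runs in its own host thread and advances its local virtual
    time by at most one window before waiting for all other cells.
    Returns each cell's final virtual time and its number of sync points.
    """
    barrier = threading.Barrier(num_cells)
    local_time = [0] * num_cells
    sync_count = [0] * num_cells

    def cell_main(cell_id):
        while local_time[cell_id] < end_time_ns:
            # Advance this cell alone for one window of virtual time; the
            # real work of simulating instructions would happen here.
            local_time[cell_id] = min(local_time[cell_id] + sync_latency_ns,
                                      end_time_ns)
            # Block until every cell has reached the same window boundary.
            barrier.wait()
            sync_count[cell_id] += 1

    threads = [threading.Thread(target=cell_main, args=(i,))
               for i in range(num_cells)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return local_time, sync_count


if __name__ == "__main__":
    # 10 ms of virtual time with a 1 ms versus a 5 ms synchronization latency.
    for latency_ns in (1_000_000, 5_000_000):
        times, syncs = run_cells(4, latency_ns, 10_000_000)
        print(f"latency {latency_ns} ns: {syncs[0]} sync points per cell")
```

With a 1 ms window, 10 ms of virtual time costs ten resynchronizations per cell; a 5 ms window cuts that to two, which is the performance argument for a longer synchronization latency.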

To achieve repeatable and deterministic timing across runs and hosts, each link is assigned a user-defined minimum latency. Figure 5.3 illustrates this concept using a network link as an example. Two nodes, A and B, are connected over a network with an assigned latency l, which is greater than or equal to the minimum latency. A and B are typically simulated in parallel, with one host thread running each target. A network packet sent from node A at its local time t is delivered to node B at B's local time t+l, even if the packet actually reaches B on the host much sooner than that. The Simics synchronization mechanism guarantees that A and B are never separated by more simulated time than the minimum latency. Because l is at least the minimum latency, B cannot yet have passed time t+l when A sends the packet. This ensures that the packet is always available for B to deliver at the right time, no matter how the execution of A and B evolves on a particular host during a particular run.

FIGURE 5.3

Network timing in Simics. Nodes A and B, each running an OS and an application on Simics, are connected by a network link with latency l. The link takes a packet from A at time t and schedules it to arrive at B at time t+l, as specified by the Simics network simulation semantics. After processing time p, B replies at time t+l+p, and the reply reaches A at time t+p+2l, so each round trip from A to B and back takes two latencies l plus the processing time p.
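The delivery rule of Figure 5.3 can be expressed as a small event queue. The Python sketch below is a hypothetical model of the semantics, not the Simics link API: send() stamps each message with the sender's local time plus the link latency, and the receiver sees a message only once its own local clock reaches that timestamp, regardless of when the message physically arrived on the host. Times are plain integers in an assumed unit (milliseconds here).

```python
import heapq
import itertools


class Link:
    """Toy point-to-point link with a fixed latency in simulated time.

    A message sent at the sender's local time t is delivered at the
    receiver's local time t + latency, however early it reaches the
    receiver on the host.
    """

    def __init__(self, latency, min_latency):
        if latency < min_latency:
            raise ValueError("link latency must be >= the minimum latency")
        self.latency = latency
        self._queue = []               # heap of (delivery_time, seq, payload)
        self._seq = itertools.count()  # keeps same-time messages in send order

    def send(self, send_time, payload):
        """Called by the sender; schedules delivery at send_time + latency."""
        heapq.heappush(self._queue,
                       (send_time + self.latency, next(self._seq), payload))

    def deliver_until(self, receiver_time):
        """Called by the receiver as its clock advances; returns the
        (delivery_time, payload) pairs that are due by receiver_time."""
        delivered = []
        while self._queue and self._queue[0][0] <= receiver_time:
            when, _, payload = heapq.heappop(self._queue)
            delivered.append((when, payload))
        return delivered


if __name__ == "__main__":
    link = Link(latency=5, min_latency=1)  # times in assumed milliseconds
    link.send(10, "packet")                # A sends at its local time t = 10
    print(link.deliver_until(12))          # B at 12: packet not yet due
    print(link.deliver_until(15))          # B at t + l = 15: packet delivered
```

Because the delivery time depends only on the send time and the latency, the receiver observes the same event sequence whichever host thread happens to run first.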

Once the packet has reached B, B might compute a reply to A. The time needed to compute the reply is shown as p in Figure 5.3. At time t+l+p, B sends the reply toward A, and it arrives at A at time t+p+2l. Thus, the network latency applies to all communication over the link and is visible to user software. Figure 5.4 illustrates this: the network latency is set to 1 second, and the ping time reported from one target machine to the other is about 2,000 milliseconds (in the serial console titled "Serial Console on client_a.uart[0]"). Looking closely, you can see that the reported time is 10 to 20 microseconds more than 2,000 milliseconds; this is the overhead of processing the packets in the network stacks on both sides of the connection.
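The round-trip arithmetic follows the formula round trip = 2l + p. In this Python sketch, times are integer microseconds, p is taken to be the total packet-processing overhead, and the 15 microsecond value is an illustrative figure picked from the 10 to 20 microsecond range mentioned above, not a measurement.

```python
def ping_round_trip_us(link_latency_us, processing_us):
    """Round trip A -> B -> A: the link latency is paid once in each
    direction, plus the time spent processing the packets."""
    return 2 * link_latency_us + processing_us


if __name__ == "__main__":
    # 1 s link latency, as in Figure 5.4, plus an assumed 15 us of processing.
    rtt = ping_round_trip_us(1_000_000, 15)
    print(f"{rtt / 1000:.3f} ms")
```

The latency term dominates: halving l halves almost the entire reported ping time, while the processing overhead stays in the microsecond range.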

Tweaking the network latency is a common performance optimization for networked simulations. In general, the simulation will run faster with a higher latency, because the amount of synchronization in the simulation is reduced.

A higher latency therefore yields the highest aggregate instruction throughput (simulated target instructions per host second) across the network nodes. However, if the target software has ping-like behavior, sending data to another node and waiting for a reply before continuing, it is often better to lower the latency. A lower latency causes more synchronization within Simics itself and therefore lower aggregate instruction throughput. However, more of those instructions are spent doing useful work rather than waiting for replies, so the target software makes faster overall progress (in terms of user-perceived relevant work per host second).

FIGURE 5.4

Ping round-trip times with a network latency of 1,000 milliseconds.
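A back-of-the-envelope model makes this trade-off concrete. The Python sketch below assumes, hypothetically, a ping-pong workload of n sequential request/reply exchanges, each costing 2l + p of simulated time, and counts one synchronization per latency window of simulated time (treating the synchronization window as equal to the link latency, which is a simplification). Times are integer milliseconds, and the numbers in the example are illustrative.

```python
def sim_time_for_exchanges_ms(n, latency_ms, processing_ms):
    """Simulated time for n sequential request/reply exchanges (2l + p each)."""
    return n * (2 * latency_ms + processing_ms)


def sync_points(sim_time_ms, latency_ms):
    """Synchronization windows needed to cover sim_time_ms, assuming one
    resynchronization per latency window (rounded up)."""
    return -(-sim_time_ms // latency_ms)


if __name__ == "__main__":
    for latency in (10, 1):
        t = sim_time_for_exchanges_ms(1000, latency, processing_ms=1)
        print(f"l = {latency} ms: {t} ms simulated, {sync_points(t, latency)} syncs")
```

With l = 10 ms, the 1,000 exchanges need 21,000 ms of simulated time, most of it spent waiting, and 2,100 synchronizations. With l = 1 ms, they finish in 3,000 ms of simulated time at the cost of 3,000 synchronizations. The higher latency synchronizes less often per simulated second, but the lower one completes the same useful work in far less simulated time.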

Latency also affects network throughput for protocols like TCP/IP, where only a limited amount of unacknowledged data (the TCP window) may be in flight at any time. A high latency requires a large TCP window to achieve the best throughput, and if the latency is too high, the sender may be forced to stall while waiting for acknowledgments. In practice, however, Simics latencies are rarely of the order of magnitude needed to truly starve TCP.
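The relationship between latency and TCP throughput follows from the window limit: at most one window of unacknowledged data can be outstanding per round trip, so throughput is bounded by window / RTT, and keeping a link of a given bandwidth busy requires a window of at least bandwidth times RTT (the bandwidth-delay product). The Python sketch uses the classic 65,535-byte maximum window of TCP without the window-scale option, ignores processing time and protocol overhead, and uses illustrative RTT values.

```python
def window_limited_throughput(window_bytes, rtt_ms):
    """Upper bound on TCP throughput (bytes/s) when the window is the bottleneck."""
    return window_bytes * 1000 // rtt_ms


def window_needed(bandwidth_bytes_per_s, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to keep the link busy."""
    return bandwidth_bytes_per_s * rtt_ms // 1000


if __name__ == "__main__":
    MAX_WINDOW = 65_535  # largest window without the TCP window-scale option
    # RTT of two 1 ms link latencies versus two 1 s latencies (Figure 5.4 setup).
    for rtt_ms in (2, 2000):
        rate = window_limited_throughput(MAX_WINDOW, rtt_ms)
        print(f"RTT {rtt_ms} ms: at most {rate} bytes/s")
    # A 1 Gbit/s link (125,000,000 bytes/s) with a 2 ms RTT.
    print(window_needed(125_000_000, 2), "bytes must be in flight")
```

With millisecond-range latencies, even an unscaled window still allows about 33 million bytes per second, consistent with the observation that typical Simics latencies rarely starve TCP; with the 1-second latency of Figure 5.4, the same window caps throughput at about 33 thousand bytes per second.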

The most appropriate network latency depends on the application, but in practice we have found that latencies of a few milliseconds offer a good compromise for most workloads under most circumstances. Such a latency is long enough to allow useful parallel execution, while being short enough that most applications work well.

AUTOMATING TARGET CONFIGURATION AND BOOT