SIGSIM-PADS '19: Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation

SESSION: Session on HPC and Parallel Simulation

Analyzing Cost-Performance Tradeoffs of HPC Network Designs under Different Constraints using Simulations

Identifying a suitable network topology and deciding its optimal configuration parameters are critical aspects of the overall HPC system design, procurement, and installation process. Typically, multiple network topology choices are compared under the balanced injection-to-global bandwidth criterion to identify the best candidate. However, deviating from this balanced criterion may not impact application performance adversely and is often done in practice due to other considerations such as monetary cost. In this paper, we identify different practical constraints that determine the number of nodes, routers, and links, and in turn, influence dollar costs and impact network design. We design network topologies under one or more such constraints which represent different design points (iso-* analysis). We then perform a comprehensive, comparative evaluation of three scalable network topologies -- dragonfly, express mesh, and fat-tree -- enabled by parallel discrete-event simulations (PDES) of relevant HPC workloads. We identify network topologies that perform best under different iso-* configurations and compare their performance per dollar based on market data.
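
The iso-* idea can be made concrete with a minimal sketch: fix the node count and compare dollar cost per node across topologies. All unit prices and component counts below are hypothetical placeholders, not the paper's market data.

```python
# A minimal sketch of an iso-node cost comparison; every number here is
# an illustrative assumption, not taken from the paper's market data.
UNIT_COST = {"router": 8000.0, "link": 500.0, "nic": 300.0}  # assumed USD

def network_cost(n_routers, n_links, n_nodes):
    """Dollar cost of a configuration from per-component unit prices."""
    return (n_routers * UNIT_COST["router"]
            + n_links * UNIT_COST["link"]
            + n_nodes * UNIT_COST["nic"])

def cost_per_node(n_routers, n_links, n_nodes):
    return network_cost(n_routers, n_links, n_nodes) / n_nodes

# Iso-node design points: both topologies serve the same node count but
# differ in router and link counts (illustrative values only).
configs = {
    "dragonfly": dict(n_routers=1056, n_links=8448, n_nodes=8448),
    "fat-tree":  dict(n_routers=1320, n_links=10560, n_nodes=8448),
}
for name, cfg in configs.items():
    print(f"{name:10s} ${cost_per_node(**cfg):,.2f} per node")
```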

Scalable Performance Prediction of Codes with Memory Hierarchy and Pipelines

We present the Analytical Memory Model with Pipelines (AMMP) of the Performance Prediction Toolkit (PPT). PPT-AMMP takes high-level source code and hardware architecture parameters as input and predicts the runtime of that code on the target hardware platform defined by those parameters. PPT-AMMP transforms the code to an (architecture-independent) intermediate representation, then (i) analyzes the basic block structure of the code, (ii) processes architecture-independent virtual memory access patterns that it uses to build memory reuse distance distribution models for each basic block, and (iii) runs detailed basic-block-level simulations to determine hardware pipeline usage. Further, PPT-AMMP uses machine learning and regression techniques to build prediction models based on small instances of the input code, which are then integrated into a higher-order discrete-event simulation model of PPT running on the Simian PDES engine. We validate PPT-AMMP on four standard computational physics benchmarks and finally present a use case of hardware parameter sensitivity analysis to identify bottleneck hardware resources for different code inputs.
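
The reuse distance computation at the heart of step (ii) can be sketched as follows; this is the textbook definition, not PPT-AMMP's actual implementation.

```python
from collections import Counter

def reuse_distances(trace):
    """Reuse distance of each access: the number of distinct addresses
    touched since the previous access to the same address (inf = cold miss)."""
    last_seen = {}                    # address -> index of its previous access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            between = trace[last_seen[addr] + 1 : i]
            distances.append(len(set(between)))
        else:
            distances.append(float("inf"))
        last_seen[addr] = i
    return distances

# A histogram of reuse distances feeds a simple cache model: accesses with
# distance smaller than the cache capacity (in lines) are predicted hits.
trace = ["a", "b", "c", "a", "b", "b", "d", "a"]
print(Counter(reuse_distances(trace)))   # Counter({inf: 4, 2: 3, 0: 1})
```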

An Adaptive Non-Blocking GVT Algorithm

In optimistic Parallel Discrete Event Simulation (PDES), the Global Virtual Time (GVT) computation is an important aspect of performance. It must be performed frequently enough to ensure simulation progress and free memory, while still incurring minimal overhead. Many algorithms have been studied for computing the GVT efficiently under a variety of simulation conditions for a variety of models. In this paper we propose a new GVT algorithm that aims to do two things. First, it incurs very low overhead by not requiring the simulation to block execution. Second, and most importantly, it can adapt to simulation conditions while it is running. This allows it to perform well for a variety of models and removes some burden from developers by not requiring intensive tuning.
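
The abstract does not spell out the algorithm, but the invariant that every non-blocking GVT scheme must maintain can be sketched: GVT is a lower bound on all local virtual times and on the timestamps of messages still in flight. A minimal illustration (not the paper's algorithm):

```python
import threading

# A minimal sketch of the GVT invariant, not the paper's algorithm:
# GVT lower-bounds every LP's local virtual time and the timestamp of
# every message still in transit, and the estimate never blocks LPs.
class GVTTracker:
    def __init__(self, n_lps):
        self.lock = threading.Lock()
        self.local_vt = [0.0] * n_lps      # each LP's local virtual time
        self.in_flight = {}                # msg id -> send timestamp

    def send(self, msg_id, timestamp):
        with self.lock:
            self.in_flight[msg_id] = timestamp

    def receive(self, msg_id, lp, timestamp):
        with self.lock:
            self.in_flight.pop(msg_id, None)
            self.local_vt[lp] = max(self.local_vt[lp], timestamp)

    def estimate(self):
        """Safe lower bound on GVT; LPs keep executing while this runs."""
        with self.lock:
            transient = min(self.in_flight.values(), default=float("inf"))
            return min(min(self.local_vt), transient)
```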

SESSION: Session on Network Simulation/Emulation

A Distributed Virtual Time System on Embedded Linux for Evaluating Cyber-Physical Systems

Cyber-physical systems have a cyber presence, collecting and transmitting data, while also sensing and modifying the surrounding physical world. Modeling and simulation are often-used tools for evaluating the cyber-security of cyber-physical systems. In this work, we develop a distributed virtual time system that enables the synchronization of virtual clocks between physical machines, enabling a high-fidelity simulation-based testing platform. The platform combines physical computing and networking hardware for the cyber presence, while allowing for offline simulation and computation of the physical world. By incorporating virtual clocks into distributed embedded Linux devices, the testbed creates the opportunity to interrupt real and emulated cyber-physical applications in order to inject offline simulated data values. The ability to run real applications while injecting simulated data in a manner temporally transparent to the running process allows for high-fidelity experimentation. Distributed virtual time enables processes and their clocks to be paused, resumed, and dilated across embedded Linux devices through the use of hardware interrupts and a common kernel module. By interconnecting the embedded devices' general-purpose IO pins, they can coordinate and synchronize through a distributed virtual time kernel module with low overhead: under 50 microseconds for 8 processes across 4 embedded Linux devices.
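
The pause/resume/dilation semantics can be sketched in user space as follows; the paper realizes this in a Linux kernel module driven by hardware interrupts, so the class below is purely illustrative.

```python
import time

# A minimal user-space sketch of a per-process virtual clock with a time
# dilation factor (TDF) and pause/resume; illustrative only, the paper's
# implementation lives in a shared kernel module.
class VirtualClock:
    def __init__(self, tdf=1.0):
        self.tdf = tdf                      # TDF > 1 slows virtual time down
        self._anchor_real = time.monotonic()
        self._anchor_virtual = 0.0
        self._paused = False

    def now(self):
        if self._paused:
            return self._anchor_virtual
        elapsed = time.monotonic() - self._anchor_real
        return self._anchor_virtual + elapsed / self.tdf

    def pause(self):
        """Freeze virtual time, e.g., while simulated data is injected."""
        self._anchor_virtual = self.now()
        self._paused = True

    def resume(self):
        self._anchor_real = time.monotonic()
        self._paused = False
```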

Simulation-based Analysis of Network Rules Matching

A common function in networking is to find the best match between a packet's IP header and a list of matching rules, and to take some action based on the rule that is matched. This approach determines whether a packet transits a firewall or router, which interface is chosen for egress when it does, and whether a Network Address Translation transformation is applied. Considerable past research has optimized data structures and algorithms for rule matching, under the operating assumption that in every specific application the best match is sought for a single IP flow, with a specified protocol and specified source and destination IP addresses and port numbers. This paper is motivated by a different scenario, in which we seek the simultaneous determination of the best matches for a bundle of flows. The flows are closely related, as the bundle is a contiguous subset of the IP header space, meaning each flow draws in each dimension from the same range as every other flow in the bundle. This specific problem arises in the design of tools that analyze the connectivity of networks. We consider two algorithms for approaching this problem, which share the characteristic of generalizing the simulation of how devices typically classify a given flow. We study the behavior of these algorithms empirically, and find that the amortized cost of identifying the best matching rule in an ACL is typically measured in (at most) tens of microseconds on an ordinary laptop computer.
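
The single-flow and bundle semantics can be sketched as follows; the rule layout and dimensions are illustrative, and the paper's algorithms additionally handle the splitting that a partial overlap forces.

```python
# A minimal sketch of first-match ACL classification, plus the bundle
# tests that motivate generalizing it (illustrative layout, not the
# paper's data structures). A rule or bundle is a dict of inclusive
# ranges, one per header dimension.
DIMS = ("src", "dst", "sport", "dport", "proto")

def first_match(flow, acl):
    """Classic single-flow semantics: first rule matching in every dimension."""
    for rule in acl:
        if all(rule[d][0] <= flow[d] <= rule[d][1] for d in DIMS):
            return rule
    return None

def covers(rule, bundle):
    """Rule matches every flow in the bundle."""
    return all(rule[d][0] <= bundle[d][0] and bundle[d][1] <= rule[d][1]
               for d in DIMS)

def overlaps(rule, bundle):
    """Rule matches at least one flow in the bundle; a partial overlap
    forces the bundle to be split and classified piecewise."""
    return all(rule[d][0] <= bundle[d][1] and bundle[d][0] <= rule[d][1]
               for d in DIMS)

acl = [dict(src=(0, 2**32 - 1), dst=(0, 2**32 - 1), sport=(0, 65535),
            dport=(80, 80), proto=(6, 6))]   # "permit tcp any any eq 80"
flow = dict(src=1, dst=2, sport=1234, dport=80, proto=6)
print(first_match(flow, acl) is not None)     # True
```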

Virtual Time Machine for Reproducible Network Emulation

Reproducing network emulation experiments on diverse physical platforms with varying computation and communication resources is non-trivial. Many state-of-the-art network emulation testbeds do not guarantee timing fidelity. Consequently, results obtained from these testbeds can be misleading, especially when insufficient physical resources are provided to run the experiments. Reproducibility is far from being the norm. In this paper, we present a novel approach that can guarantee reproducible results for network emulation. Our system, called the Virtual Time Machine (VTM), takes advantage of both time dilation and carefully controlled scheduling of the virtual machines. Time dilation allows sufficiently scaled resources to run the experiments in virtual time, and controlled VM scheduling prescribes the precise timing of message passing for distributed applications---independent of the resource provisioning of the underlying physical testbed. Preliminary experiments show that VTM can guarantee reproducible results with varying time dilation, resource subscription, and VM scheduling scenarios.

SESSION: Session on Agent Based Simulation

CoFluences: Simulating the Spread of Social Influences via a Hybrid Agent-Based/Fuzzy Cognitive Maps Architecture

Social influences are key drivers of many human behaviors, and have been the focus of an abundance of discrete simulation models. In participatory modeling, the emphasis is on developing models in an intuitive and transparent manner. Fuzzy Cognitive Mapping (FCM) provides such an intuitive and transparent process, but it can only simulate the thinking of one entity rather than how entities influence each other. Hybrid architectures based on FCM and Agent-Based Modeling (ABM) can bridge this gap, but current software implementing these architectures either restricts the models (e.g., limiting agent heterogeneity by requiring that all agents follow the same rules) or requires extensive coding (which participatory modeling avoids). In this paper, we contribute to software development by presenting CoFluences, and to the theory of modeling and simulation by better characterizing hybrid ABM/FCM architectures. CoFluences is the first software for developing and simulating hybrid ABM/FCM models in a participatory setting in which agents can follow different rules. Although we took a User-Centered Design approach to developing CoFluences, a comprehensive usability study will be necessary to fully evaluate it in context. In addition, the growing interest in developing simulation software involving FCM will call for more standardization and for a better understanding of how an FCM behaves in a hybrid simulation.
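
The FCM inference step underlying such hybrid models can be sketched as follows; the update rule is the standard sigmoid-squashed form, while the concepts and weights are illustrative, not from CoFluences.

```python
import math

# A minimal sketch of one FCM inference step in a hybrid ABM/FCM model:
# concept activations are updated from weighted incoming edges and
# squashed; a designated concept can be driven by neighboring agents'
# outputs to carry social influence. Weights here are illustrative.
def fcm_step(values, weights):
    """values[i]: activation of concept i; weights[j][i]: edge j -> i."""
    n = len(values)
    new = []
    for i in range(n):
        s = values[i] + sum(weights[j][i] * values[j] for j in range(n))
        new.append(1.0 / (1.0 + math.exp(-s)))   # sigmoid squashing
    return new

# Two concepts: 0 = "peer pressure" (set from neighboring agents),
# 1 = "intention to act", with a positive causal edge 0 -> 1.
weights = [[0.0, 0.8],
           [0.0, 0.0]]
state = [0.9, 0.1]          # strong peer pressure, weak initial intention
for _ in range(5):
    state = fcm_step(state, weights)
print(state)                # intention rises under sustained influence
```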

An Agent-Based Simulation API for Speculative PDES Runtime Environments

Agent-Based Modeling and Simulation (ABMS) is an effective paradigm for modeling systems that exhibit complex interactions, often with the goal of studying the emergent behavior of these systems. While ABMS has been used effectively in many disciplines, many successful models are still run only sequentially. Relying on simple and easy-to-use languages such as NetLogo limits the possibility of benefiting from more effective runtime paradigms, such as speculative Parallel Discrete Event Simulation (PDES). In this paper, we discuss a semantically-rich API that allows Agent-Based Models to be implemented in a simple and effective way. We also describe the critical points which should be taken into account to implement this API in a speculative PDES environment, in order to scale up simulations on distributed massively-parallel clusters. We present an experimental assessment showing how our proposal allows complicated interactions to be implemented with reduced complexity, while delivering a non-negligible performance increase.
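
A minimal sketch of what such an API might look like, under the assumption that agents mutate only their own state and interact through timestamped events (a property that keeps rollback tractable for a speculative runtime); this is our illustration, not the paper's actual API.

```python
import heapq

# A minimal sketch (our illustration, not the paper's API) of an
# agent-programming interface a speculative PDES runtime could back:
# all interaction goes through timestamped events on the environment.
class Environment:
    def __init__(self):
        self.now = 0.0
        self.agents = {}
        self.queue = []                          # (timestamp, agent_id, event)

    def schedule(self, delay, agent_id, event):
        heapq.heappush(self.queue, (self.now + delay, agent_id, event))

    def run(self, until):
        while self.queue and self.queue[0][0] <= until:
            self.now, agent_id, event = heapq.heappop(self.queue)
            self.agents[agent_id].on_event(event)

class Agent:
    def __init__(self, agent_id, env):
        self.id, self.env, self.state = agent_id, env, {}
        env.agents[agent_id] = self

class Wolf(Agent):
    def on_event(self, event):
        if event == "hungry":
            self.state["energy"] = self.state.get("energy", 10) - 1
            if self.state["energy"] > 0:
                self.env.schedule(1.0, self.id, "hungry")   # recheck later

env = Environment()
Wolf("wolf-1", env)
env.schedule(0.0, "wolf-1", "hungry")
env.run(until=20.0)
print(env.agents["wolf-1"].state)                # {'energy': 0}
```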

Round-based Super-Individuals - Balancing Speed and Accuracy

Agent- or individual-based models that are based on continuous-time Markov chain semantics are receiving increasing attention in simulation. To reduce computational cost, model aggregation techniques based on Markov chain lumping can be leveraged. However, for models with nested, attributed agents and arbitrary functions determining their dynamics, it is not trivial to find a partition that satisfies the lumpability conditions. Thus, we exploit the potential of so-called super-individual approaches, in which sub-populations of agents are approximated by representatives based on some criterion of similarity, and propose a round-based execution scheme to balance the speed and accuracy of the simulations. For the realization we use an expressive rule-based modeling and simulation framework, evaluate the performance using a fish habitat model, and discuss open questions for future research.
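
The super-individual idea can be sketched as follows, assuming a user-chosen similarity key; the grouping criteria and dynamics below are illustrative, not the paper's rule-based framework.

```python
from collections import defaultdict

# A minimal sketch of forming super-individuals: agents that agree on a
# similarity key are folded into one representative plus a count, and
# each round the representative acts on behalf of the whole group.
def aggregate(agents, key):
    groups = defaultdict(list)
    for a in agents:
        groups[key(a)].append(a)
    return [(group[0], len(group)) for group in groups.values()]

def run_round(supers, step):
    """One round of dynamics; rates scale with group size, which is where
    the speed/accuracy trade-off arises."""
    for rep, count in supers:
        step(rep, count)

# Example: fish binned by (age class, habitat cell) - hypothetical keys.
fish = [{"age": a % 3, "cell": a % 5, "mass": 1.0} for a in range(10000)]
supers = aggregate(fish, key=lambda f: (f["age"], f["cell"]))
run_round(supers, step=lambda rep, n: rep.update(mass=rep["mass"] + 0.1))
print(len(fish), "agents ->", len(supers), "super-individuals")
```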

Modeling Human Temporal Dynamics in Agent-Based Simulations

Time-based habitual behavior is exhibited by humans globally. Given that sleep has such an innate influence on our daily activities, modeling the patterns of the sleep cycle in order to understand the extent of its impact allows us to capture stable behavioral features that can be utilized for predictive measures. In this paper we show that patterns of temporal preference are consistent and resilient across users of several real-world datasets. Furthermore, we integrate those patterns into large-scale agent-based models that simulate the activity of users in those datasets in order to validate predictive accuracy. The simulations reveal that incorporating clustering features based on temporal behavior into agent-based models not only results in a significant decrease in computational overhead, but also yields predictive accuracy comparable to the baseline models.

SESSION: Session on Applications

Mechanisms for Cell-to-cell and Cell-free Spread of HIV-1 in Cellular Automata Models

Several discrete simulation models have been created to study the spread of human immunodeficiency virus type 1 (HIV-1) within the human body. This is motivated both by the prevalence of the virus and by the possibility of asking questions in simulations that would be unethical to test in trials. Among discrete simulation techniques, cellular automata (CA) have been particularly popular in HIV-1 research. CA models commonly assume that a cell is almost exclusively infected by neighboring cells (i.e., cell-to-cell transmission), and that more distal cells (i.e., cell-free transmission) have an extremely small probability of transmitting the disease. Recent biological research suggests the mechanisms are more nuanced: cell-to-cell transmission may account for about 60% of all transmissions. We show that a representative sample of five previously validated CA models of HIV-1 can all be altered (by changing neighborhood structures and infection probabilities) to produce a realistic share of cell-to-cell and cell-free viral transmissions. Increasing the realism of the modes of transmission, however, has mixed consequences for preserving the models' validity: their predictions at 600 weeks are generally unchanged, but viral dynamics are markedly different. We offer several suggestions for creating CA models of HIV-1 with realistic infections and plausible viral dynamics.
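
The two transmission routes can be sketched in a toy CA as follows; the grid size and probabilities are illustrative, not parameters of the five surveyed models.

```python
import random

# A minimal sketch of the two HIV-1 transmission routes in a CA;
# probabilities and grid size are illustrative assumptions.
P_NEIGHBOR, P_DISTAL = 0.05, 0.00002     # per-contact probabilities
N = 100                                   # grid is N x N, toroidal

def step(grid, counts):
    """grid[i][j] is 1 if infected; counts tallies transmissions by mode."""
    total_infected = sum(map(sum, grid))
    new = [row[:] for row in grid]
    for i in range(N):
        for j in range(N):
            if grid[i][j]:
                continue
            neighbors = sum(grid[(i + di) % N][(j + dj) % N]
                            for di in (-1, 0, 1) for dj in (-1, 0, 1)
                            if di or dj)
            # Cell-to-cell: each infected Moore neighbor is one contact.
            if random.random() < 1 - (1 - P_NEIGHBOR) ** neighbors:
                new[i][j] = 1
                counts["cell-to-cell"] += 1
            # Cell-free: virions from any infected cell may reach this one.
            elif random.random() < 1 - (1 - P_DISTAL) ** total_infected:
                new[i][j] = 1
                counts["cell-free"] += 1
    return new

grid = [[0] * N for _ in range(N)]
grid[N // 2][N // 2] = 1                  # single initial infection
counts = {"cell-to-cell": 0, "cell-free": 0}
for _ in range(50):
    grid = step(grid, counts)
print(counts)   # the ratio approximates the share of each mode
```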

Transitioning Spiking Neural Network Simulators to Heterogeneous Hardware

Spiking neural networks (SNN) are among the most computationally intensive types of simulation models, with node counts on the order of up to 10^11. Currently, there is intensive research into hardware platforms suitable for supporting large-scale SNN simulations, whereas several of the most widely used simulators still rely purely on execution on CPUs. Enabling the execution of these established simulators on heterogeneous hardware allows new studies to exploit the many-core hardware prevalent in modern supercomputing environments, while still being able to reproduce and compare with results from a vast body of existing literature. In this paper, we propose a transition approach for CPU-based SNN simulators to enable execution on heterogeneous hardware (e.g., CPUs, GPUs, and FPGAs) with only limited modifications to an existing simulator code base, and without changes to model code. Our approach relies on manual porting of a small number of core simulator functionalities, as found in common SNN simulators, whereas unmodified model code is analyzed and transformed automatically. We apply our approach to the well-known simulator NEST and make a version executable on heterogeneous hardware available to the community. Our measurements show that at full utilization, a single GPU achieves the performance of about 9 CPU cores.
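
The kind of data-parallel inner loop that maps well onto GPUs can be sketched with a vectorized leaky integrate-and-fire update; the constants are illustrative, not NEST's defaults.

```python
import numpy as np

# A minimal sketch of an SNN simulator's data-parallel inner loop: one
# leaky integrate-and-fire (LIF) update over all neurons at once.
# Vectorized array code like this maps naturally onto GPUs; all constants
# below are illustrative assumptions.
def lif_step(v, i_syn, spiked, dt=0.1, tau=10.0, v_rest=-70.0,
             v_thresh=-55.0, v_reset=-75.0):
    v = v + dt / tau * (v_rest - v) + i_syn     # leaky integration
    spiked[:] = v >= v_thresh                   # which neurons fire
    v[spiked] = v_reset                         # reset after a spike
    return v

n = 100_000
v = np.full(n, -70.0)
spiked = np.zeros(n, dtype=bool)
for _ in range(100):
    i_syn = np.random.normal(0.5, 0.2, n)       # stand-in input current
    v = lif_step(v, i_syn, spiked)
print(spiked.sum(), "neurons fired in the last step")
```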

Scaffolded Training Environment for Physics Programming (STEPP): Modeling High School Physics using Concept Maps and State Machines

We are a year into the development of a software tool for modeling and simulation (M&S) of 1D and 2D kinematics consistent with Newton's laws of motion. Our goal has been to introduce modeling and computational thinking into learning high-school physics. There are two main contributions from an M&S perspective: (1) the use of conceptual modeling, and (2) the application of Finite State Machines (FSMs) to model physical behavior. Both of these techniques have been used by the M&S community to model high-level "soft systems" and discrete events. However, they have not been used to teach physics and represent ways in which M&S can improve physics education. We introduce the NSF-sponsored STEPP project along with its hypothesis and goals. We also describe the development of the three STEPP modules, the server architecture, the assessment plan, and the expected outcomes.
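
How an FSM can model kinematics may be sketched as follows; this is our illustration, not STEPP's actual module design.

```python
# A minimal sketch (not STEPP's design) of modeling 1D kinematics with a
# finite state machine: each state fixes an acceleration, and transitions
# fire when student-specified conditions on (t, x, v) hold.
class KinematicsFSM:
    def __init__(self, states, transitions, initial):
        self.states = states            # name -> acceleration (m/s^2)
        self.transitions = transitions  # name -> [(condition, next_state)]
        self.current = initial

    def step(self, t, x, v, dt=0.01):
        a = self.states[self.current]
        v += a * dt                     # explicit Euler integration
        x += v * dt
        for cond, nxt in self.transitions.get(self.current, []):
            if cond(t, x, v):
                self.current = nxt
                break
        return x, v

# A ball thrown upward: "rising" until v <= 0, then "falling" until x <= 0.
fsm = KinematicsFSM(
    states={"rising": -9.8, "falling": -9.8, "landed": 0.0},
    transitions={"rising": [(lambda t, x, v: v <= 0, "falling")],
                 "falling": [(lambda t, x, v: x <= 0, "landed")]},
    initial="rising")
t, x, v = 0.0, 0.0, 15.0
while fsm.current != "landed":
    x, v = fsm.step(t, x, v)
    t += 0.01
print(f"landed at t = {t:.2f} s")       # approx. 2 * 15 / 9.8 = 3.06 s
```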

SESSION: Session on Network Simulation/Emulation

Fit Fly: A Case Study on Interconnect Innovation through Parallel Simulation

To meet the demand for exascale-level performance from high-performance computing (HPC) interconnects, many system architects are turning to simulation results for accurate and reliable predictions of the performance of prospective technologies. Testing full-scale networks with a variety of benchmarking tools, including synthetic workloads and application traces, can give crucial insight into what ideas are most promising without needing to physically construct a test network.

While flexible, however, this approach is extremely compute-intensive. We address this time complexity challenge through the use of large-scale, optimistic parallel simulation that ultimately leads to faster HPC network architecture innovations. In this paper we demonstrate this innovation capability through a real-world network design case study. Specifically, we have simulated and compared four extreme-scale interconnects: Dragonfly, Megafly, Slim Fly, and a new dual-rail-dual-plane variation of the Slim Fly network topology.

We present this new variant of Slim Fly, dubbed Fit Fly, to show how interconnect innovation and evaluation---beyond what is possible through analytic methods---can be achieved through parallel simulation. We validate and compare the model with various network designs using the CODES interconnect simulation framework. By running large-scale simulations in a parallel environment, we are able to quickly generate reliable performance results that can help network designers break ground on the next generation of high-performance network designs.

Virtual-Time-Accelerated Emulation for Blockchain Network and Application Evaluation

Blockchain technologies are in the ascendant, transforming the ways we manage contracts, make transactions, and manifest ownership of property. The trend calls for a realistic testing and evaluation platform for blockchain applications and systems. We present Minichain, a container-based emulator that allows testing proof-of-work-based blockchains on a commodity computer. Minichain contains a realistic and configurable network environment, which is missing in today's blockchain testbeds. This unique feature enables us to evaluate the impact of network events (e.g., cyber-attacks) and conditions (e.g., congested or failed links) on blockchain applications. Meanwhile, Minichain allows the direct execution of unmodified application code in the containers for fidelity, and utilizes the virtual time technique to speed up experiments and improve the system scale that one can accurately emulate. In particular, we mathematically analyze the convergence of the proof-of-work-based consensus algorithm to show the effectiveness of virtual time. We evaluate the performance of Minichain across both the network layer and the application layer, and demonstrate its usability by emulating a selfish mining attack initiated from the network layer.
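
Why virtual time preserves proof-of-work behavior can be sketched with a toy mining model, since block discovery is approximately exponential with rate hashrate/difficulty; the numbers are illustrative, not Minichain's.

```python
import random

# A minimal sketch of virtual time and proof-of-work: scaling all clocks
# by a dilation factor leaves virtual-time block intervals unchanged while
# shrinking wall-clock experiment time. Parameters are illustrative.
def mine_intervals(n_blocks, hashrate, difficulty):
    rate = hashrate / difficulty            # expected blocks per second
    return [random.expovariate(rate) for _ in range(n_blocks)]

virtual = mine_intervals(10_000, hashrate=1e6, difficulty=6e8)
tdf = 60.0                                  # 1 wall-clock second = 60 virtual
wall = [t / tdf for t in virtual]
print(f"mean virtual block interval: {sum(virtual)/len(virtual):.1f} s")
print(f"wall-clock experiment time: {sum(wall):.0f} s "
      f"instead of {sum(virtual):.0f} s")
```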

Modeling and Analysis of Application Interference on Dragonfly+

The Dragonfly class of networks is considered a promising interconnect for next-generation supercomputers. While Dragonfly+ networks offer more path diversity than the original Dragonfly design, they are still prone to performance variability due to their hierarchical architecture and resource-sharing design. Event-driven network simulators are indispensable tools for navigating complex system designs. In this study, we quantitatively evaluate a variety of application communication interactions on a 3,456-node Dragonfly+ system using the CODES toolkit. This study looks at the impact of communication interference from a user's perspective. Specifically, for a given application submitted by a user, we examine how this application will behave with the existing workload running in the system under different job placement policies. Our simulation study considers hundreds of experiment configurations, including four target applications with representative communication patterns under a variety of network traffic conditions. Our study shows that intra-job interference can cause severe performance degradation for communication-intensive applications. Inter-job interference can generally be reduced for applications with one-to-one or one-to-many communication patterns through job isolation. Applications with one-to-all communication patterns are resilient to network interference.

SESSION: Session on Modeling Methodology

From Effects to Causes: Reversible Simulation and Reverse Exploration of Microscopic Traffic Models

We propose an approach for reverse-in-time exploration of the state space of microscopic traffic simulations starting from a user-specified class of outcomes. As a basis for our approach, we present a reversible execution scheme applicable to common car-following and lane-changing models from the traffic simulation literature. The execution scheme permits perfect reversal of a previous forward simulation, which to our knowledge has not been attempted previously in the context of established traffic simulation models. Further, we perform reverse state space explorations directly from user-specified simulation states, i.e., reverse-in-time model checking. By exploring all sequences of possible previous states from a final state, reachability questions can be answered more conclusively than purely through forward simulations. In a case study, reverse exploration is used to identify conditions that lead to specified accident situations, with running time reductions by factors of more than 20 compared to traditional forward exploration.
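
The reversal idea can be sketched with a toy car-following law: the forward step logs the applied acceleration, and the reverse step replays it. The model below is illustrative, not one of the published models the paper treats.

```python
# A minimal sketch of reversible execution for car-following: the forward
# step records each applied acceleration, and the reverse step replays it
# to restore the prior state. The "law" below is a toy, not a published
# model, and the paper's integration scheme may differ.
def accel(gap, v, v_lead, v_max=30.0, t_headway=1.5):
    desired_gap = 2.0 + v * t_headway
    free = min(1.5, (v_max - v) * 0.1)
    brake = 2.0 * max(0.0, (desired_gap - gap) / desired_gap)
    return free - brake

def forward(state, dt, log):
    x, v, x_lead, v_lead = state
    a = accel(x_lead - x, v, v_lead)
    log.append(a)                          # retain exactly what was applied
    return (x + v * dt, v + a * dt, x_lead + v_lead * dt, v_lead)

def reverse(state, dt, log):
    x, v, x_lead, v_lead = state
    a = log.pop()
    v_prev = v - a * dt
    return (x - v_prev * dt, v_prev, x_lead - v_lead * dt, v_lead)

s, log = (0.0, 20.0, 50.0, 18.0), []
for _ in range(100):
    s = forward(s, 0.1, log)
for _ in range(100):
    s = reverse(s, 0.1, log)
print(s)   # initial state, up to floating-point rounding; bitwise-perfect
           # reversal would log the rounded state deltas instead
```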

Capturing and Reporting Provenance Information of Simulation Studies Based on an Artifact-Based Workflow Approach

Provenance comprises information about how a product has been generated in a process. Thus, provenance information about an entire simulation study would support the interpretation and reuse of the developed simulation model and simulation experiments. However, current approaches only support capturing parts of the provenance information of a simulation study, i.e., the provenance information of the simulation data generated by individual simulation experiments. In this work, we extend a declarative, artifact-based workflow to capture provenance information about an entire simulation study by observing the user during the study process. The workflow relates the building processes of the central products of a simulation study, such as the conceptual model, requirements, input data, simulation model, and simulation experiments. Additionally, the workflow guides the modeler through the simulation study process while ensuring consistency between its products. Further, we develop different strategies to report the captured provenance information. These enable the user to understand the simulation study at different levels of abstraction.

Using Scientific Visualization Techniques to Visualize Parallel Network Simulations

Although parallel discrete event simulation has been used to simulate and study the performance of various network topologies, little effort has been spent on visualizing the time series data that result from these simulations. Visualization can be useful in multiple aspects of simulation, from debugging and validating models to gaining deeper insights from the data. In this paper, we present our preliminary work in developing 3-dimensional animations of data from optimistic parallel discrete event simulations. The visualizations are developed by using VTK and ParaView, and examples are shown on fat-tree and dragonfly network models using the ROSS simulator. We also discuss our plans for future work with the visualizations and their integration into an in situ analysis and visualization system being currently developed.
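
Exporting simulation snapshots in the legacy VTK format that ParaView reads can be sketched as follows; the point layout and field name are assumptions, not the paper's pipeline.

```python
# A minimal sketch of writing per-timestep network data as legacy VTK
# poly-data files that ParaView can animate; the router layout and the
# "occupancy" field are illustrative assumptions.
def write_vtk(path, points, scalars, name="occupancy"):
    with open(path, "w") as f:
        f.write("# vtk DataFile Version 3.0\n")
        f.write("network snapshot\nASCII\nDATASET POLYDATA\n")
        f.write(f"POINTS {len(points)} float\n")
        for x, y, z in points:
            f.write(f"{x} {y} {z}\n")
        f.write(f"POINT_DATA {len(points)}\n")
        f.write(f"SCALARS {name} float 1\nLOOKUP_TABLE default\n")
        for s in scalars:
            f.write(f"{s}\n")

# One file per timestep; ParaView groups snapshot_000.vtk,
# snapshot_001.vtk, ... into a time series automatically.
for t in range(3):
    pts = [(i, 0.0, 0.0) for i in range(4)]    # stand-in router positions
    write_vtk(f"snapshot_{t:03d}.vtk", pts, [t * 0.1 * i for i in range(4)])
```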

Proactive Service Recovery in Emergency Departments: A Hybrid Modelling Approach using Forecasting and Real-Time Simulation

This work in progress is an application of a hybrid modelling (HM) approach for short-term decision support in urgent and emergency healthcare. It uses seasonal ARIMA time-series forecasting to predict emergency department (ED) overcrowding in a near-future moving window (1-4 hours) using data downloaded from a digital platform (NHSquicker). NHSquicker delivers near real-time wait times from multiple centres of urgent care in the South-West of England. Alongside historical distributions, this near real-time data is used to populate an ED discrete event simulation model. The ARIMA forecasts trigger real-time simulation experimentation of ED scenarios including proactive diversion of low-acuity patients to alternative facilities in the urgent care network.
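
The forecast-triggered loop can be sketched with statsmodels' seasonal ARIMA; the model order, threshold, and field semantics below are assumptions, not NHSquicker's configuration.

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# A minimal sketch of forecast-triggered experimentation: a seasonal ARIMA
# model forecasts ED occupancy 1-4 hours ahead, and breaching a threshold
# triggers simulation of proactive-diversion scenarios. The order,
# threshold, and series semantics are illustrative assumptions.
def forecast_occupancy(history: pd.Series, horizon=4):
    model = ARIMA(history, order=(1, 0, 1),
                  seasonal_order=(1, 1, 1, 24))   # assumed 24-hour season
    return model.fit().forecast(steps=horizon)

def maybe_trigger_simulation(history: pd.Series, threshold=50):
    forecast = forecast_occupancy(history)
    if (forecast > threshold).any():
        # Here the real system would launch DES runs of diversion scenarios.
        return "run diversion scenarios", forecast
    return "no action", forecast
```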