This paper presents an approach for optimizing the robustness of production and logistics systems based on deep generative models, a class of deep learning methods. Robustness here refers to setting the controllable factors of a system in such a way that variance in the uncontrollable factors (noise) has a minimal effect on given output parameters. In a case study, the proposed method is tested and compared to a traditional method for robustness analysis. The basic idea is to use deep neural networks to generate data for experiment plans and to rate these plans using a simulation model of the production system. We propose to use two Generative Adversarial Networks (GANs) to generate optimized experiment plans for the decision factors and the noise factors, respectively, in a competitive, turn-based game. In one turn, the controllable factors are optimized while the noise remains constant, and vice versa in the next turn. To calculate robustness, the planned experiments are conducted and rated using a simulation model in each learning step.
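To make the turn-based structure concrete, the following minimal sketch alternates between improving the controllable factors and adversarially perturbing the noise plan. Simple random-search updates stand in for the two GAN generators, and `simulate` is a hypothetical placeholder for the production-system simulation model that rates experiment plans.

```python
# Minimal sketch of the alternating, turn-based robustness optimization described
# above. Random-search updates stand in for the GAN generators; `simulate` is a
# placeholder for the production-system simulation model.
import numpy as np

rng = np.random.default_rng(0)

def simulate(control, noise):
    # Hypothetical output of the production system (placeholder dynamics).
    return float(np.sum(np.sin(control) * (1.0 + noise)))

def robustness(control, noise_plan):
    # Lower output variance across the noise plan means higher robustness.
    outputs = [simulate(control, n) for n in noise_plan]
    return -np.var(outputs)

control = rng.normal(size=4)                # controllable (decision) factors
noise_plan = rng.normal(size=(8, 4)) * 0.1  # experiment plan for the noise factors

for turn in range(20):
    if turn % 2 == 0:
        # Control turn: improve robustness while the noise plan stays fixed.
        candidate = control + rng.normal(scale=0.1, size=control.shape)
        if robustness(candidate, noise_plan) > robustness(control, noise_plan):
            control = candidate
    else:
        # Noise turn: adversarially search for noise that hurts robustness most.
        candidate = noise_plan + rng.normal(scale=0.05, size=noise_plan.shape)
        if robustness(control, candidate) < robustness(control, noise_plan):
            noise_plan = candidate

print("final robustness score:", robustness(control, noise_plan))
```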
We introduce ABIDES, an open source Agent-Based Interactive Discrete Event Simulation environment. ABIDES is designed from the ground up to support agent-based research in market applications. While proprietary simulations are available within trading firms, there are no broadly available high-fidelity market simulation environments. ABIDES enables the simulation of tens of thousands of trading agents interacting with an exchange agent to facilitate transactions. It supports configurable pairwise noisy network latency between each individual agent as well as the exchange. Our simulator's message-based design is modeled after NASDAQ's published equity trading protocols ITCH and OUCH. We introduce the design of the simulator and illustrate its use and configuration with sample code, validating the environment with example trading scenarios. The utility of ABIDES for financial research is illustrated through experiments to develop a market impact model. The core of ABIDES is a general-purpose discrete event simulation, and we demonstrate its breadth of application with a non-finance work-in-progress simulating secure multiparty federated learning. We close with discussion of additional experimental problems it can be, or is being, used to explore, such as the development of machine learning trading algorithms. We hope that the availability of such a platform will facilitate research in this important area.
A rollback operation in a speculative parallel discrete event simulator has traditionally targeted the perfect reconstruction of the state to be restored after a timestamp-order violation. This requires the rollback support to provide specific capabilities and consequently to incur the associated costs. In this article we propose approximated rollbacks, which allow a simulation object to perfectly realign its virtual time to the timestamp of the state to be restored, but lead the reconstructed state to be an approximation of what it should really be. The advantage is a significant reduction of the cost of the state restore task in a rollback phase, as well as of the activities (i.e., state saving) that actually enable rollbacks to be executed. Our proposal is suited for stochastic simulations and explores a tradeoff between the statistical representativeness of the outcome of the simulation run and the execution performance. We provide mechanisms that enable the application programmer to control this tradeoff, as well as simulation-platform-level mechanisms that constitute the basis for managing approximated rollbacks in general simulation scenarios. A study of the aforementioned tradeoff is also presented.
The paper introduces a novel state-rollback mechanism named approximated rollbacks for speculative parallel discrete event simulators. The artifact of the paper is available online and is properly documented. It contains a simulation framework comprising a parallel discrete event simulator called ROOT-Sim and a set of Application Programming Interfaces (APIs) for approximated rollbacks. This simulation framework can be employed in a wide range of discrete-event-based simulation scenarios. The experimental results were successfully replicated. Therefore, I assign the functional, reusable, available, and results replicated badges to this paper.
Traditional parallel discrete event simulation (PDES) systems treat each simulation thread in the same manner, regardless of whether a thread has events to process in its input queue or not. At the same time, many real-life simulation models exhibit significant execution locality, where only part of the model (and thus a subset of threads) is actively sending or receiving messages in a given time period. These inactive threads still continuously check their queues and participate in simulation-wide time synchronization mechanisms, such as computing Global Virtual Time (GVT). This wastes resources, ties up CPU cores with threads that contribute nothing to event processing, and limits the performance and scalability of the simulation.
In this paper, we propose a new paradigm for managing PDES threads that we call Demand-Driven PDES (DD-PDES). The key idea behind DD-PDES is to identify threads that have no events to process and de-schedule them from the CPU until they receive a message requiring event processing. Furthermore, these inactive threads are also excluded from participation in the GVT computation, accelerating that process as a result. DD-PDES ensures that the CPU cycles are mostly spent on actual event processing, resulting in performance improvements. This architecture allows for significant over-subscription of threads by exceeding the number of available hardware thread contexts on the chip. We demonstrate that on a Knights Landing processor, DD-PDES significantly outperforms the traditional simulation equipped with the best currently proposed GVT algorithms.
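The demand-driven idea can be illustrated with a small, hedged sketch that is not the authors' implementation: each logical process blocks on a condition variable when its event queue is empty, so it consumes no CPU until a message arrives, and inactive processes are skipped by a toy GVT computation.

```python
# Illustrative sketch of demand-driven de-scheduling: a worker sleeps on a
# condition variable while its queue is empty and is excluded from the (toy) GVT.
import heapq
import threading

class LogicalProcess:
    def __init__(self, name):
        self.name = name
        self.queue = []                       # (timestamp, event) min-heap
        self.cv = threading.Condition()
        self.active = False

    def enqueue(self, timestamp, event):
        with self.cv:
            heapq.heappush(self.queue, (timestamp, event))
            self.active = True
            self.cv.notify()                  # re-schedule the de-scheduled worker

    def run(self, stop):
        while not stop.is_set():
            with self.cv:
                while not self.queue and not stop.is_set():
                    self.active = False       # excluded from GVT while inactive
                    self.cv.wait(timeout=0.1)
                if not self.queue:
                    continue
                ts, event = heapq.heappop(self.queue)
            print(f"{self.name} processed '{event}' at t={ts}")

def toy_gvt(lps):
    # Only active LPs contribute their minimum pending timestamp.
    times = []
    for lp in lps:
        with lp.cv:
            if lp.active and lp.queue:
                times.append(lp.queue[0][0])
    return min(times, default=float("inf"))

if __name__ == "__main__":
    stop = threading.Event()
    lps = [LogicalProcess(f"LP{i}") for i in range(4)]
    threads = [threading.Thread(target=lp.run, args=(stop,)) for lp in lps]
    for t in threads:
        t.start()
    lps[0].enqueue(5.0, "arrival")
    lps[2].enqueue(3.0, "departure")
    print("GVT estimate:", toy_gvt(lps))
    stop.set()
    for t in threads:
        t.join()
```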
Reducing the waste of resource usage (e.g., CPU cycles) when a causality error occurs in speculative parallel discrete event simulation (PDES) is still a core objective. In this article, we target this objective in the context of speculative PDES run on top of shared-memory machines. We propose an Operating System approach based on the exploitation of the Inter-Processor Interrupt (IPI) facility offered by off-the-shelf hardware chipsets, which enables cross-CPU-core control of the execution flow of threads. As soon as a thread T produces a new event placed in the past virtual time of a simulation object currently run by another thread T', our IPI-based support allows T to change the execution flow of T'---with very minimal delay---so as to enable the early squash of the currently processed (and no longer consistent) event. Our solution is fully transparent to the application-level code and is coupled with a lightweight heuristic-based mechanism that determines whether it is actually worthwhile to interrupt thread T' via the IPI (rather than skipping the IPI send), depending on the expected residual execution time of the incorrect event being processed. We integrated our proposal within the speculative open-source USE (Ultimate Share Everything) PDES package, and we report experimental results obtained by running various PDES models on top of two shared-memory hardware architectures equipped with 32 and 24 (48 hyper-threads) CPU cores, which demonstrate the effectiveness of our proposal.
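The heuristic decision of whether to deliver the interrupt can be pictured with a short sketch; the constants and the `should_send_ipi` helper are illustrative assumptions, not the paper's actual mechanism.

```python
# A hedged sketch of the cost/benefit check described above: interrupt the victim
# thread only if the expected residual processing time of the doomed event exceeds
# the cost of delivering the interrupt. All constants are illustrative only.
def should_send_ipi(elapsed_ns, avg_event_ns, ipi_cost_ns=2_000):
    """Return True if squashing the event early is expected to save CPU time."""
    residual_ns = max(avg_event_ns - elapsed_ns, 0)
    return residual_ns > ipi_cost_ns

# Example: an event class that takes ~50us on average, already running for 10us.
print(should_send_ipi(elapsed_ns=10_000, avg_event_ns=50_000))  # True -> interrupt
```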
The increasing complexity and heterogeneity of systems at large scale, combined with the challenging characteristics of data-driven applications dominated by adaptivity and irregularity, pose a need for fundamental rethinking and retooling of modeling and simulation (ModSim) for systems and applications. ModSim as a science and practice will be discussed from the perspectives of methods and tools and its myriad uses, such as system-application co-design, performance prediction, or system and application optimization. To achieve this demanding goal, the presentation will initially offer an analysis and critique of traditional methodologies and their state of the art. Attention will then focus on new ideas related to machine learning, both as an increasingly important application workload and as a method for ModSim. The context of mapping these applications to leading-edge systems will include analysis and the need for "dynamic performance modeling" as an actionable way to optimize effectively for performance during execution. Throughout, particular emphasis will be placed on methods and practices that are practical, accurate, and applicable to extreme-scale computing (as broadly defined).
State fast-forwarding has been proposed as a method to reduce the computational cost of microscopic traffic simulations while retaining per-vehicle trajectories. However, since fast-forwarding relies on vehicles isolated on the road, its benefits extend only to situations of sparse traffic. In this paper, we propose fast-forwarding of vehicle clusters by training artificial neural networks to capture the interactions between vehicles across multiple simulation time steps. We explore various configurations of neural networks in light of the trade-off between accuracy and performance. Measurements in road network simulations demonstrate that cluster fast-forwarding can substantially outperform both time-driven state updates and single-vehicle fast-forwarding, while introducing only a small deviation in travel times.
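A structural sketch of the idea, under assumed interfaces that are not the paper's code, is shown below: a small MLP maps the state of a vehicle cluster directly to its state several steps ahead, replacing that many time-driven updates, with a fallback to per-step updates whenever the cluster interacts with surrounding traffic. The weights here are random placeholders for a trained model.

```python
# Structural sketch of cluster fast-forwarding: one MLP forward pass replaces
# K_STEPS detailed updates for an isolated cluster. Weights are placeholders.
import numpy as np

rng = np.random.default_rng(1)
K_STEPS = 10          # how many simulation steps one forward pass skips
STATE_DIM = 8         # e.g., positions and speeds of the vehicles in the cluster
HIDDEN = 32

W1, b1 = rng.normal(size=(STATE_DIM, HIDDEN)) * 0.1, np.zeros(HIDDEN)
W2, b2 = rng.normal(size=(HIDDEN, STATE_DIM)) * 0.1, np.zeros(STATE_DIM)

def nn_fast_forward(cluster_state):
    """Predict the cluster state K_STEPS ahead with one MLP forward pass."""
    h = np.tanh(cluster_state @ W1 + b1)
    return cluster_state + h @ W2 + b2       # residual form: predict the delta

def time_driven_update(cluster_state):
    return cluster_state                      # placeholder for the detailed model

def advance(cluster_state, interacting_with_others):
    # Fall back to per-step updates whenever the cluster interacts with traffic
    # outside it; otherwise skip ahead K_STEPS in a single prediction.
    if interacting_with_others:
        for _ in range(K_STEPS):
            cluster_state = time_driven_update(cluster_state)
        return cluster_state
    return nn_fast_forward(cluster_state)

print(advance(rng.normal(size=STATE_DIM), interacting_with_others=False))
```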
Cellular Automata have been used on many occasions to model the spread of the Human Immunodeficiency Virus (HIV) within a human body. This is in part due to the relative simplicity of crafting their rules and the convenience of visualizing disease dynamics in 2D. Although such models appeared in 2001 and have been extended in dozens of studies, their potential to serve as a virtual laboratory has been limited by their computationally intensive nature. So far, they have been used to simulate at most 0.5 million cells instead of the billion cells that may harbor the virus. Simulating too few cells is a key issue for calibration (the 'small' models are calibrated based on results observed in a much larger space), prevents us from using a sufficient proportion of cells to model latent HIV reservoirs (in which HIV can hide for years), and prohibits even more computationally intensive aspects such as tracking mutations (which is essential to assess drug resistance). In short, the low number of cells prevents these models from answering many of the questions that would make them useful as virtual laboratories. Although the models may be scaled by running on clusters, this is not always an option since interdisciplinary research in discrete models of HIV often takes place on the lab's computer, and patients for whom we seek to provide virtual laboratories may have limited access to computational resources. Given these constraints, we demonstrate how to optimize simulations of HIV on a workstation by combining features such as just-in-time compilation, parallelism at the level of threads, pseudo random number generators, and simplified handling of neighbors in a cellular automaton. Our results demonstrate that, within 10 minutes, we can finish a simulation run for 6.7 billion cells instead of 60,000 cells in an unoptimized simulation.
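A minimal sketch of the kind of optimization discussed, combining just-in-time compilation with thread-level parallelism via Numba, is given below; the cell states and infection rule are toy placeholders, not the calibrated HIV model.

```python
# JIT-compiled, thread-parallel cellular-automaton update over a Moore
# neighborhood. The rule is a toy stand-in for the HIV model's rules.
import numpy as np
from numba import njit, prange

HEALTHY, INFECTED, DEAD = 0, 1, 2

@njit(parallel=True, cache=True)
def step(grid, out):
    n, m = grid.shape
    for i in prange(1, n - 1):                # threads split the rows
        for j in range(1, m - 1):
            infected_neighbors = 0
            for di in range(-1, 2):
                for dj in range(-1, 2):
                    if (di != 0 or dj != 0) and grid[i + di, j + dj] == INFECTED:
                        infected_neighbors += 1
            if grid[i, j] == HEALTHY and infected_neighbors >= 1:
                out[i, j] = INFECTED          # toy infection rule
            else:
                out[i, j] = grid[i, j]

grid = np.zeros((1024, 1024), dtype=np.uint8)
grid[512, 512] = INFECTED
out = grid.copy()
step(grid, out)                               # first call pays the JIT cost
```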
The paper whose reproducibility is assessed in this report proposes a methodology aimed at improving the performance of Cellular Automata-based HIV models. The artifact is available online together with instructions to replicate the results, which were added by the authors upon request. The author of this report confidently assigns the functional, available, and results replicated badges to this paper.
Speeding up the simulation of discrete-event wafer-fabrication (wafer-fab) models is essential because optimizing scheduling and dispatching policies under various circumstances requires repeated evaluation of decision candidates during parameter-space exploration. In this paper, we present a runtime abstraction-level conversion approach for discrete-event wafer-fab models to gain simulation speedup. During the simulation, if a machine group of the wafer-fab model reaches a steady state, the proposed approach substitutes this group model with a mean-delay model (MDM) as a higher-abstraction-level model. The MDM abstracts the detailed operations of the group's sub-component models into an average delay based on queueing modeling, which guarantees acceptable accuracy under steady state. The proposed abstraction-level converter (ALC) observes the queueing parameters of low-level groups to identify the convergence of each group's work-in-progress (WIP) level through a statistical test. When a group's WIP level has converged, the output-to-input couplings between the models are revised to redirect the wafer-lot process flow from the low-level group to its mean-delay model. When the ALC detects a divergence caused by a re-entrant flow or a machine breakdown, the high-level model is switched back to its corresponding low-level group model, and the ALC generates dummy wafer-lot events to reproduce the busyness of the high-level steady state in the restored group. The proposed method was applied to case studies of wafer-fab systems and achieves simulation speedups of 6.1 to 11.8 times, with a corresponding degradation in accuracy of 2.5% to 5.9%.
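The conversion trigger can be sketched under stated assumptions: a machine group's recent WIP samples are split into two halves and compared with a two-sample t-test, and if they are statistically indistinguishable the detailed group is replaced by a mean-delay model whose delay is estimated from the observed cycle times. The test and window sizes are illustrative choices, not necessarily those used by the authors.

```python
# Sketch of steady-state detection and mean-delay substitution (assumed test and
# window sizes, not the paper's exact procedure).
from statistics import mean
from scipy import stats

def wip_converged(wip_window, alpha=0.05):
    half = len(wip_window) // 2
    first, second = wip_window[:half], wip_window[half:]
    _, p_value = stats.ttest_ind(first, second, equal_var=False)
    return p_value > alpha            # no significant drift -> treat as steady state

def mean_delay_model(cycle_time_window):
    avg_delay = mean(cycle_time_window)
    return lambda lot_arrival_time: lot_arrival_time + avg_delay

wip = [41, 43, 40, 42, 44, 42, 41, 43, 42, 42]
cycle_times = [5.1, 4.9, 5.0, 5.2, 5.0]
if wip_converged(wip):
    mdm = mean_delay_model(cycle_times)
    print("lot exits group at t =", mdm(100.0))
```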
While transitioning to exascale systems, it has become clear that power management plays a fundamental role in supporting viable utilization of the underlying hardware, including from a performance perspective. To meet the power restrictions imposed by future exascale supercomputers, runtime environments will be required to enforce self-tuning schemes that run dynamic workloads under an imposed power cap. Results in the literature show that, for a wide class of multi-threaded applications, tuning both the degree of parallelism and the frequency/voltage of cores allows a more effective use of the power budget, compared to techniques that use only one of these mechanisms in isolation. In this paper, we explore the issues associated with applying these techniques to speculative Time Warp-based simulation runtime environments. We discuss how the differences between two antithetical Time Warp-based simulation environments impact the obtained results. Our assessment confirms that the performance gains achieved through a proper allocation of the power budget can be significant. We also identify the research challenges that must be addressed to make this form of self-tuning more broadly applicable.
High performance computing (HPC) systems are large-scale computing systems with thousands of compute nodes. Massive energy consumption is a critical issue for HPC systems. In this paper, we develop an auction mechanism model for reducing energy consumption in an HPC system. Our proposed model includes an optimized resource allocation scheme for HPC jobs based on processor frequency and a Vickrey-Clarke-Groves (VCG)-based forward auction model to enable energy reduction participation by HPC users. The model ensures truthful participation from HPC users, where users benefit from revealing their true valuation of energy reduction. We implement a job scheduler simulator and our mechanism model on a parallel discrete-event simulation engine. Through trace-based simulation, we demonstrate the effectiveness of our auction mechanism model. The simulations show that our model can achieve overall energy reduction for an HPC system while ensuring truthful participation from the users.
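For illustration, the sketch below shows a generic VCG selection and payment computation over users offering energy reductions; the bid format, capacity target, and valuation model are assumptions made for the example, not the paper's exact formulation.

```python
# Generic VCG sketch: select the value-maximizing set of users within a reduction
# target; each winner pays the externality it imposes on the others, which is what
# makes truthful reporting a dominant strategy.
from itertools import combinations

def best_allocation(bids, capacity):
    """bids: {user: (energy_reduction_kwh, reported_value)}. Pick the
    value-maximizing subset whose total reduction fits within `capacity`."""
    best_set, best_value = (), 0.0
    users = list(bids)
    for r in range(len(users) + 1):
        for subset in combinations(users, r):
            reduction = sum(bids[u][0] for u in subset)
            value = sum(bids[u][1] for u in subset)
            if reduction <= capacity and value > best_value:
                best_set, best_value = subset, value
    return set(best_set), best_value

def vcg_payments(bids, capacity):
    winners, _ = best_allocation(bids, capacity)
    payments = {}
    for w in winners:
        others = {u: b for u, b in bids.items() if u != w}
        _, value_without_w = best_allocation(others, capacity)
        value_of_others_with_w = sum(bids[u][1] for u in winners if u != w)
        payments[w] = value_without_w - value_of_others_with_w
    return winners, payments

bids = {"u1": (10, 30.0), "u2": (8, 22.0), "u3": (6, 25.0)}
print(vcg_payments(bids, capacity=16))
```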
During long-term operation of a high-performance computing (HPC) system with thousands of components, many components will inevitably fail. The current trend in HPC interconnect router linkage is moving away from passive copper cables and toward active optical cables. Optical links offer higher maximum bandwidth in a smaller wire gauge, less signal loss, and lower latency over long distances, and they carry no risk of electromagnetic interference from other nearby cables. The benefits of active optical links, however, come with a cost: an increased risk of component failure compared with that of passive copper cables.
One way to increase the resilience of a network is to add redundant links; if one of several links between two routers fails, a single-hop path will still exist between them. But adding redundant links comes at the cost of using more router ports for router-to-router linkage, reducing the maximum size of the network for a fixed router radix. Alternatively, a secondary plane of routers can be added to the interconnect, keeping the number of compute node endpoints the same while giving each node multiple rails of packet injection, at least one per router plane. This multirail-multiplanar type of network interconnect leaves the overall size of the network unchanged but yields a large performance benefit, even with lower-specification hardware, while also increasing the resilience of the network to link failure.
We extend the CODES framework to enable multirail-multiplanar 1D-Dragonfly and Megafly networks and to allow for arbitrary link failure patterns with added dynamic failure-aware routing so that topology resilience can be measured. We use this extension to evaluate two similarly sized 1D-Dragonfly and Megafly networks with and without secondary router planes, and we compare their application communication performance with increasing levels of link failure.
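A hedged sketch of failure-aware rail selection, under assumed data structures rather than the CODES implementation, is shown below: each router plane is an adjacency map, failed links are a set of undirected edges, and a packet is injected on the first rail whose plane still offers a path between its source and destination routers.

```python
# Sketch of failure-aware routing across multiple router planes (rails).
from collections import deque

def shortest_path(adj, failed, src, dst):
    """BFS that skips links listed in `failed` (edges stored as frozensets)."""
    parent, frontier = {src: None}, deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parent and frozenset((node, nxt)) not in failed:
                parent[nxt] = node
                frontier.append(nxt)
    return None

def route(planes, failed_per_plane, src, dst):
    for rail, (adj, failed) in enumerate(zip(planes, failed_per_plane)):
        path = shortest_path(adj, failed, src, dst)
        if path:
            return rail, path
    return None, None                        # unreachable on every plane

plane0 = {0: [1], 1: [0, 2], 2: [1]}
plane1 = {0: [2], 2: [0]}
failures = [{frozenset((1, 2))}, set()]      # plane 0 lost link 1-2
print(route([plane0, plane1], failures, src=0, dst=2))   # falls back to rail 1
```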
The performance of Agent-based Traffic Simulations (ABTS) has been shown to benefit tremendously from offloading to accelerators such as GPUs. In the search for the most suitable hardware platform, reconfigurable hardware is a natural choice. Some recent work considered ABTS on Field-Programmable Gate Arrays (FPGAs), yet implemented only simplified cellular automaton-based models. The recent introduction of support for high-level synthesis from C, C++, and OpenCL in FPGA tool chains allows FPGA designs to be expressed in a form familiar to software developers. However, the performance achievable with this approach in a simulation context is not well understood. In this work, to the best of our knowledge, we present the first FPGA-accelerated ABTS based on widely accepted microscopic traffic simulation models, and the first to be generated from high-level code. The achieved speedup of up to 24.3 times over a sequential CPU-based execution indicates that recent FPGA toolchains allow simulationists to unlock the performance benefits of reconfigurable hardware without expressing the simulation models in low-level hardware description languages.
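For concreteness, the sketch below expresses a widely used microscopic car-following model (the Intelligent Driver Model) as a per-vehicle update; the FPGA work targets models of this kind, though the specific model and parameter values here are illustrative rather than taken from the paper.

```python
# Intelligent Driver Model (IDM) update as an example of a microscopic
# car-following model; parameters are typical textbook values.
import math

def idm_acceleration(v, v_lead, gap, v0=33.3, T=1.5, a=1.0, b=1.5, s0=2.0):
    """v, v_lead in m/s; gap in m; returns acceleration in m/s^2."""
    dv = v - v_lead
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a * b)))
    return a * (1.0 - (v / v0) ** 4 - (s_star / gap) ** 2)

def step(position, v, lead_position, v_lead, dt=0.5):
    gap = lead_position - position
    acc = idm_acceleration(v, v_lead, gap)
    v_new = max(0.0, v + acc * dt)
    return position + v_new * dt, v_new

print(step(position=0.0, v=25.0, lead_position=60.0, v_lead=20.0))
```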
Agent-to-agent communication is an important operation in multi-agent systems and their simulation. Given the data-centric nature of agent simulations, direct agent-to-agent communication is generally an operation orthogonal to accessing shared data in the simulation. In distributed multi-agent-system simulations in particular, implementing direct agent-to-agent communication may cause serious performance degradation due to potentially large communication and synchronization overheads. In this paper, we propose an efficient agent-to-agent communication method in the context of optimistic distributed simulation of multi-agent systems. An implementation of the proposed method is demonstrated and quantitatively evaluated through its integration into the PDES-MAS simulation kernel.
Developing crowd evacuation systems is challenging due to the complex, interrelated aspects involved, the diversity of the individuals and environments concerned, and the lack of direct evidence. Evacuation modeling and simulation is used to analyze the possible outcomes as different scenarios unfold, typically when the complexity of the scenario is high. However, incorporating different categories of aspects into a unified modeling space is a challenge. In this paper, we address this challenge by combining individual, social, and technological models of people during evacuation, while grounding all these aspects in a common agent-based modeling framework and a grid-based hypothetical environment. By simulating these models, we provide insight into the effectiveness of several interesting evacuation scenarios. Based on the simulation results, we also give several useful recommendations. The most important recommendation is not to use a potential field indicating the exit dynamics as an exit strategy, particularly in spatially complex environments.
Bike sharing systems are a popular form of sustainable and affordable transport that has been introduced to cities around the world in recent years. Nevertheless, designing these systems to meet the requirements of the operators while also satisfying the demand of the users is a complex problem. In this paper we focus on the recently introduced bike sharing system in the city of Edinburgh and use data analytics combined with formal modelling approaches to investigate the current and possible future behaviour of the system. Specifically, we use a spatio-temporal logic, SSTL (the signal spatio-temporal logic), to formally characterise properties of the captured system, and through this identify potential problems as user demand grows. In order to investigate these problems further, we use the CARMA modelling language and tool suite to construct a stochastic model of the system and investigate possible future scenarios, including decentralised redistribution. This model is parameterised and validated using data from the operational system.
The authors request the following badges for this paper: (1) Artifacts Available, (2) Artifacts Evaluated - Functional, (3) Artifacts Evaluated - Reusable, and (4) Results Replicated. After the review process, all of them were assigned, as the artifact met all the requirements.
The COVID-19 pandemic represents an unprecedented global crisis and serves as a reminder of the social, economic and health burden of infectious diseases. The ongoing trends towards urbanization, global travel, climate change and a generally older and immuno-compromised population continue to make epidemic planning and control challenging. Recent advances in computing, AI, and big data have created new opportunities for realizing the vision of real-time epidemic science.
In this talk I will describe our group's work developing scalable and pervasive computing-based concepts, theories and tools for planning, forecasting and response in the event of epidemics. I will draw on our work in supporting federal agencies as they plan and respond to the COVID-19 pandemic outbreak. I will end the talk by outlining directions for future work.
Succinct, declarative, and domain-specific modeling languages have many advantages when creating simulation models. However, it is often challenging to efficiently execute models defined in such languages. Here, we use code generation to build model-specific simulators. Code generation has been successfully applied to high-performance algorithms in many application domains. By generating tailored simulators for specific simulation models defined in a domain-specific language, we get the best of both worlds: a succinct, declarative, and formal presentation of the model and an efficient execution. We illustrate this based on a simple domain-specific language for biochemical reaction networks as well as on the network representation of the established BioNetGen language. We implement two approaches adopting the same simulation algorithms: a generic simulator that parses models at runtime and a generator that produces a simulator specialized to a given model based on partial evaluation and code generation. Akin to profile-guided optimization, we also use dynamic execution of the model to further optimize the simulators. The performance of the approaches is carefully benchmarked using representative models of small to mid-sized biochemical reaction networks. The generic simulator achieves performance similar to state-of-the-art simulators in the domain, whereas the specialized simulator outperforms established simulation algorithms with a speedup of more than an order of magnitude. Both implementations are available online to the community under a permissive open-source license.
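The generic-versus-generated contrast can be condensed into a short sketch, under assumptions about the model format (a list of mass-action reactions): a tiny generator emits propensity code specialized to one model, which a direct-method SSA loop then uses. This illustrates the kind of specialization that partial evaluation and code generation enable, not the paper's actual implementation.

```python
# Toy code generation for a reaction network: the generator unrolls propensity
# computation for one specific model; a direct-method SSA loop then drives it.
import random

# Toy model: A + B -> C (rate 0.01), C -> A + B (rate 0.1)
MODEL = [({"A": 1, "B": 1}, {"C": 1}, 0.01),
         ({"C": 1},         {"A": 1, "B": 1}, 0.1)]

def generate_propensity_code(model):
    lines = ["def propensities(state):", "    return ["]
    for reactants, _, rate in model:
        terms = " * ".join([str(rate)] + [f"state['{s}']" for s in reactants])
        lines.append(f"        {terms},")
    lines.append("    ]")
    return "\n".join(lines)

namespace = {}
exec(generate_propensity_code(MODEL), namespace)    # compile the specialized code
propensities = namespace["propensities"]

state, t = {"A": 100, "B": 100, "C": 0}, 0.0
while t < 1.0:
    props = propensities(state)
    total = sum(props)
    if total == 0:
        break
    t += random.expovariate(total)                  # SSA time advance
    r = random.choices(range(len(MODEL)), weights=props)[0]
    reactants, products, _ = MODEL[r]
    for s, n in reactants.items():
        state[s] -= n
    for s, n in products.items():
        state[s] = state.get(s, 0) + n
print(t, state)
```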
The authors request the following badges for this paper: (1) Artifacts Available, (2) Artifacts Evaluated - Functional, (3) Artifacts Evaluated - Reusable, and (4) Results Replicated. After the review process, all of them were assigned, as the artifact met all the requirements.
Network emulators enable rapid prototyping and testing of applications. In a typical emulation, the execution order and process execution burst lengths are managed by the host platform's operating system, largely independently of the emulator. Timer-based mechanisms are typically used, but the imprecision of timer firings introduces imprecision in the advancement of time. This leads to statistical variation in behavior that is not due to the model.
We describe Kronos, a small set of modifications to the Linux kernel that use precise instruction-level tracking of process execution and control over the execution order of containers, and so improve the mapping of executed behavior to the advancement of time. This, together with control of the execution and placement of emulated processes in virtual time, makes the behavior of the emulation independent of the CPU resources of the platform which hosts the emulation. Under Kronos, each process has its own virtual clock, which is advanced based on a count of the number of x86 assembly instructions executed by its children. We experimentally show that Kronos is scalable, in the sense that the system behavior is accurately captured even as the size of the emulated system increases relative to fixed emulation resources. We demonstrate the impact of Kronos' time advancement precision by comparing it against emulations that, like Kronos, are embedded in virtual time but, unlike Kronos, rely on Linux timers to control virtual machines and measure their progress in virtual time. We also present two useful applications where Kronos aids in generating high-fidelity emulation results at low hardware cost: (1) analysing protocol performance and (2) enabling analysis of cyber-physical control systems.
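A simplified, user-space sketch of the bookkeeping that instruction-counted virtual time implies is shown below; the class names and per-instruction cost are illustrative assumptions, not Kronos' kernel code: each emulated process owns a virtual clock advanced by the number of instructions it executed, and the process furthest behind in virtual time runs next.

```python
# User-space illustration of instruction-counted virtual time (not kernel code):
# each process' virtual clock advances by instructions executed times a nominal
# per-instruction cost, and the process with the lowest virtual time runs next.
class VirtualProcess:
    def __init__(self, name, ns_per_instruction=1.0):
        self.name = name
        self.virtual_ns = 0.0
        self.ns_per_instruction = ns_per_instruction

    def account(self, instructions_executed):
        self.virtual_ns += instructions_executed * self.ns_per_instruction

def next_to_run(processes):
    # Lowest virtual time runs next, keeping all clocks closely synchronized.
    return min(processes, key=lambda p: p.virtual_ns)

procs = [VirtualProcess("sender"), VirtualProcess("router"), VirtualProcess("receiver")]
for _ in range(6):
    p = next_to_run(procs)
    p.account(instructions_executed=100_000)    # burst length tracked per round
print({p.name: p.virtual_ns for p in procs})
```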
With significantly growing investment in quantum communications, quantum key distribution (QKD), a key application for sharing a secret key between two remote parties, has been deployed in urban areas and even at a continental scale. To meet the design requirements of QKD on a quantum communication network, researchers today extensively conduct simulation-based evaluations in addition to physical experiments for cost efficiency. A practical QKD system must be implemented at large scale via a network, not just between a few pairs of users. Existing discrete-event simulators offer models for QKD hardware and protocols based on sequential execution. In this work, we investigate the parallel simulation of QKD networks for scalability enhancement. Our contributions lie in the exploration of QKD network characteristics that can be leveraged for parallel simulation. We also develop a parallel simulator for QKD networks with an optimized scheme for network partitioning. Experimental results show that, to simulate a 64-node QKD network, our parallel simulator can complete the experiment 9 times faster than a sequential simulator running on the same machine. Our linear-regression-based network partitioning scheme can further accelerate the simulation experiments by up to a factor of two compared with a randomized partitioning scheme.
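The regression-guided partitioning idea can be sketched under assumed features (not the authors' code): per-node features observed in a short profiling run are used to fit a linear model of per-node event load, and nodes are then greedily assigned to partitions so that predicted load is balanced.

```python
# Sketch of regression-guided network partitioning: fit a linear load model from
# profiled features, then greedily balance predicted load across partitions.
import numpy as np

def fit_load_model(features, observed_load):
    # Ordinary least squares: load ~= features @ coef
    coef, *_ = np.linalg.lstsq(features, observed_load, rcond=None)
    return coef

def partition(predicted_load, n_partitions):
    buckets = [[] for _ in range(n_partitions)]
    totals = np.zeros(n_partitions)
    for node in np.argsort(predicted_load)[::-1]:       # heaviest nodes first
        target = int(np.argmin(totals))
        buckets[target].append(int(node))
        totals[target] += predicted_load[node]
    return buckets

rng = np.random.default_rng(7)
features = rng.random((64, 3))            # e.g., key rate, link count, traffic share
observed = features @ np.array([5.0, 2.0, 8.0]) + rng.normal(scale=0.1, size=64)
coef = fit_load_model(features, observed)
buckets = partition(features @ coef, n_partitions=4)
print([len(b) for b in buckets])
```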