A Survey on Technical Methods for Analyzing and Predicting the Reliability of Large-Scale Distributed Systems

Authors: Seyed Ali Sharifi, Seyed Morteza Babamir
Publication date: 2014-12-01
Journal type: Electronic
Journal index: ISC

Abstract

The reliability of a distributed system is the probability that its software operates without failure in a specified environment for a specified period, and it has a major effect on the SDLC (Software Development Life Cycle). Reliability in distributed systems is also a crucial concern for software vendors and end users alike. Software reliability models play an important role in delivering reliable software because they inform key decisions; they are powerful tools for the forecast, oversight, and assessment of software reliability. These models fall into three groups, user-oriented, architecture-based, and state-based, which come into play at the design stage of software development. Large distributed systems are distributed, dynamic, and complex modern computing environments. They incorporate several independent or grouped objects that communicate and cooperate with one another to carry out various operations. In this paper we discuss reliability issues in different computing environments such as Cloud, Grid, and SOA. Defining robust, reliable, and fault-resilient architectures before development actually begins is highly important. The purpose of this paper is to study technical approaches for analyzing and predicting the reliability of large-scale distributed systems. We ask how these models handle the different determinants of reliability and how they introduce their own experimental solutions and models.
The common goal of all these models is to provide a robust environment to end users, either by forecasting reliability in advance at the design phase or by providing a comprehensive failure-resistant system that allows applications to continue operating acceptably when faults occur. We studied these models by examining their reliability as well as the other factors that influence the reliability of the whole system, pointed out the strengths of the present models, and finally outlined a range of models for improving system reliability. With regard to reliability computation, the methods and models can be roughly classified into user-oriented, architecture-based, and state-based models. These models propose different solutions based on their experiments, which rely on historical data from similar applications, on simulation results, or on a test environment built to resemble the real one. Because of the unpredictable state of the Internet, companies require that the reliability of business-scale distributed systems be assessed and forecast continuously. In the early stage of any operating distributed system, factors such as user load, CPU load, and network traffic have significantly less effect on the overall reliability of the system; however, they increasingly call for comprehensive failure-resistant techniques.