Vertical and horizontal integration of various systems in production companies is a logical trend. MES systems play an irreplaceable role in bridging the gap between enterprise information systems (ERP) and production automation (PLC) systems. While enterprise information systems are centralized, production controllers are inherently distributed. Distributed MES systems try to reconcile these two worlds. They are based on the principle of multi-agent systems, which accommodates the different nature of the systems on either side.

Conventional centralized systems (especially planning systems and centralized control systems) face the new demands of a modern manufacturing environment. These include:

  • Unpredictable order development - changes in production orders that are already in progress.
  • Changes in the production environment - changes in the workshop environment also occur during the execution of orders.
  • Complexity of production environment and orders - the modern production environment is characterized by increasingly complex orders and a high degree of variability in the arrangement and setup of production facilities.

Due to their hierarchical nature, centralized systems are considered highly static, with a low degree of adaptability to increasingly dynamic changes in orders and manufacturing environments. The decision-making process is concentrated in the top layers of the imaginary pyramid of the company's information systems (especially in ERP systems and related tools). This makes it difficult for production planning to respond to dynamic changes and to exceptions emerging at the lowest levels of the manufacturing environment.

Increasing flexibility in production can be achieved by applying two main strategies:

  • Moving some decision-making processes from enterprise systems (ERP) to a lower layer of control information systems (MES), which is characterized by shorter planning cycles and faster response to change.
  • Distributing the decision-making process to a set of independently functioning entities that are able to carry out and optimize the process through mutual cooperation, autonomy, and the availability of local information and resources.

To implement these principles, the Distributed Control Systems (DCS) paradigm has been defined. The basic idea of DCS is to distribute decision-making and system functionality to independently functioning entities called "holons" or "agents".

Figure: Distributed control systems difference

So far, it seems unrealistic that current ERP systems will operate on the distribution paradigm in the near future. MES systems are therefore the main platform for implementing this principle. However, these are usually based on the principle of centralization. The next sections describe how typical MES functions can be distributed. First, let's look at one particular architecture to describe the individual entities used to distribute MES functions.

Distributed system elements

Take the PABADIS concept as an example of a distributed architecture. It was developed under the auspices of the European Union (FP6) with the participation of SAP, Siemens, the Austrian Academy of Sciences, Fiat and others.
PABADIS delivers complete vertical integration of enterprise ERP, MES and automation systems based on the distributed systems paradigm.

There are three basic entities on the ERP interface side:

  • Order Agent Supervisor (OAS) - manages production orders sent from the ERP system to the MES system, where they are processed by Order Agents (OA).
  • Resource Agent Supervisor (RAS) - a direct interface between the ERP system and the Resource Agents (RA). It allows ERP to influence resource utilization.
  • Product Data Repository (PDR) - used by ERP and OA to exchange production data. PDR provides the data (operation codebooks, material codebooks…) that OA agents need to execute a production order.

The core of the MES system consists of:

  • Order Agents (OA) - representing the production order.

On the interface to the automation systems are:

  • Resource Agents (RA) - representing the production resource.

Auxiliary tools for MES are:

  • Ability Broker (AB) - manages the database of RA resources and provides this information to OAs.
  • Information Collector (IC) - manages historical production order execution and resource utilization data. Data can be used by internal entities (RAS…) or external systems.
  • Device Observer (DO) - an auxiliary agent that searches for and registers new resources in MES. It establishes communication with the newly integrated resource, informs other agents of the existence of the new resource, and initiates a process allowing communication between the control device and its RA.

OA and RA are responsible for planning the execution of production orders, including securing production resources. The decision-making process is carried out by a group of OAs that work independently but coordinate their actions and decisions according to the production orders they execute and a set of rules that ensures these agents act cooperatively rather than selfishly in their decisions.
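
The interplay between the Device Observer, the Ability Broker and the Resource Agents can be illustrated with a minimal sketch. It is not taken from PABADIS itself: the class names mirror the entities above, but every method name, field and example device is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceAgent:
    """Hypothetical stand-in for an RA: one production resource."""
    name: str
    abilities: frozenset


class AbilityBroker:
    """Hypothetical AB: keeps the RA database and answers OA lookups."""

    def __init__(self):
        self._by_ability = {}  # ability name -> list of ResourceAgent

    def register(self, ra):
        for ability in ra.abilities:
            self._by_ability.setdefault(ability, []).append(ra)

    def find(self, ability):
        # Return a copy so callers cannot modify the broker's database.
        return list(self._by_ability.get(ability, []))


class DeviceObserver:
    """Hypothetical DO: wraps a newly found device in an RA and announces it."""

    def __init__(self, broker):
        self.broker = broker

    def on_new_device(self, name, abilities):
        ra = ResourceAgent(name, frozenset(abilities))
        self.broker.register(ra)
        return ra


broker = AbilityBroker()
observer = DeviceObserver(broker)
observer.on_new_device("mill-1", ["milling", "drilling"])
observer.on_new_device("drill-1", ["drilling"])

# An OA asks for a capability, not for a specific machine.
drilling_names = [ra.name for ra in broker.find("drilling")]
```

Because an OA only ever asks the broker for an ability, a machine added later by the observer becomes available to new orders without any change on the order side.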

Figure: MES architecture in a distributed system

Life cycle of a production order in a distributed system

The distributed approach also affects the life cycle of a production order. Let's look at the steps that make up this life cycle:

  • ERP sends the production order to OAS.
  • OAS decomposes the order, creates an OA and assigns it the relevant production order segment.
  • OA receives production data from PDR.
  • OA asks AB for available resources (RAs).
  • OA schedules the operation by reserving one RA.
  • RA performs the operation.
  • OA sends a report to OAS after completion of the operation and terminates.
  • OAS forwards the report to ERP.
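
The sequence of messages above can be traced with a small simulation. This is only a sketch under simplifying assumptions: the PDR and AB are reduced to plain dictionaries, the OA simply reserves the first candidate RA, and all identifiers (`run_order`, the order and segment names) are hypothetical.

```python
def run_order(order_id, segments, pdr, ability_broker):
    """Return the message trace and reports for one production order.

    segments: segment names obtained by decomposing the order (hypothetical).
    pdr: dict mapping segment name -> required ability (stands in for the PDR).
    ability_broker: dict mapping ability -> list of RA names (stands in for AB).
    """
    trace = [f"ERP->OAS order {order_id}"]
    reports = []
    for segment in segments:  # OAS creates one OA per order segment
        oa = f"OA({segment})"
        trace.append(f"OAS->{oa} assign segment")
        ability = pdr[segment]
        trace.append(f"PDR->{oa} data for {ability}")
        candidates = ability_broker.get(ability, [])
        trace.append(f"AB->{oa} candidates {candidates}")
        if not candidates:
            reports.append((segment, "no resource"))
            continue
        ra = candidates[0]  # simplest possible choice; real planning negotiates
        trace.append(f"{oa}->{ra} reserve")
        trace.append(f"{ra} executes {ability}")
        trace.append(f"{oa}->OAS report done")
        reports.append((segment, "done"))
    trace.append(f"OAS->ERP report {reports}")
    return trace, reports


trace, reports = run_order(
    "PO-1",
    ["cut", "paint"],
    pdr={"cut": "cutting", "paint": "painting"},
    ability_broker={"cutting": ["RA-saw"], "painting": []},
)
for line in trace:
    print(line)
```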

Planning in a distributed system

Planning is an important step in the life cycle of a production order. Planning in distributed systems consists of resource-oriented planning and order-oriented rescheduling.

In the first stage, OA determines the time frame for executing the assigned production order segment. In the next phase, OA asks AB for resources with the necessary capabilities. Production order operations are not tied to specific machines, but only refer to the required resource capabilities (resource types). This increases system flexibility in order rescheduling. The OA receives the address of an RA that is able to execute the given segment of the order and sends this RA the time frame within which the segment should be carried out.

In the next phase, the selected (leading) RA contacts other RAs that have the same capabilities and asks for their availability within the required time frame. Available RAs are assigned to a newly created cluster. After the cluster is created, the search for a quasi-optimal solution begins. Individual RAs create solution proposals from which the leading RA chooses. RAs use an evaluation function that assesses the availability and cost of resources as well as the length of downtime and machine runtime. The aim is to create an optimal solution for a given resource with regard to efficient use of all resources.

After the leading RA receives all solution proposals from the individual cluster members, it selects one of them, based on the parameters given by OA or set globally for the entire operation, and sends it to the OA. OA's task is then to evaluate the selected solution and decide whether to accept or reject it. When the solution is accepted, OA allocates the necessary resources. If the solution is rejected, there are two options for the next steps:

  1. OA will restart the entire process of selecting solutions. That is, it will again ask AB for resources with the necessary capabilities, and so on. Due to the dynamics of production operations and order flow, the outcome of a new selection may differ from the previous one.
  2. OA will ask the cluster for a new solution that meets more OA requirements, but is less profitable in terms of resource utilization. This mechanism depends on the system configuration - the balance between optimizing production operations (resource allocation) and optimizing the flow of production orders.
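
The negotiation between the OA and the RA cluster, including the second rejection option, might look roughly as follows. The weighted sum used as the evaluation function, the proposal fields and the acceptance rule are all assumptions made for this sketch; the text above only states which factors the evaluation considers, not how they are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    ra: str      # name of the RA that made the proposal
    start: int   # proposed start time (arbitrary time units)
    cost: float  # resource cost of running the operation
    idle: int    # downtime the slot would leave on the machine


def resource_score(p, w_cost=1.0, w_idle=0.5):
    """Hypothetical evaluation function of the cluster: lower is better."""
    return w_cost * p.cost + w_idle * p.idle


def select(proposals, window, key=resource_score):
    """Leading RA: keep proposals inside the OA's time frame, pick the best."""
    earliest, latest = window
    feasible = [p for p in proposals if earliest <= p.start <= latest]
    return min(feasible, key=key) if feasible else None


def oa_accepts(proposal, latest_start):
    """Hypothetical OA rule: accept only if the operation starts early enough."""
    return proposal is not None and proposal.start <= latest_start


proposals = [
    Proposal("RA-1", start=8, cost=10.0, idle=2),
    Proposal("RA-2", start=3, cost=14.0, idle=0),
    Proposal("RA-3", start=12, cost=5.0, idle=0),  # outside the time frame
]
window = (0, 10)

best = select(proposals, window)          # optimized for resource utilization
if not oa_accepts(best, latest_start=5):
    # Option 2: ask the cluster for a solution that favors the order's
    # requirement (early start) over resource utilization.
    retry = select(proposals, window, key=lambda p: p.start)
else:
    retry = best
```

Here the cluster first offers RA-1, the cheapest feasible slot. The OA rejects it because it starts too late, and the second round returns RA-2, which costs more but satisfies the order. Which of the two criteria dominates is exactly the configuration trade-off described in option 2.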

Rescheduling in a distributed system

Rescheduling production orders is an essential element of distributed control systems. In centralized systems, rescheduling occurs when resource outages or order changes occur, and it usually means rescheduling the entire production. In distributed systems, by contrast, rescheduling occurs regularly at certain stages of production and is intended to keep changes local and to reduce the effort required to change the plan.

Decomposition of the production order

The previous sections mentioned the decomposition of the production order. This is possible if the production order can be divided into autonomous and concurrent sub-parts processed by different OAs. The production order structure is a key factor in MES distribution.

Figure: Production order

The basic description of the production order contains information about the product, quantity, date and time of completion, etc. Other elements of the production order are:

  • Process Segments (PS) – The basic building block in the production order description. A PS defines the individual tasks and operations that the system must perform to produce the product. It consists of Abilities, which are predefined and reusable operations or capabilities. These operations have a product-specific or order-specific set of parameters. Furthermore, each PS contains a list of materials (input and output). This makes it possible to decompose the order into a set of sub-jobs that can be executed concurrently, with each sub-job managed by a separate OA.
  • Node Operators (NO) – They represent logical links between individual PSs. There are several types of NO that represent logical operators (Sequence, BranchOr, BranchAnd, JoinOr, JoinAnd). The input and output PSs are also defined in the NO. There can be multiple inputs and multiple outputs, which allows alternative routes when processing a production order. Alternative routes in turn increase the flexibility of the system and its ability to adapt to production downtime.
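
One way to picture this structure is as a small graph of Process Segments linked by Node Operators. The operator type names below come from the list above; the field names, the example order and the reading of BranchOr as "alternative routes" are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    SEQUENCE = "Sequence"
    BRANCH_OR = "BranchOr"
    BRANCH_AND = "BranchAnd"
    JOIN_OR = "JoinOr"
    JOIN_AND = "JoinAnd"


@dataclass
class ProcessSegment:
    name: str
    ability: str                                   # required capability
    parameters: dict = field(default_factory=dict)  # order-specific values
    materials_in: list = field(default_factory=list)
    materials_out: list = field(default_factory=list)


@dataclass
class NodeOperator:
    kind: NodeType
    inputs: list   # names of the preceding PSs
    outputs: list  # names of the following PSs


def successors(operators, finished):
    """Return (operator type, next PS names) for each NO fed by `finished`."""
    return [(no.kind, no.outputs) for no in operators if finished in no.inputs]


# Hypothetical order: cut, then paint on either of two lines, then pack.
segments = {
    "cut": ProcessSegment("cut", "cutting", {"length_mm": 500}, ["sheet"], ["blank"]),
    "paint_a": ProcessSegment("paint_a", "painting", {"color": "red"}, ["blank"], ["part"]),
    "paint_b": ProcessSegment("paint_b", "painting", {"color": "red"}, ["blank"], ["part"]),
    "pack": ProcessSegment("pack", "packing", {}, ["part"], ["parcel"]),
}
operators = [
    NodeOperator(NodeType.BRANCH_OR, ["cut"], ["paint_a", "paint_b"]),
    NodeOperator(NodeType.JOIN_OR, ["paint_a", "paint_b"], ["pack"]),
]

after_cut = successors(operators, "cut")
after_paint = successors(operators, "paint_b")
```

With a BranchOr after cutting, the order can continue on whichever painting line is available, which is the kind of alternative route that helps the system work around a machine that is down.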

Control functions

MES functions include data collection, product tracking, batch genealogy, document management, etc. Some of these activities are performed by OA and RA. However, there are also the centralized OAS (Order Agent Supervisor) and RAS (Resource Agent Supervisor) agents, which manage the activities of the individual agents and, most importantly, provide a link between layers of the imaginary automation pyramid (between the MES layer and the ERP layer). Within this task they carry out two basic groups of activities:

  1. Respond to ERP queries. If the ERP requests some information about the order process or resource performance, the OAS and the RAS ask the relevant agents and forward the response to the ERP.
  2. Periodic reports. In the structure of each production order, control points are defined that trigger a report. These reports are automatically sent by OA and RA to OAS and RAS. The task of these supervising agents is to collect and evaluate the reports and, if necessary, to forward them to the ERP system.
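
Both groups of activities can be sketched for the OAS as follows. The class and method names, the report format and the forwarding rule are hypothetical; the source does not specify how the supervisor decides which reports reach the ERP.

```python
class OrderAgentSupervisor:
    """Hypothetical OAS: answers ERP queries and collects checkpoint reports."""

    def __init__(self, forward_if):
        self.agents = {}          # segment -> callable returning the OA's state
        self.collected = []       # every report received from the OAs
        self.sent_to_erp = []     # reports forwarded to the ERP layer
        self.forward_if = forward_if

    def register(self, segment, ask_agent):
        self.agents[segment] = ask_agent

    def answer_query(self, segment):
        """Activity 1: on an ERP query, ask the relevant agent and relay."""
        ask_agent = self.agents.get(segment)
        return ask_agent() if ask_agent else "unknown"

    def on_report(self, segment, state):
        """Activity 2: collect a control-point report, forward if relevant."""
        report = (segment, state)
        self.collected.append(report)
        if self.forward_if(report):
            self.sent_to_erp.append(report)


# Assumed rule: only final outcomes are interesting to the ERP.
oas = OrderAgentSupervisor(forward_if=lambda r: r[1] in {"done", "failed"})
oas.register("cut", lambda: "running")

oas.on_report("cut", "started")
oas.on_report("cut", "done")
status = oas.answer_query("cut")
```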

Control application located on the product

The PABADIS PROMISE architecture takes the paradigm of distributed systems a step further by linking material and information flows. The link is realized using next-generation RFID chips that contain control and master information about the production order and are attached directly to the product. These RFID chips carry the mobile software agents across which the manufacturing system is distributed. When a mobile software agent stored on an RFID chip arrives at a production site with the product, data is read from it to process the manufacturing operation, to plan the next steps and to move the materials or product. Resource agents, in turn, are stored in the computing units of each resource.

Among other things, the new architecture should bring benefits such as greater ERP independence, network failure tolerance, synchronization of material and information flows, autonomous communication between agents without central system intervention, etc.

However, these benefits are not guaranteed. There are circumstances in which they will not be achieved. Let's look at some of them:

  • ERP independence - In an ideal scenario, the mobile agent should contain all the information and logic needed to control production operations and material flow. If there are no changes, the agent does not need to communicate with the central system during the execution of the order. However, one of the main benefits of distributed systems is the response to ongoing order changes, which necessitates more frequent communication between the central system and the mobile agent. Real-time tracking of the progress of the order is another reason for communication. The mobile agent is thus dependent to a certain extent on the central system even while the order is being executed.
  • Resistance to network outages - Because it contains complete control information, the mobile agent can theoretically work even if connectivity is lost. However, distributed systems operate on the principle of continuous communication between agents. Therefore, at least connectivity to the local production network must be maintained.
  • Faster communication - The direct transfer of control data between the mobile agent and the machine could theoretically be faster than transfer from a central server. However, current RFID networks are significantly slower than local area networks (e.g. Ethernet), which can be used to transfer control data from the central system to the machine.

But there are other areas that the PABADIS PROMISE architecture has to deal with:

  • Security - It is difficult to ensure the integrity of data and code on RFID chips when materials are transported (for example, on a ship, on an aircraft, ...). Even if data is redundantly stored, loss or modification of the RFID chip can compromise overall data integrity.
  • Redundant data storage - If the RFID tag carries control data and code in addition to the identification data, the data must also be redundantly stored in the central system. This brings additional overhead and risk of conflict.
  • Debugging - Identifying and resolving issues is more challenging in a physically distributed and constantly changing environment.
  • Control - From a business perspective, process traceability is required. A highly distributed system that uses mobile software agents makes the audit process much more difficult.
  • Cost - Mobile software agents require high-capacity RFID chips. At present, the price of such chips rules out their use in cheap, mass-produced products.

Application of the system in real life

Systems for decentralized production control based on distributed software agents can be used for products with a high degree of variability in smaller production runs. A good example is the automotive industry, where the final product has a high price and a high degree of customization. Other possible areas of application of distributed control include the furniture industry, automotive subcontractors, aerospace accessories, and chemical and food production.

Practical applications of the PABADIS architecture are reported, for example, by Rittal and Hatzopoulos. The first is a German manufacturer from Herborn specializing in industrial cabinets. The second, Hatzopoulos, is engaged in flexible production of packaging materials for food companies.


Miroslav Patočka

MES PHARIS Analyst

UNIS, a.s. | Control & Information Systems