Monitoring Flows with Control-M Python Client

You can monitor a flow in Control-M Web, which provides everything you need to monitor and run workflows. If you prefer to monitor your workflows from code, you can use the Control-M Python Client.

Let’s run some workflows that contain jobs with errors and check how they appear in the Monitoring domain. If you need to learn how to create or run flows, see Getting Started with Control-M Python Client.

[1]:
from ctm_python_client.core.workflow import Workflow, WorkflowDefaults
from ctm_python_client.core.comm import Environment
# from ctm_python_client.ext.viz import get_graph
from aapi import *
[2]:
workflow = Workflow(Environment.create_workbench(), WorkflowDefaults(run_as='workbench'))

Running a Workflow

When you run a workflow, the returned object is a RunMonitor, which lets you check the workflow status and query specific information about a Job, such as its output or log.

[5]:
workflow.clear_all()

workflow.chain(
    [
        JobCommand('HelloWorld', 'echo "Hello"'),
        JobCommand('CountFiles', 'ls -l | wc')
    ],
    inpath='TestMonitor'
)

run = workflow.run()

Checking the Status

When you run a workflow, Control-M starts all the jobs. If you print the statuses immediately after you start the run, each status is probably either Executing or Wait Condition. The statuses used in this tutorial are:

  • Executing: The Job or Folder is running.

  • Wait Condition: The Job or Folder is waiting for a condition to be satisfied before it can run. In this example the condition is that the previous job finishes.

  • Ended OK: The Job or Folder executed without returning an error code.

  • Ended Not OK: The Job or Folder executed with an error, or the run failed.

Use .print_statuses() to get a printout of the job statuses. The statuses are indented according to the hierarchy. Note that the status of a Folder or Subfolder is an aggregation of the statuses of its jobs.

[6]:
run.print_statuses()
Run Status
--------------------------------------------------
TestMonitor  ...........................  Wait Condition
    HelloWorld  ........................  Wait Condition
    CountFiles  ........................  Wait Condition

[7]:
import time
time.sleep(4)

# A few seconds should be enough for all jobs to finish
run.print_statuses()
Run Status
--------------------------------------------------
TestMonitor  ...........................  Ended OK
    HelloWorld  ........................  Ended OK
    CountFiles  ........................  Ended OK
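A fixed sleep works for short jobs, but polling until every job reaches a final status is more robust. The following is a minimal sketch, not part of the Control-M Python Client: `wait_until_finished` and `TERMINAL_STATUSES` are hypothetical names, and the helper takes any callable that returns a status string so that it makes no assumption about the client's API.

```python
import time
from typing import Callable, Dict, Iterable

# Statuses from this tutorial that mean a job has stopped running.
TERMINAL_STATUSES = {"Ended OK", "Ended Not OK"}


def wait_until_finished(
    get_status_text: Callable[[str], str],
    names: Iterable[str],
    timeout: float = 60.0,
    interval: float = 1.0,
) -> Dict[str, str]:
    """Poll until every named job is in a terminal status.

    get_status_text is a caller-supplied function that maps a job name
    to its current status string. Raises TimeoutError if some job is
    still not finished when the timeout expires.
    """
    names = list(names)
    deadline = time.monotonic() + timeout
    while True:
        statuses = {name: get_status_text(name) for name in names}
        if all(status in TERMINAL_STATUSES for status in statuses.values()):
            return statuses
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Jobs not finished after {timeout}s: {statuses}")
        time.sleep(interval)
```

With a live run, the first argument might be `lambda name: run.get_status(name).status`. This assumes that the object returned by .get_status() exposes the status as a `status` attribute; the printouts in this tutorial show a status field but not how it is accessed. The timeout matters because a job that stays in Wait Condition never reaches a terminal status.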

Getting the Status of a Specific Job or Folder

To get the status of a specific Job or Folder, use the .get_status() method with the name of the Job or Folder. You can pass either the name alone or the entire path.

[8]:
print(run.get_status('CountFiles'))
# Same as print(run.get_status('TestMonitor/CountFiles'))
{'application': None,
 'ctm': None,
 'cyclic': None,
 'deleted': False,
 'description': None,
 'end_time': 'Jun 14, 2022 11:00:18 AM',
 'estimated_end_time': None,
 'estimated_start_time': None,
 'folder': 'TestMonitor',
 'folder_id': 'workbench:0001g',
 'held': False,
 'host': None,
 'job_id': 'workbench:0001i',
 'job_json': None,
 'log_uri': 'https://localhost:8443/automation-api/run/job/workbench:0001i/log',
 'name': 'CountFiles',
 'number_of_runs': 1,
 'order_date': None,
 'output_uri': 'https://localhost:8443/automation-api/run/job/workbench:0001i/output',
 'start_time': 'Jun 14, 2022 11:00:18 AM',
 'status': 'Ended OK',
 'sub_application': None,
 'type': 'Command'}
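The start_time and end_time fields are returned as text, so computing how long a job ran requires parsing them. The sketch below assumes the timestamp format is always the one shown in the printout above (for example 'Jun 14, 2022 11:00:18 AM'); `TIME_FORMAT` and `run_duration_seconds` are hypothetical names, not part of the client.

```python
from datetime import datetime

# Assumed format, inferred from the printout above: 'Jun 14, 2022 11:00:18 AM'.
# %b and %p are locale-dependent; this works with the default English/C locale.
TIME_FORMAT = "%b %d, %Y %I:%M:%S %p"


def run_duration_seconds(start_time: str, end_time: str) -> float:
    """Return the elapsed time in seconds between two status timestamps."""
    start = datetime.strptime(start_time, TIME_FORMAT)
    end = datetime.strptime(end_time, TIME_FORMAT)
    return (end - start).total_seconds()


# Values copied from the CountFiles status above: the job started and
# ended within the same second.
print(run_duration_seconds("Jun 14, 2022 11:00:18 AM", "Jun 14, 2022 11:00:18 AM"))
```

The timestamps have one-second resolution, so very short jobs such as CountFiles report a duration of zero.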

To get the output of a Job (if an output exists), you can use the .get_output() or .print_output() method of the RunMonitor object. By default, the output includes the executed commands (the lines prefixed with +).

[9]:
output = run.get_output('CountFiles')
print(output)

# same as run.print_output('CountFiles')
+ ls -l
+ wc
     14     121     785
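If you want to use the result in code, you need to separate the echoed command lines from the real output. The sketch below assumes the output is plain text in the form printed above, with echoed commands prefixed by "+ "; `split_traced_output` is a hypothetical helper, and a genuine output line that happens to start with "+ " would be misclassified.

```python
from typing import List, Tuple


def split_traced_output(output: str) -> Tuple[List[str], List[str]]:
    """Separate echoed command lines ('+ ' prefix) from the actual output lines."""
    commands: List[str] = []
    results: List[str] = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("+ "):
            commands.append(line[2:])
        else:
            results.append(line.strip())
    return commands, results


# Sample text copied from the CountFiles output above.
sample = "+ ls -l\n+ wc\n     14     121     785\n"
commands, results = split_traced_output(sample)

# wc prints line, word, and byte counts, in that order.
line_count, word_count, byte_count = (int(value) for value in results[0].split())
print(commands, line_count, word_count, byte_count)
```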

Errors and Troubleshooting

When a job finishes with an error, it does not trigger the next job, which remains in Wait Condition until the previous job ends OK.

[10]:
workflow.clear_all()

workflow.chain(
    [
        JobCommand('Command', 'foo'),
        JobCommand('Bye', 'echo "Finished"'),
    ],
    inpath='TestMonitor'
)

run = workflow.run()
[11]:
time.sleep(2) # Wait until the jobs have been ordered
run.print_statuses()
Run Status
--------------------------------------------------
TestMonitor  ...........................  Executing
    Command  ...........................  Ended Not OK
    Bye  ...............................  Wait Condition

We see that the Job “Command” ended Not OK and therefore the next job cannot run. This is because “Bye” depends on the completion of “Command”. If they were independent jobs, they would both run.

To troubleshoot, you can use the .get_log() and the .get_output() methods.

The log records in detail the actions taken during the job execution. In our example, it does not show the cause of the failure, because the failure is in the task itself.

Note: Bye remains in Wait Condition rather than failing, because there is a way to force the status of a job to OK and continue the workflow. For more information, see the advanced tutorial.

[12]:
print(run.get_log('Command'))
Event Time            Message                                                                   Code

11:01:15 14-Jun-2022  ORDERED JOB:64; DAILY FORCED, ODATE 20220614                              5065
11:01:15 14-Jun-2022  FREED BY USER workbench                                                   5402
11:01:15 14-Jun-2022  SUBMITTED TO workbench                                                    5105
11:01:16 14-Jun-2022  STARTED AT 20220614110115 ON workbench                                    5101
11:01:16 14-Jun-2022  JOB STATE CHANGED TO Executing                                            5120
11:01:16 14-Jun-2022  ENDED AT 20220614110116. OSCOMPSTAT 127. RUNCNT 1                         5100
11:01:16 14-Jun-2022  Message from Agent: /home/workbench/ctm/sysout/CMD.LOG_00001k_00001       5169
11:01:16 14-Jun-2022  ENDED NOTOK. NUMBER OF FAILURES SET TO 1                                  5134
11:01:16 14-Jun-2022  JOB STATE CHANGED TO Analyzed                                             5120
11:01:16 14-Jun-2022  Job STATE CHANGED TO Post processed                                       5120

The log shows only that the job ended Not OK, so let’s check the output.

[13]:
run.print_output('Command')
+ foo
/home/workbench/ctm/runtime/CMD.0000001k_001: line 2: foo: command not found

The output shows the cause: the command that we tried to run does not exist.
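The log printed earlier is consistent with this: its "ENDED AT" line reports OSCOMPSTAT 127, and 127 is the exit status that POSIX shells conventionally return when a command is not found. The sketch below pulls that value out of the log text with a regular expression. It assumes the log is plain text in the form printed above; `extract_oscompstat` is a hypothetical helper, not part of the client.

```python
import re
from typing import Optional

# Sample lines copied from the log of the "Command" job above.
SAMPLE_LOG = """\
11:01:16 14-Jun-2022  JOB STATE CHANGED TO Executing                                            5120
11:01:16 14-Jun-2022  ENDED AT 20220614110116. OSCOMPSTAT 127. RUNCNT 1                         5100
11:01:16 14-Jun-2022  ENDED NOTOK. NUMBER OF FAILURES SET TO 1                                  5134
"""


def extract_oscompstat(log_text: str) -> Optional[int]:
    """Return the last OSCOMPSTAT value found in the log text, or None if absent."""
    matches = re.findall(r"OSCOMPSTAT\s+(-?\d+)", log_text)
    return int(matches[-1]) if matches else None


exit_code = extract_oscompstat(SAMPLE_LOG)
print(exit_code)
```

With a live run, you could pass `str(run.get_log('Command'))` instead of the sample text, assuming the log converts to the text shown by print().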

The action that you can take depends on the cause of the error.

Possible scenarios:

  • The command is available only in a production environment, not in a test environment. You can mark the job to run as a dummy:

workflow.get('TestMonitor/Command').run_as_dummy = True

  • You forgot to download the command before you ran the job. Run a flow that downloads or installs the command, and then rerun the job.

  • There is a typo in the command. Edit the workflow and run it again.