Monitoring Flows with Control-M Python Client
You can monitor a flow in Control-M Web, which provides everything needed to monitor and run workflows. If you prefer to monitor your workflows from code, you can use the Control-M Python Client.
Let’s run some workflows that contain jobs with errors and check how they appear in the Monitoring domain. If you need to learn how to create or run flows, see Getting Started with Control-M Python Client.
[1]:
from ctm_python_client.core.workflow import Workflow, WorkflowDefaults
from ctm_python_client.core.comm import Environment
# from ctm_python_client.ext.viz import get_graph
from aapi import *
[2]:
workflow = Workflow(Environment.create_workbench(), WorkflowDefaults(run_as='workbench'))
Running a Workflow
When you run a workflow, the returned object is a RunMonitor, which lets you check the workflow status and query a specific Job for information such as its output or logs.
[5]:
workflow.clear_all()
workflow.chain(
[
JobCommand('HelloWorld', 'echo "Hello"'),
JobCommand('CountFiles', 'ls -l | wc')
],
inpath='TestMonitor'
)
run = workflow.run()
Checking the Status
When you run a workflow, Control-M starts all the jobs. If you print the statuses right after you run a workflow, each status is probably either Executing or Wait Condition. The statuses used in this tutorial are:
Executing: The Job or Folder is running.
Wait Condition: The Job or Folder is waiting for a condition to be satisfied before it can run. In this example, the condition is that the previous job finishes.
Ended OK: The Job or Folder executed without returning an error code.
Ended Not OK: The Job or Folder executed with an error, or the run failed.
Use .print_statuses() to get a printout of the job statuses. The statuses are indented according to hierarchy. Note that the status of a Folder or Subfolder is an aggregation of the statuses of its jobs.
[6]:
run.print_statuses()
Run Status
--------------------------------------------------
TestMonitor ........................... Wait Condition
HelloWorld ........................ Wait Condition
CountFiles ........................ Wait Condition
[7]:
import time
time.sleep(4)
# A few seconds should be enough for all jobs to finish
run.print_statuses()
Run Status
--------------------------------------------------
TestMonitor ........................... Ended OK
HelloWorld ........................ Ended OK
CountFiles ........................ Ended OK
Getting the Status of a Specific Job or Folder
To get the status of a specific Job or Folder, use the .get_status() method with the name of the Job or Folder. You can pass either the name alone or the entire path.
[8]:
print(run.get_status('CountFiles'))
# Same as print(run.get_status('TestMonitor/CountFiles'))
{'application': None,
'ctm': None,
'cyclic': None,
'deleted': False,
'description': None,
'end_time': 'Jun 14, 2022 11:00:18 AM',
'estimated_end_time': None,
'estimated_start_time': None,
'folder': 'TestMonitor',
'folder_id': 'workbench:0001g',
'held': False,
'host': None,
'job_id': 'workbench:0001i',
'job_json': None,
'log_uri': 'https://localhost:8443/automation-api/run/job/workbench:0001i/log',
'name': 'CountFiles',
'number_of_runs': 1,
'order_date': None,
'output_uri': 'https://localhost:8443/automation-api/run/job/workbench:0001i/output',
'start_time': 'Jun 14, 2022 11:00:18 AM',
'status': 'Ended OK',
'sub_application': None,
'type': 'Command'}
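Rather than waiting with a fixed time.sleep(), the status field shown above can drive a polling loop. The following is a minimal sketch, not part of the Control-M Python Client: wait_until_finished and its fetch_status parameter are hypothetical names, and the set of final statuses is taken from the list in this tutorial.

```python
import time
from typing import Callable, Dict, Iterable

# Statuses that this tutorial describes as final for a Job or Folder.
FINAL_STATUSES = {"Ended OK", "Ended Not OK"}


def wait_until_finished(
    fetch_status: Callable[[str], str],
    names: Iterable[str],
    timeout: float = 60.0,
    interval: float = 1.0,
) -> Dict[str, str]:
    """Poll fetch_status(name) until every name is final or the timeout expires.

    fetch_status is a hypothetical callable that returns the status string
    for a job name. The function returns the last status seen for each name,
    so the caller can check whether anything is still non-final.
    """
    pending = list(names)
    latest: Dict[str, str] = {}
    deadline = time.monotonic() + timeout
    while True:
        for name in pending:
            latest[name] = fetch_status(name)
        # Keep polling only the jobs that have not reached a final status.
        pending = [n for n in pending if latest[n] not in FINAL_STATUSES]
        if not pending or time.monotonic() >= deadline:
            return latest
        time.sleep(interval)
```

With a RunMonitor, this could be called as wait_until_finished(lambda name: run.get_status(name).status, ['HelloWorld', 'CountFiles']), assuming the object returned by .get_status() exposes the status field as an attribute. The timeout matters because a job that is waiting on a failed predecessor may stay in Wait Condition indefinitely.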
To get the output of a Job (if an output exists), you can use the .get_output() or .print_output() method of the RunMonitor object. By default, the output contains the executed commands.
[9]:
output = run.get_output('CountFiles')
print(output)
# same as run.print_output('CountFiles')
+ ls -l
+ wc
14 121 785
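Because the output includes the executed commands as lines starting with "+ ", code that consumes the result usually needs to skip those lines first. The sketch below assumes the output has the shape shown above; strip_trace, parse_wc, and WcCounts are hypothetical helper names, not part of the client.

```python
from typing import List, NamedTuple


class WcCounts(NamedTuple):
    line_count: int
    word_count: int
    byte_count: int


def strip_trace(output: str) -> List[str]:
    """Drop blank lines and the '+ command' trace lines from job output."""
    return [
        line.strip()
        for line in output.splitlines()
        if line.strip() and not line.lstrip().startswith("+ ")
    ]


def parse_wc(output: str) -> WcCounts:
    """Read the three counters printed by `wc` from the first non-trace line."""
    first = strip_trace(output)[0]
    line_count, word_count, byte_count = (int(field) for field in first.split())
    return WcCounts(line_count, word_count, byte_count)


# Sample text copied from the output above; with a live run this would be
# the string returned by run.get_output('CountFiles').
sample = "+ ls -l\n+ wc\n     14     121     785\n"
print(parse_wc(sample))
```

Note that this filter would also drop any genuine output line that happens to start with "+ ", so it is only suitable when the job's own output cannot begin that way.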
Errors and Troubleshooting
When a job finishes with an error, it does not trigger the next job, which stays in Wait Condition until the previous job ends OK.
[10]:
workflow.clear_all()
workflow.chain(
[
JobCommand('Command', 'foo'),
JobCommand('Bye', 'echo "Finished"'),
],
inpath='TestMonitor'
)
run = workflow.run()
[11]:
time.sleep(2) # Wait until Jobs were ordered
run.print_statuses()
Run Status
--------------------------------------------------
TestMonitor ........................... Executing
Command ........................... Ended Not OK
Bye ............................... Wait Condition
We see that the Job “Command” ended Not OK and therefore the next job cannot run. This is because “Bye” depends on the completion of “Command”. If they were independent jobs, they would both run.
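For larger flows, it can help to separate the jobs that failed from the jobs that are merely blocked behind them. The following sketch works on a plain mapping of job name to status string; triage is a hypothetical helper, and building the mapping (for example, by calling .get_status() for each job) is left to the caller.

```python
from typing import Dict, List


def triage(statuses: Dict[str, str]) -> Dict[str, List[str]]:
    """Split job names into failed jobs and jobs still waiting on a condition."""
    return {
        "failed": [n for n, s in statuses.items() if s == "Ended Not OK"],
        "blocked": [n for n, s in statuses.items() if s == "Wait Condition"],
    }


# Statuses copied from the printout above.
statuses = {"Command": "Ended Not OK", "Bye": "Wait Condition"}
print(triage(statuses))
```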
To troubleshoot, you can use the .get_log() and .get_output() methods.
The log gives detailed information about the actions taken during the job execution. In our example, it does not show the cause of the failure because the failure is in the task itself.
Note: Bye remains in Wait Condition, rather than failing, because you can force the status of a job to OK and let the workflow continue. For more information, see the advanced tutorial.
[12]:
print(run.get_log('Command'))
Event Time Message Code
11:01:15 14-Jun-2022 ORDERED JOB:64; DAILY FORCED, ODATE 20220614 5065
11:01:15 14-Jun-2022 FREED BY USER workbench 5402
11:01:15 14-Jun-2022 SUBMITTED TO workbench 5105
11:01:16 14-Jun-2022 STARTED AT 20220614110115 ON workbench 5101
11:01:16 14-Jun-2022 JOB STATE CHANGED TO Executing 5120
11:01:16 14-Jun-2022 ENDED AT 20220614110116. OSCOMPSTAT 127. RUNCNT 1 5100
11:01:16 14-Jun-2022 Message from Agent: /home/workbench/ctm/sysout/CMD.LOG_00001k_00001 5169
11:01:16 14-Jun-2022 ENDED NOTOK. NUMBER OF FAILURES SET TO 1 5134
11:01:16 14-Jun-2022 JOB STATE CHANGED TO Analyzed 5120
11:01:16 14-Jun-2022 Job STATE CHANGED TO Post processed 5120
The log only shows that the job ended Not OK, so let’s check the output.
[13]:
run.print_output('Command')
+ foo
/home/workbench/ctm/runtime/CMD.0000001k_001: line 2: foo: command not found
The output shows that the command we tried to run does not exist.
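The exit status recorded in the log is consistent with this: the ENDED AT line reports OSCOMPSTAT 127, and 127 is the conventional shell exit status for a command that was not found. A minimal sketch for pulling that number out of the log text follows. It assumes the log keeps the "OSCOMPSTAT <number>" wording shown above, and extract_exit_status is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches the completion status in lines such as
# "ENDED AT 20220614110116. OSCOMPSTAT 127. RUNCNT 1".
OSCOMPSTAT_RE = re.compile(r"OSCOMPSTAT\s+(-?\d+)")


def extract_exit_status(log_text: str) -> Optional[int]:
    """Return the last OSCOMPSTAT value found in the log, or None if absent."""
    matches = OSCOMPSTAT_RE.findall(log_text)
    # Use the last match so that a job with several runs reports the latest one.
    return int(matches[-1]) if matches else None


# Line copied from the log above; with a live run this would be the string
# returned by run.get_log('Command').
sample_log = "11:01:16 14-Jun-2022 ENDED AT 20220614110116. OSCOMPSTAT 127. RUNCNT 1 5100"
print(extract_exit_status(sample_log))
```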
The action that you can take depends on the cause of the error.
Possible scenarios:
The command can only be used in a production environment, not in a test environment. You can mark the job to run as a dummy job:
workflow.get('TestMonitor/Command').run_as_dummy = True
You forgot to download the command before you ran the job. Run a flow that downloads or installs the command, and then rerun the job.
There is a typo in the command. Edit the workflow and run it again.