{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Monitoring Flows with Control-M Python Client\n", "\n", "You can monitor the flow in Control-M Web which provides all the functionalities to monitor and run the workflow.\n", "If you prefer to monitor your workflow using code, you can use Control-M Python client.\n", "\n", "Let's run some workflows that contain jobs with errors and check how they appear in the Monitoring domain.\n", "If you need to learn how to create or run flows, see [Getting Started with Control-M Python Client](https://controlm.github.io/ctm-python-client/notebooks/hello.html)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from ctm_python_client.core.workflow import Workflow, WorkflowDefaults\n", "from ctm_python_client.core.comm import Environment\n", "# from ctm_python_client.ext.viz import get_graph\n", "from aapi import *" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "workflow = Workflow(Environment.create_workbench(), WorkflowDefaults(run_as='workbench'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Running a Workflow\n", "\n", "When you run a workflow, the return object is a `RunMonitor`, which enables you to check the workflow status and query for specific information of a Job, such as the output or logs." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "workflow.clear_all()\n", "\n", "workflow.chain(\n", " [\n", " JobCommand('HelloWorld', 'echo \"Hello\"'),\n", " JobCommand('CountFiles', 'ls -l | wc')\n", " ],\n", " inpath='TestMonitor'\n", ")\n", "\n", "run = workflow.run()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Checking the Status\n", "\n", "When you run a workflow, Control-M starts all the jobs. If you print the status just after you run a job, the status is probably either Executing or Wait Condition.\n", "\n", "- **Executing**: The Job or Folder is running\n", "\n", "- **Wait Condition**: The Job or Folder is waiting for a condition to be satisfied before it can run. In this example the condition is that the previous job finishes.\n", "\n", "- **Ended OK**: The Job or Folder executed without returning an error code.\n", "\n", "- **Ended Not OK**: The Job or Folder executed with an error, or the run failed\n", "\n", "\n", "Use `.print_status()` to get a printout of the job statuses. The statuses indent according to heirarchy. Note that the status of a Folder or Subfolder is an aggregation of the status of its jobs" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Run Status\n", "--------------------------------------------------\n", "TestMonitor ........................... Wait Condition\n", " HelloWorld ........................ Wait Condition\n", " CountFiles ........................ Wait Condition\n", "\n" ] } ], "source": [ "run.print_statuses()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Run Status\n", "--------------------------------------------------\n", "TestMonitor ........................... Ended OK\n", " HelloWorld ........................ Ended OK\n", " CountFiles ........................ Ended OK\n", "\n" ] } ], "source": [ "import time \n", "time.sleep(4)\n", "\n", "# A few seconds should be enough for all jobs to finish\n", "run.print_statuses()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting the Status of a Specific Job or Folder\n", "\n", "To get the status of a specific Job or Folder, use the `.get_status()` method with the name of the Job or Folder. You can pass it as a name or as the entire path." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'application': None,\n", " 'ctm': None,\n", " 'cyclic': None,\n", " 'deleted': False,\n", " 'description': None,\n", " 'end_time': 'Jun 14, 2022 11:00:18 AM',\n", " 'estimated_end_time': None,\n", " 'estimated_start_time': None,\n", " 'folder': 'TestMonitor',\n", " 'folder_id': 'workbench:0001g',\n", " 'held': False,\n", " 'host': None,\n", " 'job_id': 'workbench:0001i',\n", " 'job_json': None,\n", " 'log_uri': 'https://localhost:8443/automation-api/run/job/workbench:0001i/log',\n", " 'name': 'CountFiles',\n", " 'number_of_runs': 1,\n", " 'order_date': None,\n", " 'output_uri': 'https://localhost:8443/automation-api/run/job/workbench:0001i/output',\n", " 'start_time': 'Jun 14, 2022 11:00:18 AM',\n", " 'status': 'Ended OK',\n", " 'sub_application': None,\n", " 'type': 'Command'}\n" ] } ], "source": [ "print(run.get_status('CountFiles'))\n", "# Same as print(run.get_status('TestMonitor/CountFiles'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get the output of a Job (if an output exists), you can use the `.get_output()` or `.print_output()` method from the RunObject.\n", "By default, the output contains the executed commands." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+ ls -l\n", "+ wc\n", " 14 121 785\n", "\n" ] } ], "source": [ "output = run.get_output('CountFiles')\n", "print(output)\n", "\n", "# same as run.print_output('CountFiles')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Errors and Troubleshooting\n", "\n", "When a job finishes with an error, it does not trigger the next job, which is marked as Waiting for Condition until the previous job runs ok." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "workflow.clear_all()\n", "\n", "workflow.chain(\n", " [\n", " JobCommand('Command', 'foo'),\n", " JobCommand('Bye', 'echo \"Finished\"'),\n", " ],\n", " inpath='TestMonitor'\n", ")\n", "\n", "run = workflow.run()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Run Status\n", "--------------------------------------------------\n", "TestMonitor ........................... Executing\n", " Command ........................... Ended Not OK\n", " Bye ............................... Wait Condition\n", "\n" ] } ], "source": [ "time.sleep(2) # Wait until Jobs were ordered \n", "run.print_statuses()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that the Job \"Command\" ended Not OK and therefore the next job cannot run. This is because \"Bye\" depends on the completion of \"Command\". If they were indenpendent jobs, they would both run.\n", "\n", "To troubleshoot, you can use the `.get_log()` and the `.get_output()` methods.\n", "\n", "The log gives detailed information of the actions of the job execution. In our example, we do not see the desired information because the failure is in the task itself.\n", "\n", "**Note**: The status of `Bye` is `Wait Condition` because there is a way to force the status of a job to OK and continue the workflow. For more information, see the advanced tutorial." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Event Time Message \tCode\n", "\n", "11:01:15 14-Jun-2022 ORDERED JOB:64; DAILY FORCED, ODATE 20220614 \t5065\n", "11:01:15 14-Jun-2022 FREED BY USER workbench \t5402\n", "11:01:15 14-Jun-2022 SUBMITTED TO workbench \t5105\n", "11:01:16 14-Jun-2022 STARTED AT 20220614110115 ON workbench \t5101\n", "11:01:16 14-Jun-2022 JOB STATE CHANGED TO Executing \t5120\n", "11:01:16 14-Jun-2022 ENDED AT 20220614110116. OSCOMPSTAT 127. RUNCNT 1 \t5100\n", "11:01:16 14-Jun-2022 Message from Agent: /home/workbench/ctm/sysout/CMD.LOG_00001k_00001\t5169\n", "11:01:16 14-Jun-2022 ENDED NOTOK. NUMBER OF FAILURES SET TO 1 \t5134\n", "11:01:16 14-Jun-2022 JOB STATE CHANGED TO Analyzed \t5120\n", "11:01:16 14-Jun-2022 Job STATE CHANGED TO Post processed \t5120\n", "\n" ] } ], "source": [ "print(run.get_log('Command'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the logs we only see that it was ended not ok, so let's try the output" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+ foo\n", "/home/workbench/ctm/runtime/CMD.0000001k_001: line 2: foo: command not found\n", "\n" ] } ], "source": [ "run.print_output('Command')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the output shows that the issue is that the command that we tried to run does not exist.\n", "\n", "The action that you can take depends on the cause of the error.\n", "\n", "Possible scenarios:\n", "\n", "- The command can only be used in a production environment, but not a test environment. You can mark the job to run as dummy\n", "``` python\n", "workflow.get('TestMonitor/Command').run_as_dummy = True\n", "```\n", "\n", "- You forgot to download the command before you ran the job. Run a flow that downloads or installs the command and then rerun the job\n", "\n", "- There is a typo in the command. Edit the workflow and run it again.\n", "\n" ] } ], "metadata": { "interpreter": { "hash": "e36608863334e111ac1975278d851976534d4d97e80edd449207481e04c86242" }, "kernelspec": { "display_name": "Python 3.10.1 ('venv': venv)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }