The Rise of Activity Recognition


Last Updated on April 20, 2021 by Team Wobot

In these times of intense competition, every business, however large or small, requires proactive security to stay safe from external as well as internal threats. Traditional video surveillance is now reinforced with artificial intelligence so that untoward incidents can be thwarted rather than merely recorded.

Activity recognition plays a vital part in decoding raw CCTV surveillance data and providing timely alerts to prevent thefts and other untoward incidents, or to detect deviations from a standard operating procedure (SOP). When utilised by businesses, it can also help improve employee efficiency and compliance at the workplace.

What is Activity Recognition?
Until recently, surveillance security was affordable only for larger enterprises, but with the falling cost of hardware, even small businesses are opting for CCTV surveillance solutions. Conventional CCTV surveillance requires reviewing the footage manually, frame by frame, to catch instances of theft, inefficiency, etc., and even then only after the incident has occurred. Luckily, technology has stepped into the video surveillance field to make it easier to identify human behaviour and to report suspicious activity in real time, so that a perpetrator can be stopped before carrying out any harmful activity.

Surveillance videos and images can now be processed using a technique from active research known as activity recognition, also called Human Activity Recognition (HAR). An activity recognition model is trained to recognise different types of human behaviour. Raw data from the surveillance feed is processed to differentiate between normal and suspicious behaviour. Deep learning techniques are crucial in helping the model recognise undesirable behaviour and in increasing the accuracy of its detections.

In this process, the raw data is first pre-processed to eliminate noise and other unrelated activity. The data is then divided into segments, and final reports are generated based on individual requirements (business rules). In addition to visual data, activity recognition models can also be trained to process signals from other sensors, such as the accelerometers and gyroscopes found in smartphones, based on specific requirements and parameters.
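As a rough illustration of the noise-removal step for sensor data, the sketch below smooths accelerometer readings with a centred moving average. The filter choice, the function names and the sample values are all assumptions made for this example; a production system may well use a different filter.

```python
import math
from typing import List, Tuple


def magnitude(sample: Tuple[float, float, float]) -> float:
    """Combine the x, y and z accelerometer axes into one orientation-free value."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)


def moving_average(values: List[float], width: int = 3) -> List[float]:
    """Smooth a signal with a centred moving average.

    `width` is assumed to be odd. Near the edges the average is taken
    over whichever neighbours are available.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        neighbours = values[lo:hi]
        smoothed.append(sum(neighbours) / len(neighbours))
    return smoothed


# Hypothetical readings: a steady signal with a single noisy spike.
raw = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]
signal = [magnitude(s) for s in raw]
print(moving_average(signal, width=3))
```

A wider window removes more noise but also blurs short activities, so the width is a trade-off that depends on the sensor's sampling rate.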

Activity Recognition in Videos
The task of action recognition involves analysing videos and determining what action or motion is being performed. The primary subjects of these videos are mostly humans performing some action. However, this requirement can be relaxed to generalise to other subjects such as animals or robots. Spatiotemporal action recognition additionally deals with action localisation: the task involves determining not only what action is being performed but also when and where in the video it is being performed.
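To make the "when" part of localisation concrete, the sketch below turns per-frame action labels into timed segments. It assumes a classifier has already produced one label per frame; the `ActionSegment` type, the label names and the frame rate are hypothetical. The "where" part would additionally need a bounding box per frame, which is left out here.

```python
from typing import List, NamedTuple


class ActionSegment(NamedTuple):
    label: str
    start_s: float  # segment start, in seconds
    end_s: float    # segment end (exclusive), in seconds


def frames_to_segments(frame_labels: List[str], fps: float) -> List[ActionSegment]:
    """Group runs of identical consecutive frame labels into timed segments."""
    segments = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        # Close the current run at the end of the video or when the label changes.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            segments.append(ActionSegment(frame_labels[start], start / fps, i / fps))
            start = i
    return segments


# Hypothetical per-frame predictions at 1 frame per second.
labels = ["idle", "idle", "open_door", "open_door", "open_door", "idle"]
for segment in frames_to_segments(labels, fps=1.0):
    print(segment)
```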

Once raw data from a surveillance video is fed into a system, activity recognition can be deployed to interpret the data using event-recognition algorithms. In simple terms, the software is first trained to recognise normal human activity such as opening doors, drinking water from a glass, washing hands, etc. The software is also trained to pick up subtle suspicious activity such as fidgeting, looking around nervously, splitting from a group, acting violently, or taking objects out of or putting them into bags, again based on specific requirements.

The video input is then split frame-wise to analyse each activity in detail. Data from all the sensors is split into sequences known as windows. Each window is associated with an activity, in a process known as the sliding-window approach. Consecutive windows are usually overlapped so that no activity slips past unnoticed at the transition from one activity to another.
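The sliding-window approach with overlap can be sketched in a few lines. The function name, the window size and the 50% overlap below are assumptions chosen for illustration, not values taken from any particular system.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(samples: Sequence[T], window_size: int,
                    overlap: float = 0.5) -> List[Sequence[T]]:
    """Split `samples` into fixed-size windows.

    `overlap` is the fraction of each window shared with the next one.
    Trailing samples that do not fill a whole window are dropped.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Distance between window starts; at least 1 so the loop always advances.
    step = max(1, int(window_size * (1.0 - overlap)))
    windows = []
    for start in range(0, len(samples) - window_size + 1, step):
        windows.append(samples[start:start + window_size])
    return windows


# Ten hypothetical frame indices, windows of four frames, 50% overlap.
frames = list(range(10))
for window in sliding_windows(frames, window_size=4, overlap=0.5):
    print(window)
```

With a 50% overlap every frame except those at the very start and end appears in two windows, so an activity that begins near the end of one window is still seen in full by the next.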

There are multiple approaches to deploying activity recognition in videos. One such approach is the single-stream network. It explores multiple ways to fuse temporal information from consecutive frames using 2D pre-trained convolutions. Another approach is the two-stream network. Unlike the single-stream network, which uses spatial context, this architecture has two separate networks: one for spatial context (pre-trained) and one for motion context. The two streams are trained separately and their outputs are combined using an SVM. This method improves on the performance of the single-stream method by explicitly capturing local temporal movement.
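The idea of combining two separately trained streams can be illustrated with a small late-fusion sketch. Note that this example swaps in a weighted average of the two streams' softmax scores, a simpler alternative to the SVM-based fusion described above. The class names, the raw scores and the `motion_weight` parameter are all hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(spatial_scores: List[float], motion_scores: List[float],
                 motion_weight: float = 0.5) -> List[float]:
    """Late fusion: weighted average of the two streams' class probabilities."""
    if len(spatial_scores) != len(motion_scores):
        raise ValueError("both streams must score the same set of classes")
    p_spatial = softmax(spatial_scores)
    p_motion = softmax(motion_scores)
    return [(1.0 - motion_weight) * s + motion_weight * m
            for s, m in zip(p_spatial, p_motion)]


# Hypothetical classes and raw scores from each stream for one video clip.
CLASSES = ["walking", "running", "falling"]
spatial = [2.0, 1.0, 0.1]  # appearance alone suggests "walking"
motion = [0.5, 0.2, 3.0]   # motion strongly suggests "falling"

fused = fuse_streams(spatial, motion)
predicted = CLASSES[max(range(len(fused)), key=fused.__getitem__)]
print(predicted, [round(p, 3) for p in fused])
```

In this made-up clip the spatial stream alone would pick "walking", while the fused scores favour "falling" because the motion stream is far more confident. This is the kind of case where explicit motion information helps.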

When employed in a business scenario, activity recognition models can be trained on video to identify theft, damage to property or goods, inefficiency by employees, accidents such as fires, and other activities specific to the needs of the business or organisation. Video feeds from multiple cameras can also be analysed together to draw specific conclusions based on the trained algorithm.

Action recognition has several use cases. For example, companies like Wobot Intelligence have created their own Action Recognition Architecture (WAR), which uses two-stream spatiotemporal networks and covers several classes of actions, detecting unhygienic behaviour, pilferage, and violent behaviour, amongst others.

Using Activity Recognition in Video Surveillance to Ensure SOP and Compliance Checks
In addition to preventing undesired incidents at the workplace, activity recognition can also be used to ensure proper adherence to Standard Operating Procedures (SOPs) and to support compliance checks. Depending on the size of the business, one or more video cameras are used to provide surveillance data on employees, processes, material handling, etc., for further processing. This data is analysed using advanced machine learning algorithms trained to identify any deviation from SOPs and to provide reports that support compliance and the improvement of current procedures.
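Once activities have been recognised, an SOP check can be as simple as verifying that the required steps were seen in the right order. The sketch below assumes the recognition model has already produced an ordered list of activity labels; the step names and the rule itself are hypothetical examples, not an actual SOP.

```python
from typing import List, Optional


def first_missing_step(detected: List[str], required: List[str]) -> Optional[str]:
    """Return the first required step not found in order, or None if compliant.

    Other activities may occur between the required steps; only their
    relative order matters.
    """
    position = 0
    for step in required:
        try:
            # Search only after the previously matched step.
            position = detected.index(step, position) + 1
        except ValueError:
            return step
    return None


# Hypothetical hygiene SOP for a kitchen.
SOP = ["wash_hands", "wear_gloves", "handle_food"]

compliant_shift = ["enter", "wash_hands", "wear_gloves", "handle_food"]
skipped_wash = ["enter", "wear_gloves", "handle_food"]

print(first_missing_step(compliant_shift, SOP))
print(first_missing_step(skipped_wash, SOP))
```

A deviation report can then name the exact step that was skipped or performed out of order, rather than only flagging the shift as non-compliant.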

This application of artificial intelligence can hence identify and help prevent mishandling, tampering, theft, contamination, and various other deviations from SOPs that could damage the functioning and reputation of a business. Such a proactive and intelligent form of data processing can also help avoid penalties or fines for non-compliance at a later date.

With this kind of advanced video processing through activity recognition, businesses can expect an increase in efficiency and compliance, and stricter adherence to defined SOPs. The cost of hardware, as well as of such analytical solutions, has fallen significantly in the last few years, making it possible for small businesses too to embrace this intelligent system.

Merely watching surveillance video at the workplace can be repetitive, tiring, and ineffective in terms of both cost and timely intervention. Activity recognition on surveillance video can be used to process, record, identify and report incidents of non-compliance with SOPs. This vital process can help anticipate events that might be detrimental to a business.