AI Categorizer Outsmarts Manually Created Rule-Based Categorization
Alex Paikada, Director, Content Marketing, Relecura
Artificial intelligence (AI) has been making inroads into the intellectual property ecosystem, as it has into nearly every other area of economic activity. The concept of AI has been active in academic circles since the mid-1950s, before transistor electronics had come of age. Since then, more than three hundred thousand patent applications have been filed in the domain. Filings have boomed since 2011, driven by increased computing power, greater connectivity, and the growing challenges faced by analysts. Recently, the focus has shifted to easing the analysis of massive databases containing millions of records and to classifying data so that it is easier to interpret and understand. Relecura's Categorizer tool classifies tens of thousands of documents into a predefined set of groups, which makes subsequent analysis easier. This capability grows more important as the cumulative body of IP data keeps expanding and becomes increasingly unwieldy for manual analysis.
The Test Setup
AI-driven tools can now compete with, and even outperform, manual work in reliability, efficiency, and accuracy. To verify this empirically, we engaged a seasoned expert in the field, who self-monitored and recorded the man-hours spent on the task.
A domain expert and an AI classification system were each given the same set of categories and a list of exemplary documents for each category. The expert was asked to devise rule-based criteria for categorizing the documents, while the AI classification system was set to build an AI model for the same task. There were 16 categories, with the following number of exemplary documents in each.
Table 1: Number of Training Documents Per Category
Domain Expert Used Complex Boolean Expressions
The domain expert was free to use any keywords, classification codes, technologies, and semantic concepts, and to combine them into queries using Boolean operators and more complex operators such as proximity searches and wildcards.
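To make the expert's toolkit concrete, here is a minimal Python sketch of a rule-based categorizer. The helper names (`has`, `near`, `categorize`), the rule syntax, and the two example categories are all hypothetical: this is not the query language the expert actually used, only an illustration of how Boolean, wildcard, and proximity operators combine into one rule per category.

```python
import fnmatch
import re

def tokens(text):
    """Lowercase word tokens in document order."""
    return re.findall(r"[a-z0-9]+", text.lower())

def positions(toks, pattern):
    """Indices of tokens matching a lowercase wildcard pattern such as 'batter*'."""
    return [i for i, t in enumerate(toks) if fnmatch.fnmatchcase(t, pattern)]

def has(text, pattern):
    """Wildcard term search: True if any token matches the pattern."""
    return bool(positions(tokens(text), pattern))

def near(text, a, b, k):
    """Proximity search: a match of `a` within k words of a match of `b`."""
    toks = tokens(text)
    return any(abs(i - j) <= k
               for i in positions(toks, a)
               for j in positions(toks, b))

# Hypothetical rules for two made-up categories, combining Boolean
# AND / OR / NOT with wildcards and proximity.
RULES = {
    "energy_storage": lambda d: (
        has(d, "batter*") and near(d, "lithium", "electrode*", 5)
    ),
    "wireless_charging": lambda d: (
        (has(d, "wireless") or has(d, "inductive"))
        and has(d, "charg*")
        and not has(d, "cable*")
    ),
}

def categorize(doc):
    """Return every category whose rule fires: possibly none, possibly several."""
    return [name for name, rule in RULES.items() if rule(doc)]

print(categorize("A lithium battery electrode coated with graphite."))  # ['energy_storage']
```

Every rule of this kind has to be written and then refined by hand against the exemplary documents, which is where the expert's iterative fine-tuning time goes.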
The expert needed two weeks to produce a presentable result; this was the time required to finalize the search rules through iterative fine-tuning.
AI Classification Model Learnt from the Documents
The AI Categorizer used the exemplary documents as training data, learning the features of each category to build an AI classification model. The unsorted patent documents were then classified by comparing their semantic similarity to the exemplary documents.
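The Categorizer's internals are not described here, so the following is only a rough illustration of the general idea of learning from exemplary documents: a nearest-centroid classifier that builds one centroid per category and assigns each new document to the most similar one. It deliberately swaps in bag-of-words cosine similarity, which is lexical rather than semantic; a real system would likely use a richer semantic representation. All names (`vectorize`, `train`, `classify`) and the toy categories are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts: a crude, lexical stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def norm(vec):
    return math.sqrt(sum(c * c for c in vec.values()))

def cosine(u, v):
    """Cosine similarity of two sparse term vectors (0.0 if either is empty)."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    nu, nv = norm(u), norm(v)
    return dot / (nu * nv) if nu and nv else 0.0

def train(examples):
    """Build one centroid per category from its exemplary documents.

    `examples` maps a category name to a list of exemplary document texts.
    Each document is length-normalized first so long documents do not dominate.
    Summing instead of averaging is fine because cosine similarity ignores scale.
    """
    model = {}
    for category, docs in examples.items():
        centroid = Counter()
        for doc in docs:
            vec = vectorize(doc)
            n = norm(vec) or 1.0
            for term, count in vec.items():
                centroid[term] += count / n
        model[category] = centroid
    return model

def classify(model, doc):
    """Assign the category whose centroid is most similar to the document."""
    vec = vectorize(doc)
    return max(model, key=lambda category: cosine(vec, model[category]))

model = train({
    "battery": ["lithium battery electrode", "battery cell with lithium anode"],
    "charging": ["wireless inductive charging coil", "charging pad with inductive coil"],
})
print(classify(model, "an anode for a lithium cell"))  # battery
```

Unlike the rule-based sketch, nothing here is hand-tuned: adding or changing a category only means supplying different exemplary documents.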
Under field conditions, the AI Categorizer completed the task in two hours, an impressive saving in man-hours compared with the expert's two weeks. Speed alone is not decisive, however: the critical parameters are precision, recall, specificity, and classification accuracy, which are compared next.
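For reference, all four metrics can be computed per category by treating each category as a one-vs-rest problem and counting true and false positives and negatives. The sketch below assumes single-label assignments; the names `Confusion`, `confusion_for`, and `metrics` are hypothetical, and the counts in the usage line are invented for illustration, not taken from Table 2.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    """One-vs-rest counts for a single category."""
    tp: int  # belongs to the category and was assigned to it
    fp: int  # does not belong but was assigned to it
    fn: int  # belongs but was not assigned to it
    tn: int  # does not belong and was not assigned to it

def confusion_for(actual, predicted, category):
    """Count one-vs-rest outcomes for `category` from single-label lists."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == category:
            if a == category:
                tp += 1
            else:
                fp += 1
        elif a == category:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)

def metrics(c):
    """Precision, recall, specificity and accuracy; NaN when undefined."""
    def ratio(num, den):
        return num / den if den else float("nan")
    return {
        "precision": ratio(c.tp, c.tp + c.fp),    # share of assignments that are correct
        "recall": ratio(c.tp, c.tp + c.fn),       # share of true members that are found
        "specificity": ratio(c.tn, c.tn + c.fp),  # share of non-members correctly left out
        "accuracy": ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
    }

# Invented counts for illustration only (not taken from Table 2).
print(metrics(Confusion(tp=90, fp=10, fn=5, tn=895)))
```

Because false positives enter the denominators of both precision and specificity, a categorizer that over-assigns documents is penalized on both counts.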
Table 2 below shows the performance metrics for both approaches. In classification accuracy, the AI-powered tool performed better in every category. For precision, the Categorizer ranged from 95.25% to 98.7%, whereas manual categorization ranged from 27.77% to 71.54%. For recall, the AI tool ranged from 90.7% to 100%; manual categorization covered almost the same range but achieved slightly higher recall in more categories. Specificity is where the AI tool was remarkably better: its value was well above 99% in every category, whereas manual categorization scored 80% or less in six categories and reached 99% in none. False negatives were slightly more frequent for the AI tool, but false positives were far more numerous for manual categorization and minimal for the AI-powered Categorizer.
Table 2: Performance Comparison of AI Categorization and Manual Categorization
Whether because of the limitations of rule-based models or the capabilities of current AI algorithms, AI outsmarted the rule-based approach in this comparison.
It should be noted, however, that the results are subjective: performance depends largely on how closely related the categories are, the number of exemplary documents per category, and the domain expert's expertise in the categories, among other factors.
Taking procedural efficiency, time savings, and performance together, the AI-powered tool is clearly the preferable option. In any case, AI-powered tools are set to play a substantial role in IP analytics, and sooner rather than later, IP professionals will have to adopt them to stay in the game.