Explained: What is Bucketing in Patent Landscape Study? | HavingIP

Bucketing the patent datasets

In a patent landscape study, we generally analyze patents in terms of a technology’s hierarchy of categories and subcategories. We call this hierarchy a taxonomy, a scheme of hierarchical classification.

Once the taxonomy has been defined at the beginning of the patent landscape study, we retrieve patent datasets from databases and tag the patents into the predefined categories and subcategories. We call this process bucketing.
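As a minimal sketch, assuming the taxonomy is stored as a nested Python dictionary, every root-to-leaf path names one bucket. The categories below are hypothetical, loosely following the electric communication example used later in this article.

```python
# Hypothetical taxonomy stored as a nested dict.
# Keys are category names; an empty dict marks a leaf, i.e. a bucket.
taxonomy = {
    "Electric communication": {
        "Transmission": {
            "Line transmission": {},
            "Near field transmission": {},
            "Radio transmission": {},
        },
        "Telephonic communication": {},
    }
}


def leaf_paths(tree, prefix=()):
    """Return every root-to-leaf path; each path names one bucket."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            paths.extend(leaf_paths(children, path))
        else:
            paths.append(path)
    return paths


for path in leaf_paths(taxonomy):
    print(" > ".join(path))
```

If the taxonomy evolves during the project, adding a category is a matter of adding a key, and the bucket list is regenerated from the tree.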

Basically, bucketing is the process of classifying patents or patent applications into various groups, or “buckets,” based on their subject matter or characteristics, as part of a patent landscape study. Bucketing serves a number of purposes, for example:

  • To spot trends or patterns in the types of patents being filed,
  • To evaluate the relative merits or significance of various patents,
  • To identify potential areas for further research or advancement, i.e., areas that are underrepresented or under-explored, or
  • To highlight key players in a given field.

There are roughly five approaches to bucketing, i.e., to assigning patents to the appropriate buckets:

  1. Class-based bucketing,
  2. Keyword-based query bucketing,
  3. Class- and keyword-based query bucketing,
  4. Manual bucketing, and
  5. Automated bucketing.

Let’s look at each of these in the following sections.

1. Class-based bucketing

Classes are identified for each category based on the technology’s taxonomy. For example, class H04 covers electric communication technique. It is further subdivided into subclasses such as:

H04B: Transmission

H04H: Broadcast communication

H04J: Multiplex communication

H04K: Secret communication; Jamming of communication

H04L: Transmission of digital information

H04M: Telephonic communication

The list continues with subclasses H04N, H04Q, H04R, H04S, H04T, and H04W.

Now let’s pick subclass H04B: Transmission. This subclass is further divided into main groups such as:

H04B 3/00: Line transmission systems

H04B 5/00: Near field transmission systems

H04B 7/00: Radio transmission systems


Based on the definition of these classes, buckets or subcategories are created under the predefined taxonomy of the technology.

So, for electric communication technology, “H04B 7/00: Radio transmission systems” is one bucket, and “H04B 5/00: Near field transmission systems” is another.

Since a patent is often assigned multiple classes, one patent publication may end up in multiple buckets.
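Class-based bucketing boils down to matching classification prefixes. The following Python sketch is one way to do it, assuming each bucket is defined by one or more classification prefixes and that symbols may be written with or without a space (e.g. "H04B 5/72" vs "H04B5/72"). The publication numbers and subgroup symbols are hypothetical.

```python
# Hypothetical buckets keyed to the H04B main groups discussed above.
BUCKETS = {
    "Line transmission": ["H04B 3/"],
    "Near field transmission": ["H04B 5/"],
    "Radio transmission": ["H04B 7/"],
}


def normalize(symbol):
    """Drop spaces and upper-case, so 'H04B 5/72' and 'h04b5/72' compare equal."""
    return symbol.replace(" ", "").upper()


def class_buckets(patent_classes, buckets=BUCKETS):
    """Return every bucket whose class prefix matches any of the patent's classes."""
    matched = set()
    for bucket, prefixes in buckets.items():
        for prefix in prefixes:
            if any(normalize(c).startswith(normalize(prefix)) for c in patent_classes):
                matched.add(bucket)
    return matched


# Hypothetical publications; PUB-1 carries two classes and lands in two buckets.
patents = {
    "PUB-1": ["H04B 5/72", "H04B 7/24"],
    "PUB-2": ["H04B 3/54"],
    "PUB-3": ["H04M 1/02"],
}
for number, classes in patents.items():
    print(number, sorted(class_buckets(classes)))
```

Keeping the trailing slash in each prefix stops "H04B 5/" from accidentally matching a group such as "H04B 50/".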

Buckets created from classification are moderately accurate. Here, accuracy refers to how clear a picture the buckets paint of the technological landscape.

2. Keyword-based query bucketing

Based on the taxonomy, you can formulate a search query for each subcategory. The patents each query retrieves can then be tagged to that subcategory, or bucket.

For example, for near-field communication systems, you can run a query on Google Patents as shown below:

("near field communication") OR (NFC) OR ((capacit* NEAR2 coupl*) WITH (communicat* OR transmi* OR transceiv*)) OR (((antenna OR sensor* OR (primary OR secondary)) NEAR3 coil) WITH (communicat* OR transmi* OR transceiv*)) OR (RFID OR "radio frequency identification")
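To show how such a query decides whether a patent belongs in the bucket, here is a rough Python approximation of it. It assumes truncation (*) matches any word ending and NEARn means within n words in either order, and it treats WITH loosely as "anywhere in the same text". Google Patents’ own operator semantics may differ, so this is a sketch for understanding the query, not a replacement for running it.

```python
import re

COMM = ["communicat*", "transmi*", "transceiv*"]


def word_re(term):
    """Turn a truncated term such as 'communicat*' into a regex for one word."""
    return re.compile(re.escape(term.lower()).replace(r"\*", r"\w*"))


def positions(words, terms):
    """Indices of the words that match any of the given terms."""
    patterns = [word_re(t) for t in terms]
    return [i for i, w in enumerate(words) if any(p.fullmatch(w) for p in patterns)]


def near(words, left, right, n):
    """Approximate (left) NEARn (right): some left word within n words of some right word."""
    return any(abs(i - j) <= n for i in positions(words, left) for j in positions(words, right))


def in_nfc_bucket(text):
    """Simplified reading of the NFC query; WITH is treated as 'anywhere in the same text'."""
    words = re.findall(r"\w+", text.lower())
    joined = " ".join(words)  # so 'near-field' also matches the phrase 'near field'
    has_comm = bool(positions(words, COMM))
    return (
        "near field communication" in joined
        or "radio frequency identification" in joined
        or bool(positions(words, ["nfc", "rfid"]))
        or (near(words, ["capacit*"], ["coupl*"], 2) and has_comm)
        or (near(words, ["antenna", "sensor*", "primary", "secondary"], ["coil"], 3) and has_comm)
    )


print(in_nfc_bucket("Capacitively coupled electrodes transmit data between devices"))
print(in_nfc_bucket("A coffee maker with a heating coil"))
```

The second example contains "coil" but none of the terms it must appear near, so it stays out of the bucket.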

The results returned by a query may contain some noise, i.e., irrelevant patents. Refining the query iteratively can minimize that noise.
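One practical way to track noise across iterations is to review a small sample of each query version’s results by hand and compute the share judged irrelevant. A minimal Python sketch, with hypothetical publication numbers and review labels:

```python
def noise_rate(reviewed_sample):
    """Fraction of a manually reviewed sample of query results judged irrelevant."""
    if not reviewed_sample:
        raise ValueError("review at least one result")
    irrelevant = sum(1 for is_relevant in reviewed_sample.values() if not is_relevant)
    return irrelevant / len(reviewed_sample)


# Hypothetical review labels: True = relevant to the bucket, False = noise.
query_v1_sample = {"PUB-1": True, "PUB-2": False, "PUB-3": True, "PUB-4": False}
query_v2_sample = {"PUB-1": True, "PUB-3": True, "PUB-5": True, "PUB-6": False}

print(noise_rate(query_v1_sample))  # 0.5
print(noise_rate(query_v2_sample))  # 0.25
```

A sample like this only measures noise; it cannot reveal the relevant patents a query failed to retrieve.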

Since it’s a keyword-based search, there is always a chance of missing relevant patents, e.g., non-English publications whose machine translations don’t contain the query terms.

Therefore, this approach to forming buckets is less accurate than class-based bucketing and takes about the same amount of time.

Note: You might be interested in our excellent guide on how to search for patent and non-patent literature on Google Patents.

3. Class- and keyword-based bucketing

As we have seen, queries built from a subcategory’s keywords alone may not accurately retrieve the relevant patent publications from various databases.

Therefore, to cover most of the relevant patent publications, it is advisable to use queries that combine classification codes and keywords. Classes help retrieve patents that a keyword query misses because of language barriers, since a classification code is the same regardless of the language a publication is written in.
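A minimal Python sketch of a combined rule, assuming each record carries its classification symbols and some text. With require_both=False, a class hit alone is enough, so a non-English publication still lands in the bucket; require_both=True instead trades coverage for less noise. The prefixes, keywords, and records are hypothetical, and the substring keyword test is deliberately crude.

```python
def normalize(symbol):
    """Drop spaces and upper-case a classification symbol."""
    return symbol.replace(" ", "").upper()


def combined_match(patent, class_prefixes, keywords, require_both=False):
    """Hypothetical rule: a class hit OR a keyword hit (or AND, if require_both)."""
    class_hit = any(
        normalize(c).startswith(normalize(p)) for c in patent["classes"] for p in class_prefixes
    )
    text = patent["text"].lower()
    keyword_hit = any(k.lower() in text for k in keywords)
    return (class_hit and keyword_hit) if require_both else (class_hit or keyword_hit)


NFC_CLASSES = ["H04B 5/"]
NFC_KEYWORDS = ["near field communication", "NFC", "RFID"]

# Hypothetical records: a German-language publication with a matching class,
# and an English one with a matching keyword but a different class.
german = {"classes": ["H04B 5/72"], "text": "Kommunikationsvorrichtung mit Spule"}
english = {"classes": ["G06K 7/10"], "text": "An RFID reader for warehouse inventory"}

for record in (german, english):
    print(combined_match(record, NFC_CLASSES, NFC_KEYWORDS),
          combined_match(record, NFC_CLASSES, NFC_KEYWORDS, require_both=True))
```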

The accuracy of this approach is above moderate.

4. Manual bucketing

If accuracy is the main concern, manual bucketing is the way to go. In this approach, multiple analysts review the patent literature and manually tag each patent to the appropriate buckets or categories.
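When several analysts tag the same patents, their tags have to be reconciled. The following Python sketch shows one possible way, assuming each analyst gives one bucket per patent: keep a bucket where a strict majority agrees and flag the rest for discussion. The analyst names, bucket labels, and the majority rule itself are illustrative assumptions, not a prescribed procedure.

```python
from collections import Counter


def consolidate(tags_by_analyst):
    """Merge analysts' tags: keep a bucket only when a strict majority of analysts chose it."""
    votes = {}
    for assignments in tags_by_analyst.values():
        for patent, bucket in assignments.items():
            votes.setdefault(patent, Counter())[bucket] += 1
    agreed, disputed = {}, {}
    for patent, counter in votes.items():
        bucket, count = counter.most_common(1)[0]
        if count > len(tags_by_analyst) / 2:
            agreed[patent] = bucket
        else:
            disputed[patent] = dict(counter)  # send back for discussion or expert review
    return agreed, disputed


# Hypothetical tags from three analysts (one bucket per patent, for simplicity).
tags = {
    "analyst_1": {"PUB-1": "Near field", "PUB-2": "Radio", "PUB-3": "Radio"},
    "analyst_2": {"PUB-1": "Near field", "PUB-2": "Near field", "PUB-3": "Radio"},
    "analyst_3": {"PUB-1": "Near field", "PUB-2": "Line", "PUB-3": "Line"},
}
agreed, disputed = consolidate(tags)
print(agreed)
print(disputed)
```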

While analyzing the patents retrieved from databases, you may come across a new category of technology, so the taxonomy may evolve as you work on the project.

Sometimes, this process also involves consulting experts in the field.

Manual bucketing is time-consuming, since every patent has to be read and tagged by a person, but it is the most accurate approach.

5. Automated bucketing

The future belongs to artificial intelligence. There is an ongoing debate about whether AI will be able to replace human analysts in analyzing patent datasets.

We are not yet at the stage where AI can replace human analysts; however, AI already plays a valuable role as an assistant to them, and it looks promising.

Once the patents have been collected, we can analyze and classify them using tools based on natural language processing (NLP) or machine learning. This way, we can identify and sort the collected patents into relevant groups, or “buckets.”
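As an illustration of the machine learning route, here is a tiny from-scratch multinomial naive Bayes classifier in Python that learns buckets from a few labeled abstracts. Naive Bayes is chosen only because it fits in a few lines of standard-library code; the training abstracts and bucket names are hypothetical, and real tools use far larger training sets and more capable models.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesBucketer:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, abstracts, buckets):
        self.word_counts = defaultdict(Counter)  # bucket -> word frequencies
        self.doc_counts = Counter(buckets)       # bucket -> number of training abstracts
        for text, bucket in zip(abstracts, buckets):
            self.word_counts[bucket].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(buckets)
        return self

    def predict(self, text):
        """Return the bucket with the highest log-probability for the text."""
        words = [w for w in tokenize(text) if w in self.vocab]  # ignore unseen words
        best, best_score = None, -math.inf
        for bucket, counts in self.word_counts.items():
            total = sum(counts.values())
            score = math.log(self.doc_counts[bucket] / self.total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = bucket, score
        return best


# Hypothetical labeled abstracts used as training data.
abstracts = [
    "A near field communication tag exchanges data with a reader via inductive coupling.",
    "An NFC antenna coil enables contactless payment at short range.",
    "A base station transmits radio signals to mobile terminals over a cellular network.",
    "A radio transceiver with multiple antennas improves signal reception over long distances.",
]
labels = ["Near field", "Near field", "Radio", "Radio"]

model = NaiveBayesBucketer().fit(abstracts, labels)
print(model.predict("A contactless card communicates with a reader through an NFC coil."))
print(model.predict("A transceiver sends radio signals to a distant base station."))
```

This sketch assigns exactly one bucket per patent; since a patent can belong to several buckets, a practical tool would need a multi-label approach.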

Because it reduces human error and speeds up classification, automated bucketing will likely become the go-to choice in the future.

For now, the accuracy of automated classification and bucketing tools is moderate, and it is expected to improve. In any case, these tools take far less time than any process involving humans.

IPCCAT is one such freely available tool from WIPO: it suggests International Patent Classification (IPC) symbols for the text you feed into it. For example, if you paste the abstract of a patent publication into the tool, it does a decent job of classifying the claimed invention.

Sometimes, it doesn’t give accurate results for a pasted abstract. That happens for one of two reasons: (1) the tool isn’t advanced enough to derive the technological context from the abstract, or (2) the abstract doesn’t convey the technological context of the invention well enough.

You can also experiment with IPCCAT by describing the invention in your own words, which makes it easier for the tool to understand the invention and its context. The classification you get this way is usually pretty good.

So far, we’ve seen various bucketing techniques. Next, let’s look at the steps involved in bucketing and how they fit into a patent landscape study.

So, without further delay, let’s dive right into them.

Step 1: Identify the relevant patents

The first thing you’ll need to do is identify all the patents that are relevant to the topic or technology you’re interested in.

This can be done through various methods such as keyword searches, class-based searches, a combination of keyword and class-based searches, a manual review of the patent literature, consulting with experts in the field, or automation tools.

Step 2: Gather and organize the patents

Once you’ve identified all the relevant patents, the next step is to gather them together and organize them for further analysis.

This might involve creating a database or spreadsheet to store the patents along with relevant metadata such as the patent number, title, abstract, and any other relevant information. Many patent search tools do this for you.
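A minimal sketch of such a store, assuming a flat CSV file is enough: each patent becomes one Python dataclass instance, written out with the standard csv module. The field names and records are hypothetical, and in practice you would write to a file opened with newline="" rather than to an in-memory buffer.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class PatentRecord:
    number: str
    title: str
    abstract: str
    assignee: str
    classes: str  # classification symbols joined with ';'


def to_csv(records):
    """Serialize records to CSV text with one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(PatentRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Hypothetical records.
records = [
    PatentRecord("PUB-1", "NFC tag", "A tag that communicates by inductive coupling.",
                 "Company A", "H04B 5/72"),
    PatentRecord("PUB-2", "Radio base station", "A station that transmits radio signals.",
                 "Company B", "H04B 7/24;H04W 88/08"),
]
csv_text = to_csv(records)
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[1]["classes"].split(";"))  # ['H04B 7/24', 'H04W 88/08']
```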

Step 3: Define the bucketing criteria

Now it’s time to define the criteria that will be used to classify the patents into groups or “buckets”.

This might include factors such as the technology or subject matter covered by the patent, the type of patent (utility, design, or plant), the legal status of the patent (active, expired, or pending), the geographic region where the patent was granted, or the company or individual that holds the patent.

It’s important for you to be as specific as possible when defining the bucketing criteria, as this will help to ensure that the resulting buckets are meaningful and useful.
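One way to keep criteria specific is to write each one down as an explicit test on a patent record. This Python sketch assumes records are plain dictionaries with hypothetical keys (classes, status, region, filing_date); the three criteria are only examples drawn from the factors listed above.

```python
from datetime import date

# Hypothetical bucketing criteria: each bucket name maps to a test on one patent record.
CRITERIA = {
    "Near field transmission": lambda p: any(
        c.replace(" ", "").startswith("H04B5/") for c in p["classes"]
    ),
    # "EP" stands for European patent publications in this example.
    "Active European patents": lambda p: p["status"] == "active" and p["region"] == "EP",
    "Filed in 2015 or later": lambda p: p["filing_date"] >= date(2015, 1, 1),
}


def matching_criteria(patent, criteria=CRITERIA):
    """Names of all criteria the patent satisfies."""
    return [name for name, test in criteria.items() if test(patent)]


patent = {
    "classes": ["H04B 5/72"],
    "status": "expired",
    "region": "EP",
    "filing_date": date(2017, 3, 1),
}
print(matching_criteria(patent))  # ['Near field transmission', 'Filed in 2015 or later']
```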

Step 4: Assign patents to buckets

With the bucketing criteria defined, the next step is to assign each patent to the appropriate bucket.

One way is to search for patents using keywords and classes relevant to each bucket; the patents each search retrieves then belong to that bucket.

Alternatively, we can bucket patents manually, by reviewing each one and assigning it to the appropriate bucket based on the defined criteria, or automatically, using techniques such as natural language processing or machine learning.

Either way, it’s important to be consistent and thorough when assigning patents to buckets, as this will help to ensure that the resulting buckets are accurate and reliable.
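A consistent and thorough assignment applies every criterion to every patent and keeps track of patents that match no bucket at all. The Python sketch below uses keyword criteria and abstracts that are purely hypothetical; the unassigned list is a natural queue for manual review or for a new taxonomy category.

```python
def assign_to_buckets(patents, criteria):
    """Apply every criterion to every patent; also return the patents that matched none."""
    buckets = {name: [] for name in criteria}
    unassigned = []
    for number, patent in patents.items():
        hits = [name for name, test in criteria.items() if test(patent)]
        for name in hits:
            buckets[name].append(number)
        if not hits:
            unassigned.append(number)  # candidates for manual review or a new category
    return buckets, unassigned


# Hypothetical keyword criteria and abstracts.
criteria = {
    "Near field": lambda p: "nfc" in p["abstract"].lower() or "near field" in p["abstract"].lower(),
    "Radio": lambda p: "radio" in p["abstract"].lower(),
}
patents = {
    "PUB-1": {"abstract": "An NFC tag for contactless payment."},
    "PUB-2": {"abstract": "A radio base station for cellular networks."},
    "PUB-3": {"abstract": "An NFC module paired with a radio transceiver."},
    "PUB-4": {"abstract": "A bread toaster with a timer."},
}
buckets, unassigned = assign_to_buckets(patents, criteria)
print(buckets)     # {'Near field': ['PUB-1', 'PUB-3'], 'Radio': ['PUB-2', 'PUB-3']}
print(unassigned)  # ['PUB-4']
```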

Step 5: Analyze the buckets

Once all the patents have been assigned to buckets, it’s time to analyze the resulting buckets to identify trends, highlight key players in the field, evaluate the relative merits of various patents, and identify areas of the landscape that may be underrepresented or under-explored.

This might involve creating visualizations or charts to help understand the distribution of patents across the different buckets, or conducting further analysis to understand the relationships between the patents and their associated buckets.
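Before drawing charts, it helps to tabulate each bucket’s size and its most active assignees. This Python sketch uses collections.Counter on hypothetical bucket contents and assignee names, and prints a crude text bar chart; a plotting library would take over for real visualizations.

```python
from collections import Counter


def bucket_summary(buckets, assignees):
    """Patent count and two most frequent assignees for each bucket."""
    summary = {}
    for bucket, numbers in buckets.items():
        top = Counter(assignees[n] for n in numbers).most_common(2)
        summary[bucket] = {"patents": len(numbers), "top_assignees": top}
    return summary


# Hypothetical bucket contents and assignees.
buckets = {"Near field": ["PUB-1", "PUB-2", "PUB-3"], "Radio": ["PUB-3", "PUB-4"]}
assignees = {"PUB-1": "Company A", "PUB-2": "Company A", "PUB-3": "Company B", "PUB-4": "Company C"}

for bucket, info in bucket_summary(buckets, assignees).items():
    # A crude text bar chart of the distribution across buckets.
    print(f"{bucket:<12}{'#' * info['patents']:<5}{info['top_assignees']}")
```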

Step 6: Communicate the results

The final step is to communicate the results of your patent landscape study, which might involve creating a report or presentation to share with stakeholders.

This might include a summary of the key findings, along with any recommendations or insights that have been identified through the analysis.

It’s important to clearly and concisely convey the results of your study, as this will help others to understand the implications of your work and use the results to inform their own research or decision-making.


Bucketing can help you provide a broad overview of the patent landscape and identify trends or patterns that may not be immediately apparent from a raw list of patents.

For example, we might use bucketing in a patent landscape study to identify clusters of patents that are focused on a particular technology or application, or to identify companies or research groups that are particularly active in a given area.

Overall, it is safe to say that the goal of bucketing in a patent landscape study is to provide a more structured and organized view of the landscape, which can be useful for a variety of purposes such as identifying potential partners, competitors, or areas for further research and development.

If you are curious to know more about patents and intellectual property in general, you may want to check out our articles written specifically for learners.

Sonam Singh

My struggle, in the beginning, made me realize the need to create an ultimate resource that can provide answers to both very basic questions like what, why, when, who, how, where, and the most complex topics about intellectual property. Moreover, my passion for writing and my love for patents made it easier for me to create this super-helpful platform for students, professionals, and curious minds wanting to know about IP. Cheers to that.
