Automating SEO Keyword Clustering by Search Intent Using Python
We explore the concept of keyword clustering and its importance in modern SEO strategies. By grouping related keywords, we can create content that aligns with search intent, ensuring better visibility and relevance in search engine results.
Understanding the relationship between search intent and keyword grouping is crucial. We use keyword clustering to organize terms that share semantic meaning, which helps optimize content for both users and search engines.
Automation plays a vital role in scaling SEO efforts. By using Python scripts, we streamline repetitive tasks such as TF-IDF analysis and BERT-based semantic evaluations, saving time while improving accuracy.
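To make TF-IDF analysis concrete, here is a minimal, dependency-free sketch that weights the terms in a small keyword list. It assumes simple whitespace tokenization and the textbook tf × log(N / df) formula. The function name and sample keywords are illustrative. Libraries such as scikit-learn's TfidfVectorizer apply smoothing and normalization by default, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many keywords contain each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # Term frequency (share of the keyword's tokens) times inverse document frequency
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

keywords = ["buy running shoes", "best running shoes", "running shoes reviews"]
for kw, w in zip(keywords, tf_idf(keywords)):
    print(kw, {term: round(v, 3) for term, v in w.items()})
```

Terms shared by every keyword (running, shoes) get a weight of 0. The distinguishing terms (buy, best, reviews) each score about 0.366, which is the kind of signal a clustering step can use to tell keywords apart.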
Implementing K-Means Clustering for Keyword Grouping
One effective method for organizing keywords is K-means clustering. This technique groups terms by their similarity, enhancing our ability to perform detailed keyword research and refine content strategies.
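The sketch below shows the core of K-means (Lloyd's algorithm) applied to keywords, using only the standard library. It simplifies things under stated assumptions. Keywords are turned into raw bag-of-words counts rather than TF-IDF or BERT embeddings, and the first k keywords seed the centroids so the run is deterministic. The function names are hypothetical. In practice you would more likely use scikit-learn's KMeans, which adds k-means++ initialization and optional restarts (n_init).

```python
import math
from collections import Counter

def vectorize(keywords):
    """Bag-of-words count vectors over a shared, sorted vocabulary."""
    vocab = sorted({t for kw in keywords for t in kw.lower().split()})
    vectors = []
    for kw in keywords:
        counts = Counter(kw.lower().split())
        vectors.append([counts[t] for t in vocab])
    return vectors

def kmeans(points, k, iterations=20):
    """Lloyd's K-means, seeded (for determinism) with the first k points as centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

keywords = ["running shoes", "cheap flights", "best running shoes", "cheap flights london"]
print(kmeans(vectorize(keywords), k=2))  # [0, 1, 0, 1]
```

The two shoe keywords land in one cluster and the two flight keywords in the other. K-means requires k to be chosen up front, which is one practical reason the SERP-based approach later in this article groups keywords by a similarity threshold instead.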
To achieve better content optimization, we incorporate semantic analysis techniques. These methods help ensure that the clustered keywords reflect the true intent behind user queries, which can lead to higher engagement and improved rankings.
Tools and Libraries Supporting Keyword Automation
Several resources are essential for implementing these processes effectively. The scikit-learn documentation provides insights into machine learning models, while the BERT GitHub repository offers tools for natural language processing. Additionally, libraries like pandas simplify data manipulation during keyword research.
Practical Applications of Python Scripts in SEO
Using Python scripts, we automate the extraction and processing of keyword data. This approach not only improves efficiency but also ensures consistency across large datasets, making it ideal for SEO professionals and data scientists alike.
Aligning Keyword Strategies with Search Intent
A key focus of this process is matching keywords to specific types of search intent. Whether a query is informational, transactional, or navigational, understanding these categories helps tailor content to meet user needs more effectively.
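A common first pass is to tag keywords by intent modifiers. The sketch below is a hypothetical rule-based classifier. The modifier lists, the INTENT_MODIFIERS name and the "unclassified" fallback are illustrative assumptions, not a standard taxonomy. The SERP-based clustering later in this article does not need such hand-written rules.

```python
# Hypothetical modifier lists for illustration only; tune them for your market and language.
# Dict order sets priority: the first intent whose modifiers match wins.
INTENT_MODIFIERS = {
    "transactional": {"buy", "price", "cheap", "deal", "discount", "order", "coupon"},
    "navigational": {"login", "website", "official", "contact"},
    "informational": {"how", "what", "why", "guide", "tutorial", "tips"},
}

def classify_intent(keyword, default="unclassified"):
    """Tag a keyword with the first intent whose modifier words it contains."""
    tokens = set(keyword.lower().split())
    for intent, modifiers in INTENT_MODIFIERS.items():
        if tokens & modifiers:
            return intent
    return default

for kw in ["buy running shoes", "nike official website",
           "how to clean running shoes", "running shoes"]:
    print(kw, "->", classify_intent(kw))
```

Keywords with no modifier, such as running shoes, fall through to unclassified. Those ambiguous head terms are exactly where comparing SERPs pays off.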
Benefits of Automating Keyword Clustering
The advantages of automating keyword clustering include increased productivity, reduced manual errors, and enhanced precision in targeting relevant audiences. For content marketers and developers, this translates to improved campaign outcomes and ROI.
Understanding Search Intent: A Multifaceted Approach
When it comes to search intent, there’s much to explore. Techniques range from employing deep learning to infer intent by classifying text and analyzing SERP titles with Natural Language Processing (NLP), to grouping keywords based on semantic relevance through clustering. The advantages of these methods are clear and impactful.
We are already aware of the benefits of understanding search intent, and we have a variety of techniques available to scale and automate this process effectively.
So, why is there a need for another article focused on automating search intent?
The answer lies in the growing significance of search intent in the age of AI-driven search technologies.
In the era of the “10 blue links” search results, more content often meant better visibility. With the advent of AI search platforms, however, the dynamics have shifted. These systems aim to keep computing costs (cost per FLOP) low while still delivering precise results, which makes matching search intent even more critical.
SERPs Remain a Goldmine for Search Intent Insights
The techniques explored up to this point often involve building your own AI-driven process. This typically means extracting the copy from the titles of the top-ranking content for a specific keyword and feeding that data into a neural network model (which you must build and test yourself), or using NLP to cluster keywords.
However, it’s important to note that SERPs still hold the most valuable insights for understanding search intent.
What If Building AI or Using APIs Isn’t an Option?
What if you lack the time or expertise to build your own AI model or call the OpenAI API?
While cosine similarity has been widely praised as a solution for helping SEO professionals organize topics for taxonomy and site structures, I firmly believe that clustering searches based on SERP results remains a far superior approach.
This is because AI systems are inherently designed to ground their outputs in SERPs, and for good reason: these results reflect actual user behavior and intent.
Fortunately, there’s another method that leverages Google’s own AI to do the heavy lifting for you, eliminating the need to scrape SERP content or construct an AI model from scratch.
Consider this: Google ranks URLs in descending order of how likely their content is to satisfy the user’s query. It stands to reason that if two keywords share the same intent, their SERPs will likely resemble each other.
For years, many SEO professionals have compared the SERPs for different keywords to infer shared (or differing) search intent, especially to stay on top of core algorithm updates. The concept isn’t new; the real value lies in automating and scaling this comparison, which gives us both speed and greater precision in identifying intent clusters.
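A crude version of this comparison takes only a few lines. As a simpler stand-in for the rank-weighted score used in Step 4, the sketch below computes the Jaccard overlap of two keywords' top-k URLs. The function name and URLs are hypothetical.

```python
def serp_overlap(serp_a, serp_b, k=10):
    """Jaccard similarity of the top-k URL sets of two SERPs (rank order is ignored)."""
    top_a, top_b = set(serp_a[:k]), set(serp_b[:k])
    if not (top_a or top_b):
        return 0.0
    return len(top_a & top_b) / len(top_a | top_b)

# Hypothetical ranked URL lists for three keywords
serp_1 = ["a.example/shoes", "b.example/running", "c.example/guide", "d.example/sale"]
serp_2 = ["b.example/running", "a.example/shoes", "e.example/news", "d.example/sale"]
serp_3 = ["x.example/flights", "y.example/deals", "z.example/travel", "w.example/london"]

print(serp_overlap(serp_1, serp_2))  # 0.6 (3 shared URLs out of 5 distinct)
print(serp_overlap(serp_1, serp_3))  # 0.0
```

Jaccard ignores position: a URL ranked first counts the same as one ranked tenth, so swapping the top two results above leaves the score unchanged. That is why a rank-aware measure is preferable when result order matters.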
Step 1: Import SERP Data into Python
Start by loading your SERP data (exported as a CSV file with the keyword, rank, url and search_volume columns that the later steps rely on) into a pandas DataFrame.
```python
import pandas as pd

# Load the SERP data
serps_input = pd.read_csv('data/sej_serps_input.csv')

# Drop the leftover index column written when the CSV was saved
del serps_input['Unnamed: 0']

# Display the dataframe
serps_input.head()
```
Step 2: Filter for Page 1 Results
Focus only on the top-ranking URLs for each keyword (here, the top 15 positions, set by k_urls).
```python
# Number of top-ranking URLs to keep per keyword (e.g., top 15)
k_urls = 15

# Keep only rows that have a URL and rank within the top k_urls.
# A row-wise boolean filter gives the same result as filtering each keyword group,
# and avoids groupby.apply, whose handling of the grouping column differs across pandas versions.
filtered_serps_df = serps_input.loc[
    serps_input['url'].notnull() & (serps_input['rank'] <= k_urls)
].reset_index(drop=True)

# Display the filtered dataframe
filtered_serps_df.head()
```
Step 3: Convert Ranking URLs into a Single String
Compress each keyword’s ranking URLs into a single space-separated string, in rank order, to simplify comparisons.
```python
# Sort by rank, then join each keyword's URLs with spaces so that
# the whitespace tokenizer in Step 4 can split them back apart
strung_serps = (
    filtered_serps_df.sort_values(['keyword', 'rank'])
    .groupby('keyword')['url']
    .apply(' '.join)
    .reset_index(name='serp_string')
)

# Display the compressed SERP strings
strung_serps.head()
```
Step 4: Compare SERP Similarity
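Compare the SERPs of every pair of keywords using a custom, rank-weighted similarity function. Identical SERPs score 1, two full top-k SERPs with no URLs in common score 0, and differences near the top of the page cost more than differences further down.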
```python
from py_stringmatching import WhitespaceTokenizer

# Tokenizer for splitting the space-separated URL strings
ws_tok = WhitespaceTokenizer()

# Rank-weighted SERP similarity: 1.0 = identical, 0.0 = no shared URLs
def serps_similarity(serps_str1, serps_str2, k=15):
    denom = k + 1
    # Maximum possible distance: two full top-k SERPs with no URLs in common
    norm = sum([2 * (1 / i - 1.0 / denom) for i in range(1, denom)])

    # Tokenize and limit to top k URLs
    serps_1 = ws_tok.tokenize(serps_str1)[:k]
    serps_2 = ws_tok.tokenize(serps_str2)[:k]

    # 1-based position of each URL of `a` within `b`, or None if absent
    def match(a, b):
        return [b.index(x) + 1 if x in b else None for x in a]

    pos_intersections = [(i + 1, j) for i, j in enumerate(match(serps_1, serps_2)) if j is not None]
    pos_in1_not_in2 = [i + 1 for i, j in enumerate(match(serps_1, serps_2)) if j is None]
    pos_in2_not_in1 = [i + 1 for i, j in enumerate(match(serps_2, serps_1)) if j is None]

    # Penalize position shifts and missing URLs, weighted towards the top ranks
    a_sum = sum([abs(1 / i - 1 / j) for i, j in pos_intersections])
    b_sum = sum([abs(1 / i - 1 / denom) for i in pos_in1_not_in2])
    c_sum = sum([abs(1 / i - 1 / denom) for i in pos_in2_not_in1])

    intent_prime = a_sum + b_sum + c_sum
    intent_dist = 1 - (intent_prime / norm)
    return intent_dist

# Pair every keyword with every other keyword: cross join, then drop self-pairs.
# Columns: keyword, serp_string, keyword_b, serp_string_b
matched_serps = strung_serps.merge(strung_serps, how='cross', suffixes=('', '_b'))
matched_serps = matched_serps[matched_serps['keyword'] != matched_serps['keyword_b']].reset_index(drop=True)

# Calculate similarity scores
matched_serps['si_simi'] = matched_serps.apply(
    lambda x: serps_similarity(x.serp_string, x.serp_string_b, k=k_urls), axis=1)

# Display the results
matched_serps[['keyword', 'keyword_b', 'si_simi']].head()
```
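To see how the score behaves, the following sketch re-implements the same formula using only the standard library (str.split() stands in for WhitespaceTokenizer, which also splits on whitespace) and applies it to tiny hypothetical three-URL SERPs with k=3.

```python
def serps_similarity(serps_str1, serps_str2, k=15):
    """Same rank-weighted score as above, with str.split() in place of WhitespaceTokenizer."""
    denom = k + 1
    norm = sum(2 * (1 / i - 1 / denom) for i in range(1, denom))
    serps_1 = serps_str1.split()[:k]
    serps_2 = serps_str2.split()[:k]

    def match(a, b):
        # 1-based position of each URL of `a` within `b`, or None if absent
        return [b.index(x) + 1 if x in b else None for x in a]

    pos_intersections = [(i + 1, j) for i, j in enumerate(match(serps_1, serps_2)) if j is not None]
    pos_in1_not_in2 = [i + 1 for i, j in enumerate(match(serps_1, serps_2)) if j is None]
    pos_in2_not_in1 = [i + 1 for i, j in enumerate(match(serps_2, serps_1)) if j is None]

    a_sum = sum(abs(1 / i - 1 / j) for i, j in pos_intersections)
    b_sum = sum(abs(1 / i - 1 / denom) for i in pos_in1_not_in2)
    c_sum = sum(abs(1 / i - 1 / denom) for i in pos_in2_not_in1)
    return 1 - (a_sum + b_sum + c_sum) / norm

# Hypothetical three-URL SERPs, compared with k=3
print(round(serps_similarity("a b c", "a b c", k=3), 3))  # 1.0
print(round(serps_similarity("a b c", "b a c", k=3), 3))  # 0.538 (top two swapped)
print(round(serps_similarity("a b c", "a c b", k=3), 3))  # 0.846 (bottom two swapped)
print(round(serps_similarity("a b c", "d e f", k=3), 3))  # no shared URLs: approximately 0
```

Swapping the top two results (7/13, about 0.538) costs far more than swapping the bottom two (11/13, about 0.846), because the 1/rank weighting concentrates the penalty at the top of the page.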
Step 5: Cluster Keywords by Search Intent
Group keywords into clusters: any pair whose similarity score reaches the threshold (simi_lim) is placed in the same topic group.
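```python
# Minimum SERP similarity for two keywords to share a search intent
simi_lim = 0.4

# Join search volume data
keysv_df = serps_input[['keyword', 'search_volume']].drop_duplicates()

# Merge data and filter out rows with missing values
keywords_crossed_vols = matched_serps.merge(keysv_df, on='keyword', how='left')
keywords_filtered_nonnan = keywords_crossed_vols.dropna()

# Clustering logic
topic_groups = {}          # group label -> member keywords
keyword_to_group = {}      # keyword -> label of the group it already belongs to
non_sim_topic_groups = {}  # keywords with no sufficiently similar partner

def find_topics(si, keyw, topc):
    group_kw = keyword_to_group.get(keyw)
    group_tp = keyword_to_group.get(topc)
    if si >= simi_lim:
        if group_kw is None and group_tp is None:
            # Neither keyword is grouped yet: start a new group
            topic_groups[keyw] = [keyw, topc]
            keyword_to_group[keyw] = keyword_to_group[topc] = keyw
        elif group_kw is not None and group_tp is None:
            # Add topc to keyw's existing group
            topic_groups[group_kw].append(topc)
            keyword_to_group[topc] = group_kw
        elif group_tp is not None and group_kw is None:
            # Add keyw to topc's existing group
            topic_groups[group_tp].append(keyw)
            keyword_to_group[keyw] = group_tp
        # If both keywords are already grouped, they stay where they are
    else:
        if group_kw is None:
            non_sim_topic_groups[keyw] = [keyw]
        if group_tp is None:
            non_sim_topic_groups[topc] = [topc]

# Apply clustering
for _, row in keywords_filtered_nonnan.iterrows():
    find_topics(row['si_simi'], row['keyword'], row['keyword_b'])

# Drop "non-similar" entries for keywords that joined a group later on
non_sim_topic_groups = {k: v for k, v in non_sim_topic_groups.items()
                        if k not in keyword_to_group}

# Display the clusters
topic_groups
```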
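The greedy grouping above depends on the order in which pairs are processed. If you prefer an order-independent result, one alternative (a swapped-in technique, not the method used above) is to treat every pair scoring at or above the threshold as an edge in a graph and take its connected components with union-find. The function name and sample pairs are hypothetical.

```python
def cluster_pairs(pairs, threshold=0.4):
    """Connected components of the graph whose edges are pairs scoring >= threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for kw_a, kw_b, score in pairs:
        root_a, root_b = find(kw_a), find(kw_b)  # also registers unseen keywords
        if score >= threshold and root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for kw in parent:
        groups.setdefault(find(kw), []).append(kw)
    return list(groups.values())

# Hypothetical (keyword, keyword_b, similarity) rows
pairs = [
    ("running shoes", "best running shoes", 0.7),
    ("best running shoes", "running trainers", 0.5),
    ("running shoes", "cheap flights", 0.05),
    ("cheap flights", "flight deals", 0.6),
]
print(cluster_pairs(pairs))
# [['running shoes', 'best running shoes', 'running trainers'], ['cheap flights', 'flight deals']]
```

With the dataframe from Step 5, you could call cluster_pairs(keywords_filtered_nonnan[['keyword', 'keyword_b', 'si_simi']].itertuples(index=False, name=None), threshold=simi_lim). The trade-off is chaining: running shoes and running trainers end up together even though that pair was never scored, because both are similar to best running shoes.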
Final Output: Topic Groups
Convert the clusters into a dataframe for better readability.
```python
# Flatten the clusters into (group, keyword) rows, then build a dataframe
topic_groups_lst = []
for k, l in topic_groups.items():
    for v in l:
        topic_groups_lst.append([k, v])

topic_groups_dictdf = pd.DataFrame(topic_groups_lst, columns=['topic_group_no', 'keyword'])

# Display the final dataframe
topic_groups_dictdf.head()
```
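The search volumes joined in Step 5 are not used by the grouping itself. One simple way to put them to work, and a basic answer to the cluster-naming problem discussed below, is to label each group with its highest-volume keyword and rank groups by combined volume. This heuristic, the function name and the sample data are assumptions for illustration.

```python
def summarize_groups(groups, volumes):
    """Name each group after its highest-volume keyword and total the group's search volume."""
    summary = []
    for members in groups:
        label = max(members, key=lambda kw: volumes.get(kw, 0))  # missing volume counts as 0
        total = sum(volumes.get(kw, 0) for kw in members)
        summary.append({"label": label, "keywords": sorted(members), "total_volume": total})
    # Largest combined search volume first
    return sorted(summary, key=lambda g: g["total_volume"], reverse=True)

# Hypothetical groups and monthly search volumes
groups = [["running shoes", "best running shoes"],
          ["cheap flights", "flight deals", "cheap flights london"]]
volumes = {"running shoes": 5000, "best running shoes": 2000,
           "cheap flights": 9000, "flight deals": 1500}

for g in summarize_groups(groups, volumes):
    print(g["label"], g["total_volume"], g["keywords"])
```

With the objects built earlier, the inputs would be list(topic_groups.values()) and dict(zip(keysv_df['keyword'], keysv_df['search_volume'])).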
Activating the Outputs to Enhance Your Search Strategy
The outputs generated from this method can be further refined using advanced techniques such as neural networks, which can process the content of ranking pages to create more accurate clusters and improve cluster group naming. This is a feature already implemented by some commercial tools available in the market. However, even without those enhancements, the current output provides significant value.
With this data, you can integrate the results into your SEO dashboard systems, making your trends and reporting more meaningful and actionable. For paid search campaigns, structuring your Google Ads accounts based on search intent can lead to higher Quality Scores, better ad relevance, and improved campaign performance. Additionally, this approach allows you to merge redundant faceted URLs commonly found in ecommerce platforms, ensuring a cleaner site structure. You can also design a shopping site’s taxonomy around search intent rather than relying solely on traditional product catalogs, creating a more user-centric navigation experience.
These are just a few examples of how this method can be applied. There are undoubtedly more creative and impactful ways to leverage these insights—feel free to share any ideas or applications that come to mind. Regardless of the specific use case, one thing is clear: your SEO keyword research has now become more scalable, accurate, and efficient, empowering you to stay ahead in the ever-evolving world of search.
Partner with our Digital Marketing Agency
Ask Engage Coders to create a comprehensive and inclusive digital marketing plan that takes your business to new heights.