The analytics translator — which terminology is best to know?
For the lecture ‘The analytics translator’, I decomposed the three top-level domains Business, IT, and Analytics Literacy into 15 sub-topics. See also the article describing the 15 topics.
I asked myself whether I can analyze the relationship between the terminology of each topic. How similar are the topics? Is it more difficult to translate between management practice and the data domain, between AI and product management, and so on?
The results of the analysis are shown as a chord diagram, which displays the similarity between the definitions of the individual topics.
Here are the three key insights I found of interest:
- The term value is the most prominent one. Thus proof-of-value has the highest correlation score overall.
- Problem-solving and its main terminology also appear often in project management and other management topics, and in many sub-topics in general.
- The technology topics, especially AI and data, are quite orthogonal in their vocabulary. During parsing, these topics were difficult to link to business terms (I tried several variants). Which means: don’t even try to explain them to management.
A fun fact: the terms project management and proof-of-value had very little overlap in their wording. I smiled a lot when I saw this result; the agile world is sometimes too chaotic and strange for classical project management.
The analysis is only valid for my search terms. However, the graph and the data sets in the background (the corpus design) are derived fully automatically and describe how the topics are currently reflected in Wikipedia articles.
Here is the technical approach used to derive the chart. It will become an exercise for the next lecture term :-). It is about NLP / chatbot techniques for comparing text blocks.
- Access Wikipedia and extract, for each of the 15 topics, its 7 most relevant subtopics.
- For each subtopic (15x7) extract the Wikipedia summary.
- Clean up the summary with text processing libraries.
- Ensure an equal length of the summaries by using, e.g., a text summarizer.
- Apply an NLP (e.g. word2vec) comparison of all extracted summaries (calculate the distance between each text block)
- Rank each obtained distance and keep only the significant similarities.
- Weight each distance and prepare a data frame for visualization.
- Use Plotly or another library to visualise the result.
- Present a ranking of the most common words.
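As a starting point for the exercise, here is a minimal sketch of the cleaning, comparison, ranking, cut-off, and word-ranking steps from the list above. It is not the code behind the chart. To stay dependency-free it swaps word2vec for a plain bag-of-words cosine similarity, and it skips the Wikipedia access, the summarizer, and the plotting. The three summaries, the stop-word list, the threshold of 0.1, and all function names are hypothetical placeholders chosen for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical placeholder summaries; the real analysis pulls these from Wikipedia.
SUMMARIES = {
    "proof of value": "A proof of value shows the business value of a project before full investment.",
    "project management": "Project management plans and controls a project to deliver value within budget.",
    "machine learning": "Machine learning trains statistical models on data to make predictions.",
}

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "on", "in", "is", "before", "within"}


def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_edges(summaries, threshold=0.1):
    """Compare every pair of summaries and keep pairs at or above the threshold.

    Returns (topic_a, topic_b, score) tuples, strongest first. These rows
    could be loaded into a data frame and fed to a chord diagram.
    """
    vectors = {topic: Counter(tokenize(text)) for topic, text in summaries.items()}
    edges = []
    for left, right in combinations(vectors, 2):
        score = cosine_similarity(vectors[left], vectors[right])
        if score >= threshold:
            edges.append((left, right, round(score, 3)))
    return sorted(edges, key=lambda edge: edge[2], reverse=True)


def most_common_words(summaries, n=5):
    """Rank the most frequent non-stop-words across all summaries."""
    counts = Counter()
    for text in summaries.values():
        counts.update(tokenize(text))
    return counts.most_common(n)


if __name__ == "__main__":
    for left, right, score in similarity_edges(SUMMARIES):
        print(f"{left} <-> {right}: {score}")
    print(most_common_words(SUMMARIES, n=2))
```

With real Wikipedia summaries, the term counts would likely be replaced by averaged word embeddings or a similar representation, but the pairwise loop, the cut-off, and the edge list would keep the same shape.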