Using LLMs to harvest knowledge from a global member network's email archives
Overview
One of the key challenges organisations face is harnessing the vast amount of information they accumulate over time. A global members network (client confidential), a consortium dedicated to enriching leadership and fostering peer collaboration, had accumulated a decade's worth of member email correspondence. This archive contained valuable insights and expertise, but remained largely untapped because of its sheer volume and complexity. Newer members also had no access to emails sent before they joined, so the same topics came up repeatedly.
The Challenge
The members network’s vast email archives were an underutilised asset. The primary challenge was to distil this large volume of emails into an easily accessible format. Time Under Tension built an experimental Proof of Concept that can not only find answers to questions, but also extract insights from convoluted email threads.
Our Solution
Our team approached the challenge in four stages:
Data Cleaning: We started by cleaning a sample of over 5,000 emails spanning two years. This cleaned data set served as the foundation for the later processing steps.
Text Chunking: The cleaned email content was then segmented into discrete, digestible chunks. These text snippets were transformed into embeddings: numerical vectors that capture the meaning of the text so that it can be retrieved later.
Vector Database Storage: We used a vector database to house these embeddings, crafting a backend that provides fast and accurate access to the processed information.
User Interface: The final piece of our solution was the frontend. The interface lets users search in natural language, returns the information they are looking for, and links directly to the original source emails.
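The first two stages can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the Proof of Concept: the function names `clean_email` and `chunk_text`, the cleaning rules (dropping quoted replies, reply headers and anything after a `--` signature line) and the chunk sizes are all assumptions made for the example.

```python
import re


def clean_email(body: str) -> str:
    """Drop quoted replies, reply headers and the signature (assumed rules)."""
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        if stripped == "--":  # conventional signature delimiter
            break
        if stripped.startswith(">"):  # text quoted from an earlier message
            continue
        if re.match(r"^On .+ wrote:$", stripped):  # "On <date>, <name> wrote:"
            continue
        kept.append(stripped)
    # Collapse all whitespace, including blank lines, into single spaces.
    return re.sub(r"\s+", " ", " ".join(kept)).strip()


def chunk_text(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # the last window reached the end
            break
    return chunks
```

Overlapping the windows is one common way to avoid cutting a sentence in half at a chunk boundary. Each chunk would then be passed to an embedding model and stored alongside the identifier of the email it came from.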
Technologies Used
Our solution leverages the following technologies:
Large Language Models (LLMs): At the core of our tool is the ability to understand and interpret human language, allowing users to make complex queries and receive relevant answers.
Semantic Search Techniques: By matching on the meaning of a query rather than on exact keywords, our tool returns relevant excerpts from the email data set.
Vector Database Systems: These databases allow us to store and query large volumes of embeddings, ensuring the speed and accuracy of information retrieval.
Responsive Web Design: Ensuring the tool is accessible across various devices and platforms.
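To show how embeddings, a vector store and search fit together, here is a toy, in-memory sketch. Everything in it is hypothetical: `embed` is a simple word-count vector standing in for a real embedding model (so it only matches on shared words, not on meaning), `ToyVectorStore` stands in for a real vector database, and the email identifiers are made up. The shape of the flow is the point: embed each chunk, store it with its source email, embed the query, and rank by cosine similarity.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy stand-in for an embedding model: a unit-length word-count vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Both vectors are unit length, so the dot product is the cosine."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


class ToyVectorStore:
    """In-memory stand-in for a vector database (hypothetical)."""

    def __init__(self) -> None:
        self.rows: list[tuple[dict[str, float], str, str]] = []

    def add(self, chunk: str, email_id: str) -> None:
        # Keep the source email id so results can link back to the original.
        self.rows.append((embed(chunk), chunk, email_id))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str, str]]:
        q = embed(query)
        scored = [(cosine(q, vec), chunk, email_id)
                  for vec, chunk, email_id in self.rows]
        return sorted(scored, key=lambda row: row[0], reverse=True)[:k]


store = ToyVectorStore()
store.add("Tips for onboarding new board members quickly", "email-001")
store.add("Quarterly budget spreadsheet attached", "email-002")
hits = store.search("onboarding new board members", k=1)
```

In a real system the top-ranked chunks would typically be handed to the LLM as context for composing an answer, with the stored email identifiers used to link back to the source messages.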
Summary
This proof of concept has demonstrated an interesting use case for the latest advances in Generative AI, achieving outcomes that seemed out of reach just a year ago. It serves as an example to organisations of how they can delve into their own archives and reclaim valuable knowledge that, until now, was effectively buried.
What untapped wealth of knowledge do you have hidden in your business, and how might generative AI help to make it discoverable?