BioPharma Gen AI ChatBot



Industry: Healthcare, Medical Research & Education
Location: Boston, MA
Customer: JoVE
Areas of expertise: Back-end; Front-end; Quality assurance; Project management.
Technologies: Python, Next.js, LLM, Vector DB, Generative AI, ChatBot
Timeline: 2023 – current



Healthcare providers and medical researchers deal with a vast amount of medical information, including patient records, clinical trial data, research papers, and treatment guidelines. Searching these documents efficiently is critical for physicians and medical professionals who need to provide accurate diagnoses, develop effective treatment plans, and stay up to date with the latest research.

The JoVE project solution can be customized to create a powerful medical knowledge search and retrieval system in the form of a chatbot. It relies on two models: the large language model Zephyr-7b-beta, used for tasks such as text summarization, text tagging, and text citation, and all-MiniLM-L6, used for text embedding. The embedding model converts the documents into vectors, and this collection of embeddings, called a vector store, can be stored efficiently in one of several vector databases, including Pinecone, AtlasDB, ChromaDB, and others. We chose ChromaDB for this purpose.

When a user asks a medical question, the chatbot finds the relevant documents by first embedding the user’s question using the HuggingFace API and then performing a similarity search over the vector store. This process ensures that the most relevant medical information is identified and presented to the user.

Because working with large language models directly involves many separate API calls, we also use the LangChain library to chain those calls together into a single pipeline. LangChain also makes it easier to integrate different LLMs and to switch between them, which allows our system to adopt new models when necessary.
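The idea of switching between models can be illustrated with a minimal sketch in plain Python. This is not LangChain's API; it only shows the underlying design, in which pipeline steps depend on a small interface rather than on one specific model. The names `TextGenerator`, `EchoModel`, and `summarize` are hypothetical and introduced here for illustration.

```python
from typing import Protocol


class TextGenerator(Protocol):
    """Anything that turns a prompt into generated text."""

    def generate(self, prompt: str) -> str: ...


class EchoModel:
    """Hypothetical stand-in for a hosted LLM such as Zephyr-7b-beta."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate(self, prompt: str) -> str:
        # A real backend would call the model's inference endpoint here.
        return f"[{self.name}] {prompt}"


def summarize(document: str, llm: TextGenerator) -> str:
    """Pipeline step that depends only on the interface, not on one model."""
    return llm.generate(f"Summarize the following text:\n{document}")


current = EchoModel("zephyr-7b-beta")
replacement = EchoModel("some-newer-model")  # assumed name, for illustration

# The same pipeline step runs unchanged with either backend.
print(summarize("Aspirin reduces fever.", current))
print(summarize("Aspirin reduces fever.", replacement))
```

Because `summarize` only requires a `generate` method, replacing the model is a one-line change at the call site.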



The goal of the project was to develop an innovative medical chatbot that would allow healthcare professionals to find specific information related to a patient’s condition, treatment options, or the latest research by providing a user-friendly chat interface. Users can ask questions or provide keywords related to the information they are seeking. The chatbot then performs a similarity search of the indexed medical documents and identifies the most relevant sections or passages.

The chatbot must then present the healthcare professional with the relevant excerpts from the documents, along with citations and links to the original sources. This allows the user to quickly review the information and determine its relevance to their patient’s case or research.

In addition, the chatbot can provide a summary of the key points from the identified documents, saving time and effort in reading through lengthy materials. The user can further refine their search by asking follow-up questions or providing additional context, allowing for interactive exploration of the medical knowledge base.

The most important requirement for the medical chatbot was data security, given the sensitive nature of patient information.



By implementing this medical chatbot solution, healthcare providers and researchers can significantly improve their access to relevant medical knowledge. Physicians can find the information they need faster, allowing them to focus on analyzing content and developing effective treatment plans. The chatbot can also help uncover relevant information that may have been missed in manual searches, potentially improving patient outcomes and advancing medical research.

The JoVE project solution can be adapted to create a powerful medical chatbot with the following system architecture:


Document ingestion and pre-processing

Medical documents in various formats (PDF, DOC, TXT, etc.) are ingested into the system.

The documents are pre-processed, which includes text extraction, optical character recognition (OCR) for scanned documents, and data cleaning to remove irrelevant information.
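The data cleaning step could look something like the sketch below. The specific rules (re-joining hyphenated line breaks, dropping page-number lines, collapsing whitespace) are assumptions chosen for illustration, not the project's actual cleaning rules, and `clean_extracted_text` is a hypothetical helper name.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Apply a few common clean-up rules to text extracted from a document."""
    # Re-join words that were hyphenated across a line break ("treat-\nment").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Drop lines that contain only a page number such as "12" or "Page 12".
    lines = [
        line
        for line in text.split("\n")
        if not re.fullmatch(r"\s*(page\s+)?\d+\s*", line, flags=re.IGNORECASE)
    ]
    # Collapse runs of spaces and tabs, and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in lines]
    # Collapse three or more consecutive newlines into one paragraph break.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


sample = "Treat-\nment   plan\nPage 3\n\n\n\nNext   section"
print(clean_extracted_text(sample))
```

Note that the de-hyphenation rule also joins genuine compound words that happen to break across lines, so a production pipeline would likely need a dictionary check or a more careful rule.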

Document indexing

The pre-processed documents are indexed using the Zephyr-7b-beta model for text summarization and the HuggingFace API for text embedding.

Each document is summarized using the Zephyr-7b-beta model to capture its key points and main ideas. The summarized documents are then embedded using the HuggingFace API to create a vector representation of each document. The vector embeddings are stored in a vector database such as ChromaDB, which allows for efficient similarity searches.
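The summarize-then-embed flow described above can be sketched as follows. The summarizer and embedder are passed in as plain functions, so the toy stand-ins used here (`first_sentence`, `keyword_vector`) are assumptions that take the place of Zephyr-7b-beta and the embedding model, and the in-memory list takes the place of ChromaDB.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class IndexEntry:
    doc_id: str
    summary: str
    vector: List[float]


def build_index(
    documents: Dict[str, str],
    summarize: Callable[[str], str],
    embed: Callable[[str], List[float]],
) -> List[IndexEntry]:
    """Summarize each document, embed the summary, and collect the entries."""
    index = []
    for doc_id, text in documents.items():
        summary = summarize(text)
        index.append(IndexEntry(doc_id, summary, embed(summary)))
    return index


# Toy stand-ins (assumptions): a real system would call the LLM and the
# embedding model here instead.
KEYWORDS = ("aspirin", "insulin", "fever")


def first_sentence(text: str) -> str:
    return text.split(".")[0].strip() + "."


def keyword_vector(text: str) -> List[float]:
    lowered = text.lower()
    return [float(lowered.count(word)) for word in KEYWORDS]


docs = {
    "doc-1": "Aspirin reduces fever. It is widely used.",
    "doc-2": "Insulin regulates blood glucose. Dosage varies.",
}
for entry in build_index(docs, first_sentence, keyword_vector):
    print(entry)
```

Keeping the document identifier next to each vector is what later allows a similarity hit to be traced back to the original document and its citation.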

User interaction and query processing

Healthcare professionals interact with the chatbot through a user-friendly interface, which can be integrated into existing medical software or accessed via a web or mobile app. When a user enters a query or question, the chatbot processes the input using the Zephyr-7b-beta model to create a summarized question that captures the context of the conversation. The summarized question is then embedded using the HuggingFace API.
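One common way to produce such a context-aware question is to send the conversation history and the new input to the LLM with a rewriting instruction. The sketch below only builds that prompt; the template wording and the name `build_condense_prompt` are assumptions, not the project's actual prompt.

```python
from typing import List, Tuple

# Hypothetical prompt template; the real wording is not given in the source.
CONDENSE_TEMPLATE = (
    "Given the conversation below and a follow-up question, rewrite the "
    "follow-up as a single standalone question.\n\n"
    "Conversation:\n{history}\n\n"
    "Follow-up question: {question}\n"
    "Standalone question:"
)


def build_condense_prompt(history: List[Tuple[str, str]], question: str) -> str:
    """Format (user, assistant) turns and the new question into one prompt."""
    lines = []
    for user_turn, assistant_turn in history:
        lines.append(f"User: {user_turn}")
        lines.append(f"Assistant: {assistant_turn}")
    return CONDENSE_TEMPLATE.format(
        history="\n".join(lines) or "(none)",
        question=question,
    )


history = [("What is metformin used for?", "It is used to treat type 2 diabetes.")]
print(build_condense_prompt(history, "What are its side effects?"))
```

The model's completion of this prompt would be the standalone question that is then embedded, so a follow-up such as "What are its side effects?" is searched with the drug name restored.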

Document Retrieval

The chatbot performs a similarity search over the vector database using the embedded summarized question. The similarity search identifies the most relevant document vectors based on their cosine similarity to the question vector. The corresponding documents are retrieved from the document repository.
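Cosine similarity between a question vector q and a document vector d is the dot product of the two divided by the product of their lengths, so it measures direction rather than magnitude. A vector database performs this ranking internally; the pure-Python sketch below only illustrates the calculation, with made-up two-dimensional vectors and hypothetical function names.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(
    question_vector: List[float],
    document_vectors: Dict[str, List[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank documents by cosine similarity to the question, best first."""
    scored = [
        (doc_id, cosine_similarity(question_vector, vector))
        for doc_id, vector in document_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


vectors = {"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0], "doc-c": [1.0, 1.0]}
print(top_k([1.0, 0.0], vectors))
```

Here `doc-a` points in the same direction as the question and ranks first, `doc-c` shares half its direction and ranks second, and `doc-b` is orthogonal and is dropped.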

Document segmentation and context extraction

The retrieved documents are segmented into smaller chunks or passages. Each segment is embedded using the HuggingFace API. The chatbot performs a similarity search over the segment embeddings using the summarized question embedding. The most relevant segments or passages are identified as the context for answering the question.
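Segmentation is often done with fixed-size, overlapping windows so that a sentence cut at a boundary still appears whole in a neighboring chunk. The sketch below splits on words; the chunk size, the overlap, and the name `split_into_chunks` are assumptions, since the source does not state how segments are formed.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into word windows of chunk_size that share overlap words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and less than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in split_into_chunks(sample, chunk_size=4, overlap=1):
    print(chunk)
```

Each resulting chunk would then be embedded and ranked against the question embedding in the same way as whole documents.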

Answer generation

The chatbot uses the Zephyr-7b-beta model to generate an answer to the summarized question based on the extracted context. The generated answer is presented to the healthcare professional along with the relevant document excerpts and citations.
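To let the model cite its sources, the extracted passages can be numbered in the prompt and paired with their origins. The following is a minimal sketch of that prompt assembly; the `Passage` type, the instruction wording, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Passage:
    text: str
    source: str  # title or link of the original document


def build_answer_prompt(question: str, passages: List[Passage]) -> str:
    """Number the passages so the model can cite them as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] {p.text}" for i, p in enumerate(passages, start=1)
    )
    sources = "\n".join(
        f"[{i}] {p.source}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the numbered passages below. "
        "Cite passages by number, and say so if they do not contain the answer.\n\n"
        f"Passages:\n{context}\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


passages = [
    Passage("Aspirin reduces fever and mild pain.", "https://example.org/doc-1"),
    Passage("Aspirin is not recommended for children.", "https://example.org/doc-2"),
]
print(build_answer_prompt("Can children take aspirin?", passages))
```

Because the numbering is assigned in code, the citation markers in the generated answer can be mapped back to the source links shown to the user.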

Iterative refinement

The user can further refine the search by asking follow-up questions or providing additional context. The chatbot processes the new input and repeats the steps from query processing through answer generation to retrieve more relevant information and generate updated answers.



To ensure the highest level of data protection, especially for sensitive patient information, the entire setup used dedicated servers hosted by HuggingFace and JoVE. This approach ensured that each healthcare client’s proprietary data remained in a separate and secure environment, preventing unauthorized access and potential data breaches.

By using dedicated servers, the system eliminated the risks associated with shared hosting environments where multiple clients’ data could coexist on the same server. This separation of client data provided an additional layer of security by minimizing the potential for data leakage or cross-contamination between different clients.





