I'm working on a project with a friend that requires building "Custom GPTs" via the API
I haven't used LangChain since all the hype around it a year or so ago. Time to relearn it. I'm going to refer to the LangChain entity in the abstract as "the agent."
Here are the patterns I need to implement:

- Requests come in via a web server (I'm going to implement this last)
- End users can upload documents.
- The agent can read the documents and associate the information with the system prompt.
- The agent can search the web for additional information.
I'm going to start with bullet 3.
Creating an agent that can parse and read documents
First off, I found a three-hour LangChain crash course on YouTube. I snagged the full video transcript using this website and passed it to ChatGPT's o1 model. This way, I can have a conversation with the whole document.
I then asked ChatGPT to write me a bare-bones document loader with the ability to ask OpenAI questions about the loaded documents. After a little bit of massaging, I ended up with this:
```python
from langchain_openai import ChatOpenAI
from langchain.schema import SystemMessage, HumanMessage
from langchain_community.document_loaders import PyPDFLoader, UnstructuredWordDocumentLoader
import nltk
import dotenv

dotenv.load_dotenv()

# Download the NLTK data the Unstructured Word loader needs
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')


def load_pdf_text(pdf_path: str) -> str:
    """Load all text from a PDF file into a single string."""
    loader = PyPDFLoader(pdf_path)
    documents = loader.load()  # returns a list of Documents
    # each Document has .page_content; concatenate them
    text_chunks = [doc.page_content for doc in documents]
    return "\n".join(text_chunks)


def load_word_text(word_path: str) -> str:
    """Load all text from a Word (.docx) file into a single string."""
    loader = UnstructuredWordDocumentLoader(word_path)
    documents = loader.load()
    text_chunks = [doc.page_content for doc in documents]
    return "\n".join(text_chunks)


def combine_texts(*all_texts) -> str:
    """Combine multiple doc strings into one big text with separators."""
    return "\n---\n".join(text.strip() for text in all_texts if text.strip())


def answer_question(system_prompt: str, doc_text: str, user_query: str) -> str:
    """
    Takes a system prompt, doc text, and user question,
    then injects them into an LLM call, returning the final answer.
    """
    # 1. Build an LLM that can handle large contexts
    llm = ChatOpenAI(
        model="gpt-4o-mini",
        temperature=0.0,
    )

    # 2. Build your combined prompt
    system_msg = SystemMessage(content=system_prompt)
    user_msg = HumanMessage(content=f"""
Here is the text from your documents:

=== DOCUMENT CONTENT START ===
{doc_text}
=== DOCUMENT CONTENT END ===

You MUST only use the above text to answer the question below.
If the answer is not in the text, say 'Not found in the document'.

User's Question:
{user_query}
""")

    # 3. Call the model (invoke replaces the deprecated llm(...) call style)
    response = llm.invoke([system_msg, user_msg])
    return response.content


if __name__ == "__main__":
    # Example usage
    # ---------------------------------------------------
    # 1. Load PDF text
    pdf_text = load_pdf_text("pdf.pdf")

    # 2. Load Word text (docx)
    word_text = load_word_text("word-doc.docx")

    # 3. Combine them all in memory
    combined_doc_text = combine_texts(pdf_text, word_text)

    # Example system prompt
    system_instructions = (
        "You are a helpful assistant. "
        "Answer questions based only on the provided document text. "
        "If you cannot find the answer, say so."
    )

    # 4. Ask a question
    user_question = "write me a summary of what the documents tell us"

    # 5. Get an answer from the LLM
    final_answer = answer_question(system_instructions, combined_doc_text, user_question)

    print("=== AI ANSWER ===")
    print(final_answer)
```
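One thing to watch with this stuff-everything-into-the-prompt approach is the context window. A rough pre-flight guard might look like this; the ~4-characters-per-token heuristic and the limits are my own assumptions, not real tokenizer output (something like tiktoken would be accurate):

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text.
    A real tokenizer (e.g. tiktoken) would be more accurate."""
    return max(1, len(text) // 4)

def fits_in_context(doc_text: str, context_limit: int = 128_000, reserve: int = 4_000) -> bool:
    """Check whether the combined document text plausibly fits in the model's
    context window, reserving room for the system prompt and the answer."""
    return estimate_tokens(doc_text) <= context_limit - reserve

print(fits_in_context("hello " * 100))      # small doc: True
print(fits_in_context("hello " * 200_000))  # ~300k estimated tokens: False
```

If the check fails, the next step would be chunking plus retrieval rather than one giant prompt.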
Limitations
It can't read scanned PDFs. It views them as having zero data. I'm going to have to figure that one out.
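A cheap first step might be just detecting the problem: if the loader comes back with (almost) no extractable text per page, the file is probably a scan and needs OCR instead of text extraction. A minimal check, where the thresholds are my own guesses:

```python
def looks_scanned(page_texts: list[str], min_chars_per_page: int = 20) -> bool:
    """Heuristic: if nearly every page yields almost no extractable text,
    the PDF is probably a scanned image and needs OCR."""
    if not page_texts:
        return True
    empty = sum(1 for t in page_texts if len(t.strip()) < min_chars_per_page)
    return empty / len(page_texts) > 0.9

# A text PDF yields real page content; a scan yields empty strings.
print(looks_scanned(["Chapter 1: a page full of extracted text here."]))  # False
print(looks_scanned(["", "  ", ""]))                                      # True
```

You'd feed this the `.page_content` of each Document from `PyPDFLoader` and route flagged files to an OCR path instead.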
There's nothing "agentic" about this yet. It's a fixed workflow: read file 1, read file 2, summarize the files.