Written by William Cooke · Founder at VocUI

How to Train a Chatbot on Your Own Data

The phrase "train a chatbot" sounds like something that requires a PhD and a GPU cluster. It doesn't. With modern retrieval-augmented generation tools, you can teach a chatbot to answer questions from your own documents in about 20 minutes — no machine learning experience required.

What "training" actually means

When most people say "train a chatbot on my data," they don't mean fine-tuning a large language model, which can cost thousands of dollars and take months of work. They mean something much simpler: giving the chatbot access to their content so it can answer questions from it.

The technical term for this is Retrieval-Augmented Generation (RAG). Here's how it works in plain English:

  1. Your documents get broken into small chunks (a few hundred words each).
  2. Each chunk gets converted into a mathematical representation called an embedding — a list of numbers that captures the meaning of the text.
  3. When a user asks a question, the system finds the chunks most semantically similar to the question.
  4. Those chunks get passed to an AI language model (like Claude or GPT-4), which uses them to generate an answer grounded in your content.
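
As a rough illustration of the four steps above, here is a minimal Python sketch. It is not VocUI's implementation: the embed function is a toy word-count stand-in for a real embedding model, the sample chunks and prices are made up, and all function names are hypothetical. The final call to a language model is left out; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Step 3: return the chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Step 4: combine the retrieved chunks and the question into one prompt."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical chunks (steps 1 and 2 would produce and embed these).
chunks = [
    "Lawn mowing starts at $40 per visit.",
    "We are open Monday to Friday, 8am to 5pm.",
    "Hedge trimming is billed at $60 per hour.",
]
question = "How much does lawn mowing cost?"
top = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Real embedding models capture meaning rather than exact word overlap, so they can match "price" to "cost" where this toy version cannot. The overall flow is the same.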

The result is a chatbot that answers from your specific content — not from generic internet knowledge. This is what makes it useful for your business: it knows your pricing, your policies, and your product, because you told it. Learn more on our knowledge base chatbot page.

What data formats you can use

VocUI accepts several content types. You don't need to convert everything into one format — you can mix and match.

  • URLs: Paste any web page and VocUI scrapes the text. This works for your help center, pricing page, product pages, blog posts, and more. You can add multiple URLs, one at a time or as a list.
  • PDFs: Upload product manuals, onboarding guides, service brochures, or any reference document. VocUI extracts the text automatically, including from multi-page files.
  • DOCX files: Microsoft Word documents work just as well as PDFs. If your internal knowledge is in Word files, upload them directly.
  • Q&A pairs: Type questions and answers manually. This is the most targeted format because you control exactly what gets asked and what the answer should be. Use this for your most common support questions.
  • Plain text: Paste raw text directly if you have content that doesn’t fit neatly into a file or URL.

There's no need to format your content in any special way before uploading. VocUI processes whatever you give it. The only thing that matters is that the content is text — scanned images of documents won't work unless they have been OCR'd first.

Step 1: Add a knowledge source

In your VocUI dashboard, open your chatbot and go to the Knowledge tab. Click "Add source" and choose the type of content you want to add.

Start with your most important content. If you have a help center, start there. If you're a service business, your FAQ page and your "How it works" page are usually the highest-value sources. If you're a lawyer or professional services firm, a document explaining your areas of practice and intake process is a good first source — see our guide for law firm chatbots for specifics.

You don't need to add everything at once. Start with a few key sources, test the chatbot, and add more content where you find gaps.

Step 2: Let the system embed your content

After you add a source, VocUI processes it automatically. You'll see a status indicator in the Knowledge tab — usually it goes from "Processing" to "Ready" within a minute or two, depending on the size of the content.

During processing, VocUI is doing several things:

  • Splitting the content into overlapping chunks to preserve context
  • Sending each chunk through an embedding model to create a vector representation
  • Storing those vectors in a database built for fast semantic search
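
To make the first bullet concrete, here is a small Python sketch of overlapping chunking. The function name, the word-based splitting, and the default sizes are assumptions for illustration; VocUI's actual chunking settings are not documented here.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of `chunk_size` words, where each chunk
    repeats the last `overlap` words of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Tiny example: 10 words, windows of 4 words, 1 word shared between neighbors.
sample = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9"
pieces = chunk_words(sample, chunk_size=4, overlap=1)
print(pieces)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is why overlapping helps preserve context.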

You don't need to do anything during this step. Just wait for the status to turn green, then move on to testing.

Step 3: Test retrieval

Once your sources are processed, use the built-in chat tester in the dashboard. Ask the chatbot a few questions that your documents should answer.

Pay attention to three things:

  • Accuracy: Is the answer factually correct based on your content?
  • Completeness: Does it give the full answer, or just part of it?
  • Tone: Does it sound like something your business would say?

If the answers are accurate but the tone is off, that's a system prompt issue (covered in Step 4). If the answers are wrong or missing key information, you need to add more content or improve the existing content in your knowledge base.
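
If you want to repeat the same accuracy and completeness checks after every content change, a short script can help. The Python sketch below is an assumption-heavy illustration: ask stands for whatever function returns your chatbot's answer, and stub_bot is a hard-coded stand-in with made-up answers, not a VocUI API. Tone still needs a human read.

```python
from typing import Callable


def run_checks(ask: Callable[[str], str], cases: dict[str, list[str]]) -> dict[str, list[str]]:
    """For each test question, list the expected phrases missing from the answer."""
    failures = {}
    for question, expected_phrases in cases.items():
        answer = ask(question).lower()
        missing = [p for p in expected_phrases if p.lower() not in answer]
        if missing:
            failures[question] = missing
    return failures


def stub_bot(question: str) -> str:
    """Hypothetical stand-in for a real chatbot call."""
    canned = {
        "What are your hours?": "We are open Monday to Friday, 8am to 5pm.",
        "How much is lawn mowing?": "Lawn mowing is available on request.",
    }
    return canned.get(question, "I don't have that information.")


# Questions your documents should answer, with phrases a good answer must contain.
cases = {
    "What are your hours?": ["Monday to Friday", "8am"],
    "How much is lawn mowing?": ["$40"],
}
failures = run_checks(stub_bot, cases)
print(failures)
```

A question that shows up in the failures points to a gap: the information is missing from the knowledge base, or the chatbot could not retrieve it.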

A common issue: you add a URL, but the chatbot doesn't know a piece of information that's clearly on that page. This usually happens when the content is inside a JavaScript-rendered component that the scraper couldn't read. The fix is to add that information as a Q&A pair or plain text source instead.

Step 4: Refine with a system prompt

The system prompt is a set of instructions the AI sees before every conversation. It's how you control the chatbot's personality, scope, and behavior.

A good system prompt for a small business chatbot covers three things:

  1. Identity: What the chatbot is and who it represents. Example: “You are a helpful assistant for Greenfield Landscaping. Answer questions about our services, pricing, and scheduling.”
  2. Scope: What topics are in bounds and out of bounds. Example: “Only answer questions related to our landscaping services. Do not provide advice on unrelated topics.”
  3. Escalation: What to do when the chatbot can’t help. Example: “If you can’t answer a question, let the customer know and suggest they call us at 555-0100 or send an email to [email protected].”
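
One way to keep these three parts organized is to write them separately and join them. The Python sketch below is only an illustration: build_system_prompt is a hypothetical helper, and in VocUI you would paste the resulting text into the system prompt field rather than run code. The escalation text is shortened to the phone number from the example above.

```python
def build_system_prompt(identity: str, scope: str, escalation: str) -> str:
    """Join the three parts into one short prompt, rejecting empty sections."""
    parts = [identity.strip(), scope.strip(), escalation.strip()]
    if not all(parts):
        raise ValueError("identity, scope, and escalation must all be non-empty")
    return " ".join(parts)


system_prompt = build_system_prompt(
    identity=(
        "You are a helpful assistant for Greenfield Landscaping. "
        "Answer questions about our services, pricing, and scheduling."
    ),
    scope=(
        "Only answer questions related to our landscaping services. "
        "Do not provide advice on unrelated topics."
    ),
    escalation=(
        "If you can't answer a question, let the customer know "
        "and suggest they call us at 555-0100."
    ),
)
print(system_prompt)
```

Keeping the parts separate makes it easier to change one of them, such as the escalation contact, without rewriting the rest.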

Keep the system prompt short and direct. A paragraph or two is usually enough. With longer prompts, the model is more likely to overlook individual instructions or follow them inconsistently.

How often to update your knowledge base

The chatbot only knows what you've told it. Any time your products, pricing, policies, or services change, you need to update the knowledge base.

A practical rule: if a change would make you update your FAQ page, it should also make you update your chatbot knowledge base. Some businesses build this into their launch checklist, so that every time they change a price or add a service, updating the chatbot is a step in the process.

For URL-based sources, you can re-scrape them on demand. If your help center page changes, you can delete the old URL source and add it again — the new content will be processed automatically. VocUI's paid plans also support scheduled re-indexing so you don't have to remember to do this manually.
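
If you manage updates yourself, one simple way to know which sources need re-adding is to compare a fingerprint of the current text against the one you saved last time. The Python sketch below assumes you already have each page's text on hand; the function names and sample content are hypothetical, and this is not how VocUI's scheduled re-indexing is implemented.

```python
import hashlib


def fingerprint(text: str) -> str:
    """SHA-256 hash of the text, ignoring differences in whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def sources_needing_refresh(stored: dict[str, str], current_text: dict[str, str]) -> list[str]:
    """Names of sources whose text is new or differs from the stored fingerprint."""
    return sorted(
        name
        for name, text in current_text.items()
        if stored.get(name) != fingerprint(text)
    )


# Fingerprints saved the last time the knowledge base was updated (made-up content).
stored = {
    "pricing": fingerprint("Mowing starts at $40."),
    "hours": fingerprint("Open Mon-Fri"),
}
# Text as it appears on the site today.
current_text = {
    "pricing": "Mowing starts at $45.",  # price changed
    "hours": "Open   Mon-Fri\n",         # only whitespace changed
    "faq": "Do you offer free quotes? Yes.",  # new page
}
stale = sources_needing_refresh(stored, current_text)
print(stale)
```

Normalizing whitespace before hashing avoids flagging a page as changed when only its layout moved.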

What to do when the chatbot gives wrong answers

Wrong answers from a knowledge base chatbot almost always have one of three causes:

  1. The information isn’t in the knowledge base. Add it. Either upload the document that has it, or create a Q&A pair that covers the specific question.
  2. The information is there but it’s ambiguous. If your knowledge base says “our services start at $99” in one place and “custom pricing available” in another, the chatbot might give inconsistent answers. Make the content consistent.
  3. The system prompt is too permissive. If you haven’t told the chatbot to stick to your content, it might fill in gaps with general knowledge — which could be wrong for your specific business. Tighten the system prompt.

The conversation logs in your dashboard are the best diagnostic tool. Look at the exact messages where the chatbot went wrong, trace it back to what was in the knowledge base at that time, and fix the source content.

FAQ

Do I need machine learning experience to train a chatbot?
No. VocUI handles all the ML infrastructure (embeddings, vector storage, retrieval) automatically. You just provide the content.

How many documents can I train the chatbot on?
This depends on your plan. The free plan supports a limited number of knowledge sources. Paid plans allow significantly more content, including full website crawls.

How often should I update the chatbot's knowledge base?
Any time your products, pricing, policies, or services change. A good rule: if you would update your FAQ page, update your chatbot knowledge base too.

What happens if the chatbot doesn't know the answer?
By default, VocUI chatbots say they don't have that information rather than making something up. You can configure them to escalate to a human agent in that case.

Can I train the chatbot on confidential documents?
Yes. Your knowledge base content is stored securely and only accessible to your chatbot. It's never used to train any shared models. Check our privacy policy for full details.