Business & Marketing

CustomGPT.ai

Author: AI Demos

A knowledge-base website chatbot that feels unusually human and keeps follow-up context well, but it can still hallucinate support contacts.

Strong empathy · Follow-up context held · RAG · Hallucinated contact info

TL;DR: our verdict

Updated June 2026 · 11 test artifacts

Best tone in the test, with one serious production caveat

Where it wins

You want a customer-facing website chatbot that sounds warm, personal, and more like a real support agent than a policy lookup tool.
You need strong retrieval on support-policy content such as warranty, shipping, and returns questions.
You care about follow-up context retention across multi-turn customer-service conversations.

Main limitation

You need strict no-hallucination behavior for support contact details or escalation instructions.

Pricing (verified plans)

Standard $99/mo · Premium $499/mo · Enterprise custom pricing

Our take

CustomGPT was the strongest overall performer in this benchmark for teams that want a customer-support chatbot to sound warm, personal, and human instead of robotic. It retrieved warranty, shipping, and returns information accurately across simple, medium, and complex questions, and it handled follow-up questions with especially strong context retention. The main risk is production-critical: in the frustration scenario, it hallucinated support contact details that were not in the knowledge base. That means its tone is excellent, but it still needs strict grounding controls before going live.

Hands-on CustomGPT test walkthrough.

In-Depth Review

Our detailed analysis of CustomGPT.ai — features, performance, and real-world testing.

AI Demos Team

Expert Reviewer

Verified Review

Feature-by-Feature Breakdown

Knowledge-base answer generation

Strong factual retrieval across warranty, shipping, and returns content.

Test Summary

Feature tested: Knowledge-base answer generation

Result: Passed

Verdict: Strong factual retrieval across warranty, shipping, and returns content.

Test cases (text prompt in, response screenshot out): warranty coverage question; delivery timeline question; complex warranty + returns question.

Retrieves and summarizes support-policy information from uploaded knowledge-base documents for both direct and complex customer questions. In this test, it was exercised on warranty coverage, delivery timelines, and a non-returnable product with a manufacturing defect under warranty.

INPUT

What does the warranty cover?

OUTPUT (screenshot: customgpt-input1-warranty-coverage-step1-initial-response.png)

CustomGPT correctly answered the warranty coverage question by pulling directly from the uploaded policy document. It outlined what the warranty covers — including manufacturing defects and hardware failures — and cited the relevant policy section as a source, confirming the response was grounded in the knowledge base rather than a generic AI reply.

INPUT

How long does delivery take?

OUTPUT (screenshot: customgpt-input2-delivery-express-step1-initial-response-1.png)

CustomGPT answered the delivery timeline question accurately by retrieving the information directly from the uploaded policy document. It broke down delivery timeframes by shipping type — standard and express — and cited the source section, confirming the answer was grounded in the knowledge base rather than a generic estimation.

INPUT

If my product is non-returnable but develops a manufacturing defect within warranty, what options do I have?

OUTPUT (screenshot: customgpt-input3-nonreturnable-defect-step1-initial-response.png)

CustomGPT handled this complex cross-policy question well — correctly distinguishing between the return policy and the warranty policy. It clarified that while the product is non-returnable, a manufacturing defect within the warranty period still qualifies for repair or replacement under the warranty clause. The response cited both the returns and warranty sections of the policy document, showing it can resolve questions that require reasoning across multiple policy areas simultaneously.

Bottom Line

Retrieval quality was strong across simple, medium, and complex policy questions, with omissions more common than hallucination in standard knowledge-base answers.

Conversational follow-up handling

One of its strongest capabilities: it preserved context and handled nuanced follow-ups cleanly.

Test Summary

Feature tested: Conversational follow-up handling

Result: Passed

Verdict: One of its strongest capabilities: it preserved context and handled nuanced follow-ups cleanly.

Test cases (text prompt in, response screenshot out): warranty follow-up; shipping follow-up; claim outcome follow-up.

Maintains conversational context across turns and answers narrower follow-up questions without losing the original topic. This was tested on warranty nuance, shipping eligibility, and claim-outcome follow-ups.

INPUT

Is battery degradation covered under warranty?

OUTPUT (screenshot: customgpt-battery-degradation-warranty-answer.png)

On the follow-up 'Is battery degradation covered under warranty?', CustomGPT kept the warranty context and made the subtle but important distinction that normal battery degradation is excluded, while charging failure or certified manufacturing defects in rechargeable batteries are covered for up to 6 months. The answer also showed a source reference below the reply.

INPUT

What about express delivery for remote areas?

OUTPUT (screenshot: customgpt-express-delivery-remote-areas-policy.png)

On 'What about express delivery for remote areas?', it preserved the shipping context and answered directly that express delivery is not available for remote regions. It then added the relevant constraints: eligible pin codes in non-remote areas only, non-hazardous products, orders placed before 2:00 PM local time, and exclusions for oversized products, hazardous materials, and marketplace seller orders.

INPUT

Will I get a refund or only a repair?

OUTPUT (screenshot: customgpt-warranty-refund-vs-repair-response.png)

On 'Will I get a refund or only a repair?', it correctly said repair or replacement is the usual warranty outcome and that refunds are possible but depend on verification. It also mentioned original-payment-method refunds and that partial refunds or restocking fees can apply. The nuance was mostly strong, but the researcher noted the answer leaned too hard toward partial refunds even though the knowledge base also allows full refunds in some cases.

Bottom Line

Context retention was excellent across all tested follow-ups; the only notable miss was an over-narrow refund interpretation in the most complex thread.

Source reference display

Visible source references were present in multiple answers.

Test Summary

Feature tested: Source reference display

Result: Passed

Verdict: Visible source references were present in multiple answers.

Test cases (text prompt in, response screenshot out): warranty citation check; shipping citation check; refund-policy citation check.

Shows document-level source references beneath replies so users can see which knowledge-base file grounded the answer. This was observed on warranty, shipping, and refund-related follow-ups.

INPUT

Is battery degradation covered under warranty?

OUTPUT (screenshot: customgpt-battery-degradation-warranty-answer.png)

The response included a 'Sources referenced in this response' panel pointing to novatech_warranty_coverage_guide.pdf, giving visible grounding rather than an uncited answer.

INPUT

What about express delivery for remote areas?

OUTPUT (screenshot: customgpt-express-delivery-remote-areas-policy.png)

The answer showed a source reference to NovaTech Delivery SLA Policy.pdf beneath the reply, making the shipping constraint traceable to the knowledge base.

INPUT

Will I get a refund or only a repair?

OUTPUT (screenshot: customgpt-warranty-refund-vs-repair-response.png)

The response displayed novatech_warranty_coverage_guide.pdf as a referenced source, showing that citation support remained visible even in a follow-up exchange.

Bottom Line

CustomGPT visibly cited source documents in multiple replies, though this test did not independently audit citation completeness for every answer.

Empathetic support response

Best conversational tone in the benchmark, but also the source of the test's biggest safety issue.

Test Summary

Feature tested: Empathetic support response

Result: Passed

Verdict: Best conversational tone in the benchmark, but also the source of the test's biggest safety issue.

Adapts tone to the user's emotional state and responds more like a live support agent than a policy search tool. This was visible across normal policy questions and was stress-tested with an explicitly angry customer message.

INPUT

I am so frustrated! This is the 3rd time my order has been wrong. I feel like breaking my PC right now!

OUTPUT (screenshot: customgpt-customer-support-apology-and-escalation.png)

When given an angry message about receiving the wrong order three times, the bot immediately acknowledged the user's emotion, apologized, and invited the user to explain what went wrong so it could help with order status, complaint handling, or replacement steps. The empathy opener was the strongest observed in this benchmark.

INPUT

I am so frustrated! This is the 3rd time my order has been wrong. I feel like breaking my PC right now!

OUTPUT (screenshot: customgpt-input4-frustration-step1-response.png)

CustomGPT handled the frustrated user message with empathy while staying on-policy. It acknowledged the repeated wrong orders, avoided engaging with the threat of breaking the PC, and redirected the conversation to the correct resolution path — reporting a wrong item with the required documentation. The response balanced emotional sensitivity with accurate policy guidance without skipping either.

Bottom Line

CustomGPT had the most human-feeling support tone of any tool tested, but that same style also produced the report's most serious grounding failure: fabricated contact details.
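
Fabricated contact details are the kind of failure a thin post-processing check can catch before a reply reaches the customer. The sketch below is one possible approach and not a CustomGPT.ai feature: it pulls email addresses and phone-like strings out of a reply and flags any that do not appear in the knowledge-base text. The function name `ungrounded_contacts`, the regular expressions, and the sample strings are all hypothetical, and the phone pattern is deliberately loose, so long order numbers could be flagged as well.

```python
import re

# Hypothetical patterns: loose on purpose, tune them for your own locale and data.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def _digits(text: str) -> str:
    """Keep only the digits so '800-555-0100' and '(800) 555-0100' compare equal."""
    return re.sub(r"\D", "", text)


def ungrounded_contacts(reply: str, kb_text: str) -> list[str]:
    """Return emails and phone numbers in `reply` that never occur in `kb_text`."""
    kb_emails = {e.lower() for e in EMAIL_RE.findall(kb_text)}
    kb_phones = {_digits(p) for p in PHONE_RE.findall(kb_text)}

    flagged = []
    for email in EMAIL_RE.findall(reply):
        if email.lower() not in kb_emails:
            flagged.append(email)
    for phone in PHONE_RE.findall(reply):
        if _digits(phone) not in kb_phones:
            flagged.append(phone)
    return flagged


# Made-up sample data: the knowledge base contains no contact details at all.
kb_text = "To report a wrong item, share your order ID and a photo of the item received."
reply = "I'm so sorry about this. Please call 1-800-555-0199 or email support@example.com."
print(ungrounded_contacts(reply, kb_text))
```

A non-empty result would be a signal to block the reply, strip the contact line, or route the conversation to a human, depending on how strict the deployment needs to be.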

Pricing & Access

Standard

$99/mo (or $89/mo billed annually)

10 custom AI agents, 5,000 pages/chatbot, 60M words content, 1,000 queries/month, 3 team members, basic analytics, OpenAI API access, helpdesk support. 7-day free trial available.

Premium

$499/mo (or $449/mo billed annually)

100 custom AI agents, 20,000 items/chatbot, 300M words stored, 5,000 GPT-4 queries/month, 5 team members, remove CustomGPT branding, PII removal, OCR support, 1-on-1 support.

Enterprise

Custom pricing

Custom chatbot & token limits, SSO, audit logs, dedicated account manager, call & email support, custom SLAs, SOC 2 Type II certified, HIPAA compliant. Annual commitment required.

Pricing verified June 2026. We re-check quarterly.
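
To compare the two fixed-price plans on equal terms, the listed numbers can be turned into annual savings and an effective cost per query. This is a rough sketch using only the prices and query allowances quoted above; it assumes the monthly allowance is fully used and that a "query" is comparable across plans, which the listing does not confirm. The `Plan` class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_usd: int         # price per month when billed monthly
    annual_monthly_usd: int  # price per month when billed annually
    queries_per_month: int   # included query allowance


# Figures copied from the pricing section above.
PLANS = [
    Plan("Standard", 99, 89, 1_000),
    Plan("Premium", 499, 449, 5_000),
]


def annual_savings(plan: Plan) -> int:
    """Dollars saved over 12 months by choosing annual billing."""
    return (plan.monthly_usd - plan.annual_monthly_usd) * 12


def cost_per_query_cents(plan: Plan, annual: bool = False) -> float:
    """Effective cents per query, assuming the full allowance is used."""
    price = plan.annual_monthly_usd if annual else plan.monthly_usd
    return round(price * 100 / plan.queries_per_month, 2)


for plan in PLANS:
    print(plan.name, annual_savings(plan), cost_per_query_cents(plan))
```

On these assumptions the per-query cost works out to roughly ten cents on both plans, so the step up to Premium buys higher limits and extra features rather than cheaper queries.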

Is This Right For You?

A side-by-side guide based on our hands-on testing.

✓ Use This If

● You want a customer-facing website chatbot that sounds warm, personal, and more like a real support agent than a policy lookup tool.

● You need strong retrieval on support-policy content such as warranty, shipping, and returns questions.

● You care about follow-up context retention across multi-turn customer-service conversations.

● Visible source references matter to your workflow, even if you are not expecting fully audited citation behavior in every reply.

✕ Skip This If

● You need strict no-hallucination behavior for support contact details or escalation instructions.

● You prefer short, utilitarian answers over longer empathetic replies with repeated follow-up prompts.

● You need strong evidence on deployment setup, widget customization, or embed implementation from the test itself; those areas were not deeply evaluated here.

Business & Marketing · Customer Support Chatbots · text

It performed strongly on direct policy retrieval. In this test, it accurately answered warranty coverage, shipping timelines, and a complex non-returnable defect question. The more common issue was omission of some available details, such as international warranty variation or Premium-member delay compensation, rather than generic hallucination in standard policy answers.

It retained context well across all three follow-up chains tested. It correctly handled battery degradation as a warranty nuance, confirmed that express delivery is unavailable for remote areas without losing the shipping thread, and answered the refund-versus-repair follow-up within the original warranty-claim context.

Several responses displayed a visible 'Sources referenced in this response' section tied to specific files such as novatech_warranty_coverage_guide.pdf and NovaTech Delivery SLA Policy.pdf.

The biggest issue was in the angry-customer scenario: CustomGPT hallucinated a phone number and support email that were not in the knowledge base. For a customer-support chatbot, that is a production-critical grounding failure even though the rest of the response was highly empathetic.

Its clearest strength is a warm customer experience. It was the most conversational and empathetic tool tested, but that also meant some answers became long and its follow-up prompts sometimes felt formulaic or slightly off-topic.

Deployment and widget setup were not deeply evaluated. The report included a tool demo video and the screenshots showed deploy prompts, but the hands-on findings focused on retrieval quality, follow-up handling, citations, and tone rather than a detailed widget-setup or customization evaluation.