There is an enormous gap between using artificial intelligence and building a product with artificial intelligence. The first one is something anyone with a ChatGPT subscription can do. The second requires hundreds of architecture decisions, most of them invisible to the end user.
In this article, we open the workshop door. We are going to share the five most important technical decisions we made while building PropPilot’s AI agent, the mistakes we made along the way, and what we would do differently if we started from scratch today.
We are doing this for three reasons: because we believe in transparency as a product value, because we want to contribute to the conversation about agentic AI in real estate (a topic we already explored in our article on agentic CRMs), and because documenting what we have learned forces us to reflect with rigour.
Decision 1: The LLM Is the Brain, Not the Product
The first temptation when building with language models is to treat the LLM as the product. “We have GPT-4, we are done.” This is a mistake we saw repeated across dozens of startups throughout 2024 and 2025.
Our foundational decision was to treat the LLM as a replaceable component. The model is the brain, but the product is the entire nervous system: the orchestration, tools, memory, safety guardrails, and business logic that surround it.
In practice, this means we can swap model providers, and we have done so several times, without touching the user experience. When a new model is faster, cheaper, or more accurate for a specific task, we substitute it in. The agent continues to behave the same way because the product’s intelligence does not reside solely in the model.
This decision had a high upfront cost: building an abstraction layer takes time. But it has paid off every time we have needed to migrate, upgrade, or combine models.
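To make the idea of a replaceable model concrete, here is a minimal sketch of what such an abstraction layer could look like. It is not PropPilot’s actual code: the names `ChatModel`, `EchoModel`, and `ModelRouter`, and the model labels, are all hypothetical, and the stand-in model makes no network calls.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatModel(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Stand-in for a real provider client; it only echoes the prompt."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class ModelRouter:
    """Maps task names to models, so swapping a model is a registry change."""

    def __init__(self, default: ChatModel) -> None:
        self._default = default
        self._by_task: dict[str, ChatModel] = {}

    def register(self, task: str, model: ChatModel) -> None:
        self._by_task[task] = model

    def complete(self, task: str, prompt: str) -> str:
        # Fall back to the default model when no task-specific one is set.
        return self._by_task.get(task, self._default).complete(prompt)


router = ModelRouter(default=EchoModel("general-model"))
router.register("summarise", EchoModel("cheap-fast-model"))
print(router.complete("summarise", "Summarise this conversation"))
print(router.complete("reply", "Answer the lead"))
```

The rest of the product would only ever call `router.complete(...)`, so replacing a provider means writing one new class that satisfies `ChatModel` and changing one `register` call.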
The lesson: if your product breaks when the model changes, you do not have a product. You have a wrapper.
Decision 2: Tool-Based Architecture
PropPilot’s agent does not “know” anything about our clients’ properties. It does not have the database memorised, nor the agents’ schedules, nor up-to-date pricing. What it has are tools.
We adopted a tool-calling architecture from day one. When a lead asks “Do you have three-bedroom apartments in Dubai Marina?”, the agent does not answer from memory. It invokes a tool that queries the developer’s property database, filters by the relevant criteria, and returns current results.
The agent’s core tools include:
- Inventory lookup: searches the property database with filters for location, price, unit type, and availability.
- Calendar management: checks availability and creates appointments with sales agents.
- Lead profile: reads and updates the contact’s profile in the CRM.
- Conversation history: retrieves context from previous interactions.
- Project documentation: accesses sales materials, floor plans, and technical specifications.
This approach has one fundamental advantage: the data is always current. The agent cannot claim a unit is available if it was sold that morning, because the tool queries the data in real time.
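The tool-calling pattern described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the real implementation: the `Unit` fields, the in-memory `INVENTORY`, and the `call_tool` dispatcher are hypothetical, and a production tool would query the developer’s database instead of a Python list.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Unit:
    unit_id: str
    location: str
    bedrooms: int
    price_aed: int
    available: bool


# Illustrative data only; stands in for the developer's live database.
INVENTORY = [
    Unit("A-101", "Dubai Marina", 3, 2_400_000, True),
    Unit("A-102", "Dubai Marina", 3, 2_600_000, False),
    Unit("B-201", "Downtown", 2, 1_900_000, True),
]


def inventory_lookup(
    location: str, bedrooms: int, max_price_aed: Optional[int] = None
) -> list[Unit]:
    """Return units that match the filters and are available right now."""
    return [
        u
        for u in INVENTORY
        if u.available
        and u.location == location
        and u.bedrooms == bedrooms
        and (max_price_aed is None or u.price_aed <= max_price_aed)
    ]


# Registry of tools the model is allowed to request by name.
TOOLS: dict[str, Callable[..., Any]] = {"inventory_lookup": inventory_lookup}


def call_tool(name: str, arguments: dict[str, Any]) -> Any:
    """Dispatch a tool call requested by the model; reject unknown names."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)


matches = call_tool("inventory_lookup", {"location": "Dubai Marina", "bedrooms": 3})
print([u.unit_id for u in matches])
```

Because the lookup runs at answer time, marking a unit as sold in the data source changes the very next answer, with no retraining or prompt change.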
The lesson: an agent that “knows” things is an agent that can lie. An agent that looks things up is an agent that can be accurate.
Decision 3: Memory as a First-Class Citizen
Most chatbots treat memory as a chat log: a list of previous messages sent to the model as context. We decided that was not enough.
In PropPilot, memory has structure. It is not just “what the lead said” but an enriched profile that includes:
- Explicit preferences: “I am looking for three bedrooms, maximum AED 2.5 million, near the metro.”
- Implicit preferences: if a lead consistently asks about the view, they probably care about floor level and orientation.
- Conversation state: where the lead sits in the funnel, what information has already been shared, what objections have been raised.
- Interaction summaries: we do not store every literal message from long conversations. We generate structured summaries that capture the essentials.
This structured memory enables something that chat history alone cannot: real continuity across channels. If a lead messages on WhatsApp on Monday and fills out a web form on Wednesday, the agent picks up the conversation where it left off, with full context.
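As a rough sketch of what “memory with structure” might mean in code, the example below keeps a profile per lead instead of a raw message list. The class name, the field names, and the rule that a topic mentioned twice counts as an implicit preference are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LeadMemory:
    """Hypothetical structured profile for one lead, shared across channels."""

    lead_id: str
    explicit_preferences: dict[str, object] = field(default_factory=dict)
    topic_counts: dict[str, int] = field(default_factory=dict)
    funnel_stage: str = "new"
    objections: list[str] = field(default_factory=list)
    summaries: list[str] = field(default_factory=list)

    def note_topic(self, topic: str) -> None:
        """Count how often the lead brings up a topic."""
        self.topic_counts[topic] = self.topic_counts.get(topic, 0) + 1

    def implicit_preferences(self, min_mentions: int = 2) -> list[str]:
        """Topics raised repeatedly are treated as things the lead cares about."""
        return sorted(t for t, n in self.topic_counts.items() if n >= min_mentions)

    def add_summary(self, channel: str, summary: str) -> None:
        """Store a short summary instead of every literal message."""
        self.summaries.append(f"{channel}: {summary}")

    def context_for_prompt(self) -> str:
        """Render the profile as compact text for the model's context."""
        lines = [f"Funnel stage: {self.funnel_stage}"]
        lines += [f"Wants {k}: {v}" for k, v in self.explicit_preferences.items()]
        lines += [f"Seems to care about: {t}" for t in self.implicit_preferences()]
        lines += [f"Objection raised: {o}" for o in self.objections]
        lines += [f"Earlier ({s})" for s in self.summaries]
        return "\n".join(lines)


memory = LeadMemory("lead-42")
memory.explicit_preferences.update({"bedrooms": 3, "max_price_aed": 2_500_000})
memory.note_topic("view")
memory.note_topic("view")
memory.add_summary("whatsapp", "asked about sea-view units in Dubai Marina")
memory.add_summary("web_form", "requested a callback on Wednesday")
print(memory.context_for_prompt())
```

The same `LeadMemory` object is read whether the next message arrives on WhatsApp or through a web form, which is what gives the agent continuity across channels.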
The lesson: memory is not a log. It is a model of the customer that gets richer with every interaction.
Decision 4: Guardrails from Day One
We knew that an AI agent for real estate that hallucinates is worse than having no agent at all. If the agent invents a price, a floor area, or a contractual condition, the damage is real: legal, financial, and reputational.
That is why we designed a safety guardrail system from the very first iteration:
- Factual data only from verified sources: the agent never generates numerical data (prices, areas, handover dates) on its own. It always retrieves them from tools that query the developer’s database.
- Explicit behavioural constraints: the agent has a set of rules defining what it cannot do. It cannot negotiate prices. It cannot make delivery promises. It cannot give legal or tax advice.
- Response validation: before sending certain types of responses, a second process verifies that the data mentioned matches the database records.
- Out-of-scope intent detection: if the lead asks something outside the agent’s domain (specific mortgage products, complex legal matters), the agent acknowledges it and escalates.
What is interesting is that these guardrails do not make the agent slower or less useful. On the contrary: by knowing exactly what it can and cannot do, the agent is more decisive about the things that are within its scope.
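The response validation step lends itself to a small illustration. The sketch below checks that every price written in a draft reply also appears among the prices returned by the tools during that turn. The regular expression, the function names, and the restriction to figures written out in full (such as “AED 2,400,000”) are assumptions; a real validator would need to cover more formats and more data types.

```python
import re

# Matches amounts written in full, e.g. "AED 2,400,000" or "AED 2400000".
PRICE_PATTERN = re.compile(r"AED\s*(\d[\d,]*)")


def extract_prices(text: str) -> set[int]:
    """Collect every AED amount mentioned in the text as an integer."""
    return {int(m.replace(",", "")) for m in PRICE_PATTERN.findall(text)}


def unverified_prices(draft: str, verified_prices: set[int]) -> list[int]:
    """Return prices in the draft that no database record backs up."""
    return sorted(extract_prices(draft) - verified_prices)


def safe_to_send(draft: str, verified_prices: set[int]) -> bool:
    """A draft passes only if every price it mentions was verified."""
    return not unverified_prices(draft, verified_prices)


verified = {2_400_000}  # prices returned by the inventory tool this turn
good = "Unit A-101 is available at AED 2,400,000."
bad = "Unit A-101 is available at AED 2,300,000."
print(safe_to_send(good, verified), safe_to_send(bad, verified))
```

A draft that fails the check would be regenerated or escalated rather than sent. An abbreviated figure such as “AED 2.5 million” is parsed as 2 by this pattern and therefore fails verification, so the sketch errs on the side of blocking.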
The lesson: trust is built with clear boundaries, not infinite capabilities.
Decision 5: Human-in-the-Loop by Design
This was perhaps the most counterintuitive decision. In a world that celebrates full automation, we decided that escalation to humans is not a system failure. It is a feature.
PropPilot’s agent is designed to detect when a lead needs to speak with a person and to facilitate that transition without friction. Escalation scenarios include:
- Complex objections: the lead has concerns that require human empathy and negotiation skill.
- Advanced purchase decisions: when the lead is ready to reserve, a sales agent closes the deal.
- Emotional situations: buying a home is a life decision. There are moments that call for human connection.
- Out-of-scope requests: any request the agent cannot resolve with its available tools.
The agent does not disappear when it escalates. It prepares a conversation summary for the sales agent so the lead does not have to repeat information. And after the human interaction concludes, the agent can resume follow-up.
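One way to picture escalation as a feature is a function that maps conversation signals to an escalation reason and then builds the handoff note for the sales agent. The signal fields, the threshold of two unresolved objections, and the note format are hypothetical. In practice the signals would come from an upstream classifier or from the model itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EscalationReason(Enum):
    COMPLEX_OBJECTION = "complex objection"
    READY_TO_RESERVE = "ready to reserve"
    EMOTIONAL = "emotional situation"
    OUT_OF_SCOPE = "out of scope"


@dataclass
class TurnSignals:
    """Hypothetical per-turn signals about the conversation."""

    wants_to_reserve: bool = False
    unresolved_objections: int = 0
    distress_detected: bool = False
    tools_can_answer: bool = True


def escalation_reason(
    signals: TurnSignals, max_objections: int = 2
) -> Optional[EscalationReason]:
    """Return why a human should take over, or None if the agent continues."""
    if signals.wants_to_reserve:
        return EscalationReason.READY_TO_RESERVE
    if signals.distress_detected:
        return EscalationReason.EMOTIONAL
    if signals.unresolved_objections >= max_objections:
        return EscalationReason.COMPLEX_OBJECTION
    if not signals.tools_can_answer:
        return EscalationReason.OUT_OF_SCOPE
    return None


def handoff_note(lead_name: str, reason: EscalationReason, facts: list[str]) -> str:
    """Summarise the conversation so the lead does not repeat themselves."""
    lines = [f"Lead: {lead_name}", f"Reason: {reason.value}", "Known so far:"]
    lines += [f"- {fact}" for fact in facts]
    return "\n".join(lines)


reason = escalation_reason(TurnSignals(wants_to_reserve=True))
if reason is not None:
    print(handoff_note("Sara", reason, ["3 bedrooms", "budget AED 2.5 million"]))
```

Returning `None` for the common case keeps the default path automated, while each escalation carries an explicit reason the sales agent can read at a glance.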
The lesson: the best agent is not the one that does everything. It is the one that knows when to step aside.
The Mistakes We Made
It would be dishonest to share only the good decisions. Here are the three most significant mistakes:
Overengineering the First Version
The first architecture of the agent was unnecessarily complex. We had multiple orchestration layers, an overly abstract plugin system, and routing logic that tried to cover cases that did not yet exist. We spent weeks building a system we later simplified radically.
The cause was clear: we designed for the product we imagined in two years, not the one we needed in two months. Starting simple and adding complexity when real usage demands it would have been far more efficient.
Too Many Channels at Once
We tried to launch the agent on WhatsApp, web chat, email, and SMS simultaneously. Every channel has its quirks: message length limits, media formats, expected response times, and different conversational tones.
We should have mastered one channel first (WhatsApp, where the bulk of real estate interaction happens in the UAE and broader Middle East market) and then expanded. The result of launching everything at once was that no single channel worked really well in the first weeks.
Underestimating the Importance of Tone
We spent months on the technical architecture and days on the tone of voice. Mistake. In real estate, particularly in the UAE’s multicultural market where leads come from dozens of nationalities, the way you communicate matters enormously. A tone that is too formal can feel cold and distant. A tone that is too casual can erode trust, especially for high-value off-plan purchases.
We had to iterate extensively on the prompt system to find a tone that was professional without being stiff, approachable without being disrespectful, and adaptable to the lead’s own communication style and cultural context.
What We Would Do Differently Today
If we started from scratch today, with everything we have learned:
- Start with evaluation, not development. Before writing a single line of agent code, we would build the evaluation framework: how we measure whether the agent is doing its job well. Without clear metrics, you cannot improve systematically. This is precisely what we will discuss in our next article on metrics that matter in agentic AI.
- One channel, one use case, done well. WhatsApp, initial lead response. Nothing else until the metrics confirm it works.
- More time on prompts, less on abstractions. Prompt engineering is real engineering. It deserves the same rigour as API design.
- Involve sales agents from day one. The best inputs for training the agent did not come from us. They came from the sales professionals who have spent years speaking with property buyers.
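The first point, evaluation before development, could start as small as the sketch below: a list of test conversations with phrases a reply must and must not contain, and a pass rate computed over them. The case format, the substring checks, and `stub_agent` are assumptions made for illustration. A real framework would use richer metrics than substring matching.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EvalCase:
    """One test message plus simple expectations about the reply."""

    lead_message: str
    must_include: list[str] = field(default_factory=list)
    must_not_include: list[str] = field(default_factory=list)


def passes(reply: str, case: EvalCase) -> bool:
    """Case-insensitive check of required and forbidden phrases."""
    text = reply.lower()
    has_required = all(s.lower() in text for s in case.must_include)
    has_forbidden = any(s.lower() in text for s in case.must_not_include)
    return has_required and not has_forbidden


def pass_rate(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases the agent passes, between 0.0 and 1.0."""
    if not cases:
        raise ValueError("at least one evaluation case is required")
    return sum(passes(agent(c.lead_message), c) for c in cases) / len(cases)


CASES = [
    EvalCase("Can I see the apartment this week?", must_include=["viewing"]),
    EvalCase(
        "Can you give me a discount?",
        must_include=["sales agent"],
        must_not_include=["discount of"],
    ),
]


def stub_agent(message: str) -> str:
    """Placeholder for the real agent; always returns the same reply."""
    return (
        "I can book a viewing for you. I cannot negotiate the price, "
        "but I can connect you with a sales agent."
    )


print(pass_rate(stub_agent, CASES))
```

Even a crude harness like this gives a number to track before and after each prompt or model change, and sales agents can contribute new cases without writing code.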
Conclusion
Building an AI agent for real estate is not an AI problem. It is a product problem that uses AI as a tool. The most important decisions are not which model to use, but how to design the system around it: its tools, its memory, its boundaries, and its relationship with people.
We hope this look behind the code is useful for other teams building agents in real estate or any other sector. The lessons are quite universal: start simple, measure everything, respect boundaries, and never underestimate the human factor.
Want to see how PropPilot’s agent works in practice? Try our Mystery Shopper: send a test lead to your own agency and compare how your team responds versus how PropPilot would. No commitment, no cost, in under two minutes.
PropPilot.ai