Zencity's Dynamic Topic Model enables accurate, nuanced categorization of open-ended survey responses. Unlike traditional topic modeling, which applies a fixed set of generic topics that may not reflect the unique characteristics of a local community and are rarely updated, the Dynamic Topic Model identifies highly specific topics tailored to each community and continuously adapts by adding new ones as they emerge.
The model is built on a combination of deterministic ML algorithms and large language models (LLMs), chosen for their explainability and high output quality. This architecture delivers higher coverage of labeled responses, topic names that are organized into a clear taxonomy, and a significantly lower mislabeling rate compared to earlier approaches. The model analyzes the semantic meaning of survey responses, assigning them to one or more relevant topics and subtopics based on the collected answers for each question -- ensuring your survey data is labeled with precision and uncovering deeper trends and insights from resident comments.
How Does the Dynamic Labeling Process Work?
When a survey response is submitted, it is analyzed and categorized using Zencity’s dynamic topic and subtopic tree. This process follows these steps:
- Each response is carefully analyzed and broken down into keywords and phrases that reflect its meaning.
- The analyzed responses are used to identify the most frequently mentioned topics within each community. These topics may include broad themes, such as housing or law enforcement, and specific local issues, like an elected official, a policy change, or a community event.
- The identified topics are structured into an organized taxonomy of labels, which are then applied to tag individual responses that align with each topic.
- The system is periodically updated to incorporate new and emerging topics within each community, ensuring both new and past responses are labeled with the most relevant and up-to-date information.
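The four steps above can be sketched as a simplified pipeline. Everything in this example is an illustrative stand-in: the real system uses ML algorithms and LLMs rather than keyword matching, and the taxonomy, function names, and sample topics are invented for demonstration.

```python
# Illustrative sketch of the labeling steps described above.
# Keyword matching stands in for the actual ML/LLM-based analysis.
from collections import Counter

# Step 3: a tiny hypothetical topic -> subtopic -> keyword taxonomy
TAXONOMY = {
    "Housing": {"Affordability": ["rent", "affordable"], "Zoning": ["zoning"]},
    "Law Enforcement": {"Patrols": ["patrol", "police"]},
}

def extract_keywords(response: str) -> set[str]:
    """Step 1: break a response into lowercase keywords."""
    return {word.strip(".,!?").lower() for word in response.split()}

def label_response(response: str) -> list[tuple[str, str]]:
    """Steps 3-4: tag a response with every matching (topic, subtopic)."""
    words = extract_keywords(response)
    return [
        (topic, subtopic)
        for topic, subtopics in TAXONOMY.items()
        for subtopic, keywords in subtopics.items()
        if words & set(keywords)
    ]

def top_topics(responses: list[str]) -> Counter:
    """Step 2: count the most frequently mentioned topics."""
    return Counter(topic for r in responses for topic, _ in label_response(r))

responses = [
    "Rent is too high and we need more affordable housing.",
    "More police patrols downtown, please.",
]
print(label_response(responses[0]))  # [('Housing', 'Affordability')]
print(top_topics(responses))
```

In the real model, step 2 is where community-specific topics emerge from the data itself, rather than being fixed in advance as in this sketch.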
When your survey is launched, or a new open-text question is added, there is an initial training period before the model generates topics and starts labeling. The model requires a dataset of at least 400 valid, labelable responses before it can begin. Once this threshold is reached, the model labels all previously collected data and analyzes new responses as they come in.
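The activation logic can be sketched as a simple threshold check. The 400-response figure is the only detail taken from the text above; the function name and status values are illustrative assumptions.

```python
MIN_TRAINING_RESPONSES = 400  # threshold stated above

def model_status(valid_labelable_responses: int) -> str:
    """Return whether the topic model is still training or actively labeling (illustrative)."""
    if valid_labelable_responses < MIN_TRAINING_RESPONSES:
        return "training"  # initial period: no topics generated yet
    # Once the threshold is reached, the model back-labels all
    # previously collected data and labels new responses on arrival.
    return "labeling"

print(model_status(250))  # training
print(model_status(400))  # labeling
```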
Ensuring Accurate and Relevant Topics
To maintain precision and relevance, Zencity combines automated processes with human expertise:
- Human Annotation: Skilled professionals review and annotate sample responses to teach the model to recognize patterns and themes more effectively. This process plays a key role in improving the model's accuracy.
- Manual Adjustments: The Zencity Professional Services team can correct labeling errors and manage the topic tree directly within the platform -- adding or removing labels on individual responses and renaming, merging, deleting, or moving topics and subtopics as needed. Each correction is logged with structured metadata and a reason for the change, ensuring transparency and creating a reliable feedback loop for ongoing model improvement.
- Iterative Refinement: The model is continuously retrained using new data and human corrections, which are captured as structured ground truth and feed directly into model evaluation. This ensures that topics stay accurate and align with evolving trends over time.
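The structured correction metadata described above might look like the following record. The field names and schema here are illustrative assumptions, not Zencity's actual data model.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class LabelCorrection:
    """One manual adjustment, logged with structured metadata (hypothetical schema)."""
    response_id: str
    removed_labels: list[str]
    added_labels: list[str]
    reason: str        # a reason is recorded for every change
    corrected_by: str
    corrected_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

correction = LabelCorrection(
    response_id="resp-123",
    removed_labels=["Parking"],
    added_labels=["Traffic Congestion"],
    reason="Response discusses congestion, not parking availability.",
    corrected_by="professional-services",
)
# Corrections like this become ground truth for retraining and evaluation.
print(asdict(correction))
```

Logging corrections as structured records, rather than silent edits, is what makes the feedback loop auditable and usable for model evaluation.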
By combining these efforts, we ensure your insights remain reliable and actionable.
Key Benefits of the Model
- Tailored to Your Community: The Dynamic Topic Model is designed to reflect the unique characteristics of each community. For example, the same survey question might generate different categories in different cities based on the local context and priorities.
- Higher Coverage: The model is designed to label a greater proportion of incoming responses, reducing the share of responses that go uncategorized.
- Improved Naming and Taxonomy: Topics are organized into a clear, consistent structure with names that are intuitive and easy to interpret -- making it simpler to navigate results and share insights with stakeholders.
- Lower Mislabeling Rate: The model's architecture significantly reduces incorrect label assignments, so the topics surfaced in your dashboard more reliably reflect what residents are actually saying.
- Multi-Label Assignment: Each response can be assigned multiple labels, allowing for a more comprehensive set of topics and ensuring detailed responses are linked to all relevant themes.
- Customizability: The model adapts to your community's unique needs and evolves alongside your data. It can identify connections between topics, such as linking a Police Chief's name to their role, ensuring that responses mentioning them by name are associated with broader themes like policing.
Where Can I Find the Model’s Output in My Dashboard?
The output of the Dynamic Topic Model, which focuses on text-based feedback, is available in the "Feed" tab of your dashboard. Within this section, you can explore individual comments from open-text questions, each categorized into relevant topics and subtopics. For clarity, each comment is labeled with its assigned subtopics. Additionally, a summary of the topics and subtopics is conveniently displayed on the right side of the comment feed.
FAQs
How often is the model updated for improvement purposes?
The model is periodically updated and retrained to enhance accuracy and insights. Updates may also re-label historical data so that all results reflect the latest model improvements.
What happens to responses when the model is updated?
When the model is updated, new and improved topic labels are applied going forward. For significant model upgrades, topic and subtopic names on open-text answers will change starting with the first cycle on the new version. The immediately prior cycle is also re-labeled to enable a clean topic-level comparison in your first report on the updated model. Cycles older than that retain their original labels. Raw response text, closed-ended questions, aspect scores, sentiment, and numeric results are never affected by model updates.
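The re-labeling window described above can be expressed as a small decision rule. The logic below is an illustrative restatement of the policy, not actual platform code; cycle indexing is an assumption for the example.

```python
def labels_version(cycle_index: int, upgrade_cycle: int) -> str:
    """
    Which model version's topic labels a survey cycle carries after an upgrade.
    cycle_index counts cycles chronologically; upgrade_cycle is the first
    cycle run on the new model version. (Illustrative logic only.)
    """
    if cycle_index >= upgrade_cycle:
        return "new"               # labeled by the upgraded model going forward
    if cycle_index == upgrade_cycle - 1:
        return "new (re-labeled)"  # prior cycle re-labeled for a clean comparison
    return "old"                   # older cycles retain their original labels

# With the upgrade landing at cycle 10:
print([labels_version(c, 10) for c in range(8, 12)])
# ['old', 'new (re-labeled)', 'new', 'new']
```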
Can responses have more than one label?
Yes! Responses can be assigned multiple subtopics, providing a richer understanding of the themes within your data.
How often is data enriched with labels?
We label incoming responses to all surveys three times a day.
Will my trend comparisons still work?
Cycle-over-cycle comparisons in your first report on the updated model will work cleanly, because the prior cycle is re-labeled alongside the current one. However, trend views that extend further back — two or more cycles before the upgrade — cross the model version boundary and are not analytically valid at the topic level. Raw scores, sentiment, and other non-topic data are unaffected and can still be trended across any time period.
How are percentages for each topic calculated?
Percentages for each topic are calculated from labeled responses only; responses that don’t fit into any topic are excluded. A topic's percentage is the number of responses labeled under it divided by the total number of labeled responses. These percentages, shown in dashboards and reports, give a clear picture of how open-text responses are distributed across topics, keeping the data focused on meaningful, categorized insights.
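A minimal sketch of this calculation, assuming each response is represented as a list of its assigned topic labels (an empty list means unlabeled):

```python
from collections import Counter

def topic_percentages(labels_per_response: list[list[str]]) -> dict[str, float]:
    """
    Percentage of labeled responses mentioning each topic.
    Unlabeled responses (empty lists) are excluded from the denominator.
    """
    labeled = [labels for labels in labels_per_response if labels]
    counts = Counter(topic for labels in labeled for topic in labels)
    return {topic: 100 * n / len(labeled) for topic, n in counts.items()}

# 4 responses total, one unlabeled; percentages use the 3 labeled ones.
labels = [["Housing"], ["Housing", "Safety"], ["Safety"], []]
print(topic_percentages(labels))  # Housing and Safety each ~66.7%
```

Because responses can carry multiple labels, topic percentages can sum to more than 100%.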
I use Blockwise — does this affect me?
Yes, in a meaningful way. Previously, Blockwise surveys applied a fixed, predefined set of topic categories to open-text answers. With the V5 model, that predefined list is replaced by topics generated dynamically from what residents actually wrote — the same approach already used in other survey types. This means the topic categories you were used to seeing in Blockwise open-text results will look different going forward, and better reflect your community's specific concerns.