[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"$f3BCmTovqqazUpqAplniNXoDoElT23qROQY6VQks4RLM":3},{"slug":4,"category":5,"category_label":6,"category_color":7,"featured":8,"author":9,"hero_image":10,"reading_time":11,"title":12,"excerpt":13,"meta_title":14,"meta_desc":15,"tags":16,"products":24,"content":26,"images":27,"related_posts":28},"civic-reporting-duplicate-detection-civicreport","architecture","Architecture","teal",false,"Fimula","/storage/blog/heroes/civic-reporting-duplicate-detection-civicreport.jpg",7,"Duplicate Detection in CivicReport: Architecture of a Municipal Deduplication Engine","When citizens report the same pothole twelve times, a municipality wastes resources investigating duplicates. CivicReport runs an automated duplicate detection system that merges reports before staff ever see them.","Duplicate Detection in CivicReport: Municipal Deduplication Architecture | Fimula Blog","How CivicReport detects duplicate municipal reports using a three-layer pipeline: spatial proximity, temporal relevance, and semantic similarity. Self-hosted, GDPR-compliant, and fully configurable.",[17,20],{"slug":18,"name":19,"color":7},"self-hosting","Self-Hosting",{"slug":21,"name":22,"color":23},"open-source","Open Source","cyan",[25],"civicreport","\u003Cp>A pothole on Main Street gets reported on Monday by a driver. On Tuesday, a pedestrian reports the same pothole from the sidewalk. On Wednesday, a cyclist files a third report. By Friday, there are six reports for the same hole, three of them assigned to different staff members, two of them already marked \"In Progress\" independently.\u003C/p>\n\n\u003Cp>This is not a hypothetical edge case. In our data from municipalities running \u003Ca href=\"/products/civicreport\">CivicReport\u003C/a>, duplicate reports account for 15-30% of all submissions. Left unchecked, they waste staff time, inflate issue counts, and make resolution metrics unreliable. 
CivicReport's duplicate detection engine is designed to catch these before they reach the admin dashboard. Here is the architecture.\u003C/p>\n\n\u003Ch2 id=\"why-municipal-deduplication-is-hard\">Why Municipal Deduplication Is Hard\u003C/h2>\n\n\u003Cp>Duplicate detection in a municipal context is different from deduplicating a customer database. Customer records have structured fields: email, phone, company name. Municipal reports have unstructured text descriptions, user-submitted locations with varying accuracy, and photos taken from different angles at different times of day.\u003C/p>\n\n\u003Cp>The challenges are specific:\u003C/p>\n\n\u003Cul>\n\u003Cli>\u003Cstrong>Location imprecision:\u003C/strong> Two citizens reporting the same issue might place their map pins 50 meters apart. GPS accuracy on mobile phones varies. Indoor reports are even less precise.\u003C/li>\n\u003Cli>\u003Cstrong>Language variation:\u003C/strong> One person writes \"pothole near the bus stop.\" Another writes \"road damage on Oak Street.\" A third writes \"hole in the asphalt.\" All three describe the same issue.\u003C/li>\n\u003Cli>\u003Cstrong>Temporal spread:\u003C/strong> The same issue can be reported days or weeks apart. A pothole reported in January and reported again in March might be the same unrepaired hole or a new one nearby.\u003C/li>\n\u003Cli>\u003Cstrong>Category ambiguity:\u003C/strong> A broken streetlight could be categorized as \"Streetlight,\" \"Infrastructure,\" \"Roads,\" or \"Public Safety\" depending on the citizen and the municipality's category setup.\u003C/li>\n\u003C/ul>\n\n\u003Cp>Any system that relies on exact matches will miss most duplicates. We needed something that handles fuzzy data across multiple dimensions.\u003C/p>\n\n\u003Ch2 id=\"the-three-layer-detection-pipeline\">The Three-Layer Detection Pipeline\u003C/h2>\n\n\u003Cp>CivicReport's duplicate detection runs as a three-layer pipeline. 
Each layer applies a different matching strategy, and a report must pass through all three layers before it enters the admin queue.\u003C/p>\n\n\u003Ch3 id=\"layer-1-spatial-proximity\">Layer 1: Spatial Proximity\u003C/h3>\n\n\u003Cp>The first filter is geographic. When a new report comes in, the system queries all existing open issues within a configurable radius. The default is 100 meters for point issues like potholes and graffiti, and 500 meters for linear issues like road damage or water leaks.\u003C/p>\n\n\u003Cp>The radius is configurable per category because issue footprints vary. A single pothole is a point. A burst water main affects a stretch of road. Using the same radius for both would either miss linear duplicates or flag too many false positives for point issues.\u003C/p>\n\n\u003Cp>The spatial query uses a PostgreSQL GiST index on the report location column, which keeps lookups fast even with tens of thousands of active issues.\u003C/p>\n\n\u003Ch3 id=\"layer-2-temporal-relevance\">Layer 2: Temporal Relevance\u003C/h3>\n\n\u003Cp>Once we have a set of spatially nearby issues, we filter by time. A report from six months ago is almost certainly not a duplicate of a report today, even at the exact same location. Most municipalities resolve issues within 30 days, so the default time window for duplicate matching is 60 days.\u003C/p>\n\n\u003Cp>The temporal filter also considers the issue status. A report cannot be a duplicate of an issue that is already \"Completed\" or \"Archived\" unless the citizen explicitly references it. If someone reports a pothole that was repaired last month, that is a new issue, not a duplicate.\u003C/p>\n\n\u003Cp>CivicReport's \u003Ca href=\"/products/civicreport\">8-status lifecycle\u003C/a> (Application, Review, Approved, Active, Report, Verify, Completed, Archived) makes this straightforward. 
We only compare against issues in the first six statuses. Completed and Archived issues are excluded from the duplicate check.\u003C/p>\n\n\u003Ch3 id=\"layer-3-semantic-similarity\">Layer 3: Semantic Similarity\u003C/h3>\n\n\u003Cp>This is where the system gets interesting. The spatial and temporal filters give us a small set of candidate issues. The semantic layer determines whether the new report actually describes the same problem.\u003C/p>\n\n\u003Cp>We use a two-stage comparison. First, we compare the category assignments. If the new report is categorized as \"Roads\" and all nearby candidates are categorized as \"Parks,\" we can immediately rule them out as duplicates. Category mismatch is a strong negative signal.\u003C/p>\n\n\u003Cp>Second, we compare the text descriptions. Rather than simple keyword matching, we use a combination of techniques:\u003C/p>\n\n\u003Cul>\n\u003Cli>\u003Cstrong>Keyword extraction:\u003C/strong> Pull significant nouns and adjectives from both the new report and candidate descriptions. \"Pothole,\" \"hole,\" \"asphalt,\" and \"damage\" are semantically related for road issues.\u003C/li>\n\u003Cli>\u003Cstrong>Phrase matching:\u003C/strong> Look for shared location references. \"near the bus stop on Oak Street\" in two reports is a strong signal, even if the rest of the description differs.\u003C/li>\n\u003Cli>\u003Cstrong>Description length ratio:\u003C/strong> If one report is three sentences and another is two words, they are less likely to be duplicates than two reports of similar length and detail level.\u003C/li>\n\u003C/ul>\n\n\u003Cp>The similarity score is a weighted combination of these factors. If it exceeds a configurable threshold (default: 0.7 on a 0-1 scale), the system flags the new report as a potential duplicate.\u003C/p>\n\n\u003Ch2 id=\"the-merge-workflow\">The Merge Workflow\u003C/h2>\n\n\u003Cp>When the system detects a potential duplicate, it does not silently merge the reports. 
Instead, it routes the new report to the approval queue with a \"Potential Duplicate\" flag and a link to the existing issue.\u003C/p>\n\n\u003Cp>The admin reviewer sees both reports side by side: the original with its full timeline and the new submission with its description, photos, and location. The reviewer has three options:\u003C/p>\n\n\u003Cul>\n\u003Cli>\u003Cstrong>Confirm duplicate:\u003C/strong> The new report is merged into the existing issue. The citizen who submitted the duplicate gets an automatic notification that their report has been linked to an existing issue, with a link to track its progress on the \u003Ca href=\"/products/civicreport\">public map\u003C/a>.\u003C/li>\n\u003Cli>\u003Cstrong>Not a duplicate:\u003C/strong> The new report enters the normal workflow as a standalone issue. The system logs this as a false positive, which is used to tune the similarity thresholds over time.\u003C/li>\n\u003Cli>\u003Cstrong>Related but separate:\u003C/strong> The reports are linked as related issues but processed independently. Useful when two reports describe nearby but distinct problems.\u003C/li>\n\u003C/ul>\n\n\u003Cp>This human-in-the-loop approach is deliberate. Automatic merging without review risks suppressing legitimate reports. A citizen who reports a pothole and gets no acknowledgment because the system silently merged their report will report it again, or worse, lose trust in the platform entirely.\u003C/p>\n\n\u003Ch2 id=\"image-comparison\">Image Comparison\u003C/h2>\n\n\u003Cp>CivicReport accepts photo uploads with reports. When a potential duplicate is detected, the system also compares the submitted photos against photos attached to the existing issue.\u003C/p>\n\n\u003Cp>Full image similarity analysis would be computationally expensive and unnecessary for this use case. Instead, we compare image metadata (GPS coordinates embedded in EXIF data, if available) and basic visual features like dominant colors and aspect ratio. 
A pothole photo taken from a car and a pothole photo taken on foot will have different angles, but they will share similar dominant colors (gray asphalt, dark hole) and similar aspect ratios.\u003C/p>\n\n\u003Cp>Image comparison is used as a supplementary signal, not a primary one. It increases confidence in a duplicate match but does not override the text and location analysis.\u003C/p>\n\n\u003Ch2 id=\"performance-and-scaling\">Performance and Scaling\u003C/h2>\n\n\u003Cp>The duplicate detection pipeline runs synchronously during report submission. This means the citizen sees the result immediately: either their report enters the queue, or they get a message saying \"This may be a duplicate of an existing report\" with a link to track the original.\u003C/p>\n\n\u003Cp>The three-layer approach keeps the computation fast. The spatial filter narrows candidates from thousands to a handful. The temporal filter narrows further. The semantic comparison runs on at most 5-10 candidate issues, which takes milliseconds.\u003C/p>\n\n\u003Cp>For larger municipalities with high submission volumes, the spatial query benefits from PostgreSQL's GiST indexing, which scales well with large datasets. The bottleneck is not the database; it is the semantic comparison, which is bounded by the small candidate set.\u003C/p>\n\n\u003Ch2 id=\"self-hosting-and-data-sovereignty\">Self-Hosting and Data Sovereignty\u003C/h2>\n\n\u003Cp>CivicReport runs on the Fimula Platform with two deployment options. \u003Ca href=\"/products/civicreport\">Fimula Lite\u003C/a> uses shared infrastructure with row-level security for tenant isolation. 
\u003Ca href=\"/products/civicreport\">Fimula Core\u003C/a> provides a dedicated PostgreSQL instance per tenant.\u003C/p>\n\n\u003Cp>The duplicate detection engine runs entirely within the tenant's database. No report text, photos, or location data is sent to an external service. This is important for municipalities because civic reports often contain location data that reveals citizens' daily patterns. Under GDPR, this is personal data, and sending it to a third-party API for analysis would require explicit consent and a data processing agreement.\u003C/p>\n\n\u003Cp>By keeping the detection pipeline local, CivicReport avoids this problem entirely. The municipality's data stays in their database, on their infrastructure (or the shared EU-hosted infrastructure for Lite tenants).\u003C/p>\n\n\u003Ch2 id=\"open-source-considerations\">Open-Source Considerations\u003C/h2>\n\n\u003Cp>CivicReport is available as a self-hosted solution. Municipalities that want full control over their data and the ability to inspect and modify the duplicate detection logic can deploy on their own infrastructure.\u003C/p>\n\n\u003Cp>The detection thresholds (spatial radius, temporal window, similarity score) are all configurable per tenant through the admin dashboard. For municipalities that want to go further, the scoring weights and category-specific rules are stored in database tables that can be modified directly.\u003C/p>\n\n\u003Cp>This configurability matters because duplicate patterns vary by municipality. A dense urban center gets more spatially close reports than a rural municipality. A tourist-heavy city gets more reports in multiple languages. The system needs to adapt to these differences without requiring code changes.\u003C/p>\n\n\u003Cp>Duplicate detection is not the most visible feature in a civic reporting platform. Citizens don't see it. It doesn't appear on the public map. 
But for municipal staff who process hundreds of reports per week, it is the difference between a manageable workload and an overwhelming one. If you are evaluating civic reporting platforms, ask how they handle duplicates. The answer tells you a lot about how well the system understands municipal operations.\u003C/p>",[],[29,46,60],{"slug":30,"category":5,"category_label":6,"category_color":7,"featured":8,"author":9,"hero_image":31,"reading_time":11,"title":32,"excerpt":33,"meta_title":34,"meta_desc":35,"tags":36,"products":44},"ai-lead-scoring-sales-core-crm","/storage/blog/heroes/ai-lead-scoring-sales-core-crm.jpg","AI Lead Scoring in Sales Core: How We Built It and Why It Works","Most CRMs score leads with static rules. Sales Core uses AI to score every contact based on activity, engagement signals, and deal behavior. Here is the architecture behind it.","AI Lead Scoring in Sales Core CRM | Fimula Blog","How Sales Core implements AI-powered lead scoring with recency, frequency, pipeline, and engagement signals. Architecture decisions and implementation details for GDPR-compliant CRM lead scoring.",[37,41],{"slug":38,"name":39,"color":40},"ai-llm","AI & LLM","emerald",{"slug":42,"name":43,"color":40},"b2b-saas","B2B SaaS",[45],"sales-core",{"slug":47,"category":5,"category_label":6,"category_color":7,"featured":8,"author":9,"hero_image":48,"reading_time":49,"title":50,"excerpt":51,"meta_title":52,"meta_desc":53,"tags":54,"products":58},"self-hosted-ai-customer-support-ollama","/storage/blog/heroes/self-hosted-ai-customer-support-ollama.jpg",8,"Self-Hosted AI Customer Support with Ollama: Zero Per-Token Costs, Full Data Privacy","How we built dual AI engine support into our customer support platform — OpenAI for maximum accuracy, Ollama for self-hosted free processing. Full GDPR compliance.","Self-Hosted AI Customer Support with Ollama | Fimula Blog","How to run AI ticket classification with zero per-token costs using Ollama. 
Dual engine support with OpenAI and GDPR-compliant self-hosted AI for EU companies.",[55,56,57],{"slug":38,"name":39,"color":40},{"slug":21,"name":22,"color":23},{"slug":18,"name":19,"color":7},[59],"ai-customer-support",{"slug":61,"category":5,"category_label":6,"category_color":7,"featured":8,"author":9,"hero_image":62,"reading_time":49,"title":63,"excerpt":64,"meta_title":65,"meta_desc":66,"tags":67,"products":70},"multi-llm-ai-sales-outreach-salesagent","/storage/blog/heroes/multi-llm-ai-sales-outreach-salesagent.jpg","Why Multi-LLM Architecture Beats Single-Provider Lock-in for AI Sales Outreach","SalesAgent uses 12+ LLM providers behind a unified API layer. Here is why we built a multi-LLM architecture for AI sales outreach instead of committing to a single provider.","Multi-LLM Architecture for AI Sales Outreach | Fimula Blog","Why SalesAgent uses 12+ LLM providers behind a unified API layer for AI sales outreach. Cost optimization, GDPR compliance with Ollama, and provider resilience.",[68,69],{"slug":38,"name":39,"color":40},{"slug":42,"name":43,"color":40},[71],"salesagent"]