
The Digital Trowel CaRE (CRF-Assisted Relationship Extraction) platform integrates cutting-edge technologies for the harvesting of data (structured and unstructured) from the Internet ("crawling"), the extraction of relevant information ("hybrid computational linguistics model") and the refining of information into actionable intelligence ("match and merge").
The flexibility of our platform enables us to achieve a new level of completeness, accuracy and relevance in the intelligence that we deliver, and gives our customers a unique rapid-application-development capability.
Information Extraction: Digital Trowel's Hybrid Computational Linguistics Model
"What we've done here is to crystallize the most advanced mathematics into a dream pattern-generation machine."

Our CaRE platform includes the industry's first implementation of the Hybrid Computational Linguistics Model for Information Extraction. Based on proprietary technologies developed by our co-founder, Professor Feldman, and DT's senior development team, the Hybrid Model is a disciplined integration of two approaches to textual analytics: the Pattern Matching Computational Linguistics approach and the Supervised Machine Learning approach (CRF, Conditional Random Fields). By combining these disparate approaches in a unified mathematical model, CaRE is able to use better extraction patterns than either approach yields alone. In addition, systematic, state-of-the-art discourse analysis techniques are used to resolve anaphora (indirect references, such as pronouns that point back to an earlier mention). As a result, CaRE is able to achieve a much higher level of precision and efficiency in its recognition and extraction of relevant data.

"At Digital Trowel, elegant and simple translates into faster and more accurate."

The hybrid approach makes CaRE a much more flexible engine than one-dimensional platforms. Using simple syntax, we are able to define target relations with a reduced number of rules, slashing the effort required to create applications and solutions in new domains. This enables us to scale the development process, to enhance accuracy and to speed the delivery of needed business intelligence in a dynamic environment.
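One common way to combine hand-written patterns with a CRF is to expose each rule's output as a feature of the tokens it touches, so that the statistical model learns how much to trust the rule. The sketch below only illustrates that general idea; it is not Digital Trowel's implementation. The title list, function names and feature names are all assumptions, and the feature dictionaries are merely in the shape that many CRF toolkits accept (no CRF is trained here).

```python
import re

# Hypothetical hand-written rule: a small lexicon of job-title words.
TITLE_WORDS = {"ceo", "cfo", "president", "founder"}

def rule_flags(tokens):
    """Mark each token that the hand-written rule recognises as a job title."""
    return [tok.lower() in TITLE_WORDS for tok in tokens]

def token_features(tokens, i, flags):
    """Build a feature dict for token i; rule output sits next to lexical features."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        # The rule's verdict becomes evidence for the learner, not a final decision.
        "rule_title": flags[i],
        "prev_rule_title": flags[i - 1] if i > 0 else False,
        "next_rule_title": flags[i + 1] if i < len(tokens) - 1 else False,
    }

def sentence_features(sentence):
    """Tokenise a sentence and return (tokens, per-token feature dicts)."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    flags = rule_flags(tokens)
    return tokens, [token_features(tokens, i, flags) for i in range(len(tokens))]

tokens, feats = sentence_features("Jane Doe, CEO of Acme, resigned.")
```

In a full pipeline, feature sequences like `feats` would be paired with labelled token sequences to train a sequence model; a new domain then needs only a few extra rules rather than a complete rule set.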
Information Harvesting: The Digital Trowel "Crawlers"
"Google gives you millions of 'hits,' and you 'mine' them to find which 10 are relevant. Our crawlers are much more focused: we give you tens of 'hits,' and all are spot-on."

Our hybrid approach extends to our information harvesting technologies. By using a variety of "crawlers" and integrating them with our prioritizing rules and targeting algorithms, we are able to bring a new efficiency to the data collection process, further enhancing our ability to pinpoint relevant data.

One of our proprietary developments is a Visual Data Scraping Paradigm that makes it easier for CaRE to identify relevant information on corporate websites. This has been one of the secrets to our success in gathering voluminous deep-profile corporate data during the short period since we founded Digital Trowel.
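To make the idea of crawlers driven by prioritizing rules concrete, here is a minimal sketch of a crawl frontier that always hands out the most promising URL first. It is an illustration under stated assumptions, not the CaRE crawler: the keyword rules, their scores, the `Frontier` class and the example URLs are all hypothetical.

```python
import heapq

# Hypothetical prioritizing rules: keyword found in the URL -> score bonus.
PRIORITY_RULES = {"about": 3, "team": 3, "management": 4, "careers": 1}

def score(url):
    """Sum the bonuses of every rule keyword that appears in the URL."""
    lowered = url.lower()
    return sum(bonus for kw, bonus in PRIORITY_RULES.items() if kw in lowered)

class Frontier:
    """Queue of URLs still to visit, ordered by descending rule score."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker that keeps insertion order for equal scores

    def add(self, url):
        if url in self._seen:
            return  # never queue the same URL twice
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the highest first.
        heapq.heappush(self._heap, (-score(url), self._counter, url))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)

frontier = Frontier()
for u in [
    "https://example.com/blog/post-1",
    "https://example.com/about/management",
    "https://example.com/careers",
    "https://example.com/team",
]:
    frontier.add(u)

order = [frontier.pop() for _ in range(len(frontier))]
```

With these made-up rules, the management page is visited before the team page, and the blog post comes last.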
Information Analysis: Refinement, Match and Merge
"Wouldn't it be nice if Google could tell the difference between my brother and that New York restaurant reviewer with the same name...?"

Once we have collected and extracted relevant data, Digital Trowel refines it through a number of processes that we refer to cumulatively as "Refinement, Match and Merge."
Refinement
Our refinement process includes nearly 100 steps designed to progressively clean and standardize the raw data. Underlying these steps are state-of-the-art techniques used to validate and standardize terms against standard and/or proprietary sources, enabling us to accurately identify the people who work in a company and to provide their names, addresses, titles, education, employment history and much more.
|
Match & Merge
Using a combination of proprietary similarity algorithms and matching rules, we are able to accurately group disparate data regarding the same entity and to eliminate duplication. As part of the process, we use vast "nickname" and "alias" tables that we have created to enable us to standardize biographical references, corporate data and other specific nomenclature. In addition, when possible, we validate the data against standard and/or proprietary sources. This painstaking process enables us to achieve the highest accuracy rate in the industry and to deliver uniquely complete data regarding target people, companies, etc.
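The combination of a nickname table, a similarity measure and a matching rule can be sketched in a few lines. The example below is a simplified illustration, not Digital Trowel's proprietary algorithm: the tiny `NICKNAMES` table, the 0.85 threshold, the greedy grouping strategy and the sample records are all assumptions, and Python's standard `difflib.SequenceMatcher` stands in for a real similarity algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a production table would be far larger.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}

def canonical_name(name):
    """Lower-case a name, drop periods and expand known nicknames."""
    parts = name.lower().replace(".", "").split()
    return " ".join(NICKNAMES.get(p, p) for p in parts)

def similar(a, b, threshold=0.85):
    """Treat two strings as the same when their similarity ratio is high enough."""
    return SequenceMatcher(None, a, b).ratio() >= threshold

def match_and_merge(records):
    """Greedy grouping: a record joins the first group whose name AND company match."""
    merged = []
    for rec in records:
        name = canonical_name(rec["name"])
        company = rec["company"].lower()
        for group in merged:
            if similar(name, group["name"]) and similar(company, group["company"]):
                group["sources"].append(rec)
                break
        else:
            # No existing group matched, so this record starts a new entity.
            merged.append({"name": name, "company": company, "sources": [rec]})
    return merged

records = [
    {"name": "Bob Smith", "company": "Acme Corp"},
    {"name": "Robert Smith", "company": "Acme Corp."},
    {"name": "Robert Smith", "company": "Globex Inc"},
]
result = match_and_merge(records)
```

Here the first two records collapse into one entity, while the third stays separate: the name is identical, but the matching rule also requires the company to agree, which is how two people who share a name are kept apart.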
|
Scalability
CaRE is built on scalable technologies and cloud-based resources that remove all practical limits to the amount of data that can be harvested, stored and analyzed.
|
Time Stamping
CaRE's ability to add a time factor to collected data enhances the accuracy and relevance of its intelligence.
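One way a time factor can improve accuracy is by letting newer observations override older ones when sources disagree. The sketch below shows that idea only; the record layout, the `current_facts` function and the sample data are hypothetical and say nothing about how CaRE stores or weighs timestamps.

```python
from datetime import date

def current_facts(observations):
    """Keep, for each (entity, attribute) pair, the most recently observed value."""
    latest = {}
    for obs in observations:
        key = (obs["entity"], obs["attribute"])
        if key not in latest or obs["seen"] > latest[key]["seen"]:
            latest[key] = obs
    return {key: obs["value"] for key, obs in latest.items()}

# Hypothetical observations gathered from different sources at different times.
observations = [
    {"entity": "Jane Doe", "attribute": "title", "value": "VP Sales", "seen": date(2009, 3, 1)},
    {"entity": "Jane Doe", "attribute": "title", "value": "CEO", "seen": date(2010, 6, 15)},
    {"entity": "Jane Doe", "attribute": "employer", "value": "Acme", "seen": date(2009, 3, 1)},
]
facts = current_facts(observations)
```

The older title is retained in `observations`, so it can still serve as employment history, while `facts` reports only the latest value for each attribute.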
|