Grooming Your Data!

Realising the intrinsic value of Data often involves some Data validation, Data enhancement, Data normalisation, Data standardisation and overall Data grooming.

Pamper your Data

Data Validation

Valid Domain Name Extraction

Takes any raw text input that includes URLs and/or e-mail addresses and returns the unique domain names that pass a DNS lookup.
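
As an illustration of the idea, here is a minimal Python sketch. It is not the service's implementation: the regular expression, the function names extract_domains and dns_resolves, and the choice of the system resolver are all assumptions made for the example. It returns full hostnames rather than registrable domains.

```python
import re
import socket
from typing import Callable, List

# Hostname that follows "http(s)://" in a URL or "@" in an e-mail address.
# This pattern is an assumption for the sketch; it only handles ASCII hostnames.
_HOST_RE = re.compile(
    r"(?:https?://|@)((?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,})",
    re.IGNORECASE | re.ASCII,
)


def dns_resolves(host: str) -> bool:
    """Return True if the system resolver finds an address for the host."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def extract_domains(
    text: str, resolves: Callable[[str], bool] = dns_resolves
) -> List[str]:
    """Return unique hostnames from text, in order of appearance, that resolve."""
    seen: List[str] = []
    for match in _HOST_RE.finditer(text):
        host = match.group(1).lower()
        if host not in seen:
            seen.append(host)
    return [host for host in seen if resolves(host)]
```

The resolver is passed in as a parameter so the lookup can be replaced, for example with a cached or stubbed check when no network is available.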

Validate NCM Codes

Takes any 8-digit code and validates it against an up-to-date version of the Mercosur Common Nomenclature (NCM) database to verify whether the code provided is valid.
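
The check can be pictured as two steps: a format test and a lookup. The Python sketch below assumes the caller supplies the set of known codes; the names normalise_ncm and is_valid_ncm are hypothetical, and the sketch does not ship or describe the actual database.

```python
import re
from typing import Iterable

# Exactly eight ASCII digits once separators are removed.
_NCM_RE = re.compile(r"[0-9]{8}")


def normalise_ncm(code: str) -> str:
    """Drop surrounding whitespace and the dots often written as separators."""
    return code.strip().replace(".", "")


def is_valid_ncm(code: str, known_codes: Iterable[str]) -> bool:
    """Return True if code has 8 digits and appears in known_codes.

    known_codes stands in for the NCM database (an assumption of this sketch);
    its entries are expected as 8-digit strings without dots.
    """
    digits = normalise_ncm(code)
    if not _NCM_RE.fullmatch(digits):
        return False
    return digits in set(known_codes)
```

A call such as is_valid_ncm("0901.21.00", codes) first reduces the input to "09012100" and then checks membership in codes.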

Data Enhancement

International Telephone Number Detection

Uses the input digits and their formatting, combined with hints about the country such as IP address and timezone, to return an international telephone number formatted according to ITU-T E.123.

Rebuild Missing Data

Adds missing data to the database based on cues obtained from other data. State and city can be obtained from the ZIP code, websites can be extracted from e-mail addresses, and phone codes can be obtained from the city.
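
A rough Python sketch of two of these cues follows. The record layout, the rebuild function, the caller-supplied ZIP table and the exclusion of free e-mail providers are all assumptions for illustration, not a description of the actual service.

```python
from typing import Dict, Mapping, Optional, Tuple

# Illustrative list only: domains that say nothing about the owner's website.
FREE_MAIL = {"gmail.com", "outlook.com", "yahoo.com"}


def rebuild(
    record: Dict[str, Optional[str]],
    zip_table: Mapping[str, Tuple[str, str]],
) -> Dict[str, Optional[str]]:
    """Return a copy of record with empty fields filled from other fields.

    zip_table maps a ZIP code to (city, state) and is supplied by the caller.
    Existing values are never overwritten.
    """
    filled = dict(record)

    # Cue 1: ZIP code -> city and state.
    zip_code = filled.get("zip")
    if zip_code and zip_code in zip_table:
        city, state = zip_table[zip_code]
        filled["city"] = filled.get("city") or city
        filled["state"] = filled.get("state") or state

    # Cue 2: e-mail address -> website, unless it is a free-mail domain.
    email = filled.get("email")
    if email and "@" in email and not filled.get("website"):
        domain = email.rsplit("@", 1)[1].lower()
        if domain and domain not in FREE_MAIL:
            filled["website"] = "https://" + domain

    return filled
```

The function returns a new dictionary rather than modifying its input, so the original record stays available for comparison.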

Data Normalisation and Standardisation

Canonical URL Identification

The service aims to find the canonical URL for the supplied URL. This is done by stripping tracking parameters, session information and tracking domains from any web URL.
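
The parameter-stripping part can be sketched in Python with the standard urllib.parse module. The parameter lists below are a small illustrative selection and the function name canonicalise is hypothetical; tracking domains (redirectors) are not handled in this sketch.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative lists only; a real rule set would be much longer.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"gclid", "fbclid"}
SESSION_PARAMS = {"phpsessid", "jsessionid", "sessionid", "sid"}


def canonicalise(url: str) -> str:
    """Return url without known tracking and session query parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_PARAMS
        and key.lower() not in SESSION_PARAMS
    ]
    # urlencode re-encodes the remaining values, so their percent-encoding
    # may differ in form from the input while meaning the same thing.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

Parameters that are not on the lists keep their original order, so two URLs that differ only in tracking or session parameters reduce to the same string.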

Hostname Classifier

Takes any hostname and identifies the user-registrable part based on business logic. The goal is to understand that the content of, for example, support.squarespace.com and www.squarespace.com is controlled by the same owner, while the content of matthew-painter.squarespace.com has no relation to other squarespace.com domains.
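
One way to picture such a rule is a table of hosting suffixes together with the labels the operator keeps for itself. The Python sketch below encodes only the squarespace.com example from the text; the table, the reserved labels and the function name owner_key are assumptions, and hostnames without a matching rule are returned unchanged.

```python
from typing import Dict, Set

# Hypothetical rule data: hosting suffix -> labels reserved by the operator.
HOSTING_SUFFIXES: Dict[str, Set[str]] = {
    "squarespace.com": {"www", "support"},
}


def owner_key(hostname: str) -> str:
    """Return a key that is equal for hostnames controlled by the same owner."""
    host = hostname.lower().rstrip(".")
    for suffix, reserved in HOSTING_SUFFIXES.items():
        if host == suffix:
            return suffix
        if host.endswith("." + suffix):
            # Label directly to the left of the hosting suffix.
            label = host[: -len(suffix) - 1].split(".")[-1]
            # Reserved labels belong to the operator; others to a customer.
            return suffix if label in reserved else f"{label}.{suffix}"
    # No rule matched: this sketch leaves the hostname as it is.
    return host
```

With this table, support.squarespace.com and www.squarespace.com share the key squarespace.com, while matthew-painter.squarespace.com gets a key of its own.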

Date Identification

Takes any time and date string and returns an ISO 8601 compatible date specification. Currently supported input languages are English, Portuguese and Norwegian.
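
As a rough illustration, the Python sketch below handles only calendar dates with a spelled-out month name in one of the three languages. The token-based approach and the function name to_iso_date are assumptions for the example; times, numeric-only dates and abbreviations are out of scope here.

```python
import re
import unicodedata
from datetime import date
from typing import Optional

# Month names in English, Portuguese and Norwegian, in calendar order.
_MONTH_NAMES = (
    "january february march april may june july august september october november december",
    "janeiro fevereiro março abril maio junho julho agosto setembro outubro novembro dezembro",
    "januar februar mars april mai juni juli august september oktober november desember",
)
MONTHS = {
    name: number
    for names in _MONTH_NAMES
    for number, name in enumerate(names.split(), start=1)
}


def to_iso_date(text: str) -> Optional[str]:
    """Return YYYY-MM-DD for text with a day, a month name and a 4-digit year."""
    normalised = unicodedata.normalize("NFC", text).lower()
    # Split into runs of letters and runs of digits; punctuation is dropped.
    tokens = re.findall(r"[^\W\d_]+|\d+", normalised)
    month = next((MONTHS[t] for t in tokens if t in MONTHS), None)
    year = next((int(t) for t in tokens if t.isdigit() and len(t) == 4), None)
    day = next((int(t) for t in tokens if t.isdigit() and len(t) <= 2), None)
    if month is None or year is None or day is None:
        return None
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        # Impossible calendar date, such as 31 February.
        return None
```

Inputs such as "March 5, 2021", "5 de março de 2021" and "5. mars 2021" all reduce to the same three tokens of interest, which is why word order and punctuation do not matter in this sketch.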

***

Are you looking for pre-groomed Data?
Check our Data Sets.