“If I have seen a little further it is by standing on the shoulders of Giants.”
A modern data platform needs to be flexible enough to handle the varied workloads that diverse business objectives create, extensible enough to support existing and emerging data technologies, and, to some extent, future-proof.
Our BigStream framework, Serendio’s implementation of the Lambda Architecture, was designed to accelerate your big data development.
Key Features:
DisKoveror is a Text Analytics framework developed by Serendio. Built on top of other open source packages, DisKoveror provides a flexible and extensible way to extract Entities, Topics, Categories, Sentiments, and Keywords from unstructured text.
The key advantage of DisKoveror over the numerous open source options is that it provides access to best-of-breed components through a plug-and-play approach and a unified programming interface.
In some cases, DisKoveror also improves output quality through training sets, domain-specific ontologies, and folksonomies.
DisKoveror has been used to mine brand sentiment from social media, gauge customer satisfaction from emails, extract topics from Tweets, compute social influence scores, assist metadata and taxonomy creation, and much more.
DisKoveror Highlights
DisKoveror can be accessed through Java APIs or a RESTful interface.
BigSim is designed to provide flexibility and control in generating large data sets through templates and minimal coding. Users only need to provide the data specifications in an XML template that defines the semantic type, range, volume, velocity, and shape. The simulated data sets can be used for capacity planning, what-if scenario testing, extrapolating small data sets with a certain amount of randomness to simulate real-world data, filling in missing data in incomplete data sets, and similar tasks.
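To make the template idea concrete, here is a minimal Python sketch of template-driven data generation. It is not BigSim's actual schema or API: the XML element and attribute names (`dataset`, `field`, `rows`, `type`, `min`, `max`, `values`) and the `generate` function are assumptions invented for illustration, and the sketch covers only type, range, and volume.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical template: element and attribute names are illustrative
# assumptions, not BigSim's real template format.
TEMPLATE = """
<dataset rows="5">
  <field name="age" type="int" min="18" max="90"/>
  <field name="temperature" type="float" min="-10.0" max="40.0"/>
  <field name="country" type="choice" values="US,IN,DE"/>
</dataset>
"""


def generate(template_xml, seed=None):
    """Return a list of dict rows drawn from the field specs in the template."""
    rng = random.Random(seed)  # seeded generator so runs are repeatable
    root = ET.fromstring(template_xml)
    rows = []
    for _ in range(int(root.get("rows"))):
        row = {}
        for field in root.findall("field"):
            kind = field.get("type")
            if kind == "int":
                value = rng.randint(int(field.get("min")), int(field.get("max")))
            elif kind == "float":
                value = rng.uniform(float(field.get("min")), float(field.get("max")))
            elif kind == "choice":
                value = rng.choice(field.get("values").split(","))
            else:
                raise ValueError(f"unsupported field type: {kind}")
            row[field.get("name")] = value
        rows.append(row)
    return rows


rows = generate(TEMPLATE, seed=42)
for row in rows:
    print(row)
```

Changing the `rows` attribute scales the volume without touching the code, which is the point of keeping the specification in a template.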
Data pre-processing is an important step in the data mining process. Data-gathering methods are often loosely controlled, resulting in out-of-range values, impossible data combinations (e.g., Sex: Male, Pregnant: Yes), and missing values.
Analyzing data that has not been carefully screened for such problems can produce misleading results. The representation and quality of the data must therefore be addressed first, before running any analysis.
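The three kinds of problems named above can be screened for with a few lines of code. The following Python sketch is illustrative only and is not PreMod's interface: the `screen` function, the field names, and the age range are assumptions chosen for the example.

```python
def screen(records, ranges):
    """Split records into clean rows and (row, problems) pairs.

    `ranges` maps a field name to its (low, high) allowed bounds.
    """
    clean, flagged = [], []
    for rec in records:
        problems = []
        for field, (low, high) in ranges.items():
            value = rec.get(field)
            if value is None:
                problems.append(f"missing {field}")
            elif not low <= value <= high:
                problems.append(f"{field} out of range")
        # Impossible combination, using the example from the text.
        if rec.get("sex") == "Male" and rec.get("pregnant") == "Yes":
            problems.append("impossible combination: sex=Male, pregnant=Yes")
        if problems:
            flagged.append((rec, problems))
        else:
            clean.append(rec)
    return clean, flagged


# Hypothetical sample data: one clean row and one row per problem type.
records = [
    {"age": 34, "sex": "Female", "pregnant": "Yes"},
    {"age": 210, "sex": "Male", "pregnant": "No"},
    {"age": 41, "sex": "Male", "pregnant": "Yes"},
    {"age": None, "sex": "Female", "pregnant": "No"},
]

clean, flagged = screen(records, {"age": (0, 120)})
for rec, problems in flagged:
    print(rec, "->", problems)
```

Flagging rows rather than silently dropping them keeps the decision about how to repair or exclude each record with the analyst.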
Highlights of PreMod, our Data Pre-Processing Package:
All of the above functions are available in R and Python.