How to do data integration, BRD example (part3)

Business Layer

This layer contains the derived data needed by the Presentation/Delivery layer. Here we can build components such as associations, groupings, or hierarchies defined by Business Rules, and also cleanse data to fix issues found in the raw data.

Building a new association: Similar Reviews

Let's say we're required to find similar reviews written about a Work. This could help us to:

  • Identify duplication issues
  • Identify users duplicating reviews within or across sites
  • Identify spam where reviews are written to bias opinion
  • Find plagiarism among reviewers

How do we do that? Data processing on unstructured text is efficiently done using NoSQL ...
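
As a rough illustration of what "similar" could mean here, the sketch below compares reviews by the Jaccard similarity of their word shingles. This is only one common text-similarity technique and not necessarily the approach the full post takes; the function names, the review ids, the shingle size, and the threshold are all assumptions made for the example.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased review text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_reviews(reviews, threshold=0.5, k=3):
    """Yield (id1, id2, score) for every review pair at or above the threshold.

    `reviews` maps a hypothetical review id to the review text.
    """
    sets = {rid: shingles(text, k) for rid, text in reviews.items()}
    for id1, id2 in combinations(sorted(sets), 2):
        score = jaccard(sets[id1], sets[id2])
        if score >= threshold:
            yield id1, id2, score


if __name__ == "__main__":
    sample = {
        1: "a great book about the sea and its sailors",
        2: "a great book about the sea and its sailors indeed",
        3: "terrible plot and flat characters throughout the story",
    }
    for pair in similar_reviews(sample):
        print(pair)
```

Comparing every pair is quadratic in the number of reviews, so this shape only suits small samples; at scale the pairwise loop would have to be replaced by some form of indexing or hashing.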


How to do data integration, BRD example (part2)

Physical Data model

This post presents the physical data model. Compared to the logical model, it contains many more tables. Relational databases are less flexible than schema-less NoSQL environments, and a highly normalized model is one technique for mitigating that rigidity through extension. We accommodate change by adding new structures as we discover new attributes and relationships relevant to our evolving needs. Interested readers can look into methods such as Data Vault or Anchor Modeling.
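
To make "extension by adding new structures" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (work, work_title, work_language) are hypothetical and are not taken from the post's actual model; the point is only that a newly discovered attribute arrives as a new table, leaving the existing tables and their data untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Initial, highly normalized model: one table for the entity key,
# one table per attribute (hypothetical names).
conn.executescript("""
    CREATE TABLE work (work_id INTEGER PRIMARY KEY);
    CREATE TABLE work_title (
        work_id INTEGER NOT NULL REFERENCES work(work_id),
        title   TEXT NOT NULL
    );
    INSERT INTO work VALUES (1);
    INSERT INTO work_title VALUES (1, 'Example Title');
""")


def table_names(connection):
    """List user tables in alphabetical order."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


# Later we discover a new attribute. Instead of altering an existing
# table, we extend the model with a new structure.
conn.executescript("""
    CREATE TABLE work_language (
        work_id  INTEGER NOT NULL REFERENCES work(work_id),
        language TEXT NOT NULL
    );
    INSERT INTO work_language VALUES (1, 'en');
""")

# Existing data is unchanged; the new attribute is joined in on demand.
row = conn.execute("""
    SELECT w.work_id, t.title, l.language
    FROM work AS w
    LEFT JOIN work_title    AS t ON t.work_id = w.work_id
    LEFT JOIN work_language AS l ON l.work_id = w.work_id
""").fetchone()
```

The price of this flexibility is the larger table count noted above: each query has to join the attribute tables back together.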

To explain some details of the physical data model, we'll look at the code. Although SQL is not well suited to self-documenting code, most DB engines support explicit ...


How to do data integration, BRD example

Data Integration: one of the main BI functions

BI environment architecture is often left as an afterthought. The business pressures technical teams for delivery, so they jump straight into designing star schemas or dimensional models (the Presentation Layer) and neglect the Integration Layer. The end result: no separation of concerns between the integration and presentation aspects.

Integration and Presentation are critical functions that must be decoupled into separate layers (at least logically), reflecting their independent goals and specifications. Integration is concerned with capturing raw, untransformed data originating from sources, while Presentation applies transformations and business rules to derive ...
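
The separation described above can be pictured with a small sketch. Everything in it is an assumption made for illustration: the RawRecord type, the integrate and present functions, the field names, and the "average rating per work" rule are not from the post. What it tries to show is that the integration step stores rows exactly as received, while type conversion and business rules happen only in the presentation step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawRecord:
    """A source row captured as-is, tagged with where it came from."""
    source: str
    payload: dict


def integrate(source, rows):
    """Integration layer: capture rows untransformed (copied, not cleaned)."""
    return [RawRecord(source, dict(row)) for row in rows]


def present(records):
    """Presentation layer: apply a hypothetical business rule.

    Ratings arrive as strings in the raw data; here they are converted
    to numbers and averaged per work.
    """
    totals = {}
    for record in records:
        work = record.payload["work"]
        rating = float(record.payload["rating"])
        count, total = totals.get(work, (0, 0.0))
        totals[work] = (count + 1, total + rating)
    return {work: total / count for work, (count, total) in totals.items()}


raw = integrate("site_a", [
    {"work": "W1", "rating": "4"},
    {"work": "W1", "rating": "5"},
    {"work": "W2", "rating": "3"},
])
averages = present(raw)
```

Because the raw records keep their original string ratings, a changed business rule only requires rerunning the presentation step; nothing has to be reloaded from the sources.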
