Our Blog


Pentaho Reporting Export Options


Jagdeesh SS

This is a part of a series of blogs that provide quick tips and tricks to enable better usage of Pentaho. We start with this piece on improvising your Pentaho reporting export options. Have a look at the below GrayMatter‘s Pentaho screenshot.

Pentaho Reporting

This is a snapshot of a PRPTI report exported to excel. As you can see the date and timestamp reserve one entire row before the column headings come in. Certain consumers of the report may not want this display. This is because they want to economize real estate as well as to keep a focused span of attention on the data. Now have a look at the screenshot below.

Pentaho Reporting

As you can see the page count consumes an entire row of display, which is again inefficient usage of real estate. What if you do not want to have the date, timestamp and page count displayed?

Pentaho Reporting


Tags :

PDI best practices – Why avoid insert/update step


Sandeep Kemparaju

We are looking at a transformation designed to use insert/update step to perform data load on the target table. Pentaho Kettle step Insert/Update works as follows

Let us take an example of loading a target table. Assume that there is a daily load of 100k records into a target table with 10 million records and every incoming row from the source table looks up against all the 10 million records in the target table. This process continues for all the 100k input records.
 
Pentaho Kettle
 
Insert/Update step involves multiple round trips to the database depending on the commit size.

Performance of most steps depends heavily upon the number of round trips and speed of round trip. Speed depends on a combination of speed of network, the latency on the network and the performance of database.

Find below the wiki link to explain the Insert/Update step in Pentaho Kettle

http://wiki.pentaho.com/display/EAI/Insert+-+Update

As there is an extra lookup process involved in this step, it definitely slows down the process as it needs to run through the entire lot of records. The look up is to check for a matching record (to update) and if none matches (to perform insert).

This step is also slower than the regular “Table Output step”.

There is very little one can do on the network latency side. Reduction of number of round trips to the database is the first thing that you should consider. This can be accomplished by having a mechanism to load lookup data in memory (cache)

In Pentaho Kettle, most Lookup steps have options to cache data. Steps such as “Stream Lookup” & “Merge Join” are some of them.

In the above scenario, design the transformation with “Merge join” step to perform operations within a single statement. By doing this, you will minimize the number of times data in the source and target tables are processed. This helps in less memory consumption and good performance. You need to ensure that the input data is sorted to perform the “Merge join” step.

Find below the wiki link to explain the Merge Join step.

http://wiki.pentaho.com/display/EAI/Merge+Join


Tags :

Crosstabs De-mystified in Pentaho Reporting


Gangadhara Boranna

Cross-tabs have been on the horizon for several years now. They lived a happy, undisturbed life along with the unicorns and gnomes guarding the pot of gold at the end of the rainbow. In Pentaho BI Implementation, Cross-tabs are an easy and relatively easy way to visualize tabular data along two or more axis. They are an indispensable part of a modern Pentaho Reporting engine.

Although cross-tabs are still an experimental feature, and as such are not up to par with their counterparts in BO reports and the likes. They will not make it into 5.0.X as a stable feature, but even being experimental, they are already powerful enough to create some really useful reports.

How to create a Cross-tab Report?

Step 1: In the report designer go to edit->Preferences. Here you need to enable the “Enable” (unsupported) experimental features

Pentaho BI Implementation and Reporting

Step 2: Once clicked on ‘Apply’, you will be able to see the “crossTab” option in the left hand side features

Pentaho BI Implementation

Step 3: Select, drag & drop the crosstab element into the reporting canvas. You can select the “Inline” or the “banded” option. Similar to Sub report.

Pentaho Reporting Services

Step 4: Now, create the data source for required for the crosstab report.

Pentaho Reporting

Step 5: Once done, select the required fields at the appropriate places.

Pentaho Reporting Process

Step 6: The fields will be populated on the reporting canvas. We can now apply the formatting on the reporting fields for a better view of the report.

Pentaho Reporting Designer

Sample report:

Pentaho Reporting Solution
 
Pentaho Reporting
 
Benefits of Cross-Tabs in Pentaho BI Implementation and Reporting

Cross-tabs deliver data in a familiar spreadsheet format. They also summarize both vertically and horizontally, have a grid format, and can change size depending on the data.

  • Several of the most compelling reasons for using cross-tabs are
  • Making better use of space
  • Leveraging experience with the spreadsheet format
  • Horizontal expansion
  • Custom formatting

Because in Pentaho BI Implementation cross-tabs are grouped and summarized both vertically and horizontally, they are incredibly efficient at saving space as compared to a typical grouping report. They are very good at showing key information if the information required has at least two levels of grouping.


Tags :

Pentaho Launches Community Edition 5.0


Suresh Narayanappa

Pentaho Corporation recently announced the immediate availability of its Open Source Pentaho Community Edition 5.0 (Pentaho CDE). It is the latest version of business analytics and open source data integration platform. The launch event also included the Pentaho Marketplace, where members of the community can download and explore all available plug ins developed by the Pentaho Community. It has extended the capabilities of the open source platform and permits community members to work together, share feedback and submit or create new plug ins to broaden Pentaho functionality.

According to the company, the new edition offers an economical entry point to people or companies during their first brush with business analytics when they want to visualize and act upon data. It is also an excellent tool-set for practiced developers, users and consultants who prefer an open code base to extend the borders of Pentaho and also business analytics.
 
Pentaho Community Edition
 
New features in Pentaho Community Edition 5.0

Pentaho, in collaboration with its community of developers, has made a powerful tools suite that offers an open source option for data analysts and developers to meet their goals. The latest edition includes:

    • Business Analytics Platform: The modern, interactive and simplified approach of Pentaho helps business users to discover, access and blend all sizes and types of data. Users can take advantage of a wide range of advanced analytics, that range from simple reports to predictive modeling, and can analyze and also visualize data through multiple dimensions, while at the same time with minimum IT dependence.

 

    • Data Integration: Pentaho Data Integration, better known as Pentaho Kettle, delivers powerful transformation, loading and extraction capabilities. This is a stand-alone application and is utilized to visually design jobs, and aid easier reporting and analysis.

 

    • Report Designer: Pentaho Report Designer is a graphic design tool which has the capability to generate reports from the data streamed through Pentaho Data Integration engine with no requirement for any kind of intermediate staging tables. Output reports can be in PDF, HTML, XML, CSV, Excel and rich-text file.

 

  • Auxiliary Tools: Users can download various types of auxiliary tools, like Pentaho Aggregation Designer for a simple interface to first create and then deploy aggregate tables that improve the performance of the Pentaho OLAP Cubes. Mondrian Schema Workbench is the open source designer for the visual creating and testing of the Mondrian OLAP cube schemas. The Pentaho Metadata Editor offers a simplified tool that you can use to create reports, build domains of Pentaho Metadata or the relational data models.

Tags :