All posts by Gangadhara Boranna


Clustering Pentaho BA Server 5.0.x version


Gangadhara Boranna

Clustering means that two or more instances of the Pentaho BA Server share a common repository. Pentaho 5.0.x uses Apache Jackrabbit, an implementation of the Java Content Repository (JCR) API, for the BA Repository. The BA Repository stores Pentaho content such as the reports that you create, the examples we provide, report scheduling data, and audit data. It resides on the database that you installed and consists of three repositories: Jackrabbit, Quartz, and Hibernate.
 
– Jackrabbit contains the solution repository, examples, security data, and content data from reports that you use Pentaho software to create.

– Quartz holds data that is related to scheduling reports and jobs.

– Hibernate holds data that is related to audit logging.
 
 
You can host the BA Repository on PostgreSQL, MySQL, or Oracle (by default, Pentaho software is configured to use PostgreSQL). Because every node in the cluster must use the same shared repository, first initialize and configure your solution repository by following the instructions below.

Initializing: http://infocenter.pentaho.com/help/topic/install_pdi/task_prepare_rdbms_repository.html

Configuring: http://infocenter.pentaho.com/help/topic/install_pdi/task_configure_rdbms_repository.html
 
You will need to add a section of code to the repository.xml file, found in the \biserver-ee\pentaho-solutions\system\jackrabbit directory, so that all nodes share a journal. Please note that each node must have a unique ID. This is explained in detail below.

Configuring Each Node to Have a Shared Journal:

Before configuring the shared journal, delete the files in the directories listed below.
 
– Delete the contents of the tomcat\work and tomcat\temp directories.

– Navigate to biserver-ee\pentaho-solutions\system\jackrabbit\repository directory and remove all files and folders from the final repository folder.

– Navigate to biserver-ee\pentaho-solutions\system\jackrabbit\repository directory and remove all files and folders from the workspaces folder.
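The cleanup steps above could be scripted. The following is a minimal sketch, assuming that the tomcat and pentaho-solutions folders sit side by side under the server root; the function names `clear_directory` and `clean_node` are hypothetical helpers, not part of Pentaho. Stop the BA Server before running anything like this.

```python
import shutil
from pathlib import Path


def clear_directory(directory: Path) -> int:
    """Delete everything inside `directory` but keep the directory itself.

    Returns the number of top-level entries removed (0 if the
    directory does not exist).
    """
    removed = 0
    if not directory.is_dir():
        return removed
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # sub-folder: remove recursively
        else:
            entry.unlink()         # plain file or symlink
        removed += 1
    return removed


def clean_node(server_root: Path) -> dict:
    """Clear the four locations listed above.

    Assumption: `server_root` is the biserver-ee folder and contains
    both `tomcat` and `pentaho-solutions`.
    """
    jackrabbit = (server_root / "pentaho-solutions" / "system"
                  / "jackrabbit" / "repository")
    targets = [
        server_root / "tomcat" / "work",
        server_root / "tomcat" / "temp",
        jackrabbit / "repository",
        jackrabbit / "workspaces",
    ]
    return {str(target): clear_directory(target) for target in targets}
```

Calling `clean_node(Path("biserver-ee"))` on a node returns, for each location, how many entries were removed, which makes it easy to confirm that the folders were actually emptied.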
 
Now, to configure the nodes for a shared journal, edit the repository.xml file found in the \biserver-ee\pentaho-solutions\system\jackrabbit directory. Add the section of code below near the end of the file, inside the root Repository element (before the closing </Repository> tag).

<!--
    Run with a cluster journal
-->
<Cluster id="Unique_ID">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
        <param name="revision" value="${rep.home}/revision.log"/>
        <param name="url" value="jdbc:postgresql://HOSTNAME:PORT/jackrabbit"/>
        <param name="driver" value="org.postgresql.Driver"/>
        <param name="user" value="jcr_user"/>
        <param name="password" value="password"/>
        <param name="databaseType" value="postgresql"/>
        <param name="janitorEnabled" value="true"/>
        <param name="janitorSleep" value="86400"/>
        <param name="janitorFirstRunHourOfDay" value="3"/>
    </Journal>
</Cluster>
 
Replace the JDBC connection settings (URL, user name, password, database type, and so on) to match your specific database, and replace Unique_ID with a value that is different on every node. Jackrabbit journaling is now configured. Quartz also needs to be configured so that duplicate schedules are not created on each node.
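Because the Jackrabbit Cluster section is identical on every node except for its id, it can help to generate it rather than copy and edit it by hand. The sketch below is one possible way to do that with Python's standard library; the helper names and the node ids are made up for illustration, and the parameter values simply mirror the PostgreSQL example above.

```python
import xml.etree.ElementTree as ET

JOURNAL_CLASS = "org.apache.jackrabbit.core.journal.DatabaseJournal"


def build_cluster_element(node_id, jdbc_url, user, password):
    """Build the <Cluster> element for one node (PostgreSQL values as above)."""
    cluster = ET.Element("Cluster", {"id": node_id})
    journal = ET.SubElement(cluster, "Journal", {"class": JOURNAL_CLASS})
    params = {
        "revision": "${rep.home}/revision.log",
        "url": jdbc_url,
        "driver": "org.postgresql.Driver",
        "user": user,
        "password": password,
        "databaseType": "postgresql",
        "janitorEnabled": "true",
        "janitorSleep": "86400",
        "janitorFirstRunHourOfDay": "3",
    }
    for name, value in params.items():
        ET.SubElement(journal, "param", {"name": name, "value": value})
    return cluster


def build_cluster_snippets(node_ids, jdbc_url, user, password):
    """Return {node_id: xml_text}, refusing duplicate ids."""
    if len(set(node_ids)) != len(node_ids):
        raise ValueError("each node needs a unique Cluster id")
    return {
        node_id: ET.tostring(
            build_cluster_element(node_id, jdbc_url, user, password),
            encoding="unicode",
        )
        for node_id in node_ids
    }
```

For example, `build_cluster_snippets(["node1", "node2"], "jdbc:postgresql://HOSTNAME:PORT/jackrabbit", "jcr_user", "password")` yields one snippet per node, each ready to paste into that node's repository.xml once HOSTNAME and PORT are filled in.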
 
Configuring Quartz for a Cluster:

Navigate to \biserver-ee\pentaho-solutions\system\quartz and open the quartz.properties file in a text editor. Make the following changes to configure Quartz for a cluster:

1. org.quartz.scheduler.instanceId = AUTO

Set this to AUTO so that each instance in the cluster gets its own ID automatically, which lets you add multiple instances. The default value is 1.

2. org.quartz.jobStore.isClustered = true

The default value is false.

3. org.quartz.jobStore.clusterCheckinInterval = 20000

This property is not present by default, so you need to add it to the quartz.properties file explicitly. The value is in milliseconds.
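If you manage several nodes, the three Quartz changes can be applied with a small script instead of by hand. This is a rough sketch under the assumption that the file uses plain `key = value` lines; `apply_cluster_settings` is a hypothetical helper, and it does not handle every syntax that Java properties files allow (for example `:` separators or line continuations).

```python
CLUSTER_SETTINGS = {
    "org.quartz.scheduler.instanceId": "AUTO",
    "org.quartz.jobStore.isClustered": "true",
    "org.quartz.jobStore.clusterCheckinInterval": "20000",
}


def apply_cluster_settings(properties_text, settings=CLUSTER_SETTINGS):
    """Return the properties text with the cluster settings applied.

    Existing keys are overwritten in place; keys that are missing
    are appended at the end. Comments and other lines are kept.
    """
    remaining = dict(settings)
    output = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        if stripped and not is_comment and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                output.append(f"{key} = {remaining.pop(key)}")
                continue
        output.append(line)
    # Anything still in `remaining` was not in the file, so add it.
    for key, value in remaining.items():
        output.append(f"{key} = {value}")
    return "\n".join(output) + "\n"
```

Read quartz.properties into a string, pass it through `apply_cluster_settings`, and write the result back. Running it a second time leaves the file unchanged, so it is safe to repeat on every node.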



6 Key features which can be reasons to migrate to Pentaho 5.0


Gangadhara Boranna

We strongly encourage you to move to Pentaho’s latest environment, which comes with advanced migration features. Listed below are the top 6 reasons why Pentaho 5.0 could be your ideal solution.

1. Integrated administration in PUC and merging of the Enterprise Console into the Pentaho User Console

This gives the user console a completely fresh look. Pentaho 5.0 has improved navigation through perspectives and a home page with integrated administration. The user console also simplifies permission control through the ‘Manage Control’ interface.

2. New Scheduler Interface & REST API interface for OEM integration

The new, simplified scheduling in Pentaho emphasizes power and ease of use for end users. It lets the user specify an output location, offers full parameter support, can optionally deliver reports via email, and supports scheduled blockout times.

3. Manage your solution content on a more stable repository

Content management in Pentaho 5.0 has become easier with the addition of expanded file browsing, hiding of files and folders, move to trash, rename, and cut, copy, and paste options for files. This helps you manage solution content more efficiently.
 
 
4. Extended Job monitoring in PDI

The new 5.0 version supports extended monitoring of sub-jobs and transformations on the Carte and DI servers via the Operations Mart.

5. New Instaview Templates

Pentaho 5.0 also comes with new Instaview use-case templates for Hadoop and Splunk, along with expanded NoSQL integration.

6. Better Big Data Integration

Pentaho 5.1 adds YARN support for predictive analysis, transforms JSON for analysis through Big Data integration, and offers analytics with enhanced NoSQL support for MongoDB.

The Pentaho Migration utility eases the transition from 4.8 to 5.0 by providing an automated way of moving data and settings from one system to another. This includes reports, Analyzer files, cubes, metadata, the ACLs on each of the solution files, and users and roles.



Need customized performance tuning of Pentaho dashboards? Know How


Gangadhara Boranna

There are several performance tuning methodologies in Pentaho. However, Pentaho dashboard designs combine individual components such as Reporting and Analyzer, so any dashboard performance tuning needs to address all of these components. Two areas to focus on are:

1) Size of the data transferred from the server to the client (to optimize network bandwidth)
2) Rendering of static content such as images


For the first area, we can enable gzip compression in Tomcat. Compression reduces the size of the data transferred between the server and the client. By default, Tomcat is not configured to use compression when serving content from its webapps. Enabling it for the MIME types of the files commonly served by the BA Server reduces their transfer size.
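In Tomcat, compression is switched on through attributes of the HTTP Connector element in server.xml. As a hedged illustration, the sketch below sets those attributes programmatically. The helper `enable_gzip`, the MIME type list, and the 2048-byte threshold are assumptions chosen for the example. Older Tomcat versions spell the MIME attribute `compressableMimeType`, while Tomcat 8.5 and later use `compressibleMimeType`, so check which one your version expects.

```python
import xml.etree.ElementTree as ET

# Assumed list of MIME types worth compressing; adjust to your content.
DEFAULT_MIME_TYPES = "text/html,text/xml,text/plain,text/css,text/javascript"


def enable_gzip(server_xml, mime_types=DEFAULT_MIME_TYPES, min_size=2048,
                mime_attribute="compressableMimeType"):
    """Return server.xml text with compression enabled on HTTP connectors.

    AJP connectors are skipped because compression applies to the
    HTTP connector. `mime_attribute` depends on the Tomcat version.
    """
    root = ET.fromstring(server_xml)
    for connector in root.iter("Connector"):
        protocol = connector.get("protocol", "HTTP/1.1")
        if protocol.startswith("AJP"):
            continue
        connector.set("compression", "on")
        connector.set("compressionMinSize", str(min_size))
        connector.set(mime_attribute, mime_types)
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree discards XML comments when it parses a file, so for a heavily commented server.xml you may prefer to add the same three attributes to the Connector by hand and use the function only as a reference for what to set.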

For the second area, static content can be cached. These files do not change, because they belong to the plugin in use. Caching of static content, which speeds up the browser, has to be enabled for each of the plugins deployed in the BA Server. By default, static content is cached for only 1 hour; you can change this setting on each plugin independently to improve performance.



Crosstabs De-mystified in Pentaho Reporting


Gangadhara Boranna

Crosstabs have been on the horizon for several years now. They lived a happy, undisturbed life along with the unicorns and gnomes guarding the pot of gold at the end of the rainbow. Crosstabs are a relatively easy way to visualize tabular data along two or more axes. They are an indispensable part of a modern reporting engine such as Pentaho Reporting.

Crosstabs are still an experimental feature and, as such, are not up to par with their counterparts in BO reports and the like. They will not make it into 5.0.x as a stable feature, but even while experimental they are already powerful enough to create some really useful reports.

How to create a Crosstab Report?

Step 1: In Report Designer, go to Edit -> Preferences and turn on the “Enable (unsupported) experimental features” option.


Step 2: After you click ‘Apply’, the “crosstab” option appears in the list of elements on the left-hand side.


Step 3: Drag and drop the crosstab element onto the report canvas. You can select the “Inline” or the “Banded” option, as you would for a sub-report.


Step 4: Now, create the data source required for the crosstab report.


Step 5: Once done, select the required fields at the appropriate places.


Step 6: The fields are populated on the report canvas. We can now format the report fields to make the report easier to read.


Sample report: (screenshot not available)
 
Benefits of Cross-Tabs in Pentaho Reporting

Cross-tabs deliver data in a familiar spreadsheet format. They also summarize both vertically and horizontally, have a grid format, and can change size depending on the data.

Several of the most compelling reasons for using cross-tabs are:

  • Making better use of space
  • Leveraging experience with the spreadsheet format
  • Horizontal expansion
  • Custom formatting

Because cross-tabs are grouped and summarized both vertically and horizontally, they are incredibly efficient at saving space as compared to a typical grouping report. They are very good at showing key information if the information required has at least two levels of grouping.
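To make the idea of summarizing both vertically and horizontally concrete, here is a small sketch in plain Python, independent of Pentaho Reporting. It pivots flat rows into a grid with one row per value of the first grouping, one column per value of the second grouping, and totals along both axes. The function name `crosstab` and the sample sales data are invented for illustration.

```python
from collections import defaultdict


def crosstab(rows, row_key, col_key, value_key):
    """Pivot flat records into a grid with row and column totals.

    Returns (header, grid), where each grid row is
    [row label, value per column..., row total] and the last
    grid row holds the column totals.
    """
    cells = defaultdict(float)
    row_labels, col_labels = [], []
    for record in rows:
        r, c = record[row_key], record[col_key]
        if r not in row_labels:
            row_labels.append(r)   # keep first-seen order
        if c not in col_labels:
            col_labels.append(c)
        cells[(r, c)] += record[value_key]

    grid = []
    for r in row_labels:
        values = [cells.get((r, c), 0.0) for c in col_labels]
        grid.append([r] + values + [sum(values)])

    totals = [sum(cells.get((r, c), 0.0) for r in row_labels)
              for c in col_labels]
    grid.append(["Total"] + totals + [sum(totals)])
    header = [row_key] + col_labels + ["Total"]
    return header, grid


# Hypothetical sample data: sales grouped by region and quarter.
sample_rows = [
    {"region": "North", "quarter": "Q1", "sales": 100},
    {"region": "North", "quarter": "Q2", "sales": 150},
    {"region": "South", "quarter": "Q1", "sales": 80},
    {"region": "North", "quarter": "Q1", "sales": 20},
]
header, grid = crosstab(sample_rows, "region", "quarter", "sales")
```

Four flat records collapse into a three-line grid, and the grid grows a new column whenever a new quarter appears in the data. This is the space saving and horizontal expansion described above.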

