Friday, March 23, 2018

Apache NiFi - Say goodbye to data flow hassles with NiFi

After a long time, I thought of starting a new blog post series about a beautiful platform I came across: Apache NiFi (https://nifi.apache.org/).

Those who have already played with NiFi know what it is capable of, so this first post targets people who work in the software industry but don't have a clue about NiFi.

Read these tweets before getting started :).


Interesting, right? Simply put, if you are a software developer and you are not aware of NiFi, you are probably doing too much work.

How did I get to know about this?

The first project I got at my new job was a big data project. We had to accept a huge amount of data from different sources, move it into different places, and process it using programs running on Hadoop. On top of that, delivery had to be guaranteed: once the customer hands us the data, none of it can be lost.

So, simply put, we had the following requirements to implement in our system:
1. Accept data from various sources
2. Write the original content of the data to HDFS
3. Write some of the selected content to HBase
4. Write some other content to MongoDB
5. Process the data using scheduled jobs
6. Never lose any data

We first tried to implement simple Java programs (modules) to accept data from the sources and route it along the required paths, but we realized that doing it that way would take months. Then my project architect came up with this nice little solution called Apache NiFi, and we were able to complete the whole data ingestion flow within a couple of days. All that was left was to create some jobs to process the data (later we realized that even the processing can be done using NiFi).

So months of work narrowed down to days. All we did was drag and drop some icons and configure some parameters; the full data flow was completed and data delivery was guaranteed as well.

I'll write more about Apache NiFi in the future. Meanwhile, anybody who wants to get rid of data ingestion hassles should try NiFi!

https://nifi.apache.org/



Wednesday, August 12, 2015

Google Compute Engine Load Balancer support for Apache Stratos - GSOC 2015 - Post 2

In the previous post I described load balancers and Apache Stratos a little. This post will help those who need to run the Google Compute Engine load balancer in order to access their applications.

In order to do this, you should first deploy Stratos on Google Compute Engine; the Stratos wiki might help with that. After installing Stratos on Google Compute Engine you can deploy applications there. Then you need to configure a load balancer in order to access the application.


The Google Compute Engine load balancer has been integrated into Stratos as an extension, so it runs as a standalone application. We can run it on a separate node or on the same node where the message broker runs.


In order to get the latest distribution zip file, we need to download the latest Stratos source and build it; for more details refer to the Stratos wiki. At the moment the extension code has not been merged into the master branch, so you can clone my fork of the Stratos git repository.


https://github.com/asankasanjaya/stratos


After building, you can find the distribution zip file (org.apache.stratos.gce.extension-4.1.0.zip) in this directory:


/extensions/load-balancer/gce-extension/target/


Copy it and extract it to a suitable directory. Now we have to do some configuration.


The extension needs the authority to access Google Compute Engine on your behalf. For that we use the OAuth 2.0 authentication method provided by GCE.


Log in to your GCE account and do the following.


a) Navigate to the credentials section. Click on the Create a new Client ID button if you don’t have a service account client ID yet.




b) Select Service account from the pop-up window and click on Create Client ID. This will download a new JSON key to your computer. We don’t need that JSON key for this extension; instead we need a P12 key.




c) Click on the Generate new P12 key button. This will download a new P12 key to your computer. Store this key in a secure place. If you are using a node in GCE to run the extension, you need to upload this key to that node and store it in a suitable place.




d) Note the email address provided for the client ID you created. You will need to include this email address later in the gce-configuration.xml file.



e) Navigate to the extracted extension folder. Inside the conf folder, open the jndi.properties file with a text editor and update the message broker IP and port information.
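As a rough illustration, the file typically follows the stock ActiveMQ JNDI layout (Stratos uses ActiveMQ as its default message broker), and the value to update is the broker URL. Treat the exact property names below as assumptions and check the file shipped in your conf folder:

```properties
# Hypothetical sketch — verify the key names against the file in your distribution.
java.naming.factory.initial=org.apache.activemq.jndi.ActiveMQInitialContextFactory
# Replace with your message broker's IP and port:
java.naming.provider.url=tcp://192.168.1.10:61616
connectionfactoryName=TopicConnectionFactory
```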

f) Configure the Google Compute Engine related properties. Specify the project name, the project ID, the region name (you should deploy your applications in this region), the path to the key file downloaded in step c, the GCE account ID (this is the email address relevant to your client ID) and the network name.


g) Update the health check related properties. Health checks are used to monitor the health of nodes; if a health check fails to connect to a node within the threshold we configure here, the node will be marked as unhealthy. You can keep the default settings if you want. For more about health checks and the related properties in Google Compute Engine, see:
https://cloud.google.com/compute/docs/load-balancing/health-checks


h) Configure the operation completion timeout. Each GCE API call made by the extension is monitored until it completes. If an operation does not complete within the defined time period, further monitoring is cancelled and the extension moves on to the next API call. We can configure that timeout in milliseconds.


i) Configure the name prefix. The name prefix is used when new objects such as forwarding rules, target pools and health checks are created in GCE; it is added in front of each object name, which makes it easy to identify the objects created by Stratos. The first character must be a lowercase letter.




Now we can run the extension. Navigate to the bin folder and run the gce-extension.sh script as the root user.
sudo ./gce-extension.sh


After the extension receives the complete topology message, it will create the relevant target pools, health checks and forwarding rules, so you will be able to access the application as follows:

relevant-forwarding-rule-ip:port

This is the port you defined at the application deployment stage. If you defined several ports, you can use any of them.

In the next post I'm going to discuss how I created this extension and the algorithm I used. That post may be helpful for anyone who wants to create a load balancing extension for Apache Stratos.

Monday, August 10, 2015

How to process only a part of the video using FFMPEG

A famous tool for processing videos is FFmpeg. I'm using FFmpeg in my final year project to process videos. We can specify the input video using the -i option, which takes the video and processes it in its entirety. Today I had to process only a part of a video with FFmpeg, and this is how I did it.

ffmpeg -ss 50 -t 10 -i input.mp4 [rest of the command] [output file]

The job is done by the -ss and -t options: this processes only the portion of the video from 50 s to 60 s.

-ss seeks to the given position in the input file, and -t limits the duration of data read from the input file. Both should appear before the -i option.

We can also use the hh:mm:ss format instead of specifying seconds directly, e.g. -ss 00:00:50 -t 00:00:10.

Wednesday, July 29, 2015

Google Compute Engine Load Balancer support for Apache Stratos - GSOC 2015 - Post 1

This year I was able to participate in the Google Summer of Code program, and it has been one of the most valuable opportunities I have ever had. My project is to create an extension for Apache Stratos that uses the Google Compute Engine load balancer to access applications deployed in Stratos. Drawing on my experience during Google Summer of Code, I thought of sharing a few blog posts about how to use the Google Compute Engine load balancer with Apache Stratos.


First, it is important to get an idea of what Apache Stratos is all about. Apache Stratos is a platform as a service (PaaS). When someone needs to deploy an application in the cloud, the easiest way is to use a PaaS like Apache Stratos, since it manages everything on the IaaS side.


Basically, we use IaaSes such as Amazon EC2, Google Compute Engine (GCE), CloudStack and OpenStack to create virtual machines and manage networks. In addition, some IaaSes such as GCE provide a load balancing facility too.


In cloud computing, load balancing plays a major role. Since we use a cluster of computers (VMs) to deploy the same application, it is very important to distribute the traffic among those VMs. There are lots of algorithms available for doing that; read more in Azeez's blog.


Apache Stratos includes its own load balancer, and the user can always decide whether or not to use it when deploying an application. If a user requires another load balancer instead, there are plenty of options, such as HAProxy, nginx, Google Compute Engine and AWS; any of them can be used as a separate extension to Apache Stratos, which enables the user to run the extension in a separate VM. Currently, development of the HAProxy and nginx extensions is complete and they are ready to use, while the Google Compute Engine load balancer extension and the EC2 extension are under development. My project is to create the GCE extension as this year's GSoC project (the related JIRA issue is here), and most of the functionality is complete by now. The AWS extension is also being implemented in parallel as a separate GSoC project by Swapnil Patil (the related JIRA issue is here). In the near future both extensions will be available for use.


The important thing is that Stratos provides a rich load balancing API. It enables anyone to add any load balancer as an extension through that API, following the same approach as the existing extensions. By implementing a few methods, we can get an extension up and running.


In the next post I'm going to discuss how to use the GCE load balancer extension with Apache Stratos.

Saturday, December 27, 2014

Image processing Practicals - Part 3 (Capture a video from a file and analysing frames)

This post is about video processing basics using OpenCV. If you still haven't configured OpenCV on your computer, follow the first post of this series.
Basically, a video is a series of images/frames, so most of the time we first have to break the video into a series of frames in order to manipulate it. We don't need to save all the frames to disk and take them one by one; instead, OpenCV provides a way to capture the video frames in real time. The following example shows a typical scenario.

Breaking a video into frames using OpenCV

This program works as a simple video player.
There are two ways to capture video in OpenCV: capture from a file and capture from a camera. The following code captures a video from a file.
If you need to capture from a camera, you only need to change the parameter given on line 7.
In that case line 7 changes to:

 VideoCapture capture(0);  

The parameter '0' indicates that we are using the default camera for capturing.

Run this code:

1:  #include <opencv2/core/core.hpp>  
2:  #include <opencv2/highgui/highgui.hpp>  
3:  #include <iostream>  
4:  using namespace std;  
5:  using namespace cv;  
6:  int main() {  
7:       VideoCapture capture("movie.mp4");  
8:       Mat frame;  
9:       if (!capture.isOpened())  
10:            throw "Error when reading file";  
11:       namedWindow("window", 1);  
12:       for (;;) {  
13:            capture >> frame;  
14:            if (frame.empty())  
15:                 break;  
16:            imshow("window", frame);  
17:            waitKey(20);  
18:       }  
19:       waitKey(0);  
20:  }  

Using the loop we grab the video frame by frame and display each frame in the window, which makes a simple video player. By changing the value passed to the waitKey function (line 17) we can change the playback speed. Each captured frame can then be used for image processing tasks.






Wednesday, December 10, 2014

Image processing Practicals - Part 2 (Getting pixel information of an image using opencv)


This is the second post in this series. Before reading it, please refer to my first post if you haven't configured OpenCV on your PC yet.

Before doing any image processing task, it is important to understand how an image is constructed. Basically, a raster image is a collection of pixels (a matrix of pixels). Each pixel has three channels, called red, green and blue. Using OpenCV we can see how an image has been constructed. Normally each channel is represented by 8 bits, so all together that is 24 bits (3 bytes) per pixel.

Let's run the code and see the output in the console. Program 1 uses a for loop to go through the pixels, but if you only need to see the pixels (not analyse them), just use cout to display the pixel information (see Program 2).

Program 1


 #include <opencv2/core/core.hpp>  
 #include <opencv2/highgui/highgui.hpp>  
 #include <iostream>  
 using namespace std;  
 using namespace cv;  
 int main() {  
      Mat image;  
      // Read the image  
      image = imread("lena.jpg", 1);  
      // If the image was not found  
      if (!image.data) {  
           cout << "No image data \n";  
           return -1;  
      }  
      // Display the image  
      namedWindow("Display Image");  
      imshow("Display Image", image);  
      // Print pixel values  
      for (int i = 0; i < image.rows; i++) {      // take row by row  
           for (int j = 0; j < image.cols; j++) { // take each pixel in the current row  
                Vec3b pixel = image.at<Vec3b>(i, j);  
                cout << pixel;  
           }  
           cout << endl;  
      }  
      waitKey(0);  
      return 0;  
 }  



Sample output: 


So here we have B,G,R values printed.

Ex: [65, 32, 93] ----> blue value = 65, green value = 32, red value = 93 for the first pixel


Program 2

 #include <opencv2/core/core.hpp>  
 #include <opencv2/highgui/highgui.hpp>  
 #include <iostream>  
 using namespace std;  
 using namespace cv;  
 int main() {  
      Mat image;  
      // Read the image  
      image = imread("lena.jpg", 1);  
      // If the image was not found  
      if (!image.data) {  
           cout << "No image data \n";  
           return -1;  
      }  
      // Display the image  
      namedWindow("Display Image");  
      imshow("Display Image", image);  
      cout << image;  
      waitKey(0);  
      return 0;  
 }  

This program just uses cout, without a for loop going through the pixels.

Sample output:




Compare the outputs of the two programs. The second program's output does not separate the three channels (BGR), so in the second program the first pixel corresponds to the first three values.



Sunday, December 7, 2014

Image processing Practicals - Part 1 (Configure OpenCV 2.4.9 and C++ with Eclipse IDE using CMake and MinGW)

From today onwards I'm going to write a new post series about image processing practicals. Let's start from the basics.

First we have to select a proper tool/language to work with for image processing. There are a number of languages and tools available, each with unique advantages and disadvantages; please refer to this for more details. I prefer using the OpenCV computer vision library with C++.

If you are using Linux, you can follow this tutorial to configure OpenCV with the Eclipse IDE.

If you are using Windows, configuring OpenCV can sometimes be a challenging task, so follow these steps.

1. Download OpenCV and extract it. In this tutorial I use OpenCV 2.4.9, but it should be valid for other recent versions of OpenCV as well. (http://opencv.org/downloads.html)


2. Download and install MinGW. You can use the installer provided by MinGW. Add the MinGW bin folder to the system path, e.g. C:\MinGW\bin. (It's always good practice to restart the computer after editing the system path.) (http://sourceforge.net/projects/mingw/files/)

3. There are pre-built OpenCV binaries in the package you downloaded in step 1, but sometimes they do not work properly with MinGW, so we are going to build our own OpenCV binaries. For that you need to download and install CMake and add the CMake bin folder to the system path.

4. Open the CMake GUI and set the source path to the opencv/sources folder.

5. Create a new folder named "myBuild" inside opencv folder and set build path to that folder.




6. Click the Configure button, select MinGW Makefiles from the drop-down menu and click Finish.


7. You will now see the progress bar moving. When it completes, a check box area with a red background appears, where you can select what to build. In my case I didn't change any of the check box values. Press the Configure button again.


8. The red background mentioned above will then change to white. Click the Generate button.


9. Download the Eclipse IDE for C/C++ Developers and install it. (http://www.eclipse.org/downloads/packages/eclipse-ide-cc-developers/heliossr2)



10. Open the Eclipse IDE and create a new C/C++ project. In the Toolchains section select MinGW GCC.


11. To verify that your MinGW environment works properly, build a hello world C++ program and make sure it builds successfully.

12. Now we are going to build the contents of the myBuild folder. Open a command line window, go to the myBuild folder, type the command "mingw32-make" (without the quotation marks) and press Enter. This will take some time. After the process completes, add the myBuild/bin folder to the system path. You have now generated your own OpenCV binaries.

13. Now we are going to create a test project to make sure our binaries work properly. In the Eclipse IDE, go to Project --> Properties --> C/C++ Build --> Settings.

14. In the Tool Settings section, under the GCC C++ Compiler branch, select the Includes folder. Add the OpenCV include directory to Include paths (-I). Ex: "F:\OpenCV\build\include\"



15. Go to the MinGW C++ Linker section and add the myBuild\lib folder path to the Library search path (-L) section under the Libraries folder. Ex: F:\opencv\myBuild\lib

16. Go to the MinGW C++ Linker section and add the following to the Libraries (-l) section under the Libraries folder (one line at a time). Note: every line ends with a number corresponding to the OpenCV version you are using, so you may have to replace it to match your version; you can check it by looking in the lib folder used in step 15. In this tutorial we are using OpenCV 2.4.9, so the corresponding number is 249.

 opencv_calib3d249  
 opencv_contrib249  
 opencv_core249  
 opencv_features2d249  
 opencv_flann249  
 opencv_gpu249  
 opencv_highgui249  
 opencv_imgproc249  
 opencv_legacy249  
 opencv_ml249  
 opencv_objdetect249  
 opencv_ts249  
 opencv_video249  


Note: not all of these are necessary to run a basic OpenCV program; in most cases you will only need opencv_core249 and opencv_highgui249.



17. Create a folder named src and create a C++ source file in it (in my case, opencv.cpp).

18. Add this code to the C++ file.

 #include <opencv2/core/core.hpp>  
 #include <opencv2/highgui/highgui.hpp>  
 #include <iostream>  
 using namespace std;  
 using namespace cv;  
 int main() {  
      Mat image;  
      image = imread("lena.jpg", CV_LOAD_IMAGE_COLOR);  
      if (!image.data) {  
           cout << "No image data \n";  
           return -1;  
      }  
      namedWindow("Display Image", CV_WINDOW_AUTOSIZE);  
      imshow("Display Image", image);  
      waitKey(0);  
      return 0;  
 }  



19. Download the test image (lena.jpg) and place it in the project folder (in the root of the project; see my project structure in step 17).

20. Build and run the C++ code. If you get the following output, congratulations!