Contents

BIGDATA Management Using MongoDB

By Niladri Bihari Mohanty [email protected] [email protected]


Data is a vital parameter for many research programmes in the fields of ICT and business intelligence; non-linear forecasting and data mining remain notably hot areas of research to date. Storing terabytes of data, and processing it, has always been a major concern for researchers, and a new area of research keeping pace with that concern has given birth to the term BIGDATA. In this article you will find a well-blended flavour of NoSQL, BIGDATA and MongoDB with ASP.NET.

Development of light-weight, highly scalable web apps using Node.js

By D.Madan Prabhu, [email protected] A.K.Hota, [email protected]


As more and more people start using the Internet, servers are flooded with their requests and busy with responses. To address the situation, servers are equipped with higher-end hardware: more cores/processors, more memory, enhanced I/O devices and so on. But we are curious to know whether these arrangements are enough to handle the ever-increasing volume of active real-time communication on traditional web development platforms such as ASP.NET, Java, Python or Ruby. NodeJS is a new platform developed to address all of these issues, with less and more concise coding, for light-weight, highly scalable, I/O non-blocking web application development.

PERCEPTRON LEARNING: The 'ABC' of Machine Learning

By Niladri Bihari Mohanty [email protected] [email protected]


In this article we will learn how a neural network works, as well as its extension in the form of the artificial neural network. The objectives of artificial neural networks and Perceptron learning are covered in this issue, whereas the code implementation on various platforms (.NET, AForge.NET, Python) will be discussed in the next issue.

News & Events      

India has third highest infection rate of Zero Access botnet HP unveils cyber security solutions for enterprises in India Now a cyber security device to protect women from violence Automatic Speaker Tracking in Audio Recordings Engineers Invent Programming Language to Build Synthetic DNA Engineers Invent Programming Language to Build Synthetic DNA


BIGDATA Management Using MongoDB

By Niladri Bihari Mohanty [email protected] [email protected]

Data is a vital parameter for many research programmes in the fields of ICT and business intelligence; non-linear forecasting and data mining remain notably hot areas of research to date. Storing terabytes of data, and processing it, has always been a major concern for researchers, and a new area of research keeping pace with that concern has given birth to the term BIGDATA. In this article you will find a well-blended flavour of NoSQL, BIGDATA and MongoDB with ASP.NET.

Codd's relational model, published in 1970, gave birth to the RDBMS, and many databases have since come to the world following this revolutionary concept. The RDBMS is basically concerned with the ACID properties, and the relational nature of its data makes it heavy, in performance terms, for BIGDATA. NoSQL, in contrast to the RDBMS, comes to the rescue from this BIGDATA trouble and provides a platform intentionally designed to store and access BIGDATA in a very lightweight way. It is not at all a replacement for the RDBMS, only an alternative for handling BIGDATA with lighter weight and higher performance. No solution fits all cases.

RDBMS is suitable when:
1. The database is of medium size.
2. A high degree of relation is maintained.
3. The ACID properties are very important.
4. The schema is fixed (every row in a table has the same field names and data types).
5. A huge and complex query system is involved.

NoSQL is suitable when:
1. The database is huge (BIGDATA).
2. Little relation is maintained.
3. ACID matters less than faster performance, and redundancy is not a concern.
4. The data is schema-less (every row in a table can have field names and data types different from the others).
5. Efficiency in data storage and retrieval is very important.

RDBMS family of database servers:
1. SQL Server
2. Oracle
3. MySQL
4. PostgreSQL, etc.

NoSQL family of database servers:
1. MongoDB
2. Cassandra
3. CouchDB
4. RavenDB, etc.

What will we learn in this article?
1. A briefing on the fundamentals of one of the most popular NoSQL databases, i.e. MongoDB.
2. An understanding of the revolutionary features of NoSQL.
3. CRUD operations in MongoDB from an ASP.NET web application, with code.

Why is NoSQL faster than an RDBMS for BIGDATA management?
1. The relational nature of the RDBMS makes data retrieval slow, although it solves the problem of data redundancy. For faster performance, NoSQL uses no relations in the data model; the facility is available, but it is not promoted, as using it defeats the motive of NoSQL, i.e. fast performance.


Organization Database – In an RDBMS System

Table_Group: Group_Code, Head_Name, Team_Size, Reporting_Auth, Tech_area
Table_Employee: Emp_Code, Name, Designation, Group_Code
Table_Project: Emp_Code, Proj_Code, Proj_Name, Start_Date, Client_Name

Sample Data:

Table_Employee
Emp_Code: 5962
Name: Niladri B. Mohanty
Designation: Scientist-B
Group_Code: 01

Table_Group
Group_Code: 01
Head_Name: Sarita Sahoo
Team_Size: 5
Reporting_Auth: SIO
Tech_area: ASP.NET

Table_Project
Emp_Code: 5962
Proj_Code: 113
Proj_Name: "Agrisnet"
Start_Date: 02/02/2009
Client_Name: Agriculture Dept
===================
Emp_Code: 5962
Proj_Code: 114
Proj_Name: "Online Assembly"
Start_Date: 02/02/2009
Client_Name: Odisha Assembly

Organization Database – In a NoSQL System

Table_Employee
Emp_Code:
Name:
Designation:
Group:
    Group_Code:
    Head_Name:
    Team_Size:
    Reporting_Auth:
    Tech_area:
Project:            (can be an array of records inside the single parent record)
    Proj_Code:
    Proj_Name:
    Start_Date:
    Client_Name:

Sample Data:
Emp_Code: 5962
Name: Niladri B. Mohanty
Designation: Scientist-B
Group:
    Group_Code: 01
    Head_Name: Sarita Sahoo
    Team_Size: 5
    Reporting_Auth: SIO
    Tech_area: ASP.NET
Project:
    Proj_Code: 113
    Proj_Name: "Agrisnet"
    Start_Date: 02/02/2009
    Client_Name: Agriculture Dept
    ======================
    Proj_Code: 114
    Proj_Name: "Online Assembly"
    Start_Date: 02/02/2009
    Client_Name: Odisha Assembly


2. The fixed schema of the RDBMS forces the designer to contemplate the database pattern in a more fragmented manner, resulting in low performance. NoSQL has no fixed schema for any collection, and every row/document in a single collection can have a different data structure. An example is the best way to understand this more deeply, so let us jump into it. If we want to represent an employee database in an RDBMS with two categories of employee, regular and outsourced, we may need to think hard, because the two categories carry different types of information. We have two options to represent this in RDBMS terms.

a. Use two tables with two different schemas to store regular and outsourced employees' details.

Name    | Emp_Code | Group_code | Designation
Niladri | 5962     | 01         | Scientist-B

Name    | Outsourced_agency | Project_code
Debasis | HCL               | 0111AC2

b. Keep extra columns, filling with NULL those that are not required for a specific category of employee.

Name    | Emp_Code | Group_code | Designation | Outsourced_agency | Project_code
Niladri | 5962     | 01         | Scientist-B | NULL              | NULL
Debasis | NULL     | NULL       | NULL        | HCL               | 0111AC2

If we represent the same employee database in NoSQL form, then within a single collection (in NoSQL there is no concept of a table; the term "collection" is used instead to describe a group of records) different documents (a document is similar to a record in an RDBMS) can store different numbers of data fields as well as a variety of data types, which frees the designer from the fixed-schema concept.

Document 1 in collection "Employee":
Name    | Emp_Code | Group_code | Designation
Niladri | 5962     | 01         | Scientist-B

Document 2 in the same collection "Employee":
Name    | Outsourced_agency | Project_code
Debasis | HCL               | 0111AC2
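To make the schema-less idea concrete, here is a sketch of how these two documents might sit side by side in one MongoDB collection in JSON form (the field names come from the tables above; MongoDB would additionally stamp each document with its own _id):

{ "Name": "Niladri", "Emp_Code": "5962", "Group_code": "01", "Designation": "Scientist-B" }

{ "Name": "Debasis", "Outsourced_agency": "HCL", "Project_code": "0111AC2" }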

MongoDB: It is one of the most popular open-source NoSQL databases in the industry, and thanks to its good support base it is used across all domains (developers on PHP, .NET, Java and many other platforms use it). MongoDB provides all the flavours of an ideal NoSQL database with a huge community support base. This is the reason that, in this article, we represent the features of NoSQL using MongoDB as the database and ASP.NET as the front-end programming platform. [Every other option is still available, as each of them has some features which make it unique.]

Installing MongoDB:
Step 1: Download MongoDB in zip format from http://www.mongodb.org/downloads
Step 2: Unzip the file and copy all the contents of the bin folder of the unzipped file to a C:\mongodb folder (it can be any name).
Step 3: Create a folder named "data" in the C:\ drive and then a sub-folder "db" inside the "data" folder. C:\data\db is the default path where MongoDB stores data, but it can be reset to any other path using the --dbpath option (for example, mongod --dbpath D:\mongodata).


Step 4: Open the Windows command prompt and change to C:\mongodb (or the installation folder you named) by giving the command:

cd C:\mongodb

Step 5: Run mongod to start the MongoDB database server, and confirm the successful start from the screen prompts (by default the server reports that it is waiting for connections on port 27017).

Installing Robomongo Management Studio: Robomongo is an open-source management studio for the MongoDB database server, but you have various options to choose from, such as MongoVUE and others.
Step 1: Download Robomongo from http://robomongo.org/
Step 2: Install it like any other regular installation.
Note: Whenever you want to perform an operation in Robomongo management studio, you first need to start C:\mongodb\mongod at the "cmd" prompt.

MongoDB Operations through Robomongo Management Studio:
Connect to the database from Robomongo: Start robomongo.exe from Program Files or the desktop shortcut and click "Create" on the first screen; the default port and server details (localhost, or the IP of a remote server) can be provided there to get the connection.
Create a database: Right-click on the root of the server directory, then click Create Database. Give your database a name in the pop-up and click the "Create" button.


Create a Collection: To the user, a collection is similar to a table in an RDBMS, but internally the two are far apart; so even though the term "database" is the same in both concepts (RDBMS and NoSQL), the term "table" is replaced by "collection". Expand the database, right-click on Collections and choose Create Collection. Give the collection a name.
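The same step can also be performed from the shell built into Robomongo; the MongoDB shell speaks JavaScript, so creating the collection used in this example is a one-liner:

db.createCollection("Employee")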

Insert a document: A group of documents is called a collection in the NoSQL concept. The document is the real physical storage where the data resides, and each document is uniquely identified by its _id field (an ObjectId by default), which MongoDB sets on every document. To insert a new document, just right-click on the "Employee" collection and choose the "Insert Document" option from the context menu. MongoDB stores documents in JSON format, which is why it is so easy to nest documents (a field of one document can hold another document, or an array of documents).
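For instance, the nested employee record shown earlier in this article could be inserted from the shell roughly as follows (a sketch built from the sample data above):

db.Employee.insert({
    Emp_Code: "5962",
    Name: "Niladri B. Mohanty",
    Designation: "Scientist-B",
    Group: { Group_Code: "01", Head_Name: "Sarita Sahoo", Team_Size: 5,
             Reporting_Auth: "SIO", Tech_area: "ASP.NET" },
    Project: [
        { Proj_Code: 113, Proj_Name: "Agrisnet", Start_Date: "02/02/2009",
          Client_Name: "Agriculture Dept" },
        { Proj_Code: 114, Proj_Name: "Online Assembly", Start_Date: "02/02/2009",
          Client_Name: "Odisha Assembly" }
    ]
})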


CRUD Operations using C#.NET: MongoDB provides a client API for C# developers in ASP.NET, and in this article we provide some basic CRUD operations using the C# programming language in ASP.NET. Before doing anything, we first have to download the MongoDB client API .dll files for ASP.NET (C#) from http://. The client API dll files need to be stored in the bin folder of the .NET project, and the namespaces need to be added to the code-behind page before using the API:

using System.Collections.Generic;
using System.Linq;                  // needed for FirstOrDefault() in the filter example below
using MongoDB.Bson;
using MongoDB.Driver;
using System.Web.Configuration;
using MongoDB.Driver.GridFS;
using MongoDB.Driver.Builders;
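The samples that follow read a connection string named "mongoCS". A minimal sketch of the corresponding Web.config entry (the name "mongoCS" and the local URL simply match what the code expects; adjust them to your server):

<connectionStrings>
    <add name="mongoCS" connectionString="mongodb://localhost:27017" />
</connectionStrings>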

Insert Document:

protected void btn_insert_Click(object sender, EventArgs e)
{
    // Connect using the "mongoCS" connection string from Web.config
    MongoClient mongo_client = new MongoClient(
        WebConfigurationManager.ConnectionStrings["mongoCS"].ToString());
    MongoServer mongo_server = mongo_client.GetServer();
    MongoDatabase mongo_db = mongo_server.GetDatabase("employee");
    MongoCollection<info> per = mongo_db.GetCollection<info>("personal1");

    // Build a typed document and insert it into the collection
    info inf = new info();
    inf.name = "niladri";
    inf.id1 = "5962";
    per.Insert(inf);
}

Read Documents:

protected void btn_Read_Click(object sender, EventArgs e)
{
    MongoClient mongo_client = new MongoClient(
        WebConfigurationManager.ConnectionStrings["mongoCS"].ToString());
    MongoServer mongo_server = mongo_client.GetServer();
    MongoDatabase mongo_db = mongo_server.GetDatabase("employee");
    MongoCollection<info> per = mongo_db.GetCollection<info>("personal1");

    // Iterate over every document in the collection and show the names
    foreach (info per_each in per.FindAll())
    {
        this.txt_show.Text = this.txt_show.Text + per_each.name;
    }
}


Filter Documents:

protected void btn_filter_Click(object sender, EventArgs e)
{
    // Match documents whose id1 field equals "5962"
    IMongoQuery query = Query.EQ("id1", "5962");

    MongoClient mongo_client = new MongoClient(
        WebConfigurationManager.ConnectionStrings["mongoCS"].ToString());
    MongoServer mongo_server = mongo_client.GetServer();
    MongoDatabase mongo_db = mongo_server.GetDatabase("employee");
    MongoCollection<info> per = mongo_db.GetCollection<info>("personal1");

    // Take the first matching document (or null if none matches)
    info inf = per.Find(query).FirstOrDefault();
    this.txt_show.Text = inf.name;
}

Update Documents:

protected void btn_update_Click(object sender, EventArgs e)
{
    IMongoQuery query = Query.EQ("id1", "5962");

    MongoClient mongo_client = new MongoClient(
        WebConfigurationManager.ConnectionStrings["mongoCS"].ToString());
    MongoServer mongo_server = mongo_client.GetServer();
    MongoDatabase mongo_db = mongo_server.GetDatabase("employee");
    MongoCollection<info> per = mongo_db.GetCollection<info>("personal1");

    // Set new values on every document matched by the query
    IMongoUpdate updt = Update.Set("name", "Sagarika").Set("id1", "6962");
    SafeModeResult rest = per.Update(query, updt);
}

Delete Documents:

protected void btn_remove_Click(object sender, EventArgs e)
{
    IMongoQuery query = Query.EQ("id1", "5962");

    MongoClient mongo_client = new MongoClient(
        WebConfigurationManager.ConnectionStrings["mongoCS"].ToString());
    MongoServer mongo_server = mongo_client.GetServer();
    MongoDatabase mongo_db = mongo_server.GetDatabase("employee");
    MongoCollection<info> per = mongo_db.GetCollection<info>("personal1");

    // Remove every document matched by the query
    per.Remove(query);
}

In all of the CRUD examples above, one common line of code may create confusion:

MongoCollection<info> per = mongo_db.GetCollection<info>("personal1");

This line fetches the collection named "personal1" and puts it in the per variable, which presents each stored document as an "info" class object. The "info" class needs to be designed so that every document can be accommodated in the "info" shape, and the class needs to be stored in the App_Code folder of the .NET project. Let us see the "info" class for the example given above:


using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using MongoDB.Bson;
using MongoDB.Driver;

/// <summary>
/// Summary description for Class1
/// </summary>
public class info
{
    public ObjectId _id { get; set; }
    public string name { get; set; }
    public string id1 { get; set; }
}

In this class, public ObjectId _id { get; set; } is common to every document: each document in a collection carries this mandatory field to identify it uniquely.

Other alternatives for .NET developers: Many NoSQL databases are available to the developer community, each with its pros and cons, but few of them have actually touched the industry benchmark in performance and features. RavenDB, CouchDB, RaptorDB and Cassandra are very popular database servers in this specific domain. One unique feature found in RavenDB is its "embedded server" concept: in scenarios where the remote server does not support any NoSQL database server, RavenDB can be embedded inside the web application itself and run as part of the application. I hope readers can use the information provided in this article to explore further and effectively use this technology for BIGDATA management.


Development of light-weight, highly scalable web apps using Node.js

A.K.Hota, [email protected] D.Madan Prabhu, [email protected]

Today, the Internet is not just a place for consuming information; it has become a platform for communication and real-time interaction between users. As more and more people start using the Internet, servers are flooded with their requests and busy with responses. To address the situation, servers are equipped with higher-end hardware: more cores/processors, more memory, enhanced I/O devices and so on. But we are curious to know whether these arrangements are enough to handle the ever-increasing volume of active real-time communication on traditional web development platforms such as ASP.NET, Java, Python or Ruby. NodeJS is a new platform developed to address all of the issues summarised above, with less and more concise coding, for light-weight, highly scalable, I/O non-blocking web app development.

Node.js is a platform built on Chrome's V8 JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices. Before digging into Node, we should see how Node differs from traditional web platforms in handling request/response cycles. In traditional web-serving techniques, each request/connection spawns a new thread, eating some RAM, and the server may context-switch while serving many requests. Node.js, interestingly, works on a single-threaded model instead of the traditional multi-threaded one.
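A tiny sketch of what that non-blocking style looks like in practice (the file path is only an example):

var fs = require('fs');

// The read is handed off to the event loop; execution continues immediately
fs.readFile('/etc/hosts', 'utf8', function (err, data) {
    if (err) throw err;
    console.log('file length: ' + data.length);
});

console.log('this line runs before the file callback fires');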


A quick calculation: assuming that each thread potentially has an accompanying 2 MB of memory, running on a system with 8 GB of RAM puts us at a theoretical maximum of about 4,000 concurrent connections, plus the cost of context-switching between threads. That is the scenario you typically deal with in traditional web-serving techniques. By avoiding all that, Node.js achieves scalability levels of over 1M concurrent connections (as a proof of concept). Now we are curious to know how JavaScript, popularly known as a 'toy language', came to be chosen as the programming language of the Node platform. Over the past few years, JS has gone through different phases, from client-side data validation to XHR and JS libraries. In addition to JavaScript being more widely deployed on the web than any other language, the following facets build the concrete foundation for JS as a favourite server-side language:

• Asynchronous – JavaScript is naturally asynchronous, with an event model well suited to building highly scalable web applications through callbacks.
• Low learning curve – A huge base of developers is already familiar with both JavaScript and asynchronous programming from years of developing JavaScript in web browsers.
• Lightning-fast script engine – Huge advances in execution speed have made it practical to write server-side software entirely in JavaScript.
• Code sharing – Developers can write web applications in one language, which reduces the "context switch" between client and server development and allows code sharing between client and server.
• Code transformation – JavaScript is a compilation target, and a number of languages compile to it already.
• Support for NoSQL – JavaScript is the language used in various NoSQL databases (e.g. CouchDB/MongoDB), so interfacing with them is a natural fit.
• JSON – A very popular data-interchange format today, and it is native JavaScript.

Node.js Architecture: The Node.js platform organises its internals into three layers. The base layer contains all of the core components; the middle layer acts as middleware, establishing communication from the lower layer to the top one; and the top layer consists of the JavaScript APIs for programming. The V8 JavaScript Engine is an open-source JavaScript engine developed by Google for the Google Chrome web browser. It compiles JavaScript to native machine code before executing it, instead of more traditional techniques such as executing bytecode or interpreting it. The compiled code is additionally optimised (and re-optimised) dynamically at runtime, based on heuristics of the code's execution profile.


libev – implements the event loop and abstracts the underlying platform-specific technology used (such as select, epoll, etc.).
libeio – a full-featured asynchronous I/O library that uses a thread pool to execute blocking calls in the background.
c-ares – a library for asynchronous DNS requests, implementing a non-blocking DNS resolution.
http_parser – a parser for HTTP messages (both requests and responses), written in C for high-performance HTTP applications. It makes no syscalls or allocations, does not buffer data, and can be interrupted at any time.

NPM: Node Package Manager
The Node Package Manager (npm) is a utility bundled with Node.js that offers a set of publicly available, reusable components, installed easily from an online repository with version and dependency management. A full list of packaged modules can be found on the NPM website, https://npmjs.org/, or accessed using the NPM CLI tool. The module ecosystem is open to all, and anyone can publish their own module to be listed in the NPM repository. Some of the most popular NPM modules today are:

express – Express.js, a Sinatra-inspired web development framework for Node.js, and the de-facto standard for the majority of Node.js applications out there today.
connect – an extensible HTTP server framework for Node.js, providing a collection of high-performance "plug-ins" known as middleware; serves as the base foundation for Express.
socket.io & sockjs – server-side components of the two most common websocket frameworks out there today.
jade – one of the popular templating engines, inspired by HAML; the default in Express.js.
mongo & mongojs – MongoDB wrappers providing an API for MongoDB object databases in Node.js.
redis – Redis client library.
coffeescript – the CoffeeScript compiler, allowing developers to write their Node.js programs in CoffeeScript.
multi-node – spawns child processes sharing listeners.
forever – probably the most common utility for ensuring that a given node script runs continuously; keeps your Node.js process up in production in the face of any unexpected failures.
step – used to implement synchronisation logic in our apps.
node-inspector – a visual debugger for Node.js.
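As a taste of the ecosystem, here is a minimal sketch of a server built with the express module listed above, assuming it has been installed with npm install express (Express 3.x API, current at the time of writing):

var express = require('express');
var app = express();

// Respond to GET / with a plain greeting
app.get('/', function (req, res) {
    res.send('Hello from Express');
});

app.listen(3000);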


With any new technology, plain reading alone doesn't keep you on track; we move to the next level by coding something and seeing it work in a real application. Before trying out the programming, we need to install Node.js. Instead of repeating the process here, please follow the official installation instructions, and come back once you are up and running.

Hello World HTTP Server
In Node, the application and the server that hosts it are the same. That may seem odd compared with other platforms, where, say, PHP is the scripting language and Apache serves as the web server; in exchange, Node makes it easy to create different types of servers yourself. Here is an example of an HTTP server that simply responds to any request with 'Hello World':

var http = require('http');

http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
}).listen(3000);

console.log('Server running at http://localhost:3000/');

helloWorld.js

To create the HTTP server we include the core http module using the require function. Next we create the server instance by calling the createServer function with an anonymous function, which in turn creates the HTTP server. The anonymous function, a.k.a. the callback, wraps the response to be produced once a request is received. The really interesting (and, if your background is a more conservative language like PHP, odd-looking) part is the function definition sitting right where you would expect the first parameter of the createServer() call. It turns out this function definition IS the first (and only) parameter we are giving to the createServer() call, because in JavaScript functions can be passed around like any other value. This kind of coding makes asynchronous programming possible; such functions are known as callbacks. Thanks to this, the server is not blocked by the current request's I/O and proceeds with the rest of the response flow. It is much the same asynchronous behaviour that JavaScript achieves in web browsers, where we request data from the server through XHR (AJAX) while user events such as clicks and mouse moves are still handled by the single-threaded browser engine. To see the output, give the following command at your command prompt:

node helloWorld.js


In the browser, simply type localhost:3000 and you will get Hello World as output. The same asynchrony can be handled another way in Node: event emitters and listeners.

var http = require('http');
var server = http.createServer();

server.on('request', function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
});

server.listen(3000);
console.log('Server running at http://localhost:3000/');

Setting up an event listener for 'request'

Event-driven asynchronous server-side JavaScript with callbacks in action :-) Observing this coding style, we may well ask how to handle real applications consisting of many lines of code. This is addressed by the module system. In Node.js, a server.js file may contain the server-specific functions, while index.js contains the module references and the entire work-flow; each required module mentioned in index.js lives in its own separate file. In this way, dependencies are injected loosely at execution time. Have a look at the following program, which follows that pattern to handle modularity. Let's first extend our server's start() function so that we can pass in, as a parameter, the route function to be used:

var http = require("http");
var url = require("url");

function start(route) {
    function onRequest(request, response) {
        var pathname = url.parse(request.url).pathname;
        console.log("Request for " + pathname + " received.");
        route(pathname);
        response.writeHead(200, {"Content-Type": "text/plain"});
        response.write("Hello World");
        response.end();
    }
    http.createServer(onRequest).listen(8888);
    console.log("Server has started.");
}

exports.start = start;

server.js


The following defines the route function and exports it for callers that require it:

function route(pathname) {
    console.log("About to route a request for " + pathname);
}

exports.route = route;

router.js

And let's extend our index.js accordingly, injecting the router's route function into the server:

var server = require("./server");
var router = require("./router");

server.start(router.route);

index.js

Again we are passing a function around, which by now isn't news to us. If we start our application now (node index.js, as always) and request a URL, the application's output shows that our HTTP server makes use of our router and passes it the requested pathname. Through this modularity we can design a real application by keeping specific functionality in separate files, injected into the application dynamically at execution as required.
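To push the pattern one step further, route() itself can dispatch to per-path handler functions; a sketch (the paths and handlers here are invented for illustration):

// router.js, extended: dispatch to a handler chosen by pathname
var handlers = {
    '/start':  function () { console.log("Handler for /start"); },
    '/upload': function () { console.log("Handler for /upload"); }
};

function route(pathname) {
    if (typeof handlers[pathname] === 'function') {
        handlers[pathname]();
    } else {
        console.log("No request handler found for " + pathname);
    }
}

exports.route = route;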

Node.js vs PHP benchmarking:
My testing box:
• Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz (4 cores)
• 4 GB DDR3 RAM
• Linux Mint 14 Nadia
• Node.js v0.10.22
• Apache 2.2.22
• PHP 5.4.6


My Scripts:

var http = require("http");

http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/html'});
    // Build one HTML page containing 10,000 "Hello world" paragraphs
    var buffer = '<html>\n<head><title>Speed test</title></head>\n<body>\n';
    for (var i = 0; i < 10000; i++) {
        buffer += '<p>Hello world</p>\n';
    }
    buffer += '</body>\n</html>\n';
    res.end(buffer);
}).listen(8080);

test.js

test.php – the equivalent PHP script, producing the same 10,000-paragraph page through Apache

To learn how our application might perform at peak load, we tested the scripts with 100 simultaneous active connections over 10 seconds using the siege tool (an HTTP/HTTPS regression-testing and benchmarking utility). The resulting performance reports show how efficiently Node.js handles request/response cycles under high load.
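The siege invocation for such a run would look roughly like this (the URL is assumed to be the local Node test server):

siege -c100 -t10S http://localhost:8080/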

How to use MongoDB with Node.js: In order to use MongoDB, we need to install a MongoDB package; here we use the mongojs package. It can easily be installed by giving the command 'npm install mongojs' in the terminal.


var databaseUrl = "localhost:27017/test";
var collections = ["test"];
var db = require("mongojs").connect(databaseUrl, collections);
var http = require("http");

http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/html'});
    var names = "";
    // Query every document in the 'test' collection
    db.test.find({}, function (err, users) {
        users.forEach(function (tUser) {
            names += tUser.name + '<br>';
        });
        res.write(names);
        res.end('-----------List of Users-----------');
    });
}).listen(3000);

console.log('Server running at http://localhost:3000/');

app.js

In this test we are connected to the test DB by default. After connecting to the test DB, we select the 'test' collection and query all of its objects. Once all the objects have been retrieved, we append their 'name' attributes to a string variable, and at the end of the callback we write the resulting string into the response stream. It is important to note that we are not writing into the stream directly as we query the DB: DB querying is a blocking, high-latency process, and each res.write() is comparatively expensive, since the http object is not JavaScript but C++, so every call crosses CPU contexts to execute.
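mongojs mirrors the MongoDB shell API, so writing is just as direct; a sketch of adding a user to the same collection (the document fields are illustrative):

db.test.save({ name: 'Niladri', id1: '5962' }, function (err, saved) {
    if (err || !saved) console.log('User not saved');
    else console.log('User saved');
});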

Where does Node.js fit: Node is best suited to data-intensive real-time (DIRT) applications. Since Node itself is very light on I/O, it is good at shuffling/proxying data from one pipe to another, at data streaming, push notification and more. It allows a server to hold open many connections while handling many requests and keeping a small memory footprint; it is designed to be responsive, like the browser. Node's single-threaded event model does not fit heavy computation: a CPU-intensive application will block Node's responsiveness on the current connection while the rest of the connections are queued to be served later. Real-time applications are the best use cases for Node. The following are a few areas where we can utilise Node's capabilities:

• Real-time communication systems such as chat, mail, quick SMS and team-collaboration applications
• Data streaming and proxying
• Train ticket booking systems
• Stock brokerage systems
• System and application monitoring dashboards
• Active real-time dashboards over several web services/REST
• Enabling real-time communication with current web viewers for further discussion of a service; may be used in sites like dial.gov.in
• Result publication sites using a redis/memcache (NoSQL) database as the backend


Challenges
When creating a Node application, with its fundamentally asynchronous style of coding, you have to pay close attention to how your application flows and keep a watchful eye on application state: the condition of the event loop, application variables, any other resources that change as program logic executes, and the exceptions it may throw. Because of the asynchronous, non-blocking I/O based on callbacks, we get no overview of where an error occurred; in addition, an unhandled exception bubbles up to the core Node.js event loop, which causes the Node.js instance to terminate (effectively crashing the program). Handling this multiple-callback style of coding for the first time pushes developers into real confusion and awkwardness on big projects. That is the price we pay for trying new technologies.
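The conventional defence is Node's error-first callback pattern, in which errors arrive as the first callback argument instead of being thrown; a minimal sketch (the file name is only an example):

var fs = require('fs');

fs.readFile('missing.txt', function (err, data) {
    if (err) {
        // Handled here, instead of crashing the event loop
        console.error('read failed: ' + err.message);
        return;
    }
    console.log(data.toString());
});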

Conclusion
Node.js is not a silver-bullet platform that will dominate the web development world. Instead, it is a platform that fills a particular need. Node.js was never created to solve the compute-scaling problem; it was created to solve the I/O-scaling problem, which it does really well. So give it some thought: if your use case contains no CPU-intensive operations and accesses no blocking resources, you can exploit the benefits of Node.js and enjoy fast, scalable network applications. Welcome to the real-time web.


PERCEPTRON LEARNING
The 'ABC' of Machine Learning

By Niladri Bihari Mohanty [email protected] [email protected]

In this article we will learn how a neural network works, as well as its extension in the form of the artificial neural network. The objectives of artificial neural networks and Perceptron learning are covered in this issue, whereas the code implementation on various platforms (.NET, AForge.NET, Python) will be discussed in the next issue.

Quick recap of what we know from previous issues of FocusIT (FocusIT, July 2013):
1. Algorithms to solve any real-life problem are basically of two types: (a) deterministic (formula-based) and (b) probabilistic (experience- and self-learning-based).
2. Probabilistic algorithms are mostly (though not all) inspired by biology and biological evolution.
3. Ant colony optimisation (inspired by ants), genetic algorithms (inspired by genes), artificial neural networks (inspired by neurons) and swarm optimisation (inspired by small entities) are a few of the bio-inspired probabilistic algorithms.
4. The artificial neural network is a very powerful probabilistic model, and many machine-learning algorithms build on it.

An Artificial Neural Network (ANN) is an information-processing paradigm inspired by the way biological nervous systems, such as the brain, process information. The human brain contains about 10 billion nerve cells, or neurons; on average, each neuron is connected to other neurons through about 10,000 synapses. Those synapses evolve as links between neurons as the learning process continues. Because of this nature, an ANN does not need to be programmed to perform tasks the way other computer programs do, and it helps us design evolutionary robots capable of programming themselves through experience. The brain's network of neurons forms a massively parallel information-processing system. This contrasts with conventional computers, in which a single processor executes a single series of instructions. As a discipline of bio-inspired computing under Artificial Intelligence, neural networks attempt to bring computers a little closer to the brain's capabilities by imitating certain aspects of information processing in the brain, in a highly simplified way.

An artificial neuron is a computational model inspired by natural neurons. Natural neurons receive signals through synapses located on the dendrites or membrane of the neuron. When the signals received are strong enough (surpassing a certain threshold), the neuron is activated and emits a signal through the axon. This signal might be sent to another synapse, and might activate other neurons. The complexity of real neurons is highly abstracted when modelling artificial neurons: these basically consist of inputs (like synapses), which are multiplied by weights (the strength of the respective signals) and then computed by a mathematical function which determines the activation of the neuron. Another function (which may be the identity) computes the output of the artificial neuron (sometimes as a function of a certain threshold). ANNs rely on the collective behaviour of interconnected neurons to process information. Even though a neuron's processing speed is low compared with current transistors, the system succeeds in computation because of the extreme parallelism of its huge number of basic units, the neurons.


Perceptron Learning in an Artificial Neural Network

[Figure: model of artificial neuron K. Inputs X1, X2, X3, ... Xn arrive with weights W1, W2, ... Wn; the bias W0 enters on a fixed input of -1; the weighted sum Yk passes through the activation threshold function F(y) to give the output Vk.]
Understanding the entire biological neural system is very complex, so only a tiny cross-section of it is simulated in an ANN. Just as biological neurons are interconnected and send and receive biological signals, with an adjustable weight value associated with each connection, the ANN follows the same pattern. X1, X2, X3, ... Xn are the input signals from the respective neurons to neuron K, which is the centre of our study, while W1, W2, ... Wn are the weights associated with the respective input signals. W0 is the bias, used mainly to regulate the overall input signal (uniformly amplifying or reducing it) for mathematical convenience; this concept of a bias value has no counterpart in the biological neural system. Yk is the output of neuron K, fed into the activation threshold function to get the crisp final output Vk:

Yk = (X1·W1 + X2·W2 + X3·W3 + ... + Xn·Wn) + W0    ...(1)
Vk = F(Yk)    ...(2)

In this model the inputs are Xi and the output is Vk. The model also has adjustable weight variables Wi and a bias W0.

Objective of an Artificial Neural Network: develop a model which can provide the most probable output for a given set of input values. For this, the ANN must first choose the most appropriate weight values Wn and bias value W0.

Training module:
• Known input data
• Known desired output data
• Carefully chosen bias value and threshold function
• Unknown weight variables

Final module:
• Known input data
• Known weight variables
• Known bias value and threshold function
• Unknown desired output data
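The article stops short of stating the update step used during training. For completeness, the classical perceptron rule, written with the symbols above (learning rate $\eta$, desired output $d$) and treating the bias as a weight on a constant input as in equation (1), adjusts each weight in proportion to the error:

$$W_i \leftarrow W_i + \eta\,(d - V_k)\,X_i, \qquad W_0 \leftarrow W_0 + \eta\,(d - V_k)$$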


Formula-based (deterministic) algorithms work on the basis of a particular formula: every output parameter is a function of the input parameters. But what if the input parameters have no fixed, rigid, formula-based relation to the output? Then the output has to be guessed from previous experience of given inputs, and the most probable output has to be formulated. For example: if the import duty is X1, the oil price is X2 and the population is X3, what should the size of the vehicle market, taken as Vk, be? Here there is no fixed relationship between the Xs and V, but from previous experience a training dataset (input values and the output values corresponding to them, gained from experience) can be fed to the ANN model, from which the weight values W and the other adjustable variables can be figured out by an iterative learning process until the desired output is matched as closely as possible. Once the adjustable weight values have been found for each input and the ANN model is trained, it will try to give the most accurate output/prediction for any given set of inputs on the basis of the knowledge it gathered during training. There are various training methods used in ANNs, but the most fundamental and basic learning model is the Perceptron learning model.


The training loop, as a flowchart:

1. Start.
2. Provide sample data (input data and the respective desired output data).
3. Set the weights, bias and threshold randomly.
4. Process the inputs and get the output.
5. Find error = (desired output - actual output).
6. If the error is not acceptable, adjust the weights, bias and threshold function, and go back to step 4.
7. If the error is acceptable, save the weights and the other adjustable variables for further forecasting.
8. End.
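A minimal sketch of this loop for a single neuron with a step threshold, in JavaScript (illustrative only; the .NET and Python implementations are promised for the next issue):

// data: array of {x: [inputs], d: desired output (0 or 1)}
function trainPerceptron(data, eta, epochs) {
    var n = data[0].x.length, w = [], w0 = 0, i;
    for (i = 0; i < n; i++) w[i] = 0;              // weights W1..Wn
    for (var e = 0; e < epochs; e++) {
        data.forEach(function (s) {
            var y = w0;                            // Yk = sum(Xi*Wi) + W0
            for (i = 0; i < n; i++) y += s.x[i] * w[i];
            var v = y >= 0 ? 1 : 0;                // Vk = F(Yk), step threshold
            var err = s.d - v;                     // desired minus actual output
            for (i = 0; i < n; i++) w[i] += eta * err * s.x[i];
            w0 += eta * err;
        });
    }
    return { w: w, w0: w0 };
}

// Example: learn the linearly separable AND function
console.log(trainPerceptron(
    [{x:[0,0],d:0}, {x:[0,1],d:0}, {x:[1,0],d:0}, {x:[1,1],d:1}], 0.1, 25));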

In this article we have given you a basic idea of how to implement a simple self-learning model using an artificial neural network, but there is much more to learn beyond this starting point: as we have said, Perceptron learning is the "ABC" of machine learning, and it can solve only linearly separable data (data that can be separated by a single line in 2D). Our goal is to arm you with some core revolutionary technologies and to link them with industry requirements. These AI techniques are basic and easy, yet they have never been well implemented in e-Governance applications, which consequently suffer from a lack of intelligence. We will show how to program the same Perceptron self-learning mechanism using .NET and Python in the coming December issue, along with the implementation of the same technique in an e-Governance application.


CYBER SECURITY NEWS AND EVENTS ISSUE 5, NOVEMBER 2013

A controversial US surveillance programme that sweeps up internet usage data had 700 snooping servers installed at 150 locations around the world, including one in India, according to a report. The XKeyscore programme, run by the National Security Agency (NSA), allowed analysts to search vast databases containing the emails, online chats and browsing histories of millions of individuals, the Guardian reported, citing documents provided by whistleblower Edward Snowden.

'India has third highest infection rate of ZeroAccess botnet'
New Delhi: ZeroAccess, one of the largest known botnets in existence, infected more than 1.9 million computers globally on a given day in August this year, with India having the third highest infection rate after the US and Japan, cyber security firm Symantec has said. Cyber criminals distribute malicious software, or malware, that can turn a computer into a bot (also known as a zombie or robot), making the computer perform automated tasks over the Internet without its owner's knowledge. Criminals typically use bots to infect large numbers of computers; these computers form a network, or botnet, used to send spam email messages, spread viruses, attack computers and servers, and commit other kinds of fraud. ZeroAccess is a sophisticated and resilient botnet that has been active since 2011, with upwards of 1.9 million infected computers on a given day as observed in August 2013, Symantec said. "While 35 per cent of the infections were observed in the US, India had the third highest infection rate globally, just behind the US and Japan. Nearly six per cent of ZeroAccess infections were observed in India," it added. Symantec took the first step in combating the ZeroAccess botnet by sinkholing more than half a million bots, making a serious dent in the number of bots under the control of cyber criminals, the firm said. The company is working with ISPs and CERTs worldwide to share information and help get ZeroAccess-infected computers cleaned up, it added. "ZeroAccess leverages click-fraud and Bitcoin mining to carry out two revenue-generating activities, potentially earning tens of millions of US dollars per year in the process," Symantec said. A click-fraud Trojan downloads online ads onto the infected computer and generates artificial clicks on the ads as if they were generated by legitimate users, it added. Bitcoin is a type of digital currency that holds a number of attractions for cybercriminals: each bitcoin comes into existence through mathematical operations known as 'mining' carried out on computing hardware, which yields direct value to the botmaster at a cost to the unsuspecting victims.

HP unveils cyber security solutions for enterprises in India
Technology major Hewlett-Packard Thursday announced new cyber security solutions for enterprises in India to enable firms to upgrade their security infrastructure against a growing and sophisticated threat landscape. The company said cyber intrusions have become more advanced as adversaries form a sophisticated and collaborative marketplace through which they share information and advanced data-theft tools. Quoting studies, HP said 92 percent of Forbes Global 2000 companies reported data breaches in the last 12 months. Studies also estimate that the global cybercrime black market has a value of USD 104 billion per year. Besides, with concepts like bring-your-own-device (BYOD), IT no longer controls the endpoint, offering the adversary many more control points to attack. "Enterprises today aren't facing a single attacker. They are fighting a well-organised, well-funded adversary marketplace," HP India Country Manager (Enterprise Security Products) Ranndeep Singh Chonker said. HP's Threat Central, which the firm claims is the industry's first community-sourced security intelligence platform, facilitates automated, real-time collaboration among firms in the battle against active cyber threats, he added. "HP TippingPoint Next-Generation Firewall addresses risks introduced by cloud, mobile and BYOD by delivering easy-to-use, reliable, high-performance security effectiveness with granular application visibility and control," HP India Director (Infrastructure Technology Outsourcing Portfolio, Enterprise Services) Susanta Bhattacharya said. Similarly, HP ArcSight and HP Fortify offer data-driven security technologies that empower security operations teams to run more effectively with accelerated, real-time application-level threat detection, he added. HP Managed Security Services (MSS) help internal security teams accelerate threat identification, response and remediation by providing expertise and advanced security intelligence, HP ESP (South Asia) Chief Solutions Architect Damanjit Uberoi said. With HP MSS, intrusions are detected within 11.8 minutes, and 92 percent of major incidents are resolved within two hours of identification, he added.


India, Japan to collaborate in ICT
India and Japan have decided to set up a working group to identify specific areas in information and communication technology (ICT) where they can collaborate, such as cyber security. Masahiro Yoshizaki, vice minister for policy coordination at Japan's ministry of internal affairs and communications, and Anil Kaushal, member of the Telecom Commission of India, signed a joint statement Thursday which said the thrust area would be the development of technology and standards. "We wish to cover many topics in ICT under this partnership. The working group will implement the intent of the joint statement," Kondo Masanori, Director for International Cooperation Affairs at Japan's Ministry of Internal Affairs and Communications, told IANS in an interview. Masanori said each country has its own expertise and they are looking at a mutual collaboration. "Basically we have identified three key areas of specific work: cyber security, disaster management and capacity building," added Kaushal. Masanori said Japan would look forward to Indian collaboration in combating cyber attacks. "Every day there are numerous cyber attacks. We are looking forward to combating those." Talking separately to IANS, Kaushal said Japan is much ahead of India in broadband technology: "We have much to learn from them." There are over 50 Japanese companies in India in the ICT field, including such renowned names as Fujitsu India, Olympus (India), Ricoh India, Panasonic Industrial Asia, Sony India and Sumitomo Electric Industries. There are also many Indian companies operating in Japan, mostly in software development. Both countries are keen on private-sector collaboration as well, he said. Talking about the importance of ICT in disaster management, Masanori said India has evinced interest in learning disaster mitigation through ICT from Japan: "The Indian government is keen to adopt our disaster mitigation strategy through ICT. We are also ready to cooperate and share." Japan, which is very prone to earthquakes, manages to mitigate natural disasters to an extent using ICT: it collects data through sensor-linked functions, performs data analysis and sends alerts out to people. "Data dissemination is very important. We have to build the entire eco-system." Kaushal said it is important for India to learn disaster management, especially after the Uttarakhand floods in which thousands died. The two countries will also look forward to mutually beneficial collaboration on international platforms such as the International Telecommunication Union and the Asia-Pacific Telecommunity.


Now a cyber security device to protect women from violence
Give chilli powder and pepper sprays a break: the Amrita Centre for Cyber Security has developed a new device to protect women from sexual offenders and other forms of violence. The Amrita Personal Safety System (APSS) is an inconspicuous, wearable and easy-to-operate electronic device that will help girls and women trigger communication with family and police when in distress, Krishnashree Achuthan, Director of the Kollam-based Amrita University Centre for Cyber Security Systems & Networks, said in a release. "The device will remain invisible to the offender and yet can easily be triggered by its user, with multiple options to ensure stealthy and secure communication," she said. The device also sends automated information about the victim to the nearest police station, hospitals and fire stations so that she can get immediate help, she said. The Centre is making efforts to ensure that the device can be securely lodged in an earring or a ring, she added. "We are designing it so that it can be affordable. The device will soon be equipped with technology that can videotape events," she said. A distinct feature of the APSS is that it can function even in rural areas where communication speeds are minimal; it works indoors and outdoors with minimal power consumption. The device integrates more than 15 features pertaining to women's safety and security. "Significantly, it can also be used as a safety device for mentally challenged people, with many functionalities to quickly identify their whereabouts." The APSS prototype will be formally unveiled during 'Amritavarsham60', the 60th birthday celebrations of the spiritual and humanitarian leader Mata Amritanandamayi Devi, from September 26-27. She is also the Chancellor of Amrita University.

MACHINE INTELLIGENCE NEWS AND EVENTS ISSUE: 5, NOVEMBER 2013

Automatic Speaker Tracking in Audio Recordings
To date, the best diarization systems have used what is called supervised machine learning: they are trained on sample recordings that a human has indexed, indicating which speaker enters when. In the October issue of IEEE Transactions on Audio, Speech, and Language Processing, however, MIT researchers describe a new speaker-diarization system that achieves comparable results without supervision: no prior indexing is necessary. Moreover, one of the MIT researchers' innovations is a new, compact way to represent the differences between individual speakers' voices, which could be of use in other spoken-language computational tasks. "You can know something about the identity of a person from the sound of their voice, so this technology is keying in to that type of information," says Jim Glass, a senior research scientist at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and head of its Spoken Language Systems Group. "In fact, this technology could work in any language. It's insensitive to that." To create a sonic portrait of a single speaker, Glass explains, a computer system will generally have to analyze more than 2,000 different acoustic features; many of those may correspond to familiar consonants and vowels, but many may not. To characterize each of those features, the system might need about 60 variables, describing properties such as the strength of the acoustic signal in different frequency bands.

E pluribus tres
The result is that for every second of a recording, a diarization system would have to search a space with 120,000 dimensions, which would be prohibitively time-consuming. In prior work, Najim Dehak, a research scientist in the Spoken Language Systems Group and one of the new paper's co-authors, had demonstrated a technique, dubbed the i-vector, for reducing the number of variables required to describe the acoustic signature of a particular speaker. To get a sense of how the technique works, imagine a graph that plots, say, hours worked by an hourly worker against money earned. The graph would be a diagonal line in a two-dimensional space. Now imagine rotating the axes of the graph so that the x-axis is parallel to the line. All of a sudden the y-axis becomes irrelevant: all the variation in the graph is captured by the x-axis alone. Similarly, i-vectors find new axes for describing the information that characterizes speech sounds in the 120,000-dimension space. The technique first finds the axis that captures the most variation in the information, then the axis that captures the next-most variation, and so on, so the information added by each new axis steadily decreases. Stephen Shum, a graduate student in MIT's Department of Electrical Engineering and Computer Science and lead author on the new paper, found that a 100-variable i-vector, a 100-dimension approximation of the 120,000-dimension space, was an adequate starting point for a diarization system. Since i-vectors are intended to describe every possible combination of sounds a speaker might emit over any span of time, and since a diarization system needs to classify only the sounds on a single recording, Shum was able to use similar techniques to reduce the number of variables even further, to only three.

Birds of a feather
For every second of sound in a recording, Shum thus ends up with a single point in a three-dimensional space. The next step is to identify the bounds of the clusters of points that correspond to individual speakers. For that, Shum used an iterative process. The system begins with an artificially high estimate of the number of speakers, say 15, and finds a cluster of points corresponding to each one. Clusters that are very close to each other then coalesce to form new clusters, until the distances between them grow too large to be plausibly bridged. The process then repeats, beginning each time with the number of clusters it ended with on the previous iteration. Finally, it reaches a point where it begins and ends with the same number of clusters, and the system associates each cluster with a single speaker. "What was completely not obvious, what was surprising, was that this i-vector representation could be used on this very, very different scale, that you could use this method of extracting features on very, very short speech segments, perhaps one second long, corresponding to a speaker turn in a telephone conversation," Kenny adds. "I think that was the significant contribution of Stephen's work."

Engineers Invent Programming Language to Build Synthetic DNA
A team led by the University of Washington has developed a programming language for chemistry that it hopes will streamline efforts to design networks that can guide the behavior of chemical-reaction mixtures, in the same way that embedded electronic controllers guide cars, robots and other devices. In medicine, such networks could serve as "smart" drug deliverers or disease detectors at the cellular level. The findings were published online this week (Sept. 29) in Nature Nanotechnology. Chemists and educators teach and use chemical reaction networks, a century-old language of equations that describes how mixtures of chemicals behave. The UW engineers take this language a step further and use it to write programs that direct the movement of tailor-made molecules. "We start from an abstract, mathematical description of a chemical system, and then use DNA to build the molecules that realize the desired dynamics," said corresponding author Georg Seelig, a UW assistant professor of electrical engineering and of computer science and engineering. "The vision is that eventually you can use this technology to build general-purpose tools." Currently, when a biologist or chemist makes a certain type of molecular network, the engineering process is complex, cumbersome and hard to repurpose for building other systems. The UW engineers wanted to create a framework that gives scientists more flexibility. Seelig likens this new approach to programming languages that tell a computer what to do. "I think this is appealing because it allows you to solve more than one problem," Seelig said. "If you want a computer to do something else, you just reprogram it. This project is very similar in that we can tell chemistry what to do." Humans and other organisms already have complex networks of nano-sized molecules that help regulate cells and keep the body in check. Scientists now are finding ways to design synthetic systems that behave like biological ones, with the hope that synthetic molecules could support the body's natural functions. To that end, a system is needed to create synthetic DNA molecules that vary according to their specific functions. The new approach isn't ready to be applied in the medical field, but future uses could include using this framework to make molecules that self-assemble within cells and serve as "smart" sensors. These could be embedded in a cell, then programmed to detect abnormalities and respond as needed, perhaps by delivering drugs directly to those cells.
