The world we live in is huge. Billions of people live in it, they are busy, they do countless activities, and the most important thing for me is that they do them on their computers. When that happens, terabytes of data are generated every hour. If you think I am fooling around, let's take an example: the first thing a person does after waking up in the morning is not brushing his/her teeth or reading a magazine during the early-morning restroom routine, but updating his/her Twitter status and checking who uploaded pictures of last night's party. Then s/he starts liking and commenting on those pictures, statuses and so on. Just imagine how Facebook or Twitter has to store this data, manage the log files and replicate it all for backup; it's just too much data. With this much data, problems with storage, performance, analysis and ad hoc interactive data representation started to show up.
We did it!! We proved once again that humans are the most intelligent animals on this earth. Google, Yahoo and Microsoft had already predicted that data would become massive, so they worked on technology to handle the problem. For example, Google released papers on Bigtable, the Google File System (GFS) and the Map Reduce framework. Bigtable is a huge column-oriented database behind most of Google's applications, GFS is their own file system, and Map Reduce is the framework used for most of the work they do: searching, indexing and so on. With the help of Google's papers, the open source community built technology such as the Hadoop framework, and on top of it subprojects such as HBase, Chukwa, ZooKeeper, Hive, Pig and others. Major contributions to Hadoop and its subprojects come from the Apache Software Foundation, StumbleUpon, Yahoo, Microsoft (can you believe it!!), Streamy and others. Other technology, such as Cassandra, which is based on Amazon's Dynamo, is also very popular. It is obviously very hard to fit every detail into this article, but I would like to write a series of articles about Map Reduce, Hadoop, Cassandra, HBase and Hive, and if possible ZooKeeper and Chukwa as well. For now I will provide some references to help you guys read further.
1. Map Reduce Framework
When I first tried to read this paper from Google, it felt like rocket science. I read it six or seven times without any success, so I tried watching some videos on Map Reduce. Later I began to realize that I had done this very thing in grade 5: we used to play a game of organizing all the words we wrote in our Nepali textbook, and in the end we would create a dictionary out of them. So my ambition is to save you time and, at the same time, make Map Reduce easier to understand.
Let me assume that I am Google and I have the entire web in my data centers. My data center is composed of millions of commodity machines, and among them my three favorite servers are Foss (Node 1 in fig. 1), Nepal (Node 2 in fig. 1) and Community (Node 3 in fig. 1). These three servers contain the data of the English Premier League 2009 season. Now what I would like to know is who scored the highest number of goals in EPL 2009. We will use Map Reduce to find out.
Considering the problem definition, what we need to find is the players who scored goals. So our Mapper task searches through the whole data (preloaded local input data in fig. 1) and finds the players who scored. For example, if Wayne Rooney scored a goal in the match against West Ham United (Match 1 of the season), then our program will emit the pair <Rooney, 1>. This means Rooney scored 1 goal against West Ham United. So the output of the mapping process (intermediate data from mappers) is a set of key-value pairs, where the key is the name of the player and the value is the number of goals he scored. We create such a key-value pair for every player who scored.
Note that the data for Matches 1-9 is on the Foss server, 10-25 on the Nepal server and 26-38 on the Community server. This means that if Rooney scored 3 goals in Matches 1-9 and 5 goals in Matches 10-25, we must combine data from two servers to get his goals for Matches 1-25. To do that, we write a reduce function. The reduce function goes through each player (key) and adds up the goals recorded for him. It takes the keys from all three servers and does the addition, which gives results such as <Drogba, 29>. It turns out Didier Drogba is the one who scored the most goals.
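To make the map and reduce steps concrete, here is a minimal single-machine sketch in Python. It is only an illustration of the idea, not how Google or Hadoop actually implement it: the three "servers" are plain lists, the goal events are made up (they are not real EPL statistics), and the names `SERVER_DATA`, `map_goals`, `shuffle`, `reduce_goals` and `top_scorer` are my own hypothetical helpers.

```python
from collections import defaultdict

# Made-up goal events (match number, scorer) held by each server.
# Illustrative only; these are NOT real EPL 2009 statistics.
SERVER_DATA = {
    "Foss": [(1, "Rooney"), (1, "Drogba"), (5, "Drogba")],
    "Nepal": [(12, "Rooney"), (14, "Drogba"), (20, "Torres")],
    "Community": [(30, "Drogba"), (33, "Rooney"), (38, "Torres")],
}


def map_goals(events):
    """Mapper: emit a <player, 1> pair for every goal in the local data."""
    for _match, scorer in events:
        yield scorer, 1


def shuffle(mapped_outputs):
    """Group the intermediate pairs from all servers by key (player)."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for player, count in pairs:
            grouped[player].append(count)
    return grouped


def reduce_goals(player, counts):
    """Reducer: add up all the goals recorded for one player."""
    return player, sum(counts)


def top_scorer(server_data):
    """Run map on each server's data, then reduce per player, then pick the max."""
    mapped = [map_goals(events) for events in server_data.values()]
    totals = dict(reduce_goals(p, c) for p, c in shuffle(mapped).items())
    best = max(totals, key=totals.get)
    return best, totals


best, totals = top_scorer(SERVER_DATA)
print(best, totals[best])
```

In a real cluster each mapper would run on the server that holds the data, and the grouping step would move pairs across the network to the reducers; here everything runs in one process just to show the shape of the computation.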
This is how Map Reduce works. I am not sure how much you guys understood, but I am sure you have pretty good knowledge of the EPL now. Please do give me your feedback, and for help email me at akashakya at gmail dot com. I will catch up in the next issue of init magazine, where I will write about Hadoop.
With world markets thought to be drowning, there seems to be a shift of balance in computing power. China has emerged as the owner of the new fastest computer in the world, called “Tianhe-1A”.
I was in a rush to complete a web application project which I had completely messed up. Spring Framework 2.5 sucks. It was already 5:30 and Stallman might arrive any second. I had to make a choice: either run down to Clancy or stay at K17. Not my mind but my instinct took me to Clancy, and I didn’t even bother about losing 15 marks. I already had an RSVP, so I found my name, got the banner, fixed my camera and nervously waited on the couch. A few friends came in, and we went into the giant auditorium at Clancy.
Even if you know Stallman, have read about him and seen his pictures or videos, when he first appears in front of you, you will say “Oh fuck!! He looks like the one in Pashupati..”. The same thing happened on that stage: nobody even noticed him coming in and taking the podium. Even the MC was amazed. He started with a glass of water and his medicine, and criticized the taxi driver for overcharging him. And there he went: “What is Free Software?” And I was like, “What the fuck? Am I going to spend my next 3 hours listening to what I have been learning for years and years…” He talked about all the old stuff that we practise every day at FOSS Nepal Community. Sometimes I felt we were way ahead. One thing that made me proud was the level of understanding that we have of free software. But I kept myself awake; I don’t know if it was playing with my camera, the Coke or Stallman’s speech.
“Linus Torvalds is against free software, he says he wants convenience.”
“Throw VLC in your bin!!”
“Open Source is another way to promote proprietary software.”
“iBad on your cuffin”
“Slow poison got made, Windows”
“Go to GNU.org to find out more..”
Critics, critics, critics…. Stallman made me confused…
We were desperately trying to prepare a build in Hudson when a mail popped up in my browser window. It was from the Asia Open Source Software Community (AOSS). It said I was invited to speak on “Open Source Forensic Tools” at SIM University, Singapore. OMG! I didn’t even know what that meant, and I had never used any such computer forensic tools. Later I came to know that the mail was meant for a Sri Lankan guy, and that we were asked to speak on Nirvikalpa, an open source OSS win CD. We would also have a group discussion on ERP/CRM and cloud computing. God, I was so excited.
Now, if I tell you what happened at the airport and later on the airplane, I am sure all of you will be jealous. So let’s skip it; you can imagine something like a running scene in a traditional Bollywood movie. Let me take you directly to the Singapore airport metro station. You might wonder how a guy who has never been out of Nepal knows about the metro. You are right, we didn’t know about it; we were there searching for a taxi. Thank god, we met a friend of mine who had come to pick us up. We got a cab and paid S$ 18.60 to reach our hotel. And as you can imagine, with every cent the meter went up, we’d convert it into Nepalese rupees. You might think what a stingy guy I am, but it’s in our blood.
There were quite a few genius guys at the workshop, but it was really hard to remember their names. My name, however, sounds Japanese, so they tended to remember me. To tell you more about them, there was this guy named red1 (redone) from Malaysia. He is a leader of the open source ERP software “ADempiere”, and he has also written a book on “Open Source ERP”. His session on open source ERP was really helpful, and I promised to invite him to Nepal and show him some old places in Kathmandu. I also had a short session with a guy from Japan who was working in the field of grid and distributed computing. He actually did his bachelor’s in astrophysics, and his research was on simulating the outward movements of the sun, the energy involved in these movements and… you see, his English was very poor, so it was really hard for me to communicate with him. I hardly understood what he said, but he looked like a real geek to me. Then on the second day of the workshop, there was a presentation from this big fat guy, who had a session on kernel security. The only thing I understood was the word “kernel”. Everything else was Chinese to me, and my eyes were on the clock every minute. By then a guy from Red Hat had arrived. He brought some Fedora live CDs with him, and the cool thing was that he had no slides. His speech was really nice and interactive, and as you can imagine, I was the one who asked him a hell of a lot of questions. However, if you ask me the best thing about the workshop, I’d say the lunch, of course. We were served Chinese, Japanese, Malaysian and some Indian food. They also served iced tea and orange juice, which were refreshing in the extremely hot weather.
There are so many exciting things I need to tell you about Singapore, but I don’t want to make this blog boring to read. However, if you go to Singapore, don’t miss Sentosa, an ultimate holiday destination. Do go to Clarke if you are a party guy, but don’t forget to follow the formal dress code (you can take everything off once you get in). There are also Orchard Road and City Hall for some shopping. Don’t dare miss the iconic Merlion by the CBD and the Singapore Flyer, and do click a picture in front of the Esplanade; it’s beautiful. I had a great time in Singapore. Why don’t you guys pack your bags and run for an ultimate holiday destination?
Hudson is a continuous integration tool used to build Java projects, especially J2EE ones. It is quite a phenomenal tool with great integration with Maven, Ant and many other handy tools. Furthermore, it is very easy to deploy and use. So if you want to learn more about Hudson and use it, please visit the project's website.
Life is so easy when you have Hudson with you, because it does everything for you. If you are a Java programmer and you love web applications, then Hudson is just around the corner to save your time and your mood, because it has a nice interface.
I am just back from a meeting with a lady from Germany at Java, Thamel, here in Kathmandu, and the first question she asked me was, “What do you do, Akash?” I was not so sure about it, so I told her I work. Then she was curious to know more about my own work, but I was reluctant, so I told her I use Facebook and Twitter a lot, and that’s how I work. However, maybe she was from the 2000 generation, so she asked me if I use IM. I told her I did, and also asked her if she had a Skype account. She said she had one, and asked me what I do on Skype. “I talk to my girlfriend!!!” That’s what I told her, though.
Now I am back home analysing my conversation with her.. and the truth is all I do these days is the things I mentioned. I work, I use fb, Twitter and IM, and I talk on Skype… every single day, from the moment I wake up till I sleep, my list of activities is what is written above. Holy crap!! What a so-called exciting life bound by technology… So I am thinking of buying a smartphone to encapsulate all these activities and personalize it… how does that sound??? And when I meet her again, I will tell her.. I use
When we were growing up, every kid talked about Disney, but I was a different kid, because I never watched cartoons. So when I actually got interested in animation, Pixar was the one name whose work I always admired. Today I had about 3 hours of free time, and all I did was check out some of Pixar’s best work, especially the short films on YouTube. You can click here if you are interested to enjoy yourself.
Pixar is such a cool workforce. They produce things which have meaning and at the same time give you a lot of entertainment. So kids had better be watching Cars, The Incredibles and Up rather than Mickey Mouse stuff.. though you must not avoid watching Tom and Jerry.
I was trying to find a new life which would be exciting, full of fun and, most importantly, “new”. So one day I sat with my old laptop on my old couch, but with a new thought: to browse some of the newest videos on YouTube. All I found was the same old video from Russell Peters. This guy is quite cool and jokes like he is a real punk.. but he is really effective at making people laugh. So then I started looking for his videos and, believe it or not, I wasted 4 hours watching and downloading each one of them.. If you know Russell Peters, you must know his dad’s dialogue: “Somebody’s gonna hurt real bad”