Connecta’t a la Enginyeria grant for young people

April 24th, 2014

I have been asked whether I could help spread the word about the Beca Carnet Jove Connecta’t a la Enginyeria, one of the eleven Carnet Jove grants for 2014, which aim to promote young people's participation in and entry into the professional world. Of course! Engineers, men and women alike, are what this country needs most! The Beca Connecta’t a la Enginyeria offers a one-year training placement at a leading company in the sector, with a stipend of €12,000. Applications are open until May 28, 2014. You can find more information on this page. Come on, if you have the Carnet Jove, why not give it a try?

Spark Ecosystem

April 21st, 2014

In a previous post we introduced Spark, a framework that will play an important role in the Big Data area. A good starting point for understanding what Spark is can be found on this page from Databricks; however, let me reproduce an overview in this post. Spark runs on top of existing Hadoop clusters to provide enhanced and additional functionality. Although Hadoop is effective for storing vast amounts of data cheaply, the computations it enables with MapReduce are highly limited. MapReduce is only able to execute simple computations and uses a high-latency batch model. Spark provides a more general and powerful alternative to Hadoop's MapReduce, offering rich functionality such as stream processing, machine learning, and graph computations. Spark provides out-of-the-box support for deploying within an existing Hadoop [...]

Spark: Big Data Analytics Beyond Hadoop

April 20th, 2014

Hadoop is definitely the de facto standard for large-scale data processing across nearly every industry and enterprise. However, as the "Volume", "Variety" and "Velocity" of data increase, Hadoop as a batch processing framework cannot cope with the requirement for real-time analytics. As we saw in our Technology Basics for Data Scientist course, the scientific community is offering alternatives such as Storm, a framework open sourced by Twitter that provides event processing and distributed computation capabilities. Storm uses custom-created "spouts" and "bolts" to define information sources and manipulations to allow batch, distributed processing of streaming data. A Storm application is designed as a topology of interfaces which create a "stream" of transformations. It provides functionality similar to a MapReduce job, except that the topology will theoretically run indefinitely until it is manually terminated. Hortonworks, one of the [...]
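As a rough illustration of the spout-and-bolt idea mentioned in the excerpt above, here is a minimal sketch in plain Python. It does not use the Storm API at all: the names `sentence_spout`, `split_bolt`, `count_bolt` and `build_topology` are hypothetical, and generators merely stand in for Storm's distributed streams. It only shows how an unbounded source can be chained through transformations that run until someone stops them.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, Tuple


def sentence_spout() -> Iterator[str]:
    """Hypothetical spout: an unbounded source that emits sentences forever."""
    sentences = ["spark runs on hadoop", "storm processes streams"]
    while True:
        for sentence in sentences:
            yield sentence


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Hypothetical bolt: splits each incoming sentence into words."""
    for sentence in stream:
        for word in sentence.split():
            yield word


def count_bolt(stream: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Hypothetical bolt: keeps a running count and emits (word, count)."""
    counts: Counter = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def build_topology() -> Iterator[Tuple[str, int]]:
    """Wire spout -> split bolt -> count bolt into one stream."""
    return count_bolt(split_bolt(sentence_spout()))


# The topology never ends on its own, so we take only the first 8 results,
# which plays the role of "manually terminating" it.
topology = build_topology()
first_eight = list(islice(topology, 8))
```

Unlike a MapReduce job, nothing here waits for the input to finish: each word count is emitted as soon as the word arrives, which is the property that makes this style suitable for streaming data.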

Drones: Google acquires a new startup

April 15th, 2014

Google has acquired Titan Aerospace, a start-up that builds large solar-powered drones able to stay in flight for years. From what I have read, they can easily carry equipment for applications such as mapping, tracking and communication. It seems that Google will have Titan Aerospace work closely with Project Loon, which uses high-altitude balloons to provide internet. I read at Gigaom that the drones can deliver internet at speeds of up to 1 gigabit per second. The drones could also be useful to Google Earth, because they could generate images that are refreshed more frequently than images taken by satellites, opening up new possibilities for Google Earth. And anything else that we cannot even imagine! His imagination is enormous, [...]

Hadoop distribution: Main Players

April 5th, 2014

MAIN PLAYERS Apache Hadoop is the most popular framework used for processing large amounts of data in the Big Data arena. It is clear that Hadoop is here to stay. That is why I always tell my students that it is important to know how it works. For the courses I teach where we do not have lab sessions, I produced this hands-on for a quick glimpse. If you are interested in learning more about Hadoop, you can start with this hands-on, which includes some bibliographic references. Some former students and friends who are already in the industry have asked me to recommend some of the distributions available on the market. Each distribution is different and as a researcher I do not have an in-depth [...]