Google launches Dataflow (a successor to MapReduce)

2017-08-09T12:30:08+00:00 June 30th, 2014|

I'm in San Francisco, ready to attend the 2014 Spark Summit tomorrow. As I already mentioned on this blog, Apache Spark is one technology that has emerged as a potential alternative to MapReduce/Hadoop. But it seems that it is not the only one. Last week, also here in San Francisco, at its Google I/O 2014 conference, Google unveiled its successor to MapReduce, called Dataflow, which it is selling through its hosted cloud service (equivalent to Amazon's Data Pipeline service and Kinesis for real-time data processing). Urs Hölzle (Google’s senior vice president of technical infrastructure and a Google Fellow) introduced how Dataflow is used for analytics during the keynote address at Google I/O 2014 (minute 2:06:30 in this video of the keynote). The service lets you construct an analytics workflow and then send it [...]

Barcelona Supercomputing Center starts to work on Deep Learning

2017-08-09T12:29:27+00:00 June 26th, 2014|

What is Deep Learning? We can consider Deep Learning a new area of Machine Learning research whose objective is to move Machine Learning closer to Artificial Intelligence (one of its original goals). Our research group has been working in Machine Learning for a long time, thanks to Ricard Gavaldà, who introduced us to this wonderful world. That was during the summer of 2006, together with Toni Moreno, Josep Ll. Berral, and Nico Poggi. Unforgettable moments! Now, after 8 years, we are taking a step forward and starting to work with Deep Learning. It was during a group retreat held last September that I realised, thanks to Jordi Nin, that "Deep Learning" was an interesting topic. Deep Learning comes from neural nets conceived in the 1940s, inspired by the [...]

BSC releases COMPSs software for large scale parallelisation

2017-08-09T12:29:31+00:00 June 24th, 2014|

The Grid Computing and Clusters team at the Barcelona Supercomputing Center has released COMPSs, a set of tools designed to help developers run applications efficiently on distributed systems such as clusters, grids, and clouds. Our research group is using this programming model in some of our ongoing research work. COMPSs is a task-based programming model known for notably improving the performance of large-scale applications by automatically parallelising their execution. The new release includes PyCOMPSs, a new binding for Python which provides support for a large number of scientific disciplines. It also includes important features such as a new tracing system using the Extrae tool and an Integrated Development Environment (IDE) for COMPSs applications that helps in developing applications and deploying them in the distributed [...]
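To give a feel for what "task-based" means, here is a minimal sketch in plain Python. It does not use COMPSs or PyCOMPSs at all: it relies only on the standard library's `concurrent.futures`, and the names `square`, `total`, and `run` are hypothetical, chosen just for this illustration. Independent tasks are submitted to a pool and may run concurrently, while the final task waits for their results. The excerpt above says COMPSs parallelises execution automatically; in this sketch that coordination is written by hand.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Independent task: touches no shared state, so calls can overlap."""
    return x * x


def total(values):
    """Dependent task: needs every partial result before it can run."""
    return sum(values)


def run(inputs, workers=4):
    """Submit one task per input, then combine the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each submit() returns a future, a placeholder for a pending result.
        futures = [pool.submit(square, x) for x in inputs]
        # result() blocks until the corresponding task has finished.
        squares = [f.result() for f in futures]
    return total(squares)


if __name__ == "__main__":
    print(run([1, 2, 3, 4]))
```

The point of the sketch is only the shape of the program: the code is expressed as tasks with data dependencies between them, and something other than the task bodies decides when each one runs.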

Barcelona Spark Meetup: 200 members!!!

2017-08-09T12:29:33+00:00 June 23rd, 2014|

Hi, the Barcelona Spark Meetup has reached the magical number of 200 sparkers! Great news for all of us! Thank you! After the successful kickoff, the second meeting will feature David Rodriguez (CTO at Urbiotica) speaking about Apache Spark as a scalable and fault-tolerant environment for the batch, speed and serving layers (talk in Spanish). We hope to see you next Thursday, 10th July, at this second meetup. And if you are interested in other Spark meetups around the world, here is the list:
* Spark User Meetup San Francisco, CA; USA
* Spark User Group - Hyderabad Hyderabad, India
* Spark Bangalore, India
* Seattle Spark Meetup Bellevue, WA; US
* Stamford Machine Learning / StatLearning Study Group Stamford, CT; USA
* Vancouver Spark Meetup Vancouver, BC; Canada
* Spark London, United Kingdom, Europe
* Spark-NYC New York, NY; USA
* Shanghai Spark Meetup Shanghai, China
* Silicon Valley Cluster Computing Group San Jose, CA; USA
* San Diego Spark [...]

A Master's in Smart Healthcare starts in September 2014

2017-08-09T12:29:38+00:00 June 19th, 2014|

A good colleague from the University of Girona, Beatriz López, told me about the Smart Healthcare Master. The Master aims to prepare technologists for the current revolution in health services based on the intensive use of data. Have you ever analysed how the use of mobile phones by chronic patients will impact the healthcare system? Is it possible to know how personalized medicine can be implemented to deal with the ageing society? The Master is designed to educate technologists who can help find answers to these challenging questions. The Smart Healthcare Master is interdisciplinary, joining disciplines such as Health Sciences (healthcare processes; decision-making in healthcare processes), Artificial Intelligence (intelligent systems; machine learning), Data Analysis (intelligent data analysis; applied biostatistics), and Organisation and Management (quality and standards; health informatics). [...]

Kickoff of the “Barcelona Spark Meetup” with Telefónica I+D

2017-08-09T12:29:42+00:00 June 10th, 2014|

Next Thursday brings the first monthly meeting of the Barcelona Spark Meetup. The reception this recently created Spark group has had is impressive. Thank you all! Encouraged by this, we have already planned possible activities for the coming months. In this first meeting we will be joined by Daniel Tapiador [1] and Ignacio Blasco [2] from Telefónica I+D, who will tell us about the role this emerging technology plays in Telefónica's R&D. Next month, in July, David Rodriguez, CTO of Urbiotica, will join us. After the summer, in September, we have already scheduled a visit from Daniel Villatoro, Senior Data Scientist at BBVA, and, for October, one from Jordi Aranda, one of the leading Big Data researchers at BSC. And in November we hope to have a surprise for everyone, taking advantage of the coincidence of having in [...]

Apache Spark 1.0.0: Spark SQL replaces Shark

2017-08-09T12:29:45+00:00 June 2nd, 2014|

Apache Spark 1.0.0 was released on May 30th. Alessandro Chacon, a former student, noticed that there is a new addition to the Spark ecosystem called Spark SQL. Spark SQL is separate from Shark (the system currently used) and does not use Hive under the hood. With the advent of Hadoop and NoSQL databases, building a data warehouse for processing big data became easier; however, it still requires specialized development skills and a non-trivial amount of effort. Hive solved this problem by providing a familiar SQL querying engine on top of Hadoop that translates SQL queries into MapReduce code. Spark provides a similar SQL querying engine called Shark. Shark still relies on Hive for query planning, but uses Spark instead of Hadoop during the physical execution phase. In conclusion, Spark SQL is an alternative SQL engine, one that is divorced from Hive! It provides schema-aware [...]
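To make the idea of "translating SQL into MapReduce" concrete, here is a toy sketch in plain Python. It is not what Hive, Shark or Spark SQL actually generate; it only shows how a query such as `SELECT page, COUNT(*) FROM visits GROUP BY page` decomposes into a map phase, a shuffle, and a reduce phase. The table name `visits`, the column `page`, and all function names are assumptions made up for this illustration.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit a (key, 1) pair for every row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["page"], 1


def shuffle(pairs):
    """Shuffle: bring together all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each group to one value, here COUNT(*) as a sum of ones."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_page(rows):
    """Toy equivalent of: SELECT page, COUNT(*) FROM visits GROUP BY page."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    visits = [{"page": "home"}, {"page": "about"}, {"page": "home"}]
    print(count_by_page(visits))
```

A SQL engine on top of a MapReduce-style runtime spares the analyst from writing these three phases by hand for every query, which is the convenience the paragraph above attributes to Hive.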

Business-Driven Resource Allocation and Management for Data Centers in the Cloud Markets

2017-08-09T12:29:47+00:00 June 1st, 2014|

This week, Mario Macias, one of the researchers in our research group, defended his PhD dissertation. The work is centered on the Cloud Computing arena. Cloud Computing markets arise as an efficient way to allocate resources for the execution of tasks and services within a set of geographically dispersed providers from different organisations. Client applications and service providers meet in a market and negotiate the sale of services by signing a Service Level Agreement that contains the Quality of Service terms that the Cloud provider has to guarantee by properly managing its resources. Current implementations of Cloud markets have certain weaknesses at this level. Mario's work presents interesting solutions for them. I’m really proud of Mario’s work. My sincere congratulations to Mario and also to his advisor, Jordi [...]

3rd International Workshop on Citizen Networks

2017-08-09T12:29:49+00:00 June 1st, 2014|

We are pleased to announce the organisation of the workshop ‘CitiNet 2014’ by BSC during ECCS 2014. CitiNet 2014 will be a one-day event that will take place on the 25th of September 2014 in Lucca (Italy). The topics addressed include geo-spatial analytics, urban analytics, urban modelling and simulation, and citizen sensor networks, among others. Detailed information about the topics of interest, the workshop programme and the call for papers can be found at the workshop website.

Important Dates
* Submission Deadline: June 15, 2014
* Authors Notification: July 14, 2014
* Conference: September 25, 2014