Software Development Intern
Anmol Rajpurohit is a graduate student (MS, Computer Science) at UC Irvine. He is a former Software Development Intern at Salesforce. His areas of interest are data science, machine learning and information retrieval. His novel analytics solution for online education was the runner-up at the UCLA Developer's Contest 2014. His research work on "Big Data for Business Managers: Bridging the gap between potential and value" was selected for the IEEE BigData 2013 conference.
His strong technical background and analytical skills have led him to a number of international research projects and internships. In collaboration with UCLA, he developed components for the popular future internet architecture known as Named Data Networking (NDN).
Besides research and development, he loves blogging on data science news and trends at kdnuggets.com. Anmol completed his Bachelor of Technology in Computer Science & Engineering from LNM Institute of Information Technology (LNMIIT), Jaipur, India in 2013.
University of California, Los Angeles, USA
Information Cryptology & Construction Lab (ICCL), Taiwan
LNM Institute of Information Technology, India
Communication Empowerment Lab, IIT Kharagpur, India
LNM Institute of Information Technology, India
M.S. Computer Science
University of California, Irvine
B.Tech. Computer Science and Engineering
LNM Institute of Information Technology, India
My research and development efforts are aimed primarily at developing the next generation of the internet and internet-based applications. Within that universe, I focus on three aspects of the future internet – Data Science (understanding the information flow over the internet), Network Architecture (designing protocols for the future internet), and Video Coding (improving the performance of the most popular data type used over the internet).
My work on Big Data for Business Managers was published in IEEE BigData 2013. Besides research work, I blog on Big Data trends at KDnuggets.com.
Paper (abstract and full copy): http://bit.ly/1fiohcT
If you do not have an IEEE subscription and need my paper, email me with the subject "Request for IEEE paper: Big Data - Potential to Value".
Developed modules to facilitate the use and assessment of NDN (Named Data Networking) framework for real-life innovative applications such as server-less multi-user chat and environmental sensing.
Technical Report (which I contributed to): http://named-data.net/wp-content/uploads/2014/09/CCLTechReport.pdf
UCLA page: http://bit.ly/1finWHf
My research on video coding is currently on hold due to other time commitments. The work targets high-performance, low-complexity alternatives to H.264/AVC, which would enable transmission of higher-resolution videos to mobile devices (which have low computational power) while also creating opportunities to boost security and protect IP rights.
Lab link: http://bit.ly/1bNvmDi
The relation manager (RM) administers the database tables. It handles the creation and deletion of tables and the other basic operations performed on a table. It also creates and initializes the catalog that stores all information about the database.
The paged file manager provides facilities for higher-level client components to perform file I/O in terms of pages. The record-based file manager, built on top of the basic paged file system, handles record-based operations such as inserting, updating, deleting, and reading records.
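To illustrate what page-level file I/O means, here is a minimal sketch in Python. It is not the project's code: the class name `PagedFile`, its method names, and the 4096-byte page size are all assumptions made for this example.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)


class PagedFile:
    """Hypothetical sketch: fixed-size page I/O on top of an ordinary file."""

    def __init__(self, path):
        # Create the file if it does not exist, then open it for read/write.
        if not os.path.exists(path):
            open(path, "wb").close()
        self._f = open(path, "r+b")

    def num_pages(self):
        # The file only ever grows by whole pages, so size / PAGE_SIZE is exact.
        self._f.seek(0, os.SEEK_END)
        return self._f.tell() // PAGE_SIZE

    def append_page(self, data):
        # Add a new page at the end of the file and return its page number.
        page_num = self.num_pages()
        self._write(page_num, data)
        return page_num

    def write_page(self, page_num, data):
        # Overwrite an existing page in place.
        self._check(page_num)
        self._write(page_num, data)

    def read_page(self, page_num):
        # Return the full PAGE_SIZE bytes stored in the given page.
        self._check(page_num)
        self._f.seek(page_num * PAGE_SIZE)
        return self._f.read(PAGE_SIZE)

    def close(self):
        self._f.close()

    def _check(self, page_num):
        if not 0 <= page_num < self.num_pages():
            raise IndexError("page does not exist")

    def _write(self, page_num, data):
        if len(data) > PAGE_SIZE:
            raise ValueError("data larger than a page")
        self._f.seek(page_num * PAGE_SIZE)
        # Pad with zero bytes so every page occupies exactly PAGE_SIZE bytes.
        self._f.write(data.ljust(PAGE_SIZE, b"\x00"))
        self._f.flush()
```

A record-based layer would sit on top of such a class and decide how records are packed into each page; that layout is not described here, so it is left out of the sketch.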
Researched the currently popular approaches to enhancing MANET (Mobile Ad hoc NETwork) performance through location services. A comprehensive qualitative comparison was performed and documented in the research paper.
Key concerns with the existing protocols were identified and listed as opportunities for future research.
Simulation tool: NS2
Designed an efficient algorithm to assist a web crawler through prioritization and optimization of the input URL set. The algorithm was tested against URL Reputation Data Set (a set of 16,000 URLs with over 3 million attributes). The design included data selection, preprocessing, entropy calculations and finally, information gain comparisons to deliver the list of selected URLs, sorted in order of importance.
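The entropy and information-gain steps mentioned above can be sketched as follows. This is a toy illustration only: the function names, the discrete attributes and the labels are assumptions, and how the original project turned attribute gains into a final URL ordering is not described here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the groups produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels):
    """Return (attribute, gain) pairs sorted by information gain, highest first.

    `rows` is a list of dicts mapping attribute name -> discrete value.
    """
    names = list(rows[0].keys())
    gains = {n: information_gain([r[n] for r in rows], labels) for n in names}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: two binary attributes per URL and a class label.
rows = [
    {"has_ip_host": 1, "long_path": 1},
    {"has_ip_host": 1, "long_path": 0},
    {"has_ip_host": 0, "long_path": 1},
    {"has_ip_host": 0, "long_path": 0},
]
labels = ["bad", "bad", "good", "good"]
ranking = rank_attributes(rows, labels)
```

In this toy data `has_ip_host` separates the two classes perfectly while `long_path` carries no information, so the former ranks first.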
Development platform: C/C++, Matlab
Developed a document management system that personalizes the document search results to the user’s activities, making the search results much more relevant. The software reads the PDF files, extracts keywords, prepares an index, records user activity and delivers search results through a smart integration of all the information.
Development platform: Java (Using DynamicPDF class library)
Built the E-R model, data schema and database application for tracking goods at shipping ports and during transportation. The application enabled real-time monitoring and management of goods across various stages of transportation – right from pick-up to the ultimate delivery. Different user roles were created with varying access to data and management tools.
Development platform: MySQL, PHP, HTML, CSS
Designed and developed a simulator program that runs different CPU scheduling algorithms and reports performance metrics: CPU utilization, the waiting time of each process and the average waiting time, the response time of each process and the average response time, and the turn-around time of each process and the average turn-around time.
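As an illustration of the metrics listed above, here is a minimal sketch for one algorithm only, first-come-first-served (FCFS). The original simulator was written in C++ and covered several algorithms; the names `Process` and `simulate_fcfs`, and the choice to measure CPU utilization from time 0 to the last completion, are assumptions of this example.

```python
from dataclasses import dataclass


@dataclass
class Process:
    pid: str
    arrival: int  # arrival time
    burst: int    # CPU time required (assumed > 0)


def simulate_fcfs(processes):
    """Run non-preemptive FCFS and return (per_process, summary) metrics."""
    clock = 0
    busy = 0
    per_process = {}
    # sorted() is stable, so processes arriving together keep their input order.
    for p in sorted(processes, key=lambda p: p.arrival):
        start = max(clock, p.arrival)  # the CPU idles until the process arrives
        finish = start + p.burst
        per_process[p.pid] = {
            "waiting": start - p.arrival,
            # Without preemption, the first response coincides with the start.
            "response": start - p.arrival,
            "turnaround": finish - p.arrival,
        }
        busy += p.burst
        clock = finish
    n = len(processes)
    summary = {
        # Assumption: utilization is busy time over total time from 0 to the end.
        "cpu_utilization": busy / clock,
        "avg_waiting": sum(m["waiting"] for m in per_process.values()) / n,
        "avg_response": sum(m["response"] for m in per_process.values()) / n,
        "avg_turnaround": sum(m["turnaround"] for m in per_process.values()) / n,
    }
    return per_process, summary


# Hypothetical workload for illustration.
jobs = [Process("P1", 0, 5), Process("P2", 1, 3), Process("P3", 2, 8)]
per_process, summary = simulate_fcfs(jobs)
```

Preemptive algorithms such as round-robin would need a ready queue and a time quantum, and their response time would then differ from the waiting time.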
Development platform: C++
I investigated why Big Data Return on Investment (ROI) lagged far behind its potential despite the use of the best technology and the best people. I did an extensive analysis of Big Data processes across industries from both business and technology perspectives. To bridge the gaps in my analysis, I interviewed a few business managers and academic researchers. It was amazing to observe how dismal ROI results are linked to a very basic set of common errors prevalent across the various steps of data mining. Based on this research, I developed a generic ROI-focused framework for leveraging Big Data, which could be easily customized to particular industry needs. My research work was selected for IEEE BigData 2013. Based on the industry response at the conference, I am now developing a toolkit comprising templates, checklists and industry benchmarks that can be conveniently used by Big Data business leaders with little background in technology.
Given the surge of interest in research, publication and application on Big Data over the last few years, the potential of Big Data seems to be well-established now across businesses. However, in most business implementations Big Data still seems to be struggling to deliver the promised value (ROI). Such results, despite the use of market-leading Big Data solutions and talented deployment teams, are forcing business managers to think about what needs to be done differently. This paper lays down the framework for business managers to understand Big Data processes. Besides providing a business overview of Big Data core components, the paper presents several questions that the managers must ask to assess the effectiveness of their Big Data processes. This paper is based on the analysis of several Big Data projects that never delivered and comparison against successful ones. The hypothesis is developed based on public information and is proposed as the first step for business managers keen on effectively leveraging Big Data.
I have significant work experience in research and development projects. Since the second year of my undergraduate degree program, I have contributed to several multi-disciplinary research projects, as highlighted below.
Working within Monitoring Cloud, I audited legacy software infrastructure and proposed measures to improve efficiency and flexibility with regard to future needs. I also collaborated with various remote and onsite teams on the successful migration of production-support tests based on APIs from a leading network monitoring service, and identified and resolved bugs in infrastructure monitoring modules.
In collaboration with Dr. Gregory Piatetsky-Shapiro, I researched Big Data trends and published articles on KDnuggets.com. I also integrated, mined and analyzed data from multiple sources, such as social media (Twitter and LinkedIn) and web analytics, for customer insights. In addition, I developed interactive data visualizations to speed up the progress from observing an issue to identifying its root cause.
Since May 2012, I have been actively involved in the research and development of a new, better architecture for the internet – Named Data Networking (NDN) – at UCLA REMAP (Center for Research in Engineering, Media and Performance). The fundamental changes introduced by NDN to the internet communication paradigm call for extensive and multi-dimensional evaluations of various aspects of the NDN design, which made up the majority of my work at UCLA.
During my research, I built several modules that facilitate the use and assessment of the NDN framework for real-life innovative applications such as server-less multi-user chat and environmental sensing. I used these modules for performance assessment and benchmarking of key components of the NDN architecture. The insights provided by my analysis led to the identification of architecture gaps, thus providing direction to future research. My research and the applications developed from it clearly demonstrated how NDN’s embedding of application names in the routing system promotes efficient authoring of sophisticated distributed applications, reducing complexity and thus opportunities for error, as well as the time and expense of design and deployment.
Implemented and integrated a side-match prediction scheme into the state-of-the-art H.264/AVC framework to provide a new predictive approach. The proposed method generates a side-match predicted image from the original image through side-match prediction, which uses the neighboring known pixels to predict an unknown pixel. The prediction error sub-image, generated by subtracting the side-match predicted image from the original image, is then encoded with intra prediction modes, quantization parameters, and scanning patterns.
Delivered lectures on a range of video encoding topics (specifically H.264/AVC) to graduate students, research scholars and faculty. The lectures covered the latest research on efficient entropy coding schemes for H.264/AVC lossless video coding, reversible data hiding using side-match prediction on steganographic images, and reversible watermarking based on invariant image classification and dynamic histogram shifting.
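The idea of predicting a pixel from its known neighbors and keeping only the prediction error can be sketched as below. The specific predictor used here (the integer average of the left and upper neighbors, with simple fallbacks at the borders) is an assumption for illustration; the exact predictor used in the project is not stated, and the H.264/AVC encoding stages are omitted.

```python
def predict_pixel(img, r, c):
    """Predict pixel (r, c) from already-known neighbors (assumed predictor)."""
    if r == 0 and c == 0:
        return 0                      # no neighbors: predict zero
    if r == 0:
        return img[r][c - 1]          # first row: left neighbor only
    if c == 0:
        return img[r - 1][c]          # first column: upper neighbor only
    return (img[r - 1][c] + img[r][c - 1]) // 2  # average of upper and left


def prediction_error(img):
    """Subtract the predicted image from the original, pixel by pixel."""
    rows, cols = len(img), len(img[0])
    return [[img[r][c] - predict_pixel(img, r, c) for c in range(cols)]
            for r in range(rows)]


def reconstruct(err):
    """Rebuild the original image from the prediction errors in raster order."""
    rows, cols = len(err), len(err[0])
    img = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # The upper and left neighbors are already reconstructed here.
            img[r][c] = err[r][c] + predict_pixel(img, r, c)
    return img


# Hypothetical 3x3 grayscale block.
block = [[10, 12, 13],
         [11, 12, 14],
         [12, 13, 15]]
errors = prediction_error(block)
restored = reconstruct(errors)
```

On smooth image regions the errors are small values clustered near zero, which is what makes the error sub-image cheaper to encode than the original pixels, and the raster-order reconstruction shows that no information is lost.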
Improved and expanded the functionality of the file browser for Sahaj Linux – a simple, minimalist and user-friendly OS for rural India. Developed and optimized the new file browser capabilities as a wrapper to the GNOME interface of Linux, using the GTK+ library for windowing. The Sahaj Linux file browser lets users navigate multiple folders and applications in an intuitive and convenient way, reducing browsing time significantly.
Designed and developed a visual learning tool for mathematics that supports floating-point numbers and offers better error handling. The tool visually explains how a complex mathematical expression is evaluated as per the BODMAS rule, and lets children play with it to enhance their learning.
The blog is currently under construction. Until then, you can access the following publications of mine on KDnuggets, a leading site on Analytics, Big Data, Data Mining, and Data Science.
We discuss large-scale data architectures in 2020, career path, open source involvement, advice, and more.
We discuss the startups landscape in Big Data, valuation of Big Data companies, recognition earned by Datameer, and why SQL on Hadoop is a bad idea.
We discuss common pain points in Big Data projects, evolution of Datameer technology, department specific solution – Datameer Professional, Datameer 5.0 Smart Execution, tackling over-simplicity and more.
We discuss the challenges in analyzing global economic datasets, impact of Big Data growth on economics, desired skills in data scientists, and more.
We discuss Analytics at Prevedere Software, understanding the impact of external factors on a company’s performance, features of in-memory correlation engine and economic intelligence by Prevedere.
We discuss the Big Data architecture at Toyota, executives’ perception of Analytics, Toyota Innovation Fair, advice, trends, and more.
We discuss Toyota’s Customer 360 Advanced Analytics and Insights platform, Product Quality Analytics system, Predictive Analytics use cases & performance assessment, and challenges in analyzing data from social media.
We discuss success factors with polyglot architectures, Big Data challenges, recommendations for using Big Data technologies, trends, advice, and more.
We discuss the role of Analytics at Art.com, the polyglot data architecture at Art.com, the use cases for Hadoop, vendor selection, supporting semantic search and experience with Avro.
Digital Trust is at a deficit – concludes the 2015 Accenture Digital Consumer Survey report “Digital Trust in the IoT Era”
We discuss the tools used for data science, competitive landscape, journey from astrophysics to data science, advice, skills sought in data scientists, and more.
We discuss the role of Analytics at Groupon, deciding factors for merchant priority, limitations of historical data, optimizing the efforts of sales force, data characteristics and dealing with Data Sparsity.
Music video featuring Big Data and Hadoop (and Map-Reduce and NoSQL) might be all you need to light up your day!
We discuss career advice, need for customer-focus, Analytics trends, desired skills in Data Science practitioners, and more.
We discuss Analytics at Visa, adapting to the Big Data world, gaps between expectations and delivery from Analytics, delivering Actionable Insights, and tools/technologies used.
We discuss securing data-at-rest and data-in-motion, security recommendations for data architectures, trends, advice, and more.
We discuss the security concerns in Big Data, challenges in securing Big Data locally and over cloud, and open source solutions – Knox and Ranger.
We discuss HP Security Voltage growth story, HP acquisition, assessing the state of current security standards, and the need for “data-centric” security.
We discuss the critical success factor for open source projects, entrepreneurial lessons, advice, desired qualities in data scientists and more.
We discuss the importance of enabling self-service analytics, partnership with Cask, Big Data vendor selection and competitive landscape.
We discuss Cloudera’s achievements, story behind the name ‘Cloudera’, CTO role, and key attributes of information architecture designed for future.
We discuss the origin of Apache Myriad, state of security in Big Data, MapR Quick Start Solutions, Hadoop vendor selection criteria, and more.
We discuss how analytics can impact the business “as-it-happens”, merging business analytics with production operations, transition challenges, and recently announced partnership with Teradata.
Highlights from keynote speeches delivered by various eminent big data technology leaders from industry and academia at Spark Summit 2015 Conference held in San Francisco.
We discuss discovery vs. personalization, advice, trends, desired skills in data scientists, and more.
We discuss role of analytics in content acquisition, data architecture at Netflix, organizational structure, and open-source tools from Netflix.
We discuss the steps involved in Discovery process at Netflix, impact due to multitude of devices, system generated logs, and surprising insights.
We discuss advancements in the field of Personalization, lessons from winning sorting competitions, Data Science trends, career advice, and more.
We discuss categorization of e-commerce analytics, opportunities/challenges of Big Data, Astro predictive model for Hadoop cluster management, and Apache Kylin.
We discuss the advantages of Phoenix, upcoming features, forthcoming support for transactions, trends, advice, and more.
We discuss the beginning of Phoenix project, decision of making it open source, relational database layer on HBase, and key reasons for the superior performance of Apache Phoenix.
We discuss the evolution of Data Science expectations, Data Science as a career, advice, and more.
Review of the steps taken by White House over last six months to modernize police data systems to better fight crime as well as build trust between community and police.
We discuss the chief data officer role at CFPB, big data opportunities and challenges, ontology, vintage data, data governance trends, advice, and more.
We discuss lessons from implementing lambda architecture, impact of Big Data on recommender systems, trends, advice, and more.
We discuss the role of Data Science team at Ticketmaster, ecommerce data characteristics, analytics based on highly variant data flow, infrastructure challenges, and merits of lambda architecture.
We discuss Customer Lifetime Value (CLV) metric, maturity level for the CLV metric, different models for calculating it, challenges in designing strategy based on CLV and tackling attribution.
We discuss the impact of rapid growth in magnitude of data, programming skills for data science, major trends, advice, data science skills, and more.
We discuss Predictive Analytics projects at Sharp Labs of America, common myths, value of simplicity, tools and technologies, and notorious data quality issues.
We discuss dealing with current gaps in healthcare data, challenges in using real world healthcare data, desired skills for data scientists in healthcare industry, advice, and more.
We discuss the challenges and opportunities created by increased collection of healthcare data, state of data accessibility, and the value of Analytics to the drug development process.
Highlights from the presentations by Gaming Analytics leaders from Activision, Riot Games and Daybreak Game Company (formerly Sony Online Entertainment) on day 2 of Gaming Analytics Innovation Summit 2015 in San Francisco.
We discuss Analytics at ScoreBig, company’s business model, unexpected insights, challenges in customer value management, advice, and more.
Highlights from the presentations by Gaming Analytics leaders from Facebook, Turbine/Warner Bros Games, and Sega on day 1 of Gaming Analytics Innovation Summit 2015 in San Francisco.
We discuss the challenges in tracking social media sharing, advice, important trends, and more.
We discuss Mashable’s milestones, data-driven digital publishing, digital media tracking, viral prediction, and Mashable Velocity.
We discuss marketing analytics at Facebook, multi-channel performance assessment, success factors, lessons from Look Back feature, advice, and more.
We discuss the field of Big Data for Development, current projects and future plans for Data-Pop Alliance, public participation opportunities, advice, and more.
We discuss the 3 Cs of Big Data, state of ethics in the field of Big Data, and how to ensure that the benefits of Big Data reach the masses.
Highlights from the presentations by Big Data and Analytics leaders/consultants on day 3 of Big Data Bootcamp in Austin.
Highlights from the presentations by Big Data and Analytics leaders/consultants on day 2 of Big Data Bootcamp in Austin.
We discuss the emerging Big Data ecosystem, its key players, and the severe consequences of inadequate statistical capabilities across many African nations.
Highlights from the presentations by Big Data and Analytics leaders/consultants on day 1 of Big Data Bootcamp 2015 in Austin.
We discuss the founding story of Data-Pop Alliance, the applications and implications of Big Data on Human Rights and the need for penetration of Data Literacy.
We discuss the response from hiring companies, recommendations for aspirants, retaining data science talent, advice, and more.
Kaggle top ranker Xavier Conort shares insights on the “10 R Packages to Win Kaggle Competitions”
We discuss the launch of the Data Incubator, its business model, why we need data-driven hiring, selection process for the incubator program and alumni feedback.
We discuss career advice, motivation, key qualities sought in Data Science practitioners, and more.
We discuss recommendations for data-driven decision making, challenges and benefits of using unstructured data, managing expectations and key trends.
We discuss Predictive Analytics use cases at Verizon Wireless, advantages of a unified data view, model selection and common causes of failure.
We discuss the key lessons from shifting to Hadoop, data management in today’s world, future of Data Science, advice and more.
We discuss EDM at Time Warner Cable, data sources, complementing legacy data warehouses with Big Data solutions, vendor selection and build vs. buy decision.
We discuss challenges in analyzing text data, Big Data impact on translational bioinformatics, advice, desired skills in data scientists, and more.
We have a marketplace for almost everything – mobile apps, cabs, hotels, and what not. But, not for algorithms. Algorithmia takes up that challenge.
We discuss Analytics at AstraZeneca, prominent use cases, how NLP helped understanding patient treatment journey in diabetes, data sources, insights, and more.
We discuss the challenges of analyzing crowdsourcing data, tools and technologies, competitive landscape, advice, trends, and more.
Highlights from the presentations by Predictive Analytics leaders from eBay, LinkedIn and Facebook on day 2 of Predictive Analytics Innovation Summit 2015 in San Diego.
We discuss the dynamics of Ranker crowdsourcing platform, key factors for effectiveness, role of data science in crowdsourcing, and more.
Highlights from the presentations by Predictive Analytics leaders from The Data Incubator, Tamr, Sony and Facebook on day 1 of Predictive Analytics Innovation Summit 2015 in San Diego.
A long, categorized list of large datasets (available for public use) to try your analytics skills on. Which one would you pick?
We discuss recent events at Washington Post, growth initiatives, the growing pain of Dark Social, how to deal with it, audience analytics, advice and more
Highlights from the presentations/tutorials by Data Science leaders from VISA, Glassbeam, Unravel on day 3 of Big Data Developer Conference, Santa Clara
We discuss interesting trends, motivation, different aspects of data scientist job, advice, and more.
Highlights from the presentations/tutorials by Data Science leaders from Cloudera, LinkedIn, Intel, MapR, Locbit and others on day 2 of Big Data Developer Conference 2015.
Highlights from the presentations/tutorials by Data Science leaders from ElephantScale, SciSpike, Twitter and Informatica on day 1 of Big Data Developer Conference, Santa Clara
We discuss Analytics at Glassdoor, important lessons, major factors affecting job satisfaction, challenges of working on Twitter Data, indispensable components of Data Science education.
We discuss Predictive Analytics in Oil & Gas industry, Big Data analytics, key drivers of success, common reasons of failure, trends, advice, and more.
We discuss Analytics at Halliburton, Big Data challenges unique to Oil & Gas industry, and the 7 V’s of Big Data.
We discuss challenges in applying Data Analytics to sports, advice to beginners in the field of Sports Analytics, and more.
We discuss the success of Analytics in predicting sports injuries, recent progress in concussion management and the trends in data-driven evidence-based sports medicine.
We discuss how United States Olympic Committee uses Big Data, how athletes respond to Analytical insights, integration of sports medicine into sports performance and sports injury.
We discuss benefits and challenges of Data Lake, trends, life lessons, motivation, desired skills, and more.
We discuss the role of Analytics at GE, Industrial Internet and how it is different from consumer internet, and the key capabilities of Predix.
We discuss the challenges in making personal styling recommendations, unexpected insights, interesting trends, motivation, advice, desired qualities in data scientists and more.
We discuss StitchFix, how it leverages Analytics, understanding customer preferences, and pros-and-cons of involving human judgement in the recommendation process.
We discuss common characteristics of games that achieved top ranking, career advice, trends, desired qualities in data scientists and more.
We discuss key characteristics of social gaming data, ML use cases at King, infrastructure challenges, major problems with A-B testing and recommendations to resolve them.
We discuss data gravity and its implications, Riak Enterprise 2.0, Riak CS 1.5, competitive landscape, challenges and more.
We discuss the future of distributed storage for enterprise, Scale-up vs. Scale-out, software design patterns in Cloud era, microservices model and the place for legacy database in modern enterprise IT.
We discuss recommendations for Data Governance policies, advice, Big Data trends, qualities sought in Data Scientists, and more.
We discuss the responsibilities of Enterprise Data Strategy team at Equifax, why Data Lake, Equifax Decision360, how to set up Insights Culture and bottlenecks for value delivery from Big Data.
We discuss handling bias in data, other data quality concerns, advice, desired qualities, and more.
We discuss Analytics challenges at Activision, event data from games such as Call of Duty, balancing aesthetics and inference in visualization, problem with stacked charts and more
We discuss Analytics use cases, challenges in relating molecular/clinical data to real-life outcomes, Healthcare Analytics trends and more.
Strata + Hadoop World 2015 was a great conference, and here are key insights from some of the best sessions on day 2.
We discuss Big Data Analytics at Berg, making Healthcare effective through Big Data, impact of falling cost of DNA sequencing, Berg AI-Analytics Suite and more.
We discuss challenges in leveraging Big Data, important attributes while profiling employers and job seekers, competitive landscape, desired skills in data scientists and more.
We discuss analytics at ChinaHR, matching job seekers and employers, traditional job fairs vs online recruitment, key metrics and analytical insights.
We discuss different levels of Data Integrity, logical fallacies in Analytics, measures to boost accountability, role for human intelligence in Analytics and relevance of OCCAM framework.
We discuss Apache Mahout, its comparison with Spark and H2O, trends, advice, desired qualities in data scientists and more.
We discuss major Big Data developments in 2014, real-time processing, interactive queries, streaming systems, batch systems, MapR partnerships and challenges in scaling recommendation engines.
Here are the quick takeaways and valuable insights from selected talks at one of the most reputed conferences in Big Data – Strata + Hadoop World 2015, San Jose.
We discuss challenges of dealing with healthcare data, trends in healthcare analytics, important skills for data scientists and more.
We discuss how to establish credibility of data analytics, recommendations for a data-driven culture, analytics challenges in healthcare and more.
We discuss Big Data & Analytics at Geisinger Health System, CDO challenges, and impact of Big Data on decision making at executive level.
We discuss the impact of increasing amount of data on visualization, difference between Data Analysis and Data Analytics, motivation, trends, desired skills and more.
The Automatic Statistician project by Univ. of Cambridge and MIT is pushing ahead the frontiers of automation for the selection and evaluation of machine learning models. In general, what does automation mean to Data Science?
We discuss data visualization at Boeing, the importance of Visual Analytics, Aviation Safety improvement through Analytics and augmented reality.
We discuss the competitive differentiation of MapR, challenges in consumerizing Big Data, trends, strategy recommendations, desired skills and more.
We discuss the launch and evolution of MapR, achievements, key characteristics of MapR-DB, significance of Apache Drill, MapR use cases and more.
We discuss the journey of Business Analytics, definition of Right Data, competitive differentiation of Nuevora, challenges in the large-scale consumerization of analytics, and more.
We discuss the value proposition of Nuevora, founding story, CMO expectations from Analytics and the Nuevora nBAAP platform.
We discuss Agile Digital Transformation, Optimization vs Innovation trade-off, best innovations of 2014, trends, advice and more.
We discuss SAS Analytics Center of Excellence, trends, advice, desired skills in data science and more.
We discuss Agile Analytics, moving from traditional Analytics to Agile, challenges in operationalizing Analytics, SAS Enterprise Decision Management and SAS In-Memory Statistics.
We discuss the change in Big Data priorities, risks, Big Data ecosystem, rise of data culture in organizations, challenges, advice and more.
We discuss the best resources to learn Topology, career motivation, important qualities sought in data scientists and more.
We discuss examples of Topological Data Analysis (TDA) revealing new insights, recommended approach for creating Topological Summaries, Manual vs Automation approach and trends.
We discuss the definition of Topology, its relevance to Big Data and compare Topological Data Analysis (TDA) with other approaches.
We discuss Yahoo’s contributions to Big Data ecosystem, recommendation to Big Data vendors, predictions for Big Data, advice, and more.
We discuss the major Big Data use cases at Yahoo, major challenges, trends in enterprise Big Data implementations, and advantages of using Spark.
We discuss the focus areas of Big Data strategy at SAP, how SAP is leading the competition, the kind of data scientists we need, advice and more.
We discuss the current perceptions of Big Data, challenges for Big Data consumerization, dealing with the talent gap, and business strategy for Big Data.
We discuss H2O use cases, resources to start using H2O for Deep Learning, evolution of High Performance Computing (HPC) and the future of HPC.
We discuss how Deep Learning is different from the other methods of Machine Learning, unique characteristics and benefits of Deep Learning, and the key components of H2O architecture.
We discuss curriculum development around Data Science, trends in Big Data arena, qualities sought in students and more.
We discuss Twitris—a tool for collective social intelligence, challenges in using social data to get actionable insights during emergency situations, managing Data Variety, and entrepreneurship.
We discuss the definition of Smart Data, how to derive Smart Data from Big Data, maturity assessment for Smart Data pursuit, computing for human experience and Kno.e.sis.
We discuss insights from the best paper at ACM AVI 2014, increasing interest in visualization, infographics, trends, challenges, advice and more.
We discuss the ClearStory Data’s competitive differentiation, client use case, Big Data trends, advice, desired soft skills in data scientists and more.
We discuss the founding story of ClearStory Data, progress since its launch, Collaborative StoryBoards, common pain points in business analytics and data harmonization.
Highlights from the presentations by Predictive Analytics leaders from Netflix, LinkedIn and Mashable on day 1 of Predictive Analytics Innovation Summit 2014 in Chicago.
We discuss Analytics at STATS, typical daily tasks, ICE Analytics platform, key challenges, response from coaches/players, career advice and more.
We discuss the challenges in implementing end-to-end solutions for Big Data, Platfora use cases, Big Data trends, advice and more.
We discuss the importance of self-service model for Big Data tools, Small Data vs. Big Data, unique advantages of Platfora, key enhancements in Platfora 4.0 and more.
We discuss the implications of Cloud Speed of technological advancement, significant trends in Internet of Things (IoT), future of cloud computing and more.
CrowdFlower infographic predicts the hot trends for data science in 2015 and which trends will fade away.
We discuss the shortcomings of football analytics, how San Francisco 49ers use analytics, future of football analytics, advice and more.
We discuss the role of analytics in football, the underrated challenges, evolution since the era of draft trade value chart and analytics-supported team selection.
We discuss the challenges in simultaneously managing asynchrony and partial failure, the problem of composition, research motivation, trends and more.
We discuss the performance limitations caused by treating datastore as black box, consistency as an application-level property, Dedalus and LDFI approach for testing.
SpaceCurve Spatial Data Platform will deliver strong spatial and temporal analytics capabilities. SpaceCurve CEO Dane Coyer and CTO Andrew Rogers tell us more.
We discuss the value of Big Data for SMBs, how Cognitive will impact Big Data, IBM’s distinction from competition, significant trends and more.
We discuss why not to focus on a single technology in Big Data, prevalent myths, what the IBM & Twitter partnership means for the world, and the current state of data governance.
We discuss the challenges in identifying the fair price of ad media, recommendations for building effective models for online marketing, unique challenges of Mobile channel, selection of Big Data tools, and more.
We discuss Analytics at Macys.com, comparison of advanced analytics with traditional BI, building data models for scalability, problem of data models becoming quickly obsolete and challenges in customer targeting.
We discuss Analytics at Gilt, unique Analytics challenges of a flash sales portal, consumer behavior across channels, interesting insights, advice and more.
We discuss how the increasing use of Analytics will change the game of basketball, the concern of Analytics ruining the game, significant trends, advice and more.
We discuss the emergence of Optical Analytics, its impact on NBA, challenges in integrating Optical Analytics into game strategy, the trade-off of analytical insights vs gut instinct and the impact on fan engagement.
We discuss what distinguishes Azure ML from its intense competition, the online machine learning university, current maturity level of Big Data solutions, important skills for data scientists and more.
Microsoft Corporate VP of Machine Learning, Joseph Sirosh, talks about the recently released Azure ML, users’ feedback, favorite business use cases, how Azure ML fits into Microsoft’s portfolio of Big Data solutions and more.
With advanced capabilities, free access, strong support for R, cloud hosting benefits, drag-and-drop development and many more features, Azure ML is ready to take the consumerization of ML to the next level.
In an exclusive interview with KDnuggets, Marten talks about HP’s Open Source strategy, evolution of Open Source production model, learning from the success of Open Source in Web, trends and more.
In an exclusive interview with KDnuggets, Marten talks about the future of Eucalyptus (recently acquired by HP), defines Hybrid Clouds and their importance, and gives some tips for vendor selection.
Import.io's new feature - 'Magic' allows users to instantly turn web pages into tables of data: No Plugins, No Training, No Setup. Learn More.
One of the biggest events at Big Data NYC 2014 was the insightful presentation by Jeff Kelly from WikiBon. We provide here the key takeaways.
Highlights from the presentations by Big Data leaders from PayPal, Huawei and Qantas on day 2 of Big Data & Analytics Innovation Summit 2014 in Sydney, Australia.
A short overview of Natural Language Processing tools and utilities developed by Prof. Noah Smith, CMU and his team to analyze Twitter data.
Highlights from the presentations by Big Data leaders from GE Capital, Datawatch and MapR Technologies on day 1 of Big Data & Analytics Innovation Summit 2014 in Sydney, Australia.
Big Data Boot Camp LA provided attendees a comprehensive understanding of Big Data and Hadoop technologies. Sujee Maniyam provided a good technical overview of Hadoop and current trends. We provide key takeaways.
Highlights from the presentations by Analytics leaders from San Francisco Giants, New York University and LA Dodgers on day 2 of Sports Analytics Innovation Summit 2014 in San Francisco.
Highlights from the presentations by Analytics leaders from San Francisco 49ers, United States Olympic Committee, and Chelsea FC on day 1 of Sports Analytics Innovation Summit 2014 in San Francisco.
Highlights from the presentations by Big Data leaders from The Hershey Company, Gongos, Clarks, and Mediacom on day 2 of Big Data & Analytics for Retail Summit 2014 in Chicago.
Highlights from the presentations by Big Data leaders from Sony Pictures Entertainment, Macy's and Nuevora on day 1 of Big Data & Analytics for Retail Summit 2014 in Chicago.
KDnuggets launches Spotlight initiative to bring attention to academic research. The journey begins with Prof. Eamonn Keogh and his student, Yanping Chen, who are applying data mining to save us all from insect-vectored diseases.
We discuss social media strategy at U-Haul, the key drivers of a social media campaign, identifying what data to focus on, important metrics, career advice and more.
Highlights from the presentations by Big Data leaders from Accenture, Analytics Media Group, SAS and Intel on day 2 of INFORMS The Business of Big Data.
Highlights from the presentations by Big Data & Analytics experts from Microsoft, Sears Holdings and Obama for America on day 2 of Customer Analytics Summit 2014.
We discuss the impact of Big Data advancements on business strategy, value proposition of Big Data, importance of partnerships, key risks and mitigation strategy, how to win sustained patronage for Big Data projects and more.
Highlights from the presentations by Big Data & Analytics experts from ShareThis, Netflix and Ancestry on day 1 of Customer Analytics Summit 2014.
We discuss the story of Alpine Data Labs, the recent recognition of Alpine, effect of YARN, major customer use cases, and challenges in consumerizing Big Data.
Import.io raises $3M round from Jerry Yang and David Axmark. Released a streamlined version of web data extraction tool with exciting new features.
We discuss how to tame Big Data through harnessing data and harvesting value, the top Big Data priorities in Insurance sector, short-term and long-term needs of Healthcare Analytics, and more.
We discuss the role of data science at StumbleUpon, the shift from search to discovery, metrics for user engagement, the art of collaborative filtering, how native ads improve user experience, major trends, advice and more.
We discuss Actionable Analytics start-up, enterprise challenges in Big Data, relationship with cloud computing, metrics vs. insights, Big Data expectations and more.
Highlights from the presentations by Big Data technology practitioners from Teradata, Booz Allen Hamilton, Databricks and ProbabilityManagement.org during INFORMS The Business of Big Data in San Jose.
We discuss the role of Analytics at ShareThis, the emergence of Social TV, better user behavior insights through Social TV, major challenges with Social TV analytics, interesting insights, future trends, recommendation and more.
We discuss the gamification of hiring, founding story of Knack, applications of Predictive Human Analytics, challenges, Big Data tools and technology used, key qualities sought in data scientists, career advice and more.
We discuss the role of data science at Blue Shell Games, the importance of "Lean Data", key metrics for online games, cross-product projects and how best to meet data needs across an organization.
Highlights from the presentations by Data Science leaders from UC Berkeley, Clark Atlanta Univ, Florida Institute of Technology, Robert Bosch LLC and HP on day 4 of ASE Conference on Big Data Science 2014, Stanford.
Highlights from the presentations by Big Data leaders from Aviva, Canadian Imperial Bank, Royal College of Physicians and Surgeons of Canada, and University Health Network on day 2 of Big Data Innovation Summit 2014.
We discuss the startup - Elephant Scale, DIY Hadoop learning, best free online resources for learning Hadoop, getting a good job in Big Data, and the experience of authoring a book - Hadoop Illuminated (available for free).
Highlights from the presentations by Big Data leaders from TD Bank, Public Health Ontario and First Nations Education Steering Committee on day 1 of Big Data Innovation Summit 2014 in Toronto, Canada.
Highlights from the presentations by Data Science leaders from UC Davis, UT Dallas, Northrop Grumman Corp and NIST on day 3 of ASE Conference on Big Data Science 2014 held in Stanford University.
Highlights from the presentations by Data Science leaders from USC, YarcData and Revolution Analytics on day 2 of ASE Conference on Big Data Science 2014 held in Stanford University.
We discuss sentiment data models, significance of linguistic features, handling the noise in social conversations, industry challenges, important use cases and the appropriateness of over-simplified binary classification.
We discuss the priority order of data governance for Big Data initiatives, impact of increasing shift towards Hadoop and NoSQL, data quality, current trends, talent crunch, advice and more.
Highlights from the presentations by Data Science leaders from Pivotal, IBM Research, George Washington University and IARPA at ASE Conference on Big Data Science 2014 held in Stanford University.
We discuss the role of data science at Square, common machine learning use cases, transition to real-time architecture, major challenges, expectations from data science, key qualities for data scientists, and more.
Highlights from the presentations by Data Science leaders from MIT, Georgia Tech, Microsoft Research and CUHK during workshops at ASE Conference on Big Data Science 2014 held in Stanford University.
Highlights from the presentations by Predictive Analytics leaders from Spotify, ING, Quintiles, and Riot Games on day 2 of Predictive Analytics Innovation Summit 2014 in London, UK.
Highlights from the presentations by Market Research leaders on day 3 of Future of Consumer Intelligence 2014 in Los Angeles.
Highlights from the presentations by Predictive Analytics leaders from eBay, Skype, Yahoo and AbsolutData on day 1 of Predictive Analytics Innovation Summit 2014 in London, UK.
We discuss Big Data use cases at Plenty of Fish, insights from text mining of user profiles, using topic modeling for developing user archetypes, challenges and more.
Highlights from the presentations by Market Research leaders on day 2 of Future of Consumer Intelligence 2014 in Los Angeles.
We discuss the Big Data architecture at StubHub, important factors in architecture design, hybrid approach of using Big Data along with traditional data warehouses, challenges, importance of meta-data and more.
We discuss the founding story of FindiLike, Opinion-driven Decision Support Systems (ODSS), challenges in analyzing user opinions, future of Sentiment Analysis, favorite books and more.
We discuss Behavior Analytics vs. Web Analytics, important metrics for user engagement, challenges of behavior insights domain, future of multi-screen analytics, key soft skill and more.
Highlights from the presentations by Healthcare Analytics leaders from Cigna, National Parkinson Foundation, Quintiles and NYU Langone Medical Center on day 2 of Big Data & Analytics in Healthcare Summit 2014 in Philadelphia.
Highlights from the presentations by Healthcare Analytics leaders from GlaxoSmithKline, Excellus BlueCross BlueShield, Adventist Health and Mayo Clinic on day 1 of Big Data & Analytics in Healthcare Summit 2014 in Philadelphia.
Highlights from the presentations by Business Intelligence leaders from Netflix, Hyatt, GE Capital and University of Texas on day 2 of Business Intelligence Innovation Summit 2014 in Chicago.
We discuss the relevance of qualitative research for customer intelligence, MetLife Infinity, and the increasing trend of behavior-based customer segmentation.
Highlights from the presentations by Business Intelligence leaders from Boeing, Salesforce, Wells Fargo, and Citibank on day 1 of Business Intelligence Innovation Summit 2014 in Chicago.
Highlights from the presentations by Market Research leaders from H(app)athon Project, Socratic Technologies, TNS, PepsiCo on day 1 of Future of Consumer Intelligence 2014 in Los Angeles.
We discuss challenges in designing recommendation and personalization systems, how to select the right metrics, and lessons learned about presenting recommendations on different channels.
Highlights from the presentations by Analytics leaders from World Fuel Services, Vigilent Corporation, Caterpillar and SunEdison on day 2 of Manufacturing Analytics Summit 2014 in Chicago.
We discuss the role of analytics in healthcare payer firms, major challenges in leveraging healthcare data, shift to value-based payments, personal motivation towards analytics, career advice and more.
We discuss NodeXL impact stories, upcoming NodeXL features, importance of an open environment, future of social media analytics, advice for novice researchers and more.
Highlights from the presentations by Analytics leaders from McCormick, HP, Patheon and Boeing on day 1 of Manufacturing Analytics Summit 2014 in Chicago.
We discuss key principles for designing business intelligence tools, exploring causation based on correlation insights, attributes of future Analytics leaders, interesting Big Data trends, important qualities in data scientists and more.
Highlights from the presentations by Business Analytics leaders from State of Illinois, Navistar, BMO Harris Bank and McGraw Hill Education on day 2 of Business Analytics Innovation Summit 2014 in Chicago.
Highlights from the presentations by Data Governance experts from Visa, Bing, San Francisco County, and RS Investments at Chief Data Officer Summit 2014 in San Francisco, CA.
Highlights from the presentations by Business Analytics leaders from Bank of America, Northern Trust, AOL and Liberty Mutual on day 1 of Business Analytics Innovation Summit 2014 in Chicago.
We discuss traditional sentiment analysis vs. modern sentiment analysis, role of data science in Human Centric Intelligent Society, mainstream adoption of bio sensors and opportunities created by Big Data from ubiquitous continuous sensing.
We discuss typical sentiment analysis problems at eBay, underrated challenges, career motivation, important soft skills and more.
We discuss aspect-based opinion mining, major challenges, cold start items, the need for accurate opinion mining models for cold start items and how factorized LDA can be leveraged.
Highlights from the presentations by Data Governance experts from State of Colorado, IBM, Informatica and Sony Pictures Entertainment on day 1 of Chief Data Officer Summit 2014 in San Francisco, CA.
We discuss the relevance of "Purchase Graph", Slice platform, analytical insights from mining all activity around a customer's purchase, experimentation strategy, experience of working as a data scientist and more.
We interview the co-chairs of INFORMS Conference The Business of Big Data 2014 (June 22-24, 2014) on Big Data maturity, opportunities assessment, analytics for operations research, conference agenda and more.
We discuss the capabilities of Looker, data democratization across organization, change in the tools being used by analytics-savvy business managers, front-line analytics, competitive landscape and more.
We discuss trends in location analytics, evolution of HERE's analytics architecture, infrastructure challenges, data governance and more.
We discuss BigInfo Labs, the concept of "Data Relevance" in Big Data, experience of partnership with Intel, and BigInfo Labs' strategy for competitive differentiation.
Highlights from the presentations by HR leaders from Caterpillar, Coca-Cola, Pfizer, and Marriott International on day 2 of HR & Workforce Analytics Innovation Summit 2014 in Chicago.
A recent survey on the Big Data outlook reports increasing interest in Big Data for more accurate and timely decision-making, along with concerns about project costs and ability to scale.
We discuss the role of Data Governance, establishing Big Data accountability, impact of Data Governance on Data Quality, and assessing the education available for Data Governance.
Highlights from the presentations by HR leaders from Wells Fargo, Sears Holdings, Johnson Controls, Trulia on day 1 of HR & Workforce Analytics Innovation Summit 2014 in Chicago.
We discuss the role of Risk Analytics at Paychex, strategic importance of Sales Anticipation Model, optimizing business processes by leveraging Big Data, and advice for companies thinking about Big Data as well as aspiring students.
Highlights from the presentations by Big Data technology practitioners from Sears Holdings, Microsoft, Ticketmaster during Big Data Innovation Summit 2014 in London.
We summarize the key findings in the recently released US Open Data Action Plan, highlighting the principles, commitments, datasets released and future outlook.
Highlights from the presentations by Gaming Analytics experts from Ubisoft, Electronic Arts, Sega on Day 2 of Gaming Analytics Summit 2014.
We discuss how to build the best data models, significance of correlation and causality in Predictive Analytics, and impact of Big Data on Astrophysics.
Highlights from the presentations by Big Data experts from McKinsey Solutions, SAP, Techfetch, Weather Analytics on Day 2 of Big Data for Executives 2014.
Highlights from the presentations by Gaming Analytics experts from Activision, Valve, Microsoft and Broken Bulb Studios on Day 1 of Gaming Analytics Summit 2014.
We discuss how HP views Big Data, capabilities of HP HAVEn, leveraging Big Data for improving customer experience, Analytics challenges, outsourcing criteria and current trends.
We discuss the mission of Skytree, product strategy, complimentary consulting programs, recent trends, and current expectations from Machine Learning.
Highlights from the presentations by Big Data experts from Sears Holdings, PWC, Oracle, Altamira, Tesora on Day 1 of Big Data for Executives 2014.
We discuss the last mile of the execution path of Analytics projects, five critical pillars of success and data-driven decision making through advanced analytics.
We discuss Talksum data stream router and cross-domain networking with real-time data management using data streams.
Non-stop 24 hours of coding at the Code for India 2014 hackathon leads to creative solutions for major social problems of India through interesting software applications.
We discuss Big Data vs. Fast Data, Data Visualization trends, Jaspersoft acquisition, factors differentiating future leaders of Big Data and more.
Highlights from the presentations by opinion mining experts from Fujitsu, FindiLike and Stanford University on Day 2 of Sentiment Analysis Innovation Summit 2014 in San Francisco.
Recent study highlights the increasing market perception that Predictive Analytics leads to competitive advantage. The report also outlines current trends and challenges for Predictive Analytics.
We discuss rising medical costs, how Big Data can help, key features of Quintiles Infosario and Topological Data Analysis.
Highlights from the presentations by analytics experts from YouTube, Evernote and Wikia on day 2 of Social Media & Web Analytics Innovation Summit 2014 in San Francisco.
Highlights from the presentations by opinion mining experts from Twitter, eBay and Samsung on Day 1 of Sentiment Analysis Innovation Summit 2014 in San Francisco.
Highlights from the presentations by experts from Google, CapitalOne, StubHub and Social Media Research Foundation on day 1 of Social Media & Web Analytics Innovation Summit 2014 in San Francisco.
We discuss Big Data architecture, fast multi-attribute searches, database sharding and scaling challenges at eHarmony.
Survey results highlight the importance of Analytics capability in media industry and the consumer beliefs on privacy vs. personalization benefits.
Highlights from the presentations by big data technology practitioners from Hortonworks, Intel, Rackspace, SciSpike, and Yahoo at Big Data Bootcamp 2014 in Santa Clara.
We discuss significance of YARN for Hadoop 2.0 platform, unique benefits of RedPoint Convergent Marketing Platform and Master Key Management for Customer Analytics.
We review a recently released report on Healthcare perceptions towards BI/Analytics and share key insights into who is leading healthcare analytics in different categories and what the dominant trends are.
We summarize the key findings in the recently released White House report on Big Data, highlight the key opportunities and concerns, and list the recommendations made to the President.
Highlights from the presentations by big data technology practitioners from Caspida, Datastax, ElephantScale, Hortonworks, MapR and Qubole at Big Data Bootcamp 2014 in Santa Clara.
We discuss traditional analytics vs. modern analytics, avoiding over-simplification, human-technology interaction for Big Data, challenges in democratizing analytics and more.
We discuss data mining of cancer clinical data, LDA topic model, challenges in mining clinical notes, big data in healthcare and more.
We discuss the responsibilities of Data Science and Analytics teams, significance of programming knowledge for data scientists, important soft skills, talent landscape in data science and more.
Alteryx Analytics 9.0 blends new sources of customer Insight such as Social Media, Google Analytics, and Marketo with data from legacy environments such as SAS Analytics.
We discuss challenges in analyzing bursty data, real-time classification, relevance of statistics and advice for newcomers to Data Science.
Massachusetts Big Data Report 2014 (free download) highlights state successes, including almost 500 Big Data companies, $2.5B invested, 5600 students graduating from 14 data science-related programs, and identifies key priorities and growth opportunities.
We discuss Randomized Controlled Experiments, common errors during A/B testing, Correlation vs. Causality, Big Data Myths, setting realistic expectations for Big Data and more.
Big Data related skills led the list of top paying technical skills (six-figure salaries) in 2013. Several other useful insights are available in the Dice Tech Survey Report, available for free download.
Highlights from the presentations by big data technology practitioners from NYSE, Glassdoor, Slice and Paychex on day 2 of Big Data Innovation Summit 2014 in Santa Clara.
We discuss Analytics for Public Policy decisions, responsibilities of Utah Chief Data Officer, crowdsourcing analytics for resolving Government problems and most important skills for data science practitioners.
Highlights from the presentations by big data technology practitioners from eBay, YarcData, LinkedIn, Trulia, and other leading companies on day 1 of Big Data Innovation Summit 2014 in Santa Clara.
Highlights from keynote speeches by big data experts from Facebook, RedPoint Global, Quintiles, Samsung, GMU, PayPal, and others on Day 2 of Big Data Innovation Summit 2014 in Santa Clara.
Highlights from keynote speeches by big data technology leaders from industry and academia on the first day of Big Data Innovation Summit 2014 in Santa Clara.
Candid advice from an industry veteran, Paco Nathan reveals the true picture behind the much-talked-about Data Scientist "glamour" and helps people have the right expectations for a Data Science career.
Anmol talks with Sriram Sankar, Principal Staff Engineer at LinkedIn about LinkedIn’s “Economic Graph”, Entity-Oriented Search, and the biggest challenges towards delivering relevant, personalized search results.
Anmol talks with Anjul Bhambhri, IBM’s Vice President of Big Data Products about Big Data Trends, developing the Big Data capabilities in-house vs. outsourcing, five crucial steps to adopting a successful big data strategy and advice for beginners.
Anmol talks with Daniel Tunkelang, Head of Query Understanding at LinkedIn about search quality, IR, query understanding, and advice for data science enthusiasts. Don't miss: 4 steps to get your LinkedIn profile to show up on top of search results.
Anmol talks with Geoffrey Moore, an author, speaker and advisor who splits his consulting time between start-up companies in the Mohr Davidow portfolio and established high-tech enterprises, most recently including Salesforce, Microsoft, Intel, Box, Aruba, Cognizant, and Rackspace. In the interview, we discuss his "Crossing the Chasm" book, his vision for Big Data analytics, when Big Data will cross the chasm, and advice for entrepreneurs.
Anmol talks with Paco Nathan, Chief Scientist at Mesosphere. In the interview, we discuss Apache Mesos, Cascading, his books and Big Data trends.
Anmol talks with Quentin Clark, Corporate Vice President, Microsoft Data Platform Group. In the interview, we discuss Power BI for Office 365, Big Data trends and Microsoft’s strategic decisions.
Strata 2014 was a great conference, and here are key insights from some of the best sessions on day 3: Data Journalism, Analytics over Real-time Streaming Data, Facebook Graph Analysis with One Trillion Edges, Socializing Search by LinkedIn.
Strata 2014 was a great conference, and here are key insights from some of the best sessions on day 2: Big Data Vendor Landscape, Machine Learning for Social Change, Secrets of Gertrude Stein, and Facebook Exascale Analytics.
Data scientists love numbers, yet not all data is numerical. Qualitative analytics should not be ignored, especially given the unique value it provides.
Highlights from keynote speeches delivered by various eminent big data technology leaders from industry and academia at Strata 2014 Conference held in Santa Clara recently.
Despite great data analytics capabilities, gaming companies are facing an interesting data mining challenge from an unexpected end – their audience.
Why do Big Data projects fail to deliver the promised value, even when the potential is “clearly” established? What should business managers do to avoid the media hype and focus on achieving sustainable benefits from big data investments?
I strongly prefer email for all communication, and I normally respond within a day. I try my best to keep my social accounts updated with the latest news.
For any communication related to KDnuggets.com, please use the following email id: anmol AT kdnuggets.com