Knowledge management


In my last post I wrote about data and information. Today I would like to talk about knowledge management. Governments and firms are investing more and more in knowledge management and collaboration systems. It has been one of the fastest-growing areas of the past decades, which is why I think it is such an important topic.


Knowledge

When we talk about knowledge we also have to talk about collaboration. If you cannot share your knowledge with others effectively, then it is not worth much, so knowledge has to be shared to have value. We speak of knowledge when we use information to discover patterns, rules, and contexts. Wisdom is applying knowledge to solve problems, using our own or collective experience; it involves knowing where, when, and how to apply knowledge. There are two types of knowledge: tacit and explicit. Tacit knowledge lives in people’s heads and is not written down, whereas explicit knowledge is documented. Finally, knowledge is also situational and contextual. For example, it is not enough to know the policy at your workplace; you also have to know how and when to apply it. Knowledge is an important asset for every company, and knowing how to manage it effectively can give a company a competitive advantage in the marketplace.
When organizations use knowledge management and adjust their behavior accordingly, we speak of organizational learning: they create new business processes and change their patterns of decision making. Organizations gather knowledge using a variety of tools and mechanisms. They measure data, run experiments, monitor the internal and external environment of the organization, and collect feedback from customers and employees.

Knowledge value chain

As we have seen above, knowledge management refers to the set of business processes an organization develops to create, store, transfer, and apply knowledge. There are four value-adding steps in the knowledge management value chain.

Acquisition
Organizations acquire knowledge in many ways, and they acquire both structured and unstructured knowledge. A coherent and organized knowledge system also needs internal data on sales, payments, inventory, and customers, as well as external sources such as news feeds, industry reports, legal opinions, scientific research, and government statistics.

Storage
Once knowledge is acquired, organizations have to store it for later use. Data and knowledge are generally stored in databases. Firms typically use document management systems to digitize, index, and tag documents so that they can be managed coherently and found again later.

Dissemination
Employees and managers share and gather information using tools such as e-mail, instant messaging, wikis, social business tools, and search engines. With so much available, they have to judge which information is important. A supportive culture helps employees and managers find the useful information. Training programs, informal networks, and shared management experience and communication are key to creating such a culture.

Application
Knowledge has to be applied to solve problems in the organization and has to add value to the business. To do so, knowledge must become a systematic part of management and decision making. It must be built into the organization's business processes and key application systems, creating new business practices, new products and services, and new markets.

Knowledge management systems

There are three major types of knowledge management systems: enterprise-wide knowledge management systems, knowledge work systems, and intelligent techniques.
Enterprise-wide knowledge management systems are firm-wide efforts to collect, store, distribute, and apply digital content and knowledge. Firms generally have to deal with three kinds of knowledge: structured knowledge, such as formal written documents; semistructured knowledge, such as e-mail and voice mail; and tacit knowledge, which is not written down at all.
Knowledge work systems (KWS) are specialized systems built for engineers, scientists, and other knowledge workers. Knowledge workers such as researchers, designers, architects, scientists, and engineers primarily create knowledge and information for the organization. They need knowledge work systems with powerful graphics, analytical tools, and communications and document management capabilities. Think of computer-aided design (CAD), 3D printing, or virtual reality systems.
The third type of knowledge management system is intelligent techniques, such as data mining, expert systems, neural networks, fuzzy logic, genetic algorithms, and intelligent agents. Data mining is used for knowledge discovery: finding patterns, categories, and behaviors. It helps organizations capture undiscovered knowledge in large datasets, providing insight for improving business performance and support for better decision making. Have a look at my earlier post, where I introduced RStudio and ran a small experiment with it.
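To make "finding patterns and categories" a bit more concrete, here is a minimal sketch in R. It is only an illustration under my own assumptions: k-means clustering is just one of many data mining techniques (the book does not single it out), and iris is a small practice data set that ships with R, not business data.

```r
# Illustrative only: k-means is one common data mining technique,
# picked here as an assumption; iris is a data set bundled with R.
set.seed(42)                          # make the random start reproducible
measurements <- iris[, 1:4]           # four numeric columns, species label left out
fit <- kmeans(measurements, centers = 3, nstart = 10)

# Compare the groups the algorithm discovered with the known species
cluster_vs_species <- table(cluster = fit$cluster, species = iris$Species)
print(cluster_vs_species)
```

The algorithm never sees the species column, yet the groups it finds should line up with the species to a large extent. That is the kind of hidden structure data mining looks for.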
I hope you enjoyed my post about knowledge management. If you would like to know more, I recommend the book Management Information Systems by Kenneth C. Laudon and Jane P. Laudon, which I used as the basis for this post.

Astronomy and Big data

Big data and Astronomy

Today I would like to talk about one of my hobbies, astronomy. I have watched every season of The Universe, so I thought I would write a little about astronomy and big data.

What is big data?
Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the structures of your database architectures. To gain value from this data, you must choose an alternative way to process it.

What is big data used for?
Big data is used within organizations for analytics and for building new products. It can give new insight into hidden processes and lead to new products or actions. In astronomy, analyzing the data coming from the universe helps scientists discover, for example, new planets.

The characteristics of big data
Big data is big. I hope I am not surprising you. You cannot process it in the regular way and you cannot move it around conventionally. Big data is also messy, and scientists say they spend more time tidying up the data than actually extracting information from it. Volume, velocity, and variety are commonly used to characterize the different aspects of big data.

Volume:
Volume is the amount of data, measured in terabytes, petabytes, and even exabytes. Big data analytics can process large amounts of information. Because the data is too big for a normal relational database, it is processed on parallel processing architectures such as data warehouses or Apache Hadoop. Data warehousing approaches involve predetermined schemas, while Apache Hadoop places no conditions on the structure of the data it can process.

Velocity:
Velocity is the speed at which data is produced, transmitted, and analyzed. Organizations that can put information to use quickly gain a competitive advantage. In astronomy, for example, scientists use big data analytics to predict where and how many supernova explosions are likely to occur, so they can use their resources more efficiently.

Variety:
A common theme in big data systems is that the source data is diverse, and doesn’t fall into neat relational structures. A common use of big data processing is to take unstructured data and put it in a structured form for consumption either by a human or an application.
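Here is a small sketch in R of what "putting unstructured data into a structured form" can look like. The observation notes below are invented for illustration, and the text format is my own assumption; real sources are far messier.

```r
# Hypothetical free-text observation notes (invented for illustration)
notes <- c(
  "2016-03-01 supernova candidate, magnitude 17.2",
  "2016-03-04 variable star, magnitude 11.8",
  "2016-03-09 asteroid, magnitude 19.5"
)

# One pattern with three capture groups: date, object, magnitude
pattern <- "^([0-9]{4}-[0-9]{2}-[0-9]{2}) (.+), magnitude ([0-9.]+)$"

# Extract each group into its own typed column
observations <- data.frame(
  date      = as.Date(sub(pattern, "\\1", notes)),
  object    = sub(pattern, "\\2", notes),
  magnitude = as.numeric(sub(pattern, "\\3", notes)),
  stringsAsFactors = FALSE
)
print(observations)
```

Once the notes are in a data frame, they can be sorted, filtered, and plotted like any other table.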

Use of big data

Astronomy, astrostatistics, and astroinformatics all use big data to study the universe. So what is the difference between astronomy, astrophysics, astrostatistics, and astroinformatics?

Astronomy is the study of the physics, chemistry, and evolution of celestial objects and phenomena that originate outside the Earth’s atmosphere. When scientists watch a supernova explosion, observe a gamma-ray burst, or study the cosmic microwave background radiation, we speak of astronomy.

Astrophysics is the branch of astronomy that studies the physics of the universe.

Astrophysics draws on many disciplines, including mechanics, electromagnetism, statistical mechanics, thermodynamics, quantum mechanics, relativity, nuclear and particle physics, and atomic and molecular physics, to solve astronomical problems.

Astrostatistics applies statistics to the study and analysis of astronomical data.

Astroinformatics uses information technologies to solve the big data problems faced in astronomy.

We can say that both astrostatistics and astroinformatics are used to support astrophysics and astronomy, and all of these fields work together to make more discoveries. How do they do it?
Scientists use different telescopes and sky surveys, based both in space and on the ground. Probably the best known instrument is the Hubble Space Telescope, named after the astronomer Edwin Hubble. Sky surveys monitor and record different parts of the universe. They cover everything from gamma rays and X-rays through ultraviolet, optical, and infrared to radio bands. This technology produces petabytes of data that have to be stored, so astronomy has to analyze big data. The analysis of big data contributes to a lot of new discoveries in astronomy. It helps scientists study dark energy and dark matter, the formation and evolution of galaxies, and the structure of our own Milky Way. Scientists can also use big data to study stars and find new planets.

I hope you find it interesting how astronomers use big data. If you are interested in data science, you will need three main qualities: expertise in a scientific field such as maths, programming, astronomy, or physics; natural curiosity and the ability to go beneath the surface; and creativity.

Moneyball and Information Systems

I watched the movie Moneyball for the second time, and since I am also learning data analytics in school, I thought I would give a quick recap of the film and how it relates to my studies. First of all, I have to say I really enjoyed the film. I give it 9 out of 10, so I recommend you watch it if you are interested in data analytics and sport. So what is Moneyball about?
Moneyball is a 2011 American sports drama directed by Bennett Miller. It is based on the book of the same name by Michael Lewis. True events that occurred in 2002 provide the basis for the book and the movie. So what happens in the film?
Billy Beane (Brad Pitt) has been the A’s general manager for four years. He loses three key players because the club cannot afford to keep them. Beane needs to sign new players, but he has a very limited budget. Billy realizes he has to think differently to replace the three lost players. He has to do something different to stay in the game. Billy meets Peter Brand (a character based on Paul DePodesta), a recent economics graduate. Together they build a system to evaluate baseball players using performance statistics, an approach known as “sabermetrics.”
I am not going to tell you the whole story, but basically Billy and Peter changed the way people think in sport. How did they achieve that?
Traditional baseball scouts relied on what they saw, with all the prejudices that come with judging players by sight. Billy and Peter looked at the raw data instead of at the players. They left intuition behind and used data analysis. Data analytics can help your business make better decisions in the same way. When you implement a change in your business, it is better to rely on facts than on intuition. This is the first lesson from the movie.
I mentioned decision making above. How you make your decisions matters a great deal. It is very important to base your decisions on information and not the other way around: you cannot make your decision first and then look for information to support it. This is where information systems come in.
An information system collects, stores, and distributes information to support decision making and control in a business. A good information system also helps to analyze and visualize data. It contains data about the organization and about the environment around it. In this case, Billy and Peter analyzed the data on baseball players and visualized it in an application. In order to win, they had to gather information about their own team and about the surrounding environment, by which I mean the other teams and players.
When we defined an information system we used the terms data and information. But when do we speak of data and when of information? Data are raw facts. Data is unorganized and not in a meaningful form, so we cannot read any information from it directly. We speak of information when data is organized and useful to us. I do not know anything about baseball, but the film is basically about how you take raw data, turn it into information, and put that information to use.
How do you turn data into information? It basically requires three activities. Input collects the raw data. Processing converts it into a meaningful form. Output transfers the information to the people who will use it, and their feedback flows back to help refine the input.
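The three activities can be sketched in a few lines of R. This is only a toy illustration: the players and outcomes are invented, and the "on-base rate" here is a simplified share of plate appearances in which the batter reached base, not the official statistic used in the film.

```r
# Input: raw plate-appearance records (players and outcomes are invented)
raw <- data.frame(
  player  = c("A", "A", "A", "A", "B", "B", "B", "B"),
  outcome = c("hit", "out", "walk", "out", "out", "out", "hit", "out"),
  stringsAsFactors = FALSE
)

# Processing: flag whether the batter reached base, then average per player
raw$reached_base <- raw$outcome %in% c("hit", "walk")
summary_table <- aggregate(reached_base ~ player, data = raw, FUN = mean)
names(summary_table)[2] <- "on_base_rate"

# Output: present the information, best player first
summary_table <- summary_table[order(-summary_table$on_base_rate), ]
print(summary_table)
```

The eight raw rows say very little on their own. The two-row summary is something a manager can actually act on.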
Using information systems effectively requires an understanding of the organization, management, and information technology. The key elements of an organization are its people, structure, business processes, politics, and culture.
And this is exactly what you can see in the movie. Billy had to understand his organization. He had to understand that he could not compete with bigger teams with bigger budgets when buying individual players. He changed the management to rely on facts instead of intuition, and he used an information system to win more games. He also had to change the culture of his team and the people inside it.
This was just one example of how an information system can be used to support a business and its decision making. Today data analytics is widely used in the NBA, the NFL, soccer, and many other sports.

Try R

Data Analytics with R


In this post I will explain what the R programming language is and demonstrate how RStudio can be used for data analytics.

R

R is a language for statistical computing and graphics. It offers a wide variety of statistical and graphical techniques. One of its best features is the range of plots available for visualizing data, but R can also be used for all kinds of mathematical calculations and analysis. The other great thing about R is that it is free: it requires no license fee and runs on Linux, macOS, and Windows. R includes data handling and storage facilities, tools for data analysis, graphical facilities for display, input and output facilities, and functions.

RStudio

RStudio is an integrated development environment (IDE) for the R programming language. It includes a console, a syntax-highlighting editor that supports direct code execution, and a variety of robust tools for plotting, viewing history, debugging, and managing your workspace.

Using R can be really frustrating until you have the knowledge you need to work on your own. However, you can do really incredible things with just a few commands. So I recommend trying R if you are interested in programming, data analysis, or data mining. If you like statistics and would like to compare different data to find correlations, then R is for you. Data analysis is also used by almost all enterprises to improve their business. If you would like to know more about R, here is a tutorial that takes you through the basics: http://tryr.codeschool.com/

The Titanic assignment

After completing the tutorial above, I decided to test my skills in RStudio. Our lecturer mentioned a website where you can find different challenges for people who are enthusiastic about data analytics and data mining. You can find the website here: https://www.kaggle.com/c/titanic

I chose a challenge where I have to predict the chances of survival on the Titanic. More precisely, I have to predict what sort of people survived this tragedy. If you would like more information about the challenge or the tragedy, please go to the link above.

To be honest, at the beginning I did not have a clue where to start or what to do, but fortunately there are a ton of enthusiastic people online who share their techniques. I watched a lot of videos before I could start the project by myself, but I was able to produce this basic plot.

In this plot you can see the number of passengers who traveled on the Titanic in first, second, and third class. 0 represents people who did not survive, 1 represents people who survived. As you can see, if you traveled in first class your chance of survival was a bit over 60%, in second class it was just under 50%, and in third class you were much more likely to die than to survive.

I will show you how I got this simple graphic.

1. Download the required data from here: https://www.kaggle.com/c/titanic/data
2. Download and install RStudio from here: https://www.rstudio.com/products/rstudio/download/
3. Create a new R script and set the working directory to the folder that holds the data (use the Session menu)

4. Read the csv files:
train <- read.csv("train.csv", header = TRUE)

test <- read.csv("test.csv", header = TRUE)

5. Have a look at your data:
str(train)

6. Convert the pclass and survived variables to factors (R is case-sensitive, so if your file names the columns Pclass and Survived, use those names here and in the steps below):
train$pclass <- as.factor(train$pclass)
train$survived <- as.factor(train$survived)

7. Check how many people survived the tragedy:
table(train$survived)

8. Check how many passengers traveled in each class:
table(train$pclass)

9. Load the ggplot2 package for visualizations (run install.packages("ggplot2") once if it is not installed yet):
library(ggplot2)

10. Run ggplot:
ggplot(train, aes(x = pclass, fill = factor(survived))) +
  geom_bar() +
  xlab("Pclass") +
  ylab("Total Count") +
  labs(fill = "Survived")
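The bar chart shows counts, so the survival percentages have to be read off by eye. A possible way to get the shares directly is sketched below. So that the snippet runs on its own, it uses a tiny made-up data frame instead of train.csv; the values are invented and the lowercase column names simply follow the steps above.

```r
# Toy stand-in for train.csv (values invented so the snippet runs alone)
toy <- data.frame(
  pclass   = factor(c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3)),
  survived = factor(c(1, 1, 0, 1, 0, 0, 0, 1, 0, 0))
)

# Count passengers per class and outcome, then convert each row to shares
counts <- table(class = toy$pclass, survived = toy$survived)
rates  <- prop.table(counts, margin = 1)   # margin = 1: each row sums to 1
print(round(rates, 2))
```

With the real data loaded, replacing toy with train should give the survival share for each class in the column labelled 1.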

Conclusion

So this is my basic take on who could have survived the tragedy of the Titanic. I was not satisfied with what I had learned, so I decided to go deeper into the analysis with some help, and I was able to create another, more detailed plot. As you can see on the chart below, class was not the only factor: gender, age, and marital status also played a big role. But I have still only scratched the surface.



So now you probably know more about R. I hope I have demonstrated the power of RStudio, and I encourage you to try R, because you can do amazing things if you keep practicing.

 References:

https://www.kaggle.com/
https://www.rstudio.com/
https://www.r-project.org/about.html
Rstudio (?…..)



How to use Google Fusion Tables

I have good news for you. It is not rocket science! All you have to do is create a Google account (you probably already have one). Then go to your Google Drive, search for Google Fusion Tables, and connect it to your Drive. Great! Before you can begin, you will also need some data to work with.


A good source of information is the CSO (Central Statistics Office). I went to its site, exported some data to Excel, and saved it on my local drive. (You can also enter data into Excel manually or copy and paste it.) Then open your Google Drive, click “New”, and click Fusion Table. Choose the file from your computer and click Next. Then set which row contains the column names and click Next. Then click File and Geocode. Finally, I created a few buckets by clicking on Change style, and I got this.

https://fusiontables.google.com/embedviz?q=select+col0+from+18h1vLz8EWrnEUjZ4ZZhjzbydk51B4jXBjGc82qmU&viz=MAP&h=false&lat=52.84212564207156&lng=-7.07087744140631&t=1&z=7&l=col0&y=2&tmplt=2&hml=ONE_COL_LAT_LNG

The color of each point on the map reflects the population of the county, for males and females.
Then I downloaded a KML file from the Independent’s website and followed the same steps. KML stands for Keyhole Markup Language, an XML notation for expressing geographic annotation and visualization within Internet-based, two-dimensional maps and three-dimensional Earth browsers.
https://fusiontables.google.com/data?docid=1rStJn0RE-SmugK115yZHitdRho6hkYpN0r8w5OKK#map:id=3

Then I merged the two tables and I got this.

https://fusiontables.google.com/embedviz?q=select+col2%3E%3E1+from+18AoPQgeGggsMEa-j1tTZ5V3g8H8wufrJDQ_z7dXq&viz=MAP&h=false&lat=53.50030984792301&lng=-7.96208334375001&t=1&z=7&l=col2%3E%3E1&y=2&tmplt=2&hml=KML
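As far as I understand it, merging two tables like this means matching rows on a column they share, here the county name. The same idea can be sketched in R with merge(). The tables below are placeholders I made up: the population figures are not real census numbers and the geometry strings only stand in for the KML shapes.

```r
# Placeholder tables (county figures are NOT real census numbers)
population <- data.frame(
  county     = c("Dublin", "Cork", "Galway"),
  population = c(300, 200, 100),
  stringsAsFactors = FALSE
)
boundaries <- data.frame(
  county   = c("Cork", "Dublin", "Galway", "Kerry"),
  geometry = c("<kml cork>", "<kml dublin>", "<kml galway>", "<kml kerry>"),
  stringsAsFactors = FALSE
)

# Join on the shared key column; rows without a match on both sides are dropped
merged <- merge(population, boundaries, by = "county")
print(merged)
```

Kerry has a boundary but no population row in this toy example, so it drops out of the result. A county name spelled differently in the two tables would drop out in the same way, which is worth checking before trusting a merged map.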

We can see on the map that the most populated county is Dublin. I am not going to look into why there is such a huge difference between the population of the capital and the other counties, because this blog is not about the economy of Ireland. But feel free to create your own fusion table about unemployment, the number of new jobs created, or even the number of universities in each county, and merge it with this table. I am sure you will get interesting figures.

I was interested in how many accidents happen on Irish roads, so I retrieved some data from the RSA (Road Safety Authority) website and created a fusion table.

https://fusiontables.google.com/embedviz?q=select+col2%3E%3E1+from+1nuh6CH9806TbosI_KHgW7DbbR6ABhvVJFEON6Ebn&viz=MAP&h=false&lat=53.12618376086364&lng=-8.11589193750001&t=1&z=7&l=col2%3E%3E1&y=3&tmplt=3&hml=KML

Conclusion

Google Fusion Tables lets you visualise and understand your raw data on a map or chart. It can help you spot a correlation between two or more datasets. But it does not give you answers or explain why the pattern is there, so I recommend using it with caution if you are looking for correlations. It is, however, a great tool that can be used widely to visualise data geographically.

What is Google Fusion Tables for

Google recently introduced a new data management tool called Google Fusion Tables. It is a cloud-based service with an API (application programming interface) for visualizing, merging, and collaborating on data. It enables users to bring in data from many kinds of sources, update it, and collaborate in real time. It also has a handy function that lets users embed data in any blog or site. Still not clear? Then let's look at an example. Let's say you are watching the election on TV. You see a nice chart on the screen showing the popularity of different candidates in different counties or cities. A chart like this can easily be made with Google Fusion Tables. In my next post I will demonstrate how to create a similar map.

Hello world!

This is my first blog. I have wanted to start blogging for a long time about the different topics and research I am interested in, but I never had enough motivation to begin. OK, tell me I am lazy! I am a student at DBS and we got a project to write a blog about Google Fusion Tables. Are you excited? Great, because a lot of people don’t know about this excellent app by Google. In the next posts I am going to explain what Google Fusion Tables is, demonstrate how to use it, and discuss its potential for future use.