dunkablog

~ The official website of author Duncan MacLeod

Category Archives: Data Mining

How Big Data can save the world

Data Mining for Naughty Letters?

31 Thursday Mar 2016

Posted by dunkablog in Data Mining

I am an advanced beginner user of Tableau working to become an intermediate user.  I have real data sets at work that I viz mercilessly. It is probably a sign that I am deranged, but I actually find it FUN to dig into data and find the stories hidden in the numbers.

This week, I decided to get better at geographic visualizations.  To that end, I found a really interesting data set containing crime statistics from 2012 to 2015 in Los Angeles, the city I call home.  It included a field holding the latitude and longitude of each crime.  After removing parentheses and converting text to columns, I was able to assign a unique shape and color to each type of crime in the LAPD list (there were about 100) and place each crime on a map of Los Angeles at the exact location where it occurred.
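For anyone who would rather script the "remove parentheses and convert text to columns" step than do it by hand, here is a minimal sketch in Python. It assumes the location field looks like "(latitude, longitude)" with latitude first; that layout and the function name parse_location are my assumptions, not something taken from the LAPD file.

```python
def parse_location(raw):
    """Turn a string like '(34.2012, -118.4662)' into a (latitude, longitude) pair.

    Returns None when the field is empty, so missing locations can be
    handled explicitly instead of silently becoming zeros.
    """
    cleaned = raw.strip().strip("()")  # drop whitespace and the surrounding parentheses
    if not cleaned:
        return None
    lat_text, lon_text = cleaned.split(",")  # "text to columns" on the comma
    return float(lat_text), float(lon_text)


sample = "(34.184573, -118.467530)"  # made-up example value
print(parse_location(sample))
```

Returning None for an empty field is a deliberate choice here: it keeps "no location" separate from a real coordinate.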

The first hurdle I discovered was that a few of the entries were missing the latitude and longitude, so those crimes were mapped at (0,0), a point in the Gulf of Guinea off the coast of West Africa.  I highlighted them and excluded them, and the map snapped to a map of California.  I cleaned up a handful of outliers in parts of the Southland that weren’t really relevant, like Tehachapi, Big Bear, etc.  Next I discovered that due to limitations of Excel, the dataset cuts off before it reaches 2015 – only showing 460,000 pieces of data through August of 2013.  Like I said, I’m an advanced beginner, so I settled for what I got.
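The cleanup described above (dropping rows with no coordinates and trimming far-flung outliers) could also be done in code before the data ever reaches the map. The sketch below is one way to do it, assuming each record is a dictionary with "lat" and "lon" keys. The bounding box numbers are rough values I picked by eye, not an official city boundary, and the sample records are invented.

```python
# Rough, hand-picked bounding box around the city of Los Angeles.
# These numbers are an assumption for illustration, not an official boundary.
LA_BOUNDS = {"lat_min": 33.70, "lat_max": 34.35, "lon_min": -118.70, "lon_max": -118.15}


def is_mappable(lat, lon, bounds=LA_BOUNDS):
    """Return True when a coordinate is present and falls inside the bounding box."""
    if lat is None or lon is None:
        return False
    if lat == 0 and lon == 0:  # missing coordinates often show up as (0, 0)
        return False
    return (bounds["lat_min"] <= lat <= bounds["lat_max"]
            and bounds["lon_min"] <= lon <= bounds["lon_max"])


# Invented sample records, one for each case mentioned in the post.
records = [
    {"crime": "BURGLARY", "lat": 34.1846, "lon": -118.4675},   # inside the box
    {"crime": "THEFT", "lat": 0.0, "lon": 0.0},                # missing coordinates
    {"crime": "VANDALISM", "lat": 35.1322, "lon": -118.4490},  # near Tehachapi, outside the box
]
kept = [r for r in records if is_mappable(r["lat"], r["lon"])]
print([r["crime"] for r in kept])
```

A box is cruder than clicking outliers on the map, but it is repeatable when the data set gets refreshed.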

What resulted was a nice concentrated map of all the reported crimes in all of the relevant locations within or immediately adjacent to the city of Los Angeles from the period January 1, 2012 to August 18, 2013.  Here is what it looks like:

LA1

I was able to focus on my part of town, the Van Nuys division, and by deselecting all crimes and only selecting violent crimes, I was able to determine that my neighborhood was far safer from violent crimes than many of the areas around it, particularly Central Van Nuys.  Here is the visual proof:

VanNuys1

Then I thought I would look at a map of homicide in LA 2012-2013.  There were 762 entries.  Here is what that looked like:

LAHomicide

Then I saw a crime that I had never heard of called “Letters, Lewd.” When I clicked it, I was astonished by the result.  There were over 3,450 “Letters, Lewd” crimes reported in the period between January 1, 2012 and August 18, 2013.  Here is what that map looked like:

LettersLew

What the heck? I looked up lewd letters on Google, and there were no mentions of this hideous crime wave anywhere.  There was one news item from Sacramento about one lewd letter being sent and a local gentleman there was hauled in on suspicion.  There was no mention of the LA lewd letter bombs of 2012-2013.

I noticed how democratically distributed these lewd letters were.  No one area had been spared the scourge of naughty mail.  This was a truly unusual data mining result. It is the reason I get all excited about data visualizations.

My pet theory is that there is an informal rule in the police department that code 956, “Letters, Lewd,” is used when the nature of the crime is not to be disclosed.  It may be code for prostitution or some other sex crime that wasn’t listed elsewhere.  It may be used to reduce the number of reported homicides and other violent crimes.  There were just way too many lewd letter reports to pass my sniff test.  Roughly 4.5 times as many lewd letters were reported as homicides (3,450 versus 762).  Is this a cover-up?  Is it an error in the data set? Why don’t I get lewd letters in the mail?  I think it might actually be entertaining.  I certainly wouldn’t find receiving such a letter worthy of a trip into dangerous downtown Van Nuys to file a police report.
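To sanity-check a comparison like this outside of Tableau, a few lines of Python will do. The sketch below tallies records by crime description and then works out the ratio from the two counts quoted in this post (3,450 lewd letter reports and 762 homicides). The sample rows and the "crime" field name are made up for illustration.

```python
from collections import Counter


def count_by_crime(rows):
    """Tally how many records carry each crime description."""
    return Counter(row["crime"] for row in rows)


# Tiny invented sample, just to show the tally working.
rows = [
    {"crime": "LETTERS, LEWD"},
    {"crime": "HOMICIDE"},
    {"crime": "LETTERS, LEWD"},
]
counts = count_by_crime(rows)
print(counts.most_common())

# Counts quoted in the post for January 1, 2012 through August 18, 2013.
lewd_letters = 3450
homicides = 762
ratio = lewd_letters / homicides
print(f"{ratio:.1f} lewd letter reports for every homicide")
```

With the quoted counts, the ratio comes out to a little over 4.5.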

I have no idea how to get to the bottom of this mystery, so I am posting it to this here blog to see if someone out there in cyberspace knows why there were so many lewd letters reported in Los Angeles during 2012 and 2013. Maybe a police woman/man can weigh in on the subject.  Or maybe a vigilante reporter will take up the lewd letter cause to find out the truth behind these bizarre numbers.  Please comment if you have suggestions or answers.

letterskey

Is Tin Can more like Tivo or the Emperor’s New Clothes?

26 Friday Jun 2015

Posted by dunkablog in Business, Data Mining

The Tin Can API (also called Experience API or xAPI) is a new e-learning interface that is modeled after the data points gathered by social networks like Facebook or LinkedIn. It’s a big buzzword in the online learning community. But I can’t figure out how it is applied practically. I don’t see how it’s going to help me get people certified in the online corporate university that I administer. I keep reading about it, trying to imagine how it will help me. And until I see it in action, I think I will remain skeptical.

That’s what happened with me and Tivo years ago. I asked, “why would anyone want to watch TV later? They’ve missed the show!” I couldn’t see the benefit until I had my first binge of Professional Bull Riding on a good friend’s Tivo device. Then it suddenly became a technology without which I could scarcely survive.

The developers of Tin Can have given numerous examples of how Tin Can is different and better than prior e-learning standards like SCORM. SCORM allows me to import a course from one learning system into another. It’s like magic. The quizzes and all the videos play perfectly on the new platform.

Tin Can, on the other hand, does away with packaging up courses. It instead sees “experience” as the real teacher, and so encourages employees to report their learning activity via apps and other software. The apps export subject, verb, object statements like “I ate macaroni salad.” The employee can scan the barcode of an excellent book like 5150 and it will record a statement into the Learning Record Store (LRS) as “I read 5150.” And if they take a course in the traditional learning environment (the Learning Management System or LMS) then that gets recorded as well. “I passed the quiz on Macaroni Salad with a score of 75%.” The problem with the book scanning example is that it would be possible to scan a bunch of barcodes of books you never read. The quiz is what already gets recorded in a more Excel-friendly format. I hardly want to generate learning success rate graphs using subject, verb, object statements.
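To make the statement idea concrete, here is a rough sketch of what “I passed the quiz on Macaroni Salad with a score of 75%” might look like as an xAPI statement built in Python. The actor, verb, object and result layout follows the xAPI statement format as I understand it; the learner name, email address and example.com activity ID are invented, and make_statement is a hypothetical helper, not part of any xAPI library.

```python
import json


def make_statement(name, email, verb_id, verb_label, activity_id, activity_name,
                   scaled_score=None):
    """Build an xAPI-style statement as a plain dictionary (illustrative helper)."""
    statement = {
        "actor": {"objectType": "Agent", "name": name, "mbox": f"mailto:{email}"},
        "verb": {"id": verb_id, "display": {"en-US": verb_label}},
        "object": {
            "objectType": "Activity",
            "id": activity_id,
            "definition": {"name": {"en-US": activity_name}},
        },
    }
    if scaled_score is not None:
        # xAPI expresses a scaled score as a fraction, so 75% becomes 0.75.
        statement["result"] = {"score": {"scaled": scaled_score}}
    return statement


statement = make_statement(
    name="Example Learner",                              # invented learner
    email="learner@example.com",                         # invented address
    verb_id="http://adlnet.gov/expapi/verbs/passed",     # "passed" from the ADL verb list
    verb_label="passed",
    activity_id="http://example.com/quizzes/macaroni-salad",  # hypothetical activity ID
    activity_name="Quiz on Macaroni Salad",
    scaled_score=0.75,
)
print(json.dumps(statement, indent=2))
```

Seeing it written out also shows why this is less Excel-friendly than a row of quiz scores: every statement is a nested record, so it has to be flattened before it can be graphed.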

The notion the new xAPI is trying to put forth is that learning is ubiquitous and therefore should be captured at every turn. I have an app I use that is analogous to this type of interface. It’s called “RescueTime.” It tracks my activity on my work computer and reports back to me on how productive I have been. It decides if the websites I visit are productive or unproductive, and whether the software I use is related to Design, Finance, Marketing or some other discipline. I find that on most days I am around 95% productive and I work about 6.5 hours on the computer during an 8-hour work day. This is a cool app.

I think the Tin Can API is trying to do the same thing for organizations – tracking their employees, their learning experiences on and off the worksite, and whether or not the learning they do correlates with real world results. But how we get there from where it is now is a complete mystery to me. I would love to see an example of a corporation that has adopted the xAPI and has put it to effective use. I still find the ability to move a course around using SCORM a lot more practical than attempting to track when an employee has a brilliant insight on Yammer. But I won’t be the first to say that the Emperor is nude. I think we just need to see what his magic suit can do before we decide whether it is of a fine quality.

Do Business and Hyper-intuition Mix?

22 Thursday May 2014

Posted by dunkablog in Business, Creativity, Data Mining, Uncategorized

Tags

finance, psychic, puzzles, visions

There are a lot of positive words to express the abilities of intuitives. Sometimes we are called “prescient.” Others call us “creative.” I think we may even be called “visionary.”

Here is what I hear more often: crazy, unfounded, scatterbrained, lack of discipline, chaotic…I didn’t have to struggle to come up with those negative words. In my graduate program, we learned the value of “playing to your strengths.” It is a waste of time to put your strongest ability on the back burner in order to cultivate skills that you don’t possess naturally. I could argue that getting an MBA was an enormous exercise in playing to my weaknesses. But let me offer up a positive spin on this paradox.

I am lucky enough to have a boss who recognizes my abilities and gives me opportunities to use them. Intuitive accounting, for instance, allows me to look at a stack of numbers and immediately recognize an error. A few years ago, all of the accountants were scratching their heads trying to figure out why a business unit was off by a huge sum. I took one look at the workbook and told them that they were showing a different number for the forecast than what was given. I emailed the forecast to the head accountant so she could correct her mistake. Flustered, they begrudgingly thanked me and muttered things like “lucky guess.” It wasn’t a lucky guess. I just happen to know how to do math in my head and have an almost absurd recall for numbers. I had seen the forecast before month end close, and it was much larger. I didn’t know the exact number, but I knew the number they used was the wrong one.

The sad thing is that my ability drives logical sensing people to the edge of sanity. My boss knows how to keep his distance from me, as he is an extreme sensing person. He doesn’t know how I know what I know, but he does listen. I warned him of several efforts to undermine his plans based on a few snippets of conversation I had overheard. All of them were real, but he didn’t want to take action until the actual coup d’état was right before him. I have stopped offering up my psychic knowledge to him to preserve his sanity. He cannot understand how I know things in advance. I can read people’s subtle energy and he can’t.

I untangle a lot of financial knots. I love doing reconciliations. They are like an Amish puzzle for me. I love getting things started, but prefer to hand them off once there is momentum. When I hear the words “attention to detail,” I sigh, because I can only pay attention to important details…and what I deem important is rarely, if ever, what sensible business people consider important. If I have to pay attention to unimportant details, I will fall asleep at my desk.

I guess I would ask, gentle readers, that you weigh in on whether hyper-intuitive, psychic people belong in the world of business, and if not, where do we belong?

Is Waze a good example of Web 3.0?

05 Tuesday Nov 2013

Posted by dunkablog in Business, Data Mining, Uncategorized

No one is entirely sure yet what will constitute Web 3.0.

According to the slightly unreliable Web 2.0 website Wikipedia, there are several theories and proposals put forth by “web experts” and futurists.

Two that seem unlikely and/or unpleasant suggested either that it would be
A. A return of experts to the web, quashing unreliable data and insinuating their right to charge for reliable information.
Or
B. Something involving a metaverse of 3D imagery and holodecks using live cams placed around the world and a few more things too Trekkie for me to understand, but definitely candidates for causing widespread migraines and nervous breakdowns.

The thing that stuck out for me was the notion put forth by John Smart that it was going to be an evolution of geo-social apps like Foursquare. Combined with that would be the notion, attributed to Conrad Wolfram, that it would come when the web begins generating its own data and catering to the user based on algorithms that indicate what you like.

Waze is a geosocial app, in real-time 3D, that uses all the data out there about how fast cell phones are moving to generate the fastest route from point A to point B. It needs no algorithm to understand what you like, because it is universally assumed that every user wants the same thing…more time at their given destination and less time in traffic. It quickly learns the locations you travel to, and also learns from you if you stray off its suggested course and still get home on time.
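I have no idea how Waze actually computes its routes, but the general idea of turning observed travel times into a fastest route can be sketched with a textbook shortest-path search (Dijkstra's algorithm). Everything below is illustrative: the road names and minute values are invented, and fastest_route is my own helper, not anything from Waze.

```python
import heapq


def fastest_route(travel_minutes, start, goal):
    """Dijkstra's algorithm over a graph whose edge weights are travel times in minutes.

    travel_minutes maps each location to a dict of {neighbor: minutes}.
    Returns (total_minutes, path), or None if the goal cannot be reached.
    """
    queue = [(0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        elapsed, node, path = heapq.heappop(queue)
        if node == goal:
            return elapsed, path
        if elapsed > best.get(node, float("inf")):
            continue  # a faster way to this node was already found
        for neighbor, minutes in travel_minutes.get(node, {}).items():
            candidate = elapsed + minutes
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor, path + [neighbor]))
    return None


# Invented travel times, as if estimated from how fast phones are moving right now.
roads = {
    "home": {"freeway": 5, "surface": 8},
    "freeway": {"office": 30},   # the freeway is jammed
    "surface": {"office": 20},   # surface streets are moving
    "office": {},
}
result = fastest_route(roads, "home", "office")
print(result)
```

In this toy example the surface streets win (28 minutes against 35 on the freeway). Update the minutes as traffic changes and the answer changes with them.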

Waze users can flag a speed trap or an object in the road, or an accident. This, sadly, may lead to yet another accident. Other users can give a little thumbs up to indicate that the tip was valid, or indicate that the nuisance ahead is no longer present.

Driving hazards aside, Waze is the most revolutionary app to arrive since Google Maps. Thanks to the network effect, it gets increasingly astute and tuned in to your driving needs as time goes on.

I have yet to flag an incident in the roadway…mainly for fear of becoming one myself in the process. Gen Y folks are probably more comfortable taking their eyes off the road…I suspect they are the most dutiful citizens in Waze World.

Rather than try to outdo or crush Waze, I hope the would-be competitors out there will figure out how to create modules that you can install to narrow the purpose of your drive. Today at 2pm in the greater Los Angeles area 6,350 people were on Waze, all with a common goal of reducing their time spent in traffic. What if you could find all the Waze users who need a lunch date? Okay, that already exists in a static format (I am thinking of gay apps like Scruff, Grindr, and Growlr). But maybe it can integrate with your home security system and advise you to return home and call 911 because a burglar is stealing your iPad.

I don’t think I can single-handedly dream up what Web 3.0 will be, but I think Waze is the first glimpse of the new future into which we are rapidly speeding.

The D-Generation and the “Internet of Things”

19 Thursday Sep 2013

Posted by dunkablog in Business, Data Mining

Tags

Big Data, D-Generation, Data mining, data vs. information, information vs. knowledge, Internet of Things, Tableau, technology, transportation

The generation to follow the Millennials has been playfully named the D-Generation, in which ‘D’ stands for ‘Device’. These kids will grow up expecting their possessions to report back to them on a regular basis. They will not need to take so many hours of driver’s education because their cars will be like little miniature buses that drive themselves from point to point along one of three chosen paths from Google Maps. The “D-Generates” will probably be unemployed – machines will be doing many of their summer jobs, like taking orders, building widgets, and driving crabby passengers to and from their destination.

Currently, there are more active IP addresses than there are humans on the planet. This is expected to grow rapidly, so that within five years, there will be 50 billion or more active nodes on the net, only a tiny fraction of which are reserved for a human like me. By the end of the 21st century, provided there is no cataclysmic disruption to the energy supply, the internet will essentially be a network of devices talking to one another, and we will be guests on their network.

These machines generate data – so much data, in fact, that the traditional methods of searching through data no longer work. The “signal to noise ratio” is approaching zero. And yet within that sea of noise is sunken treasure. The great discoveries of tomorrow will no doubt take place in the server farms on earth, not in far off outer space.

Doctors are already analyzing data from little wristbands connected to smartphones to see if they can predict when a heart attack will occur. Once they know what to look for, they can advise their patients to come in and be treated for a heart attack that will happen some time the following week. Hopefully they will have an appointment available.

One key to preventing data from becoming noise is to provide metadata. In the world of devices, metadata provides noise with a first and last name. Noise becomes searchable because it is tagged with the information that distinguishes it from the crowd.
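Here is a small sketch of what giving noise “a first and last name” might look like in practice. A bare number means nothing by itself; once it carries a device ID, a kind and a timestamp, it can be searched. The Reading class, the field names and the sample values are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One data point plus the metadata that makes it findable."""
    value: float     # the raw data
    device_id: str   # metadata: which device produced it
    kind: str        # metadata: what is being measured
    timestamp: str   # metadata: when it was measured (ISO 8601 text)


def search(readings, **criteria):
    """Return the readings whose metadata matches every given criterion."""
    return [r for r in readings
            if all(getattr(r, field) == wanted for field, wanted in criteria.items())]


# Invented sample data from two imaginary devices.
readings = [
    Reading(72.0, "wristband-01", "heart_rate", "2013-09-19T08:00:00"),
    Reading(98.6, "thermostat-07", "temperature", "2013-09-19T08:00:00"),
    Reading(140.0, "wristband-01", "heart_rate", "2013-09-19T08:05:00"),
]
heart = search(readings, device_id="wristband-01", kind="heart_rate")
print([r.value for r in heart])
```

Without the tags, 72.0, 98.6 and 140.0 are just three numbers; with them, the two heart rate readings fall out of the pile with a single query.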

Data is nothing until it becomes “information.” Information is vital, but it is nothing until it is analyzed, at which point it becomes knowledge. With machines spewing out data and metadata at an alarming rate, there is already a need for meta-metadata. This is information about information.

It reminds me of that horrible reference book I had to use to do research in the dark ages called “The Reader’s Guide to Periodical Literature.” This was analog metadata. Within the volumes, there was also an index – this was meta-metadata. And the legend that told me what all the indecipherable symbols represented, e.g. Mmlle = “Mademoiselle Magazine” – that was something that may yet need to be invented.

Tableau and other big data software are the closest thing out there to the “Legend” in my analog example. They take data and metadata and convert it instantly into graphs, charts, maps and other helpful tools for visualizing the data. They allow you to identify noise and exclude it from the example. Outliers appear visually, and can be instantly drilled into to see what they are and why they exist.

The big data analysis tools help get us from noise to knowledge quickly. These are the covered wagons that will take us to the next frontier of knowledge. Let’s hope the ‘D-Generates’ are not so coddled by their toasters and coffee machines that they grow idle and choose to ignore the vast wealth of bits and bytes that may yet reveal truths never before known.
