Tuesday, September 30, 2014

Silicon Chef - Women's hardware hackathon

I joined Silicon Chef, a women's hardware hackathon, this weekend.

Our team "Nature Lullaby" made a musical instrument using Arduino, Max, a light sensor, a flex sensor, and Leap Motion.

1. Getting Arduino set up:

Install Arduino
http://arduino.cc/en/main/software

FTDI driver
http://www.ftdichip.com/Drivers/VCP.htm

SparkFun Inventor’s Kit Guide
http://dlnmh9ip6v2uc.cloudfront.net/datasheets/Kits/SFE-SIK-RedBoard-Guide-Version3.0-Online.pdf

2. Making the LED blink, and changing its parameters based on the photoresistor and flex sensor


Hello world: making the LED blink.

void setup()
{
  pinMode(13, OUTPUT);
}

void loop()
{
  digitalWrite(13, HIGH);   // Turn the LED on
  delay(100);               // Wait 100 milliseconds
  digitalWrite(13, LOW);    // Turn the LED off
  delay(100);               // Wait 100 milliseconds
}

Using the photoresistor to change the brightness of the LED:

const int sensorPin = 0;
const int ledPin = 9;

int lightLevel, high = 0, low = 1023;

void setup()
{
  pinMode(ledPin, OUTPUT);
}

void loop()
{
  lightLevel = analogRead(sensorPin);
  manualTune();  // manually change the range from light to dark
  analogWrite(ledPin, lightLevel);
}

void manualTune()
{
  lightLevel = map(lightLevel, 0, 1023, 0, 255);
  lightLevel = constrain(lightLevel, 0, 255);
}

void autoTune()
{
  if (lightLevel < low)
  {
    low = lightLevel;
  }

  if (lightLevel > high)
  {
    high = lightLevel;
  }
 
  lightLevel = map(lightLevel, low + 30, high - 30, 0, 255);
  lightLevel = constrain(lightLevel, 0, 255);
}

3. Hooking up Arduino with Max:

Install Max
http://cycling74.com/downloads/

Fetching data from the Arduino pins (in this case, the photoresistor reading) into Max

Maxuino 
http://playground.arduino.cc/Interfacing/MaxMSP

 /*
 *  Arduino2Max
 *  Copyleft: use as you like
 *  by Daniel Jolliffe
 *  Based on a sketch and patch by Thomas Ouellet Fredericks  tof.danslchamp.org
 */

int x = 0;                              // a place to hold pin values
int ledpin = 13;

void setup()
{
  Serial.begin(115200);               // 115200 is the default Arduino Bluetooth speed
  digitalWrite(13, HIGH);             // Startup blink
  delay(600);
  digitalWrite(13, LOW);
  pinMode(13, INPUT);                 // Pin 13 is read along with the other digital pins below
}

void loop()
{
  if (Serial.available() > 0) {           // Check serial buffer for characters
    if (Serial.read() == 'r') {           // If an 'r' is received, read the pins
      for (int pin = 0; pin <= 5; pin++) {    // Read and send analog pins 0-5
        x = analogRead(pin);
        sendValue(x);
      }
      for (int pin = 2; pin <= 13; pin++) {   // Read and send digital pins 2-13
        x = digitalRead(pin);
        sendValue(x);
      }
      Serial.println();                   // Send a carriage return to mark the end of pin data
      delay(5);                           // Delay to avoid overloading the serial port
    }
  }
}

void sendValue(int x)                     // Send the pin value followed by a space
{
  Serial.print(x);
  Serial.write(32);                       // ASCII 32 = space
}

Added a patch to turn the photoresistor data into audio
(practically, this is a theremin)

Fixing the patch to get various output sounds based on the light and flex parameters.



4. Configuring Max to read sample audio files and remix them



5. Setting up Leap Motion

https://developer.leapmotion.com/

Now we can do gesture input.

6. Making Leap Motion talk with Max

https://github.com/derekrazo/o.io.leap
http://cnmat.berkeley.edu/downloads

Adding Leap input to change the pitch of the sound, configured in Max.




Result:

We set up three threshold levels for the light sensor, producing a frog sound and a nature sound (as a lullaby to put you to sleep...). Add strong light from an Android flashlight app and you get a cat's meow (to wake you up). Bend the flex sensor and you get a rooster (in case the cat didn't wake you), and with Leap Motion the rooster sound changes to a digital squeak (to wake up the zombies).
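The branching itself is simple threshold logic. A hedged Python sketch of the idea, with an invented pick_sound() helper and invented cutoff values (ours were tuned live in the Max patch):

```python
def pick_sound(light_level, flex_bent=False, leap_active=False):
    # Invented thresholds for illustration; the real values were
    # tuned live in the Max patch during the hackathon
    if flex_bent:
        # Leap Motion gesture morphs the rooster into a digital squeak
        return "digital squeak" if leap_active else "rooster"
    if light_level > 900:      # flashlight held against the sensor
        return "meow"
    if light_level > 600:
        return "frog"
    return "nature"

print(pick_sound(950))
```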

It was not my first time playing with Arduino and Leap Motion, but it was my first time using Max. We had two experts on our team who were able to figure out many things on the fly. I think I should go back to the basics and try it out a bit more. Many thanks to the team for helping me learn!

Photos from the event:

Women building things!


Silicon Chef


Demos!

Silicon Chef


Notes from the TechTalk:

There are lots of cool events happening all around the world!

NodeBots is a global community of developers working on robots powered by JavaScript.



NodeCopter is an event where developers build things with the AR.Drone and JavaScript.



NodeRocket - JavaScript-powered rockets



RobotsConf



JSConf robot soccer game



Electric Imp

https://electricimp.com/

We set it up but ended up not using it this time. It lets you connect hardware to Wi-Fi... I'll find a use for it in other projects :)

Silicon Chef

Max Shortcuts

Useful shortcut info, thanks to Erin!

n = new object
i = integer object
f = floating point number object
m = message box (for sending messages like t (toggle), b (bang), or $1 (use the first incoming piece of data), etc.)
b = button
t = toggle
c = comment box (text only, for commenting)
CMD + E = toggle lock/unlock patch
CMD + OPT + i = open inspector (use with an object highlighted--will open the inspector window for individual settings for that object)
CMD + SHIFT + h = open help (use w/ an object highlighted--will open help window for selected object)
CMD + SHIFT + r = open reference (same as above, but for reference)

My post from last year's Silicon Chef

http://fumiopen.blogspot.com/2013/10/silicon-chef-hardware-hackathon.html

Disclaimer: The opinions expressed here are my own, and do not reflect those of my employer. -Fumi Yamazaki

Thursday, September 4, 2014

Code for America SF brigade hacknight memo

From my notes at tonight's meetup in San Francisco: Jen Pahlka and Tim O'Reilly were the speakers.

CfA brigade

- Let's put technology in its rightful place. It's not just about making money; technology can make society better.

- Should we be building something or reusing? Should governments be using taxpayers' money to build custom-made apps for each city?

- Change in vendor ecosystem is needed. 21st century governments can't survive without digital skill sets, tinkerers and builders.

- Government was our original means of collective action. It was built to do things that none of us could do individually. It’s wonderful to work together.

- Q: Will technology take away government workers' jobs? A: Redeploy people to focus on outcomes and actual people; don't get rid of them.

- Looking back on how Open Government advocacy started: it began by showing off people doing great things in government, celebrating them, and telling their stories. Others who saw that then started doing it themselves. That's how movements start, and that's how Tim helped start the Open Source movement and others.

- 18F is about deployment, and USDS (US Digital Service) is about strategy and oversight. The UK's GDS (Government Digital Service) actually does both. Government realized it must spend less time just talking and more time getting shit done ;)


 Disclaimer: The opinions expressed here are my own, and do not reflect those of my employer. -Fumi Yamazaki

Tuesday, September 2, 2014

Guides on "How to run Civic Hack Nights"

Many guides on "how to run civic hack nights" have been published recently, so I decided to compile a list here:


How to: Hack Night



Civic Hacking 101 by Christopher Whitaker


How to Hack Night panel


Disclaimer: The opinions expressed here are my own, and do not reflect those of my employer. -Fumi Yamazaki

Guides to Publishing Open Data

As you can see from the US City Open Data Census, many cities are making their data open.
http://us-city.census.okfn.org/



There are many guides to publishing open data for countries, states, and cities, with best practices learned from them. Summary and links below for those interested!

Open Data Policy Guidelines by Sunlight Foundation

http://sunlightfoundation.com/opendataguidelines/
(CC-BY Sunlight Foundation)

What Data Should Be Public
  1. Proactively Release Government Information Online
  2. Reference And Build On Existing Public Accountability And Access Policies
  3. Build On The Values, Goals And Mission Of The Community And Government
  4. Create A Public, Comprehensive List Of All Information Holdings
  5. Specify Methods Of Determining The Prioritization Of Data Release
  6. Stipulate That Provisions Apply To Contractors Or Quasi-Governmental Agencies
  7. Appropriately Safeguard Sensitive Information
How to Make Data Public
  1. Mandate Data Formats For Maximal Technical Access
  2. Provide Comprehensive And Appropriate Formats For Varied Uses
  3. Remove Restrictions For Accessing Information
  4. Mandate Data Be Explicitly License-Free
  5. Charge Data-Creating Agencies With Recommending An Appropriate Citation Form
  6. Require Publishing Metadata
  7. Require Publishing Data Creation Processes
  8. Mandate The Use Of Unique Identifiers
  9. Require Code Sharing Or Publishing Open Source
  10. Require Digitization And Distribution Of Archival Materials
  11. Create A Central Location Devoted To Data Publication And Policies
  12. Publish Bulk Data
  13. Create Public APIs For Accessing Information
  14. Optimize Methods Of Data Collection
  15. Mandate Ongoing Data Publication And Updates
  16. Create Permanent, Lasting Access To Data
How to Implement Policy
  1. Create Or Appoint Oversight Authority
  2. Create Guidance Or Other Binding Regulations For Implementation
  3. Incorporate Public Perspectives Into Policy Implementation
  4. Set Appropriately Ambitious Timelines For Implementation
  5. Create Processes To Ensure Data Quality
  6. Ensure Sufficient Funding For Implementation
  7. Create Or Explore Potential Partnerships
  8. Mandate Future Review For Potential Changes To This Policy
Open Data Playbook by Code for America- Open by Default [beta]
http://www.codeforamerica.org/governments/capabilities/open-data/
(CC-BY Code for America)

Introduction: What is open data, and why bother?
  • Opendata.gov and 8 principles of open government data 
                http://opengovdata.org/
         1. Complete
    All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.
         2. Primary
    Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
         3. Timely
    Data is made available as quickly as necessary to preserve the value of the data.
         4. Accessible
    Data is available to the widest range of users for the widest range of purposes.
         5. Machine processable
    Data is reasonably structured to allow automated processing.
         6. Non-discriminatory
    Data is available to anyone, with no requirement of registration.
         7. Non-proprietary
    Data is available in a format over which no entity has exclusive control.
         8. License-free
    Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.

  • Open Data Glossary
          https://docs.google.com/document/d/1ZbkQ2Ad66FKVj-v2T-UHKJbsh0CHV-dm9MEAoy6yT2Y/edit
Laying the Groundwork for Open Data
  • Define the goals of your open data initiative and align it with organizational goals and priorities
     What to expect: How open data has worked for cities of all sizes
     -Open Data examples: Louisville, KY
     -Open Data examples: Chattanooga, TN
     -Open Data examples: Montgomery County, MD
     -Open Data examples: Pittsburgh, PA
     -Open Data examples: Albuquerque, New Mexico
     -Open Data examples: Kansas City, MO
     Read also: "Beyond Transparency" on open data's impact in various cities
     http://beyondtransparency.org/
  • Build departmental support and executive buy-in
     Who needs to be at the table?
      > Executive leadership,  Internal champion, IT leader, GIS Specialist, Departmental Stakeholders

Demonstrating Value: Open data success stories

     Public Safety Open Data examples:
     -San Francisco: Crime Spotting Map
     -New York City: Targeting Illegal Building Conversions Inspections
     Economic Development Open Data examples:
     -Asheville, North Carolina: Empowering Startups
     -Charlotte, NC: Helping Local Organizations Unlock Funding
     Citizen Participation Open Data example:
     -Chicago: Flu Shot Locations
     Health and Human Services Open Data example:
     -Louisville: Restaurant Inspection Scores on Yelp
     -San Mateo County: Aggregating Community Services
     Internal Cost Savings and Efficiency Open Data examples:
     -Albuquerque: Reducing Transit-Related 311 Calls
     -Oakland: Streamlining Public Records Requests
     -Chicago: Eliminating 311 Redundancies
     Transparency and Accountability Open Data example:
     -Boston: Increasing Trust Between Government and Residents

Opening and Publishing Data

  • Prioritizing data for release
     # Former Philadelphia Chief Data Officer Mark Headd recommends starting with the "Three Bs":
        Buses (transit data), Bullets (crime data), and Bucks (budget and expenditure data).
  • 18 recommended datasets
     http://us-city.census.okfn.org/faq/

     1. Asset Disclosure
     2. Budget
     3. Business Listings
     4. Campaign Finance Contributions
     5. Code Enforcement Violations
     6. Construction Permits
     7. Crime
     8. Lobbyist Activity
     9. Procurement Contracts
     10. Property Assessment
     11. Property Deeds
     12. Public Buildings
     13. Restaurant Inspections
     14. Service Requests (311)
     15. Spending
     16. Transit
     17. Zoning (GIS)
     18. Web Analytics
  • Compare major platform options and select an open data platform
    • CKAN -- the Comprehensive Knowledge Archive Network -- is open source software powering open data platforms across the world. Provided by the Open Knowledge Foundation in the UK, CKAN is used at the local, regional, national, and international levels of government as well as in academia.
    • DKAN is a Drupal-based implementation of CKAN that offers an easier installation and support burden while preserving API compatibility.
    • OpenDataCatalog (ODC) is open source software originally created by Azavea for the city of Philadelphia.
    • Socrata is the most popular commercial data platform provider in the United States. Socrata offers a turnkey SaaS cloud-hosted data catalog, paid for on a subscription basis. The Socrata platform includes API abilities and sitewide analytics that track consumption and engagement metrics. Socrata is used by dozens of municipal governments, including Baltimore, Austin, Chicago, Seattle, and New York City.
    • 2014 Code for America Fellows compiled this "Open Data Portal Analysis" and detailed comparison which compares features and costs for some of the most common open data platform providers.
  • Publish your data 
Planning for Sustainability
  • Create an open data policy
     Example Policies:
     -City of South Bend Executive Order No. 2-2013
     -City of Louisville Executive Order No. 1, Series 2013
     -City of Austin Resolution No. 20111208-074
     Comprehensive list created by Sunlight Foundation:
     A Bird's Eye View Of Open Data Policies: http://sunlightfoundation.com/policy/opendatamap/
  • Appoint staff to be responsible for data management
     -Chief Data Officer
     -Open Data Coordinator (ODC)

Making open data useful

  • Use common open data formats
     Examples: General Transit Feed Specification (GTFS), Housefacts Specification, Local Inspector Value Entry Specification (LIVES), and other data formats
  • Hold a hackathon
     Socrata's "how to plan a hackathon" doc
  • Deploy apps that use open data
     Recommended apps for redeployment
     -Adopt-a-hydrant
     -Click That Hood
     -To The Trails
     -Look at Cook
     -Flu Shot Finder
     More apps can be found at Code for America Apps page
 

Open Data Field Guide by Socrata

http://www.socrata.com/open-data-field-guide-chapter/
(All rights reserved by Socrata)

0. Introduction to the Open Data Field Guide

1. Why Does My Organization Need Open Data?
-What Is Open Data?
     Is Open Data The Same As Open Government?
     Brief History of Open Data and Key Initiatives to Date
-Why Open Data? Why Now?
     A Changing 21st Century Constituency
     The Changing Nature of Government Work
     Leveraging the Community for Innovation

2. Define Clear and Measurable Goals

-Align Your Open Data Program with Your Mission and Strategic Plan
-Adapt Open Data Goals to Your Local Context
-Common Goals for Open Data Initiatives

3. Assemble a Winning Team
-The Open Data Stakeholders
-Winning Your Chief Executive’s Support
-Invite Every Department

4. Develop Your Open Data Policy
-Why Is an Open Data Policy Necessary?
     The Benefits of Good Policy
-Elements of an Effective Open Data Policy
-The Main Types of Open Data Policies
-Open Data Policy Examples and Resources
     Sample Policies and Implementation Guides
     Sample Resolution Statements

5. The Data Plan
-Which Data Should You Publish First?
     8 Steps to a Successful Data Plan
         1. Identify the data that supports your strategic goals.
         2. Adapt your open data goals to your local context.
         3. Start with the data already on your site.
         4. Analyze your site traffic.
         5. Analyze your FOIA and public information requests.
         6. Request feedback from citizens.
         7. Interview your co-workers.
         8. Don’t reinvent the wheel. Copy what works.
     What Are Open Data Leaders Publishing?
     Data Format and Open Data Standards
-Open Data Standards
-Application Programming Interfaces

6. Open Data Implementation in Six Steps
-Think About a Pilot to Start
-Phase 1 – Start Small
-Phase 2 – Get Transparency Done
-Phase 3 – Bring Developers on Board
-Phase 4 – Increase Agency Participation
-Phase 5 – Optimize for Efficiencies and Cost Savings
-Phase 6 – Federate Data with Neighboring Cities, Counties, and States

7. Engage Your Community
-Promoting Your Open Data Portal
     Examples of Success
     Engagement
-Four Essentials of Developer Evangelism
     Publish Data
     Connect With Civic Developer Organizations
     Host a Hackathon
     Be Humble
-What Apps Are Developers Building?

8. An Outstanding Citizen Experience
-Curating the Data Experience
-Rethinking the Citizen Experience
     From File Downloads to Useful Visualizations
     Say It With Maps!
     Richer Visual Context? Try Map Mashups
     The “App-ification” of Data
     Taking the Experience Mobile

9. Join the Open Data Community
-The Growing Open Data Movement
-How to Stay Connected to the Open Data Community

Acknowledgements & Glossary

Open Data Companion Kit

Project Open Data  
http://project-open-data.github.io/

1. Background
2. Definitions
2-1 Open Data Principles - The set of open data principles.
2-2 Standards, Specifications, and Formats - Standards, specifications, and formats supporting open data objectives.
2-3 Open Data Glossary - The glossary of open data terms.
2-4 Open Licenses - The definition for open licenses.
2-5 Common Core Metadata - The schema used to describe datasets, APIs, and published data at agency.gov/data.
3. Implementation Guidance
Implementation guidance for open data practices.
3-1 U.S. Government Policy on Open Data - Full text of the memorandum.
3-2 Implementation Guide - Official OMB implementation guidance for each step of implementing the policy.
3-3 Public Data Listing - The specific guidance for publishing the Open Data Catalog at the agency.gov/data page.
3-4 Frequently Asked Questions - A growing list of common questions and answers to facilitate adoption of open data projects.
3-5 Open Data Cross Priority (CAP) Goal - Information on the development of the Open Data CAP goal as required in the Open Data Executive Order.
4. Tools
This section is a list of ready-to-use solutions or tools that will help agencies jump-start their open efforts. These are real, implementable, coded solutions that were developed to significantly reduce the barrier to implementing open data at your agency. Many of these tools are hosted at Labs.Data.gov and developers are encouraged to contribute improvements to them and contribute other tools which help us implement the spirit of Project Open Data.
4-1 Database to API - Dynamically generate RESTful APIs from the contents of a database table. Provides JSON, XML, and HTML. Supports most popular databases. - Hosted
4-2 CSV to API - Dynamically generate RESTful APIs from static CSVs. Provides JSON, XML, and HTML. - Hosted
4-3 Spatial Search - A RESTful API that allows the user to query geographic entities by latitude and longitude, and extract data.
4-4 Kickstart - A WordPress plugin to help agencies kickstart their open data efforts by allowing citizens to browse existing datasets and vote for suggested priorities.
4-5 PDF Filler - PDF Filler is a RESTful service (API) to aid in the completion of existing PDF-based forms and empower web developers to use browser-based forms and modern web standards to facilitate the collection of information. - Hosted
4-6 Catalog Generator - Multi-format tool to generate and maintain agency.gov/data catalog files. - Hosted Alternative
4-7 A data.json validator can help you check compliance with the POD schema. - Hosted
4-8 Project Open Data Dashboard - A dashboard to check the status of /data and /data.json at each agency. This also includes a validator.
4-9 Data.json File Merger - Allows the easy combination of multiple data.json files from component agencies or bureaus into one combined file.
4-10 API Sandbox - Interactive API documentation systems.
4-11 CFPB Project Qu - The CFPB’s in-progress data publishing platform, created to serve public data sets.
4-12 HMDA Tools - Lightweight tools to make importing and analyzing Home Mortgage Disclosure Act data easier.
4-13 ESRI2Open - A tool which converts spatial and non-spatial data from ESRI-only formats to the open data formats CSV, JSON, or GeoJSON, making them more a part of the WWW ecology.
4-14 ckanext-datajson - A CKAN extension to generate agency.gov/data.json catalog files.
4-15 DKAN - An open data portal modeled on CKAN. DKAN is a standalone Drupal distribution that allows anyone to spin up an open data portal in minutes, as well as two modules, DKAN Dataset and DKAN Datastore, that add data portal functionality to an existing Drupal site.
4-16 DataVizWiz - A Drupal module that provides a fast way to get data visualizations online.
4-17 Esri Geoportal Server - Open source catalog supporting ISO/FGDC/DC/… metadata with mapping to DCAT to support agency.gov/data.json listings in addition to providing OGC CSW, OAI-PMH and OpenSearch. Supports automated harvesting from other open catalog sources.
4-18 Libre Information Batch Restructuring Engine - Open data conversion and API tool, created by the Office of the Chief Information Officer of the Commonwealth of Puerto Rico.
4-19 JSON-to-CSV Converter - A handy means of converting data.json files to a spreadsheet-friendly format. A similar tool can provide basic CSV-to-JSON functionality.
5. Resources
This section contains programmatic tools, resources, and/or checklists to help programs determine open data requirements.
5-1 Metadata Resources - Resources to provide guidance and assistance for each aspect of creating and maintaining agency.gov/data catalog files.
5-2 Business Case for Open Data - Overview of the benefits associated with open data.
5-3 General Workflows for Open Data Projects - A comprehensive overview of the steps involved in open data projects and their associated benefits.
5-4 Open License Examples - Potential licenses for data and content.
5-5 API Basics - Introductory resources for understanding application programming interfaces (APIs).
5-6 Data Release Safeguard Checklist - Checklist to enable the safe and secure release of data.
5-7 Digital PII Checklist - Tool to assist agencies identify personally identifiable information in data.
5-8 Applying the Open Data Policy to Federal Awards: FAQ - Frequently asked questions for contracting officers, grant professionals and the federal acquisitions community on applying the Open Data Policy to federal awards.
5-9 Example Policy Documents - Collection of memos, guidance and policy documents about open data for reference.
5-10 Example Data Hubs - Collection of department, agency, and program data sites across the federal government.
5-11 Licensing policies, principles, and resources - Some examples of how government has addressed open licensing questions.
6. Case Studies
Case studies of novel or best practices from agencies who are leading in open data help others understand the challenges and opportunities for success.
6-1 Department of Labor API Program - A department perspective on developing APIs for general use and, in particular, building the case for an ecosystem of users by developing SDKs.
6-2 Department of Transportation Enterprise Data Inventory - A review of DOT’s strategy and policy when creating a robust data inventory program.
6-3 Disaster Assistance Program Coordination - The coordinated campaign led by FEMA has integrated a successful data exchange among 16 agencies to coordinate an important public service.
6-4 Environmental Protection Agency Central Data Exchange - The agency’s data exchange provides a model for programs that seek to coordinate the flow of data among industry, state, local, and tribal entities.
6-5 FederalRegister.gov API - A core government program update that has grown into an important public service.
6-6 National Broadband Map - The National Broadband Map, a case study on open innovation for national policy. Produced by the Wilson Center.
6-7 National Renewable Energy Laboratory API program - An agency perspective on developing APIs for general use and in particular building the case for the internal re-use of the resources.
6-8 USAID Crowdsourcing to Open Data - A case study that shows how USAID invited the “crowd” to clean and geocode a USAID dataset in order to open and map the data.
6-9 Centers for Medicare & Medicaid Services (CMS) Data and Information Products - a case study of how CMS is transitioning to a data-driven culture, including the creation of a new office for information products and data analytics, the release of open data summarizing provider utilization and payment information, and the responsible disclosure of restricted use data to qualified parties.
For Developers: View all appendices (and source)
7. Open Data Engagement
Data Jam
Datapalooza
Hackathon
Online Community
FOIA Officers and Ombudsman
Templates and instructions

Open Government Data (The Book) by Joshua Tauberer
http://opengovdata.io/
(All rights reserved by Joshua Tauberer)

Civic Hacking and Government Data 
-Civic Hacking 
-History of the Movement 
-Open Government, Big Data, and Mediators
Civic Hacking By Example 
-Visualizing Metro Ridership
Why I Built GovTrack.us
Applications for Open Government
-Sunlight as a Disinfectant 
-Democratizing Legal Information 
-Informing Policy Decisions 
-Consumer Products
A Brief Legal History of Open Government Data
-Ancient Origins of Open Access to Law
-The U.S. Freedom of Information Act 
-The 21st Century: Data Policy
14 Principles of Open Government Data
-Online and Free, Primary, Timely, Accessible (Principles 1–4) 
     (1) Information is not meaningfully public if it is not available on the Internet for free.
     (2) “Primary: Primary data is data as collected at the source, with the finest possible level of granularity, not in aggregate or modified forms.”
     (3) “Timely: Data are made available as quickly as necessary to preserve the value of the data.” Data is not open if it is only shared after it is too late for it to be useful to the public.
     (4) “Accessible: Data are available to the widest range of users for the widest range of purposes.”
-Analyzable Data in Open Formats (Principles 5 and 7) 
     (5) Analyzable.
     (7) “Non-proprietary: Data are available in a format over which no entity has exclusive control.”
-No Discrimination and License-Free (Principles 6 and 8) 
     (6) “Non-discriminatory: Data are available to anyone, with no requirement of registration.”
     (8) “License-free.” Dissemination of the data is not limited by intellectual property law such as copyright, patents, or trademarks, contractual terms, or other arbitrary restrictions.
-Publishing Data with Permanence, Trust, and Provenance (Principles 9–11)
     (9) Permanent: Data should be made available at a stable Internet location indefinitely.
     (10) Safe file formats: “Government bodies publishing data online should always seek to publish using data formats that do not include executable content.”
     (11) Provenance and trust: “Published content should be digitally signed or include attestation of publication/creation date, authenticity, and integrity.”
-On The Openness Process (Public Input, Public Review, and Coordination; Principles 12–14)
     (12) Public input: The public is in the best position to determine what information technologies will be best suited for the applications the public intends to create for itself.
     (13) Public review
     (14) Interagency coordination
Data Quality: Precision, Accuracy, and Cost
Bulk Data or an API?
A Maturity Model for Prioritizing Open Government Data
Case Studies
     -U.S. Federal Open Data Policy 
     -Transparency, Participation, and Collaboration 
     -The Later Memorandums 
     -House Disbursements 
     -State Laws and the District of Columbia Code
Paradoxes in Open Government
     -The Bhoomi Program and Digital Divides
     -Unintended Consequences and the Limits of Transparency 
     -Looking for Corruption in All the Wrong Places 
     -Conclusion
Example Policy Language
     -Open Government Data Definition: The 8 Principles of Open Government Data 
     -OKF’s Open Knowledge Definition 
     -New Hampshire HB 418

Open Data Handbook by Open Knowledge Foundation
http://opendatahandbook.org/
(CC-BY Open Knowledge Foundation)

Introduction
Why Open Data? 
What is Open Data? 
How to Open up Data 
So I’ve Opened Up Some Data, Now What? 
Glossary 
Appendices

Open Data Ireland: Open Data Publication Handbook
(CC-BY Deirdre Lee, Richard Cyganiak & Stefan Decker at Insight Centre for Data Analytics, NUI Galway)
https://www.insight-centre.org/sites/default/files/publications/open-data-publication-handbook.pdf

Step-by-Step Guide to Open Data Publishing 
Step 1 Carry out a Data Audit
Step 2 Select what Data to Publish

 [Common High-Value Datasets]

Step 3 Ensure Data Protection Laws are Adhered to
Step 4 Associate Data with an Open License
Step 5 Publish Data as 3- to 5-star Open Data
     *Publish data on the Web under an Open License
     ** Publish data in a machine-readable, structured format
     *** Publish data in a non-proprietary format
     **** Use URIs to identify things, so that people can point at your stuff
     ***** Link your data to other data to provide context

[Machine-Readable and Non-Proprietary Data Formats]


Step 6 Associate Data with Standardised Metadata
Step 7 Use Data Standards
Step 8 Use Unique Identifiers
Step 9 Provide Access to the Data
Step 10 Publish Data on the National Open Data Portal

...and more resources:
"Open Government - Collaboration, Transparency, and Participation in Practice" by Daniel Lathrop and Laurel Ruma
"Open Data Guidebook" by City of Philadelphia
"Open Source for Government" by Ben Balter
"Open Government Briefing Guide" by Open Austin


Disclaimer: The opinions expressed here are my own, and do not reflect those of my employer. -Fumi Yamazaki

Wednesday, August 20, 2014

GDELT data and BigQuery

It's fascinating to be able to access the quarter-billion-record GDELT Event Database - it is now available as a public dataset in Google BigQuery.

The data were published on BigQuery a couple of months ago.

"World's largest event dataset now publicly available in BigQuery"
http://googlecloudplatform.blogspot.com/2014/05/worlds-largest-event-dataset-now-publicly-available-in-google-bigquery.html

"More than 250 million global events are now in the cloud for anyone to analyze"
http://gigaom.com/2014/05/29/more-than-250-million-global-events-are-now-in-the-cloud-for-anyone-to-analyze/

I am not an expert on BigQuery, but just testing the sample queries from the blog posts gives an idea of what is possible.

Here is an example query that runs over the 250 million records detailing worldwide events from the last 30 years and discovers the top defining relationship for each year.

SELECT Year, Actor1Name, Actor2Name, Count FROM (
  SELECT Actor1Name, Actor2Name, Year, COUNT(*) Count,
         RANK() OVER(PARTITION BY Year ORDER BY Count DESC) rank
  FROM
    (SELECT Actor1Name, Actor2Name, Year FROM [gdelt-bq:full.events]
     WHERE Actor1Name < Actor2Name
       AND Actor1CountryCode != '' AND Actor2CountryCode != ''
       AND Actor1CountryCode != Actor2CountryCode),
    (SELECT Actor2Name Actor1Name, Actor1Name Actor2Name, Year FROM [gdelt-bq:full.events]
     WHERE Actor1Name > Actor2Name
       AND Actor1CountryCode != '' AND Actor2CountryCode != ''
       AND Actor1CountryCode != Actor2CountryCode)
  WHERE Actor1Name IS NOT NULL
    AND Actor2Name IS NOT NULL
  GROUP EACH BY 1, 2, 3
  HAVING Count > 100
)
WHERE rank = 1
ORDER BY Year
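To make the logic concrete, here is a minimal plain-Python sketch of what the query computes, using invented rows rather than real GDELT data: count events per (year, actor pair), then keep the most frequent pair for each year, which is what the RANK() OVER(PARTITION BY Year ...) filter achieves.

```python
from collections import Counter

# Hypothetical (year, actor1, actor2) event rows, standing in for GDELT records.
events = [
    (1980, "USA", "USSR"), (1980, "USA", "USSR"), (1980, "USA", "CHINA"),
    (1981, "USA", "USSR"), (1981, "IRAN", "IRAQ"), (1981, "IRAN", "IRAQ"),
]

# Count events per (year, actor pair) -- the GROUP BY step.
counts = Counter(events)

# Keep only the top pair per year -- the RANK() ... WHERE rank=1 step.
top_per_year = {}
for (year, a1, a2), c in counts.items():
    if year not in top_per_year or c > top_per_year[year][2]:
        top_per_year[year] = (a1, a2, c)

for year in sorted(top_per_year):
    a1, a2, c = top_per_year[year]
    print(year, a1, a2, c)
# 1980 USA USSR 2
# 1981 IRAN IRAQ 2
```

The real query additionally canonicalizes each pair (so "USA, USSR" and "USSR, USA" count as one relationship) by unioning two subqueries with the actors swapped.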

And this is the result you get. SUPER fast, with interesting results.






The next sample query compiles every protest in Ukraine that GDELT found in the world’s news media, by month, from 1979 to the present. Since there is a lot more news media in 2014 than in 1979, the raw count of protests per month is normalized against each month's total event volume.

SELECT MonthYear, INTEGER(norm*100000)/1000 Percent
FROM (
  SELECT ActionGeo_CountryCode, EventRootCode, MonthYear, COUNT(1) AS c,
         RATIO_TO_REPORT(c) OVER(PARTITION BY MonthYear ORDER BY c DESC) norm
  FROM [gdelt-bq:full.events]
  GROUP BY ActionGeo_CountryCode, EventRootCode, MonthYear
)
WHERE ActionGeo_CountryCode='UP' AND EventRootCode='14'
ORDER BY ActionGeo_CountryCode, EventRootCode, MonthYear;
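Outside BigQuery, the normalization step is easy to sketch in plain Python (all rows below are invented for illustration): each (country, event-type) count for a month is divided by that month's total event count, so the growth of news media over time cancels out.

```python
from collections import defaultdict

# Hypothetical rows: (month, country, event_root_code, count).
rows = [
    ("201402", "UP", "14", 500),    # Ukraine protest events
    ("201402", "UP", "01", 1500),
    ("201402", "US", "01", 8000),
    ("197901", "UP", "14", 3),
    ("197901", "UP", "01", 27),
]

# Total events per month -- the PARTITION BY MonthYear denominator.
month_totals = defaultdict(int)
for month, country, code, c in rows:
    month_totals[month] += c

def percent_of_month(month, country, code):
    """Share of the month's total events, as a percentage (RATIO_TO_REPORT * 100)."""
    c = next(c for m, co, cd, c in rows if (m, co, cd) == (month, country, code))
    return 100.0 * c / month_totals[month]

print(round(percent_of_month("201402", "UP", "14"), 3))  # 5.0
print(round(percent_of_month("197901", "UP", "14"), 3))  # 10.0
```

Note how a raw count of 3 in 1979 can represent a larger share of that era's coverage than a raw count of 500 does in 2014.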

Query result has 335 pages...


... so I'll just click chart view. You can see a series of protests in 1989 (“Revolutions of 1989”), another peak in October 1995 (not sure what that was), another in March 2001 (“Ukraine without Kuchma” protests), a big spike in November 2004 (“Orange Revolution”), and the recent 2014 protests.


Of course you can download the data as CSV and do further analysis.

With this dataset in BigQuery, people can now analyze all of it easily using Google's computational power. Some examples:

Kalev Leetaru visualized the data to show that Ukraine's protests were not just in Kiev.
It's Not Just Kiev - Using Big Data to map Ukraine's protest violence.
http://www.foreignpolicy.com/articles/2014/02/21/it_s_not_just_kiev_ukraine_protest_map


New Scientist's visualization of civilian violence in the Syrian Civil War.
http://syria.newscientistapps.com/


"Correlating the Patterns of World History With BigQuery"
http://googlecloudplatform.blogspot.com/2014/08/correlating-patterns-of-world-history-with-bigquery.html

The query in this post made me feel....


According to the author and my friend +Felipe Hoffa, "this query has 2 subqueries: The smaller one finds the timeline of 30 days in Egypt before 2011-01-27, while the left side collects all sets of 30 days events for every country through GDELT's ever-growing dataset. With a cross join between the first set and all the sets on the left side, BigQuery is capable of sifting through this over a million combinations computed in real-time and calculate the Pearson correlation of each timeline pair. For a visual explanation, see the linked IPython notebook."

SELECT
  STRFTIME_UTC_USEC(a.ending_at, "%Y-%m-%d") ending_at1,
  STRFTIME_UTC_USEC(b.ending_at-60*86400000000, "%Y-%m-%d") starting_at2,
  STRFTIME_UTC_USEC(b.ending_at, "%Y-%m-%d") ending_at2,
  a.country, b.country, CORR(a.c, b.c) corr, COUNT(*) c
FROM (
  SELECT country, date+i*86400000000 ending_at, c, i
  FROM [gdelt-bq:sample_views.country_date_matconf_numarts] a 
  CROSS JOIN (SELECT i FROM [fh-bigquery:public_dump.numbers_255] WHERE i < 60) b
) b
JOIN (
  SELECT country, date+i*86400000000 ending_at, c, i
  FROM [gdelt-bq:sample_views.country_date_matconf_numarts] a 
  CROSS JOIN (SELECT i FROM [fh-bigquery:public_dump.numbers_255] WHERE i < 60) b
  WHERE country='Egypt'
  AND date+i*86400000000 = PARSE_UTC_USEC('2011-01-27')
) a
ON a.i=b.i
WHERE a.ending_at != b.ending_at
GROUP EACH BY ending_at1, ending_at2, starting_at2, a.country, b.country
HAVING (c = 60 AND ABS(corr) > 0.254)
ORDER BY corr DESC
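The heart of that query is CORR(a.c, b.c): the Pearson correlation between two 60-day event-count timelines. A small self-contained sketch of that calculation (plain Python, invented daily counts, not the query itself):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical daily conflict-article counts for two 60-day windows:
# both trend upward with a weekly bump, so they should correlate strongly.
egypt = [i + (5 if i % 7 == 0 else 0) for i in range(60)]
other = [2 * i + (3 if i % 7 == 0 else 0) for i in range(60)]
r = pearson(egypt, other)
print(round(r, 3))  # close to 1.0: the two windows rise together
```

BigQuery evaluates this same statistic for over a million window pairs at once, which is what makes the cross join approach feasible.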

What it gives you is a list of all the worldwide periods from the last 35 years, as monitored by GDELT, that have been most similar to Egypt’s two months preceding the core of its revolution.

You get this result with 23,640 pages, so it's better to download the data and analyze it separately :)

Looking at the first page of results: the two most highly correlated periods are Germany 7/8/2009 – 9/6/2009 (r=0.827) and Sweden 10/4/2010 – 12/3/2010 (r=0.824).


You can read Kalev Leetaru's analysis in this blog post:

"Towards Psychohistory: Uncovering the Patterns of World History with Google BigQuery"
http://blog.gdeltproject.org/towards-psychohistory-uncovering-the-patterns-of-world-history-with-google-bigquery/

The upper chart shows Germany & Sweden in green and Egypt in red, with the X axis being the number of days from the start of the period (thus position 0 corresponds to 11/28/2010 for Egypt, 7/8/2009 for Germany, and 10/4/2010 for Sweden). To make it easier to compare each pair of countries, raw volume counts are replaced with “Z scores” (standard deviations from the mean).

Figure 1 – Germany 7/8/2009 – 9/6/2009 (green left of black line) and 9/6/2009 – 11/5/2009 (green right of black line) compared with Egypt (red)


Figure 2 – Sweden 10/4/2010 – 12/3/2010 (green left of black line) and 12/3/2010 – 2/1/2011 (green right of black line) compared with Egypt (red)
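The "Z score" transform used in those figures is simple to sketch: each country's raw daily counts are replaced by their distance from that country's own mean, measured in standard deviations, so series with very different magnitudes can be overlaid on one chart.

```python
import math

def z_scores(counts):
    """Standardize a sequence: (value - mean) / population standard deviation."""
    n = len(counts)
    mean = sum(counts) / n
    sd = math.sqrt(sum((c - mean) ** 2 for c in counts) / n)
    return [(c - mean) / sd for c in counts]

print([round(z, 2) for z in z_scores([10, 20, 30, 40, 50])])
# → [-1.41, -0.71, 0.0, 0.71, 1.41]
```

After this transform, Egypt's counts and Germany's counts live on the same scale, even though one country generates far more news volume than the other.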

He further analyzes data from the 60 days preceding the ouster of former Ukrainian president Viktor Yanukovych and the 60 days after it, a 120-day period from 1999 in Turkey, and another query comparing post-peak events in Turkey with a 120-day period from 2007 in Libya.

Kalev's takeaways:
"While it is unlikely that one would build a true political risk forecasting system on an approach this simple, it does suggest that world history, at least the view of it we see through the news media, is highly cyclic and predictable, and that there is much yet to be discovered. Will these patterns hold for every country and time period and is there a certain rolling window size that works better or worse? Does a different time interval or switching to a different set of event types improve or degrade accuracy? Does it work better just before a conflict or only in its first few days? Let your creativity run wild and let us know what you find!"
Gigaom wrote a blog post based on Kalev's analysis:
"This analysis of modern history is a prime example of why big data really matters"
http://gigaom.com/2014/08/13/this-analysis-of-modern-history-is-a-prime-example-of-why-big-data-really-matters/

The author of this post, Derrick Harris, concludes:
"The real value of cloud computing is in putting all this data in a centralized place with centralized computing resources so researchers aren’t on the hook for somehow downloading it, storing it and having enough computers to analyze it. It might be there’s nothing of value to be gleaned from Leetaru’s analysis of modern history, or might be there’s a nugget of immense value buried a few layers below the surface. But if we really want to find answers to tough problems, we owe it to ourselves to examine every signal. Done right, big data provides a lot of them."
===========================================================

Kalev has done more analysis of GDELT's people data (I'm not sure whether he used BigQuery for this one), which he put together in this article:

The Tehran Connection
"Iran's nuclear program has been one of the hottest topics in foreign policy for years, and attention has only intensified over the past few days, as an interim agreement was reached in Geneva to limit enrichment activity in pursuit of a more comprehensive deal. The details of the deal itself are of course interesting, but in aggregate the news stories about Iran can tell us far more than we can learn simply by reading each story on its own. By using big data analytics of the world's media coverage, combined with network visualization techniques, we can study the web of relationships around any given storyline -- whether it focuses on an individual, a topic, or even an entire country. Using these powerful techniques, we can move beyond specifics to patterns -- and the patterns tell us that our understanding of Iran is both sharp and sharply limited."
You can see actual visualization on the website here:
http://kalevleetaru.com/dataanddiplomacy/network-iran15.html

"In the diagram below, every global English-language news article monitored by the GDELT Global Knowledge Graph -- a massive compilation of the world's people, organizations, locations, themes, emotions, and events -- has been analyzed to identify all people mentioned in articles referencing any location in Iran between April and October 2013. A list was compiled of every person mentioned in each article, and all names mentioned in an article together were connected. The end result was a network diagram of all of the people mentioned in the global news coverage of Iran over the last seven months and who has appeared with whom in that coverage. 
This network diagram was then visualized using a technique that groups individuals who are more closely connected with each other, placing them physically more proximate in the diagram, while placing individuals with fewer connections farther apart. Then, using an approach known as community finding, clusters of people who are more closely connected with each other than with the rest of the network were drawn in the same color. The specific color assigned to each group is not meaningful, only that people drawn in the same color are more closely connected to one another. Together, these two approaches make the overall macro-level structure of the network instantly clear, teasing apart the clusters and connections among the newsmakers defining Iranian news coverage."
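The first step described above, building the co-occurrence network, can be sketched in a few lines of Python. Every pair of names mentioned in the same article gets an edge, weighted by how many articles they share; the layout and community-finding steps would then be done in a visualization tool. All article data below is invented for illustration.

```python
from itertools import combinations
from collections import Counter

# Hypothetical articles, each reduced to the list of names it mentions.
articles = [
    ["Barack Obama", "Hassan Rouhani", "John Kerry"],
    ["Hassan Rouhani", "Ali Khamenei"],
    ["Barack Obama", "John Kerry"],
    ["Barack Obama", "Hassan Rouhani"],
]

# Edge weight = number of articles in which the two names co-occur.
edges = Counter()
for names in articles:
    for a, b in combinations(sorted(set(names)), 2):
        edges[(a, b)] += 1

for (a, b), w in edges.most_common(3):
    print(f"{a} -- {b}: {w}")
```

Filtering out names below a mention threshold (15 articles, in Kalev's diagram) keeps the resulting graph readable before the clustering step.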


"Because most names in the news occur in just a handful of articles, the visual above shows the result of filtering the network to show only those names that occurred in 15 or more articles. This eliminates the vast majority of names, while preserving names that are more likely to be directly related to Iranian affairs and still capturing a broad swath of the discourse around Iran. The purple cluster is largely the United States and its allies, with Barack Obama right in the center, while the dark blue node towards of the lower center of the entire network is Edward Snowden, capturing the way in which he has become one of the most prominent figures in discussion of U.S. foreign policy. This is a fascinating finding: While Snowden obviously has no part in the Iranian-U.S. nuclear talks, his outsized role in the global conversation about U.S. foreign policy has made him part of the context in which those talks are discussed. In particular, there has been substantial media coverage connecting the approaches Snowden used to defeat the NSA's internal security procedures with some of those used by the United States in its attempts to sabotage Iran's nuclear efforts. The media has also used the materials Snowden has released to reconstruct how U.S. spy agencies may have been involved in the Stuxnet attack on Iran."


"The blue-green cluster in the bottom right largely consists of Israeli reporters and commentators, while the light blue cluster at top left consists of international reporters. The yellow cluster along the left side of the graph is where all of the Iranian names appear, with key figures like Hassan Rouhani, Ali Khamenei, Mohammad Javad Zarif, and Mahmoud Ahmadinejad all playing prominent roles in bridging Iran to the other clusters. Iranian politicians like Esfandiar Rahim Mashaei, Mohammad-Reza Aref, and Gholam Ali Haddad-Adel play central roles internally to the cluster, representing their important roles within Iran, but their limited engagement and contextualization over the last several months with the rest of the world.
The fact that this network accurately distinguishes internal and external leaders is a critical finding. Such resolving power means that this approach of externally mapping the newsmaker network around a country using public news coverage is sufficiently accurate to capture the nuance between newsmakers who operate largely within a country and those who have a more external role, and the external newsmakers with whom they are most closely connected. That such a news-based network would be capable of perceiving such nuanced detail suggests this approach may have powerful applications for mapping the internal structure of countries and organizations that receive considerable media coverage, but for which policymakers lack the detailed leadership diagrams compiled for higher-profile subjects like Iran.
The visual also makes it clear that the discourse around Iran does not focus on Iran itself or its internal politics, but rather on its nuclear ambitions and how they fits into the rest of the world. In particular, there is a strong Western-centric narrative to the English-language coverage around Iran, emphasizing U.S. interests, with Iranian leaders mentioned only in passing as they relate to those interests. In other words, news coverage across the world focuses on what the United States wants from Iran and what Iran needs to do to satisfy those demands, rather than the Iranian perspective on its role in the world. This is a key finding, as it reflects Iran's intense marginalization over its nuclear program and is in contrast to other nations like Egypt."
Another visualization shows the people associated with a theme:
"Who is mentioned in the news in reference to Nigeria and corruption?"
http://zoom.it/9BVp


I don't have the knowledge to comment on this one... Are any Nigerian readers able to comment?


Disclaimer: The opinions expressed here are my own, and do not reflect those of my employer. -Fumi Yamazaki