What Is Structured Data? Get to Know Your Data Again Before You Analyze It!

If you can’t measure it, you can’t manage it. – Peter Drucker

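If you cannot measure something, you cannot manage it; if you cannot manage it, you cannot improve it. The first step in data analysis is therefore to understand and collect your data. This article introduces structured, semi-structured, and unstructured data.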

 

Quick navigation
1. Structured Data
2. Semi-Structured Data
3. Unstructured Data
4. Conclusion

 

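1. Structured Data

Imagine a drink shop whose point-of-sale (POS) terminal is connected to a back-end database. The database has been defined in advance so that every record has fields for order number, item, unit price, quantity, and total amount. Orders then flow into the database one by one, as in the table below.

Put simply, structured data is data already organized into a table like the one below, ready for analysis at any time. It refers to data stored in a relational database (MySQL, Oracle, and so on), where the fields must be defined before any data can be stored.

Some people also collect data in Excel spreadsheets and define in advance what each column may contain; that counts as structured data too.

The advantages of structured data are fast queries and low storage use; the drawbacks are that adding new fields is cumbersome and that the rules for exchanging data are stricter.

Structured data can be defined in one sentence: structure first, data second.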

Order No.  Item       Unit Price  Quantity  Total Amount
A001       Black tea  30          5         150
A001       Green tea  30          2         60
A001       Milk tea   60          3         180
A002       Black tea  30          1         30
A002       Green tea  30          3         90
A002       Milk tea   60          2         120
A003       Black tea  30          4         120
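
The "structure first, data second" idea can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the shop's actual system: the table name `orders` and the English column names are assumptions made for the example, and SQLite stands in for whichever relational database the POS terminal really uses.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Structure first: columns and types are declared before any data exists.
# (Table and column names are assumptions made for this example.)
conn.execute(
    """
    CREATE TABLE orders (
        order_id     TEXT    NOT NULL,
        item         TEXT    NOT NULL,
        unit_price   INTEGER NOT NULL,
        quantity     INTEGER NOT NULL,
        total_amount INTEGER NOT NULL
    )
    """
)

# Data second: every row must fit the declared columns.
rows = [
    ("A001", "black tea", 30, 5, 150),
    ("A001", "green tea", 30, 2, 60),
    ("A001", "milk tea", 60, 3, 180),
    ("A002", "black tea", 30, 1, 30),
    ("A002", "green tea", 30, 3, 90),
    ("A002", "milk tea", 60, 2, 120),
    ("A003", "black tea", 30, 4, 120),
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", rows)

# Because the structure is known, aggregate queries are straightforward.
revenue_per_order = dict(
    conn.execute(
        "SELECT order_id, SUM(total_amount) FROM orders "
        "GROUP BY order_id ORDER BY order_id"
    ).fetchall()
)
print(revenue_per_order)

# Writing to a column that was never declared is rejected...
try:
    conn.execute("INSERT INTO orders (order_id, sugar_level) VALUES ('A004', '25%')")
    undeclared_column_rejected = False
except sqlite3.OperationalError:
    undeclared_column_rejected = True

# ...until the structure itself is changed.
conn.execute("ALTER TABLE orders ADD COLUMN sugar_level TEXT")
column_names = [info[1] for info in conn.execute("PRAGMA table_info(orders)")]
print(column_names)
```

The failed insert followed by `ALTER TABLE` shows the drawback mentioned earlier: a new field has to be added to the table definition before any record can use it.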

 


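2. Semi-Structured Data

Semi-structured formats such as CSV, JSON, and XML are convenient for data exchange. They combine the notion of fields with the ability to extend them, so besides querying data by field, you can add or remove fields as needed.

Taiwan's government open data platform often publishes data as CSV, JSON, and XML. Take two rows from the example above: suppose the first order needs a sugar-level field and the second needs an ice-level field. With semi-structured data this is easy, because you do not have to add both fields to both records, as the XML and JSON examples below show.

Semi-structured data can be defined in one sentence: data first, structure second.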

XML example

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <row>
    <order>A001</order>
    <item>Black tea</item>
    <sugar_level>25%</sugar_level>  <!-- only the sugar-level field is added -->
    <unit_price>30</unit_price>
    <quantity>5</quantity>
    <total_amount>150</total_amount>
  </row>
  <row>
    <order>A002</order>
    <item>Green tea</item>
    <ice_level>25%</ice_level>  <!-- only the ice-level field is added -->
    <unit_price>30</unit_price>
    <quantity>3</quantity>
    <total_amount>90</total_amount>
  </row>
</root>
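
Reading such records back means allowing for fields that some rows simply do not have. The sketch below uses Python's standard xml.etree.ElementTree module on a trimmed copy of the two rows (the item and price fields are left out for brevity); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the two rows from the XML example.
xml_text = """<root>
  <row>
    <order>A001</order>
    <sugar_level>25%</sugar_level>
    <total_amount>150</total_amount>
  </row>
  <row>
    <order>A002</order>
    <ice_level>25%</ice_level>
    <total_amount>90</total_amount>
  </row>
</root>"""

root = ET.fromstring(xml_text)

orders = []
for row in root.findall("row"):
    orders.append(
        {
            "order": row.findtext("order"),
            # findtext returns None when the element is absent,
            # so optional fields need no special handling.
            "sugar_level": row.findtext("sugar_level"),
            "ice_level": row.findtext("ice_level"),
            "total_amount": int(row.findtext("total_amount")),
        }
    )

for order in orders:
    print(order)
```

Each row carries only the fields it needs, and the reader decides how to treat the ones that are missing.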

JSON example (the first order adds only the sugar_level field; the second adds only the ice_level field)

{
  "table": {
    "row": [
      {
        "order": "A001",
        "item": "Black tea",
        "sugar_level": "25%",
        "unit_price": "30",
        "quantity": "5",
        "total_amount": "150"
      },
      {
        "order": "A002",
        "item": "Green tea",
        "ice_level": "25%",
        "unit_price": "30",
        "quantity": "3",
        "total_amount": "90"
      }
    ]
  }
}
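
To analyze records like these as a table, one common approach is to take the union of all field names and leave the missing values blank. The following is a minimal sketch using only Python's standard json and csv modules, on a trimmed copy of the two records (the item field is left out for brevity); it is one possible way to flatten the data, not a prescribed method.

```python
import csv
import io
import json

# Trimmed copy of the two records from the JSON example.
json_text = """
{
  "table": {
    "row": [
      {"order": "A001", "sugar_level": "25%", "unit_price": "30",
       "quantity": "5", "total_amount": "150"},
      {"order": "A002", "ice_level": "25%", "unit_price": "30",
       "quantity": "3", "total_amount": "90"}
    ]
  }
}
"""

records = json.loads(json_text)["table"]["row"]

# Data first, structure second: the columns are discovered from the records,
# keeping the order in which each field name is first seen.
fieldnames = []
for record in records:
    for key in record:
        if key not in fieldnames:
            fieldnames.append(key)

# DictWriter fills any field missing from a record with restval (empty string).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(records)

csv_text = buffer.getvalue()
print(csv_text)
```

The resulting table has both a sugar_level and an ice_level column, with an empty cell wherever a record did not supply that field.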


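3. Unstructured Data

Unstructured data is data that has not been organized, that is, data in its raw form. Everyday text, images, music, video, PDFs, and web pages are all unstructured data.

Data analysis requires structured data. Web scrapers, which have become very popular lately, pull unstructured data off web pages and organize it into structured data for later analysis. In the example below, the post date, author, topic, and number of replies are scraped from PTT and organized into structured data.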

[Figure: unstructured data]
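
The scraping step can be sketched with Python's standard html.parser module. The HTML snippet, its class names, and the sample values below are all invented for illustration; they are not PTT's real markup or real posts, and an actual scraper would also need to download the page and follow the site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical markup invented for this sketch; a real page will differ.
html_text = """
<div class="post">
  <span class="replies">12</span>
  <span class="title">Example topic one</span>
  <span class="author">user_a</span>
  <span class="date">1/05</span>
</div>
<div class="post">
  <span class="replies">3</span>
  <span class="title">Example topic two</span>
  <span class="author">user_b</span>
  <span class="date">1/06</span>
</div>
"""

FIELDS = {"replies", "title", "author", "date"}


class PostParser(HTMLParser):
    """Collect one dict per <div class="post"> in the hypothetical markup."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._field = None  # field whose text is currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "post":
            self.posts.append({})  # a new post starts
        elif tag == "span" and css_class in FIELDS and self.posts:
            self._field = css_class

    def handle_data(self, data):
        if self._field is not None:
            self.posts[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


parser = PostParser()
parser.feed(html_text)
parser.close()

# Convert the reply count to a number so it can be sorted or summed later.
for post in parser.posts:
    post["replies"] = int(post["replies"])

for post in parser.posts:
    print(post)
```

Each resulting dict has the same four fields, so the list can be loaded straight into a table or a database for analysis.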

 


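4. Conclusion

As the sections above show, unstructured data is data in its raw form. Structured data is extracted from it as needed for later analysis, while semi-structured data is mostly used for data exchange.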

[Figure: structured data]
