Change labels to be just user defined tags #37

Closed
jordan2175 opened this issue Oct 25, 2017 · 31 comments

@jordan2175

jordan2175 commented Oct 25, 2017

During 2.0 development there was a view that we should just use labels to track the object classification data, like indicator type, report type, malware type, etc. On the surface and at the time, this seemed like a good thing to do.

The problem I am seeing now is that I have no way of distinguishing in an automated way the content in the labels property. Meaning, are the values extra entries to the open-vocab or are they just extra user-defined tags? As such I have no way to pivot off this data because I do not know what type of data it represents.

I would propose that, for those few objects, we either move the object type classification out of labels, or make a new property called tags and change the text to say that user-defined "tagging" goes in tags and the object classification/type information goes in labels.
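To make the ambiguity concrete, here is a minimal Python sketch. It assumes a hypothetical, abbreviated subset of the malware open vocabulary (`MALWARE_VOCAB`), a hypothetical helper (`split_labels`), and borrows the `malware_types` name proposed later in this thread; none of it is specification text.

```python
# Hypothetical, abbreviated vocabulary subset; the real open vocabulary is longer.
MALWARE_VOCAB = {"ransomware", "spyware", "trojan", "worm"}


def split_labels(labels):
    """Partition labels into known vocabulary terms and everything else.

    The "everything else" bucket is ambiguous: an entry may extend the open
    vocabulary or be a user-defined tag, and the data does not say which.
    """
    known = [v for v in labels if v in MALWARE_VOCAB]
    unknown = [v for v in labels if v not in MALWARE_VOCAB]
    return known, unknown


# One property carrying both kinds of data.
mixed = {"type": "malware", "labels": ["ransomware", "wiper", "team-red"]}
known, unknown = split_labels(mixed["labels"])
# "wiper" (likely a type term) and "team-red" (likely a tag) land in the
# same bucket, so a consumer cannot pivot on either with confidence.

# Separate properties keep the intent with the data (property name assumed).
split = {
    "type": "malware",
    "malware_types": ["ransomware", "wiper"],
    "labels": ["team-red"],
}
```

With the split shape, a consumer can treat `malware_types` as classification data and `labels` as free-form tags without guessing.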

@skelley1

I would agree with this.

@johnwunder

johnwunder commented Oct 25, 2017

I hear this problem, but I think we should wait to consider a solution until we figure out how the "assertion" stuff that Jason proposed in #22 is going to work. It's possible that we can leave labels as is and move the classification/type stuff to that. I think once we figure out #22, the answer to this becomes clear.

@JasonKeirstead

I agree with this as well, and I don't think assertion solves this problem at all. What needs to be done is to copy the pattern, i.e. we should make a new property in these cases and put the vocab there.

This would not be a breaking change.

@adulau

adulau commented Oct 25, 2017

Maybe it's time to finally use MISP machine-tags in STIX, as proposed at the F2F meeting in 2016.

The machine-tags format is now widely used in various tools, including MISP. The format describing the taxonomies and machine-tags is also an Internet-Draft.

@JasonKeirstead

I don't really grok the machine-tags approach. Why wouldn't you just use a dictionary?

Anyway, I think that's kind of the opposite direction from where we need to go to solve this problem... We need less generic, not more... i.e. the type of malware should be a top-level property, not a tag.
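The machine-tags versus dictionary question can be sketched in Python. This assumes the `namespace:predicate="value"` triple syntax used by MISP taxonomies, with the value part optional; the regular expression is a simplification for illustration, not the grammar from the Internet-Draft, and `parse_machine_tag` and `tags_to_dict` are hypothetical helpers.

```python
import re

# Simplified triple syntax: namespace:predicate or namespace:predicate="value".
MACHINE_TAG = re.compile(
    r'^(?P<namespace>[^:="]+):(?P<predicate>[^="]+)(?:="(?P<value>[^"]*)")?$'
)


def parse_machine_tag(tag):
    """Split one machine tag into (namespace, predicate, value-or-None)."""
    m = MACHINE_TAG.match(tag)
    if m is None:
        raise ValueError(f"not a machine tag: {tag!r}")
    return m.group("namespace"), m.group("predicate"), m.group("value")


def tags_to_dict(tags):
    """Fold machine tags into a nested {namespace: {predicate: value}} dict."""
    result = {}
    for tag in tags:
        namespace, predicate, value = parse_machine_tag(tag)
        result.setdefault(namespace, {})[predicate] = value
    return result


tags = ['tlp:amber', 'admiralty-scale:source-reliability="a"']
as_dict = tags_to_dict(tags)
# {'tlp': {'amber': None}, 'admiralty-scale': {'source-reliability': 'a'}}
```

Folding the triples into a nested dictionary suggests the syntax is not the obstacle; the open question in this thread is which property says what the object is.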

@allant0

allant0 commented Jan 23, 2018

It seems that a dictionary approach would be simple and would avoid what machine-tags appears to conflate when it includes admiralty ratings, TLP levels and categorization in the same scheme. I think to achieve a good definition you have to start with a clear semantic definition of what the property conveys.
I believe the intent is to convey:
"A categorization of the nature of the threat by the creator of the threat intelligence and the specific threat intelligence object. It does not represent the reliability of this information (as this is captured by a separate property) nor the markings associated with protecting the threat intelligence object (as this is captured by marking-definitions in STIX)."

or similar.

@jordan2175
Author

jordan2175 commented Jan 23, 2018

We discussed this on the working call on 2018-01-23 and while Rich S is unsure about it, no one objected to fixing this in STIX 2.1. JMG was originally unsure, but then after hearing everyone's feedback, he also agreed with Sarah and Jason that this needs to be fixed / changed. The following people were on the working call: Bret Jordan, Trey Darley, John-Mark Gurney, Sarah Kelley, Chris Ricard, Dave Lemire, Jason Keirstead, Nicholas Hayden, Richard Struse, Sunil Ravipati.

@jordan2175
Author

jordan2175 commented Feb 13, 2018

We talked about this on the working call on 2018-02-13. The consensus on the call was that labels will be made optional and be used for user defined tags. We will create a new optional list property for the open-vocab / controlled-vocab "type" data. This will be changed on all objects where there is a defined vocab in the labels property.

Those that voted for this are: Bret Jordan, Paul Patrick, Sean Barnum, Chris Ricard, Sarah Kelley, Allan Thomson, Gary Katz

Abstain: Rich Struse, Trey Darley

There were 12 people on the call.

@treyka

treyka commented Mar 5, 2018

The call for objections ended COB 02 March with no objections received to the TC mailing list. As this change represents a bit more editorial work than @skelley1 and I can accommodate at this time, we are just noting the TC's consensus. This issue can be closed once the editorial work is done.

@treyka treyka added the "Severity: Breaking" label (for forward or backwards breaking changes) and removed the "Severity: Breaking" label, Mar 7, 2018
@johnwunder

@jordan2175 will suggest these changes in the text.

@johnwunder

Here are suggested property names from @JasonKeirstead. Is everybody OK with this? Please +1 or suggest something different.

Identity: roles
Indicator: indicator_types
Malware: malware_types
Report: subjects
Threat Actor: threat_actor_types
Tool: tool_types
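A sketch of how a producer might migrate STIX 2.0-style objects under these proposed names. The mapping mirrors the list above (Report's name was still under debate at this point); `migrate_labels` and its `vocab` argument are hypothetical, and any label not found in `vocab` is assumed to be a user-defined tag, which is exactly the guess the original data cannot justify.

```python
# Property names as proposed in the comment above (not final specification text).
PROPOSED_TYPE_PROPERTY = {
    "identity": "roles",
    "indicator": "indicator_types",
    "malware": "malware_types",
    "report": "subjects",
    "threat-actor": "threat_actor_types",
    "tool": "tool_types",
}


def migrate_labels(obj, vocab):
    """Return a copy of obj with vocabulary terms moved out of `labels`.

    `vocab` is a caller-supplied set of open-vocabulary terms for the
    object's type. Values not in it stay in `labels` as assumed user tags.
    """
    prop = PROPOSED_TYPE_PROPERTY.get(obj["type"])
    if prop is None or "labels" not in obj:
        return dict(obj)  # object type unaffected by the proposal
    new = dict(obj)
    typed = [v for v in obj["labels"] if v in vocab]
    tags = [v for v in obj["labels"] if v not in vocab]
    if typed:
        new[prop] = typed
    if tags:
        new["labels"] = tags
    else:
        del new["labels"]  # labels becomes optional under the consensus
    return new


before = {"type": "malware", "labels": ["ransomware", "team-red"]}
after = migrate_labels(before, {"ransomware"})
# after == {"type": "malware", "labels": ["team-red"], "malware_types": ["ransomware"]}
```

Keeping the mapping in one table means a later rename (for example, for Report) touches a single entry rather than the migration logic.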

@sbarnum

sbarnum commented Apr 6, 2018

I am fine with indicator_types, malware_types, threat_actor_types, tool_types.

I am also fine with 'roles' for Identity. We model this with its own Role object so short of doing that at this point in STIX, naming the field 'roles' should work well for us.

My one objection would be for Report. I would suggest 'report_type' as that is really what the vocab is conveying. Any one of those report types in the vocab could have a range of subjects. 'subject' is a little more specific than what we are conveying with that vocab.

@jordan2175
Author

I just really dislike the "_type" syntax given we already have a "type" property.

@ikiril01

ikiril01 commented Apr 6, 2018

For Indicators, maybe something like indicator_classes would be more appropriate? The current vocabulary really seems to be a set of classes (e.g., malicious-activity) rather than types (which would be something like artifact-based, behavioral-based, etc.).

@allant0

allant0 commented Apr 6, 2018

Types is not a good idea. The information being conveyed in these 'types' is not actually a type. It's a characteristic or descriptor. If I relate this to an object modelling exercise: can an attribute (e.g. malware) be represented by an inheritance model or by a reference model?

i.e.

base-class-malware
  -> windows-malware
       -> windows-conficker-malware
       -> windows-zeus-malware
  -> mac-malware
       -> mac-variant-malware
  -> platform-less-malware
       -> javascript-malware

vs

malware has
-> is-windows
-> is conficker

malware has
-> is-windows
-> is-platform-less
-> javascript-malware

The problem with types is that it requires you to model everything using single inheritance, whereas most of our world is actually multi-reference data; therefore it's not a type, it's a descriptor of the data.

Therefore my suggestion

malware_descriptor
actor_descriptor
indicator_descriptor
...etc

where those properties can be multi-valued as necessary.
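The difference between the two models can be sketched in Python. The tree is a hypothetical subset of the hierarchy above, and the descriptor sets (with hypothetical sample names) reuse the terms from the example; neither is proposed specification text.

```python
# Inheritance model: every node has exactly one parent, so each malware
# object has a single classification path.
PARENT = {
    "windows-malware": "base-class-malware",
    "windows-conficker-malware": "windows-malware",
    "platform-less-malware": "base-class-malware",
    "javascript-malware": "platform-less-malware",
}


def lineage(node):
    """Walk from a node up to the root of the single-inheritance tree."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


# Reference model: each object carries an unordered set of descriptors.
descriptors = {
    "sample-a": {"is-windows", "is-conficker"},
    "sample-b": {"is-windows", "is-platform-less", "javascript-malware"},
}


def having(term):
    """Return the samples that carry a given descriptor, sorted by name."""
    return sorted(name for name, terms in descriptors.items() if term in terms)
```

Here `lineage("windows-conficker-malware")` yields one path to the root, while `having("is-windows")` finds both samples even though `sample-b` also sits under the platform-less branch, a combination the tree could only express by adding another node.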

@JasonKeirstead

JasonKeirstead commented Apr 6, 2018

:/

I think it is silliness that in the description of the property we are explicitly saying "The type of malware being described." but we are afraid to call the field "malware_type".

If we want to call it "malware_descriptor" then I guess we should change the description of the field to say "A descriptor of malware" (which really sounds weird doesn't it).

Copy/paste above comment for all the other fields which have similar normative text.

@JasonKeirstead

JasonKeirstead commented Apr 6, 2018 via email

@sbarnum

sbarnum commented Apr 11, 2018

I would disagree with using 'descriptor'.
Every property of each of those objects is a 'descriptor'.
That term is far too general to characterize what we are talking about on these.

I would agree with Jason here. Over-interpreting the term 'type' to have some sort of coding or inheritance meaning is a mistake.
We should be naming in a way that is consistent, correct and coherent but should be as close as possible to the simple semantic language people would use to specify what is being characterized. In this case, we ARE talking about 'malware_type'. It is the type of malware that the Malware object is characterizing. Same goes for ThreatActor and Tool.
If we split hairs and name it something else that people look at and go 'oh, you mean malware type' then we named it wrong.
I have no objection to Ivan's suggestion of 'indicator_classes'. I think that one works either way.

@johnwunder

Options on 24 April Working Call:

  • *_Types: 5 (indicator_types, malware_types, threat_actor_types, etc.)
  • Labels+Tags: 4 (labels is for machine-parsable type values from the vocabulary, tags is for user-defined values)
  • Abstain: 5

@johnwunder johnwunder added this to the STIX 2.1 CSD01 milestone Apr 25, 2018
@johnwunder johnwunder added this to Under Discussion in STIX 2.1 Apr 25, 2018
@cricard

cricard commented Apr 27, 2018

Add my vote to *_types

@allant0

allant0 commented May 2, 2018

Provided *types (or whatever you like to call it, e.g. descriptor) allows multiple values that can support things that are somewhat orthogonal, then I'm fine with *types.

For example:

malware_type: { Spyware, SocialNetworking, EnergySector, FinancialSector, HighPriority }

That is, a malware instance that is spyware, focused on social networks, applicable to energy-sector and financial-sector environments, and high priority for the specific sharing community in which this intelligence was created.

If this is not possible using a single type, then please suggest an alternate approach.

@jordan2175
Author

The goal of splitting these two apart is that the malware type would be in one property and the user defined tags would be in another property. But from the looks of it, you are thinking you want some sort of binding between user defined tags and the types?

@johnwunder

Under all of the proposals, these new properties would be lists.

@allant0

allant0 commented May 3, 2018

Bret - The issue is that what is one org's definition of user defined is another's definition of machine assigned tags. The problem is that tagging or descriptive terms used to describe an entity (e.g. malware) can be both generated by machines and also enriched by humans. Therefore the concept of something being purely user defined vs purely machine is flawed. Our systems do both where machine assigned tags occur on data but those machine generated tags can be added to by humans.

So I know it's not perfect, but I come to the conclusion that the general concept of 'tagging' or 'type identification' is really just a list of descriptive terms used to describe the entity, and those terms might represent different aspects of describing the thing in more comprehensive ways.

That said, if there is a strict definition of what a type attribute represents that is clean and avoids ambiguity, then I might agree that we can have multiple attributes. But I'm not sure that this is the case.

@JasonKeirstead

@allant0 Tags are not always types / descriptors. They can be anything that is categorizing or aiding a search. Bret often uses examples of "red, green, blue" for tags, and IMO, based on what I have seen clients doing, that is actually not far off from reality in terms of how abstract the ways people may want to tag things can be. Maybe I want to tag it with the department of the user who found the malware, maybe I want to tag it with whether it is relevant to some compliance regime (via a boolean tag), maybe I want to tag it based on moon cycles... it could be anything...

That's really the whole reason to split these things apart... so that we have one field with an open vocab that refers to the actual type of that thing, and a different field for user-defined tags (note, user-defined is not the same as "human-defined"...)

@emmanvg

emmanvg commented May 7, 2018

I disagree with both approaches (*_types and labels+tags) because the underlying functionality stays the same. If we change labels to the *_type approach, we are just changing the name of the property, but the functionality would remain the same as labels. For labels+tags, I don't think separating it into two different properties will fix the problem. We would still have the chance of introducing terms that are outside of the vocab into the property. I think the way labels was originally defined is OK.

@jordan2175
Author

@emmanvg I think you might be missing the fundamental problem. Having one property represent two distinct types of data effectively means that you cannot perform automated actions (which is the whole point of STIX). The two kinds of data are semantically distinct, and it was a major mistake to lump them together. I think the community has consensus on splitting them apart. What the community needs to decide is what we are going to call them.

@allant0

allant0 commented May 7, 2018

@JasonKeirstead I agree with you. I tried to explain that it's more than types/descriptors; it's also plain categorization such as the examples you provided. But many times they are combined and interchangeable. I don't believe we will have a consistent distinction between a user tag and a type in products without a higher-level body defining those labels, and that is out of scope for this specification.

@johnwunder johnwunder moved this from Under Discussion to Has Consensus (Needs Specification Text) in STIX 2.1 Jul 18, 2018
@johnwunder johnwunder moved this from Has Consensus (Needs Specification Text) to Done in STIX 2.1 Jul 18, 2018
@StephenOTT
Member

I am dealing with this issue at the moment with:
https://github.com/StephenOTT/charon-stix/tree/master/src/main/java/io/digitalstate/stix/vocabularies

I have tried the following ideas:

  • terms
  • keywords
  • tags

The key differentiator I am working from is that there are free form and reserved/managed vocabularies.

Free form tends to be "keywords" or "tags".

And managed lists are categorized: roles, levels, types, category, and even label. The point being that managed vocabularies are designed up front for sorting and freeform are more organic usage.

+1 on not using _types as it starts to get confusing in implementations with setters and getters.

Is there concern with using something like malware category or class/classification, and threat actor class or classification? A malware category is a categorization of the malware ("typing the malware"), and a threat actor class is extremely similar to the identity class, hence the reuse.

@StephenOTT
Member

I would like to note a usage I noticed today:

#120

<Hashes> implements a requirement that user-provided / non-hash-algo OV values be prefixed with x_. This seems like a pretty easy thing to implement, it has easy backwards compatibility, and it supports the base requirement of being able to tell user-provided labels apart from labels that are part of an OV. I raise this as an option of less resistance than renaming attributes and/or creating new attributes.
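A sketch of how a consumer could apply the `x_` rule suggested above to a mixed list. The helper name and sample values are hypothetical, and the rule is this comment's proposal for labels, borrowed from the Hashes convention.

```python
def classify_by_prefix(values, prefix="x_"):
    """Split values into (vocabulary terms, user-provided values).

    Assumes the proposed convention: anything not drawn from the open
    vocabulary starts with `prefix`.
    """
    vocab_terms = [v for v in values if not v.startswith(prefix)]
    user_values = [v for v in values if v.startswith(prefix)]
    return vocab_terms, user_values


vocab_terms, user_values = classify_by_prefix(["spyware", "x_team-red", "x_energy-sector"])
```

The trade-off is that the distinction lives in a string convention rather than in the schema, so it only helps consumers if every producer follows it.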

STIX 2.1 automation moved this from Done to In Specification (Needs Implementations and Interop) Feb 27, 2019
@jordan2175
Author

This was done a while back, just not closed.

@jordan2175 jordan2175 moved this from In Specification (Needs Implementations and Interop) to Done in STIX 2.1 Feb 27, 2019